Random Number Generation

From Qwiki

Jump to: navigation, search

Random numbers are an important part of computer simulations, which in turn are important tools in the physical sciences. However with the exception of some new technologies (that use, for instance, Shot Noise fluctuations in electrical current to generate random numbers), real "randomness" is impossible to generate because computers all execute code in a purely deterministic manner. Fortunately, clever people have figured out ways to generate deterministic sequences of numbers that have properties that are not too different from what you'd expect if the numbers were generated from a truly random process. Such methods are known as pseudo-random number generators, and their desirable properties include, for example, very little statistical correlation between pairs numbers in the sequence and a very long repeat period (all computer-generated sequences obviously must repeat themselves eventually). The topic of pseudo-random number generators is well documented elsewhere (Numerical Recipes Textbooks is a good starting point), and we are by no means experts in it anyway, so we will not discuss it further. However, we list some guidelines below that may or may not be helpful for people writing computer simulations.

Contents

Uniform pseudo-random number generation

The most common application of pseudo-random number generation is generating numbers chosen from a uniform distribution, with

p_x(x) = \frac 1 L

where x is a random variable lying in some finite interval [a,a + L]. Typically, other distributions are simulated by appropriate transformations of uniform numbers, so the quality of the uniform random number generator used by your code is extremely important even if you do not use uniform random numbers.

  • Never trust the rand() function built into a Fortran, C or C++ compiler. Some of them may be perfectly good but there are no standards that must be met by such functions, so some of them may be quite bad.
  • RANLIB and some of the other libraries provided by netlib are easy to use and provide a simple but decent uniform generator and some methods for generating random numbers according to other distributions.
  • MATLAB uses a fairly sophisticated uniform random generator and a dedicated algorithm for generating Gaussian random numbers, as well as a number of methods for generating random numbers according to other distributions.
  • The GNU scientific library (GSL) incorporates several sophisticated uniform random number generators that can be selected by the user, as well as a large number of methods for generating random numbers according to other distributions.
  • There are a number of other less-common implementations of standard uniform number generators. Check Google. Also, some generators such as Mersenne Twister have code made publicly available by their creators.

When given multiple uniform random number generators to choose from, the best choice is often a compromise between statistical performance and computation time. We at Mabuchi Lab tend to go with either the Matlab generators or the Mersenne Twister (built into GSL) because they are proven to produce high-quality random numbers and execute pretty quickly on modern computers.

Transformation to other distributions

One way to generate random numbers from a distribution on other than uniform is to transform some uniform random numbers. We know from Transformation of Random Variables that if we define an invertible function y = ξ(x) of a random variable x, then

p_y (y) = \frac{p_x(\xi^{-1}(y))}{\xi'(\xi^{-1}(y))}

relates the probability densities of x and y. If we think of x as a uniform random variable generated by a computer, and of py(y) as the probability density of the random variables we would like to generate, then all we need to do is solve for ξ(x). If we generate a sequence {xn} of uniform random numbers, then the sequence \xi\left(\{x_n\}\right) will be distributed according to py(y)!

Example: Exponential random variables

This seems to be the canonical example. Suppose we want to generate numbers according to

p_y(y) = \frac 1 \mu e^{-y/\mu}.

From above we have (assuming our uniform random numbers lie in an interval of unit length)

\frac 1 \mu e^{-y/\mu} = \frac 1 {\xi'(\xi^{-1}(y))}

which becomes, by the Inverse function theorem,

\frac 1 \mu e^{-y/\mu} = \frac d {dy} \xi^{-1} (y).

We integrate to get ξ − 1(y) = ey / μ, or ξ(x) = − μlog(x).

Example: Points on/in a sphere

Another example that might actually be useful is for generating points lying uniformly on the surface of a sphere of radius R. By this we mean that the probability that a point lies inside any region of area A is proportional to A. If we parametrize the sphere using the angles θ and φ in the standard way (φ is the azimuthal angle) then we know that the area of a region swept out by the infinitesimal increments dθ and dφ is

A = R2sin(θ)dθdφ,

so that the probability density of the point (θ,φ) is

p(\theta, \phi) = \frac{\sin\theta}{4\pi}.

Since this does not depend on φ we can generate values for φ by simply taking uniform random numbers on the interval [0,2π). Noting that φ contributes a factor of 1 / 2π to the probability density and allowing x to lie uniformly in [ − 1,1] we have

\frac{\sin \theta}{2} = \frac 1 {2\xi_\theta'\left(\xi_\theta^{-1}(\theta)\right)}
which we solve to get
ξθ(x) = arccos( − x).

If instead we had wanted to generate points in three dimensions uniformly distributed within the sphere, we would have had

p(r, \theta, \phi) = \frac{3}{4 \pi R^3} r^2 \sin(\theta).

As before when we treated φ and θ separately, we treat r by itself and select φ and θ as described above. For r we have (with x in [0,1]):

\frac{3}{R^3}r^2 = \frac{1}{\xi_r'\left(\xi_r^{-1}(r)\right)},

which gives ξr(x) = Rx1 / 3.

Transformation by rejection

The transformation method described above is rather elegant. However, it relies on the probability density py(y) to be both invertible and easily integrated in order to get a simple transformation formula. Sometimes these criteria are just not met.

It is always possible to generate random numbers according to any bounded distribution py(y) over a finite interval [a,b] in the following way:

  1. Generate x, a uniform random number on [a,b].
  2. Generate z, a uniform random number on [0,\max_{a \leq y \leq b} p_y(y)].
  3. Compute py(x). If z \leq p_y(x) then we accept x as our random number. However, if z > py(x), we return to step 1.

Proof that this works

Note that this method can be proved by simple geometric arguments: If we plot the curve py(y) over its domain, and draw a rectangle of width ba and height maxpy, then we see that every point (x,z) chosen lies inside the rectangle, but only the points that lie below the curve are accepted. It is closely related to methods for Monte-Carlo integration.

This does not mean that we shouldn't prove that it works algebraically. We now compute the probability density p(x) of the number that our procedure has generated, beginning by computing the probability that --- in a single iteration --- we generate an x that we reject:

\begin{matrix}P(reject)&=&\int_a^b dx \int_0^{\max p_y} dz P(z > p_y(x), x, z)\\&=&\frac{1}{(b-a)\max p_y}\int_a^b dx \int_{p_y(x)}^{\max p_y} dz\\&=&1-\frac{1}{(b-a)\max p_y}.\end{matrix}

In a similar way we compute the probability density that in one iteration we both choose a particular x and accept it:

\begin{matrix}p(x, accept) &=& \int_0^{\max p_y}dz P(z < p_y(x), x, z)\\&=& \frac 1 {(b-a)\max p_y} \int_0^{p_y(x)} dz\\&=& \frac{p_y(x)}{(b-a) \max p_y}.\end{matrix}

Now we can compute what we are looking for. We note that x can be the number we choose to accept by either selecting it and accepting it on our first iteration, or by rejecting our first choice and choosing/accepting x on our second iteration, and so on. Thus

\begin{matrix}p(x) &=& p(x, accept) \sum_{n=0}^\infty p(reject)^n\\&=&\frac{p(x, accept)}{1-p(reject)}\end{matrix}

which evaluates to py(x), completing the proof that this technique produces random numbers with the appropriate distribution.

Personal tools