Transformation of Random Variables
From Qwiki
Suppose we have a random variable $X$ with probability distribution $P_X(x)$. Now
suppose we transform to a new random variable $Y = f(X)$, where $f$ is some
(deterministic) function. What is the probability distribution $P_Y(y)$
of the new variable?
Van Kampen tells us that the new distribution can be derived from the following intuitive (but a bit subtle) formula:
![P_y(y) = \int \delta\left[y-f(x)\right] P_X(x) dx](/images/math/0/1/c/01c215b258620eeabe2b5040b51aac67.png)
It's clear that the new distribution is just given by adding up the probability of points where the image of $x$ under $f$ is equal to $y$, so that's nice and intuitive. However, the
formula in terms of the δ-function hides a Jacobian factor, which is a bit subtle. To see this,
consider two deterministic functions $g(x)$ and $h(x)$, and suppose $h$ has an inverse defined everywhere. Then

$$\int g(x)\, \delta\left[h(x)\right] dx = \int g\left(h^{-1}(u)\right) \delta(u) \frac{du}{\left|h'\left(h^{-1}(u)\right)\right|} = \frac{g\left(h^{-1}(0)\right)}{\left|h'\left(h^{-1}(0)\right)\right|}$$

where all we did was make the change of variables $u = h(x)$ in the integral. So even in the simple
case of an invertible transformation, we can't just "plug in" the point where the argument of the
δ-function vanishes.
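The Jacobian factor can be checked numerically. The following is a minimal sketch, assuming NumPy and replacing the δ-function by a narrow normalized Gaussian of width `eps`. The test functions `g` and `h` below are illustrative choices, not taken from the text; `h` is strictly increasing with its zero at `x0 = 1`.

```python
import numpy as np

def delta_approx(u, eps):
    """Narrow normalized Gaussian standing in for the delta function."""
    return np.exp(-u**2 / (2 * eps**2)) / (np.sqrt(2 * np.pi) * eps)

# Illustrative test functions (assumptions, not from the text).
g = np.cos
h = lambda x: x**3 + x - 2          # strictly increasing, h(1) = 0
h_prime = lambda x: 3 * x**2 + 1    # derivative of h

# Riemann sum of g(x) * delta[h(x)] on a fine uniform grid.
x = np.linspace(-5.0, 5.0, 400_001)
dx = x[1] - x[0]
eps = 1e-3
integral = np.sum(g(x) * delta_approx(h(x), eps)) * dx

x0 = 1.0                                    # the point where h(x0) = 0
naive = g(x0)                               # "plugging in" without the Jacobian
with_jacobian = g(x0) / abs(h_prime(x0))    # value including the 1/|h'| factor

print("numerical integral:", integral)
print("g(x0) / |h'(x0)|  :", with_jacobian)
print("g(x0) alone       :", naive)
```

The numerical integral should agree with the value that includes the factor $1/|h'|$, and differ clearly from the naive value.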
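Now back to calculating the distribution of our new random variable $Y$. Let's take advantage of the
Fourier Transform of the δ-function

$$\delta\left[y-f(x)\right] = \frac{1}{2\pi}\int e^{-ik\left[y-f(x)\right]} dk$$

Using this representation, the change of variables formula becomes

$$\begin{aligned}
P_Y(y) &= \int P_X(x) \left[\frac{1}{2\pi}\int e^{-ik\left[y-f(x)\right]} dk\right] dx \\
&= \frac{1}{2\pi}\int e^{-iky} \left[\int e^{ikf(x)} P_X(x) dx\right] dk \\
&= \frac{1}{2\pi}\int e^{-iky}\, G_Y(k)\, dk
\end{aligned}$$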
That last line shows that $P_Y(y)$ is the inverse Fourier Transform of $G_Y(k) = \int e^{ikf(x)} P_X(x) dx$. In
other words, $G_Y(k)$ is the Characteristic Function of $Y$, and we wrote it in a
nice form that doesn't require manipulating a δ-function.
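As a concrete check, the Characteristic Function $G_Y(k) = \int e^{ikf(x)} P_X(x) dx$ can be evaluated by simple quadrature. The sketch below assumes NumPy and picks an illustrative case not discussed in the text: $X$ standard normal and $f(x) = x^2$, so that $Y$ is chi-squared with one degree of freedom, whose Characteristic Function is known to be $(1-2ik)^{-1/2}$. The helper name `char_fn_of_transform` is hypothetical.

```python
import numpy as np

def char_fn_of_transform(f, p_x, k, x):
    """Approximate G_Y(k) = integral of exp(i k f(x)) p_x(x) dx on a uniform grid x."""
    dx = x[1] - x[0]
    phase = np.exp(1j * np.outer(k, f(x)))   # shape (len(k), len(x))
    return (phase @ p_x(x)) * dx

# Illustrative choices (assumptions): standard normal X and Y = X^2.
p_x = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
f = lambda x: x**2

x = np.linspace(-10.0, 10.0, 40_001)   # grid wide enough that p_x is negligible at the ends
k = np.linspace(-3.0, 3.0, 13)         # includes k = 0

G_numeric = char_fn_of_transform(f, p_x, k, x)
G_exact = (1 - 2j * k) ** -0.5         # chi-squared (1 degree of freedom) characteristic function

print("max |G_numeric - G_exact| =", np.max(np.abs(G_numeric - G_exact)))
```

No δ-function or Jacobian appears anywhere in the computation; the transformation enters only through the phase $e^{ikf(x)}$.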
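It is easy to see that the formula works in higher dimensions too. Let $\mathbf{x} \in \mathbb{R}^n$
and $\mathbf{y} = \mathbf{f}(\mathbf{x}) \in \mathbb{R}^m$. Then the Characteristic Function of $\mathbf{y}$
is given by

$$G_Y(\mathbf{k}) = \int e^{i\mathbf{k}\cdot\mathbf{f}(\mathbf{x})} P_X(\mathbf{x})\, d^n x$$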
Examples
Sum of Independent Trials
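Let $x_1, \ldots, x_N$ represent a sequence of $N$ independent, identically distributed (IID) variables, with the
$x_i$ drawn from distribution $P_X(x)$. Now let

$$y = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Plugging into our formula from above, we find

$$G_Y(k) = \int \exp\left[\frac{ik}{N}\sum_{i=1}^{N} x_i\right] \prod_{i=1}^{N} P_X(x_i)\, dx_i = \left[G_X(k/N)\right]^N$$

where $G_X(k) = \int e^{ikx} P_X(x) dx$ is the Characteristic Function of $x$. Differentiating at $k = 0$ gives the moments:

$$E[y] = -i \left.\frac{\partial G_Y}{\partial k}\right|_{k=0} = E[x]$$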
![E[y^2]-E[y]^2 = (-i)^2 \left.\frac{\partial^2 G_Y}{\partial k^2}\right|_{k=0} -E[y]^2= \frac{1}{N}\left(E[x^2]-E[x]^2\right)](/images/math/0/f/4/0f417654f66d6c214f7f32af4d648ef3.png)
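The mean and variance of the average can be checked by simulation. This is a minimal sketch, assuming NumPy; the exponential distribution, the seed, and the values of `N` and `trials` are arbitrary illustrative choices, since the result does not depend on the distribution.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so the run is repeatable
N, trials = 10, 200_000

# Exponential samples with scale 2: mean 2 and variance 4 (illustrative choice).
mean_x, var_x = 2.0, 4.0
x = rng.exponential(scale=2.0, size=(trials, N))

# Each row is one experiment; y is the average of its N samples.
y = x.mean(axis=1)

print("mean of y:", y.mean(), " expected:", mean_x)
print("var of y :", y.var(), " expected:", var_x / N)
```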
So we see that averaging $N$ independent trials preserves the mean and reduces the variance by a factor of
$N$. Note that we derived that result completely generally, without assuming anything about the distribution
$P_X(x)$. And we never even had to perform an integral. Actually, this calculation gets us 90% of the way to the Central Limit Theorem.
Central Limit Theorem
Consider the variables defined above, for very large $N$. Suppose also that the samples $x_i$
have mean 0, $E[x] = 0$. Finally let $\sigma^2 = E[x^2]/N$
be the variance of $y$. Then we may make a Taylor expansion around $k = 0$
to find
![G_Y(k) = G_X(k/N)^N = \left[ 1-\frac{1}{2}\frac{k^2 \sigma^2}{N}+O(1/N^2)\right]^N \approx \exp\left[-\frac{1}{2} k^2 \sigma^2\right]](/images/math/6/8/0/680c2c26b617852a526b44768154177f.png)
The last expression is the Characteristic Function of a Gaussian with mean 0 and variance $\sigma^2$. Therefore, under fairly general conditions, any process that results from averaging many independent, identically distributed variables with mean 0 and finite variance is approximately Gaussian distributed.
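The approach to the Gaussian form can be seen directly at the level of Characteristic Functions. The sketch below assumes NumPy and uses an illustrative distribution not taken from the text: $x$ uniform on $[-1, 1]$, which has mean 0, variance $1/3$ and Characteristic Function $\sin(k)/k$. It compares the exact $G_X(k/N)^N$ with $\exp\left[-\frac{1}{2}k^2\sigma^2\right]$, where $\sigma^2$ is the variance of the average. The helper names are hypothetical.

```python
import numpy as np

def cf_uniform(k):
    """Characteristic function sin(k)/k of a uniform variable on [-1, 1]."""
    return np.sinc(k / np.pi)          # np.sinc(t) = sin(pi t) / (pi t)

def max_gaussian_error(N, num_k=2001):
    """Largest gap between the exact G_Y(k) and its Gaussian approximation."""
    var_x = 1.0 / 3.0                  # variance of the uniform distribution on [-1, 1]
    sigma2 = var_x / N                 # variance of the average of N samples
    k_max = 5.0 / np.sqrt(sigma2)      # cover the range where the Gaussian is non-negligible
    k = np.linspace(-k_max, k_max, num_k)
    G_exact = cf_uniform(k / N) ** N
    G_gauss = np.exp(-0.5 * k**2 * sigma2)
    return np.max(np.abs(G_exact - G_gauss))

errors = {N: max_gaussian_error(N) for N in (5, 50, 500)}
for N, err in errors.items():
    print(f"N = {N:4d}   max |G_Y - Gaussian| = {err:.2e}")
```

The maximum discrepancy shrinks as $N$ grows, consistent with the correction terms dropped in the Taylor expansion.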

