Transformation of Random Variables

Suppose we have a random variable X\, with probability distribution  P_X(x)\,. Now suppose we transform to a new random variable Y=f(X)\,, where f(x)\, is some (deterministic) function. What is the probability distribution P_Y(y)\, of the new variable?

Van Kampen tells us that the new distribution can be derived from the following intuitive (but a bit subtle) formula:

 P_Y(y) = \int \delta\left[y-f(x)\right] P_X(x) dx

It's clear that the new distribution is just given by adding up the probability of all points x\, whose image under f\, equals y\,, so that's nice and intuitive. However, the formula in terms of the \delta\,-function hides a Jacobian factor, which is a bit subtle. To see this, consider two deterministic functions f(x)\, and g(x)\,, and suppose f(x)\, is differentiable and strictly increasing, with an inverse defined everywhere. Then

\int g(x) \delta\left[y-f(x)\right] dx = \int g\left[f^{-1}(u)\right]
\delta\left[y-u\right] \frac{du}{f'\left[f^{-1}(u)\right]} =
\frac{g\left[f^{-1}(y)\right]}{f'\left[f^{-1}(y)\right]} \neq g\left[f^{-1}(y)\right]

where all we did was make the change of variables u = f(x)\, in the integral. So even in the simple case of an invertible transformation, we can't just "plug in" the point where the argument of the \delta\,-function vanishes.
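The Jacobian factor can be checked numerically. The following is a minimal sketch, not part of the original derivation: it stands in for the \delta\,-function with a narrow normalized Gaussian of width eps, and the particular choices of f, g, y and the integration grid are assumptions made only for illustration.

```python
import numpy as np

def narrow_gaussian(z, eps):
    """Normalized Gaussian of width eps, used as a stand-in for the delta function."""
    return np.exp(-0.5 * (z / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))

# Illustrative (assumed) choices: f is strictly increasing, so it is invertible.
def f(x):
    return x**3 + x

def f_prime(x):
    return 3.0 * x**2 + 1.0

def g(x):
    return np.cos(x)

y = 2.0        # f(1) = 2, so f^{-1}(y) = 1
x_star = 1.0   # the point where the argument of the delta function vanishes
eps = 1e-3

# Riemann sum of g(x) * delta_eps[y - f(x)] over a grid that resolves the narrow peak.
x = np.linspace(0.0, 2.0, 400_001)
dx = x[1] - x[0]
integral = np.sum(g(x) * narrow_gaussian(y - f(x), eps)) * dx

with_jacobian = g(x_star) / f_prime(x_star)  # g[f^{-1}(y)] / f'[f^{-1}(y)]
naive = g(x_star)                            # what "plugging in" would give

print(integral, with_jacobian, naive)
```

The integral should agree with the Jacobian-corrected value and differ from the naive one by the factor f'(1) = 4.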

Now back to calculating the distribution of our new random variable Y\,. Let's take advantage of the Fourier Transform of the \delta\,-function

\delta(x) = \int \frac{dk}{2 \pi} e^{-i kx}

Using this representation, the change of variables formula becomes

 P_Y(y) = \frac{1}{2\pi}\iint dx \, dk \, e^{-i k (y-f(x))} P_X(x) \equiv \frac{1}{2\pi} \int dk \, e^{-i k y} G_Y(k)

 G_Y(k) = \int dx \, e^{ i k f(x)}P_X(x)

These relations show that P_Y(y)\, is the inverse Fourier Transform of G_Y(k)\,. In other words, G_Y(k)\, is the Characteristic Function of Y\,, and we wrote it in a nice form that doesn't require manipulating a \delta\,-function.
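As a sanity check on the formula for G_Y(k)\,, here is a small numerical sketch. It assumes, purely for illustration, that X\, is a standard normal variable and that f(x) = x^2\,, which is not invertible. In that case Y\, is chi-squared distributed with one degree of freedom, whose characteristic function is known to be (1-2ik)^{-1/2}\,. The function name and the grids are hypothetical.

```python
import numpy as np

def char_fn_of_transformed(f, p_x, x, k):
    """Approximate G_Y(k) = integral of exp(i k f(x)) P_X(x) dx by a Riemann sum on grid x."""
    dx = x[1] - x[0]
    phase = np.exp(1j * np.outer(k, f(x)))  # shape (len(k), len(x))
    return (phase @ p_x(x)) * dx

def standard_normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def square(x):
    return x**2

x = np.linspace(-10.0, 10.0, 20_001)  # the Gaussian tails beyond |x| = 10 are negligible
k = np.linspace(-2.0, 2.0, 9)

numeric = char_fn_of_transformed(square, standard_normal_pdf, x, k)
exact = (1.0 - 2.0j * k) ** -0.5      # chi-squared (1 degree of freedom), principal branch

print(np.max(np.abs(numeric - exact)))
```

No \delta\,-function and no inverse of f\, appears anywhere in the computation, which is the point of working with G_Y(k)\,.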

It is easy to see that the formula works in higher dimensions too. Let \mathbf{X}\in\mathbb{R}^n and \mathbf{f}:\mathbb{R}^n\rightarrow\mathbb{R}^m. Then the Characteristic Function of \mathbf{Y}=\mathbf{f}(\mathbf{X})\, is given by

G_\mathbf{Y}(\mathbf{k}) = \int d^{(n)}\mathbf{x} e^{i \mathbf{k}\cdot \mathbf{f}(\mathbf{x})}
P_\mathbf{X}(\mathbf{x})\,
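The multidimensional formula can be checked in the same way. The sketch below assumes n = m = 2, two independent standard normal components for \mathbf{X}, and the linear map \mathbf{f}(\mathbf{x}) = (x_1+x_2,\, x_1-x_2). For that choice the Gaussian integral gives G_\mathbf{Y}(\mathbf{k}) = \exp[-(k_1^2+k_2^2)]. All names and grid sizes are illustrative assumptions.

```python
import numpy as np

def char_fn_2d(f, p_x, grid, k):
    """Approximate G_Y(k) = integral of exp(i k.f(x)) P_X(x) d^2x on a square grid."""
    x1, x2 = np.meshgrid(grid, grid, indexing="ij")
    dx = grid[1] - grid[0]
    y = f(x1, x2)  # tuple with one array per component of Y
    phase = sum(k_j * y_j for k_j, y_j in zip(k, y))  # k . f(x) at every grid point
    return np.sum(np.exp(1j * phase) * p_x(x1, x2)) * dx * dx

def normal_pdf_2d(x1, x2):
    return np.exp(-0.5 * (x1**2 + x2**2)) / (2.0 * np.pi)

def sum_and_difference(x1, x2):
    return (x1 + x2, x1 - x2)

grid = np.linspace(-8.0, 8.0, 401)
k_values = [(0.0, 0.0), (0.5, -1.0), (1.5, 0.7)]

results = []
for k in k_values:
    numeric = char_fn_2d(sum_and_difference, normal_pdf_2d, grid, k)
    exact = np.exp(-(k[0] ** 2 + k[1] ** 2))
    results.append((numeric, exact))
    print(k, numeric, exact)
```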


Examples

Sum of Independent Trials

Let \mathbf{X} = \{X_1,\cdots,X_N\} represent a sequence of N independent, identically distributed (IID) variables, with the X_m\, drawn from distribution P_X(x)\,. Now let

Y = f(\mathbf{X})=\frac{1}{N}\sum_{m=1}^{N}X_m

Plugging into our formula from above, and using independence to factor the joint distribution into a product of N\, copies of P_X\,, we find

G_Y(k) = G_X(k/N)^N\,
from which we can derive two immediate consequences:

E[y] = -i \left.\frac{\partial G_Y}{\partial k}\right|_{k=0} = -iG_X(0)^{N-1} G'_X(0)=E[x]

E[y^2]-E[y]^2 = (-i)^2 \left.\frac{\partial^2 G_Y}{\partial k^2}\right|_{k=0} -E[y]^2= \frac{1}{N}\left(E[x^2]-E[x]^2\right)
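The two derivative formulas above can be tried out on a concrete case. The sketch below assumes, for illustration only, that the X_m\, are exponentially distributed with rate 1, so that G_X(k) = 1/(1-ik)\, with E[x] = 1\, and variance 1. The derivatives at k=0\, are approximated by central finite differences with an assumed step h.

```python
import numpy as np

def g_x(k):
    """Characteristic function of an Exponential(1) variable: E[x] = 1, Var[x] = 1."""
    return 1.0 / (1.0 - 1j * k)

def g_y(k, n):
    """Characteristic function of the average of n IID copies: G_X(k/n)^n."""
    return g_x(k / n) ** n

n = 10
h = 1e-3  # finite-difference step (assumed; small enough for this smooth function)

first = (g_y(h, n) - g_y(-h, n)) / (2.0 * h)                  # approximates G_Y'(0)
second = (g_y(h, n) - 2.0 * g_y(0.0, n) + g_y(-h, n)) / h**2  # approximates G_Y''(0)

mean_y = (-1j * first).real        # E[y]   = -i G_Y'(0)
second_moment_y = (-second).real   # E[y^2] = (-i)^2 G_Y''(0)
var_y = second_moment_y - mean_y**2

print(mean_y, var_y)
```

Up to finite-difference error, the mean should come out equal to E[x] = 1\, and the variance equal to 1/N = 0.1\,.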

So we see that averaging N\, independent trials preserves the mean and multiplies the variance by a factor 1/N\,. Note that we derived that result completely generally, without assuming anything about the distribution P_X(x)\,. And we never even had to perform an integral. Actually, this calculation gets us 90% of the way to the

Central Limit Theorem

Consider the variables defined above, for very large N\,. Suppose also that the samples X_m\, have mean 0, E[x] = 0\,. Finally let \sigma^2 = E[y^2] = E[x^2]/N <\infty\, be the variance of Y\,. Then we may make a Taylor expansion of G_X\, around k=0\, to find

 G_Y(k) = G_X(k/N)^N = \left[ 1-\frac{1}{2}\frac{k^2 E[x^2]}{N^2}+\cdots\right]^N = \left[ 1-\frac{1}{2}\frac{k^2 \sigma^2}{N}+\cdots\right]^N \approx \exp\left[-\frac{1}{2} k^2 \sigma^2\right]

Here we used E[x^2] = N\sigma^2\,, which follows from the variance result above, and the neglected terms are small compared to 1/N\, when k\sigma\, is held fixed. But this is just the characteristic function of a Gaussian distribution with mean 0 and variance \sigma^2\,. Therefore, under fairly general conditions, any process that results from averaging many independent, identically distributed variables with mean 0 and finite variance is approximately Gaussian distributed.
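The approach to the Gaussian limit can be watched directly at the level of characteristic functions. The sketch below assumes, for illustration, that the X_m\, are uniform on [-1,1]\,, so that G_X(k) = \sin(k)/k\, and E[x^2] = 1/3\,. It compares G_X(k/N)^N\, with \exp[-k^2\sigma^2/2]\, over a fixed range of k\sigma\, for increasing N\,.

```python
import numpy as np

def g_x(k):
    """Characteristic function sin(k)/k of a uniform variable on [-1, 1]."""
    return np.sinc(k / np.pi)  # np.sinc(z) = sin(pi z) / (pi z), finite at z = 0

var_x = 1.0 / 3.0                # E[x^2] for the uniform distribution on [-1, 1]
t = np.linspace(-3.0, 3.0, 121)  # values of k * sigma, held fixed as N grows

errors = []
for n in (10, 100, 1000):
    sigma = np.sqrt(var_x / n)   # standard deviation of the average Y
    k = t / sigma
    g_y = g_x(k / n) ** n        # exact characteristic function of Y
    gaussian = np.exp(-0.5 * (k * sigma) ** 2)
    errors.append(float(np.max(np.abs(g_y - gaussian))))

print(errors)
```

The maximum deviation should shrink roughly like 1/N\,, as expected from the next term in the expansion for a symmetric distribution.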