### Properties of Expected Value

#### Change of Variables Theorem

We now know that the expected value of a random variable gives the center of the distribution of the variable. This idea is much more powerful than might first appear. By finding expected values of various functions of a random vector, we can measure many interesting features of the distribution of the vector.

Thus, suppose that X is a random vector taking values in a subset S of $\mathbb{R}^n$ and suppose that r is a function from S into $\mathbb{R}$. Then r(X) is a random variable and we would like to compute E[r(X)]. However, to compute this expected value from the definition would require that we know the density function of r(X) (a difficult problem, in general). Fortunately, there is a much better way, given by the change of variables theorem for expected value.

1. Show that if X has a discrete distribution with density function f then

$$E[r(X)] = \sum_{x \in S} r(x) f(x)$$

Similarly, if X has a continuous distribution with density function f then

$$E[r(X)] = \int_S r(x) f(x) \, dx$$
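The following Python sketch illustrates why the theorem saves work in the discrete case. It computes E[r(X)] twice: once from the definition, after first building the density of r(X) by grouping the values of X, and once with the change-of-variables sum. The example density (the one from Exercise 3 below), the choice r(x) = x², and the helper names are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def density_of_r(f, r):
    """Density of Y = r(X): add up f(x) over all x with r(x) = y."""
    g = defaultdict(Fraction)
    for x, p in f.items():
        g[r(x)] += p
    return dict(g)

def mean(density):
    """Expected value from the definition: sum of y * g(y)."""
    return sum(y * p for y, p in density.items())

def mean_change_of_variables(f, r):
    """E[r(X)] as the sum of r(x) f(x) over S; the density of r(X) is never needed."""
    return sum(r(x) * p for x, p in f.items())

# Example density: f(x) = x^2 / 10 on {-2, -1, 0, 1, 2}
f = {x: Fraction(x * x, 10) for x in (-2, -1, 0, 1, 2)}
r = lambda x: x ** 2

via_definition = mean(density_of_r(f, r))
via_theorem = mean_change_of_variables(f, r)
print(via_definition, via_theorem)
```

The grouping step in `density_of_r` is the "difficult problem" mentioned above; it is easy here only because S is finite.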

2. Prove the continuous version of the change of variables theorem when r is discrete (i.e., r has countable range).

3. Suppose that X has probability density function

$$f(x) = \frac{x^2}{10}, \quad x \in \{-2, -1, 0, 1, 2\}$$

Find $E\left[\dfrac{1}{1 + X^2}\right]$.
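One way to check an answer to Exercise 3 is to evaluate the change-of-variables sum exactly with rational arithmetic. A minimal Python sketch using the standard `fractions` module:

```python
from fractions import Fraction

# Density from Exercise 3: f(x) = x^2 / 10 on {-2, -1, 0, 1, 2}
f = {x: Fraction(x * x, 10) for x in (-2, -1, 0, 1, 2)}
assert sum(f.values()) == 1  # sanity check: f is a valid density

# Change of variables with r(x) = 1 / (1 + x^2)
answer = sum(Fraction(1, 1 + x * x) * p for x, p in f.items())
print(answer)
```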

4. Suppose that X has density function

$$f(x) = \frac{x^2}{3}, \quad -1 < x < 2$$

Find $E(X^{1/3})$.
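For Exercise 4, a numerical check needs care with the cube root of negative values: in Python, `(-8) ** (1/3)` returns a complex number rather than -2. The sketch below assumes the real cube root is intended, defines a hypothetical `real_cbrt` helper, and applies a hand-written composite Simpson's rule (also an illustrative helper, not a library call) to the change-of-variables integral.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def real_cbrt(x):
    """Real cube root, so that negative x gives a negative result."""
    return math.copysign(abs(x) ** (1 / 3), x)

def f(x):
    return x**2 / 3  # density on (-1, 2)

total = simpson(f, -1.0, 2.0, n=3000)                                 # should be 1
estimate = simpson(lambda x: real_cbrt(x) * f(x), -1.0, 2.0, n=3000)  # E(X^(1/3))
print(total, estimate)
```

The estimate can be compared with the exact value obtained by integrating $x^{1/3} \cdot x^2/3$ separately over (-1, 0) and (0, 2).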

5. Suppose that (X, Y) has probability density function

$$f(x, y) = \frac{x + y}{4}, \quad 0 < x < y < 2$$

Find $E(X^2 Y)$.
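For Exercise 5, the change-of-variables integral runs over the triangle 0 < x < y < 2, so the inner integral over x stops at y. The Python sketch below nests a hand-written composite Simpson's rule (a hypothetical helper) to estimate it, and also integrates the density itself as a check that the limits are right.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def f(x, y):
    return (x + y) / 4  # joint density on 0 < x < y < 2

def expect(r):
    """E[r(X, Y)]: outer integral over 0 < y < 2, inner over 0 < x < y."""
    return simpson(lambda y: simpson(lambda x: r(x, y) * f(x, y), 0.0, y), 0.0, 2.0)

total = expect(lambda x, y: 1.0)          # integral of the density; should be 1
estimate = expect(lambda x, y: x**2 * y)  # E(X^2 Y)
print(total, estimate)
```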

#### Basic Properties

The exercises below give basic properties of expected value. These properties hold in general, but restrict your proofs to the discrete and continuous cases separately; the change of variables theorem is the main tool you will need. In these exercises X and Y are random variables for an experiment and c is a constant.

6. Show that E(X + Y) = E(X) + E(Y)

7. Show that E(cX) = cE(X)

8. Show that if $X \ge 0$ then $E(X) \ge 0$.

9. Show that if $X \le Y$ then $E(X) \le E(Y)$.

10. Show that $|E(X)| \le E(|X|)$.

The results in Exercises 6-10 are so basic that it is important to understand them on an intuitive level. Indeed, these properties are in some sense implied by the interpretation of expected value given in the law of large numbers.

11. Suppose that X and Y are independent. Show that

E(XY) = E(X)E(Y)

Exercise 11 shows that independent random variables are uncorrelated.
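The identity in Exercise 11 can be checked exactly for a small discrete example. The marginal densities below are hypothetical, chosen only for illustration; the joint density is built as their product, which is what independence means for discrete variables. A check like this illustrates the result but is not a proof.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal densities, chosen only for illustration
fX = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}
fY = {-1: Fraction(1, 3), 2: Fraction(2, 3)}

# Independence: the joint density is the product of the marginals
joint = {(x, y): fX[x] * fY[y] for x, y in product(fX, fY)}

E_XY = sum(x * y * p for (x, y), p in joint.items())
E_X = sum(x * p for x, p in fX.items())
E_Y = sum(y * p for y, p in fY.items())
print(E_XY, E_X * E_Y)
```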

12. Suppose that (X, Y) has density function

$$f(x, y) = \frac{3}{2} x^2 y, \quad 0 < x < 1, \; 0 < y < 2$$

Use the result in Exercise 11 to find $E[X^3(Y^2 + 1)]$.
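Here the density factors as $f(x, y) = (3x^2)(y/2)$ on the rectangle $(0, 1) \times (0, 2)$, so X and Y are independent with densities $3x^2$ on (0, 1) and $y/2$ on (0, 2). The Python sketch below computes the factored form $E(X^3)\,E(Y^2 + 1)$ with a hand-written Simpson's rule (an illustrative helper) and cross-checks it against the full double integral.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def f_X(x):
    return 3 * x**2  # marginal density of X on (0, 1)

def f_Y(y):
    return y / 2  # marginal density of Y on (0, 2)

def f(x, y):
    return 1.5 * x**2 * y  # joint density on (0, 1) x (0, 2)

# Factored form, justified by independence: E(X^3) E(Y^2 + 1)
E_X3 = simpson(lambda x: x**3 * f_X(x), 0.0, 1.0)
E_Y2_plus_1 = simpson(lambda y: (y**2 + 1) * f_Y(y), 0.0, 2.0)
factored = E_X3 * E_Y2_plus_1

# Cross-check: the full double integral of x^3 (y^2 + 1) f(x, y)
direct = simpson(
    lambda y: simpson(lambda x: x**3 * (y**2 + 1) * f(x, y), 0.0, 1.0), 0.0, 2.0
)
print(factored, direct)
```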

#### Nonnegative Variables

13. Let X be a nonnegative random variable for an experiment. Show that

$$E(X) = \int_0^\infty P(X > x) \, dx$$

14. Suppose that X has the power distribution with parameter a > 1, which has density function

$$f(x) = (a - 1) x^{-a}, \quad x > 1$$

Use the result of Exercise 13 to find E(X).
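The tail formula of Exercise 13 can be sanity-checked numerically for the power distribution. This Python sketch assumes the specific value a = 4, uses the tail probability $P(X > x) = x^{1-a}$ for x > 1 (obtained by integrating the density), and maps the infinite range (1, ∞) onto (0, 1] with the substitution x = 1/u. The helper names, including the hand-written Simpson's rule, are illustrative.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

a = 4.0  # assumed parameter; a > 3 makes both transformed integrands tend to 0 as u -> 0

def density(x):
    return (a - 1) * x ** (-a)  # power density on x > 1

def tail(x):
    """P(X > x): 1 for x <= 1, and x^(1 - a) for x > 1."""
    return 1.0 if x <= 1 else x ** (1 - a)

def integral_beyond_one(g):
    """Integral of g over (1, inf), via x = 1/u, dx = du / u^2.

    At u = 0 the integrand is set to its limit, which is 0 for the integrands used here.
    """
    return simpson(lambda u: g(1 / u) / u**2 if u > 0 else 0.0, 0.0, 1.0)

via_tail = 1.0 + integral_beyond_one(tail)                   # P(X > x) = 1 on (0, 1]
via_density = integral_beyond_one(lambda x: x * density(x))  # definition of E(X)
print(via_tail, via_density)
```

For 1 < a ≤ 2 the tail integral diverges, so no finite check of this kind is possible there.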

15. Use the result of Exercise 13 to prove Markov's inequality: If X is a nonnegative random variable, then for t > 0,

$$P(X \ge t) \le \frac{E(X)}{t}$$

16. Compute both sides of Markov's inequality when X has the power distribution with parameter a > 1, which has density function

$$f(x) = (a - 1) x^{-a}, \quad x > 1$$
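A short table shows how loose Markov's bound can be for this distribution. The Python sketch assumes a = 4 and uses the closed forms $P(X \ge t) = t^{1-a}$ for t ≥ 1 (and 1 for t < 1) and $E(X) = (a-1)/(a-2)$, which is valid only for a > 2; the list of t values is arbitrary.

```python
a = 4.0  # assumed parameter value; E(X) is finite only when a > 2

def prob_at_least(t):
    """P(X >= t) for the power distribution: 1 if t <= 1, else t^(1 - a)."""
    return 1.0 if t <= 1 else t ** (1 - a)

mean = (a - 1) / (a - 2)  # E(X) for a > 2
ts = (0.5, 1.0, 1.5, 2.0, 4.0)  # arbitrary sample values of t

for t in ts:
    lhs, rhs = prob_at_least(t), mean / t
    print(f"t = {t}: P(X >= t) = {lhs:.4f} <= E(X)/t = {rhs:.4f}")
```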

17. Use the result of Exercise 13 to prove the change of variables formula when the random vector X is continuous and r is nonnegative.

The following result is similar to Exercise 13, but is specialized to nonnegative integer variables:

18. Suppose that N is a discrete random variable that takes values in the set of nonnegative integers. Show that

$$E(N) = \sum_{n=0}^{\infty} P(N > n)$$

19. Suppose that N has density function

$$f(n) = (1 - q) q^n, \quad n = 0, 1, 2, \ldots$$

where $q \in (0, 1)$ is a parameter. Use the result of Exercise 18 to find E(N).
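A quick numerical check of Exercise 18 for this density: the sketch below assumes q = 0.3, uses the tail probability $P(N > n) = q^{n+1}$ (the sum of the geometric series $\sum_{k > n} (1 - q) q^k$), and compares the truncated tail sum with the truncated sum of n f(n).

```python
q = 0.3      # assumed parameter value in (0, 1)
terms = 200  # truncation point; the neglected remainder is negligible for q = 0.3

def f(n):
    return (1 - q) * q**n  # density of N

def tail(n):
    return q ** (n + 1)  # P(N > n), a geometric series sum

via_definition = sum(n * f(n) for n in range(terms))  # sum of n f(n)
via_tail_sum = sum(tail(n) for n in range(terms))     # sum of P(N > n)
print(via_definition, via_tail_sum)
```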

#### Moments

If X is a random variable, a is a real number, and k is a positive integer, the expected value

$$E[(X - a)^k]$$

is known as the kth moment of X about a. When a = E(X), the mean, the moments are called central moments. The second central moment is especially important; it is known as the variance.
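As a small illustration, the Python sketch below computes moments about a for a discrete density, using the density from Exercise 3 as an assumed example. The mean is the first moment about 0, and the variance is the second moment about the mean; the `moment` helper is hypothetical.

```python
from fractions import Fraction

def moment(f, k, a=0):
    """kth moment about a: E[(X - a)^k], the sum of (x - a)^k f(x)."""
    return sum((x - a) ** k * p for x, p in f.items())

f = {x: Fraction(x * x, 10) for x in (-2, -1, 0, 1, 2)}  # density from Exercise 3

mu = moment(f, 1)              # first moment about 0 is the mean
variance = moment(f, 2, a=mu)  # second central moment
print(mu, variance)
```

Because this density is symmetric about 0, the mean is 0 and the variance coincides with the second moment about 0.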

#### Jensen's Inequality

Our next sequence of exercises will establish an important inequality known as Jensen's inequality. First we need a definition. A real-valued function g defined on an interval I of R is said to be convex on I if for each t in I, there exist numbers a and b (that may depend on t) such that

$$a t + b = g(t), \qquad a x + b \le g(x) \text{ for all } x \in I$$

21. Interpret the conditions in the convexity definition geometrically (in terms of graphs). The line y = ax + b is called a supporting line.

You may be more familiar with convexity in terms of the following theorem from calculus:

22. Show that g is convex on I if g is twice differentiable on I and has non-negative second derivative on I. Hint: Show that for each t in I, the tangent line at t is a supporting line.

23. Prove Jensen's inequality: If X takes values in an interval I and g is convex on I, then

$$E[g(X)] \ge g[E(X)]$$

Hint: In the definition of convexity given above, let t = E(X) and replace x with X. Then take expected values through the inequality.
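The hint can be traced numerically. The Python sketch below assumes a hypothetical three-point distribution and the convex function g = exp, builds the tangent line at t = E(X) (a supporting line, by Exercise 22), checks that it lies below g at every value of X, and then compares E[g(X)] with g[E(X)].

```python
import math

# Hypothetical discrete distribution, for illustration only
f = {-1.0: 0.2, 0.5: 0.5, 2.0: 0.3}
g = math.exp        # convex on all of R, since g'' = exp > 0
g_prime = math.exp  # derivative of g, used for the tangent line

t = sum(x * p for x, p in f.items())  # t = E(X)
slope, intercept = g_prime(t), g(t) - g_prime(t) * t  # tangent line at t

# The tangent line is a supporting line: a x + b <= g(x) at every value of X
assert all(slope * x + intercept <= g(x) for x in f)

lhs = sum(g(x) * p for x, p in f.items())  # E[g(X)]
rhs = g(t)                                 # g[E(X)]
print(lhs, rhs)
```

Taking expected values through `slope * x + intercept <= g(x)` gives `slope * t + intercept <= lhs`, and the left side equals `g(t)`, which is exactly the argument in the hint.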

#### Conditional Expected Value

The expected value of a random variable X is based, of course, on the probability measure P for the experiment. This probability measure could be a conditional probability measure, conditioned on a given event B for the experiment (with P(B) > 0). The usual notation is E(X | B), and this expected value is computed by the definition given above, except that the conditional density f(x | B) replaces the ordinary density f(x). It is very important to realize that, except for notation, no new concepts are involved. The results we have established for expected value in general have analogues for these conditional expected values.

24. Suppose that X has probability density function

$$f(x) = \frac{x^2}{3}, \quad -1 < x < 2$$

Find E(X | X > 0).
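Conditioning on B = {X > 0} replaces f(x) by f(x | B) = f(x) / P(B) for x in (0, 2), and the expected value is then computed as usual. A minimal Python check for Exercise 24, with a hand-written composite Simpson's rule as an illustrative helper:

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def f(x):
    return x**2 / 3  # density on (-1, 2)

p_B = simpson(f, 0.0, 2.0)  # P(X > 0)
# Conditional density f(x | X > 0) = f(x) / P(X > 0) on (0, 2)
cond_mean = simpson(lambda x: x * f(x) / p_B, 0.0, 2.0)
print(p_B, cond_mean)
```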

25. Suppose that (X, Y) has probability density function

$$f(x, y) = \frac{x + y}{4}, \quad 0 < x < y < 2$$

Find E(XY | Y > 2X).
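For Exercise 25, the event {Y > 2X} cuts the triangle 0 < x < y < 2 down to the region 0 < x < y/2, 0 < y < 2. The Python sketch below integrates f(x, y) and xy f(x, y) over that region with nested Simpson's rule (a hypothetical helper); the first integral is P(Y > 2X), and dividing the second by it gives the conditional expected value.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def f(x, y):
    return (x + y) / 4  # joint density on 0 < x < y < 2

def integrate_over_B(r):
    """Integral of r(x, y) f(x, y) over B = {0 < x < y/2, 0 < y < 2}."""
    return simpson(lambda y: simpson(lambda x: r(x, y) * f(x, y), 0.0, y / 2), 0.0, 2.0)

p_B = integrate_over_B(lambda x, y: 1.0)                # P(Y > 2X)
cond_mean = integrate_over_B(lambda x, y: x * y) / p_B  # E(XY | Y > 2X)
print(p_B, cond_mean)
```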

Now suppose that X is a random vector taking values in a subset S of $\mathbb{R}^n$ and Y is a random variable. Then

E(Y | X = x)

simply means the expected value computed relative to the conditional distribution of Y given X = x. For fixed x, this expected value satisfies all the general properties of expected value. Moreover, it is the best predictor of Y given that X = x in the mean-square sense: among all constants c, it minimizes $E[(Y - c)^2 \mid X = x]$.

26. In the setting above, prove the following version of the law of total probability:

1. If X has a discrete distribution with density function f then

$$E(Y) = \sum_{x \in S} E(Y \mid X = x) f(x)$$

2. If X has a continuous distribution with density function f then

$$E(Y) = \int_S E(Y \mid X = x) f(x) \, dx$$
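The discrete case of Exercise 26 can be checked on a small example. The joint density below is hypothetical; the sketch computes E(Y | X = x) from the conditional density f(x, y) / f(x), where f(x) is the marginal density of X, then compares the weighted sum of these conditional means with E(Y) computed directly.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint density of (X, Y), chosen only for illustration
joint = {
    (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 8),
    (1, 1): Fraction(1, 4), (1, 3): Fraction(1, 2),
}

fX = defaultdict(Fraction)  # marginal density of X
for (x, _), p in joint.items():
    fX[x] += p

def cond_mean_Y(x):
    """E(Y | X = x), computed from the conditional density joint(x, y) / fX(x)."""
    return sum(y * p / fX[x] for (xx, y), p in joint.items() if xx == x)

E_Y = sum(y * p for (_, y), p in joint.items())
total = sum(cond_mean_Y(x) * fX[x] for x in fX)
print(E_Y, total)
```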