## Estimating the Mean With Unknown Variance

Let us return to the problem of estimating the unknown mean μ of the normal distribution when the standard deviation σ is also unknown. In Section 3, we modified the procedure for the case of known standard deviation by replacing σ with the sample standard deviation *S*, using confidence bounds of the form

$$M \pm z \, \frac{S}{\sqrt{n}},$$

where *M* is the sample mean, *n* is the sample size, and the quantile *z* is appropriate for the type of interval and the confidence level. This was a purely ad hoc procedure, and we saw that these confidence bounds did not work well for small samples.

If you go back and look at the derivation in Section 2, you will see that what we really need to know is the distribution of the standard score when the sample standard deviation is used:

$$T = \frac{M - \mu}{S / \sqrt{n}}.$$

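**1.** Show that

$$T = \frac{Z}{\sqrt{V / (n - 1)}}, \qquad Z = \frac{M - \mu}{\sigma / \sqrt{n}}, \qquad V = \frac{(n - 1) S^2}{\sigma^2}.$$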

**2.** Show that *Z* has the standard normal distribution.

**3.** Show that *V* has the chi-square distribution with *n* − 1 degrees of freedom.

**4.** Show that *Z* and *V* are independent. (*Hint*: the sample mean and sample standard deviation of a sample from the normal distribution are independent.)

**5.** Conclude from Exercises 1–4 that *T* has the Student *t* distribution with *n* − 1 degrees of freedom.
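
The following Python sketch checks the identity in Exercise 1 numerically and shows the practical effect of Exercise 5. It assumes that μ and σ are known, so that Z = (M − μ)/(σ/√n) and V = (n − 1)S²/σ² can be computed alongside T. This is not possible in a real estimation problem and is done here only for illustration. The helper name `standard_scores`, the seed, and the cutoff 1.96 (the 0.975 quantile of the standard normal distribution) are illustrative choices.

```python
import math
import random
import statistics

def standard_scores(sample, mu, sigma):
    """Return (T, Z, V) for one sample, as defined in Exercises 1-4."""
    n = len(sample)
    m = statistics.fmean(sample)    # sample mean M
    s = statistics.stdev(sample)    # sample standard deviation S (divisor n - 1)
    t = (m - mu) / (s / math.sqrt(n))
    z = (m - mu) / (sigma / math.sqrt(n))
    v = (n - 1) * s**2 / sigma**2
    return t, z, v

rng = random.Random(2024)           # fixed seed so runs are repeatable
mu, sigma, n = 0.0, 2.0, 5

# Exercise 1: T and Z / sqrt(V / (n - 1)) agree up to floating-point rounding.
t, z, v = standard_scores([rng.gauss(mu, sigma) for _ in range(n)], mu, sigma)
print(t, z / math.sqrt(v / (n - 1)))

# Tail comparison: estimate P(|T| > 1.96) and P(|Z| > 1.96) by simulation.
reps = 20000
big_t = big_z = 0
for _ in range(reps):
    t, z, v = standard_scores([rng.gauss(mu, sigma) for _ in range(n)], mu, sigma)
    big_t += abs(t) > 1.96          # True counts as 1
    big_z += abs(z) > 1.96
print(big_t / reps, big_z / reps)
```

With n = 5, the proportion of |T| values beyond 1.96 should come out well above 5%, while the proportion for Z should stay close to 5%. This reflects the heavier tails of the *t* distribution with 4 degrees of freedom.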

For a number *p* in (0, 1), we will denote the *p*th quantile of the *t* distribution with *n* degrees of freedom by $t_{n,p}$. Thus, by definition, if the random variable *T* has the *t* distribution with *n* degrees of freedom, then

$$P(T < t_{n,p}) = p.$$

For selected values of *n* and *p*, values of these quantiles are given in the table of the *t* distribution.
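
Tables cover only selected *n* and *p*. As a sketch (not the method used to build the tables), the quantile can be computed directly from its definition P(T < t_{n,p}) = p. The approach is to integrate the *t* density numerically with Simpson's rule and then find, by bisection, the point where the distribution function equals *p*. The function names, the number of Simpson steps, and the tolerance are assumptions. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c) * (1 + x * x / k) ** (-(k + 1) / 2)

def t_cdf(x, k, steps=2000):
    """P(T <= x): 1/2 plus Simpson's rule on [0, x]; symmetry handles x < 0."""
    if x < 0:
        return 1.0 - t_cdf(-x, k, steps)
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + total * h / 3

def t_quantile(k, p, tol=1e-10):
    """Quantile t_{k,p}, the x with P(T <= x) = p, found by bisection."""
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    if p < 0.5:
        return -t_quantile(k, 1 - p, tol)  # symmetry: t_{k,p} = -t_{k,1-p}
    lo, hi = 0.0, 1.0
    while t_cdf(hi, k) < p:                # widen until [lo, hi] brackets the root
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(t_quantile(4, 0.95))   # close to the tabled value 2.132
```

For comparison, the 0.95 quantile of the standard normal distribution is about 1.645, so the *t* quantile with 4 degrees of freedom is noticeably larger.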

We can now easily derive confidence bounds for the distribution mean μ.

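**6.** Show that

$$P\left(-t_{n-1,\,1-\alpha/2} < \frac{M - \mu}{S / \sqrt{n}} < t_{n-1,\,1-\alpha/2}\right) = 1 - \alpha.$$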

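**7.** Show that the expression in Exercise 6 can be equivalently written as

$$P\left(M - t_{n-1,\,1-\alpha/2} \frac{S}{\sqrt{n}} < \mu < M + t_{n-1,\,1-\alpha/2} \frac{S}{\sqrt{n}}\right) = 1 - \alpha.$$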

From Exercise 7, it follows that

$$\left[M - t_{n-1,\,1-\alpha/2} \frac{S}{\sqrt{n}},\; M + t_{n-1,\,1-\alpha/2} \frac{S}{\sqrt{n}}\right]$$

is a 1 − α confidence interval for the distribution mean μ. Note that the length of this confidence interval, $2\, t_{n-1,\,1-\alpha/2}\, S / \sqrt{n}$, is random because it depends on the sample standard deviation *S*. This is in contrast with the case in which the variance of the underlying distribution is known, where the length of the confidence interval is fixed.
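
A minimal Python sketch of the two-sided interval from Exercise 7 follows. To stay within the standard library, it assumes the caller supplies the quantile t_{n−1,1−α/2}, here the table value t_{4,0.975} ≈ 2.776 for a 95% interval with n = 5. The function name `t_interval` and the sample data are hypothetical.

```python
import math
import statistics

def t_interval(sample, t_q):
    """Two-sided interval [M - t_q*S/sqrt(n), M + t_q*S/sqrt(n)].

    t_q should be the quantile t_{n-1, 1-alpha/2}; here it is supplied by the
    caller (e.g. read from a t table) rather than computed.
    """
    n = len(sample)
    m = statistics.fmean(sample)                                # sample mean M
    half_width = t_q * statistics.stdev(sample) / math.sqrt(n)  # uses S, not sigma
    return m - half_width, m + half_width

# Hypothetical data, n = 5; 95% interval with t_{4, 0.975} ~ 2.776 from a table.
lo, hi = t_interval([1.0, 2.0, 3.0, 4.0, 5.0], 2.776)
print(lo, hi)          # about 1.037 and 4.963

# A less spread-out sample gives a shorter interval: the length 2*t_q*S/sqrt(n)
# depends on S and is therefore random.
a, b = t_interval([2.9, 3.0, 3.1, 3.0, 3.0], 2.776)
print(b - a, hi - lo)
```
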

**8.** Use a derivation similar to Exercises 6 and 7 to show that a 1 − α confidence lower bound for the distribution mean is

$$M - t_{n-1,\,1-\alpha} \frac{S}{\sqrt{n}}.$$

**9.** Use a derivation similar to Exercises 6 and 7 to show that a 1 − α confidence upper bound for the distribution mean is

$$M + t_{n-1,\,1-\alpha} \frac{S}{\sqrt{n}}.$$
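
The one-sided bounds can be sketched the same way, where M is the sample mean and S the sample standard deviation. They use the quantile t_{n−1,1−α} rather than t_{n−1,1−α/2}. As a result, the 1 − α lower and upper bounds coincide with the endpoints of the two-sided 1 − 2α interval. The function names and data below are hypothetical, and the quantile t_{4,0.90} ≈ 1.533 is a table value supplied by hand.

```python
import math
import statistics

def t_lower_bound(sample, t_q):
    """1 - alpha lower bound M - t_q*S/sqrt(n), with t_q = t_{n-1, 1-alpha}."""
    n = len(sample)
    return statistics.fmean(sample) - t_q * statistics.stdev(sample) / math.sqrt(n)

def t_upper_bound(sample, t_q):
    """1 - alpha upper bound M + t_q*S/sqrt(n), with t_q = t_{n-1, 1-alpha}."""
    n = len(sample)
    return statistics.fmean(sample) + t_q * statistics.stdev(sample) / math.sqrt(n)

data = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical sample, n = 5
t_q = 1.533                         # t_{4, 0.90} from a table: 90% one-sided bounds
print(t_lower_bound(data, t_q))     # about 1.916
print(t_upper_bound(data, t_q))     # about 4.084
```
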

In the simulation of the mean estimation experiment, you can choose from a list box whether to use the population standard deviation σ or the sample standard deviation *S* in the construction of the confidence interval. You can choose from another list box whether to use quantiles from the normal distribution or from the *t* distribution. In either case, the density of the chosen distribution is shown in the middle graph. The quantiles are recorded, and the interval they define (the critical interval) is shown as a blue bar in the middle graph. When you run the simulation, the value of the appropriate standard score is recorded in the third table and plotted as a red line on the horizontal axis. The event that this line falls in the critical interval is equivalent to the event that the confidence interval captures the mean (and thus the success indicator variable *I* takes the value 1).
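
The equivalence between "the red line falls in the critical interval" and "*I* = 1" can be checked by simulation. The sketch below is a hypothetical stand-in for one run of the experiment, not the applet's code. It assumes two-sided intervals, samples of size 5 from the normal distribution with mean 0 and standard deviation 2, and 90% confidence, with the quantile t_{4,0.95} ≈ 2.132 taken from a table.

```python
import math
import random
import statistics

def one_run(rng, mu, sigma, n, q):
    """Hypothetical single run: (score in critical interval, interval captures mu)."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)
    score = (m - mu) / (s / math.sqrt(n))   # standard score using S
    in_critical = -q < score < q            # red line inside the blue bar
    half = q * s / math.sqrt(n)
    captured = m - half < mu < m + half     # success indicator I = 1
    return in_critical, captured

rng = random.Random(7)
runs = [one_run(rng, 0.0, 2.0, 5, 2.132) for _ in range(1000)]
print(all(a == b for a, b in runs))                  # the two events agree on every run
print(sum(captured for _, captured in runs) / 1000)  # proportion of successes, near 0.90
```

Algebraically the two conditions are the same inequality rearranged, which is why they agree run by run.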

**10.** In the mean estimation experiment, select *Use S* and *Use t quantiles*. Select the normal distribution with mean 0 and standard deviation 2, and select two-sided intervals. For each of the following sample sizes and confidence levels, run the experiment 1000 times with an update frequency of 10. Note the size and location of the confidence intervals and how well the proportion of successful intervals approximates the theoretical confidence level.

*n* = 5, 80%; *n* = 5, 90%; *n* = 10, 90%; *n* = 30, 90%.

**11.** In the mean estimation experiment, select *Use S* and *Use z*. Select the normal distribution with mean 0 and standard deviation 2, and select two-sided intervals. For each of the following sample sizes and confidence levels, run the experiment 1000 times with an update frequency of 10. Note the size and location of the confidence intervals and how well the proportion of successful intervals approximates the theoretical confidence level.

*n* = 5, 80%; *n* = 5, 90%; *n* = 10, 90%; *n* = 30, 90%.

In Exercise 10, you are using the correct procedure and thus you should have noticed good agreement between the proportion of successful intervals and the theoretical confidence level in all cases. In Exercise 11, you are using our incorrect, ad hoc procedure. When the sample size is small, you should have noticed that the proportion of successful intervals was consistently smaller than the theoretical confidence level.
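
The comparison between Exercises 10 and 11 can also be reproduced outside the applet. This sketch is a stand-in for the experiment under stated assumptions, not its implementation. It draws samples of size 5 from the normal distribution with mean 0 and standard deviation 2. For each sample it builds a 90% two-sided interval M ± q S/√n, where M is the sample mean and q is either the normal quantile z_{0.95} or the table value t_{4,0.95} ≈ 2.132, and it records the proportion of intervals that contain μ. The helper name `coverage` and the seed are hypothetical.

```python
import math
import random
import statistics

def coverage(rng, n, q, reps=20000, mu=0.0, sigma=2.0):
    """Proportion of intervals M -/+ q*S/sqrt(n) that contain mu (normal samples)."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        half = q * statistics.stdev(sample) / math.sqrt(n)
        hits += abs(statistics.fmean(sample) - mu) < half
    return hits / reps

rng = random.Random(1)
z_q = statistics.NormalDist().inv_cdf(0.95)  # standard normal quantile, about 1.645
t_q = 2.132                                  # t_{4, 0.95} from a t table
cov_z = coverage(rng, 5, z_q)                # ad hoc procedure (Exercise 11)
cov_t = coverage(rng, 5, t_q)                # t procedure (Exercise 10)
print(f"z quantile: {cov_z:.3f}   t quantile: {cov_t:.3f}")
```

The *t*-based proportion should be close to 0.90. The *z*-based proportion should be noticeably smaller, around 0.82.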

It's easy to understand the observed behavior mathematically. The *t* distribution has heavier tails than the standard normal distribution; with *k* > 2 degrees of freedom, its variance is *k*/(*k* − 2) > 1. Thus the *t* quantiles for a given confidence level are larger in absolute value than the *z* quantiles for that confidence level, and hence the interval constructed using the *t* quantiles is wider than the interval constructed using the *z* quantiles. On the other hand, the *t* distribution converges to the standard normal distribution as the number of degrees of freedom increases, and thus the difference is slight when the sample size is large.
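
The size of this effect can be quantified from a few standard table values of t_{k,0.95}, which are hard-coded below rather than computed. For intervals built from the same *S*, the ratio t_{k,0.95}/z_{0.95} is the factor by which the 90% two-sided *t* interval is wider than the *z* interval. The sketch prints that ratio next to the variance *k*/(*k* − 2).

```python
from statistics import NormalDist

# Tabled t_{k, 0.95} for selected degrees of freedom k (values from a t table).
T_95 = {4: 2.132, 9: 1.833, 29: 1.699, 120: 1.658}
z_95 = NormalDist().inv_cdf(0.95)           # about 1.645

ratios = {}
for k, t_q in T_95.items():
    variance = k / (k - 2)                  # variance of the t distribution, k > 2
    ratios[k] = t_q / z_95                  # width of t interval / width of z interval
    print(f"k = {k:3d}: variance = {variance:.3f}, width ratio = {ratios[k]:.3f}")
```

Both columns decrease toward 1 as *k* grows. This matches the convergence of the *t* distribution to the standard normal.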

When the distribution from which we are sampling is not normal, the procedure of this section is still used to obtain approximate confidence bounds. The procedure works well as long as the sample size is large and the distribution is not too far from normal.
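
The following sketch mirrors Exercises 12 and 13 in Python. It assumes the gamma distribution with shape *k* and scale 1, which has mean *k*, sampled with the standard-library `random.Random.gammavariate(alpha, beta)`, where alpha is the shape and beta the scale. The *t* quantiles are hard-coded table values, and `coverage` and `draw` are hypothetical names. Any other sampler, for example one written by hand for the Poisson case of Exercise 14, could be passed in as `draw`.

```python
import math
import random
import statistics

# Tabled quantiles t_{n-1, 0.95}, keyed by sample size n (two-sided 90% intervals).
T_95 = {5: 2.132, 10: 1.833, 30: 1.699}

def coverage(rng, draw, mu, n, reps=10000):
    """Proportion of 90% t intervals M -/+ t*S/sqrt(n) that contain the mean mu.

    draw(rng) is a hypothetical sampler returning one observation.
    """
    q = T_95[n]
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        half = q * statistics.stdev(sample) / math.sqrt(n)
        hits += abs(statistics.fmean(sample) - mu) < half
    return hits / reps

rng = random.Random(3)
results = {}
for shape in (1, 5):
    # Gamma with this shape and scale 1 has mean equal to the shape.
    draw = lambda r, a=shape: r.gammavariate(a, 1.0)
    results[shape] = [coverage(rng, draw, shape, n) for n in (5, 10, 30)]
    print(f"gamma shape {shape}, n = 5, 10, 30:", results[shape])
```

For the strongly skewed exponential case (shape 1), the proportion at *n* = 5 should fall short of 0.90, with the gap narrowing as *n* grows. For shape 5, which is closer to normal, the agreement should be better.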

**12.** In the mean estimation experiment, select *Use S* and *Use t*. Select the gamma distribution with shape parameter 1 and scale parameter 1. Select two-sided intervals and confidence level 0.90. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

*n* = 5; *n* = 10; *n* = 30.

**13.** In the mean estimation experiment, select *Use S* and *Use t*. Select the gamma distribution with shape parameter 5 and scale parameter 1. Select two-sided intervals and confidence level 0.90. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

*n* = 5; *n* = 10; *n* = 30.

**14.** In the mean estimation experiment, select *Use S* and *Use t*. Select the Poisson distribution with mean 1. Select two-sided intervals and confidence level 0.90. For each of the following sample sizes, run the experiment 1000 times with an update frequency of 10. Note how well the proportion of successful intervals approximates the theoretical confidence level.

*n* = 5; *n* = 10; *n* = 30.
