Unit 32: Confidence Intervals Notes

Slide 1
This learning unit concerns confidence intervals.

Slide 2
Up to this point, we’ve been talking about hypothesis testing, one of the two forms of inference. The other form of inference is the confidence interval. Hypothesis testing and confidence intervals are based on the same assumptions. The difference between the two is that hypothesis testing tells you whether the sample mean you obtain differs significantly from the value specified in the null hypothesis, while a confidence interval answers the same question but also lets you estimate the population mean from a sample and put bounds on that estimate: to say that the population mean is very likely to lie within an interval, called the confidence interval.

Slide 3
Suppose we go out and obtain a sample of size 100 from a population with mean mu and standard deviation 50.  We can define the sampling distribution of the mean, which is derived by taking a very large number of samples of size 100, computing their means, and forming a frequency distribution of those means. This distribution has a standard deviation equal to sigma divided by the square root of the sample size, which we call the standard error of the mean. Here that is 50 divided by the square root of 100; since the square root of 100 is 10, the standard error of the mean is 50 divided by 10, or 5.

What is the probability that the mean, x bar, of any random sample of size 100 will have a value between mu and one standard error of the mean above mu? From the unit on Z scores and the normal curve, we know this probability is .3413. Because the normal distribution is symmetric about the mean, this is also the probability that x bar will have a value between mu and one standard error below mu.

Therefore, the probability is approximately 68% that the sample mean we obtain will lie between 1 SE above the mean and 1 SE below the mean of the population.
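
As a concrete check, here is a minimal sketch in Python (using numpy and scipy, which are my assumptions; the notes themselves don't prescribe any software) that computes the standard error of the mean for this example and the area under the normal curve within one SEM of mu.

```python
import numpy as np
from scipy.stats import norm

sigma = 50                       # population standard deviation
n = 100                          # sample size
sem = sigma / np.sqrt(n)         # standard error of the mean

# Area under the normal curve between 1 SEM below and 1 SEM above the mean
p = norm.cdf(1) - norm.cdf(-1)

print(sem)            # 5.0
print(round(p, 4))    # 0.6827, i.e. approximately 68%
```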

Slide 4
Now suppose we go out and obtain a sample and we compute its mean, x bar.

Let’s assume that this mean corresponds to that shown in the figure. We know that, with a probability of 68%, the sample mean we obtained lies within one standard error of the population mean mu. We can show this by constructing an interval around the mean of our sample.

Let’s call the value 1 SEM above x bar the upper limit (U) and the value 1 SEM below x bar the lower limit (L) of the interval around the mean. In this case, the interval is the 68% confidence interval; that is, we are 68% confident that the actual population mean lies within this interval.

x bar is referred to as the point estimate. Given that the study design allows me to obtain only a single sample mean, x bar is the best estimate I have of the mean of the underlying population.

Slide 5
In science, one generally wishes to be more confident than 68%. Suppose one decides that one would like to be 95% confident in estimating the population mean.

If we go back to the original sampling distribution of the mean, we can define the region that includes 95% of the possible values of x bar. Both tails combined would encompass 5% of the distribution. The Z-score that cuts off a tail area of .025 (that is, half of .05) is 1.96. Therefore, 95% of the time, I can predict that the mean of any random sample will be within 1.96 SEM of the population mean. Put another way, I can assert with 95% confidence that the mean of the population will lie in the CI from x bar – 1.96 SEM to x bar + 1.96 SEM.
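
A quick way to confirm the 1.96 figure is to ask for the Z value that leaves .025 in the upper tail. A minimal sketch, again assuming scipy is available:

```python
from scipy.stats import norm

# Critical Z for a two-sided 95% CI: the value that leaves .025 in each tail
z_crit = norm.ppf(1 - 0.05 / 2)
print(round(z_crit, 2))   # 1.96
```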

Slide 6
What we have just discussed is called a 2-sided CI and this is the type most frequently encountered. It corresponds to the 2-tailed hypothesis test. There are also 1-sided CIs corresponding to the 1-tailed hypothesis test. 1-sided confidence intervals are not commonly used and therefore our discussion of them will be limited to a single slide.

Suppose one obtained the sample mean shown above and one wanted to be 95% confident that the true population mean was greater than or equal to the lower limit, L. In such a case the upper limit would not be defined, or could be thought of as plus infinity. Because we are concerned with only a single tail of the distribution, the lower limit would be closer to the mean in this case than the lower limit associated with a 2-sided confidence interval.
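
To see why the one-sided limit sits closer to the mean, compare the critical Z values. The sketch below (scipy assumed) puts all 5% in a single tail for the one-sided case.

```python
from scipy.stats import norm

# Two-sided 95% CI: .025 in each tail
z_two_sided = norm.ppf(0.975)    # about 1.96
# One-sided 95% CI: the whole .05 in one tail
z_one_sided = norm.ppf(0.95)     # about 1.645

print(round(z_two_sided, 3), round(z_one_sided, 3))
# The one-sided lower limit, x bar - 1.645 SEM, lies closer to x bar
# than the two-sided lower limit, x bar - 1.96 SEM.
```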

Slide 7
Let’s try a computational example.

Suppose I know the true standard deviation of the underlying population is sigma. Then, I can use the normal curve as the model of the sampling distribution of x bar and the lower limit of my confidence interval L would equal x bar, the sample mean, – Z times the SEM (which is sigma over the square root of n). In a similar way, U would equal x bar + Z times the SEM.

Let’s assume I’m interested in estimating the underlying IQ of a population of children who were nutritionally deprived. I’ll use a two-sided 95% CI. The standard deviation of IQ in the population, sigma, is known to be 16, and the population mean in developed countries is 100. Let’s suppose I draw a random sample of 100 children who are nutritionally deprived and the mean IQ in that sample turns out to be 92.3.

My task is to construct the 95% confidence interval for the mean IQ of the population. The best guess I have for the mean IQ of the population is the mean of my sample 92.3, which is the point estimate. But what range of values would encompass 95% of the possible values of the population mean?

The standard error of the mean equals sigma divided by the square root of n which equals 16 divided by the square root of 100 or 1.6.

Slide 8
Substituting Z, sigma, n and x bar in the equations for L and U gives the results 89.164 and 95.436 for the lower and upper limits of the 95% confidence interval for the population mean IQ.

So, with a sample mean or point estimate of 92.3, we can say with 95% confidence that the mean IQ of the underlying population lies between 89.164 and 95.436.
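
The whole calculation can be reproduced in a few lines. A minimal sketch, assuming Python with numpy and scipy:

```python
import numpy as np
from scipy.stats import norm

sigma = 16        # known population SD of IQ
n = 100           # sample size
xbar = 92.3       # sample mean (the point estimate)

sem = sigma / np.sqrt(n)     # 1.6
z = norm.ppf(0.975)          # about 1.96

L = xbar - z * sem
U = xbar + z * sem
print(round(L, 3), round(U, 3))   # approximately 89.164 and 95.436
```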

Slide 9
Notice that specifying the CI is very different from simply being able to say that we can reject the null hypothesis that the population mean is 100.

It turns out that we still can reject the null hypothesis, however, because the 95% confidence interval does not include this value. To understand this, consider the two figures shown above. In the left panel, we will assume that the sample mean falls in one of the two critical regions for a two-tailed hypothesis test with alpha of .05. If this happens, we would reject the null hypothesis. Notice however that when this happens, the mean under the null hypothesis, 100, is not included in the 95% confidence interval around the point estimate x bar.

Conversely, in the right panel, suppose that the sample mean does not fall in one of the two critical regions for the two-tailed hypothesis test. If this happens, we would fail to reject the null hypothesis. When this is the case, the 95% confidence interval around the point estimate x bar will always include the population mean mu.

Therefore we can use the confidence interval to carry out a hypothesis test. If the mean under the null hypothesis falls outside of the confidence interval, we reject the null hypothesis. If the mean under the null hypothesis is included in the confidence interval, we fail to reject the null hypothesis.

As you can see, the confidence interval gives me two pieces of information. It tells me that I can reject the null hypothesis and in addition it tells me that the true mean of the nutritionally deprived children lies between 89.164 and 95.436 with 95% certainty.
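
In code, the CI-based test is just a containment check. A minimal sketch using the limits computed above:

```python
# H0: mu = 100, tested at alpha = .05 via the 95% confidence interval
mu0 = 100
L, U = 89.164, 95.436

reject_h0 = not (L <= mu0 <= U)
print(reject_h0)   # True: 100 falls outside the interval, so H0 is rejected
```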

Slide 10
Suppose I wanted only to be 90% confident that the population mean fell in an interval.

In that case, instead of a Z-score of 1.96, I would use the Z-score corresponding to an area of .90/2, or .45, and look this up in column 2 of the table. The Z-score corresponding to this area is 1.65. Remember, the rule is to choose the Z-score corresponding to the smaller of the two tail areas in column 3.

The lower limit of the 90% confidence interval would be 89.660 instead of 89.164 for the 95% confidence interval.

Similarly, the upper limit of the 90% confidence interval would be 94.940 instead of 95.436 for the 95% confidence interval.

Notice that the confidence interval corresponding to 90% is somewhat narrower than that corresponding to 95%. Therefore, if I reduce my level of confidence in the point estimate, I will end up with a narrower confidence interval.  In the extreme, if I were to reduce my level of confidence to 1%, Z would become about .013 and my 1% CI would be approximately (92.28, 92.32). So the more confident I want to be that the CI includes the true population mean, the wider that confidence interval has to be.
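
The relationship between the confidence level and the interval width is easy to see by recomputing the IQ interval at several levels. A minimal sketch (numpy/scipy assumed):

```python
import numpy as np
from scipy.stats import norm

xbar, sigma, n = 92.3, 16, 100
sem = sigma / np.sqrt(n)

for conf in (0.01, 0.68, 0.90, 0.95, 0.99):
    z = norm.ppf(0.5 + conf / 2)      # Z that encloses a central area equal to conf
    L, U = xbar - z * sem, xbar + z * sem
    print(f"{conf:.2f}: ({L:.3f}, {U:.3f})")
# The interval widens steadily as the confidence level increases.
```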

This concept is counterintuitive to many people, but one that you need to understand.

Slide 11
The assumptions underlying CI for the mean when sigma is known are exactly the same as those for the one-mean Z-test, that is, normality of the underlying population from which the sample is drawn and independence of observations.

Slide 12

Let’s consider another situation.

Suppose sigma is not known. In this case, one replaces Z in the confidence interval calculation by t, and sigma by s, the standard deviation of the sample, and then looks up the appropriate critical t in the t-tables. It is important to note that one can only do this for critical t’s corresponding to specific areas under the curve. For 2-sided intervals, these include the 80%, 90%, 95%, 98% and 99% CI’s.
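
The table lookup can also be reproduced in software. The sketch below uses scipy's t distribution (an assumption on my part; the notes use printed tables) to get the critical t for, say, 4 degrees of freedom:

```python
from scipy.stats import t

# Critical t for a two-sided 95% CI with 4 degrees of freedom
df = 4
t_crit = t.ppf(1 - 0.05 / 2, df)
print(round(t_crit, 3))   # 2.776
```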

Slide 13
Let’s try an example. Suppose we are interested in obtaining a 2-sided 95% CI for the population mean in the following dataset.

First, we must calculate the mean and standard deviation of the sample. The mean we obtain by adding the 5 numbers in the column and dividing by 5. It turns out to be 2. The standard deviation requires that we obtain the sum of x squared. We therefore square each x and add them, getting 36 as a result.

Substituting the sum of the squared x values and the square of the sum of the x values into the formula for the sample standard deviation gives us a value of 2.

The degrees of freedom is one less than the sample size or 4. Looking up the critical t for a 2-sided 95% CI in the t-table yields 2.776.

The 95% CI is then computed as x bar – t times s divided by the square root of n, and x bar + t times s divided by the square root of n.

With 95% confidence, given this random sample, we can say the mean of the underlying population is between –.483 and 4.483.

If the mean under the null hypothesis were zero, we would fail to reject the null hypothesis, since this confidence interval includes zero.
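
Since the raw data appear on the slide rather than in these notes, the sketch below works from the summary statistics the notes report (n = 5, x bar = 2, s = 2), again assuming numpy and scipy:

```python
import numpy as np
from scipy.stats import t

n, xbar, s = 5, 2.0, 2.0           # summary statistics from the example
df = n - 1

t_crit = t.ppf(0.975, df)          # 2.776 for a two-sided 95% CI
half_width = t_crit * s / np.sqrt(n)

L, U = xbar - half_width, xbar + half_width
print(round(L, 3), round(U, 3))    # approximately -0.483 and 4.483

# H0: mu = 0 would not be rejected, because 0 lies inside (L, U)
```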

Slide 14
Self Assessment

Slide 15
Confidence intervals allow one to both test hypotheses and place bounds on estimates for the population mean based on single samples. The center of the confidence interval (or point estimate of the mean) is simply the mean of the sample we obtain.

When sigma is known, L equals x bar minus critical Z times sigma over the square root of n, and U equals x bar plus critical Z times sigma over the square root of n. When sigma is not known, L equals x bar minus critical t times s over the square root of n, and U equals x bar plus critical t times s over the square root of n.
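
Both cases can be wrapped in a single helper. The function below is a sketch of my own (not part of the notes), assuming numpy and scipy:

```python
import numpy as np
from scipy.stats import norm, t

def confidence_interval(xbar, sd, n, conf=0.95, sigma_known=True):
    """Two-sided CI for the mean; sd is sigma when known, otherwise the sample s."""
    if sigma_known:
        crit = norm.ppf(0.5 + conf / 2)         # critical Z
    else:
        crit = t.ppf(0.5 + conf / 2, df=n - 1)  # critical t
    half_width = crit * sd / np.sqrt(n)
    return xbar - half_width, xbar + half_width

print(confidence_interval(92.3, 16, 100, sigma_known=True))   # about (89.164, 95.436)
print(confidence_interval(2.0, 2.0, 5, sigma_known=False))    # about (-0.483, 4.483)
```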

Slide 16
The amount of confidence we would like to have determines Critical Z or Critical t.

For 2-sided confidence intervals and alpha = .05, Critical Z is the value of z corresponding to a tail area of alpha/2 = .025, which is 1.96.

Critical t is determined by looking in the t-table, in the column corresponding to the 2-sided alpha, for the value associated with the appropriate degrees of freedom.

The more confidence one would like to have in the confidence interval, the larger that interval will be.