Purpose
The purpose of this activity is to investigate the relationship between the mean of a population and the mean of the sample means.
Outcomes
This activity will teach students how to
- apply characteristics of normal distributions
- design and conduct surveys and/or simulate data collection to explore sampling variability
- demonstrate an understanding of the application of random numbers to statistical sampling
- demonstrate an understanding of how the size of a sample affects the variation in sample results
- graph and interpret sample distributions of the sample mean and of the sample proportion
Format
This activity will take the form of a teacher-led full-class discussion. Students are encouraged to share information. To facilitate discussion, divide the class into small groups.
Introduction
It is important for students to realize that we are usually unable to collect information about a total population. The goal of sampling is to draw reasonable conclusions about a population by obtaining information from a relatively small part (a sample) of that population. In order to do this, we need to know what formulas to use with the resulting information and what degree of confidence we can claim in the information.
This investigation represents the first step toward developing the concept of confidence intervals.
Classroom instruction
Distribute copies of the student worksheet and Handout 1, “Histograms and frequency polygons for height samples,” to the class.
1. The formulas for the mean and standard deviation of the population are already given. These formulas are written symbolically, using sigma notation. This may be the first time students have seen sigma notation although they will already have performed calculations of mean and standard deviation. As part of the in-class discussion or as a writing assignment, ask students to describe the process of calculating the mean and the standard deviation and to connect the calculation process to the symbolic formulas.
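For reference during this discussion, the standard population formulas are shown below. The worksheet's exact notation is not reproduced in this guide, so the symbols used here (N for the population size, x_i for the individual heights) are an assumption.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
```

In words: add all the values and divide by how many there are; then average the squared distances from that mean and take the square root.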
2. Discuss the shape of the histogram (dot graph) of the original dataset. Students should recognize that the data vary considerably. Have them answer the question “Will every sample of the population look the same?”
3. Using the height data, have students select random samples of five heights. For their own information, they should record their data values as well as the mean for each sample.
Divide the task so that each student or group of students is responsible for a fraction of the 100 samples. Create a table on the board or on a transparency so that students can see the sample means as they are recorded.
It is important that students manually and randomly select the samples. Generating this information too quickly using software or calculator technology may mask what is actually taking place. Once they have manually selected their first few samples, you might suggest they use an electronic random number generator, such as the randInt(1, 100, 5) function on a TI calculator.
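For teachers who want to show what the calculator is doing, the selection step can be sketched in Python. This is only an illustration: the `heights` list is made-up placeholder data, not the worksheet's height data; the assumption that the population is numbered 1 to 100 is inferred from the randInt(1, 100, 5) call; and the helper name `draw_sample` is hypothetical.

```python
import random

def draw_sample(heights, size, rng):
    """Pick `size` positions at random (with replacement) and return the heights there."""
    # rng.randint(1, N) includes both endpoints, like randInt(1, 100) on the calculator.
    positions = [rng.randint(1, len(heights)) for _ in range(size)]
    return [heights[p - 1] for p in positions]

# Placeholder data: NOT the worksheet heights, just 100 made-up values in cm.
rng = random.Random(1)
heights = [round(rng.gauss(165, 10), 1) for _ in range(100)]

sample = draw_sample(heights, 5, rng)
sample_mean = sum(sample) / len(sample)
print(sample, round(sample_mean, 1))
```

Each run of `draw_sample` corresponds to one row that a student would record by hand: five heights and their mean.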
4. Have all students complete the frequency table of the collection of 100 sample means. Expect to get some interesting results as students randomly select samples. This reinforces the understanding that we cannot have complete confidence in our expectations.
It is also worth noting that our samples are chosen with replacement, although results would be similar if samples were chosen without replacement. The graph of the sample means should show more clustering about the mean than was evident in the histogram of raw data.
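After the class has built its table by hand, a short simulation can confirm the pattern. The sketch below assumes a placeholder population of 100 made-up heights (not the worksheet data) and an arbitrary class width of 5 cm for the frequency table; the function name `sample_mean` is hypothetical.

```python
import random
import statistics
from collections import Counter

rng = random.Random(2)

# Placeholder population: 100 made-up heights in cm (NOT the worksheet data).
heights = [round(rng.gauss(165, 10), 1) for _ in range(100)]

def sample_mean(values, size, rng):
    """Mean of `size` values drawn at random with replacement."""
    return statistics.mean(rng.choices(values, k=size))

# 100 samples of five heights each, as in steps 3 and 4.
means = [sample_mean(heights, 5, rng) for _ in range(100)]

# Frequency table of the sample means in 5 cm classes (class width is an arbitrary choice).
table = Counter(5 * int(m // 5) for m in means)
for lower in sorted(table):
    print(f"{lower}-{lower + 5}: {table[lower]}")

print("population mean:", round(statistics.mean(heights), 1))
print("mean of sample means:", round(statistics.mean(means), 1))
print("population SD:", round(statistics.pstdev(heights), 1))
print("SD of sample means:", round(statistics.pstdev(means), 1))
```

The two means should come out close to each other, while the standard deviation of the sample means should be noticeably smaller than that of the raw data.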
5. Distribute copies of Appendix A (the Fathom-generated list of 100 samples of 20 heights) to the class. Ask the students to use the frequency table given in their worksheet to sketch histogram #3 on Handout 1.
6. Have the students share their comments on the three graphs with the class. Their comments should include the following observations:
a. The histograms of the sample means are approximately normal. They are roughly symmetrical about the mean.
b. The mean of the sample means is roughly the same as the population mean and doesn’t depend on the size of the sample.
c. As the sample size increases, the standard deviation of the sample means decreases. (The ratio of standard deviations for sample size 20 to those for sample size 5 should be about 1:2.)
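The 1:2 ratio in observation (c) follows from the standard result that, for random samples of size n, the standard deviation of the sample means is the population standard deviation divided by the square root of n. The symbols below are standard notation rather than the worksheet's own.

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{\sigma_{\bar{x}}\,(n = 20)}{\sigma_{\bar{x}}\,(n = 5)}
= \frac{\sigma/\sqrt{20}}{\sigma/\sqrt{5}}
= \sqrt{\frac{5}{20}}
= \frac{1}{2}
```

Because the class results come from only 100 samples, the observed ratio will be close to 1:2 rather than exactly equal to it.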
7. Students are expected to make the following predictions for 100 samples of 40 heights:
a. The histogram will show a tighter clustering about the mean than it did for samples of 20 heights.
b. The mean of the sample means will be approximately the same as the mean of the population.
c. The standard deviation of the sample means will be less than the standard deviation for samples of 20 heights (less than 2.2).
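Students who are ready for it can turn prediction (c) into a rough number. Assuming the standard deviation of the sample means scales as one over the square root of the sample size, doubling the sample size from 20 to 40 divides the value of 2.2 by the square root of 2:

```latex
\sigma_{\bar{x}}\,(n = 40) \approx 2.2 \times \sqrt{\frac{20}{40}} = \frac{2.2}{\sqrt{2}} \approx 1.56
```

This is an estimate only; an actual set of 100 samples will give a value somewhat above or below it.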
8. Students are expected to make the following observations:
a. The shapes of graphs 2, 3 and 4 are similar, and the shapes of graphs 5 and 6 are similar. Graphs 5 and 6 are taller and narrower: they are clustered more tightly about the mean value, with less in the “tails”. All graphs are roughly normal. The distribution of the original data shows far more variation than the distributions of the sample means.
b. The means are all approximately 164.7, which is the same as the population mean.
c. The standard deviations of graphs 2, 3 and 4 are virtually the same (1.9); the standard deviations of graphs 5 and 6 are the same (1.5). This suggests that:
- the standard deviation of the sample means depends on the sample size but not on the number of samples chosen, and
- as the sample size increases, the standard deviation of the sample means decreases.
Note: These are very important observations.
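Both observations can be checked with a simulation. The sketch below is an assumption-laden illustration: the population is 100 made-up heights (not the worksheet data), the combinations of sample size and number of samples are hypothetical and do not necessarily match graphs 2 to 6, and `sd_of_sample_means` is a hypothetical helper name.

```python
import random
import statistics

def sd_of_sample_means(values, sample_size, num_samples, rng):
    """Standard deviation of the means of `num_samples` random samples (with replacement)."""
    means = [
        statistics.mean(rng.choices(values, k=sample_size))
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means)

rng = random.Random(3)

# Placeholder population: 100 made-up heights in cm (NOT the worksheet data).
heights = [round(rng.gauss(165, 10), 1) for _ in range(100)]

# Hypothetical combinations of sample size and number of samples.
for sample_size, num_samples in [(30, 100), (30, 1000), (50, 100), (50, 1000)]:
    sd = sd_of_sample_means(heights, sample_size, num_samples, rng)
    print(f"n = {sample_size}, {num_samples} samples: SD of sample means = {sd:.2f}")
```

Rows with the same sample size should give similar standard deviations regardless of how many samples were drawn, while the rows for n = 50 should be smaller than those for n = 30.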
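9. If n = 30, the standard deviation of the sample means is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{30}}$, where $\sigma$ is the population standard deviation.
If n = 50, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{50}}$.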
These calculations are very close to the values of standard deviation given in the table.
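The comparison in step 9 can be repeated for any sample size with a small helper. This sketch assumes the standard relationship in which the standard deviation of the sample means equals the population standard deviation divided by the square root of n. The value of `population_sd` is a placeholder, since the worksheet's figure is not quoted in this guide, and the function name is hypothetical.

```python
import math

def sd_of_sample_means(population_sd, sample_size):
    """Theoretical standard deviation of the sample means: sigma / sqrt(n)."""
    return population_sd / math.sqrt(sample_size)

# Placeholder value: replace with the population standard deviation from the worksheet.
population_sd = 10.0

for n in (30, 50):
    print(f"n = {n}: {sd_of_sample_means(population_sd, n):.2f}")
```

Students can substitute the worksheet's population standard deviation and compare the printed values with the standard deviations listed in their table.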
Contributed by Anna Spanik, Math teacher, Halifax West High School, Nova Scotia.