NOTE: These videos were prepared when the Census at School Project was managed by Statistics Canada. Most of the information is still relevant.
Duration: 10:56 min.
In this episode, we will look at how to use the statistical program Fathom to analyse a dataset taken from the International Census at School database.
Hi, I’m Angela McCanny and I’m a resource teacher for Statistics Canada. In this episode, we will look at how to use the statistical program Fathom to analyse a dataset taken from the international Census at School database.
Fathom is a dynamic data analysis software program from Key Curriculum Press that is licensed in many school boards across Canada. Using Fathom, we can do one- and two-variable analysis. We can create graphs using categorical and numeric data (both discrete and continuous) and find the mean, median, mode, standard deviation and line of best fit as well.
I am assuming that you have already downloaded a random data sample from the international Census at School website and have imported it into Fathom.
So, now open the Fathom file that you have previously saved and click on the “maximize” box to make the Fathom window take up the full screen.
This icon shows that we have a collection of data. Double click on the “collection” icon and you will see the data cards for your class. If the data cards do not open on the Cases tab, you might think that you don’t have any data. Just click on the Cases tab at the top of the cards and the data will appear.
Each data card holds the answers to the survey questions for each student in this sample of 100 Australian students who completed the Census at School survey. We can scroll down to the bottom using the slider on the right hand side to see the answers to all of the questions in this survey.
Look at the bottom left corner of the data cards. These numbers tell you that you are looking at case card number 1 out of 100 cases, or students. By clicking on the arrow just to the left of these numbers, you can see the answers for other students in this sample.
The data cards let us see one student at a time. To see the entire sample, it is easier to use a table. So, let’s make the data cards small and drag them to the top left corner, under the collection, to get them out of the way. To make a table, click on the “collection” icon first, then click on the Table icon in the top grey menu bar and drag it into your workspace. You will see a table holding all the answers for this group of 100 students.
To see more of the data, you can drag on the edge of the table to make it bigger. But I usually leave it small to leave more room for my graph. Instead, you can navigate through the table by dragging the bars at the bottom to see the answers to all the questions, and at the side to view all the students.
To create a graph in Fathom, click on the Graph icon and drag a graph into your workspace. This creates an empty plot. You can click and drag on the corner of the graph to make it larger.
Graphing in Fathom uses the “drag and drop” method and this method is the same for both categorical and numeric data.
We’re going to make this first graph about the students’ dominant hand, which is categorical data, since the data falls into the categories: right handed, left handed or ambidextrous.
So we start by finding the Hand column in the table, using the arrows or sliders. There it is, after the Language spoken at home data.
To drag this “hand” attribute onto the horizontal axis, we click on the Hand heading, and with the mouse button still pressed down, drag the heading over to the graph. As we drag, we see two target zone boxes appear: one on the vertical, or y-axis and the other on the horizontal, or x-axis. This is where we can drop the data. For this graph, we will drag the attribute to the x-axis.
And here we have a bar graph of the dominant hand for these 100 Australian students. It is easy to see the mode (the most frequent response) for dominant hand, which in this case, is being right-handed.
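For readers who want to check a bar graph like this outside Fathom, the counting behind it can be sketched in a few lines of Python. This is only an illustration: the `hands` list is made-up data, not the actual sample of 100 Australian students, and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical responses, NOT the real Census at School sample.
hands = ["right", "right", "left", "right",
         "ambidextrous", "right", "left", "right"]

# Count how often each category occurs (these are the bar heights).
counts = Counter(hands)

# The mode is the category with the highest count.
mode_hand, mode_count = counts.most_common(1)[0]

print(dict(counts))
print("mode:", mode_hand, "with", mode_count, "responses")
```

The tallest bar in the graph corresponds to `mode_hand` here.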
If we want to answer the question of whether boys or girls are more likely to be left-handed, we can plot the gender on the y-axis. Go to the data cards, or the table, and drag the “sex” attribute to the vertical axis. Now we can see that boys are more likely to be left-handed in this sample, and they are also more likely to be ambidextrous.
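A comparison like this is fairest when it looks at the share of each group rather than raw counts, since the sample may not contain equal numbers of boys and girls. The sketch below shows one way to compute those shares in Python. The `students` list is invented for illustration and does not reproduce the result seen in the video; `share` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (sex, dominant hand) pairs, NOT the real sample.
students = [
    ("M", "right"), ("M", "left"), ("F", "right"), ("F", "right"),
    ("M", "right"), ("F", "left"), ("M", "left"), ("F", "right"),
]

pair_counts = Counter(students)                    # count of each (sex, hand) cell
sex_totals = Counter(sex for sex, _ in students)   # number of students of each sex

def share(sex, hand):
    """Proportion of students of the given sex with the given dominant hand."""
    return pair_counts[(sex, hand)] / sex_totals[sex]

print("left-handed share, boys: ", share("M", "left"))
print("left-handed share, girls:", share("F", "left"))
```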
So far, we have graphed categorical data. We can graph numeric data in Fathom using the same method.
A good example of continuous numeric data is the “height” attribute, where the data can take on any value between about 130 and 200 cm.
To graph this height data, we first need to clear the data on the dominant hand graph. So, right-click on the graph to open the graph context menu and click on Remove X Attribute. Repeat to remove the Y attribute. Now the plot is ready for the next graph.
Scroll through the table to find the Height attribute.
Click on the heading and drag the height information onto the horizontal axis. You will see a dot plot displaying the frequency of the heights. The mode appears as the tallest stack of dots.
Use the drop-down menu on the graph to see other types of graphs. A histogram is another good graph for continuous data. The box plot is also good for numeric data, as it shows the median value and the upper and lower quartiles.
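The three values a box plot is built around (lower quartile, median, upper quartile) can be sketched with Python's standard `statistics` module. The heights below are made up, and the `"inclusive"` quartile method is an assumption chosen for this example; Fathom may use a different quartile rule, so values can differ slightly on real data.

```python
import statistics

# Hypothetical heights in cm, NOT the real sample.
heights = [142, 150, 155, 158, 160, 163, 168, 172, 181]

# Cut the data into four equal parts; the three cut points are Q1, median, Q3.
# The "inclusive" method is an assumption, not necessarily Fathom's rule.
q1, med, q3 = statistics.quantiles(heights, n=4, method="inclusive")

print("lower quartile:", q1)
print("median:", med)
print("upper quartile:", q3)
print("interquartile range:", q3 - q1)
```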
When we are working with numeric data, it is useful to be able to find the mean or median or the standard deviation. Let’s see how we can use the function commands to further analyse the height data.
To find the mean, right-click on your graph to show the context menu and choose Plot Value. From the menu, choose Functions, then Statistical, then One Attribute. Double-click on mean (be sure to double-click) and click OK. The mean height for this group of students will appear as a vertical line through the graph at the mean value, and the numeric value will appear at the bottom of the graph.
To explore how the values at either end of the range affect the mean, click on the lowest or highest data point and drag it towards the mean line. Notice how the mean shifts as this extreme value moves towards the centre of the range: it falls when you drag the highest point in, and rises when you drag the lowest point in. To return the data point to its original value, go to the Edit menu and click Undo Drag.
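The same experiment can be imitated in Python by changing one extreme value and recomputing. The heights below are invented for illustration. In this small example the mean moves when the tallest value is pulled in, while the median stays where it was.

```python
import statistics

# Hypothetical heights in cm with one unusually tall student.
original = [150, 155, 160, 165, 200]

# "Drag" the tallest value towards the centre of the range.
dragged = [150, 155, 160, 165, 170]

mean_before = statistics.mean(original)
mean_after = statistics.mean(dragged)
median_before = statistics.median(original)
median_after = statistics.median(dragged)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```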
Draw a median on your graph using the same procedure we used for mean, but this time, selecting the median from the statistical functions. (Right-click on the graph, choose Plot Value, choose Functions, then Statistical, then One Attribute. Double-click on median.) The median height for this group is 160 cm.
To find the standard deviation, repeat the same steps and choose “population standard deviation” (popStdDev). If you scroll down the list of statistical functions, you can see there are many more functions that can be applied to this data.
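As a rough cross-check of these three statistics, here is a minimal Python sketch using the standard `statistics` module. The heights are made up. It assumes that Fathom's popStdDev corresponds to the population formula (dividing by n), which matches `statistics.pstdev`; the sample version, `statistics.stdev`, divides by n - 1 and gives a slightly larger value.

```python
import statistics

# Hypothetical heights in cm, NOT the real sample.
heights = [157, 159, 159, 159, 160, 160, 162, 164]

mean_height = statistics.mean(heights)
median_height = statistics.median(heights)

# Population standard deviation: divides the squared deviations by n.
pop_sd = statistics.pstdev(heights)

# Sample standard deviation: divides by n - 1 instead.
sample_sd = statistics.stdev(heights)

print("mean:", mean_height)
print("median:", median_height)
print("population SD:", pop_sd)
print("sample SD:", round(sample_sd, 3))
```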
The last graph that I would like to show you is a graph that compares numeric data on each axis. These are called scatter plots.
Let’s try to answer the question “I wonder if tall students also have big feet?” or “I wonder if there is a connection between height (being tall) and foot size (having big feet)?” To do this, we’ll graph the students’ height and foot length data.
Remove the mean line by right-clicking on the word “mean” at the bottom of the graph and choosing Cut formula. Repeat for median and standard deviation.
We already have Height on the horizontal axis; now drag the Right Foot data to the vertical axis. Immediately, a scatter plot of the data appears.
I notice that the data is not tightly aligned in a straight line, but there is a definite upward trend. We can ask students to create a line of best fit for this data, by adding a movable line to the graph.
Right-click on the graph, and choose Add Movable Line. This line can be pivoted at either end to determine the line that the student decides best fits the data set. As you can see, the equation of the line appears in the bottom left corner of the graph and changes as the student adjusts the line. We teach students that the slope-intercept form of the equation of a line is expressed as y = mx + b. Fathom assists this understanding by replacing both x and y with the attribute names from the x- and y-axes: in this case, replacing y with the right foot length and x with the height. So the equation for the foot length can clearly be seen to be an equation based on the height.
To help the students see the y-intercept that is appearing in the equation, the x- and y-axes can be adjusted. Click on the lowest number on the x-axis, and drag it to the right until the zero appears on the left end of the axis. Now, click on the lowest y-axis value and drag upwards so that the y-intercept appears. Now, the equation of the line is reflected in the appearance of the graph.
Once the students have created their own lines of best fit, they can check their accuracy by creating a least-squares line.
Right-click on the graph and choose Least-Squares Line. The line appears, along with its equation at the bottom of the graph. The r-squared value also appears; here it indicates that about 38% of the variation in right foot length is accounted for by the linear relationship with height.
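To show where the slope, intercept and r-squared come from, here is a minimal sketch of the least-squares calculation in plain Python. The four data points are invented, so the result will not match the 38% seen in the video, and this is the textbook formula rather than a description of how Fathom computes it internally.

```python
# Hypothetical (height, right foot length) data in cm, NOT the real sample.
heights = [150, 160, 170, 180]
foot_lengths = [22, 24, 23, 27]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(foot_lengths) / n

# Sums of squared deviations and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in foot_lengths)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, foot_lengths))

slope = sxy / sxx                      # m in y = mx + b
intercept = mean_y - slope * mean_x    # b in y = mx + b
r_squared = sxy ** 2 / (sxx * syy)     # share of variation explained by the line

print(f"foot length = {slope:.2f} * height + {intercept:.2f}")
print(f"r-squared = {r_squared:.2f}")
```

An r-squared well below 1 means the line captures the trend while leaving a good deal of individual variation unexplained.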
So in answer to the question, “I wonder if there is a connection between being tall and having big feet?”, the graph appears to indicate height has an influence on foot size, but it doesn’t tell the full story. There are definitely shorter people with big feet and taller people with small feet.
This is an introduction to the types of data exploration you can do using your Census at School data and Fathom. For further ideas and instruction, you can find help and instructional videos on Key Curriculum Press’ Fathom website and by accessing the Fathom movies from the Help menu on the Fathom screen.
You can use Fathom to analyse your class dataset using the same steps as we have used for the international dataset. And it is fascinating for your students to compare some of the attributes from their class dataset with a sample of international students. I hope your students have a lot of fun with their data discovery.