Checkpoint - Course 1, Unit 1
Patterns in Data

Each unit ends with a final lesson and Checkpoint that help students summarize the key ideas in the unit. The final Checkpoint will generally be discussed in class, with the teacher facilitating the summary and students making notes in their Math Toolkits (teachers may just refer to these as "notes") of any points they need to remember, adding illustrative examples as needed. If your student is having difficulty with any investigation in this unit, this Checkpoint and the accompanying answers may help you recall the concepts involved and give you the big picture of what the entire unit is about. If your student has completed the unit, then a version of this should be in his or her notes or toolkit. Students should also have Technology Tips in their toolkits, which may be useful for this unit.


Possible Responses to Unit Summary Checkpoint
The words in bold are vocabulary and concepts your student should be familiar with. Students have learned to include center, shape, spread, and unusual points in any description of one-variable (univariate) data such as the car ratings, and to include a description of the trend when working with two-variable (bivariate) data such as the 1990 and 2000 temperatures in cities. (See Summary of Key Ideas from Course 1, Unit 1 for an explanation of terms.)
a. Describe the kinds of information you can get by examining
  • a stem-and-leaf plot: From a stem-and-leaf plot, we can immediately see the shape of the distribution and determine whether it is symmetric or skewed. We can also see clusters and gaps, spot outliers, make a good estimate of the mean by looking for the balance point, and find the five-number summary (minimum, lower quartile, median, upper quartile, maximum).
  • a number line plot or histogram: The information that can be obtained from a number line plot or histogram is much the same as what can be obtained from a stem-and-leaf plot. But because data may be grouped into intervals, sometimes the gaps are not apparent. We can lose sight of the actual data values when we organize the data in intervals, so finding exact quartiles may not be possible.
  • a box plot: A box plot shows the five-number summary at a glance, along with the general shape of the distribution. Thus, a box plot efficiently gives information about the center and spread of a distribution. It does not give information about gaps. A box plot can also be used to identify outliers, that is, data values that are unusually high or low and are separated from the rest of the data.
  • a scatterplot: The previous three graphical displays are used with one-variable (univariate) sets of data. A scatterplot is used with two-variable (bivariate) sets of data. We use the scatterplot to investigate whether there is a general relationship between the two variables, asking whether there seems to be a pattern or trend that might be used to make predictions. Drawing the y = x line on a scatterplot makes it possible to compare the sizes of the two variables. If most of the points are below the line, then the value of y is generally less than the corresponding value of x. If most of the points are above the line, then the value of y is generally greater than the corresponding value of x.
  • a plot over time: The goal is to spot increasing or decreasing trends and to identify cycles, fluctuations, and periods of greatest change.
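
The five-number summary and the outlier check mentioned above for stem-and-leaf plots and box plots can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quartiles are computed as the medians of the lower and upper halves of the sorted data (textbooks and calculators differ slightly on this), the 1.5 × IQR rule for flagging outliers is a common convention that this page does not spell out, and the ratings are made-up values rather than data from the unit.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption: quartiles are the medians of the lower and upper halves of
    the sorted data; with an odd count, the middle value is left out of both.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Flag values more than 1.5 * IQR beyond the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical ratings, invented for this example.
ratings = [2, 4, 5, 5, 6, 6, 7, 7, 8, 20]

summary = five_number_summary(ratings)
flagged = outliers(ratings)

print("five-number summary:", summary)
print("possible outliers:", flagged)
```

With these made-up ratings, the quartiles are 5 and 7, so the IQR is 2 and only the value 20 falls outside the fences.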
b. Describe the kinds of situations in which each plot is most useful.
If there is only one variable and the data set is reasonably small, then a stem-and-leaf plot or a number line plot is appropriate. Large data sets need to be organized in intervals and graphed as histograms or summarized in box plots. If there are two variables, then a scatterplot or plot over time may be appropriate.
c. How do you decide which measure of center to use to provide a summary of a distribution? How do you decide on which measure of variability to report?
When choosing a measure of center or variability, you need to consider the type of data and how the summary statistic will be used. The mean and mean absolute deviation are both affected by outliers and may not be good representations of the data in cases where outliers are likely. In these cases, the median and the interquartile range may be better choices.

For a particular set of data, either the mean or the median could be the more useful measure. For example, consider the values of all houses in Springfield, Missouri. A potential buyer would get more information from the median. But the county tax collector would want to know the mean because then the total taxes could be figured.

When the distribution is very skewed or has extreme outliers, the mean will be pulled toward the skewed end, so it may not represent the typical middle of the data set. The median may be a better measure of center for skewed data sets. Sometimes we want to report both the mean and the median, depending on the reason for wanting a measure of center.

We can use the median and IQR to help describe any data set, but they are particularly useful with skewed data sets.
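
As a rough illustration of the points above, the following Python sketch compares the mean, median, mean absolute deviation (MAD), and interquartile range (IQR) for a small set of house values with and without one very expensive house. The house values are invented for this example, and the quartile method (medians of the lower and upper halves) is an assumption; other methods give slightly different quartiles.

```python
from statistics import mean, median

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

def iqr(values):
    """Interquartile range, with quartiles taken as medians of the halves
    (an assumed method; the middle value is skipped when the count is odd)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(upper) - median(lower)

# Hypothetical house values in thousands of dollars.
typical = [100, 110, 120, 130, 140]
house_values = typical + [900]  # the same houses plus one extreme value

for label, values in [("without outlier", typical), ("with outlier", house_values)]:
    print(label)
    print("  mean:  ", mean(values))
    print("  median:", median(values))
    print("  MAD:   ", mean_absolute_deviation(values))
    print("  IQR:   ", iqr(values))

# The tax collector's reasoning: mean times count recovers the total value.
total_value = mean(house_values) * len(house_values)
print("total value:", total_value)
```

In this made-up data set, the single expensive house moves the mean from 120 to 250 while the median moves only from 120 to 125, and the IQR stays at 30 while the MAD grows from 12 to more than 200. The mean is still the right tool for the tax collector, because multiplying it by the number of houses gives the total.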

d. What is the effect on measures of center of transforming a set of data by adding a constant to each value or multiplying each value by a positive constant? On measures of variation?
Adding (or subtracting) a constant to each value in a data set will result in the mean being increased (or decreased) by that amount. The distances from the mean will be unchanged, so the MAD will not change. Likewise, the median will increase by the amount added but the IQR will not change. When each value is multiplied by a positive constant, the mean and median are also multiplied by that amount, and so are the MAD and IQR. Students who think visually can refer to the first situation as shifting the entire distribution right or left on the x-axis, without changing the shape. Thus, the center changes but the spread and shape do not. The second situation, multiplying each value by a positive constant, stretches the distribution, and causes both the center and the spread to change.
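
The effect of these two transformations can be checked numerically. The Python sketch below uses a small made-up data set, adds 10 to every value, and then multiplies every value by 3. The data, the constants, and the quartile method (medians of the lower and upper halves) are assumptions chosen only for illustration.

```python
from statistics import mean, median

def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

def iqr(values):
    """Interquartile range, with quartiles taken as medians of the halves
    (an assumed method; the middle value is skipped when the count is odd)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(upper) - median(lower)

def describe(values):
    """Return (mean, median, MAD, IQR) for a list of numbers."""
    return mean(values), median(values), mean_absolute_deviation(values), iqr(values)

data = [2, 4, 4, 6, 9]             # hypothetical data
shifted = [v + 10 for v in data]   # add a constant to each value
scaled = [v * 3 for v in data]     # multiply each value by a positive constant

print("original:", describe(data))
print("shifted: ", describe(shifted))
print("scaled:  ", describe(scaled))
```

For this data, the original mean, median, MAD, and IQR are 5, 4, 2, and 4.5. Adding 10 gives 15, 14, 2, and 4.5: the centers shift but the spreads do not. Multiplying by 3 gives 15, 12, 6, and 13.5: every measure is multiplied by 3.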


If you would like to see specific problems from Course 1, Unit 1, a link is provided to Examples of Tasks from Course 1, Unit 1. If you are interested in following up on the Statistics strand in general, then the Scope and Sequence will help you see where different concepts are introduced. On the Statistics page, you can read an explanation of the main statistics concepts as they are developed in all four courses.

Copyright 2013 Core-Plus Mathematics Project. All rights reserved.