Checkpoint - Course 1, Unit 1
Patterns in Data
In each unit, there is a final lesson and Checkpoint that help students
summarize the key ideas in the unit. The final Checkpoint will generally
be discussed in class, with the teacher facilitating the summarizing,
and students making notes in their Math Toolkits
(teachers may just refer to this as "notes") of any points they need to
remember, adding illustrative examples as needed. If your student is
having difficulty with any investigation in this unit,
this Checkpoint and the accompanying answers may help you recall the
concepts involved, and give you the big picture of what the entire unit
is about.
If your student has completed the unit, then a version of this should
be in his or her notes or toolkit. Students should also have Technology
Tips in their toolkits, which may be useful for this unit.
Possible Responses to Unit Summary Checkpoint
| The words in bold are vocabulary and concepts your
student should be familiar with. Students have learned to include
center, shape, spread, and unusual points in any description of one-variable
(univariate) data such as the car ratings, and to include a description
of the trend when working with two-variable
(bivariate) data such as the 1990 and 2000 temperatures in cities.
(See Summary of Key Ideas from Course
1, Unit 1 for an explanation
of terms.) |
| a. |
Describe the kinds of information you can get by examining
- a stem-and-leaf plot: From a stem-and-leaf plot, we can immediately
see the shape of the distribution and determine whether it is
symmetric or skewed. We can also see clusters and gaps, spot outliers,
make a good estimate of the mean by looking for
the balance point, and find the five-number summary (minimum, lower
quartile, median, upper quartile, maximum).
- a number line plot or histogram: The information that can
be obtained from a number line plot or histogram is much the
same as what can be obtained from a stem-and-leaf plot. But because
a histogram groups data into intervals, sometimes the gaps are
not apparent. We can lose sight of the actual data
values when we organize the data in intervals, so finding
exact quartiles may not be possible.
- a box plot: A box plot allows for immediate identification
of the five-number summary and the general shape of the distribution.
Thus, a box plot efficiently gives information about the center and spread of
a distribution. It does not give information about gaps. A box
plot can also be used to identify outliers, that is,
data values that are unusually high or low and are
separated from the other data.
- a scatterplot: The previous three graphical displays are
used with one-variable (univariate) sets of data. A scatterplot
is used with two-variable (bivariate) sets of data. We use
the scatterplot to investigate
whether there is a general relationship between the two
variables, asking whether there seems to be a pattern or trend
that might be used to make predictions. The technique
of drawing the y = x line on a scatterplot allows comparisons
to be made between the sizes of the two variables. If most of the
points are below the line, then the value of y is generally less than
the corresponding value of x. If most of the points are above
the line, then the value of y is generally greater than the corresponding
value of x.
- a plot over time: The goal is to spot increasing or decreasing trends and
to identify cycles, fluctuations, and periods of greatest change.
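As a rough illustration of two ideas from the list above, the sketch below computes a five-number summary and counts how many scatterplot points fall above or below the y = x line. The data values and function names are made up for illustration. The quartile rule used here (the median of each half, leaving out the middle value when the count is odd) is an assumption; textbooks and calculators may use slightly different rules.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption: each quartile is the median of the lower or upper half,
    leaving out the middle value when the count is odd.
    Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[-half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


def compare_to_y_equals_x(points):
    """Count the points above, on, and below the line y = x."""
    above = sum(1 for x, y in points if y > x)
    below = sum(1 for x, y in points if y < x)
    on_line = len(points) - above - below
    return above, on_line, below


# Made-up one-variable data (for example, ratings).
ratings = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10, 15]
summary = five_number_summary(ratings)

# Made-up two-variable data as (x, y) pairs.
pairs = [(50, 54), (61, 60), (70, 75), (42, 47), (66, 66)]
counts = compare_to_y_equals_x(pairs)

print("Five-number summary:", summary)
print("Above, on, below y = x:", counts)
```

In this made-up example most points lie above the line, so y is generally greater than the corresponding x.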
|
| b. |
Describe the kinds of situations in which each plot
is most useful.
If there is only one variable and the data set is reasonably small, then a
stem-and-leaf plot or a number line plot is appropriate. Large data
sets need to be organized in intervals and graphed as histograms
or summarized in box plots. If there are two variables, then a scatterplot
or plot over time may be appropriate. |
| c. |
How do you decide which measure of center to use to provide a summary
of a distribution? How do you decide on which measure of variability
to report?
When choosing a measure of center or variability, you need to
consider the type of data and how the summary statistic will be
used. The mean and mean absolute deviation (MAD) are both affected by
outliers and may not be good representations of the data in cases
where outliers are likely. In these cases, the median and the interquartile
range (IQR) may be better choices.
For a particular set of data, either the mean or the median could
be the more useful measure. For example, consider the values of
all houses in Springfield, Missouri. A potential buyer would get
more information from the median, which shows the value of a
typical house. But the county tax collector would want to know
the mean, because multiplying the mean by the number of houses
gives the total value on which taxes can be figured.
When the distribution's shape
is very skewed or has extreme outliers, the mean will be pulled
towards the skewed end, making it not very representative of the
typical middle of the set. The median may be a better measure of
center for skewed data sets. Sometimes we want to report both
the mean and the median, depending on the reason for wanting a measure
of center.
We can use the median and IQR
to help describe any data set, but they are particularly useful
with skewed data sets.
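To make the house example concrete, here is a small sketch with hypothetical house values (in thousands of dollars) in which one value is an extreme outlier. The numbers are invented for illustration and are not real Springfield data.

```python
from statistics import mean, median

# Hypothetical house values in thousands of dollars.
# The last value is an extreme outlier.
house_values = [120, 135, 140, 150, 155, 160, 900]

typical = median(house_values)   # middle value, barely affected by the outlier
average = mean(house_values)     # pulled toward the outlier
total = sum(house_values)        # equals the mean times the number of houses

# The same statistics with the outlier left out, for comparison.
without_outlier = house_values[:-1]

print("Median:", typical)
print("Mean:", round(average, 1))
print("Mean x count:", round(average * len(house_values)), "Total:", total)
print("Median without outlier:", median(without_outlier))
print("Mean without outlier:", round(mean(without_outlier), 1))
```

Dropping the single outlier moves the mean by more than 100 while the median moves by only 5, which is why the median is often preferred for skewed data.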
|
| d. |
What is the effect on measures of center of transforming
a set of data by adding a constant to each value or multiplying each
value by a positive constant? On measures of variation?
Adding (or subtracting) a constant to each value in a data set
will result in the mean being increased (or decreased) by that amount.
The distances from the mean will be unchanged, so the MAD will not
change. Likewise, the median will increase by the amount added but
the IQR will not change. When each value is multiplied by a positive
constant, the mean and median are also multiplied by that constant,
and so are the MAD and IQR. Students who think visually can refer
to the first situation as shifting the entire distribution right
or left on the x-axis, without changing the shape. Thus, the center
changes but the spread and shape do not. The second situation, multiplying
each value by a positive constant, stretches the distribution (or
shrinks it, if the constant is less than 1) and
causes both the center and the spread to change. |
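The effect of these transformations can be checked on a small made-up data set. The sketch below is only an illustration: the helper names are hypothetical, and the IQR uses the "median of each half" quartile rule, which is an assumption and may differ from the rule a particular calculator uses.

```python
from statistics import mean, median


def mad(values):
    """Mean absolute deviation: average distance from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


def iqr(values):
    """Interquartile range, using the median of each half as the quartile
    (assumed rule; the middle value is left out when the count is odd)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])


def summarize(values):
    """Return (mean, median, MAD, IQR)."""
    return (mean(values), median(values), mad(values), iqr(values))


data = [2, 4, 4, 6, 9]              # made-up values
shifted = [v + 10 for v in data]    # add a constant to each value
scaled = [v * 3 for v in data]      # multiply each value by a positive constant

print("original:", summarize(data))
print("add 10:  ", summarize(shifted))
print("times 3: ", summarize(scaled))
```

Adding 10 raises the mean and median by 10 and leaves the MAD and IQR unchanged. Multiplying by 3 multiplies all four statistics by 3.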
If you would like to see specific problems from Course 1, Unit 1, a link
is provided to Examples of Tasks from Course
1, Unit 1. If you are interested in following up on the Statistics
strand in general, then the Scope and
Sequence will help you see where different concepts are introduced.
On the Statistics page, you can read
an explanation of the main statistics concepts as they are developed in
all four courses.