Course 1, Unit 1 - Patterns in Data
In the "Patterns in Data" unit, students learn to organize data using
various graphical displays (number line plot, stem-and-leaf plot, histogram,
box plot, scatterplot) and to summarize data using measures of center
(mean, median, mode) and measures of variability (range, interquartile
range, mean absolute deviation). The goals are to have students competently
describe distributions of data using the ideas of shape, center, and spread,
and to intelligently choose an appropriate graph to display a distribution.
The last lesson, Relationships and Trends, introduces the idea of describing
paired data using algebraic equations and inequalities, providing a bridge
to the algebra in the next unit.
Ideas from Course 1, Unit 1
- Histogram: A way of organizing one-variable data. For example,
in the histogram below of test scores, 3 students have a score of at
least 10 but less than 20, 7 students have a score of at least 20
but less than 30, and so on. (See
student book pages 17-18 for histogram use.)
- Stem-and-leaf plot: The stem-and-leaf plot below shows the
same test scores that were displayed in the histogram. The first line
represents scores of 10, 17, and 19. The second line represents scores
of 21, 21, 23, 24, 26, 27, and 28. (See
student book pages 9 and 32 for stem-and-leaf plot use.)
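As a rough illustration of how the two displays above organize the same numbers, here is a minimal Python sketch. It uses only the ten scores listed above (the first two rows of the plot), not the full class data, and the function names `bin_counts` and `stem_and_leaf` are made up for this example.

```python
from collections import Counter, defaultdict

# The ten scores listed in the text for the first two rows of the plot.
# The remaining scores in the example are not given, so they are omitted.
scores = [10, 17, 19, 21, 21, 23, 24, 26, 27, 28]

def bin_counts(values, width=10):
    """Count values in each interval [start, start + width), as a histogram does."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

def stem_and_leaf(values):
    """Return one text row per stem: the tens digit, then the ones digits in order."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    return [f"{stem} | {' '.join(leaves)}" for stem, leaves in sorted(rows.items())]

print(bin_counts(scores))  # {10: 3, 20: 7}
for row in stem_and_leaf(scores):
    print(row)
# 1 | 0 7 9
# 2 | 1 1 3 4 6 7 8
```

The counts of 3 and 7 match the first two histogram bars described above. The sketch assumes whole-number, non-negative scores.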
Stem-and-leaf plots often include a key that tells the reader how to read each entry: 1|7 means 17.
- Shape of the distribution: Distributions of one-variable data can
be symmetric or skewed. (See student book page 19 for additional descriptions
of shapes of distributions.)
- Gaps: Intervals in which no data values occur. There
is a gap in our example test scores between 42 and 69.
- Center: We can use mean, median or mode for the measure of center,
depending on which is most appropriate. Mean = (sum of the data values)
/ (number of data values). Median = middle data value in the ordered
list. Mode = most frequently occurring data value.
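The three measures of center defined above can be checked with Python's standard `statistics` module. This is only a sketch: it uses the ten scores listed earlier rather than the full example data, so its results will differ from the values quoted for the whole class. The helper name `center` is made up for this example, and `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

# Partial data: only the ten scores listed in the text, not the full class.
scores = [10, 17, 19, 21, 21, 23, 24, 26, 27, 28]

def center(values):
    """Return the mean, the median, and every most frequent value."""
    return {
        "mean": mean(values),         # sum of the values / number of values
        "median": median(values),     # middle value of the ordered list
        "modes": multimode(values),   # a list, since there may be ties
    }

print(center(scores))
# {'mean': 21.6, 'median': 22.0, 'modes': [21]}
```

With an even number of values there is no single middle value, so `median` averages the two middle ones (21 and 23 here).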
- Five-number summary (minimum, 1st quartile, median, 3rd quartile, maximum): Using
our example, we can determine the five-number summary as follows. Put
the values in order and count to the middle; this is the median. The
median is 31. Count to the middle of the first (lower) 50% of the data;
this data value is the first quartile, Q1. Q1 is 23. Count to the middle
of the second (upper) 50% of the data; this data value is the third
quartile, Q3. Q3 is 40.
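The "count to the middle" procedure above can be sketched in Python as follows. The scores here are hypothetical (the full example data are not listed in this summary), and the function name `five_number_summary` is made up. The sketch also assumes one common convention: when the number of values is odd, the median itself is left out of both halves.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) by splitting the ordered data in half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical scores, chosen only to illustrate the steps.
scores = [12, 15, 18, 22, 25, 30, 34, 41, 47, 52, 58]
print(five_number_summary(scores))
# (12, 18, 30, 47, 58)
```

Other textbooks and software use slightly different quartile rules, so results from a calculator or spreadsheet may not match exactly.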
- Box plot: Use the five-number summary to make a box plot. You need
a scale on the horizontal axis to make sense of the graph. The box
contains the middle 50% of the data values, starting at the first quartile
and ending at the third quartile. Interquartile range = Q3 - Q1 = 40 - 23 = 17.
- Outliers: Data values that are far from, and separated from, the
rest of the distribution. If the data are represented by a box plot,
then any value that is more than 1.5 times the interquartile range
above Q3 or below Q1 will be represented as a dot, separated from the
- Spread of a distribution: Spread (or variability) could be measured
by the range, by the IQR (interquartile range, see above), or by the
MAD (mean absolute deviation). (See
student book pages 48-55 and 63-65 for use of measures of variability.)
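To make the three measures of spread and the 1.5 times IQR outlier rule concrete, here is a minimal Python sketch. The scores are hypothetical, all function names are made up for this example, and the quartiles use the same assumed convention of leaving the median out of both halves when the count is odd.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the median is excluded from both halves.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(upper)

def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range = Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

def mean_absolute_deviation(values):
    """MAD = average distance of the values from their mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

def outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]

# Hypothetical scores, with one value far from the rest.
scores = [12, 15, 18, 22, 25, 30, 34, 41, 47, 52, 95]
print(data_range(scores))                     # 83
print(iqr(scores))                            # 29
print(outliers(scores))                       # [95]
print(mean_absolute_deviation([2, 4, 6, 8]))  # 2
```

Here Q1 is 18 and Q3 is 47, so any score above 47 + 1.5 × 29 = 90.5 counts as an outlier. Notice that the single score of 95 stretches the range to 83 while the IQR stays at 29.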
- Scatterplot: A way of organizing two-variable data. Each point represents
two paired data values.
- Plot over time: Like a scatterplot, but the horizontal axis is
time. (See student book pages 74-84 for uses of scatterplots and plots