## Course 1, Unit 1 - Patterns in Data

Summary
In the "Patterns in Data" unit, students learn to organize data using various graphical displays (number line plot, stem-and-leaf plot, histogram, box plot, scatterplot) and to summarize data using measures of center (mean, median, mode) and measures of variability (range, interquartile range, mean absolute deviation). The goals are to have students competently describe distributions of data using the ideas of shape, center, and spread, and to intelligently choose an appropriate graph to display a distribution. The last lesson, Relationships and Trends, introduces the idea of describing paired data using algebraic equations and inequalities, providing a bridge to the algebra in the next unit.

Key Ideas from Course 1, Unit 1

• Histogram: A way of organizing one-variable data. For example, in the histogram below of test scores, 3 students have a score of at least 10 but less than 20, 7 students have a score of at least 20 but less than 30, and so on. (See student book pages 17-18 for histogram use.)

• Stem-and-leaf plot: The stem-and-leaf plot below shows the same test scores that were displayed in the histogram. The first line represents scores of 10, 17, and 19. The second line represents scores of 21, 21, 23, 24, 26, 27, and 28. (See student book pages 9 and 32 for stem-and-leaf plot use.)

Stem-and-leaf plots often include a key that tells the reader how to read each entry: 1|7 means 17.
• Shape of the distribution: Distributions of one-variable data can be symmetric or skewed. (See student book page 19 for additional descriptions of shapes of distributions.)
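
The stem-and-leaf layout described above can be sketched in a few lines of Python. This is only an illustration: it uses just the ten scores listed for the first two rows of the plot (the remaining scores are not listed here), and the function name `stem_and_leaf` is a hypothetical helper, not something from the student book.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and ones digit (leaf)."""
    rows = defaultdict(list)
    for value in sorted(values):
        rows[value // 10].append(value % 10)
    # Return the rows ordered by stem.
    return {stem: leaves for stem, leaves in sorted(rows.items())}


# Only the scores named in the first two rows of the plot.
scores = [10, 17, 19, 21, 21, 23, 24, 26, 27, 28]
plot = stem_and_leaf(scores)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1|7 means 17")

# The number of leaves in each row is also the height of the matching
# histogram bar when the bars are 10 points wide.
bar_heights = {stem: len(leaves) for stem, leaves in plot.items()}
```

The two row counts, 3 and 7, agree with the first two histogram bars described above.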

• Gaps: Intervals in which no data values occur. There is a gap in our example test scores between 42 and 69.
• Center: We can use the mean, median, or mode as the measure of center, depending on which is most appropriate. Mean = (sum of the data values) / (number of data values). Median = middle data value in the ordered list (or the mean of the two middle values when the number of values is even). Mode = most frequently occurring data value.
• Five-number summary (minimum, 1st quartile, median, 3rd quartile, maximum): Using our example, we can determine the five-number summary as follows. Put the values in order and count to the middle; this is the median. The median is 31. Count to the middle of the first (lower) 50% of the data; this data value is the first quartile, Q1. Q1 is 23. Count to the middle of the second (upper) 50% of the data; this data value is the third quartile, Q3. Q3 is 40.
• Box plot: Use the five-number summary to make a box plot. You need a scale on the horizontal axis to make sense of the graph. The box contains the middle 50% of the data values, starting at the first quartile and ending at the third quartile. Interquartile range = Q3 - Q1 = 40 - 23 = 17.
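
The measures of center and the five-number summary above can be computed as in the following sketch. The score list is hypothetical (the full set of example test scores is not listed here), and `five_number_summary` is an illustrative helper. It assumes one common convention: when the number of values is odd, the median itself is left out of both halves before the quartiles are found. Other conventions exist and can give slightly different quartiles.

```python
from statistics import mean, median, mode


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method.

    Assumes at least two values. For an odd count, the middle value is
    excluded from both halves (one convention among several).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Hypothetical scores, for illustration only.
scores = [12, 15, 21, 21, 23, 27, 30, 34, 38, 40, 45]

center_mean = mean(scores)      # sum of the values divided by how many there are
center_median = median(scores)  # middle value of the ordered list
center_mode = mode(scores)      # most frequently occurring value

minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # width of the box in a box plot
```

For these hypothetical scores the summary is 12, 21, 27, 38, 45, so the box would run from 21 to 38 and the interquartile range is 17.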

• Outliers: Data values that are far from, and separated from, the rest of the distribution. If the data are represented by a box plot, then any value that is more than 1.5 times the interquartile range (see above) above Q3 or below Q1 will be represented as a dot, separated from the other data.
• Spread of a distribution: Spread (or variability) can be measured by the range, by the IQR (interquartile range, see above), or by the MAD (mean absolute deviation). (See student book pages 48-55 and 63-65 for use of measures of variability.)
• Scatterplot: A way of organizing two-variable data. Each point represents two paired data values.
• Plot over time: Like a scatterplot, but the horizontal axis is time. (See student book pages 74-84 for uses of scatterplots and plots over time.)
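
The outlier rule and the measures of spread above can be checked with a short sketch. The quartiles Q1 = 23 and Q3 = 40 come from the example; the helper names `outlier_fences` and `mean_absolute_deviation` and the small data list are hypothetical, chosen only to keep the arithmetic easy to follow. The sketch also assumes that 69 is one of the test scores, as the description of the gap suggests.

```python
from statistics import mean


def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(value - center) for value in values])


# Quartiles from the example test scores: Q1 = 23, Q3 = 40, so IQR = 17.
low_fence, high_fence = outlier_fences(23, 40)

# Assumption: 69 is a score at the far end of the gap described above.
is_69_outlier = 69 < low_fence or 69 > high_fence
is_42_outlier = 42 < low_fence or 42 > high_fence

# Hypothetical data for the range and MAD.
data = [2, 4, 6, 8, 10]
data_range = max(data) - min(data)
mad = mean_absolute_deviation(data)
```

With the example quartiles the cutoffs are -2.5 and 65.5, so a score of 69 would be drawn as a separate dot on the box plot, while a score of 42 would not. For the hypothetical list, the range is 8 and the MAD is 2.4.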