Course 2, Unit 3 - Patterns of Association
Summary
In the "Patterns of Association" unit, students learn to describe the
association between two variables by interpreting a scatterplot, to interpret
a correlation coefficient, and to understand the limitations of correlation
coefficients. They learn when it is appropriate to make predictions
from the regression line, and to understand the effects of outliers and
influential points on the correlation coefficient and on the regression
line.
Key Ideas from Course 2, Unit 3
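- Least squares regression line: A line that fits a set of data,
with slope and intercept chosen to minimize the sum of the squared errors
(vertical distances) between the actual data points and the line. This line
also contains the point (x̄, ȳ), whose coordinates are the mean of the
x values and the mean of the y values.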
- Squared errors: A point on the regression line represents a prediction
about the y value (dependent variable) given the x value
(independent variable). Since the points on the regression line are
the result of using the regression equation, there could be a difference
between the actual y value that occurred for this particular x value
(the data point) and the predicted y value (the point on the
line). These differences can be thought of as errors in prediction.
They are visualized as the vertical gaps between given data points
and the regression line. The sum of the squared differences measures
how well a line fits the data: the smaller the sum, the better the fit.
- Pearson's correlation coefficient, r: A single number that
gives a measure of the strength of the linear association between variables.
Perfectly linear data will have an r value of 1 (for a positive
association), or an r value of -1 (for a negative association).
A set of data that is quite linear, and for which the y values
increase as the x values increase, is said to have a strong positive
association. If the points appear to fit a linear trend but the y values
decrease as x increases, the association is negative. A set of data
that is quite random, not at all linear, will have an r value
close to zero.
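
The three ideas above can be tried out on a small data set. The sketch below is an illustration only and is not taken from the unit: the function names and the five data points are made up, and the formulas are the standard textbook ones for the least squares slope and intercept, the sum of squared errors, and Pearson's r.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical gaps between the data points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
sse = sum_squared_errors(xs, ys, slope, intercept)
r = pearson_r(xs, ys)

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("sum of squared errors:", round(sse, 3))
print("r:", round(r, 3))
```

Trying a different slope or intercept in sum_squared_errors on the same data should give a larger sum, which is what "least squares" means. Replacing ys with values that lie exactly on a line should push r to 1 or -1, depending on the direction of the association.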