Checkpoint - Course 2, Unit 3

In this unit, students studied regression and correlation, which are ways of summarizing the center and spread of an elliptical pattern of paired data. The bolded words are vocabulary and concepts with which your student should become familiar.

a. Describe how the idea of a sum of squared differences is used in correlation coefficients and in regression equations.

The difference referred to is the gap between the y value of an observed data point and the point on the regression line vertically above or below that data point. The point on the line represents a prediction about the y value (dependent variable) given the same x value (independent variable) as the data point. Since the points on the line are the result of using the regression equation, the difference between the actual y value (the data point) and the predicted y value (on the line) can be thought of as an error in prediction. The smaller the error, the better the prediction. Some points will be closer to the regression line than others.

The sum of squared differences gives a measure of how well a line fits the data. The calculator is programmed to find the line that fits the data best. This line, called the least squares regression line, has the smallest possible sum of squared differences.

Pearson's correlation coefficient, r, is a single number that gives a measure of the strength of the linear association between variables. Perfectly linear data will have a correlation of 1 (for a positive association), or a correlation of -1 (for a negative association). A set of data that is quite random will have a correlation close to zero. The formula for r is
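
As a rough illustration of the ideas above, the following Python sketch computes a least squares regression line, its sum of squared differences, and Pearson's r for a small set of made-up data. The function names and the data are assumptions chosen for illustration, not taken from the student text, and the expression used for r is one standard form that may be written differently in the unit.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_differences(xs, ys, slope, intercept):
    """Add up the squared vertical gaps between each point and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """One standard form: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best_ssd = sum_squared_differences(xs, ys, slope, intercept)

# Any other line, such as y = 0.5x + 2.5, has a larger sum of squared differences.
other_ssd = sum_squared_differences(xs, ys, 0.5, 2.5)

r = pearson_r(xs, ys)

print("least squares line: y =", round(slope, 3), "x +", round(intercept, 3))
print("sum of squared differences (least squares line):", round(best_ssd, 3))
print("sum of squared differences (other line):", round(other_ssd, 3))
print("r =", round(r, 3))
```

Trying a few other slopes and intercepts in place of 0.5 and 2.5 is a quick way to see that no other line produces a smaller sum than the least squares line.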

b. Does a strong correlation imply a cause-and-effect relationship?

No. There were many examples in this unit where two variables were strongly correlated but there was no cause and effect. For example, reading scores and shoe size are strongly correlated. As one increases, the other does also. A line will fit the pattern in a scatterplot quite well. But we would never claim that having bigger feet causes a child to read better. Rather, both of these variables are correlated with a third variable, age. Many other anomalous relationships have been documented, some humorous, like the strong correlation between women's hem lines and the performance of the stock market, or between the consumption of liquor and teacher salaries.

(In addition, there are situations where a high correlation arises simply because a single data point is far from the rest of the points. This may create a large value of r even though the data do not in fact show a linear trend overall. In this case, the high value of r suggests a strong linear association when there may be no relationship at all, cause-and-effect or otherwise, between the variables. A point like this is called an influential point. It should be noted that the value of r indicates how closely the points cluster about the regression line. It says nothing about whether such a line is an appropriate model.)
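
The effect of an influential point can be sketched with a small experiment in Python. The data below are made up for illustration: nine points arranged in a square grid with no linear trend at all, plus one hypothetical point far from the rest. The `pearson_r` helper is an assumed name, using one standard form of the formula for r.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up cluster: a 3-by-3 grid of points with no linear trend.
cluster_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
cluster_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

r_without = pearson_r(cluster_x, cluster_y)

# Add one hypothetical point far from the cluster.
r_with = pearson_r(cluster_x + [20], cluster_y + [20])

print("r for the cluster alone:", round(r_without, 3))
print("r with the far-away point added:", round(r_with, 3))
```

The cluster alone has a correlation of zero, yet adding the single distant point pushes r close to 1. Looking at the scatterplot, not just the value of r, is what reveals that one point is responsible.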

c. How are regression lines used?

There are two important ways. The first is to make an estimate or prediction of y for some particular value of x. For example, you might make a prediction of ninth grade GPA for a student with a GPA of 2.0 in eighth grade. (See Lesson 3 in the student text.) Or, if a researcher collected data on annual cigarette consumption and annual cancer deaths for various countries, the researcher might find that the points formed an elliptical cloud on the scatterplot, so computing a line of regression would be appropriate. The resulting equation could then be used to predict cancer deaths for different cigarette consumption rates.

The second way a regression line can be used is to make a model for a relationship, not with the idea of making predictions, but with the idea of studying how much of the variation in the response variable is related to changes in the input variable. For example, a scientist might use a regression equation for the (exposure to radioactive waste, cancer deaths) data on page 211 of the student text to try to understand the degree to which exposure to radioactivity contributes to cancer deaths. The scientist may not be interested in predicting cancer deaths for some other community, but could be looking for a model of the effect of exposure to radioactivity.
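
Both uses can be sketched in Python. The GPA values below are invented for illustration and are not the data from Lesson 3, and the function names are assumptions. The first part predicts a ninth grade GPA from an eighth grade GPA of 2.0. The second part computes the square of r, which is one common summary (not necessarily the one used in the unit) of the fraction of the variation in y that is accounted for by the linear relationship with x.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Square of Pearson's r: Sxy^2 / (Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Invented (eighth grade GPA, ninth grade GPA) pairs, for illustration only.
eighth = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ninth = [1.4, 2.1, 2.3, 3.1, 3.2, 3.9]

# Use 1: predict y for a particular x.
slope, intercept = least_squares_line(eighth, ninth)
predicted_ninth = slope * 2.0 + intercept
print("predicted ninth grade GPA for an eighth grade GPA of 2.0:",
      round(predicted_ninth, 2))

# Use 2: describe how much of the variation in y goes along with changes in x.
r2 = r_squared(eighth, ninth)
print("fraction of variation accounted for by the line:", round(r2, 2))
```

Predictions of this kind are most trustworthy for x values inside the range of the collected data, and a high value of r squared describes association, not cause and effect.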

d. Refer back to the "Think About This Situation" questions on page 171. How would you answer those questions now?

If you would like to see specific problems from Course 2, Unit 3, a link is provided to Examples of Tasks from Course 2, Unit 3. If you are interested in following up on the Statistics strand in general, then the Scope and Sequence will help you see where different concepts are introduced. On the Statistics page, you can read an explanation of the main statistics concepts as they are developed in all four courses.