
STAT 1430 Recitation 4B Regression

For this recitation (and all future recitations), be sure to show all
formulas, calculations, and units, where appropriate.
We are comparing years of education and hours on the internet in the last month, to
see if a relationship exists. If a relationship does exist, we want to predict Internet use
using education level. The output is given below. Assume a scatterplot shows a linear
pattern.

Descriptive Statistics: Education, Internet

Variable     Mean   StDev  Variance  Minimum  Maximum
Education  11.000   1.920     3.687    7.000   17.000
Internet   26.316   9.411    88.570    2.000   54.000

Pearson’s Correlation: 0.642

1. Interpret the correlation between Education and Internet use.


There is a moderate positive linear correlation between the two variables.

2. Would a line fit this data well? Explain.

3. Define which variable is X and which one is Y. Use applicable units.

4. Find the slope of the best fitting line.

5. Identify what the units of the slope are for the regression line that was calculated
here.

6. Find the Y-intercept of the best fitting line.

7. Find the equation of the best fitting line.

8. Use the line to predict Internet use for someone with 16 years of education.

9. The computer output of the linear regression for the Internet/Education data is
shown below. Use this output to find the equation of the best fitting line, and use it
to verify the equation of the best fitting line you found earlier in this
assignment.

Predictor     Coef  SE Coef      T      P
Constant    -8.290    2.665  -3.11  0.002
Education   3.1460   0.2387  13.18  0.000

10. For what education levels is it appropriate to use this line to make predictions? Use
the statistics given in the problem to answer this. Hint: Avoid Extrapolation!
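
As a check on hand calculations for this data set, the following is a minimal Python sketch, assuming the usual least-squares formulas slope = r * (SD of Y / SD of X) and intercept = (mean of Y) - slope * (mean of X). The function names fit_from_summary and predict_internet are made up for this illustration; the numbers are copied from the output above.

```python
# Summary statistics copied from the Minitab output above.
mean_x, sd_x = 11.000, 1.920    # Education (years)
mean_y, sd_y = 26.316, 9.411    # Internet (hours in the last month)
r = 0.642                       # Pearson's correlation
min_x, max_x = 7.000, 17.000    # observed range of Education

def fit_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (slope, intercept) of the least-squares line from summary statistics."""
    slope = r * sd_y / sd_x                 # units: hours per year of education
    intercept = mean_y - slope * mean_x     # units: hours
    return slope, intercept

slope, intercept = fit_from_summary(r, mean_x, sd_x, mean_y, sd_y)

def predict_internet(education_years):
    """Predict Internet hours, refusing to extrapolate beyond the observed X range."""
    if not (min_x <= education_years <= max_x):
        raise ValueError("education level is outside the observed range (extrapolation)")
    return intercept + slope * education_years

print(f"slope     = {slope:.4f} hours per year of education")
print(f"intercept = {intercept:.3f} hours")
print(f"predicted Internet use at 16 years = {predict_internet(16):.2f} hours")
```

The slope and intercept computed this way should agree with the Coef column of the regression output up to rounding, since the correlation and standard deviations are themselves rounded.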

Suppose that the price (in $thousands) and size (in square feet) of a random sample of
houses in Viroqua, Wisconsin, were analyzed by a new statistician using Minitab. The
group plans to use the data to help set prices for homes based on their size.

Descriptive Statistics: Size


Variable   N  N*  Mean  SE Mean  StDev  Variance  Minimum    Q1  Median    Q3  Maximum
Size      10   0  1993      110    349    121907     1526  1699    1957  2271     2595

Descriptive Statistics: Price


Variable   N  N*   Mean  SE Mean  StDev  Variance  Minimum     Q1  Median     Q3  Maximum
Price     10   0  219.1     19.0   60.1    3612.1    148.0  163.5   200.5  269.8    315.0

Correlation: 0.9041

Regression Analysis
Predictor     Coef  SE Coef      T  P-value
Constant    -90.88    52.62  -1.73     0.12
Size       0.15556  0.02605   5.97     0.00

11. True or False: Because there is a minus sign on 90.88, we know that the slope of the
best fitting line here is negative.

12. Based on the above output, a scatterplot of this data set that will be used for
prediction purposes would have which variable on which axis?
a. Price on X axis, Size on Y axis
b. Size on X axis, Price on Y axis
c. It doesn’t matter. The regression line won’t change if you switch X and Y
d. “Size” variable on the X axis and “Constant” variable on the Y axis

13. What is the equation of the best-fitting regression line? Find this in two ways. First,
using the regression analysis part of the output, and then using the descriptive
statistics part of the output.

14. Does the Y-intercept have an interpretation here? Why or why not?

15. For each square foot increase in house size, how much does the price increase (in
dollars) on average?
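
The two routes to the house-price line can be compared with a short Python sketch. It assumes Size is X and Price is Y, and uses the same summary-statistic formulas (slope = r * SD of Y / SD of X, intercept = mean of Y - slope * mean of X). The variable names are chosen for this illustration only; the numbers come from the output above.

```python
# Route 1: coefficients read directly from the regression output.
intercept_output = -90.88     # thousands of dollars
slope_output = 0.15556        # thousands of dollars per square foot

# Route 2: coefficients rebuilt from the descriptive statistics.
r = 0.9041
mean_size, sd_size = 1993, 349        # square feet
mean_price, sd_price = 219.1, 60.1    # thousands of dollars

slope_summary = r * sd_price / sd_size
intercept_summary = mean_price - slope_summary * mean_size

# Price is recorded in $thousands, so multiply by 1000 to express the slope in dollars.
dollars_per_sqft = slope_output * 1000

print(f"output : price = {intercept_output:.2f} + {slope_output:.5f} * size")
print(f"summary: price = {intercept_summary:.2f} + {slope_summary:.5f} * size")
print(f"slope in dollars per square foot: {dollars_per_sqft:.2f}")
```

The two lines will not match digit for digit because the summary statistics are rounded, but they should be close.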

Here are data for calories and salt content (milligrams of sodium) in 17 brands of
meat hot dogs:
[Figure: Scatterplot of Calories vs Sodium (mg). Vertical axis: Calories, marked from 100 to 200. Horizontal axis: Sodium (mg), marked from 100 to 600.]

16. For what range of sodium can you make a good prediction about calories?

17. A computer found the regression equation for the above problem is
Calories = 61.6 + 0.232 * Sodium (mg). How do we interpret the slope of this line?
a. As the amount of sodium increases by 1 milligram, the calories increase by 0.232
b. As the amount of sodium increases by 1 milligram, the calories increase by 61.6
c. As the amount of calories increases by 1, the sodium increases by 0.232
milligrams
d. As the amount of calories increases by 1, the sodium increases by 61.6
milligrams.

18. There is an influential point here. What are its approximate values for sodium and
calories? Will this point have a large residual? Why or why not?

19. Interpret the Y-intercept. Does it make sense here?
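
To see what the slope and Y-intercept of the stated hot dog equation mean numerically, here is a small Python sketch. The function name predicted_calories is made up for this illustration, and the sodium value of 300 mg is an arbitrary choice within the plotted axis range, not a value taken from the data.

```python
def predicted_calories(sodium_mg):
    """Evaluate the stated regression equation: Calories = 61.6 + 0.232 * Sodium (mg)."""
    return 61.6 + 0.232 * sodium_mg

# Slope: change in predicted calories when sodium goes up by 1 mg.
sodium = 300  # arbitrary illustrative value (an assumption, not a data point)
change_per_mg = predicted_calories(sodium + 1) - predicted_calories(sodium)

# Y-intercept: predicted calories at 0 mg of sodium, which lies outside the plotted range.
at_zero = predicted_calories(0)

print(f"change in predicted calories per extra mg of sodium: {change_per_mg:.3f}")
print(f"predicted calories at 0 mg sodium: {at_zero:.1f}")
```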

Suppose the age of a woman is strongly correlated with the age of her husband when
both are marrying for the 2nd time. (Scatterplot shows a linear relationship.)

Variable        N   Mean  StDev    Min    Max
Woman’s Age    28  31.24   9.49  17.06  58.97
Husband’s Age  28  35.79   8.86  14.00  51.00

Pearson’s Correlation: 0.9586

20. There is a positive linear relationship between these two variables. Explain why this
makes sense in the context of this problem.

21. Suppose you want to use the woman’s age to predict her husband’s age. Before
calculating the slope, do you think it would be >1 or <1? (Think about what slope
means here – change in Y per one unit change in X.) Then find the slope of the
regression line.

22. Suppose you want to use the husband’s age to predict the woman’s age. Before
calculating, do you think the slope will change? If so, how? Now find the slope of the
regression line to see if your thoughts are verified.
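
The effect of swapping X and Y on the slope can be checked with a short Python sketch, assuming the formula slope = r * (SD of Y / SD of X). The variable names are chosen for this illustration; the numbers are from the table above.

```python
r = 0.9586
sd_woman, sd_husband = 9.49, 8.86   # standard deviations of age, in years

# X = woman's age, Y = husband's age.
slope_husband_on_woman = r * sd_husband / sd_woman

# X = husband's age, Y = woman's age: the ratio of standard deviations flips.
slope_woman_on_husband = r * sd_woman / sd_husband

print(f"predicting husband's age from woman's age: slope = {slope_husband_on_woman:.4f}")
print(f"predicting woman's age from husband's age: slope = {slope_woman_on_husband:.4f}")

# The two slopes are not reciprocals of each other; their product equals r squared.
print(f"product of slopes = {slope_husband_on_woman * slope_woman_on_husband:.4f}")
print(f"r squared         = {r ** 2:.4f}")
```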

Let X = quiz 1 score and Y = quiz 2 score. Suppose 5 students have the following
scores, given as quiz 1 score, quiz 2 score (or as X, Y):

Student 1: 10, 10
Student 2: 9, 8
Student 3: 6, 8
Student 4: 5, 9
Student 5: 8, 9

The regression analysis shows R-squared = 58%. The results are below. (Assume the
scatterplot shows a linear relationship.)

           Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%
Intercept      7.651163        1.689611  4.528357  0.020147      2.274     13.028
Quiz 1         0.151162        0.215978  0.699896  0.534383     -0.536      0.839

From the above information and the data set:

23. Draw the scatterplot

24. Write down the equation for the best-fitting line.

25. Find each of the 5 residuals for this data set. (Recall from your lecture notes what
a residual is and how it is calculated.) [This is more challenging; try to answer
without your TA’s help – use those around you AND your lecture notes!]

26. Sketch your own residual plot for this data set and interpret. There are 5 points so
there should be 5 residuals.

27. Interpret the R-squared value.
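
For the quiz data, the fitted line and residuals can be checked with a minimal Python sketch. It assumes the standard least-squares formulas (slope = Sxy / Sxx, intercept = mean of Y - slope * mean of X) and defines a residual as observed Y minus predicted Y. The variable names are chosen for this illustration.

```python
# Quiz scores for the 5 students, copied from the list above.
quiz1 = [10, 9, 6, 5, 8]    # X
quiz2 = [10, 8, 8, 9, 9]    # Y

n = len(quiz1)
mean_x = sum(quiz1) / n
mean_y = sum(quiz2) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in quiz1)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(quiz1, quiz2))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed quiz 2 score minus the score predicted by the line.
fitted = [intercept + slope * x for x in quiz1]
residuals = [y - f for y, f in zip(quiz2, fitted)]

print(f"quiz2 = {intercept:.6f} + {slope:.6f} * quiz1")
for student, (x, y, e) in enumerate(zip(quiz1, quiz2, residuals), start=1):
    print(f"Student {student}: x = {x}, y = {y}, residual = {e:+.4f}")
```

Least-squares residuals always sum to zero (up to floating-point error), which is a quick sanity check on hand-computed values before sketching the residual plot.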
