REGRESSION
Last week
Discussed the ideas behind:
Hypothesis testing
Random Sampling Error
Statistical Significance, Alpha, and p-values
First, a brief review
Quick Review/Quiz
A health researcher plans to determine if
there is an association between physical
activity and body composition.
Specifically, the researcher thinks that
people who are more physically active (PA)
will have a lower percent body fat (%BF).
HA: People with more PA will have ____ %BF
Our Decision:
Reject H0 or Fail to reject H0
Questions?
Back to correlations
Recall, correlations provide two critical pieces
of information about a relationship between two
variables:
1) Direction (+ or -)
2) Strength/Magnitude
[Figure: panels A, B, C]
Correlation and Prediction
The stronger the relationship between
two variables, the more accurately you
can use information from one of those
variables to predict the other.
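To make "more accurately" concrete: for a least-squares line, the typical prediction error (the SD of the residuals) is roughly SD_Y × sqrt(1 - r²), so it shrinks as |r| grows. The sketch below assumes a hypothetical outcome with SD = 20 and tries several correlation strengths; the function name is illustrative only.

```python
import math


def typical_prediction_error(sd_y: float, r: float) -> float:
    """Approximate SD of the residuals when predicting Y from X with a
    least-squares line: sd_y * sqrt(1 - r**2) (large-sample approximation)."""
    return sd_y * math.sqrt(1 - r ** 2)


# Hypothetical outcome SD of 20 units; stronger r -> smaller typical error.
for r in (0.0, 0.3, 0.6, 0.81, 0.95):
    print(f"r = {r:.2f}: typical error = {typical_prediction_error(20, r):.1f}")
```

With r = 0 the line is no better than guessing the mean (error = the full SD); as r approaches 1 the error approaches 0.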
Example with blood pressure (BP)
Variance: BP (average systolic blood pressure in the United States)
Mean = 119 mmHg
SD = 20 mmHg
N = 22,270
Note the mean and the variation (variance) in the values.
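Variance and SD describe the same spread: the variance is the SD squared, so an SD of 20 mmHg corresponds to a variance of 400 mmHg². The sketch below shows that arithmetic and, for raw data, Python's standard `statistics` module; the five readings are hypothetical, not from the survey.

```python
import statistics

# Summary values reported on the slide
sd_bp = 20                 # mmHg
variance_bp = sd_bp ** 2   # variance = SD squared, in mmHg^2
print(variance_bp)         # 400

# With raw readings, statistics computes both directly (sample formulas, n - 1).
# These five readings are hypothetical, for illustration only.
readings = [110, 125, 98, 140, 122]
print(statistics.mean(readings), statistics.stdev(readings), statistics.variance(readings))
```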
[Figure: Weight vs. Height scatterplot, r = 0.81]
[Figure: the same scatterplot with a candidate line drawn through the points]
The green line indicates a possible line; the blue arrows
indicate the deviations (longer arrows = bigger deviations).
This is a poor attempt; it will keep trying new lines until it finds
the best one.
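Least squares estimation: conceptually, SPSS draws every possible line
through the points until it finds the line where the sum of the squared
vertical deviations from that line is smallest. (In practice, that line is
calculated directly from a formula rather than by trial and error.)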
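A minimal sketch of ordinary least squares, assuming a small set of hypothetical height/weight pairs (not the data in the figure). The slope is Sxy / Sxx and the intercept is ȳ - slope·x̄; the loop then "tries other lines" and confirms that every nudged line has a larger sum of squared vertical deviations (SSE). Function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared
    vertical deviations (ordinary least squares, closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared vertical deviations (residuals) from a candidate line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical height (in) / weight (lb) pairs, for illustration only.
heights = [60, 62, 65, 67, 70, 72, 74]
weights = [105, 118, 120, 135, 150, 155, 168]

a, b = least_squares_line(heights, weights)
best = sse(heights, weights, a, b)

# "Trying other lines": nudging the intercept or slope always increases SSE.
for da, db in [(5, 0), (-5, 0), (0, 0.5), (0, -0.5)]:
    assert sse(heights, weights, a + da, b + db) > best

print(f"intercept = {a:.1f}, slope = {b:.2f}, SSE = {best:.1f}")
```

The formula lands on the same line a brute-force search would eventually find, which is why software does not literally draw every line.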
[Figure: Weight vs. Height scatterplot, r = .81]
Eventually, SPSS will get it right, finding the line that
minimizes the squared deviations. This line is known as the
Line of Best Fit
The Line of Best Fit is the end-product of regression.
This line will have a certain slope.
[Figure: Weight vs. Height line of best fit, r = .81; SLOPE = up so many units in Weight for so many units in Height]
It will also have a value on the y-axis where the
x-axis value is zero: the INTERCEPT.
The intercept can be seen more clearly if we redraw the
graph with appropriate axes
[Figure: Weight vs. Height with axes extended to Height = 0; the line crosses the Weight axis at -234 lbs]
The intercept will sometimes be a nonsense value. In
this case, nobody is 0 inches tall or weighs -234 lbs.
From the line (its equation), an increase in height of
1 inch predicts a rise in weight of 5.4 lbs.
[Figure: Weight vs. Height, r = .81, slope = 5.4; a height of 68 in. corresponds to about 135 lbs on the line]
Coefficients(a)
Model                   Unstandardized B   Std. Error   Standardized Beta        t    Sig.
1 (Constant)                    -234.681       71.552                       -3.280    .005
  Height (in inches)               5.434        1.067                .806    5.092    .000
a. Dependent Variable: Weight (in pounds)
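Each t in the SPSS output is the unstandardized coefficient B divided by its standard error. The sketch below recomputes t from the printed values; because B and the Std. Error are themselves rounded, the Height t comes out as 5.093 rather than SPSS's 5.092. Also note that with a single predictor, the standardized Beta (.806) equals Pearson's r (.81).

```python
# (B, Std. Error) pairs copied from the SPSS coefficients table
coefficients = {
    "(Constant)": (-234.681, 71.552),
    "Height (in inches)": (5.434, 1.067),
}

# t statistic = unstandardized B / its standard error
t_values = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```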
Y = b + mX   (b = INTERCEPT, m = SLOPE)
or
Y = a + bX   (a = INTERCEPT, b = SLOPE)
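Plugging the coefficients from the table into Y = a + bX gives a predicted weight for any height. A minimal sketch; the function and constant names are illustrative, while the values a = -234.681 and b = 5.434 come from the SPSS output.

```python
def predict(x: float, a: float, b: float) -> float:
    """Predicted Y from the regression equation Y = a + bX."""
    return a + b * x


A_WEIGHT = -234.681   # intercept: predicted lbs at height 0 (not meaningful)
B_WEIGHT = 5.434      # slope: predicted lbs per additional inch of height

for height in (65, 68, 71):
    print(height, round(predict(height, A_WEIGHT, B_WEIGHT), 1))
```

A height of 68 inches predicts about 134.8 lbs (the "135 lbs" in the figure), each extra inch adds 5.434 lbs, and predict(0, ...) simply returns the intercept.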
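[Figure: Weight vs. Height with the line of best fit; points close to the line have a small error]
[Figure: the same plot contrasting a large residual (point far from the line) with a small residual (point close to the line)]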
Let's look at a
scatterplot first.
All my assumptions are met, so I should be
able to produce a decent prediction.
Strength? Direction?
Statistically significant correlations will (usually)
produce statistically significant predictors
r² = ?? (answer: 0.66)
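In simple regression, r² is the proportion of variance in Y accounted for by X, and r is its square root carrying the sign of the slope. A small sketch using the 0.66 from this model; the positive sign is taken from the positive slope (1.245).

```python
import math

r_squared = 0.66            # 20-yard dash predicting 40-yard dash
r = math.sqrt(r_squared)    # positive because the slope (+1.245) is positive
print(f"r = {r:.2f}; {r_squared:.0%} of the variance in 40-yard times is explained")
```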
Y-intercept = 1.259
Slope = 1.245
20-yard dash is a statistically significant predictor
What is our equation to predict 40-yard dash time?
Equation:
40-yard dash time = 1.245(20-yard dash time) + 1.259
If a player ran the 20-yard dash in 2.5 seconds,
what is their estimated 40-yard dash time?
1.245(2.5) + 1.259 =
4.37 seconds
If the player actually ran 4.53 seconds, what is
the residual?
Residual = Actual - Predicted
4.53 - 4.37 = 0.16 seconds
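The prediction and residual above can be checked in a few lines. A minimal sketch; `predict_40` is an illustrative name, and the slope and intercept are the values from the output above.

```python
SLOPE = 1.245       # seconds of 40-yard time per second of 20-yard time
INTERCEPT = 1.259   # seconds


def predict_40(time_20: float) -> float:
    """Predicted 40-yard dash time from a 20-yard dash time."""
    return SLOPE * time_20 + INTERCEPT


predicted = predict_40(2.5)      # about 4.37 s
actual = 4.53
residual = actual - predicted    # positive: the player ran slower than predicted
print(round(predicted, 2), round(residual, 2))
```

A positive residual means the point sits above the line; a negative residual means the player was faster than predicted.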
Significance vs. Importance in Regression
A statistically significant model/variable does NOT
mean the equation is good at predicting. Check how much
variance it explains (r²) and how large the residuals are.
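One way to see this: the t statistic for a correlation grows with sample size, t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below assumes a hypothetical, very weak correlation (r = 0.05) in a sample as large as the BP example (N = 22,270). The result is far beyond the roughly 1.96 needed for p < .05, yet X explains only 0.25% of the variance in Y.

```python
import math


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical weak correlation in a very large sample
r, n = 0.05, 22_270
print(f"t = {t_for_r(r, n):.2f}, r^2 = {r ** 2:.4f}")

# The same weak correlation in a small sample is nowhere near significant
print(f"t = {t_for_r(r, 30):.2f} with n = 30")
```

In simple regression this t is the same as the t for the slope, so a "significant predictor" can still be a poor one.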
QUESTIONS?
Upcoming
In-class activity
Homework:
Cronk Section 5.3
Holcomb Exercises 29, 44, 46 and 33