Sie sind auf Seite 1von 31

IOM 530: Intro.

to Statistical Learning

LINEAR REGRESSION
Chapter 03

IOM 530: Intro. to Statistical Learning

Outline
The Linear Regression Model
Least Squares Fit
Measures of Fit
Inference in Regression
Other Considerations in Regression Model
Qualitative Predictors
Interaction Terms
Potential Fit Problems
Linear vs. KNN Regression

IOM 530: Intro. to Statistical Learning

Outline
The Linear Regression Model
Least Squares Fit
Measures of Fit
Inference in Regression
Other Considerations in Regression Model
Qualitative Predictors
Interaction Terms
Potential Fit Problems
Linear vs. KNN Regression

IOM 530: Intro. to Statistical Learning

The Linear Regression Model


Yi = b0 + b1X1 + b2 X2 +

+ bp Xp +e

The parameters in the linear regression model are very

easy to interpret.
0 is the intercept (i.e. the average value for Y if all the

Xs are zero), j is the slope for the jth variable Xj

j is the average increase in Y when Xj is increased by

one and all other Xs are held constant.

IOM 530: Intro. to Statistical Learning

Least Squares Fit


We estimate the

parameters using
least squares i.e.
minimize

1
MSE = Yi - Yi
n i=1

1 n
= Yi - b0 - b1 X1 n i=1

- b p X p

IOM 530: Intro. to Statistical Learning

Relationship between population and


least squares lines
Population
line

Yi = b0 + b1X1 + b2 X2 +

Least Squares
line

Yi = b0 + b1 X1 + b2 X2 +

+ bp Xp +e
+ b p X p

We would like to know 0 through p i.e. the population line.


Instead we know 0 through p i.e. the least squares line.
Hence we use 0 through p as guesses for 0 through p and Yi
as a guess for Yi. The guesses will not be perfect just as X is
not a perfect guess for .

IOM 530: Intro. to Statistical Learning

Measures of Fit: R2
Some of the variation in Y can be explained by variation

in the Xs and some cannot.

R2 tells you the fraction of variance that can be explained

by X.

RSS
Ending Variance
R 1
1
2
Starting Variance
(Yi Y )
2

R2 is always between 0 and 1. Zero means no variance has


been explained. One means it has all been explained (perfect
fit to the data).

IOM 530: Intro. to Statistical Learning

Inference in Regression
Estimated (least squares) line.

The regression line from the

sample is not the regression line


from the population.
What we want to do:

Assess how well the line


describes the plot.
Guess the slope of the
population line.
Guess what value Y would take
for a given X value

14
12
10
8
6
4
2
0
-10

-5

0
X

10

True (population) line. Unobserved

IOM 530: Intro. to Statistical Learning

Some Relevant Questions


1.

Is j=0 or not? We can use a hypothesis test to answer


this question. If we cant be sure that j0 then there is
no point in using Xj as one of our predictors.

1.

Can we be sure that at least one of our X variables is a


useful predictor i.e. is it the case that 1= 2== p=0?

IOM 530: Intro. to Statistical Learning

10

1. Is j=0 i.e. is Xj an important variable?


We use a hypothesis test to answer this question
H0: j=0
Calculate

vs
t

Ha: j0
Number of standard deviations
away from zero.

SE( j )

If t is large (equivalently p-value is small) we can be sure

that j0 and that there is a relationship


Regression coefficients
Coefficient
Constant
7.0326
TV
0.0475

1 is 17.67 SEs from 0

Std Err
0.4578
0.0027

SE( 1 )

t-value
15.3603
17.6676

p-value
0.0000
0.0000

P-value

IOM 530: Intro. to Statistical Learning

11

Testing Individual Variables


Is there a (statistically detectable) linear relationship between
Newspapers and Sales after all the other variables have been
accounted for?
Regression coefficients
Coefficient
Constant
2.9389
TV
0.0458
Radio
0.1885
Newspaper
-0.0010
Regression coefficients
Coefficient
Constant
12.3514
Newspaper
0.0547

Std Err
0.3119
0.0014
0.0086
0.0059

Std Err
0.6214
0.0166

t-value
9.4223
32.8086
21.8935
-0.1767

t-value
19.8761
3.2996

p-value
0.0000
0.0000
0.0000
0.8599

p-value
0.0000
0.0011

No: big p-value


Small p-value in
simple regression

Almost all the explaining that Newspapers could do in simple


regression has already been done by TV and Radio in multiple
regression!

IOM 530: Intro. to Statistical Learning

12

2. Is the whole regression explaining


anything at all?
Test for:

H0: all slopes = 0

(1=2==p=0),

Ha: at least one slope 0


ANOVA Table
Source
Explained
Unexplained

df
2
197

SS
4860.2347
556.9140

MS
2430.1174
2.8270

F
859.6177

p-value
0.0000

Answer comes from the F test in the ANOVA (ANalysis Of


VAriance) table.
The ANOVA table has many pieces of information. What we
care about is the F Ratio and the corresponding p-value.

IOM 530: Intro. to Statistical Learning

Outline
The Linear Regression Model
Least Squares Fit
Measures of Fit
Inference in Regression
Other Considerations in Regression Model
Qualitative Predictors
Interaction Terms
Potential Fit Problems
Linear vs. KNN Regression

13

IOM 530: Intro. to Statistical Learning

14

Qualitative Predictors
How do you stick men and women (category listings)

into a regression equation?


Code them as indicator variables (dummy variables)

For example we can code Males=0 and Females= 1.

IOM 530: Intro. to Statistical Learning

15

Interpretation
Suppose we want to include income and gender.
Two genders (male and female). Let

0 if male
Genderi =
1 if female

then the regression equation is

b0 + b1Incomei if male
Yi b0 + b1Incomei + b2Genderi =
b0 + b1Incomei + b2 if female
2 is the average extra balance each month that females

have for given income level. Males are the baseline.

Regression coefficients
Constant
Income
Gender_Female

Coefficient
233.7663
0.0061
24.3108

Std Err
39.5322
0.0006
40.8470

t-value
5.9133
10.4372
0.5952

p-value
0.0000
0.0000
0.5521

IOM 530: Intro. to Statistical Learning

16

Other Coding Schemes


There are different ways to code categorical variables.
Two genders (male and female). Let
-1 if male
Genderi =
1 if female

then the regression equation is

b0 + b1Incomei -b2, if male


Yi b0 + b1Incomei + b2Genderi =
b0 + b1Incomei + b2 , if female
2 is the average amount that females are above the

average, for any given income level. 2 is also the


average amount that males are below the average, for
any given income level.

IOM 530: Intro. to Statistical Learning

Other Issues Discussed


Interaction terms
Non-linear effects
Multicollinearity
Model Selection

17

IOM 530: Intro. to Statistical Learning

18

Interaction
When the effect on Y of increasing X1 depends on

another X2.

Example:
Maybe the effect on Salary (Y) when increasing Position (X1)
depends on gender (X2)?
For example maybe Male salaries go up faster (or slower)
than Females as they get promoted.
Advertising example:
TV and radio advertising both increase sales.
Perhaps spending money on both of them may increase
sales more than spending the same amount on one alone?

IOM 530: Intro. to Statistical Learning

19

Interaction in advertising
Sales = b0 + b1 TV + b2 Radio+ b3 TV Radio
Sales = b0 +(b1 + b3 Radio)TV + b2 Radio
Spending $1 extra on TV increases average sales

Interaction Term

by 0.0191 + 0.0011Radio
Sales = b0 + (b2 + b3 TV) Radio + b2 TV
Spending $1 extra on Radio increases average

sales by 0.0289 + 0.0011TV


Parame ter Estimates
Term
Intercept
TV
Radio
TV*Radio

Estimate
6.7502202
0.0191011
0.0288603
0.0010865

Std Error t Ratio Prob>|t|


0.247871 27.23 <.0001*
0.001504 12.70 <.0001*
0.008905
3.24 0.0014*
5.242e-5 20.73 <.0001*

20

IOM 530: Intro. to Statistical Learning

Parallel Regression Lines


Line for women

Expanded Estimates
170

Nominal factors expanded to all levels


Term
Estimate Std Error

t Ratio

Prob>|t|

Intercept
Gender[female]
Gender[m ale]
Pos ition

77.52
3.53
-3.53
21.60

<.0001
0.0005
0.0005
<.0001

112.77039
1.8600957
-1.860096
6.0553559

1.454773
0.527424
0.527424
0.280318

160
150
140

Regression equation

130

female: salary = 112.77+1.86 + 6.05 position

120

males: salary = 112.77-1.86 + 6.05 position

Different
intercepts

Same
slopes

2.5

7.5

Position

Line for men


Parallel lines have the same slope.
Dummy variables give lines
different intercepts, but their
slopes are still the same.

10

IOM 530: Intro. to Statistical Learning

21

Interaction Effects
Our model has forced the line for men and the line for

women to be parallel.
Parallel lines say that promotions have the same salary

benefit for men as for women.


If lines arent parallel then promotions affect mens and

womens salaries differently.

22

IOM 530: Intro. to Statistical Learning

Should the Lines be Parallel?


170
160
150
140

Expanded Estimate s

130

Nominal factors expanded to all levels


Term
Estimate Std Error

t Ratio

Prob>|t|

Intercept
112.63081
Gender[female]
1.1792165
Gender[m ale]
-1.179216
Pos ition
6.1021378
Gender[female]*Position0.1455111
Gender[m ale]*Pos ition -0.145511

75.85
0.79
-0.79
20.58
0.49
-0.49

<.0001
0.4280
0.4280
<.0001
0.6242
0.6242

1.484825
1.484825
1.484825
0.296554
0.296554
0.296554

Interaction between gender and position

120
110
0 1 2 3 4

5 6 7

8 9 10

Position

Interaction is not significant

IOM 530: Intro. to Statistical Learning

Outline
The Linear Regression Model
Least Squares Fit
Measures of Fit
Inference in Regression
Other Considerations in Regression Model
Qualitative Predictors
Interaction Terms
Potential Fit Problems
Linear vs. KNN Regression

23

IOM 530: Intro. to Statistical Learning

24

Potential Fit Problems


There are a number of possible problems that one may
encounter when fitting the linear regression model.
1. Non-linearity of the data
2. Dependence of the error terms
3. Non-constant variance of error terms
4. Outliers
5. High leverage points
6. Collinearity
See Section 3.3.3 for more details.

IOM 530: Intro. to Statistical Learning

Outline
The Linear Regression Model
Least Squares Fit
Measures of Fit
Inference in Regression
Other Considerations in Regression Model
Qualitative Predictors
Interaction Terms
Potential Fit Problems
Linear vs. KNN Regression

25

IOM 530: Intro. to Statistical Learning

26

KNN Regression
kNN Regression is similar to the kNN classifier.
To predict Y for a given value of X, consider k closest

points to X in training data and take the average of the


responses. i.e.
f (x) =

1
yi

K xi Ni

If k is small kNN is much more flexible than linear

regression.
Is that better?

IOM 530: Intro. to Statistical Learning

KNN Fits for k =1 and k = 9

27

IOM 530: Intro. to Statistical Learning

28

KNN Fits in One Dimension (k =1 and k =


9)

IOM 530: Intro. to Statistical Learning

Linear Regression Fit

29

IOM 530: Intro. to Statistical Learning

KNN vs. Linear Regression

30

IOM 530: Intro. to Statistical Learning

Not So Good in High Dimensional


Situations

31