
ADM2304

Multiple Regression
Dr. Suren Phansalker
1. Background:
The method of least squares can be expanded to include
more than one predictor. The method is known as multiple
regression.
Multiple regression is a rich methodology and one of the
most important tools in Statistics.
We can only scratch the surface here, but you can take
entire courses on multiple regression later in your statistics
career!

Simple Linear Regression:


For a simple regression, the model underlying the
regression line had the form: y = β0 + β1x + ε, where β0 is
the intercept and β1 is the slope.
Multiple Linear Regression:
For a multiple linear regression with k predictors, the
model is: y = 0 + 1x1 + 2 x2 ++ k xk +
The slope of each x-variable tells us how much y will
change for a unit increase in that variable, for a
particular value of all the other predictor variables in
the equation.
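In practice the coefficients β0, β1, …, βk are estimated by least squares from data. As a minimal sketch (with simulated data and illustrative names, not from this lecture's example), the model can be fit with NumPy's least-squares solver:

```python
import numpy as np

# Simulated data for illustration: y = 3 + 2*x1 - 1.5*x2 + noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0.0, 1.0, n)

# Design matrix: a column of ones for the intercept b0, then x1, x2.
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimates b = (b0, b1, b2), solved without forming
# an explicit matrix inverse.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates close to the true values (3, 2, -1.5)
```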

Basic Caveats:
When more than one predictor is in the model, we have to
be careful not to think of the slopes as the simple effect of
that predictor on the response.
Rather, each slope is the effect of that predictor on the
response with all the other predictors in the equation
held at given values.
All the independent or explanatory variables, x1, x2, …, xk,
must enter the model linearly, with 1 as the exponent.

2. Assumptions and Conditions:


Linearity Assumption:
Straight enough condition: Check the scatterplot for
each candidate predictor variable; the shape must
not be obviously curved or we can't consider that
predictor in our multiple regression model.
Independence Assumption:
Randomization condition: Check the residuals plot
(part 1); the residuals should appear to be randomly
scattered without any pattern.
Equal Variance Assumption:
"Does the plot thicken?" condition: Check the
residuals plot (part 2); the spread of the residuals
should be uniform and also without any pattern.
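These residual-based conditions are checked on the residuals from the fitted model. A small sketch (again with simulated data, not the lecture's example): once the model is fit by least squares, the residuals always average to zero, and it is their scatter against the fitted values that the conditions above examine.

```python
import numpy as np

# Fit a two-predictor model to simulated data, then extract the
# residuals that the conditions above are checked on.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(0.0, 0.5, n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
residuals = y - fitted

# By construction of least squares (with an intercept in the model),
# the residuals sum to zero; plotting residuals vs. fitted values is
# what the "does the plot thicken?" check looks at.
print(round(float(residuals.mean()), 6))  # ~0
```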

3. ANOVA Table and the F Test:


Since we have already worked with ANOVA, we know
what it is about.
The ANOVA table in your multiple regression output
tests the hypothesis that all of the slope coefficients are
zero:

H0: β1 = β2 = … = βk = 0

Large values of the F-statistic will lead us to reject the
null hypothesis.
We will then conclude that the slope coefficients are not
all equal to zero, that is, that the multiple regression
model is better than just using the mean for predicting
our response.
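The F-statistic is the regression mean square divided by the error mean square. Checking this against the sums of squares from the Minitab output in section 9 (k = 3 predictors, n = 25 observations):

```python
# ANOVA decomposition taken from the section 9 output.
ss_regression = 5_788_286_511
ss_error = 2_131_605_507
k, n = 3, 25

ms_regression = ss_regression / k        # regression mean square, df = k = 3
ms_error = ss_error / (n - k - 1)        # error mean square, df = 21

f_stat = ms_regression / ms_error
print(round(f_stat, 2))  # 19.01, matching the F column of the ANOVA table
```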

4. t-Tests for Individual Coefficients:


Once we conclude that not all of the slope coefficients are
zero, we can test the individual slope coefficients.
For each coefficient we test:

H0: βj = 0

The regression table gives us a t-statistic to perform such
a test.
Assuming that our assumptions and conditions (including
the nearly Normal condition) are met, the t-statistic will
follow a t-distribution with n − k − 1 degrees of freedom.
Fortunately, the regression output also provides P-values
for the t-statistics, so we can make conclusions about the
individual null hypotheses based on these P-values.
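As a sketch of where the T and P columns come from, take the ProfitM$ row of the section 9 output (b = 14.337, SE = 2.383, n − k − 1 = 21); SciPy's t-distribution is assumed here for the tail probability:

```python
from scipy import stats

# Values read from the ProfitM$ row of the section 9 output.
b_j, se_j, df = 14.337, 2.383, 21

t_stat = b_j / se_j                        # t = b_j / SE(b_j)
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided P-value

print(round(t_stat, 2))  # 6.02, the T value reported for ProfitM$
print(p_value < 0.001)   # True: reported as 0.000 in the output
```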

5. Confidence Interval for Coefficients:


A confidence interval for βj is:

bj ± t* · SE(bj)

where t* is the critical value from the t-distribution with
n − k − 1 degrees of freedom.
Use the standard errors SE(bj) provided by the Minitab
output.
And remember, the meaning of a coefficient depends on
all the other predictors in the multiple regression model!
The complexity of multiple regression is that the
coefficients of each predictor change depending on what
other predictors are in the model.
Note: The exception to this is if the predictors are truly
independent, something that will happen only in data
from carefully designed experiments.
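For instance, a 95% confidence interval for the ProfitM$ coefficient in section 9 (b = 14.337, SE = 2.383, 21 degrees of freedom) works out as follows; SciPy supplies the critical value t*:

```python
from scipy import stats

# Values read from the ProfitM$ row of the section 9 output.
b_j, se_j, df = 14.337, 2.383, 21

t_star = stats.t.ppf(0.975, df)          # two-sided 95% critical value
lower = b_j - t_star * se_j
upper = b_j + t_star * se_j

print(round(lower, 2), round(upper, 2))  # roughly 9.38 to 19.29
```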

6. Quality of Multiple Linear Regression, R2:


Where should we stop? While we shouldn't be guided
only by R2 when we choose models, for models with the
same number of predictor variables, a model with all
coefficients statistically significant (small P-values) and a
higher R2 is preferred to one with any coefficient not
statistically significant and a lower R2.
7. Quality of Multiple Linear Regression, R2Adj:
The R2Adj statistic can be found in the regression output or
by using appropriate formulas as shown elsewhere.
It is a rough attempt to adjust for the simple fact that
when we add another predictor to the model, the R2 can't
go down and will most likely get larger.
The R2Adj can be difficult to interpret, but it will allow us
to compare multiple regression models with different
numbers of predictors.
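The adjustment uses the formula R2Adj = 1 − (1 − R2)(n − 1)/(n − k − 1). Checking it against the section 9 output (R-Sq = 73.1%, n = 25, k = 3):

```python
# Sums of squares from the section 9 ANOVA table.
ss_regression = 5_788_286_511
ss_total = 7_919_892_018
n, k = 25, 3

r2 = ss_regression / ss_total
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(r2, 3))      # 0.731, the R-Sq in the output
print(round(r2_adj, 3))  # 0.692, the R-Sq(adj) in the output
```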

8. Interpreting Coefficients:
Don't think the sign of a coefficient is special; the sign
of a coefficient also depends on the other predictors in the
model.
If a coefficient's t-statistic is not significant, don't
interpret it at all; you can't be sure that the value of the
corresponding parameter isn't really zero.
Don't fit a linear regression to data that aren't straight.
Watch out for the plot thickening.
Make sure the errors are nearly Normal; check the
histogram and Normal probability plot of the residuals.
Watch out for high-influence points and outliers.

9. An Example of Multiple Linear Regression:


See the Lecture9&10MTB2304 File for the Data and other details.
MTB > regress c1 3 c2-c4 c5 c6

Regression Analysis: ValueM$ versus SalesM$, ProfitM$, AssetsM$

The regression equation is


ValueM$ = 9190 - 0.269 SalesM$ + 14.3 ProfitM$ + 0.048 AssetsM$
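The fitted equation can be used directly for point prediction. A small sketch (the input values below are illustrative, not a firm from the data set):

```python
def predict_value(sales, profit, assets):
    """ValueM$ predicted by the fitted regression equation above."""
    return 9190 - 0.269 * sales + 14.3 * profit + 0.048 * assets

# Hypothetical firm: sales 50,000 M$, profit 3,000 M$, assets 40,000 M$.
print(round(predict_value(50_000, 3_000, 40_000), 1))  # 40560.0
```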

Predictor    Coef     SE Coef      T      P
Constant     9190     3250      2.83  0.010
SalesM$     -0.2686   0.1464   -1.83  0.081
ProfitM$    14.337    2.383     6.02  0.000
AssetsM$     0.0482   0.1094    0.44  0.664

S = 10075.0   R-Sq = 73.1%   R-Sq(adj) = 69.2%


Analysis of Variance:

Source          DF          SS          MS      F      P
Regression       3  5788286511  1929428837  19.01  0.000
Residual Error  21  2131605507   101505024
Total           24  7919892018

Source    DF      Seq SS
SalesM$    1  1968928435
ProfitM$   1  3799622474
AssetsM$   1    19735602


Unusual Observations:
Obs  SalesM$  ValueM$    Fit  SE Fit  Residual  St Resid
  9    51250    90055  66869    6217     23186     2.92R
 20    34087    25728   6410    4080     19318     2.10R
      102814    24952  27345    7814     -2393    -0.38 X
       62715    21206  41268    4249    -20062    -2.20R
       14652    16930  25967    8878     -9037    -1.90 X

R denotes an observation with a large
standardized residual.
X denotes an observation whose X value gives
it large influence.

