
Chapter 13

Multiple Regression

Multiple Regression Model

Least Squares Method

Multiple Coefficient of Determination

Model Assumptions

Testing for Significance

Using the Estimated Regression Equation
for Estimation and Prediction

Qualitative Independent Variables

2006 Thomson/South-Western

Multiple Regression Model


The equation that describes how the
dependent variable y is related to the
independent variables x1, x2, . . . xp and an
error term is called the multiple regression
model.
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε
where:
β0, β1, β2, . . . , βp are the parameters, and
ε is a random variable called the error term

2006 Thomson/South-Western

Multiple Regression Equation


The equation that describes how the mean
value of y is related to x1, x2, . . . xp is called
the multiple regression equation.
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp

2006 Thomson/South-Western

Estimated Multiple Regression Equation


A simple random sample is used to
compute sample statistics b0, b1, b2, . . . , bp
that are used as the point estimators of the
parameters β0, β1, β2, . . . , βp.
The estimated multiple regression equation is:
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp

2006 Thomson/South-Western

Estimation Process

Multiple Regression Model
y = β0 + β1x1 + β2x2 + . . . + βpxp + ε

Multiple Regression Equation
E(y) = β0 + β1x1 + β2x2 + . . . + βpxp

Unknown parameters are β0, β1, β2, . . . , βp

Sample Data:
x1  x2  . . .  xp  y
 .   .          .  .
 .   .          .  .

Estimated Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp

Sample statistics are b0, b1, b2, . . . , bp

b0, b1, b2, . . . , bp provide estimates of β0, β1, β2, . . . , βp

2006 Thomson/South-Western

5

Least Squares Method

Least Squares Criterion


min Σ(yi − ŷi)²

Computation of Coefficient Values

The formulas for the regression coefficients


b0, b1, b2, . . . bp involve the use of matrix algebra.
We will rely on computer software packages to
perform the calculations.

2006 Thomson/South-Western
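The slides defer these matrix-algebra calculations to software. As a minimal sketch (our own illustration, not part of the slides; the helper name is made up), the coefficients b0, b1, . . . , bp can be computed with numpy:

import numpy as np

def least_squares_coefficients(X, y):
    # X: n x p array of independent variables, y: length-n array of observed responses.
    # Returns b0, b1, ..., bp minimizing the least squares criterion min Σ(yi − ŷi)².
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_design = np.column_stack([np.ones(len(y)), X])   # column of 1s gives the intercept b0
    b, *_ = np.linalg.lstsq(X_design, y, rcond=None)   # numerically stable least squares solve
    return b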

Multiple Regression Model

Example: Programmer Salary Survey


A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's programmer aptitude test.

The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

2006 Thomson/South-Western
7

Multiple Regression Model

Exper.  Score  Salary      Exper.  Score  Salary
  4      78     24            9     88     38
  7     100     43            2     73     26.6
  1      86     23.7         10     75     36.2
  5      82     34.3          5     81     31.6
  8      86     35.8          6     74     29
 10      84     38            8     87     34
  0      75     22.2          4     79     30.1
  1      80     23.1          6     94     33.9
  6      83     30            3     70     28.2
  6      91     33            3     89     30

2006 Thomson/South-Western
Multiple Regression Model


Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:

y = β0 + β1x1 + β2x2 + ε

where
y = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test

2006 Thomson/South-Western

Solving for the Estimates of β0, β1, β2

Input Data:
x1   x2     y
 4   78    24
 7  100    43
 .    .     .
 .    .     .

→ Computer Package for Solving Multiple Regression Problems →

Least Squares Output:
b0 =
b1 =
b2 =
R2 =
etc.

2006 Thomson/South-Western

10

Solving for the Estimates of β0, β1, β2

Excel Worksheet (showing partial data entered)

     A            B                  C            D
 1   Programmer   Experience (yrs)   Test Score   Salary ($K)
 2   1             4                  78          24.0
 3   2             7                 100          43.0
 4   3             1                  86          23.7
 5   4             5                  82          34.3
 6   5             8                  86          35.8
 7   6            10                  84          38.0
 8   7             0                  75          22.2
 9   8             1                  80          23.1

Note: Rows 10-21 are not shown.

2006 Thomson/South-Western

11

Solving for the Estimates of β0, β1, β2

Excel's Regression Dialog Box

2006 Thomson/South-Western

12

Solving for the Estimates of β0, β1, β2

Excel's Regression Equation Output

              Coeffic.   Std. Err.   t Stat   P-value
Intercept     3.17394    6.15607     0.5156   0.61279
Experience    1.4039     0.19857     7.0702   1.9E-06
Test Score    0.25089    0.07735     3.2433   0.00478

Note: Columns F-I are not shown.

2006 Thomson/South-Western

13

Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
Note: Predicted salary will be in thousands of dollars.

2006 Thomson/South-Western

14
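As a quick worked example (our own arithmetic, not part of the slides): a programmer with 4 years of experience and an aptitude test score of 78 has a predicted salary of 3.174 + 1.404(4) + 0.251(78) = 3.174 + 5.616 + 19.578 ≈ 28.4, i.e., about $28,400.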

Interpreting the Coefficients


In multiple regression analysis, we interpret each regression coefficient as follows:

bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other independent variables are held constant.

2006 Thomson/South-Western

15

Interpreting the Coefficients


b1 = 1.404

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on the programmer aptitude test is held constant).

2006 Thomson/South-Western

16

Interpreting the Coefficients


b2 = 0.251

Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).

2006 Thomson/South-Western

17

Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

2006 Thomson/South-Western

18
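As a minimal sketch (our own illustration, not part of the slides; the function name is made up), the three sums of squares can be computed directly from the observed and fitted values:

import numpy as np

def sums_of_squares(y, yhat):
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ssr = np.sum((yhat - y.mean()) ** 2)   # sum of squares due to regression
    sse = np.sum((y - yhat) ** 2)          # sum of squares due to error
    return sst, ssr, sse

For the salary example, R2 = SSR/SST reproduces the .834 reported on the slides that follow.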

Multiple Coefficient of Determination

Excel's ANOVA Output

ANOVA
             df    SS          MS          F          Significance F
Regression    2    500.3285    250.1643    42.76013   2.32774E-07
Residual     17     99.45697     5.85041
Total        19    599.7855

(SSR = 500.3285, SST = 599.7855)

2006 Thomson/South-Western

19

Multiple Coefficient of Determination


R2 = SSR/SST
R2 = 500.3285/599.7855 = .83418

2006 Thomson/South-Western

20

Adjusted Multiple Coefficient of Determination

Ra2 = 1 − (1 − R2)(n − 1)/(n − p − 1)

Ra2 = 1 − (1 − .834179)(20 − 1)/(20 − 2 − 1) = .814671
2006 Thomson/South-Western

21
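As a quick check of this arithmetic (our own, not part of the slides) with n = 20 observations and p = 2 independent variables:

n, p, r2 = 20, 2, 0.834179
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 6))   # 0.814671, matching Excel's Adjusted R Square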

Adjusted Multiple Coefficient of Determination

Excel's Regression Statistics

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.913334059
R Square             0.834179103
Adjusted R Square    0.814670762
Standard Error       2.418762076
Observations         20

2006 Thomson/South-Western

22

Assumptions About the Error Term


The error ε is a random variable with mean of zero.

The variance of ε, denoted by σ2, is the same for all values of the independent variables.

The values of ε are independent.

The error ε is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by β0 + β1x1 + β2x2 + . . . + βpxp.

2006 Thomson/South-Western

23

Testing for Significance


In simple linear regression, the F and t tests provide the same conclusion.

In multiple regression, the F and t tests have different purposes.

2006 Thomson/South-Western

24

Testing for Significance: F Test


The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

The F test is referred to as the test for overall significance.

2006 Thomson/South-Western

25

Testing for Significance: t Test


If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.

A separate t test is conducted for each of the independent variables in the model.

We refer to each of these t tests as a test for individual significance.

2006 Thomson/South-Western

26

Testing for Significance: F Test


Hypotheses:      H0: β1 = β2 = . . . = βp = 0
                 Ha: One or more of the parameters is not equal to zero.

Test Statistic:  F = MSR/MSE

Rejection Rule:  Reject H0 if p-value < α or if F > Fα,
                 where Fα is based on an F distribution with p d.f. in the
                 numerator and n − p − 1 d.f. in the denominator.

2006 Thomson/South-Western

27

F Test for Overall Significance


Hypotheses:      H0: β1 = β2 = 0
                 Ha: One or both of the parameters is not equal to zero.

Rejection Rule:  For α = .05 and d.f. = 2, 17: F.05 = 3.59
                 Reject H0 if p-value < .05 or F > 3.59

2006 Thomson/South-Western

28
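As a quick check of these numbers (our own, not part of the slides) using SciPy's F distribution:

from scipy import stats

f_crit = stats.f.ppf(0.95, dfn=2, dfd=17)    # critical value F.05 ≈ 3.59
p_value = stats.f.sf(42.76, dfn=2, dfd=17)   # upper-tail p-value ≈ 2.3E-07 (Significance F)
print(round(f_crit, 2), p_value)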

F Test for Overall Significance

Excel's ANOVA Output

ANOVA
             df    SS          MS          F          Significance F
Regression    2    500.3285    250.1643    42.76013   2.32774E-07
Residual     17     99.45697     5.85041
Total        19    599.7855

The p-value (Significance F) is used to test for overall significance.

2006 Thomson/South-Western

29

F Test for Overall Significance


Test Statistic:  F = MSR/MSE = 250.16/5.85 = 42.76

Conclusion:      p-value < .05, so we can reject H0.
                 (Also, F = 42.76 > 3.59)

2006 Thomson/South-Western

30

Testing for Significance: t Test


Hypotheses:      H0: βi = 0
                 Ha: βi ≠ 0

Test Statistic:  t = bi / sbi

Rejection Rule:  Reject H0 if p-value < α or
                 if t < −tα/2 or t > tα/2, where tα/2
                 is based on a t distribution
                 with n − p − 1 degrees of freedom.

2006 Thomson/South-Western

31

t Test for Significance of Individual Parameters

Hypotheses:      H0: βi = 0
                 Ha: βi ≠ 0

Rejection Rule:  For α = .05 and d.f. = 17, t.025 = 2.11
                 Reject H0 if p-value < .05 or if |t| > 2.11

2006 Thomson/South-Western

32

t Test for Significance of Individual Parameters

Excel's Regression Equation Output

              Coeffic.   Std. Err.   t Stat   P-value
Intercept     3.17394    6.15607     0.5156   0.61279
Experience    1.4039     0.19857     7.0702   1.9E-06
Test Score    0.25089    0.07735     3.2433   0.00478

Note: Columns F-I are not shown.

The t statistic and p-value in the Experience row are used to test for the individual significance of Experience.

2006 Thomson/South-Western

33

t Test for Significance of Individual Parameters

Excel's Regression Equation Output

              Coeffic.   Std. Err.   t Stat   P-value
Intercept     3.17394    6.15607     0.5156   0.61279
Experience    1.4039     0.19857     7.0702   1.9E-06
Test Score    0.25089    0.07735     3.2433   0.00478

Note: Columns F-I are not shown.

The t statistic and p-value in the Test Score row are used to test for the individual significance of Test Score.

2006 Thomson/South-Western

34

t Test for Significance of Individual Parameters

Test Statistics:  t = b1/sb1 = 1.4039/.1986 = 7.07
                  t = b2/sb2 = .25089/.07735 = 3.24

Conclusions:      Reject both H0: β1 = 0 and H0: β2 = 0.
                  Both independent variables are significant.

2006 Thomson/South-Western

35
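As a quick check of these numbers (our own, not part of the slides) using SciPy's t distribution with n − p − 1 = 17 degrees of freedom:

from scipy import stats

t_crit = stats.t.ppf(0.975, df=17)       # t.025 ≈ 2.11
p_exper = 2 * stats.t.sf(7.07, df=17)    # two-sided p-value ≈ 1.9E-06 for Experience
p_score = 2 * stats.t.sf(3.24, df=17)    # two-sided p-value ≈ 0.0048 for Test Score
print(round(t_crit, 2), p_exper, round(p_score, 4))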

Testing for Significance: Multicollinearity


The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

2006 Thomson/South-Western

36

Testing for Significance: Multicollinearity


If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.

Every attempt should be made to avoid including independent variables that are highly correlated.

2006 Thomson/South-Western

37
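As a minimal sketch (our own illustration, not part of the slides), the pairwise correlation between the two independent variables in the salary example can be checked with numpy before fitting the model:

import numpy as np

experience = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3])
test_score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                       88, 73, 75, 81, 74, 87, 79, 94, 70, 89])

r = np.corrcoef(experience, test_score)[0, 1]
print(round(r, 3))   # if |r| > .7, the separate effects of x1 and x2 are hard to untangle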

Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression.

We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of ŷ as the point estimate.

2006 Thomson/South-Western

38

Using the Estimated Regression Equation for Estimation and Prediction

The formulas required to develop interval estimates for the mean value of ŷ and for an individual value of y are beyond the scope of the textbook.

Software packages for multiple regression will often provide these interval estimates.

2006 Thomson/South-Western

39

Qualitative Independent Variables


In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, x2 might represent gender where x2 = 0 indicates male and x2 = 1 indicates female.

In this case, x2 is called a dummy or indicator variable.

2006 Thomson/South-Western

40

Qualitative Independent Variables

Example: Programmer Salary Survey

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems.

The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($1000s) for each of the sampled 20 programmers are shown on the next slide.

2006 Thomson/South-Western

41

Qualitative Independent Variables

Exper.  Score  Degr.  Salary      Exper.  Score  Degr.  Salary
  4      78    No      24           9      88    Yes     38
  7     100    Yes     43           2      73    No      26.6
  1      86    No      23.7        10      75    Yes     36.2
  5      82    Yes     34.3         5      81    No      31.6
  8      86    Yes     35.8         6      74    No      29
 10      84    Yes     38           8      87    Yes     34
  0      75    No      22.2         4      79    No      30.1
  1      80    No      23.1         6      94    Yes     33.9
  6      83    No      30           3      70    No      28.2
  6      91    Yes     33           3      89    No      30

2006 Thomson/South-Western

42

Estimated Regression Equation

ŷ = b0 + b1x1 + b2x2 + b3x3

where:
ŷ = annual salary ($1000s)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree
     1 if individual does have a graduate degree

x3 is a dummy variable

2006 Thomson/South-Western

43
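As a minimal sketch (our own illustration, not part of the slides), the Yes/No graduate-degree answers can be coded as the 0/1 dummy variable x3 and the three-variable model fit with numpy:

import numpy as np

degree = ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes",
          "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No"]
x3 = np.array([1 if d == "Yes" else 0 for d in degree])   # 1 = has a graduate degree

# experience, test_score, salary are the 20-programmer columns shown earlier
# X = np.column_stack([np.ones(20), experience, test_score, x3])
# b, *_ = np.linalg.lstsq(X, salary, rcond=None)           # b0, b1, b2, b3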

Qualitative Independent Variables

Excel's Regression Statistics

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.920215239
R Square             0.846796085
Adjusted R Square    0.818070351
Standard Error       2.396475101
Observations         20

2006 Thomson/South-Western

44

Qualitative Independent Variables

Excel's ANOVA Output

ANOVA
             df    SS         MS          F          Significance F
Regression    3    507.896    169.2987    29.47866   9.41675E-07
Residual     16     91.88949    5.743093
Total        19    599.7855

2006 Thomson/South-Western

45

Qualitative Independent Variables

Excel's Regression Equation Output

              Coeffic.   Std. Err.   t Stat   P-value
Intercept     7.94485    7.3808      1.0764   0.2977
Experience    1.14758    0.2976      3.8561   0.0014
Test Score    0.19694    0.0899      2.1905   0.04364
Grad. Degr.   2.28042    1.98661     1.1479   0.26789

Note: Columns F-I are not shown.

The Grad. Degr. coefficient is not significant (p-value = .26789 > .05).

2006 Thomson/South-Western

46

Qualitative Independent Variables

Excel's Regression Equation Output

              Coeffic.   Low. 95%    Up. 95%    Low. 95.0%   Up. 95.0%
Intercept     7.94485    -7.701739   23.5914    -7.7017385   23.591436
Experience    1.14758     0.516695    1.77847    0.51669483   1.7784686
Test Score    0.19694     0.00635     0.38752    0.00634964   0.3875243
Grad. Degr.   2.28042    -1.931002    6.49185   -1.9310017    6.4918494

Note: Columns C-E are hidden.

2006 Thomson/South-Western

47

More Complex Qualitative Variables


If a qualitative variable has k levels, k − 1 dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C could be represented by x1 and x2 values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.

Care must be taken in defining and interpreting the dummy variables.

2006 Thomson/South-Western

48

More Complex Qualitative Variables


For example, a variable indicating level of education could be represented by x1 and x2 values as follows:

Highest Degree    x1   x2
Bachelor's         0    0
Master's           1    0
Ph.D.              0    1

2006 Thomson/South-Western

49
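As a minimal sketch (our own illustration, not part of the slides), pandas can generate the k − 1 dummy variables for a qualitative variable with k = 3 levels:

import pandas as pd

df = pd.DataFrame({"degree": ["Bachelors", "Masters", "Ph.D.", "Masters"]})
dummies = pd.get_dummies(df["degree"], drop_first=True)   # drops one level as the baseline
print(dummies)   # indicator columns for Masters and Ph.D.; Bachelors is the (0, 0) baseline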

End of Chapter 13

2006 Thomson/South-Western

50
