
Business Statistics, 4e

by Ken Black

Chapter 13
Simple Regression Analysis

Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons.

Learning Objectives

• Compute the equation of a simple regression line from a
sample of data, and interpret the slope and intercept of the
equation.
• Understand the usefulness of residual analysis in testing the
assumptions underlying regression analysis and in
examining the fit of the regression line to the data.
• Compute a standard error of the estimate and interpret its
meaning.
• Compute a coefficient of determination and interpret it.
• Test hypotheses about the slope of the regression model and
interpret the results.
• Estimate values of Y using the regression model.

Regression and Correlation
• Regression analysis is the process of
constructing a mathematical model or
function that can be used to predict or
determine one variable by another variable.

• Correlation is a measure of the degree of
relatedness of two variables.

Simple Regression Analysis

• Bivariate (two-variable) linear regression:
the most elementary regression model
– dependent variable, the variable to be
predicted, usually called Y
– independent variable, the predictor or
explanatory variable, usually called X

Airline Cost Data

Number of Passengers (X)    Cost in $1,000 (Y)
61                          4.280
63                          4.080
67                          4.420
69                          4.170
70                          4.480
74                          4.300
76                          4.820
81                          4.700
86                          5.110
91                          5.130
95                          5.640
97                          5.560

Scatter Plot of Airline Cost Data

[Figure: scatter plot of Cost ($1,000) against Number of Passengers]

Regression Models
• Deterministic Regression Model

$$Y = \beta_0 + \beta_1 X$$

• Probabilistic Regression Model

$$Y = \beta_0 + \beta_1 X + \epsilon$$

• $\beta_0$ and $\beta_1$ are population parameters

• $\beta_0$ and $\beta_1$ are estimated by sample statistics $b_0$ and $b_1$

Equation of the Simple Regression
Line

$$\hat{Y} = b_0 + b_1 X$$

where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y

Least Squares Analysis

$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$

Least Squares Analysis

$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$

$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$

Solving for b1 and b0 of the Regression
Line: Airline Cost Example (Part 1)

Number of Passengers (X)    Cost in $1,000 (Y)    X^2      XY
61                          4.28                  3,721    261.08
63                          4.08                  3,969    257.04
67                          4.42                  4,489    296.14
69                          4.17                  4,761    287.73
70                          4.48                  4,900    313.60
74                          4.30                  5,476    318.20
76                          4.82                  5,776    366.32
81                          4.70                  6,561    380.70
86                          5.11                  7,396    439.46
91                          5.13                  8,281    466.83
95                          5.64                  9,025    535.80
97                          5.56                  9,409    539.32

$\sum X = 930$, $\sum Y = 56.69$, $\sum X^2 = 73{,}764$, $\sum XY = 4{,}462.22$

Solving for b1 and b0 of the Regression
Line: Airline Cost Example (Part 2)

$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745$$

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407$$

$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57$$

$$\hat{Y} = 1.57 + .0407X$$
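
The hand calculation above can be checked with a short script. This is a minimal sketch in plain Python, not part of the original slides; the function name `least_squares` and the variable names are illustrative assumptions, while the data and the computational formulas are the ones shown in the slides.

```python
# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y)
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def least_squares(x, y):
    """Return (b0, b1, ss_xy, ss_xx) from the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    ss_xy = sum_xy - sum_x * sum_y / n   # SSXY
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSXX
    b1 = ss_xy / ss_xx                   # sample slope
    b0 = sum_y / n - b1 * sum_x / n      # sample intercept
    return b0, b1, ss_xy, ss_xx


b0, b1, ss_xy, ss_xx = least_squares(X, Y)
print(f"SSXY = {ss_xy:.3f}, SSXX = {ss_xx:.0f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```

Rounded to the precision used on the slide, the script gives a slope of .0407 and an intercept of 1.57.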

Graph of Regression Line
for the Airline Cost Example

[Figure: regression line drawn through the scatter plot of Cost ($1,000) against Number of Passengers]

Airline Cost: Excel Summary Output

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.94820033
R Square             0.89908386
Adjusted R Square    0.88899225
Standard Error       0.17721746
Observations         12

ANOVA
              df    SS         MS         F            Significance F
Regression    1     2.79803    2.79803    89.092179    2.7E-06
Residual      10    0.31406    0.03141
Total         11    3.11209

                        Coefficients    Standard Error    t Stat     P-value
Intercept               1.56979278      0.33808           4.64322    0.0009175
Number of Passengers    0.0407016       0.00431           9.43887    2.692E-06

Residual Analysis:
Airline Cost Example

Number of Passengers (X)    Cost in $1,000 (Y)    Predicted Value    Residual (Y minus predicted)
61                          4.28                  4.053               .227
63                          4.08                  4.134              -.054
67                          4.42                  4.297               .123
69                          4.17                  4.378              -.208
70                          4.48                  4.419               .061
74                          4.30                  4.582              -.282
76                          4.82                  4.663               .157
81                          4.70                  4.867              -.167
86                          5.11                  5.070               .040
91                          5.13                  5.274              -.144
95                          5.64                  5.436               .204
97                          5.56                  5.518               .042

$$\sum (Y - \hat{Y}) = -.001$$
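
The residual table can be reproduced with a few lines of code. The sketch below is an illustration rather than part of the original deck; it assumes the rounded coefficients 1.57 and .0407 from the slides, and the variable names are hypothetical.

```python
# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]

# Rounded coefficients as used on the slides (an assumption of this sketch)
B0, B1 = 1.57, 0.0407

predicted = [B0 + B1 * x for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

for x, y, y_hat, e in zip(X, Y, predicted, residuals):
    print(f"{x:3d}  {y:5.2f}  {y_hat:6.3f}  {e:7.3f}")

# The residuals of a least squares line sum to zero; the small nonzero
# total here comes from using rounded coefficients.
print(f"Sum of residuals = {sum(residuals):.3f}")
```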

Excel Graph of Residuals
for the Airline Cost Example

[Figure: residuals (vertical axis, -0.3 to 0.2) plotted against Number of Passengers (horizontal axis, 60 to 100)]

Nonlinear Residual Plot

[Figure: residual plot against X, with the horizontal reference line at 0]

Nonconstant Error Variance

[Figure: two residual plots against X, each with the horizontal reference line at 0]

Graphs of Nonindependent
Error Terms

[Figure: two residual plots against X, each with the horizontal reference line at 0]

Healthy Residual Plot

[Figure: residual plot against X, with the horizontal reference line at 0]

Standard Error of the Estimate

Sum of Squares Error:

$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$

Determining SSE
for the Airline Cost Example

Number of Passengers (X)    Cost in $1,000 (Y)    Residual    Squared Residual
61                          4.28                   .227       .05153
63                          4.08                  -.054       .00292
67                          4.42                   .123       .01513
69                          4.17                  -.208       .04326
70                          4.48                   .061       .00372
74                          4.30                  -.282       .07952
76                          4.82                   .157       .02465
81                          4.70                  -.167       .02789
86                          5.11                   .040       .00160
91                          5.13                  -.144       .02074
95                          5.64                   .204       .04162
97                          5.56                   .042       .00176

$\sum (Y - \hat{Y}) = -.001$, $\sum (Y - \hat{Y})^2 = .31434$

Sum of squares of error = SSE = .31434

Standard Error of the Estimate
for the Airline Cost Example

Sum of Squares Error:

$$SSE = \sum (Y - \hat{Y})^2 = 0.31434$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773$$
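
As a cross-check, SSE and the standard error of the estimate can be computed directly from the data. The sketch below is illustrative and assumes full-precision least squares coefficients rather than the rounded 1.57 and .0407, so it lands on the Excel figures (SSE about 0.31406, standard error about 0.1772) instead of the hand-rounded .31434 and .1773. Variable names are hypothetical.

```python
import math

# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Least squares coefficients at full precision
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# SSE from the definition: sum of squared residuals
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

# SSE from the computational shortcut: sum(Y^2) - b0*sum(Y) - b1*sum(XY)
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

# Standard error of the estimate with n - 2 degrees of freedom
se = math.sqrt(sse / (n - 2))

print(f"SSE (definition) = {sse:.5f}")
print(f"SSE (shortcut)   = {sse_shortcut:.5f}")
print(f"Se               = {se:.4f}")
```

The two SSE formulas agree only when unrounded coefficients are used; with rounded coefficients the shortcut is sensitive to rounding because it subtracts large, nearly equal quantities.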
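
Coefficient of Determination

$$SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

$$SS_{YY} = \text{explained variation} + \text{unexplained variation}$$

$$SS_{YY} = SSR + SSE$$

$$1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}}$$

$$r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \frac{(\sum Y)^2}{n}}$$

$$0 \le r^2 \le 1$$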

Coefficient of Determination
for the Airline Cost Example

$$SSE = 0.31434$$

$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209$$

$$r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899$$

89.9% of the variability of the cost of flying a Boeing 737 is accounted for by the number of passengers.
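
The coefficient of determination follows from the same sums. A minimal sketch, assuming full-precision coefficients and hypothetical variable names; it is meant to illustrate the formula, not to reproduce the textbook's software output exactly.

```python
# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum_x * sum_y / n
ss_xx = sum(x * x for x in X) - sum_x ** 2 / n
ss_yy = sum(y * y for y in Y) - sum_y ** 2 / n   # total variation in Y

b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))  # unexplained
ssr = ss_yy - sse                                          # explained

r_squared = 1 - sse / ss_yy
print(f"SSYY = {ss_yy:.5f}, SSR = {ssr:.5f}, SSE = {sse:.5f}")
print(f"r^2  = {r_squared:.3f}")
```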
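
Hypothesis Tests for the Slope
of the Regression Model

Two-tailed test: $H_0: \beta_1 = 0$, $H_1: \beta_1 \ne 0$
Upper-tailed test: $H_0: \beta_1 \le 0$, $H_1: \beta_1 > 0$
Lower-tailed test: $H_0: \beta_1 \ge 0$, $H_1: \beta_1 < 0$

$$t = \frac{b_1 - \beta_1}{S_b}$$

where:

$$S_b = \frac{S_e}{\sqrt{SS_{XX}}}$$

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$

$\beta_1$ = the hypothesized slope
$df = n - 2$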

Hypothesis Test: Airline Cost
Example (Part 1)

$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$

$df = n - 2 = 12 - 2 = 10$
$\alpha = .05$
$t_{.025,10} = 2.228$

If $|t| > 2.228$, reject $H_0$.
If $-2.228 \le t \le 2.228$, do not reject $H_0$.

Hypothesis Test: Airline Cost
Example (Part 2)

$$t = \frac{.0407 - 0}{\dfrac{.1773}{\sqrt{73{,}764 - \frac{(930)^2}{12}}}} = 9.43$$

Since $t = 9.43 > 2.228$, reject $H_0$.
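
The test statistic can be recomputed from the raw data. This sketch is illustrative only: it takes the critical value 2.228 from the slide rather than computing it from a t distribution, uses full-precision estimates (so it yields about 9.44, as in the Excel and MINITAB output, rather than the hand-rounded 9.43), and uses hypothetical variable names.

```python
import math

# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum_x * sum_y / n
ss_xx = sum(x * x for x in X) - sum_x ** 2 / n

b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
se = math.sqrt(sse / (n - 2))      # standard error of the estimate
s_b = se / math.sqrt(ss_xx)        # standard error of the slope

HYPOTHESIZED_SLOPE = 0.0
T_CRITICAL = 2.228                 # t(.025, 10), taken from the slide

t_stat = (b1 - HYPOTHESIZED_SLOPE) / s_b
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"S_b = {s_b:.5f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```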

Testing the Overall Model (Part 1)

$H_0: \beta_1 = 0$
$H_1: \beta_1 \ne 0$

$df_{reg} = k = 1$
$df_{err} = n - k - 1 = 12 - 1 - 1 = 10$
$\alpha = .05$
$F_{.05,1,10} = 4.96$

If $F > 4.96$, reject $H_0$.
If $F \le 4.96$, do not reject $H_0$.

Testing the Overall Model (Part 2)

ANOVA
              df    SS         MS         F            Significance F
Regression    1     2.79803    2.79803    89.092179    2.7E-06
Residual      10    0.31406    0.03141
Total         11    3.11209

$$F = \frac{MS_{reg}}{MS_{err}} = \frac{SS_{reg} / df_{reg}}{SS_{err} / df_{err}} = \frac{2.7980 / 1}{0.3141 / 10} = \frac{2.7980}{0.03141} = 89.09$$

$F = 89.09 > 4.96$, reject $H_0$.
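
The ANOVA quantities can be rebuilt from the data as a check. A minimal sketch under stated assumptions: the critical value 4.96 is taken from the slide, `k = 1` is the number of predictors, and the variable names are hypothetical.

```python
# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)
k = 1  # number of independent variables in simple regression

sum_x, sum_y = sum(X), sum(Y)
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum_x * sum_y / n
ss_xx = sum(x * x for x in X) - sum_x ** 2 / n
ss_yy = sum(y * y for y in Y) - sum_y ** 2 / n

b1 = ss_xy / ss_xx
ss_reg = b1 * ss_xy          # explained (regression) sum of squares
ss_err = ss_yy - ss_reg      # unexplained (residual) sum of squares

df_reg = k
df_err = n - k - 1

ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err
f_stat = ms_reg / ms_err

F_CRITICAL = 4.96            # F(.05, 1, 10), taken from the slide
reject_h0 = f_stat > F_CRITICAL

print(f"MSreg = {ms_reg:.5f}, MSerr = {ms_err:.5f}")
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

With a single predictor, F equals the square of the t statistic for the slope, so the two tests reach the same conclusion.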

Point Estimation
for the Airline Cost Example

$$\hat{Y} = 1.57 + 0.0407X$$

For $X = 73$,

$$\hat{Y} = 1.57 + 0.0407(73) = 4.5411 \text{ or } \$4{,}541.10$$

Confidence Interval to Estimate the Average Value of Y:
Airline Cost Example

$$\hat{Y} \pm t_{\alpha/2,\, n-2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where: $X_0$ = a particular value of X

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$

For $X_0 = 73$ and a 95% confidence level,

$$4.5411 \pm (2.228)(0.1773)\sqrt{\frac{1}{12} + \frac{(73 - 77.5)^2}{73{,}764 - \frac{(930)^2}{12}}} = 4.5411 \pm .1220$$

$$4.4191 \le E(Y_{73}) \le 4.6631$$
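
The interval above, and the table of intervals for other values of X, can be reproduced as follows. This is a sketch that assumes the rounded figures used on the slides (1.57, .0407, .1773, and the critical value 2.228); the function name `mean_ci` is hypothetical.

```python
import math

# Rounded values taken from the slides (assumptions of this sketch)
B0, B1 = 1.57, 0.0407        # regression coefficients
SE = 0.1773                  # standard error of the estimate
T_CRIT = 2.228               # t(.025, 10)
N = 12
SUM_X, SUM_X2 = 930, 73764


def mean_ci(x0):
    """Return (point estimate, margin) of the interval for E(Y) at X = x0."""
    x_bar = SUM_X / N
    ss_xx = SUM_X2 - SUM_X ** 2 / N
    y_hat = B0 + B1 * x0
    margin = T_CRIT * SE * math.sqrt(1 / N + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat, margin


for x0 in (62, 68, 73, 85, 90):
    y_hat, margin = mean_ci(x0)
    print(f"X0 = {x0}: {y_hat:.4f} +/- {margin:.4f} "
          f"({y_hat - margin:.4f} to {y_hat + margin:.4f})")
```

The margin grows as X0 moves away from the mean of X (77.5), which is why the interval is narrowest near the center of the data.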

Confidence Interval to Estimate the
Average Value of Y for some Values of X:
Airline Cost Example

X     Point Estimate ± Margin    Confidence Interval
62    4.0934 ± .1876             3.9058 to 4.2810
68    4.3376 ± .1461             4.1915 to 4.4837
73    4.5411 ± .1220             4.4191 to 4.6631
85    5.0295 ± .1349             4.8946 to 5.1644
90    5.2330 ± .1656             5.0674 to 5.3986

Prediction Interval to Estimate Y
for a given value of X

$$\hat{Y} \pm t_{\alpha/2,\, n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where: $X_0$ = a particular value of X

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
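
The slides do not work a numerical prediction interval, so the sketch below applies the formula to the airline example as an illustration only. It assumes the same rounded values used elsewhere in the deck (1.57, .0407, .1773, 2.228); the function name `prediction_interval` is hypothetical.

```python
import math

# Rounded values taken from the slides (assumptions of this sketch)
B0, B1 = 1.57, 0.0407
SE = 0.1773
T_CRIT = 2.228               # t(.025, 10)
N = 12
SUM_X, SUM_X2 = 930, 73764


def prediction_interval(x0):
    """Return (lower, upper) bounds for a single value of Y at X = x0."""
    x_bar = SUM_X / N
    ss_xx = SUM_X2 - SUM_X ** 2 / N
    y_hat = B0 + B1 * x0
    # The extra 1 under the root reflects the variability of a single
    # observation around its mean.
    margin = T_CRIT * SE * math.sqrt(1 + 1 / N + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - margin, y_hat + margin


lower, upper = prediction_interval(73)
print(f"95% prediction interval at X0 = 73: {lower:.4f} to {upper:.4f}")
```

Because of the extra 1 under the square root, the prediction interval for a single Y is always wider than the confidence interval for the average value of Y at the same X0.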

Confidence Intervals for Estimation

[Figure: regression plot of Cost against Number of Passengers (60 to 100), showing the regression line with 95% CI and 95% PI bands]

MINITAB Regression Analysis of
the Airline Cost Example
The regression equation is
Cost = 1.57 + 0.0407 Number of Passengers

Predictor Coef StDev T P


Constant 1.5698 0.3381 4.64 0.001
Number o 0.040702 0.004312 9.44 0.000

S = 0.1772 R-Sq = 89.9% R-Sq(adj) = 88.9%

Analysis of Variance

Source DF SS MS F P
Regression 1 2.7980 2.7980 89.09 0.000
Residual Error 10 0.3141 0.0314
Total 11 3.1121

Obs Number o Cost Fit StDev Fit Residual St Resid


1 61.0 4.2800 4.0526 0.0876 0.2274 1.48
2 63.0 4.0800 4.1340 0.0808 -0.0540 -0.34
3 67.0 4.4200 4.2968 0.0683 0.1232 0.75
4 69.0 4.1700 4.3782 0.0629 -0.2082 -1.26
5 70.0 4.4800 4.4189 0.0605 0.0611 0.37
6 74.0 4.3000 4.5817 0.0533 -0.2817 -1.67
7 76.0 4.8200 4.6631 0.0516 0.1569 0.93
8 81.0 4.7000 4.8666 0.0533 -0.1666 -0.99
9 86.0 5.1100 5.0701 0.0629 0.0399 0.24
10 91.0 5.1300 5.2736 0.0775 -0.1436 -0.90
11 95.0 5.6400 5.4364 0.0912 0.2036 1.34
12 97.0 5.5600 5.5178 0.0984 0.0422 0.29

Pearson Product-Moment
Correlation Coefficient

$$r = \frac{SS_{XY}}{\sqrt{SS_{XX}\, SS_{YY}}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

$$-1 \le r \le 1$$
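
Applied to the airline data, the correlation formula gives the value reported as "Multiple R" in the Excel output. The sketch below is illustrative, not part of the original deck; the function name `pearson_r` is hypothetical.

```python
import math

# Airline cost data from the slides
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def pearson_r(x, y):
    """Pearson product-moment correlation via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_xx = sum(a * a for a in x) - sum_x ** 2 / n
    ss_yy = sum(b * b for b in y) - sum_y ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = pearson_r(X, Y)
print(f"r = {r:.4f}, r^2 = {r ** 2:.3f}")
```

In simple regression, squaring r gives the coefficient of determination, and r carries the sign of the slope.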

Three Degrees of Correlation

[Figure: three scatter plots illustrating r < 0, r > 0, and r = 0]
