8.1 Introduction
Regression analysis is the branch of statistics that deals with the relationship between two or more
variables. The simplest relationship between x and y is the linear relationship:
y=β0+ β1∙x
β0 : y-intercept (the value of y when x = 0)
β1 : dy/dx (slope of the line)
Y = β0+ β1∙x + ε
is the equation that relates the random response variable Y to the predictor value x. The
random variable ε is referred to as the random error term, and ε ~ N(0, σ²).
The line defined by the equation “y=β0+ β1∙x” is referred to as the population (or, true)
regression line. For a given value of x, the expected value of the variable Y is,
$$\mu_Y = E[\beta_0 + \beta_1 x + \varepsilon] = \beta_0 + \beta_1 x$$
Thus, the population regression line is the line defining the mean of Y (μY) for a given
value of x. The variance of Y is,
$$\sigma_Y^2 = \mathrm{Var}[\beta_0 + \beta_1 x + \varepsilon] = \mathrm{Var}[\varepsilon] = \sigma^2$$
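The mean and variance results can be checked numerically. The following is a minimal sketch, not part of the original notes: the parameter values (β0 = 2, β1 = 0.5, σ = 3, x = 10), the function name `simulate_responses`, and the fixed seed are all illustrative assumptions.

```python
import random
import statistics


def simulate_responses(beta0, beta1, sigma, x, n, seed=0):
    """Draw n values of Y = beta0 + beta1*x + eps with eps ~ N(0, sigma^2).

    All parameter values passed in are hypothetical; they only illustrate
    the model and are not estimates from any data set.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]


ys = simulate_responses(beta0=2.0, beta1=0.5, sigma=3.0, x=10.0, n=20000)

# The sample mean should be close to beta0 + beta1*x = 7.0,
# and the sample variance close to sigma^2 = 9.0.
print(statistics.mean(ys))
print(statistics.variance(ys))
```

With a fixed x, only ε varies, which is why the spread of the simulated responses reflects σ² alone.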
Step 1. Provide a scatter plot for the sample set (xi , yi)
Step 2. If a linear trend is observed, estimate parameters of the model employing the
principle of least squares:
For a given sample set (xi , yi) of size n, the deviation of yi from the line y=β0+ β1∙x is
yi – y = yi – (β0+ β1∙xi)
and the sum of the squared deviations is
$$f(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2$$
The point estimates $\hat\beta_0$ and $\hat\beta_1$ that minimize $f(\beta_0, \beta_1)$ are called the least-squares
estimates of β0 and β1, respectively. The estimated regression line (or, the least-squares
line) is
$$\hat y = \hat\beta_0 + \hat\beta_1 x$$
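Thus, both partial derivatives of f vanish at the estimates:
$$\left.\frac{\partial f}{\partial \beta_0}\right|_{(\hat\beta_0, \hat\beta_1)} = \sum_{i=1}^{n} 2\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)(-1) = 0$$
$$\left.\frac{\partial f}{\partial \beta_1}\right|_{(\hat\beta_0, \hat\beta_1)} = \sum_{i=1}^{n} 2\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)(-x_i) = 0$$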
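As a sketch of the intermediate step, dividing each condition by −2 and collecting terms gives the two normal equations. Solving the first for the intercept and substituting into the second isolates the slope. The bar notation for the sample means is the usual convention and is assumed here.

```latex
\begin{aligned}
n\hat\beta_0 + \hat\beta_1 \sum x_i &= \sum y_i \\
\hat\beta_0 \sum x_i + \hat\beta_1 \sum x_i^2 &= \sum x_i y_i \\
\Rightarrow\quad \hat\beta_0 &= \bar y - \hat\beta_1 \bar x, \qquad
\hat\beta_1 \left( \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \right)
  = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}
\end{aligned}
```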
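Hence,
$$\hat\beta_1 = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \frac{S_{xy}}{S_{xx}}$$
$$\hat\beta_0 = \frac{\sum y_i - \hat\beta_1 \sum x_i}{n} = \bar y - \hat\beta_1 \bar x$$
where
$$S_{xy} = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}$$
$$S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$$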
The computations of $\hat\beta_0$ and $\hat\beta_1$ require only the sample size n and the summary statistics
$\sum x_i$, $\sum y_i$, $\sum x_i^2$, and $\sum x_i y_i$.
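The least-squares formulas translate directly into a few lines of code. The sketch below is illustrative only: the function name `least_squares_from_sums` and the four-point toy data set (points lying exactly on y = 1 + 2x) are assumptions made for the example.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1), the least-squares intercept and slope,
    computed only from the sample size and the summary sums."""
    s_xy = sum_xy - sum_x * sum_y / n   # S_xy
    s_xx = sum_x2 - sum_x ** 2 / n      # S_xx
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = s_xy / s_xx
    b0 = (sum_y - b1 * sum_x) / n       # same as y_bar - b1 * x_bar
    return b0, b1


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = least_squares_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_x2=sum(x * x for x in xs),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
)
print(b0, b1)  # 1.0 2.0
```

Working from the sums keeps the computation to a single pass over the data. For x values that are large relative to their spread, the centered form Σ(xi − x̄)(yi − ȳ) is usually less prone to rounding error.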
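Definition: The predicted values ŷ1, ŷ2, …, ŷn are obtained through successive substitutions
of x1 , x2 , …, xn into the estimated regression line:
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$$
The error sum of squares is
$$SSE = \sum (y_i - \hat y_i)^2 = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i$$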
The estimate of σ² is,
$$s^2 = \frac{SSE}{n - 2}$$
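The total sum of squares is
$$SST = S_{yy} = \sum (y_i - \bar y)^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}$$
and the coefficient of determination is
r² = 1 – SSE / SST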
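A short sketch of how SSE, s², SST, and r² fit together is given below. The function name `fit_quality` and the five-point data set are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def fit_quality(xs, ys):
    """Fit the least-squares line and return SSE, s^2, SST and r^2."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or sst == 0:
        raise ValueError("x and y must each take more than one value")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # SSE: squared distances between observed and predicted values
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return {"sse": sse, "s2": sse / (n - 2), "sst": sst, "r2": 1 - sse / sst}


# Hypothetical data: the fitted line is y = 2.2 + 0.6x.
result = fit_quality([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# SSE is about 2.4, s^2 about 0.8, SST is 6.0, r^2 about 0.6
print({name: round(value, 4) for name, value in result.items()})
```

The divisor n − 2 reflects the two parameters estimated from the data, so at least three points are needed for s² to be defined.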
Definition: The sample correlation coefficient for the n sample pairs (x1,y1), (x2,y2), …,
(xn,yn), is
$$r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}$$
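The definition can be sketched in code as follows. The function name `sample_correlation` and the five-point data set are assumptions for illustration only.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least 2 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("x and y must each take more than one value")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: S_xy = 6, S_xx = 10, S_yy = 6.
r = sample_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746
```

For a simple linear regression, squaring r gives the coefficient of determination, and the sign of r matches the sign of the fitted slope.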
Step 4. Provide a residual plot to check whether the residuals yi – ŷi are consistent with the
assumed error model (i.e., ε ~ N(0, σ²), with the same variance for every x).
[Figure: Scatter plot of y versus x]
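Calculations (all sums run over i = 1, …, 30):
$$\sum x_i = 54879, \quad \sum y_i = 11155, \quad \sum x_i^2 = 122746511, \quad \sum y_i^2 = 5077294, \quad \sum x_i y_i = 24778631$$
$$\bar x = 54879 / 30 = 1829, \qquad \bar y = 11155 / 30 = 371.8$$
$$S_{xx} = \sum x_i^2 - \left(\sum x_i\right)^2 / n = 122746511 - 54879^2 / 30 = 2.2356 \times 10^7$$
$$SSE = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i = 5077294 - 14.1\,(11155) - 0.19556\,(24778631) \approx 74187$$
$$SST = S_{yy} = 9.2920 \times 10^5$$
$$s_{\hat Y} = s \sqrt{\frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}} = 51.5 \sqrt{\frac{1}{30} + \frac{(x - 1829)^2}{2.2356 \times 10^7}}$$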
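The standard error of the fitted value can be evaluated at any x with a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the inputs are the rounded values quoted in the calculations, and the critical value 2.048 (the tabulated t value for 28 degrees of freedom and a two-sided 95% level) is supplied by hand rather than taken from the notes.

```python
import math


def fitted_value_std_error(x, s, n, x_bar, s_xx):
    """Standard error of the fitted value: s * sqrt(1/n + (x - x_bar)^2 / S_xx)."""
    return s * math.sqrt(1.0 / n + (x - x_bar) ** 2 / s_xx)


def mean_response_interval(x, b0, b1, s, n, x_bar, s_xx, t_crit):
    """Interval y_hat +/- t_crit * s_Yhat for the mean response at x.

    t_crit must be supplied by the caller (assumption: looked up in a t table).
    """
    y_hat = b0 + b1 * x
    half_width = t_crit * fitted_value_std_error(x, s, n, x_bar, s_xx)
    return y_hat - half_width, y_hat + half_width


# Rounded inputs quoted in the calculations above.
se_200 = fitted_value_std_error(200, s=51.5, n=30, x_bar=1829, s_xx=2.2356e7)
print(round(se_200, 2))  # about 20.08; the data table lists 20.073

lower, upper = mean_response_interval(
    200, b0=14.1, b1=0.19556, s=51.5, n=30, x_bar=1829, s_xx=2.2356e7,
    t_crit=2.048,  # assumed value of t(0.025, 28)
)
print(round(lower, 1), round(upper, 1))
```

The standard error is smallest at x = x̄ and grows as x moves away from the mean, which is why the interval band widens toward both ends of the data range.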
[Figure: Sample data with the least-squares line and 95% CI, y versus x]
Least-squares line: ŷ = 14.1 + 0.196x, r = 0.96
DATA
i xi yi ŷi Residual (yi – ŷi) sŶ
1 200 114.8 53.2 61.6 20.073
2 511 115.7 114.0 1.7 17.155
3 543 140 120.3 19.7 16.864
4 758 194.3 162.3 32.0 14.978
5 814 90.6 173.3 -82.7 14.508
6 897 217.4 189.5 27.9 13.832
7 975 228.5 204.8 23.7 13.222
8 1183 201.3 245.5 -44.2 11.740
9 1261 202.8 260.7 -57.9 11.251
10 1344 318.6 276.9 41.7 10.781
11 1338 186.8 275.8 -89.0 10.813
12 1571 302.4 321.3 -18.9 9.809
13 1637 346.1 334.2 11.9 9.628
14 1702 373.4 347.0 26.4 9.499
15 1766 342.4 359.5 -17.1 9.423
16 1757 382.6 357.7 24.9 9.431
17 1912 394.1 388.0 6.1 9.441
18 2077 439.9 420.3 19.6 9.777
19 2131 429 430.8 -1.8 9.955
20 2236 482.6 451.4 31.2 10.388
21 2550 395.4 512.8 -117.4 12.242
22 2491 466.9 501.3 -34.4 11.841
23 2522 461.4 507.3 -45.9 12.049
24 2528 531.2 508.5 22.7 12.090
25 2644 569.7 531.2 38.5 12.922
26 2879 619.9 577.1 42.8 14.795
27 3005 726.8 601.8 125.0 15.879
28 3112 646.6 622.7 23.9 16.832
29 3217 643.6 643.2 0.4 17.792
30 3318 590.6 663.0 -72.4 18.734
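The fitted values and residuals in the table can be reproduced from the fitted line. The sketch below uses only the first three rows and takes the coefficients from the regression summary output reported for this data set; the function and variable names are hypothetical.

```python
# Coefficients as reported in the regression summary output for this data set.
B0 = 14.1047458
B1 = 0.19556219

# First three (x, y) rows of the data table.
rows = [(200, 114.8), (511, 115.7), (543, 140.0)]


def fitted_and_residual(x, y, b0=B0, b1=B1):
    """Return (y_hat, residual) with residual = y - y_hat."""
    y_hat = b0 + b1 * x
    return y_hat, y - y_hat


table = [fitted_and_residual(x, y) for x, y in rows]
for (x, y), (y_hat, resid) in zip(rows, table):
    print(x, y, round(y_hat, 1), round(resid, 1))
# 200 114.8 53.2 61.6
# 511 115.7 114.0 1.7
# 543 140.0 120.3 19.7
```

A positive residual means the observation lies above the fitted line, and a negative residual means it lies below.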
[Figure: Residuals versus x]
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.959249948
R Square 0.920160462
Adjusted R Square 0.91730905
Standard Error 51.47349033
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 855009.2689 855009.2689 322.7034339 6.64289E-17
Residual 28 74186.56579 2649.520207
Total 29 929195.8347
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 14.1047458 22.020494 0.640528129 0.527037012 -31.00224201 59.2117336
X Variable 1 0.19556219 0.01088637 17.96394817 6.64289E-17 0.17326245 0.21786193
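The entries of the summary output are linked by simple identities, which makes the table easy to sanity-check. The sketch below copies the reported numbers and recomputes the derived quantities; the variable names are illustrative.

```python
import math

# Values copied from the summary output above.
coef_intercept, se_intercept = 14.1047458, 22.020494
coef_slope, se_slope = 0.19556219, 0.01088637
ss_regression, ss_residual, ss_total = 855009.2689, 74186.56579, 929195.8347
df_regression, df_residual = 1, 28

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual      # this is s^2 = SSE / (n - 2)
f_stat = ms_regression / ms_residual         # about 322.70
r_square = ss_regression / ss_total          # about 0.9202, equals 1 - SSE/SST
std_error = math.sqrt(ms_residual)           # about 51.47
t_intercept = coef_intercept / se_intercept  # about 0.64
t_slope = coef_slope / se_slope              # about 17.96

print(round(f_stat, 2), round(r_square, 4), round(std_error, 2))
print(round(t_intercept, 2), round(t_slope, 2))
```

With a single predictor, the F statistic equals the square of the slope's t statistic, which is why both rows report the same P-value of 6.64289E-17.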