
Statistical Aspects of Regression

α̂ and β̂ are only estimates of α and β.


Key question: How accurate are these estimates?
Statistical procedures allow us to formally address this question.

What Factors Affect Accuracy of OLS Estimates?


Graphical Intuition:
Figure 5.1 (small number of data points)
Figure 5.2 (large number of data points but very scattered)
Figure 5.3 (large number of data points but not very scattered)
Figure 5.4 (large number of data points, but clustered near one value for X)



Consider fitting a line through the XY-plots in Figures 5.1-5.4.
You would be most confident in the line you fit in Figure 5.3.
Larger number of data points + less scattering (i.e. less variability in errors) + more variability in X = more accurate estimates.
Note: Figures 5.1, 5.2, 5.3 and 5.4 all contain artificially generated data with α = 0 and β = 1.
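The same intuition can be checked by simulation. The sketch below is an illustration under our own assumptions: the sample sizes, error standard deviations and X ranges are hypothetical settings that only loosely echo Figures 5.1-5.4, not the ones used to draw them. As in the figures, data are generated with α = 0 and β = 1, and accuracy is measured as the standard deviation of β̂ across many simulated data sets (smaller means more accurate).

```python
import random
import statistics

ALPHA, BETA = 0.0, 1.0  # true values, as in Figures 5.1-5.4

def ols_slope(x, y):
    """OLS estimate of beta in Y = alpha + beta*X + e."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def slope_spread(n, error_sd, x_low, x_high, reps=500, seed=0):
    """Standard deviation of beta-hat over `reps` simulated data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.uniform(x_low, x_high) for _ in range(n)]
        y = [ALPHA + BETA * xi + rng.gauss(0.0, error_sd) for xi in x]
        estimates.append(ols_slope(x, y))
    return statistics.stdev(estimates)

# Hypothetical settings, loosely mirroring the four figures.
scenarios = {
    "few points": dict(n=5, error_sd=1.0, x_low=0.0, x_high=5.0),
    "many points, large error variance": dict(n=100, error_sd=3.0, x_low=0.0, x_high=5.0),
    "many points, small error variance": dict(n=100, error_sd=0.1, x_low=0.0, x_high=5.0),
    "many points, limited range of X": dict(n=100, error_sd=1.0, x_low=2.0, x_high=2.5),
}

spread = {name: slope_spread(**kw) for name, kw in scenarios.items()}
for name, sd in spread.items():
    print(f"{name:35s} sd of beta-hat = {sd:.4f}")
```

Only the "many points, small error variance" case combines all three favourable factors, so it should show by far the smallest spread of β̂.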

Figure 5.1: Very Small Sample Size [XY-plot of Y against X]

Figure 5.2: Large Sample Size, Large Error Variance [XY-plot of Y against X]

Figure 5.3: Large Sample Size, Small Error Variance [XY-plot of Y against X]

Figure 5.4: Limited Range of X Values [XY-plot of Y against X]

A Confidence Interval for β


Uncertainty about the accuracy of the estimate can be summarised in a confidence interval.
The 95% confidence interval for β is given by:
$$[\hat{\beta} - t_b s_b,\ \hat{\beta} + t_b s_b]$$
t_b is a critical value from the Student t-distribution, calculated automatically in computer packages.
s_b = standard error of β̂, a measure of the accuracy of β̂:
$$s_b = \sqrt{\frac{SSR}{(N-2)\sum_i (X_i - \bar{X})^2}}$$
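As a sketch of the formulas above, the Python below computes β̂, s_b and the confidence interval from scratch. Computer packages supply t_b automatically; to stay dependency-free, this sketch instead approximates the Student t critical value numerically (Simpson's rule on the t density, then bisection). That numerical method is our own choice, not how any particular package does it, and the five data points in the usage example are made up.

```python
import math

def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    n = 2 * max(500, int(100 * a))  # even number of sub-intervals
    h = a / n
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(conf, df):
    """Two-sided critical value t_b, e.g. conf=0.95 -> 97.5th percentile."""
    p = 1 - (1 - conf) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_confidence_interval(x, y, conf=0.95):
    """Return (beta_hat, s_b, (lower, upper)) for the regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = ybar - beta_hat * xbar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    t_b = t_critical(conf, n - 2)
    return beta_hat, s_b, (beta_hat - t_b * s_b, beta_hat + t_b * s_b)

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
for conf in (0.90, 0.95, 0.99):
    beta_hat, s_b, (lower, upper) = slope_confidence_interval(x, y, conf)
    print(f"{conf:.0%} CI for beta: [{lower:.3f}, {upper:.3f}]")
```

For a given data set, β̂ and s_b do not change with the confidence level; only t_b does, so the interval widens as the confidence level rises.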

A Confidence Interval for β (cont.)


t_b controls the confidence level (e.g. t_b is bigger for 95% confidence than for 90%).
s_b varies directly with SSR (i.e. with how variable the residuals are).
s_b varies inversely with N, the number of data points.
s_b varies inversely with Σ(X_i - X̄)², which is related to the variance (variability) of X.
Note: Different computer packages label confidence intervals in different ways.
E.g. Excel labels the bounds of the confidence interval as "Lower 95%" and "Upper 95%".


Intuition of Confidence Interval


Useful (but formally incorrect) intuition: There is a 95% probability that the true value of β lies in the confidence interval.
Correct intuition: If you repeatedly draw new data sets and use the above formula to calculate a confidence interval for each, 95% of the intervals you construct will contain the true value of β.
You can choose any level of confidence you want (e.g. 90%, 99%).
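The "correct intuition" can be checked by simulation. The sketch below uses hypothetical settings of our own (α = 0, β = 1, N = 30 uniformly spread X values, normal errors with standard deviation 2): it builds a 95% confidence interval from each of many simulated data sets and records the share that contain the true β. With N = 30 there are N - 2 = 28 degrees of freedom, and t_b = 2.048 is taken from a standard t table rather than computed.

```python
import math
import random

TRUE_ALPHA, TRUE_BETA = 0.0, 1.0
N = 30
T_B = 2.048  # 97.5th percentile of the t-distribution with N - 2 = 28 df (t table)

def slope_ci(x, y):
    """95% confidence interval [beta_hat - t_b*s_b, beta_hat + t_b*s_b]."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = ybar - beta_hat * xbar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    return beta_hat - T_B * s_b, beta_hat + T_B * s_b

def coverage(reps=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true beta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(N)]
        y = [TRUE_ALPHA + TRUE_BETA * xi + rng.gauss(0.0, 2.0) for xi in x]
        lower, upper = slope_ci(x, y)
        hits += lower <= TRUE_BETA <= upper  # True counts as 1
    return hits / reps

print(f"share of intervals containing beta: {coverage():.3f}")
```

Any single interval either contains β or it does not; the 95% describes how often the procedure succeeds over repeated samples, which is why the printed share should be close to, but not exactly, 0.95.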


Example: Confidence Intervals for the Data Sets in Figures 5.1-5.4

Data Set     β̂      90% Confid. Interval   95% Confid. Interval   99% Confid. Interval
Figure 5.1   .91    [-.92, 2.75]           [-1.57, 3.39]          [-3.64, 5.47]
Figure 5.2   1.04   [.75, 1.32]            [.70, 1.38]            [.59, 1.49]
Figure 5.3   1.00   [.99, 1.01]            [.99, 1.02]            [.98, 1.03]
Figure 5.4   1.52   [-1.33, 4.36]          [-1.88, 4.91]          [-2.98, 6.02]


Example: The Regression of Executive Compensation on Profits


Y = executive compensation
X = profits

β̂ = .000842
95% Confidence interval: [.00061,.001075]


Example: The Regression of House Price on Lot Size


OLS results: Y = 34,136 + 6.59X

The OLS estimate of the marginal effect of X on Y is 6.59.

Increasing lot size by an extra square foot is associated with a $6.59 increase in house price.
The 95% confidence interval for β is [5.72, 7.47].

We are 95% confident that the effect of lot size on house price is at least $5.72 and at most $7.47.

Hypothesis Testing
Test whether β = 0 (i.e. whether X has any explanatory power).
One way of doing it: look at the confidence interval and check whether it contains zero.
If it does not, then you can be confident that β ≠ 0.


An alternative (equivalent) way is to use the t-statistic (often called the t-ratio):
$$t = \frac{\hat{\beta}}{s_b}$$
Big values of |t| indicate β ≠ 0.
Small values of |t| indicate β = 0.
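The equivalence of the two approaches can be seen directly. Because s_b > 0, the confidence interval built with critical value t_b excludes zero exactly when |t| exceeds that same t_b:

```latex
\begin{aligned}
0 \notin [\hat{\beta} - t_b s_b,\ \hat{\beta} + t_b s_b]
  &\iff \hat{\beta} - t_b s_b > 0 \ \text{or}\ \hat{\beta} + t_b s_b < 0 \\
  &\iff \frac{\hat{\beta}}{s_b} > t_b \ \text{or}\ \frac{\hat{\beta}}{s_b} < -t_b \\
  &\iff |t| > t_b
\end{aligned}
```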


Hypothesis Testing (cont.)


Q: What do we mean by big and small?
A: Look at the P-value.
If P-value ≤ .05, then t is big and we conclude β ≠ 0.
If P-value > .05, then t is small and we conclude β = 0.
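A sketch of where the P-value comes from, assuming the usual two-sided t-test with N - 2 degrees of freedom: it is the probability, if β = 0, of a t-statistic at least as large in absolute value as the one observed. The t-distribution CDF is approximated here with Simpson's rule, a dependency-free choice of our own (packages use more accurate routines), so very small P-values from this sketch should be treated as rough. The t-ratio of 2.5 with 20 degrees of freedom in the usage line is hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    n = 2 * max(500, int(100 * a))  # even number of sub-intervals
    h = a / n
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sided_p_value(t_stat, df):
    """P(|T| >= |t_stat|) for T ~ Student t with df degrees of freedom."""
    return 2 * (1 - t_cdf(abs(t_stat), df))

def decide(t_stat, df, level=0.05):
    """Apply the slide's rule: P-value <= level means t is big."""
    p = two_sided_p_value(t_stat, df)
    if p <= level:
        return p, "t is big: conclude beta != 0"
    return p, "t is small: conclude beta = 0"

p, verdict = decide(2.5, 20)  # hypothetical t-ratio and degrees of freedom
print(f"P-value = {p:.4f}; {verdict}")
```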


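Useful (but formally incorrect) intuition:
The P-value measures the probability that β = 0.
Correct interpretation: assuming β = 0, the P-value is the probability of obtaining a t-statistic at least as large in absolute value as the one actually observed.
.05 = 5% = level of significance.
Other levels of significance (e.g. 1% or 10%) are occasionally used.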


Example: The Regression of Executive Compensation on Profits (cont.)

95% Confidence interval: [.00061,.001075]


The confidence interval does not contain zero, so conclude that β ≠ 0.


Alternatively:
The t-ratio is 7.227937. Is this big?
Yes: the P-value is 5.5 × 10^-10, which is much less than .05.
Hence, we again conclude that β ≠ 0.


Jargon

The coefficient on profits is significantly different from zero.

Profits have statistically significant explanatory power for executive compensation.

The hypothesis that β = 0 can be rejected at the 5% significance level.


Testing on R²: The F-statistic

Test whether R² = 0 (i.e. whether X has any explanatory power).
Note: In simple regression, testing R² = 0 and testing β = 0 are the same, but in multiple regression they will be different.
The F-statistic is a test statistic analogous to the t-statistic (e.g. small values of it indicate R² = 0).

$$F = \frac{(N-2)R^2}{1-R^2}$$
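In simple regression the two tests agree because the F-statistic equals the square of the t-statistic for β. The sketch below checks this numerically on five made-up data points, computing R² as 1 - SSR/TSS, where TSS = Σ(Y_i - Ȳ)² is the total sum of squares.

```python
import math

def simple_regression_stats(x, y):
    """Return (R^2, t, F) for the OLS regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = ybar - beta_hat * xbar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1 - ssr / tss
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    t = beta_hat / s_b             # t-ratio for beta
    f = (n - 2) * r2 / (1 - r2)    # F-statistic from the formula above
    return r2, t, f

r2, t, f = simple_regression_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(f"R^2 = {r2:.3f}, t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
# R^2 = 0.600, t = 2.1213, t^2 = 4.5000, F = 4.5000
```

The algebra behind the match: t² = β̂²/s_b² = (N - 2)β̂²Σ(X_i - X̄)²/SSR. In simple regression the explained sum of squares is β̂²Σ(X_i - X̄)² = R²·TSS, and SSR = (1 - R²)·TSS, so t² = (N - 2)R²/(1 - R²) = F.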


Testing on R²: The F-statistic (cont.)

For a test with a 5% level of significance:

If the P-value is > .05, conclude R² = 0.

If the P-value is ≤ .05, conclude R² ≠ 0.

Excel calls the P-value for this test "Significance F".


Example: The Regression of Executive Compensation on Profits (cont.)

P-value = Significance F = 5.5 × 10^-10.

Since the P-value < .05, conclude R² ≠ 0.
Profits do have explanatory power for Y.


Chapter Summary
1. Accuracy of OLS estimates depends on the number of data points, the variability of the explanatory variable and the variability of the errors.
2. The confidence interval provides an interval in which you can be confident β lies.
3. The width of the confidence interval depends on the same factors that affect the accuracy of OLS estimates. In addition, the width of the confidence interval depends on the confidence level.
4. A hypothesis test of whether β = 0 can be used to find out whether the explanatory variable belongs in the regression. The P-value is a measure of how plausible the hypothesis is.
5. If the P-value for the hypothesis test of whether β = 0 is less than .05, then you can reject the hypothesis at the 5% level of significance.


6. If the P-value for the hypothesis test of whether β = 0 is greater than .05, then you cannot reject the hypothesis at the 5% level of significance.
7. A hypothesis test of whether R2=0 can be used to investigate whether the regression
helps explain the dependent variable.

