Lecture 3 - Statistical Aspects of Regression

Statistical Aspects of Regression
$\hat{\alpha}$ and $\hat{\beta}$ are only estimates of $\alpha$ and $\beta$.

Key question: How accurate are these estimates?
Statistical procedures allow us to formally address this question.
What Factors Affect Accuracy of OLS Estimates?

Graphical Intuition:
Figure 5.1 (small number of data points)
Figure 5.2 (large number of data points but very scattered)
Figure 5.3 (large number of data points but not very scattered)
Figure 5.4 (large number of data points, but clustered near one value for X)
What Factors Affect Accuracy of OLS Estimates?

Consider fitting a line through the XY-plots in Figures 5.1-5.4.
You would be most confident in the line you fit in Figure 5.3.
Larger number of data points + less scattering (i.e. less variability in errors) + more
variability in X = more accurate estimates.
Note: Figures 5.1, 5.2, 5.3 and 5.4 all contain artificially generated data with $\alpha = 0$,
$\beta = 1$.
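The intuition above can be checked with a small simulation. The sketch below is illustrative only: the four designs (sample sizes, error standard deviations and X ranges) are assumptions chosen to loosely mimic Figures 5.1-5.4, not the data behind the figures. It draws many artificial samples from a line with intercept 0 and slope 1 and reports how much the OLS slope estimate varies from sample to sample.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope estimate for the simple regression of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_spread(n, error_sd, x_low, x_high, reps=500, seed=0):
    """Standard deviation of the slope estimate across repeated artificial samples."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(x_low, x_high) for _ in range(n)]
        # True line: intercept 0, slope 1, plus a normally distributed error.
        y = [0.0 + 1.0 * xi + rng.gauss(0.0, error_sd) for xi in x]
        slopes.append(ols_slope(x, y))
    return statistics.stdev(slopes)


# Hypothetical designs, one per figure (all values are assumptions).
designs = {
    "small N": dict(n=5, error_sd=1.0, x_low=0.0, x_high=6.0),
    "large N, large error variance": dict(n=100, error_sd=3.0, x_low=0.0, x_high=6.0),
    "large N, small error variance": dict(n=100, error_sd=0.1, x_low=0.0, x_high=6.0),
    "large N, X clustered": dict(n=100, error_sd=1.0, x_low=2.9, x_high=3.1),
}

spreads = {name: slope_spread(**kw) for name, kw in designs.items()}
for name, spread in spreads.items():
    print(f"{name:32s} spread of slope estimates = {spread:.3f}")
```

The design with many points, small errors and widely spread X values should show by far the smallest spread, which is the sense in which its estimates are the most accurate.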
[Figure 5.1: Very Small Sample Size. XY-plot, horizontal axis X.]
[Figure 5.2: Large Sample Size, Large Error Variance. XY-plot, horizontal axis X.]
[Figure 5.3: Large Sample Size, Small Error Variance. XY-plot, horizontal axis X.]
[Figure 5.4: Limited Range of X Values. XY-plot, horizontal axis X.]
A Confidence Interval for $\beta$

Uncertainty about the accuracy of the estimate $\hat{\beta}$ can be summarised in a
confidence interval.
The 95% confidence interval for $\beta$ is given by:
$[\hat{\beta} - t_b s_b,\ \hat{\beta} + t_b s_b]$
$t_b$ is a critical value from the Student t-distribution, calculated
automatically in computer packages.
$s_b$ = standard error of $\hat{\beta}$, a measure of the accuracy of $\hat{\beta}$:
$s_b = \sqrt{\dfrac{SSR}{(N-2)\sum_i (X_i - \bar{X})^2}}$
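As a rough sketch of how a computer package arrives at the interval, the function below applies the formulas above directly. The five data points are made up for illustration, and the critical value 3.182 is the tabulated two-sided 95% value of the Student t-distribution with N - 2 = 3 degrees of freedom; a package would look this value up itself.

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (beta_hat, s_b, lower, upper) for the slope in a regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # SSR = sum of squared residuals from the fitted line.
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    return beta_hat, s_b, beta_hat - t_crit * s_b, beta_hat + t_crit * s_b


# Hypothetical data, not taken from the lecture.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
T_CRIT_95_DF3 = 3.182  # tabulated t critical value, 95% two-sided, 3 degrees of freedom

beta_hat, s_b, lower, upper = slope_confidence_interval(x, y, T_CRIT_95_DF3)
print(f"beta_hat = {beta_hat:.3f}, s_b = {s_b:.4f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```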
A Confidence Interval for $\beta$ (cont.)

$t_b$ controls the confidence level (e.g. $t_b$ is bigger for 95% confidence than for 90%).
$s_b$ varies directly with SSR (i.e. how variable the residuals are).
$s_b$ varies inversely with $N$, the number of data points.
$s_b$ varies inversely with $\sum_i (X_i - \bar{X})^2$, which is related to the variance/variability of
X.
Note: Different computer packages label confidence intervals in different ways.
E.g. Excel labels bounds of confidence interval as Lower 95% and Upper
95%
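The way the standard error responds to SSR, N and the variability of X can be seen by plugging numbers into its formula. The values below are arbitrary assumptions picked to give round answers; changing one input while holding the others fixed is artificial, but it isolates each effect.

```python
import math


def slope_standard_error(ssr, n, sxx):
    """Standard error of the slope: sqrt(SSR / ((N - 2) * sum of squared X deviations))."""
    return math.sqrt(ssr / ((n - 2) * sxx))


base = slope_standard_error(ssr=50.0, n=52, sxx=100.0)
more_residual_variation = slope_standard_error(ssr=200.0, n=52, sxx=100.0)
more_data_points = slope_standard_error(ssr=50.0, n=202, sxx=100.0)
more_spread_in_x = slope_standard_error(ssr=50.0, n=52, sxx=400.0)

print(f"base case:               {base:.3f}")
print(f"SSR four times larger:   {more_residual_variation:.3f}")  # larger standard error
print(f"N - 2 four times larger: {more_data_points:.3f}")         # smaller standard error
print(f"X spread four times up:  {more_spread_in_x:.3f}")         # smaller standard error
```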
Intuition of Confidence Interval

Useful (but formally incorrect) intuition: There is a 95% probability that the
true value of $\beta$ lies in the confidence interval.
Correct intuition: If you repeatedly use the above formula for calculating a
confidence interval, 95% of the intervals you construct will contain the true
value for $\beta$.
Can choose any level of confidence you want (e.g. 90%, 99%).
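The "correct intuition" can be illustrated by simulation. The sketch below assumes a true line with intercept 0 and slope 1, N = 12 fixed X values and standard normal errors (all hypothetical choices), builds a 95% interval from each artificial sample, and counts how often the interval contains the true slope. The critical value 2.228 is the tabulated two-sided 95% value for 10 degrees of freedom.

```python
import math
import random

T_CRIT_95_DF10 = 2.228  # tabulated t critical value, 95% two-sided, 10 degrees of freedom


def interval_covers(x, y, true_beta, t_crit):
    """True if the confidence interval for the slope contains true_beta."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    return beta_hat - t_crit * s_b <= true_beta <= beta_hat + t_crit * s_b


rng = random.Random(1)
x = [float(i) for i in range(12)]  # N = 12, so N - 2 = 10
true_alpha, true_beta = 0.0, 1.0   # assumed true values
reps = 2000

hits = 0
for _ in range(reps):
    y = [true_alpha + true_beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    hits += interval_covers(x, y, true_beta, T_CRIT_95_DF10)

coverage = hits / reps
print(f"share of intervals containing the true slope: {coverage:.3f}")
```

The share should come out close to 0.95. Any single interval either contains the true slope or does not, which is why the probability statement about one interval is only an informal intuition.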
Example: Confidence Intervals for the Data sets in Figures 5.1-5.4
Data Set     $\hat{\beta}$   90% Confid. Interval   95% Confid. Interval   99% Confid. Interval
Figure 5.1   .91             [-.92, 2.75]           [-1.57, 3.39]          [-3.64, 5.47]
Figure 5.2   1.04            [.75, 1.32]            [.70, 1.38]            [.59, 1.49]
Figure 5.3   1.00            [.99, 1.01]            [.99, 1.02]            [.98, 1.03]
Figure 5.4   1.52            [-1.33, 4.36]          [-1.88, 4.91]          [-2.98, 6.02]
Example: The Regression of Executive Compensation on Profits

Y = executive compensation
X = profits
$\hat{\beta}$ = .000842
95% Confidence interval: [.00061,.001075]
Example: The Regression of House Price on Lot Size

OLS results:
Y = 34,136 + 6.59 X
The OLS estimate of the marginal effect of X on Y is 6.59.

Increasing lot size by an extra square foot is associated with a $6.59 increase
in house price.
The 95% confidence interval for $\beta$ is [5.72, 7.47].
We are 95% confident that the effect of lot size on house price is at least $5.72 and
at most $7.47.
Hypothesis Testing
Test whether $\beta = 0$ (i.e. whether X has any explanatory power).
One way of doing it: look at the confidence interval and check whether it contains zero.
If it does not, then you are confident that $\beta \neq 0$.
An alternative (equivalent) way is to use the t-statistic (often called the t-ratio):

$t = \dfrac{\hat{\beta}}{s_b}$
Big values for t (in absolute value) indicate $\beta \neq 0$.

Small values for t (in absolute value) indicate $\beta = 0$.
Hypothesis Testing (cont.)

Q: What do we mean by big and small?
A: Look at P-value.
If P-value $\leq$ .05 then t is big and conclude $\beta \neq 0$.
If P-value > .05 then t is small and conclude $\beta = 0$.
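The decision rule can be sketched in a few lines. One loud assumption: the P-value below uses the standard normal distribution as a large-sample stand-in for the Student t-distribution, so that only the Python standard library is needed. A statistics package would use the t-distribution with N - 2 degrees of freedom and report a somewhat larger P-value in small samples. The estimates and standard errors are hypothetical.

```python
import math


def t_statistic(beta_hat, s_b):
    """t-ratio for the hypothesis that the slope is zero."""
    return beta_hat / s_b


def approx_p_value(t):
    """Two-sided P-value under a standard normal approximation (an assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def decide(p_value, level=0.05):
    """Apply the rule from the slides at the given level of significance."""
    return "beta != 0" if p_value <= level else "beta = 0"


# Hypothetical (estimate, standard error) pairs.
for beta_hat, s_b in [(0.97, 0.064), (0.10, 0.10)]:
    t = t_statistic(beta_hat, s_b)
    p = approx_p_value(t)
    print(f"t = {t:.2f}, approximate P-value = {p:.4f}, conclude {decide(p)}")
```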
Useful (but formally incorrect) intuition:

P-value measures the probability that $\beta = 0$.
.05 = 5% = level of significance.
Other levels of significance (e.g. 1% or 10%) are occasionally used.

Example: The Regression of Executive Compensation on Profits (cont.)
95% Confidence interval: [.00061,.001075]

Confidence interval does not contain zero, so conclude that $\beta \neq 0$.
Alternatively:
t-ratio is 7.227937. Is this big?
Yes, the P-value is $5.5 \times 10^{-10}$, which is much less than .05.
Hence, we conclude again that $\beta \neq 0$.
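The numbers reported for this example fit together in the way the formulas suggest. The short check below uses only the figures quoted above (estimate .000842, t-ratio 7.227937, interval [.00061, .001075]); the standard error and critical value it prints are backed out from those figures and are subject to their rounding, not values stated in the lecture.

```python
beta_hat = 0.000842
t_ratio = 7.227937
ci_lower, ci_upper = 0.00061, 0.001075

# Since t = beta_hat / s_b, the standard error is implied by the t-ratio.
s_b = beta_hat / t_ratio

# The interval is beta_hat +/- t_b * s_b, so its half-width implies t_b.
half_width = (ci_upper - ci_lower) / 2
implied_t_crit = half_width / s_b
midpoint = (ci_lower + ci_upper) / 2

print(f"implied standard error: {s_b:.7f}")
print(f"implied critical value: {implied_t_crit:.2f}")
print(f"interval midpoint:      {midpoint:.7f}")
print(f"interval excludes zero: {ci_lower > 0 or ci_upper < 0}")
```

The implied critical value is close to 2, which is typical for a 95% interval, and the midpoint of the interval matches the estimate up to rounding.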
Jargon
The coefficient on profits is significantly different from zero.
Profits have statistically significant explanatory power for executive
compensation.
The hypothesis that $\beta = 0$ can be rejected at the 5% significance level.
Testing on $R^2$: The F-statistic

Test whether $R^2 = 0$ (i.e. whether X has any explanatory power).
Note: In simple regression testing $R^2 = 0$ and $\beta = 0$ are the same, but in multiple
regression they will be different.
The F-statistic is a test statistic analogous to the t-statistic (e.g. small values of it
indicate $R^2 = 0$).
$F = \dfrac{(N-2) R^2}{1 - R^2}$
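In simple regression the two tests coincide because the F-statistic equals the square of the t-statistic. The sketch below checks this numerically on five made-up data points (an illustrative assumption, not data from the lecture), computing $R^2$ as one minus SSR over the total sum of squares.

```python
import math


def simple_regression_stats(x, y):
    """Return (r_squared, t, f) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    ssr = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    r_squared = 1.0 - ssr / tss
    s_b = math.sqrt(ssr / ((n - 2) * sxx))
    t = beta_hat / s_b
    f = (n - 2) * r_squared / (1.0 - r_squared)
    return r_squared, t, f


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

r_squared, t, f = simple_regression_stats(x, y)
print(f"R^2 = {r_squared:.4f}, t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
```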
Testing on $R^2$: The F-statistic (cont.)
For test with 5% level of significance:
If P-value is > .05 conclude $R^2 = 0$.
If P-value is $\leq$ .05 conclude $R^2 \neq 0$.
Excel calls the P-value for this test "Significance F".

Example: The Regression of Executive Compensation on Profits (cont.)
P-value = Significance F = $5.5 \times 10^{-10}$.

Since P-value < .05 conclude $R^2 \neq 0$.
Profits do have explanatory power for Y.
Chapter Summary
1. Accuracy of OLS estimates depends on number of data points, variability of the
explanatory variable and variability of the errors.
2. The confidence interval provides an interval in which you can be confident $\beta$ lies.
3. The width of the confidence interval depends on the same factors as affect the accuracy
of OLS estimates. In addition, the width of the confidence interval depends on the
confidence level.
4. A hypothesis test of whether $\beta = 0$ can be used to find out whether the explanatory
variable belongs in the regression. The P-value is a measure of how plausible the
hypothesis is.
5. If the P-value for the hypothesis test of whether $\beta = 0$ is less than .05 then you can reject
the hypothesis at the 5% level of significance.
6. If the P-value for the hypothesis test of whether $\beta = 0$ is greater than .05 then you cannot
reject the hypothesis at the 5% level of significance.
7. A hypothesis test of whether $R^2 = 0$ can be used to investigate whether the regression
helps explain the dependent variable.
