Beruflich Dokumente
Kultur Dokumente
Notes
By
Edward Leamer
January 26, 1999
What is an R2?.......................................................................................................................................... 2
What is a t-value?..................................................................................................................................... 2
Examples of the effect of sample size ....................................................................................................... 3
Questions about these numbers that you might be asking yourself: ........................................................ 4
Why is the difference between the adjusted R2 and the R2 hardly noticeable, except when the sample
size is very small?............................................................................................................................. 4
Why is the adjusted R2 a U-shaped function of sample size? .............................................................. 4
Why does the standard error fall with sample size and the t-value increase? ....................................... 4
How big should be my R2 ......................................................................................................................... 4
Example: Two Multiple Regressions that Explain GDP......................................................................... 5
Which of these equations is actually better? .......................................................................................... 6
How big should my t-value be? I’m a t=1 kind of guy. ............................................................................. 6
Summary Table ........................................................................................................................................ 7
What’s an R2 and what’s a t-value? Turning Numbers into Knowledge
Page 2 of 7
What is an R2?
The R2 measures the percent of variation in the “dependent” variable that can be accounted for or
“explained” by your “independent” variables. For example, you can account for 97% of the variation in
adult heights using wingspan1 as a predictor. Using the same wingspan variable, you can account for only
40% of the variability in their weights.
The estimate represents the computer’s attempt to find the equation that best summarizes the data you are
studying. The standard error measures how sure the computer feels about its choice of estimate. The
standard error for one variable cannot be compared effortlessly with the standard error of another variable.
One reason is that units in which the variables are measured affect the standard errors. The t-value is the
ratio of the estimate divided by the standard error. It is another way of measuring how sure the computer
feels. Unlike the standard error, the t-value doesn’t depend on the units. Unlike the standard error, the t-
value can be compared across all variables.
Remember that all those numbers that come jumping out of the computer when you ask for a regression are
only answering four questions:
But there is something else that affects the answer to question 4: the sample size. Noisy experiments don’t
provide the best information, but if you run a lot of them, you can average out the noise to find the truth.
To illustrate that point I have successively trimmed more and more data from the sample and computed the
corresponding regression. The numbers are reported in Table 1 below. If all the quarterly data are
included there are 204 observations. If a little less than a decade of data are trimmed, the number drops to
170; then to 130, to 90, to 50 and last to only 10. Take a look at the numbers in the tables and the figures
below that illustrate the numbers. Think of some questions that these numbers suggest. We’ll answer
those questions next.
0 .5
0 .4
0 .3
R-squared
Adj. R-sq
0 .2
0 .1
0
0 50 100 150 200
N u m b e r o f Observations (ending in 1998:2)
0.6 10
Estimate and Standard
0.4
8
Absolute t-value
0.2
0 6 Estimate
Error
Equation 2
Take another look at Equation 1. The previous quarters’ growth of GDP, the unemployment rate
and the growth in defense expenditures together account for 14.5% of the variability of the rate of growth
of real GDP across quarters. That looks pretty bad. Equation 2 looks a lot better with an R2 of 0.9996.
But think again. The R2 of the first equation is an answer to a different question than the R2 of the second
regression. The first is asking what percentage of the historical variation in the growth rate can be
explained by these variables. The second is asking what percentage of the historical variability of real
GDP can be explained. Most of the historical variability of real GDP is just the time trend. Any variable
with a strong time trend can account for a lot of the so-called variability in the level of real GDP. The one
that is doing the job here is the previous quarter’s GDP. It gets that astronomical t-value of 394. But to
say that a good prediction of next quarter’s GDP is this quarter’s GDP plus $4810 for each unemployed
person may or may not give you a very accurate predictor of the growth of GDP. To decide, you’d have
to compute the R2 differently.
Summary Table
Low R2 High R2
Low t-value Absolute Misery – the effect of your Forget that variable. Another
favorite variable isn’t detectable variable is doing the work and giving
using these data, and your equation you a high R2. Maybe if you discard
overall sucks. one of the other variables, then your
favorite one will look better. The
ones that are correlated with your
favorite might be causing the
problem.
High t-value Oh Joy. You have found a way to Bliss – the best it can get.. But be
measure the effect of your favorite alert – your enemies are trying to
variable using these data. The other find just the right variable to include
variables in your equation are not to knock yours out of the equation.
crowding out your favorite. And think again how you defined
Maybe you are worried that your your dependent variable. What it
equation overall is not explaining means to have a high R2 depends on
much of the variability of the the variable. An R2 of 0.9 in an
dependent variable. Don’t be upset. equation explaining quarterly GDP is
This is the normal state of affairs for no big deal. You can do that using
variables defined in a way that makes the previous quarter’s GDP. An R2 of
them hard to predict, like growth 0.5 for explaining the growth of GDP
rates or changes over time. using only lagged variables is terrific.
Think about medicine as an example. It is easier to predict your daughter’s
Maybe they can find out that coffee height at age 16 than how much she
drinking has a statistically significant will grow from birthday 16 to
and even important effect on the birthday 17.
probability of heart disease, but still
not be able to predict whether or not
you will suffer from heart disease,
regardless of your coffee-drinking
habits.