
Chapter 3

Learning to Use Regression Analysis

Copyright 2011 Pearson Addison-Wesley. All rights reserved.

Slides by Niels-Hugo Blunch, Washington and Lee University

Steps in Applied Regression Analysis


The first step is choosing the dependent variable; this step is determined by the purpose of the research (see Chapter 11 for details)
After choosing the dependent variable, it's logical to follow this sequence:
1. Review the literature and develop the theoretical model
2. Specify the model: Select the independent variables and the functional form
3. Hypothesize the expected signs of the coefficients
4. Collect the data. Inspect and clean the data
5. Estimate and evaluate the equation
6. Document the results

Step 1: Review the Literature and Develop the Theoretical Model


Perhaps counterintuitively, a strong theoretical foundation is the best start for any empirical project
Reason: the main econometric decisions are determined by the underlying theoretical model
Useful starting points:
Journal of Economic Literature or a business-oriented publication of abstracts
Internet search, including Google Scholar
EconLit, an electronic bibliography of economics literature (for more details, go to www.EconLit.org)

Step 2: Specify the Model: Independent Variables and Functional Form

After selecting the dependent variable, the specification of a model involves choosing the following components:
1. the independent variables and how they should be measured,
2. the functional (mathematical) form of the variables, and
3. the properties of the stochastic error term


Step 2: Specify the Model: Independent Variables and Functional Form (cont.)
A mistake in any of the three elements results in a specification error
For example, only theoretically relevant explanatory variables should be included
Even so, researchers frequently have to make choices, a practice also referred to as imposing their priors
Example: when estimating a demand equation, theory tells us that the prices of complements and substitutes of the good in question are important explanatory variables
But which complements, and which substitutes?


Step 3: Hypothesize the Expected Signs of the Coefficients


Once the variables are selected, it's important to hypothesize the expected signs of the regression coefficients
Example: demand equation for a final consumption good
First, state the demand equation as a general function:
(3.2)

The signs above the variables indicate the hypothesized sign of the respective regression coefficient in a linear model
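
As an illustration of this notation, a demand function for a final consumption good might be written as below. The symbols are assumptions chosen for this sketch: Q_D is the quantity demanded, P the good's own price, Y consumer income, and P_C and P_S the prices of a complement and a substitute. The good is assumed to be a normal good, so income carries a positive sign; Equation 3.2 itself may use different notation.

```latex
Q_D = f\bigl(\overset{-}{P},\ \overset{+}{Y},\ \overset{-}{P_C},\ \overset{+}{P_S}\bigr) + \epsilon
```

Read this way, a higher own price or a higher complement price is hypothesized to lower the quantity demanded, while higher income or a higher substitute price is hypothesized to raise it.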

Step 4: Collect the Data & Inspect and Clean the Data
A general rule regarding sample size is: the more observations the better, as long as the observations are from the same general population!
The reason for this goes back to the notion of degrees of freedom (first mentioned in Section 2.4)
When there are more degrees of freedom:
Every positive error is likely to be balanced by a negative error (see Figure 3.2)
The regression coefficients are estimated with greater precision
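
The link between degrees of freedom and precision can be illustrated with a small simulation. This is a sketch under stated assumptions only: the data-generating process (intercept 2.0, slope 0.5, standard normal errors), the sample sizes, and the helper name `slope_standard_error` are all invented for the illustration and do not come from the text.

```python
import numpy as np

def slope_standard_error(n, rng):
    """Simulate n observations from Y = 2.0 + 0.5*X + error (assumed model)
    and return the estimated standard error of the OLS slope."""
    x = rng.uniform(0.0, 10.0, size=n)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)
    X = np.column_stack([np.ones(n), x])          # intercept plus one regressor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficient estimates
    resid = y - X @ beta
    dof = n - X.shape[1]                          # degrees of freedom: N - (K + 1)
    sigma2 = resid @ resid / dof                  # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # estimated covariance of the estimates
    return float(np.sqrt(cov[1, 1]))

rng = np.random.default_rng(0)
se_by_n = {n: slope_standard_error(n, rng) for n in (5, 50, 500)}
for n, se in se_by_n.items():
    print(f"N = {n:3d}, degrees of freedom = {n - 2:3d}, SE(slope) = {se:.4f}")
```

With only one regressor, the degrees of freedom are N - 2, so a two-point sample leaves none: the line fits exactly and no standard error can be estimated.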

Figure 3.1 Mathematical Fit of a Line to Two Points


Figure 3.2 Statistical Fit of a Line to Three Points


Step 4: Collect the Data & Inspect and Clean the Data (cont.)
Estimate model using the data in Table 2.2 to get:
Inspecting the data: obtain a printout or plot (graph) of the data
Reason: to look for outliers
An outlier is an observation that lies outside the range of the rest of the observations
Examples:
Does a student have a 7.0 GPA on a 4.0 scale?
Is consumption negative?
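
Checks like these can be automated before estimation. The following is a minimal sketch, assuming each observation is stored as a dictionary; the function name `find_implausible`, the variable names, the valid ranges, and the three sample rows are all hypothetical.

```python
def find_implausible(records, valid_ranges):
    """Return (row index, variable, value) for every value outside its valid range."""
    flagged = []
    for i, row in enumerate(records):
        for name, (low, high) in valid_ranges.items():
            value = row[name]
            if not (low <= value <= high):
                flagged.append((i, name, value))
    return flagged

# Hypothetical data: row 1 has an impossible GPA, row 2 a negative consumption.
records = [
    {"gpa": 3.2, "consumption": 1500.0},
    {"gpa": 7.0, "consumption": 1200.0},
    {"gpa": 2.8, "consumption": -40.0},
]
# Assumed plausibility limits: GPA on a 4.0 scale, consumption non-negative.
valid_ranges = {"gpa": (0.0, 4.0), "consumption": (0.0, float("inf"))}

flagged = find_implausible(records, valid_ranges)
for i, name, value in flagged:
    print(f"row {i}: {name} = {value} is outside the plausible range")
```

A flagged value is a prompt to go back to the source and check for a recording error, not a reason to delete the observation automatically.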


Step 5: Estimate and Evaluate the Equation


Once Steps 1 through 4 have been completed, the estimation part is quick
Using EViews or Stata to estimate an OLS regression takes less than a second!
The evaluation part is trickier, however, and involves answering the following questions:
How well did the equation fit the data?
Were the signs and magnitudes of the estimated coefficients as expected?
Afterwards, sensitivity analysis may be added (see Section 6.4 for details)
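
The two evaluation questions can be sketched in code. The example below uses Python with NumPy rather than EViews or Stata, and it runs on simulated data: the "true" coefficients, the variable names `x1` and `x2`, and the expected signs are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(0.0, 10.0, size=n)
x2 = rng.uniform(0.0, 5.0, size=n)
# Assumed data-generating process: positive effect of x1, negative effect of x2.
y = 10.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0.0, 1.0, size=n)

# Estimation: OLS via least squares on [1, x1, x2].
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Question 1: how well did the equation fit the data?
resid = y - X @ beta
rss = float(resid @ resid)                    # residual sum of squares
tss = float(((y - y.mean()) ** 2).sum())      # total sum of squares
k = X.shape[1] - 1                            # number of slope coefficients
r_squared = 1.0 - rss / tss
adj_r_squared = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))

# Question 2: were the signs of the estimated coefficients as expected?
expected_signs = {"x1": +1, "x2": -1}
signs_ok = {
    name: bool(np.sign(b) == expected_signs[name])
    for name, b in zip(("x1", "x2"), beta[1:])
}

print("coefficients:", np.round(beta, 3))
print(f"R-squared = {r_squared:.3f}, adjusted R-squared = {adj_r_squared:.3f}")
print("signs as expected:", signs_ok)
```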

Step 6: Document the Results


A standard format is usually used to present estimated regression results:
(3.3)
The number in parentheses under each estimated coefficient is the estimated standard error of that coefficient, and the t-value is the one used to test the hypothesis that the true value of the coefficient is different from zero (more on this later!)
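
The layout described here can be sketched as a small formatting helper. The function names `t_value` and `format_results` and all of the numbers are hypothetical, chosen only to show the layout; they are not the estimates in Equation 3.3. The t-value is computed as the coefficient divided by its standard error, which is the statistic for the null hypothesis that the true coefficient is zero.

```python
def t_value(coefficient, standard_error):
    """t-statistic for the null hypothesis that the true coefficient is zero."""
    return coefficient / standard_error

def format_results(dep_var, intercept, terms, n_obs, adj_r2):
    """Render an estimated equation with standard errors and t-values.

    terms is a list of (name, coefficient, standard_error) tuples.
    """
    equation = f"{dep_var}_hat = {intercept:.2f}"
    se_parts, t_parts = [], []
    for name, coef, se in terms:
        sign = "+" if coef >= 0 else "-"
        equation += f" {sign} {abs(coef):.2f}*{name}"
        se_parts.append(f"{name}: ({se:.2f})")
        t_parts.append(f"{name}: {t_value(coef, se):.2f}")
    return "\n".join([
        equation,
        "standard errors  " + "  ".join(se_parts),
        "t-values         " + "  ".join(t_parts),
        f"N = {n_obs}   adjusted R-squared = {adj_r2:.3f}",
    ])

# Hypothetical estimates, used only to demonstrate the format.
report = format_results(
    "Y", 12.0, [("X1", 2.5, 0.5), ("X2", -1.2, 0.4)], n_obs=30, adj_r2=0.75
)
print(report)
```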

Case Study: Using Regression Analysis to Pick Restaurant Locations

Background:
You have been hired to determine the best location for the next Woody's restaurant (a moderately priced, 24-hour, family restaurant chain)

Objective:
How can the six basic steps of applied regression analysis, discussed earlier, be used to decide on a location?


Step 1: Review the Literature and Develop the Theoretical Model


Background reading about the restaurant industry
Talking to various experts within the firm
All of the chain's restaurants are identical and located in suburban, retail, or residential environments
So, there is a lack of variation in some potential explanatory variables that could help determine location
The number of customers is most important for the locational decision
Dependent variable: number of customers (measured by the number of checks or bills)

Step 2: Specify the Model: Independent Variables and Functional Form

More discussions with in-house experts reveal three major determinants of sales:
Number of people living near the location

General income level of the location


Number of direct competitors near the location


Step 2: Specify the Model: Independent Variables and Functional Form (cont.)
Based on this, the exact definitions of the independent variables you decide to include are:
N = Competition: the number of direct competitors within a two-mile radius of the Woody's location
P = Population: the number of people living within a three-mile radius of the location
I = Income: the average household income of the population measured in variable P

With no reason to suspect anything other than a linear functional form and a typical stochastic error term, that's what you decide to use


Step 3: Hypothesize the Expected Signs of the Coefficients

After talking some more with the in-house experts and thinking some more, you come up with the following:

(3.4)
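
Given the variable definitions and the linear functional form chosen in Step 2, the hypothesized equation can be sketched as follows. The coefficient symbols and the signs shown are assumptions based on standard reasoning (more nearby competitors should reduce the number of customers, while a larger or higher-income nearby population should raise it); they are not copied from Equation 3.4.

```latex
Y_i = \beta_0 + \beta_N N_i + \beta_P P_i + \beta_I I_i + \epsilon_i,
\qquad \beta_N < 0,\quad \beta_P > 0,\quad \beta_I > 0
```

Here Y_i is the number of customers at restaurant i, and N_i, P_i, and I_i are the competition, population, and income variables defined in Step 2.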


Step 4: Collect the Data & Inspect and Clean the Data
You manage to obtain data on the dependent and independent variables for all 33 Woody's restaurants
Next, you inspect the data
The data quality is judged as excellent because:
Each manager measures each variable identically
All restaurants are included in the sample
All information is from the same year

The resulting data are given in Tables 3.1 and 3.3 in the book (using EViews and Stata, respectively)

Step 5: Estimate and Evaluate the Equation


You take the data set and enter it into the computer
You then run an OLS regression (after thinking the model over one last time!)
The resulting model is:

(3.5)

Estimated coefficients are as expected and the fit is reasonable


Values for N, P, and I for each potential new location are then obtained and plugged into (3.5) to predict Y
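
The prediction step amounts to evaluating the fitted linear equation at each candidate site. The sketch below assumes placeholder coefficients and three hypothetical sites; the numbers are NOT the estimates from Equation 3.5, and the names `predict_customers`, `coefs`, and `candidates` are invented for the illustration.

```python
def predict_customers(coefs, n_competitors, population, income):
    """Plug N, P, and I into a fitted linear equation:
    Y_hat = b0 + bN*N + bP*P + bI*I."""
    return (coefs["intercept"]
            + coefs["N"] * n_competitors
            + coefs["P"] * population
            + coefs["I"] * income)

# Placeholder coefficients for illustration only (not Equation 3.5).
coefs = {"intercept": 100000.0, "N": -9000.0, "P": 0.5, "I": 1.0}

# Hypothetical candidate sites:
# (competitors within two miles, population within three miles, average household income)
candidates = {
    "Site A": (2, 100000, 20000),
    "Site B": (5, 150000, 25000),
    "Site C": (3, 80000, 30000),
}

predictions = {
    site: predict_customers(coefs, *values) for site, values in candidates.items()
}
best_site = max(predictions, key=predictions.get)

for site, y_hat in predictions.items():
    print(f"{site}: predicted customers = {y_hat:,.0f}")
print("Highest predicted Y:", best_site)
```

Ranking by predicted Y is only one possible decision rule; the sketch leaves out considerations such as site cost.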

Step 6: Document the Results


The results summarized in Equation 3.5 meet our documentation requirements
Hence, you decide that there's no need to take this step any further


Table 3.1a Data for the Woody's Restaurants Example (Using the EViews Program)

Table 3.1b Data for the Woody's Restaurants Example (Using the EViews Program)

Table 3.1c Data for the Woody's Restaurants Example (Using the EViews Program)

Table 3.2a Actual Computer Output (Using the EViews Program)

Table 3.2b Actual Computer Output (Using the EViews Program)

Table 3.3 Data for the Woody's Restaurants Example (Using the Stata Program)

Table 3.3b Data for the Woody's Restaurants Example (Using the Stata Program)

Table 3.4a Actual Computer Output (Using the Stata Program)

Table 3.4b Actual Computer Output (Using the Stata Program)

Key Terms from Chapter 3


The six steps in applied regression analysis
Dummy variable
Cross-sectional data set
Specification error
Degrees of freedom
