
Panel Data Modeling

 «

 1

 2

 3

 4



 ...7

 next »

1. Module objective

This module explores the possibilities of panel data modeling. This class of models places strong emphasis on the structure of the data, and exploiting that structure usually improves model accuracy. In this section, we highlight the following topics in panel data modeling:

1. What is panel data?

2. Modelling structure

3. Fixed effect panel data model

4. Random effect panel data model

5. Balanced/unbalanced panel data model

What is panel data or longitudinal data?

In general, the data used in modeling come in one of four forms:

Time series data

Cross sectional data

Pooled data

Panel data

Panel data are cross-sectional data observed periodically; equivalently, they are time series data observed in a cross-sectional set-up. Panel or longitudinal data are thus an admixture of cross-section and time series data, that is, data acquired from repeated observations of the same units over a certain period of time. A typical panel data set has both a cross-sectional dimension and a time series dimension; in particular, the same cross-sectional units (e.g. individuals, families, firms, cities, states) are observed over time.
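As a minimal illustration (the firms, years, and numbers below are made up), the sketch builds a small panel of three firms over three years and slices out one cross-section and one time series from it:

```python
# A minimal panel: the SAME units (firms A, B, C) observed in the
# SAME periods (2001-2003). All values are hypothetical.
panel = [
    # (firm, year, investment, profit)
    ("A", 2001, 10.0, 2.1), ("A", 2002, 11.5, 2.4), ("A", 2003, 12.0, 2.6),
    ("B", 2001,  8.0, 1.5), ("B", 2002,  8.4, 1.6), ("B", 2003,  9.1, 1.9),
    ("C", 2001, 15.0, 3.3), ("C", 2002, 15.8, 3.5), ("C", 2003, 16.2, 3.7),
]

# A cross-section is one slice in time: all firms in 2002.
cross_section_2002 = [row for row in panel if row[1] == 2002]

# A time series is one unit followed over time: firm "A" across years.
time_series_A = [row for row in panel if row[0] == "A"]

print(len(panel), len(cross_section_2002), len(time_series_A))  # → 9 3 3
```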
Model representation:

In general, we have the following modelling structure for data analysis:

Case 1; Model with Cross Sectional Data:

Y_i = α + β X_i + u_i, for i = 1, 2, …, N

Case 2; Model with Time Series Data:

Y_t = α + β X_t + u_t, for t = 1, 2, …, T

Case 3; Model with Pooled Data:

Y_it = α + β X_it + u_it, for i = 1, 2, …, N & t = 1, 2, …, T

Case 4; Model with Panel Data:

Y_it = α_it + β_it X_it + u_it, for i = 1, 2, …, N & t = 1, 2, …, T

where α is the intercept parameter and β is the slope parameter; in the panel case these may, in general, vary across units and over time.

Let Y_it = α_i + β X_it + u_it, where u_it ~ iid(0, σ²).

If α_i is fixed (constant) for all i = 1, 2, 3, …, N and u_it ~ (0, σ²) [i.e. zero mean and constant variance], then we can use the pooled data model, and the procedure is the usual regression with OLS.
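When the pooled model is adequate, OLS on the stacked (i, t) observations is all that is needed. A minimal pure-Python sketch, using the closed-form simple-regression formulas on made-up data:

```python
# Pooled OLS on stacked (i, t) observations (hypothetical data).
# Closed form: beta = Sxy / Sxx, alpha = ybar - beta * xbar.
x = [2.1, 2.4, 2.6, 1.5, 1.6, 1.9, 3.3, 3.5, 3.7]    # regressor, all (i,t) stacked
y = [10.0, 11.5, 12.0, 8.0, 8.4, 9.1, 15.0, 15.8, 16.2]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)

beta = sxy / sxx           # common slope
alpha = ybar - beta * xbar  # common intercept

print(round(alpha, 3), round(beta, 3))
```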

To test a common (homogeneous) intercept against heterogeneous intercepts, put an F-test with

H0: α_1 = α_2 = … = α_N = α [i.e. no deviation from the common mean]

If H0 is accepted, we can go for the pooled data model. This technique simply combines the cross-section data and the time-series data into a new data set (called pooled data) without taking any account of the cross-section and time-series behaviour.

If H0 is rejected, we go for panel data analysis; otherwise, we go for pooled data analysis.

In the present set-up, we suppose that the null hypothesis is rejected and move into panel data analysis. Panel data modelling has two different approaches: the fixed effect model (FEM) and the random effect model (REM).

In the fixed effect model, the intercept changes across individuals or over time or both.

In the random effect model, all individual characteristics and cross-sectional specifics are captured in the residuals. So, in REM, the residual has an individual component, a time-series component, and a combined component.

The detailed structure is as follows:

Case 1: Independent Regression

Y_it = α_it + β_it X_it + u_it, where u_it ~ iid(0, σ²)

If α_it = α_t and β_it = β_t (i.e. the coefficients are common across units within each period), then we can estimate the model by separating its time component, so that we have T regressions, each with N observations.

This can be written as follows:

Y_it = α_t + β_t X_it + u_it, for i = 1, 2, …, N, where u_it ~ iid(0, σ_t²)

Or else, if α_it = α_i and β_it = β_i, we can have N regressions, each with T observations:

Y_it = α_i + β_i X_it + u_it, for t = 1, 2, …, T, where u_it ~ iid(0, σ_i²)
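The N-separate-regressions case can be sketched as one independent OLS fit per unit (hypothetical data; the `ols` helper is just the closed-form simple-regression formula):

```python
# "Independent regression" sketch: one separate OLS fit per cross-sectional
# unit i, each using that unit's T time-series observations (hypothetical data).
data = {
    "A": ([2.1, 2.4, 2.6], [10.0, 11.5, 12.0]),   # unit -> (x over t, y over t)
    "B": ([1.5, 1.6, 1.9], [8.0, 8.4, 9.1]),
    "C": ([3.3, 3.5, 3.7], [15.0, 15.8, 16.2]),
}

def ols(x, y):
    """Closed-form simple OLS: returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    return ybar - b * xbar, b

# N fits, each on T points: intercept and slope free to differ across units.
fits = {i: ols(x, y) for i, (x, y) in data.items()}
for unit, (a, b) in fits.items():
    print(unit, round(a, 2), round(b, 2))
```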

Case 2: Pooled Data Approach

Y_it = α + β X_it + u_it, where u_it ~ iid(0, σ²)

Here, we assume that α (the intercept) and the residual variance are constant over cross-sectional units and time-series units. This is, however, very rare in reality, so we should consider models where the intercepts or residuals change over time and across individuals.

Case 3: Fixed Effect Model

Let Y_it = α_it + β X_it + u_it

Here, variations across individuals and over time are captured in the intercepts.

The structure is as follows:

Y_it = α + Σ_{i=2}^{N} μ_i D_i + Σ_{t=2}^{T} λ_t W_t + β X_it + u_it

where D_i and W_t are dummy variables:

D_i = 1 for the i-th individual, 0 otherwise

W_t = 1 for the t-th time period, 0 otherwise

OLS can be applied to estimate the parameters.

When μ_i = 0 for all i and λ_t = 0 for all t, the model reduces to the pooled regression.

Number of parameters: (N − 1) individual dummies; (T − 1) time dummies; plus the common intercept α and the slope β.

Degrees of freedom = NT − (N − 1) − (T − 1) − 2 = NT − N − T

To investigate whether the intercept is constant for all i and t, we go for an F-test of the joint significance of the dummy coefficients:

F = [(RSS_R − RSS_UR)/(N + T − 2)] / [RSS_UR/(NT − N − T)]

where RSS_R is the residual sum of squares of the pooled (restricted) model and RSS_UR that of the dummy variable (unrestricted) model.

If the F-test rejects H0, FEM is better.
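The pooled-versus-FEM comparison via an F-test can be sketched as follows, restricting attention to individual intercepts only (hypothetical data, K = 1 regressor): the restricted model is the pooled regression, the unrestricted model allows a separate intercept per unit.

```python
# Poolability F-test sketch (hypothetical data, individual effects only).
# Restricted model: one common intercept (pooled OLS).
# Unrestricted model: a separate intercept per unit, common slope (within fit).
groups = {
    "A": ([2.1, 2.4, 2.6], [10.0, 11.5, 12.0]),
    "B": ([1.5, 1.6, 1.9], [8.0, 8.4, 9.1]),
    "C": ([3.3, 3.5, 3.7], [15.0, 15.8, 16.2]),
}

# Restricted: pooled OLS over all stacked observations.
xs = [v for x, _ in groups.values() for v in x]
ys = [v for _, y in groups.values() for v in y]
n = len(xs)
xb, yb = sum(xs) / n, sum(ys) / n
b_p = sum((a - xb) * (c - yb) for a, c in zip(xs, ys)) / \
      sum((a - xb) ** 2 for a in xs)
a_p = yb - b_p * xb
rss_r = sum((c - a_p - b_p * a) ** 2 for a, c in zip(xs, ys))

# Unrestricted: within estimator (demean inside each unit, common slope).
dx, dy = [], []
for x, y in groups.values():
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx += [v - mx for v in x]
    dy += [v - my for v in y]
b_w = sum(a * c for a, c in zip(dx, dy)) / sum(a * a for a in dx)
rss_ur = sum((c - b_w * a) ** 2 for a, c in zip(dx, dy))

N, K = len(groups), 1
F = ((rss_r - rss_ur) / (N - 1)) / (rss_ur / (n - N - K))
print(round(F, 2))  # a large F rejects pooling in favour of FEM
```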

The fixed effect model can be estimated in the following ways:

Fixed-effects (or within) estimator: each variable is demeaned (i.e. its unit mean is subtracted).

Dummy variable regression: put in a dummy variable for each cross-sectional unit, along with the other explanatory variables. This may cause estimation difficulty when N is large.

First-difference estimator: each variable is differenced once over time, so we are effectively estimating the relationship between changes of variables.
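The demeaning step of the within estimator can be sketched as follows (hypothetical data): estimate the common slope from the demeaned observations, then recover each fixed effect from α_i = Ȳ_i − β X̄_i.

```python
# Within (fixed-effects) estimator sketch: demean y and x inside each unit,
# estimate the common slope on the demeaned data, then back out each
# unit's fixed effect as alpha_i = ybar_i - beta * xbar_i. Data are made up.
groups = {
    "A": ([2.1, 2.4, 2.6], [10.0, 11.5, 12.0]),
    "B": ([1.5, 1.6, 1.9], [8.0, 8.4, 9.1]),
    "C": ([3.3, 3.5, 3.7], [15.0, 15.8, 16.2]),
}

means = {i: (sum(x) / len(x), sum(y) / len(y)) for i, (x, y) in groups.items()}

dx, dy = [], []
for i, (x, y) in groups.items():
    mx, my = means[i]
    dx += [v - mx for v in x]
    dy += [v - my for v in y]

beta_fe = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
alphas = {i: my - beta_fe * mx for i, (mx, my) in means.items()}

for i in groups:
    print(i, round(alphas[i], 2))
```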

Case 4: Random Effect Model

Here, variations across individuals and over time are captured in the residuals.

The random error is composed of an error for the individual component, an error for the time component, and an error for both.

REM is represented as follows:

Let Y_it = α + β X_it + ε_it, where ε_it = u_i + v_t + w_it

Here u_i is the error for the cross-section, v_t is the error for the time series, and w_it is the error for both.

The assumptions are that u_i ~ N(0, σ_u²), v_t ~ N(0, σ_v²), and w_it ~ N(0, σ_w²), mutually independent.

So for REM, Var(ε_it) = σ_u² + σ_v² + σ_w², but for OLS (pooled data), Var(u_it) = σ².

REM can be estimated by OLS if σ_u² = σ_v² = 0. Otherwise, REM is estimated using GLS, which has two steps:

Step 1: Estimate REM by OLS and calculate the RSS to estimate the sample variances.

Step 2: Using the sample variances estimated at step 1, use GLS to estimate the parameters of the model.
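The two steps can be sketched for a one-way random effects model (individual effects only; a simplified Swamy-Arora-style variance decomposition on hypothetical data). Step 1 estimates σ_w² from the within residuals and σ_u² from the between regression; step 2 applies GLS by quasi-demeaning each variable with θ = 1 − sqrt(σ_w² / (σ_w² + T·σ_u²)).

```python
import math

# Two-step GLS for a one-way random effects model (sketch, hypothetical data).
groups = {
    "A": ([2.1, 2.4, 2.6], [10.0, 11.5, 12.0]),
    "B": ([1.5, 1.6, 1.9], [8.0, 8.4, 9.1]),
    "C": ([3.3, 3.5, 3.7], [15.0, 15.8, 16.2]),
}
N, T, K = len(groups), 3, 1

def ols(x, y):
    """Closed-form simple OLS: returns (intercept, slope)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((a - xb) * (c - yb) for a, c in zip(x, y)) / \
        sum((a - xb) ** 2 for a in x)
    return yb - b * xb, b

# Step 1a: within residuals -> idiosyncratic variance sigma_w^2.
dx, dy = [], []
for x, y in groups.values():
    mx, my = sum(x) / T, sum(y) / T
    dx += [v - mx for v in x]
    dy += [v - my for v in y]
b_w = sum(a * c for a, c in zip(dx, dy)) / sum(a * a for a in dx)
s2_w = sum((c - b_w * a) ** 2 for a, c in zip(dx, dy)) / (N * T - N - K)

# Step 1b: between regression on unit means -> individual variance sigma_u^2.
mx = [sum(x) / T for x, _ in groups.values()]
my = [sum(y) / T for _, y in groups.values()]
a_b, b_b = ols(mx, my)
s2_between = sum((c - a_b - b_b * a) ** 2 for a, c in zip(mx, my)) / (N - K - 1)
s2_u = max(0.0, s2_between - s2_w / T)

# Step 2: GLS via quasi-demeaning (theta = 0 reduces to pooled OLS).
theta = 1.0 - math.sqrt(s2_w / (s2_w + T * s2_u))
qx, qy = [], []
for x, y in groups.values():
    mxi, myi = sum(x) / T, sum(y) / T
    qx += [v - theta * mxi for v in x]
    qy += [v - theta * myi for v in y]
a_re, b_re = ols(qx, qy)  # b_re is the RE slope; a_re estimates alpha*(1-theta)

print(round(theta, 3), round(b_re, 3))
```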

If the errors are normally distributed (by assumption), MLE can also be used.

Thumb rule: if T is large and N is small, use FEM; if N is large and T is small, use REM.

To choose between FEM and REM, the Hausman specification test can be applied.

Lecture 23 - Panel Data Modeling (Contd.)

 «

 1

 2

 3

 4


 ...6

 next »
Difference between random effect and fixed effect estimators

The RE estimates are more efficient (more precise) if α_i is uncorrelated with the explanatory variables. Otherwise, the FE estimates are consistent, while the RE estimates are inconsistent.

Difference between balanced and unbalanced panel

A balanced panel is a panel data set with observations for the same time periods for all individuals; otherwise, the data are unbalanced. If a panel data set is unbalanced for reasons uncorrelated with u_it, estimation consistency using FE will not be affected. The "attrition" problem: if an unbalanced panel is the result of some selection process related to u_it, then an endogeneity problem is present and needs to be dealt with using some correction method. This problem cannot be solved by just deleting the units that have missing observations for some time periods.


Important tests for panel data modelling

Hausman test: comparing the RE and FE estimates, if the estimates are statistically different, then the RE assumption is probably invalid. In this case FE has to be used; otherwise, RE is more efficient.
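For reference, the Hausman statistic takes the standard textbook quadratic form:

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)'
    \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\sim\; \chi^2(k) \quad \text{under } H_0
```

where k is the number of slope coefficients compared; a large H rejects H0 (no systematic difference between the estimators) and favours FEM.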

Breusch and Pagan test: This is to test the hypothesis that there are no random effects.

A sample problem

The determinants of FDI inflows in India:

Dependent variable: FDI

Independent variables:

Power; Education; Health; Transport; Research and Development; Domestic Investment; Profit; Risk

The modelling process

The general framework of the panel data model is as follows:

Y_it = β_1 + Σ_{j=2}^{k} β_j X_jit + γ Z_i + ε_it

where Z_i represents the unobserved individual-specific characteristics.

The fixed effects regression model takes three different forms: the within-group fixed effect model, the first-difference fixed effect model, and the least squares dummy variable (LSDV) fixed effects model.

The within-group fixed effects model is in the following form:

Y_it − Ȳ_i = Σ_{j=2}^{k} β_j (X_jit − X̄_ji) + (ε_it − ε̄_i)
This is known as the within-groups regression model because it explains the variations about the mean of the dependent variable in terms of the variations about the means of the explanatory variables for the group of observations relating to a given individual.
The first difference fixed effect model is as follows:

ΔY_it = Σ_{j=2}^{k} β_j ΔX_jit + Δε_it

Here the unobserved effect is eliminated by subtracting the observations for the previous time period from the observation for the current time period, for all
time periods.
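The differencing step can be sketched as follows (hypothetical data); the unit-specific unobserved effect drops out of the differenced equation because it is constant over time. A free intercept in the differenced regression would capture a common time trend; it is omitted here for simplicity.

```python
# First-difference estimator sketch: differencing each unit's series over t
# removes the time-invariant unobserved effect before estimating the slope.
groups = {
    "A": ([2.1, 2.4, 2.6], [10.0, 11.5, 12.0]),
    "B": ([1.5, 1.6, 1.9], [8.0, 8.4, 9.1]),
    "C": ([3.3, 3.5, 3.7], [15.0, 15.8, 16.2]),
}

dx, dy = [], []
for x, y in groups.values():
    dx += [x[t] - x[t - 1] for t in range(1, len(x))]
    dy += [y[t] - y[t - 1] for t in range(1, len(y))]

# OLS of delta-y on delta-x through the origin (no intercept / time trend).
beta_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(len(dx), round(beta_fd, 2))
```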

The LSDV regression model is as follows:

Y_it = Σ_{i=1}^{N} α_i Z_i + Σ_{j=2}^{k} β_j X_jit + ε_it

Here, the unobserved effect is brought explicitly into the model. Z_i is treated as a dummy variable, equal to 1 in the case of an observation relating to individual i and 0 otherwise. Formally, the unobserved effect is being treated as the coefficient of the individual-specific dummy variable: the coefficient α_i represents the fixed effect on the dependent variable Y_it for individual i.

It is to be noted that when the variables of interest are constant for each individual, a fixed effects regression is not an effective tool because such variables cannot be included. So the alternative approach is the use of the random effects regression model. It has two conditions. First, Z_i should be drawn randomly from a given distribution. This may well be the case if the individual observations constitute a random sample from a given population. If this is the case, the α_i may be treated as random variables drawn from a given distribution, and we can write the model as follows:

Y_it = β_1 + Σ_{j=2}^{k} β_j X_jit + u_it, where u_it = α_i + ε_it and α_i ~ iid(0, σ_α²)

The second condition is that the Z_i variables are distributed independently of all of the X_j variables. If this is not the case, α, and hence u, will not be uncorrelated with the X_j variables, and the random effects estimation will be biased and inconsistent. We would have to use fixed effects estimation instead, even if the first condition seems to be satisfied.

Results of Panel Data Model

========================================================================
Variable           Fixed Effect Model     Random Effect Model
========================================================================
Intercept           0.51   [0.58]          0.33    [0.31]
PWR                 0.03   [0.02]          0.004   [0.01]
EDU                -1.00   [0.8]          -0.01    [0.33]
HEA                 1.42   [0.24]         -0.188   [0.13]
TRA                 0.52   [0.44]          0.834   [0.65]
R&D                -0.56   [0.8]           3.04    [0.71]
DOI                 0.01   [0.00]          0.01    [0.00]
PRO                 0.03   [0.02]          0.024   [0.01]
RIS                -0.06   [0.06]         -0.053   [0.03]
R2 (within)         0.20                   0.300
R2 (between)        0.001                  0.831
R2 (overall)        0.0001                 0.687
HFR                 5.110
BPL                 6.95
========================================================================


FAQs (frequently asked questions):

1. In panel data

a) The same cross-sectional units are surveyed over time

b) Different cross-sectional units are surveyed over time

c) Different cross-sectional units are surveyed at a point in time

d) Cross-sectional units are surveyed in detail

e) None of the above

2. While dealing with the random effects model, the most appropriate procedure to be adopted is

a) OLS procedure

b) GLS procedure

c) Seemingly unrelated procedure

d) 2 stage least square procedure

e) None of the above

3. The H0 we test using the Hausman statistic is that

a) FEM and REM estimators differ substantially

b) FEM and REM estimators do not differ substantially

c) FEM and REM estimators are equal to zero

d) FEM and REM estimators are not equal to zero

e) None of the above

4. The Hausman test statistic follows


a) Normal distribution

b) T distribution

c) Chi-square distribution

d) F distribution

e) None of the above

Self-evaluation tests/quizzes

1. In 1985, neither Florida nor Georgia had laws banning open alcohol containers in vehicle passenger compartments. By 1990, Florida had passed such a
law, but Georgia had not.

a) Suppose you can collect random samples of the driving-age population in both states, for 1985 and 1990. Let arrest be a binary variable equal to unity if a
person was arrested for drunk driving during the year. Without controlling for any other factors, write down a linear probability model that allows you to test
whether the open container law reduced the probability of being arrested for drunk driving. Which coefficient in your model measures the effect of the law?

b) Why might you want to control for other factors in the model? What might some of these factors be?

2. What is meant by an error components model (ECM)? How does it differ from FEM? When is ECM appropriate? And when is FEM appropriate?

3. In order to determine the effects of collegiate athletic performance on applicants, you collect data on applications for a sample of Division I colleges for
1985, 1990, and 1995.

a) What measures of athletic success would you include in an equation? What are some of the timing issues?

b) What other factors might you control for in the equation?

c) Write an equation that allows you to estimate the effects of athletic success on the percentage change in applications. How would you estimate this
equation? Why would you choose this method?

4. Suppose that, for one semester, you can collect the following data on a random sample of college juniors and seniors for each class taken: a standardized
final exam score, percentage of lectures attended, a dummy variable indicating whether the class is within the student's major, cumulative grade point
average prior to the start of the semester, and SAT score.

a) Why would you classify this data set as a cluster sample? Roughly how many observations would you expect for the typical student?

b) If you pool all of the data together and use OLS, what are you assuming about unobserved student characteristics that affect performance and attendance
rate? What roles do SAT score and prior GPA play in this regard?

c) If you think SAT score and prior GPA do not adequately capture student ability, how would you estimate the effect of attendance on final exam
performance?