
SAS Global Forum 2007 Statistics and Data Analysis

Paper 170-2007

Analyzing Time Series Cross-Sectional Data with the PANEL Procedure


Jan Chvosta and Donald Erdman, SAS Institute, Cary, NC

ABSTRACT
A brief theoretical overview of methods available to analyze time series cross-sectional (panel) data is provided. The
PANEL procedure in SAS/ETS® software is used to demonstrate the implementation of these methods in SAS®. An
empirical example is given to compare different techniques and demonstrate the PANEL procedure's capabilities.
Dynamic panel methods are discussed and compared to other techniques. Finally, the PANEL procedure is compared
to other SAS procedures, such as the MIXED procedure and the TSCSREG procedure.

INTRODUCTION
In recent years, estimation techniques that use time series cross-sectional (panel) data approaches have become
widely used. The PANEL procedure in SAS/ETS software fits classes of linear models that arise when time series and
cross-sectional data are combined. It is capable of fitting the following models:

• one-way and two-way models
• random-effects and fixed-effects models
• autoregressive and moving average models
  o Parks method
  o dynamic panel method (GMM)
  o Da Silva method

This paper uses simulated data to compare these techniques and outline their advantages and disadvantages. The
paper starts with a brief theoretical overview of panel data methods. Several examples are given to demonstrate
these techniques and their implementation in the PANEL procedure. The PANEL procedure is then compared with
other SAS procedures.

PANEL MODELS
In this paper, the term panel refers to data pooled over both cross sections and time periods. Typical examples of panel
data include repeated observations on households, countries, firms, or trade flows. For example, in the case of survey data on
household income, the panel is created by repeatedly surveying the same households in different time periods
(years). The model is typically written in the following form:

$$ y_{it} = \alpha + X_{it}\beta + u_{it}, \qquad i = 1, \ldots, N; \quad t = 1, \ldots, T \tag{1} $$

where $y_{it}$ is the response of household $i$ at time $t$, $u_{it} = \mu_i + v_{it}$, $\mu_i \sim \mathrm{IID}(0, \sigma_\mu^2)$, and $v_{it} \sim \mathrm{IID}(0, \sigma_v^2)$.
Often, the future is correlated with the past. It is easy to imagine that the household's income in time period t is
related to its past realizations (income in time period t–1), which leads to the dynamic model:

$$ y_{it} = \alpha + \delta y_{i,t-1} + X_{it}\beta + u_{it}, \qquad i = 1, \ldots, N; \quad t = 1, \ldots, T \tag{2} $$

Even though the dynamic nature of the model reflects the real relationship between the independent and dependent
variables more accurately, the introduction of the lagged dependent variable can pose a variety of problems. Given
the model structure, both $y_{it}$ and the lagged regressor $y_{i,t-1}$ are functions of $\mu_i$, so the regressor is correlated
with the error term $u_{it}$, and estimation using ordinary least squares (OLS) will result in biased, inefficient, and
inconsistent estimates. As discussed by Baltagi (1995), other models specifically designed for panel data also suffer
from efficiency issues. This paper uses the PANEL procedure to demonstrate issues that might arise with a model that
is dynamic in nature. ODS Graphics plots are used to demonstrate results from various models.
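
The source of the OLS problem can be made explicit with a short derivation. This is only a sketch under assumptions that are not stated in the text: $|\delta| < 1$, a process that has run long enough to be stationary, regressors $X_{it}$ that are uncorrelated with $\mu_i$, and serially uncorrelated $v_{it}$. Substituting Equation (2) into itself repeatedly gives:

```latex
\begin{aligned}
y_{i,t-1} &= \frac{\alpha}{1-\delta}
            + \sum_{s=0}^{\infty} \delta^{s} X_{i,t-1-s}\,\beta
            + \frac{\mu_i}{1-\delta}
            + \sum_{s=0}^{\infty} \delta^{s} v_{i,t-1-s}, \\[4pt]
\operatorname{Cov}\left(y_{i,t-1},\, u_{it}\right)
          &= \operatorname{Cov}\!\left(\frac{\mu_i}{1-\delta},\; \mu_i\right)
           = \frac{\sigma_\mu^2}{1-\delta} \;\neq\; 0 .
\end{aligned}
```

Under these assumptions the covariance between the lagged regressor and the error grows in proportion to $\sigma_\mu^2$, which suggests that the OLS distortion should be more severe when the cross-sectional effects are large.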


DATA GENERATING PROCESS


The following AR(1) panel data generating process (DGP) was adopted from Bond, Bowsher, and Windmeijer (2001).
One hundred cross sections, each containing six time periods, were generated. The DGP can be described as
follows:

$$ y_{it} = \alpha + \delta y_{i,t-1} + X_{it}\beta + u_{it}, \qquad i = 1, \ldots, 100; \quad t = 1, \ldots, 6 \tag{3} $$

where $u_{it} = \mu_i + v_{it}$, $\mu_i \sim N(0, \sigma_\mu^2)$, and $v_{it} \sim N(0, 4)$. The cross-sectional variance $\sigma_\mu^2$ was set to three
different values: 1, 16, and 36. For the first time period in each cross section, the DGP is specified as follows:

$$ y_{i1} = \frac{\mu_i}{1-\delta} + X_{i1}\beta + e_i \tag{4} $$

where $e_i \sim N(0, 4)$. The X vector of explanatory variables contains three variables distributed as
$X_1, \ldots, X_3 \sim N(3, 4)$. In the simulation, the following coefficient assumptions are made:
$\alpha = 0$, $\delta = 0.4$, $\beta_1 = 1$, $\beta_2 = 3$, and $\beta_3 = 5$. The following SAS statements are used to generate data to fit the
model described in Equations (3) and (4):

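data new;
   delta = 0.4;                        /* autoregressive coefficient */
   array x[6];                         /* x1-x3 enter the model; x4-x6 are used later as instruments */
   do i = 1 to 100;                    /* cross sections */
      /* RANNOR returns N(0,1) draws, so each multiplier acts as a standard deviation */
      e_i  = 4*rannor(1234);
      mu_i = 36*rannor(32444);         /* cross-sectional effect */
      do t = 1 to 6;                   /* time periods */
         do k = 1 to 6;
            x[k] = 3 + 4*rannor(58785);
         end;
         /* first period: initial condition, Equation (4) */
         if t = 1 then y = mu_i/(1-delta) + x1 + 3*x2 + 5*x3 + e_i;
         else do;
            /* later periods: dynamic equation, Equation (3) */
            v_it = 4*rannor(34454);
            y = delta*y_t1 + mu_i + x1 + 3*x2 + 5*x3 + v_it;
         end;
         output;
         y_t1 = y;                     /* carry y forward as the next period's lag */
      end;
   end;
run;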
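
The term mu_i/(1-delta) in the first-period equation can be read as the long-run level of the individual effect. As a brief sketch, setting the regressors and shocks aside (a simplification made here for illustration), the contribution $m_i$ of $\mu_i$ to the response settles where one more application of Equation (3) leaves it unchanged:

```latex
m_i = \delta\, m_i + \mu_i
\quad\Longrightarrow\quad
m_i = \frac{\mu_i}{1-\delta},
\qquad\text{and with } \delta = 0.4:\quad m_i = \frac{\mu_i}{0.6} \approx 1.667\,\mu_i .
```

Starting each cross section at this level means that the individual effect is already at its steady state in the first period, rather than building up over the six periods.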

MODEL ESTIMATION
When introduced to a new method, users are likely to compare it with other estimation frameworks and techniques
that are available. In this section, the generated data are used and several different models are estimated. We start
with a simple OLS model that ignores the time series cross-sectional nature of the data by using the REG procedure.

proc reg data = two plot(unpackpanel)=all;
   model y = y_1 x1 x2 x3 /noint;
run;

The model is estimated for three different cross-sectional error specifications, and the new PLOT option is used to
obtain fit diagnostics for residuals (Figure 1).


Figure 1. Distribution of Residuals and Plot of Residuals from OLS Regression (PROC REG)


The residuals do not show any distortions, and the plot of residual values by predicted values for y resembles white
noise. These findings are not surprising since the errors in Equation (3) are set to be homoscedastic. This assumption
will be relaxed later to examine other models with heteroscedastic errors. The results from the OLS regression are
presented in Table 1.

Table 1. OLS Estimates

                        Parameter Estimates (standard errors in parentheses)
Variable   True Value   μi~N(0,1)   μi~N(0,16)   μi~N(0,36)
Y_1        0.4          0.397       0.621        0.839
                        (0.01)      (0.01)       (0.01)
X1         1            0.990       0.328        -0.312
                        (0.05)      (0.14)       (0.19)
X2         3            2.985       2.420        1.808
                        (0.05)      (0.14)       (0.19)
X3         5            5.058       4.617        4.159
                        (0.05)      (0.14)       (0.19)

It can be easily seen that as the cross-sectional effects become stronger, the OLS estimates become more biased.
Using the following statements, one-way fixed- and random-effects models are estimated using the PANEL procedure
for the same three cross-sectional error specifications:

proc panel data = two plot=all;
   id i t;
   model y = y_1 x1 x2 x3 /fixone ranone noint;
run;
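
What the two estimators do to the cross-sectional effect can be sketched with the textbook transformations. The notation is introduced here for illustration: $\bar{y}_i$, $\bar{X}_i$, and $\bar{u}_i$ are cross-section means over time, $\bar{y}_{i,-1}$ is the mean of the lagged response, and $\theta$ is the quasi-demeaning weight. The variance-component estimators that PROC PANEL actually uses may produce a somewhat different weight.

```latex
\begin{aligned}
\text{Fixed effects (within):}\quad
y_{it} - \bar{y}_{i}
  &= \delta\,(y_{i,t-1} - \bar{y}_{i,-1}) + (X_{it} - \bar{X}_{i})\beta + (v_{it} - \bar{v}_{i}), \\[4pt]
\text{Random effects (GLS):}\quad
y_{it} - \theta\,\bar{y}_{i}
  &= (1-\theta)\,\alpha + \delta\,(y_{i,t-1} - \theta\,\bar{y}_{i,-1})
     + (X_{it} - \theta\,\bar{X}_{i})\beta + (u_{it} - \theta\,\bar{u}_{i}), \\[4pt]
\theta &= 1 - \sqrt{\frac{\sigma_v^2}{T\sigma_\mu^2 + \sigma_v^2}} .
\end{aligned}
```

The within transformation removes $\mu_i$ entirely, while the random-effects transformation removes only a fraction $\theta$ of it. Taking $T = 6$ and $\sigma_v^2 = 4$ from the simulation design as an illustration, $\theta \approx 0.37$, $0.80$, and $0.87$ for $\sigma_\mu^2 = 1$, $16$, and $36$, so the random-effects transformation moves toward the fixed-effects one as the cross-sectional effects become stronger.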

The PLOT=ALL option is used to obtain two diagnostic panels to examine the fit of the model. The panels for the one-
way random-effects model are presented in Figures 2 and 3. The first panel was created using all 100 cross sections;
the second panel depicts only the first 10 cross sections.

Figure 2. Fit Diagnostics Panel from One-Way Random-Effects Model (PROC PANEL) – Part I


Figure 3. Fit Diagnostics Panel from One-Way Random-Effects Model (PROC PANEL) – Part II

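There are no visible residual patterns, and the model has an overall good fit. The coefficients presented in Tables 2
and 3 show far less bias than the OLS estimates and are relatively close to the true values. The main exception is the
random-effects estimate of the lagged coefficient, which rises to 0.517 when μi~N(0,36).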

Table 2. One-Way Random-Effects Estimates

                        Parameter Estimates (standard errors in parentheses)
Variable   True Value   μi~N(0,1)   μi~N(0,16)   μi~N(0,36)
Y_1        0.4          0.396       0.416        0.517
                        (0.01)      (0.01)       (0.01)
X1         1            0.996       1.008        1.006
                        (0.05)      (0.06)       (0.09)
X2         3            2.985       2.982        3.004
                        (0.05)      (0.06)       (0.09)
X3         5            5.057       5.073        5.227
                        (0.05)      (0.06)       (0.09)

Table 3. One-Way Fixed-Effects Estimates

                        Parameter Estimates (standard errors in parentheses)
Variable   True Value   μi~N(0,1)   μi~N(0,16)   μi~N(0,36)
Y_1        0.4          0.381       0.395        0.381
                        (0.01)      (0.01)       (0.01)
X1         1            1.001       0.905        1.001
                        (0.05)      (0.06)       (0.05)
X2         3            2.963       3.074        2.963
                        (0.05)      (0.05)       (0.05)
X3         5            5.010       5.222        5.010
                        (0.05)      (0.06)       (0.06)


Comparison of Tables 1, 2, and 3 clearly demonstrates the advantages of the one-way fixed- and random-effects
models over OLS. All three models perform well when cross-sectional effects are small; however, OLS becomes
increasingly biased as the cross-sectional effects become stronger.

Since the model specification in Equation (2) includes a lagged dependent variable, the obtained standard errors in
Tables 1, 2, and 3 are likely to be inefficient. One way to correct for the inefficiencies is to use the GMM estimator
for panel models developed by Arellano and Bond (1991), as follows:

proc panel data = test plot(unpackpanel)=all;
   id i t;
   instrument depvar exogenous = (x4 x5 x6);
   model y = y_1 x1 x2 x3 /gmm twostep maxband=5 nolevels noint;
run;
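
The idea behind the Arellano and Bond estimator can be sketched in three lines. This is a summary of the standard first-difference formulation, assuming serially uncorrelated $v_{it}$; it is not a description of how PROC PANEL builds its instrument matrix internally. Differencing Equation (2) removes $\mu_i$, and levels of the response lagged two or more periods are valid instruments for the differenced equation:

```latex
\begin{aligned}
\Delta y_{it} &= \delta\,\Delta y_{i,t-1} + \Delta X_{it}\,\beta + \Delta v_{it},
  \qquad \Delta y_{it} = y_{it} - y_{i,t-1}, \\[4pt]
E\!\left[\, y_{i,t-s}\,\Delta v_{it} \,\right] &= 0,
  \qquad s \ge 2,\quad t = 3, \ldots, T, \\[4pt]
\#\{\text{moment conditions from lagged } y\}
  &= \sum_{t=3}^{T} (t-2) = \frac{(T-1)(T-2)}{2} = 10 \quad\text{for } T = 6 .
\end{aligned}
```

The count grows quadratically in $T$ even before any other instruments are added, which is why the number of instruments deserves attention in longer panels.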

It can be clearly seen from Equation (2) that the dependent variable y is not exogenous, since its values depend on its
previous realizations. Arellano and Bond (1991) show that the dependent variable can still be used as one of the
instruments if properly lagged. This is accomplished by using the DEPVAR option in the INSTRUMENT statement.
The INSTRUMENT statement can include other variables that are not correlated with the error term. In this model, x4,
x5, and x6 are considered to be purely exogenous and uncorrelated with the error. In other models, it is possible that
future values of available instruments are correlated with the error term but their past and current realizations are not.
The fact that the past and present realizations are not correlated with the error enables us to use them as instruments
with the PREDETERMINED option. The following INSTRUMENT statement is used to describe a model with two
exogenous variables and one predetermined variable:

instrument depvar exogenous = (x4 x5) predetermined = (x6);

The Arellano and Bond method is very useful in dealing with autoregressive data. It is important to realize, however,
that using too many instruments can produce biased parameter estimates and cause computational difficulties since
the weighting matrix becomes very large. In Arellano and Bond's original paper, only the past values of the dependent
variable are used as instruments. In theory, any variables that are not correlated with the error can be used. However,
you have to make sure that the selected instruments are strong and that the model is not misspecified. Inclusion of
unnecessary instruments can be partially prevented with the MAXBAND option. Results of the GMM estimation with
x4, x5, and x6 specified as exogenous variables are presented in Table 4.

Table 4. GMM Estimates

                        Parameter Estimates (standard errors in parentheses)
Variable   True Value   μi~N(0,1)   μi~N(0,16)   μi~N(0,36)
Y_1        0.4          0.396       0.381        0.395
                        (0.01)      (0.01)       (0.01)
X1         1            0.915       1.001        0.908
                        (0.06)      (0.05)       (0.06)
X2         3            3.076       2.963        3.085
                        (0.05)      (0.05)       (0.05)
X3         5            5.221       5.010        5.222
                        (0.06)      (0.06)       (0.07)

As with the one-way random-effects model, the PLOT option can be used to produce graphical output that helps to
diagnose the fit. Sargan and m-tests are also provided to test for overidentification and autocorrelation of residuals.

Another condition that can further complicate the analysis is heteroscedasticity. Heteroscedasticity is simulated by
letting the variance of the error term depend on the lagged response, $v_{it} \sim N(0,\, 5 + 3\,|y_{i,t-1}|)$. The analysis
presented in Figures 1–2 and Tables 1–4 was re-created using the new error specification. Analysis of residuals from
OLS regression in Figure 4 shows the possible presence of heteroscedasticity.


Figure 4. Distribution of Residuals and Plot of Residuals from OLS Regression (PROC REG) –
Heteroscedastic Error


Heteroscedasticity can also be seen from the Q-Q plot presented in Figure 5. Because of the deviation from
normality, you might want to consider a model with heteroscedasticity correction. Graphical output from the one-way
random-effects model presented in Figure 6 also shows the presence of heteroscedasticity. For example, the spread
of residuals is increasing with time on the residual plot in Figure 6. The PANEL procedure offers several ways to
correct for heteroscedasticity. In the one-way fixed- or random-effects model, the HCCME option can be specified.
The heteroscedasticity correction can take four different forms. For the discussion of the heteroscedasticity-consistent
covariance matrix estimator (HCCME), see Davidson and MacKinnon (1993) and MacKinnon and White (1985). If the
one-step GMM option is specified in the PANEL procedure, heteroscedasticity can be corrected by using the
ROBUST option.
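
The two corrections might be requested as in the following sketch. The data set and variable names are those used earlier in the paper, but the specific settings are illustrative assumptions rather than choices reported here: HCCME=1 is just one of the available forms, and the second step assumes that the GMM option performs one-step estimation when TWOSTEP is omitted.

```sas
/* One-way fixed effects with a heteroscedasticity-consistent   */
/* covariance matrix; HCCME=1 is an illustrative choice of form */
proc panel data=two;
   id i t;
   model y = y_1 x1 x2 x3 / fixone noint hccme=1;
run;

/* GMM without TWOSTEP (assumed to give one-step estimates),    */
/* with ROBUST requesting the heteroscedasticity correction     */
proc panel data=test;
   id i t;
   instrument depvar exogenous = (x4 x5 x6);
   model y = y_1 x1 x2 x3 / gmm robust maxband=5 nolevels noint;
run;
```

Comparing the standard errors from these runs with the uncorrected ones is a simple way to judge how much the heteroscedasticity matters for inference.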

Figure 5. Q-Q Plot of Residuals from OLS Regression – Heteroscedastic Error

PANEL PROCEDURE AND SAS


The new PANEL procedure enhances the features that were implemented in the TSCSREG procedure. The new
methods added include between estimators, pooled estimators, and dynamic panel estimators using GMM. The
CLASS statement creates classification variables that are used in the analysis. The FLATDATA statement allows the
data to be in a compressed form. The TEST statement includes new options for Wald, Lagrange multiplier, and
likelihood ratio tests. Since the presence of heteroscedasticity can result in inefficient and biased estimates of the
variance-covariance matrix in the OLS framework, several methods producing heteroscedasticity-consistent
covariance matrices (HCCME) were added. The new RESTRICT statement specifies linear restrictions on the
parameters. The PANEL procedure now produces graphical displays by using ODS Graphics. The new plots include
residual, predicted, and actual value plots, Q-Q plots, histograms, and profile plots. The OUTPUT statement enables
the user to output data and estimates that can be used in other analyses.

It is typically difficult to create lagged variables in the panel setting. If lagged variables are created in a DATA step,
several programming steps including loops are often needed. The PANEL procedure makes creating lagged values
easy by including the LAG statement. The LAG statement, depending on the lag order, can generate a large number
of missing values. The PANEL procedure offers a solution to the loss of potentially useful observations by replacing
the missing values with zeros (ZLAG), the overall mean (XLAG), the time mean (SLAG), or the cross section mean
(CLAG); the LAG statement itself leaves them missing.


Figure 6. Fit Diagnostics Panel from One-Way Random-Effects Model (PROC PANEL) Heteroscedastic Error –
Part I

Figure 7. Fit Diagnostics Panel from One-Way Random-Effects Model (PROC PANEL) Heteroscedastic Error –
Part II


The following SAS statements are used to create lagged values:

proc panel data=new;
   lag y(1) / out=test;
   id i t;
run;
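
For comparison, a DATA step version of the same one-period lag can be sketched as follows. This is a minimal sketch, assuming the data set new holds one row per cross section i and time period t; the output data set names sorted and test_ds are hypothetical and chosen only for this illustration.

```sas
/* Order the rows so that each cross section's periods are contiguous */
proc sort data=new out=sorted;
   by i t;
run;

data test_ds;
   set sorted;
   by i;
   /* LAG returns the value from the previous execution, so it must */
   /* be called on every row rather than inside an IF block         */
   y_1 = lag(y);
   /* The first period of a cross section has no predecessor; without */
   /* this reset it would pick up the last y of the previous section  */
   if first.i then y_1 = .;
run;
```

The reset on first.i is the step that is easy to forget, and higher lag orders need one such guard per lag, which is the bookkeeping that the LAG statement handles automatically.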

Even though the new PANEL procedure represents a collection of powerful analytical and visual tools, it is important
to remember that other procedures available in SAS/ETS and SAS/STAT software can include models that are not
implemented in the PANEL procedure. The LOGISTIC procedure offers fixed-effects models with nonnormal errors in
a panel setting. The NLMIXED procedure offers an implementation of nonlinear fixed- and random-effects models. The
GLIMMIX procedure offers the most complete alternative for both fixed- and random-effects models in linear and
nonlinear settings. Other procedures offer the same types of models. For example, it is possible to fit a two-way
random-effects model by using the MIXED procedure as follows:

proc mixed data=two method=type3;
   class i t;
   model y = x1 x2 x3 /solution;
   random i t;
run;

The same model can be estimated using the PANEL procedure as follows:

proc panel data=two;
   model y = x1 x2 x3 / rantwo vcomp = fb;
   id i t;
run;
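
Both sets of statements target the same two-way random-effects model, which extends Equation (1) with a time effect. The symbols $\lambda_t$ and $\sigma_\lambda^2$ are notation introduced here for illustration:

```latex
y_{it} = \alpha + X_{it}\beta + \mu_i + \lambda_t + v_{it},
\qquad
\mu_i \sim (0, \sigma_\mu^2),\quad
\lambda_t \sim (0, \sigma_\lambda^2),\quad
v_{it} \sim (0, \sigma_v^2) .
```

In the MIXED procedure, the RANDOM statement names i and t as the two random components; in the PANEL procedure, the ID statement identifies them and the RANTWO option requests the two-way random-effects structure. Because the two procedures can estimate the variance components differently, the estimates need not agree exactly.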

Fixed-effects models are typically easy to implement through the use of dummy variables in many SAS procedures.
The random-effects models are more complex and require specialized procedures. Methods available in the PANEL
procedure, along with a list of procedures that handle time series cross-sectional data, are depicted in Figure 8. New
additions that are available only in the PANEL procedure are shown in green.

Figure 8. Panel Data Procedures

[Figure: tree diagram rooted at the PANEL procedure with four branches:
 One-Way Models (Fixed Effect, Random Effect*); Two-Way Models (Fixed Effect, Random Effect*);
 Autoregressive Models (Parks Method, Dynamic Panel); Between Models (Between Groups, Between Time Periods).
 *Random Effect includes the following: Fuller-Battese, Wansbeek and Kapteyn, Wallace and Hussain, Nerlove.]


CONCLUSION
This paper demonstrated the use of the new PANEL procedure in SAS/ETS software. It used simulated data with
known parameter values to show advantages and disadvantages of different methods. Graphical displays produced
using ODS Graphics were used to diagnose the fit of different models and correct for data distortion.

It is no surprise that OLS performed relatively poorly, because it ignores the time series cross-sectional nature of
the data. Using a simulation, it was shown that an appropriate method, such as one-way fixed or random effects, can
correct for the bias in the estimates. If heteroscedasticity is present, the PANEL procedure offers several ways to
correct for it. If the data are dynamic in nature, the PANEL procedure offers the Arellano and Bond GMM method to
regain efficiency. It is important to remember that additional tools not available in the PANEL procedure can be found
in other SAS/ETS and SAS/STAT procedures. For example, the LOGISTIC procedure offers fixed-effects models with
nonnormal errors. Nonlinear models can be estimated using the NLMIXED procedure.

REFERENCES
Arellano, M. and Bond, S. (1991), “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an
Application to Employment Equations,” Review of Economic Studies, 58(2), 277–297.

Baltagi, B. H. (1995), Econometric Analysis of Panel Data, New York: John Wiley and Sons.

Bond, S., Bowsher, C., and Windmeijer, F. (2001), “Criterion-Based Inference for GMM in Autoregressive Panel Data
Models,” Economics Letters, 73(3), 379–388.

Davidson, R. and MacKinnon, J. G. (1993), Estimation and Inference in Econometrics, Oxford: Oxford
University Press.

MacKinnon, J. G. and White, H. (1985), “Some Heteroscedasticity Consistent Covariance Matrix Estimators with
Improved Finite Sample Properties,” Journal of Econometrics, 29, 305–325.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
