
Department of Economics

College of Management and Economics


Visayas State University
ViSCA, Baybay City, Leyte

Application of Regression Analysis Using Stata in Estimating Supply Elasticity of Crude Oil in
United Arab Emirates (UAE)

CHRISTOPHER A. LLONES

Presented to:
Dr. Moises Neil V. Serio

As Partial Requirements in
Econometrics
1st Semester, S.Y. 2015-2016
SEPTEMBER 2015

Table of Contents
Table of Contents
Introduction
Data Collection
Data Analysis
Choice of Model and Variables
Scope and Limitation
Regression Analysis
Summary and Descriptive Statistics of the Model
Modifying and Organizing the Variables
Correlation
Regression
Regression Diagnostic Tests
Test for the Normality of Residuals
Test for Heteroscedasticity
Test for Multicollinearity
Test for Specification Error
Test for Autocorrelation
Prais-Winsten Regression
Findings
References


Introduction
The United Arab Emirates (UAE), located in the Middle East, is a member of the Organization of
the Petroleum Exporting Countries (OPEC). Its large crude oil production makes the UAE the
fourth largest supplier of crude oil, accounting for 12% of total world supply according to
Energy Supply Security 2014 of the International Energy Agency (IEA). Crude oil is a
non-renewable resource found in natural underground reservoirs. No close substitute has yet been
found, which makes demand inelastic on the consumer side; supply is also expected to be quite
inelastic because the oil market is monopolistic in nature.
This paper applies regression analysis to estimate the supply elasticity of crude oil of the
United Arab Emirates (UAE), modeling crude oil export as a function of the real price of crude
oil, crude oil production advanced by 1 year, and crude oil export lagged by 1 year.
Data Collection
The data used in this paper were collected from the websites of OPEC and the World Bank.
Data Analysis
The study used descriptive statistics to describe the data and Stata to conduct the regression
analysis and diagnostic tests in estimating the coefficient of supply elasticity of crude oil.
Choice of Model and Variables
The study used a basic double-log model with lag and lead variables to estimate the
coefficient of elasticity of crude oil. The dependent variable is the annual crude oil export of
the UAE, which represents supply outside the country. The explanatory variables are the real
crude oil price, crude oil production advanced by 1 year, and crude oil export lagged by 1 year.
Export is lagged by a year to account for the effect of the country's previous supply on present
exportation, while production is advanced by a year to account for the exporter's anticipation
of the future oil market, which can affect its present willingness to supply.
Scope and Limitation
The study used annual time-series data from 1960 to 2014 collected from OPEC and the World Bank.
This paper focuses primarily on conducting and applying regression analysis and gives less
elaboration on the implications of the estimates generated from the model. The author
discourages the use of the model for any policy recommendation. The paper's primary
objective is only to apply regression methods using Stata as the statistical software.

Regression Analysis
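The model is based on export of crude oil = f(real oil price, production of crude oil
advanced by 1 year, export of crude oil lagged by 1 year). In double-log form, this can be
expressed as:
$$\ln(\text{CrudeXport}_t) = \beta_0 + \beta_1 \ln(\text{RealOilPrice}_t) + \beta_2 \ln(\text{LeadProd}_t) + \beta_3 \ln(\text{LagXport}_t) + \varepsilon_t$$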
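The reason a double-log model yields an elasticity directly can be illustrated with a minimal Python sketch. The supply curve, its constant of 50 and its elasticity of 0.3 are hypothetical numbers chosen for illustration only and are not taken from the UAE data; the helper ols_slope_intercept is likewise an assumption of this sketch.

```python
import math

def ols_slope_intercept(x, y):
    """Simple OLS of y on a single regressor x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical constant-elasticity supply curve: Q = 50 * P^0.3
true_elasticity = 0.3
prices = [10.0, 15.0, 20.0, 30.0, 45.0, 60.0]
quantities = [50.0 * p ** true_elasticity for p in prices]

# Regress ln(Q) on ln(P): the slope is the elasticity
log_p = [math.log(p) for p in prices]
log_q = [math.log(q) for q in quantities]
intercept, slope = ols_slope_intercept(log_p, log_q)

print(round(slope, 6))                # slope of ln(Q) on ln(P)
print(round(math.exp(intercept), 6))  # scale constant recovered from the intercept
```

Because the slope in logs measures the percent change in quantity per percent change in price, the coefficient on the logged price in the model above can be read as the supply elasticity.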

Summary and Descriptive Statistics of the Model


. summarize

    Variable |  Obs        Mean   Std. Dev.        Min        Max
-------------+-----------------------------------------------------
        year |   55        1987    16.02082       1960       2014
  CrudeXport |   35    7.634224    .3228516   6.975414   8.161945
RealOilPrice |   43    3.024035    .5816263   2.257588    4.23931
   CrudeProd |   53    7.165557    .9962443   2.639057   7.936303

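. misstable summarize, all

                                              Obs<.
                                     +-------------------------------
             |                       |  Unique
    Variable |  Obs=.  Obs>.  Obs<.  |  values        Min        Max
-------------+-----------------------+-------------------------------
        year |                   55  |      55       1960       2014
  CrudeXport |     20            35  |      34   6.975414   8.161945
RealOilPrice |     12            43  |      42   2.257588    4.23931
   CrudeProd |      2            53  |      51   2.639057   7.936303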

The summarize command gives a summary of each variable: the number of observations, the mean,
the standard deviation, and the minimum and maximum values. Based on the minimum and maximum of
the variable year, the data run from 1960 to 2014, which gives 55 observations. A variable with
fewer than 55 observations therefore has missing values. The command misstable summarize, all
generates a table with the number of missing observations (Obs=.) and the number of non-missing
observations (Obs<.). Non-missing values count as less than missing because Stata treats a
missing value as larger than any number. The data therefore have 20, 12 and 2 missing values in
export of crude oil, real oil price and crude production, respectively.
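The counting rule behind Obs=. and Obs<. can be sketched in Python as follows. Using math.inf as a stand-in for Stata's missing value is an assumption of this sketch, and the short export series is hypothetical; only the counts 35, 43 and 53 come from the summary table.

```python
import math

# Stand-in for Stata's missing value ".", which sorts above every number
MISSING = math.inf

# Hypothetical logged export series; MISSING marks years without data
export = [MISSING, MISSING, 7.10, 7.35, MISSING, 7.90]

obs_missing = sum(1 for v in export if v == MISSING)    # analogue of Obs=.
obs_nonmissing = sum(1 for v in export if v < MISSING)  # analogue of Obs<.

# The same bookkeeping applied to the counts in the summarize output
total_years = 2014 - 1960 + 1
non_missing = {"CrudeXport": 35, "RealOilPrice": 43, "CrudeProd": 53}
missing = {name: total_years - n for name, n in non_missing.items()}

print(obs_missing, obs_nonmissing)
print(missing)
```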

Modifying and Organizing the Variables


Since the model is in double-log form, both the explained and the explanatory variables are
transformed into logarithmic form. The command replace variable = ln(variable) puts a variable
in log form without changing its name.

. d CrudeXport RealOilPrice CrudeProd

              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------
CrudeXport      int     %8.0g                 Xport Crude
RealOilPrice    float   %9.0g                 Real Oil Price
CrudeProd       int     %8.0g                 Crude Production

. replace CrudeXport=ln( CrudeXport)
CrudeXport was int now float
(35 real changes made)

. replace RealOilPrice=ln( RealOilPrice)
(43 real changes made)

. replace CrudeProd=ln( CrudeProd)
CrudeProd was int now float
(55 real changes made, 2 to missing)
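A minimal Python sketch of the same log transformation is given below. It assumes that values for which the natural log is undefined (zero or negative) become missing, which would be consistent with the "2 to missing" note above; the raw figures are hypothetical and log_or_missing is a helper invented for this sketch.

```python
import math

def log_or_missing(value):
    """Natural log for positive values; None (missing) otherwise."""
    if value is None or value <= 0:
        return None
    return math.log(value)

# Hypothetical raw production figures; 0 and None stand in for unusable entries
raw = [0, 14, 2800, None]
logged = [log_or_missing(v) for v in raw]

print(logged)
```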

The output of the command d (short for describe) shows the storage type of each variable before
it was transformed. After the log transformation, the storage type of the int variables changes
to float. Since the data are a time series, the command tsset was used to declare them to Stata
as time-series data. This also makes the lag and lead operators available for generating lag and
lead variables, which can only be used once the data are declared as time series.
. tsset year, yearly
        time variable:  year, 1960 to 2014
                delta:  1 year

. gen lagCrudeXport=L1.CrudeXport
(21 missing values generated)

. gen leadCrudeProd=F1.CrudeProd
(2 missing values generated)

The option yearly tells Stata that the time variable is annual. The time-series operators L.
(for lag) and F. (for lead) can then be used. The number 1 means that crude export is lagged by
1 year and crude production is advanced by 1 year. This captures the researcher's hypothesis
that, aside from price, expected production and past exportation affect the amount supplied at
present.
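What the lag and lead operators do can be sketched in Python as follows. The sketch assumes the observations are sorted by year with no gaps between consecutive years; the values and the helper names lag and lead are hypothetical.

```python
def lag(series, k=1):
    """Value from k periods earlier; None where no earlier value exists."""
    return [None] * k + series[:-k]

def lead(series, k=1):
    """Value from k periods later; None where no later value exists."""
    return series[k:] + [None] * k

# Hypothetical yearly values ordered by year; None marks a missing year
years = [2010, 2011, 2012, 2013, 2014]
export = [None, 7.70, 7.75, 7.80, 7.82]

lag_export = lag(export)    # analogue of L1.CrudeXport
lead_export = lead(export)  # analogue of applying F1. to a variable

for row in zip(years, export, lag_export, lead_export):
    print(row)
```

The lagged series has one more missing value than the original because the first year has no predecessor, which is why a lag of a variable with 20 missing values produces 21.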

Correlation
Before regression, it is useful to examine the possible associations among the variables in the
model. The command pwcorr computes pairwise correlations.
. pwcorr CrudeXport RealOilPrice CrudeProd lagCrudeXport leadCrudeProd

             | CrudeX~t RealOi~e CrudeP~d lagCru~t leadCr~d
-------------+---------------------------------------------
  CrudeXport |   1.0000
RealOilPrice |   0.7437   1.0000
   CrudeProd |   0.9844   0.6701   1.0000
lagCrudeXp~t |   0.9551   0.7250   0.9681   1.0000
leadCrudeP~d |   0.9681   0.6385   0.9855   0.9168   1.0000

The pairwise correlations show that all the independent variables have a positive association
with the dependent variable. The associations with the production and lagged export variables
are very strong (coefficients above 0.95), while the association with real oil price is more
moderate (0.74).
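A pairwise correlation uses, for each pair of variables, only the observations where both are present. A minimal Python sketch of that idea, with hypothetical series and a helper pairwise_corr invented for this sketch:

```python
import math

def pairwise_corr(x, y):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical logged series with gaps (None = missing)
x = [None, 1.0, 2.0, 3.0, 4.0, None]
y = [0.5, 2.1, 3.9, 6.2, 7.8, 9.0]

r = pairwise_corr(x, y)  # computed from the four complete pairs only
print(round(r, 4))
```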

Regression
Using the basic double-log form of the model:
$$\ln(\text{CrudeXport}_t) = \beta_0 + \beta_1 \ln(\text{RealOilPrice}_t) + \beta_2 \ln(\text{LeadProd}_t) + \beta_3 \ln(\text{LagXport}_t) + \varepsilon_t$$
The dependent variable is regressed on the explanatory variables using the regress command in
Stata.
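. regress CrudeXport RealOilPrice leadCrudeProd lagCrudeXport

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  3,    29) =  320.71
       Model |  3.20418579     3  1.06806193           Prob > F      =  0.0000
    Residual |  .096579859    29   .00333034           R-squared     =  0.9707
-------------+------------------------------           Adj R-squared =  0.9677
       Total |  3.30076565    32  .103148927           Root MSE      =  .05771

------------------------------------------------------------------------------
  CrudeXport |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RealOilPrice |   .0580351   .0258407     2.25   0.032      .005185    .1108853
leadCrudeP~d |   .6313507   .0851185     7.42   0.000     .4572639    .8054375
lagCrudeXp~t |    .357686   .0882998     4.05   0.000     .1770926    .5382794
       _cons |  -.0604724    .309896    -0.20   0.847    -.6942809     .573336
------------------------------------------------------------------------------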

Based on the F-test, which measures the overall significance of the model, the model is
significant at the 1% level since the p-value is less than 0.01. This implies that at least one
of the explanatory variables helps explain the variability in crude export. Based on the
R-squared, the explanatory variables explain about 97% of the variability in the exportation of
crude oil. Based on the t-tests, which test whether each explanatory variable has a linear
relationship with the dependent variable, RealOilPrice is significant at the 5% level while
leadCrudeProd and lagCrudeXport are significant at the 1% level. However, before making
inferences from these results, the model must undergo diagnostic tests to determine whether the
coefficients are unbiased and the p-values are valid.
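The summary statistics in the regression output follow from the sums of squares and the coefficient table. The Python sketch below recomputes them as a check; all inputs are copied from the output above, and the variable names are chosen for this sketch only.

```python
import math

# Values copied from the regress output
ss_model, ss_resid = 3.20418579, 0.096579859
df_model, df_resid = 3, 29
n_obs = 33

f_stat = (ss_model / df_model) / (ss_resid / df_resid)          # overall F-test
r_squared = ss_model / (ss_model + ss_resid)                    # share of variation explained
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid    # penalized for regressors
root_mse = math.sqrt(ss_resid / df_resid)                       # standard error of the regression

# Coefficient and standard error pairs from the coefficient table
coefs = {
    "RealOilPrice": (0.0580351, 0.0258407),
    "leadCrudeProd": (0.6313507, 0.0851185),
    "lagCrudeXport": (0.357686, 0.0882998),
    "_cons": (-0.0604724, 0.309896),
}
t_stats = {name: b / se for name, (b, se) in coefs.items()}     # t = coefficient / std. error

print(round(f_stat, 2), round(r_squared, 4), round(adj_r_squared, 4), round(root_mse, 5))
print({name: round(t, 2) for name, t in t_stats.items()})
```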

Regression Diagnostic Tests


Test for the Normality of Residuals
Normality of the residuals is not required to obtain unbiased estimates of the regression
coefficients. However, normality is required for valid hypothesis testing; that is, the
normality assumption assures that the p-values for the t-test and F-test are valid. A graphical
approach can be used to check the normality of the residuals using the commands kdensity and
pnorm, where kdensity stands for kernel density estimate. After regression, the command predict
creates a variable for the residuals.

. predict r, residual
(22 missing values generated)

. pnorm r

. kdensity r, normal

Using the kernel density estimate and the normal probability plot (pnorm), the residuals show a
slight deviation from normal. Nonetheless, the residuals are quite close to a normal
distribution. For a clearer result on whether the residuals are normally distributed, the
Shapiro-Wilk test for normality is run with the command swilk.
. swilk r

                   Shapiro-Wilk W test for normal data

    Variable |    Obs          W          V          z     Prob>z
-------------+-----------------------------------------------------
           r |     33    0.97667      0.796     -0.473    0.68200

The null hypothesis of the Shapiro-Wilk test is that the distribution is normal. With a p-value
of 0.682, the test fails to reject the null hypothesis, so there is no evidence that the
residuals depart from a normal distribution.

Test for Heteroscedasticity


According to Wooldridge (2009), one of the main assumptions of Ordinary Least Squares (OLS)
regression is the homogeneity of the variance of the residuals. If the variance of the residuals
is not homogeneous, the residuals are heteroscedastic. The presence of heteroscedasticity does
not affect the unbiasedness of the estimates but can invalidate inference. The command rvfplot,
yline(0) plots the residuals against the fitted values as a graphical check for
heteroscedasticity.
. rvfplot, yline(0)

It is quite difficult to trace a pattern of heteroscedasticity in the plot since there are not
enough points to establish a clear pattern. However, it can be roughly judged that
heteroscedasticity is not present since the spread of the data points does not clearly narrow
toward the right. The White test and the Breusch-Pagan test give a formal conclusion based on
the p-value. The commands estat imtest and estat hettest are used for the White test and the
Breusch-Pagan test, respectively.
. estat imtest

Cameron & Trivedi's decomposition of IM-test

              Source |       chi2     df        p
---------------------+-----------------------------
  Heteroskedasticity |       7.90      9   0.5440
            Skewness |       2.00      3   0.5723
            Kurtosis |       0.85      1   0.3552
---------------------+-----------------------------
               Total |      10.76     13   0.6311

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of CrudeXport

         chi2(1)      =     1.19
         Prob > chi2  =   0.2756

Both tests have the null hypothesis that the variance of the residuals is homogeneous
(homoscedastic). Since the p-values of both tests are large and not significant, the tests fail
to reject the null hypothesis, so there is no evidence of heteroscedasticity. Had the tests
rejected the null hypothesis, the option vce(robust) would be used in the regression to obtain
standard errors adjusted for the presence of heteroscedasticity.
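The p-value of the Breusch-Pagan test can be recovered from its chi-square statistic. The Python sketch below uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability can be written with the complementary error function. The function name chi2_sf_1df is an assumption of this sketch, and the small gap from the reported 0.2756 comes from the statistic being rounded to 1.19 in the output.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

bp_chi2 = 1.19                 # Breusch-Pagan statistic from estat hettest
bp_p = chi2_sf_1df(bp_chi2)    # p-value under the null of constant variance
reject_at_5pct = bp_p < 0.05   # decision rule at the 5% level

print(round(bp_p, 3), reject_at_5pct)
```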

Test for Multicollinearity


The problem of multicollinearity exists when independent variables are linear or near-linear
combinations of other independent variables. As the degree of multicollinearity increases, the
regression estimates of the coefficients become unstable and the standard errors of the
coefficients can get wildly inflated (Chen et al., 2003). Using the command vif after
regression, Stata reports variance inflation factors that help detect multicollinearity among
the variables.
. vif

    Variable |       VIF       1/VIF
-------------+----------------------
lagCrudeXp~t |      7.09    0.140980
leadCrudeP~d |      6.27    0.159400
RealOilPrice |      1.96    0.510924
-------------+----------------------
    Mean VIF |      5.11

As a rule of thumb, a VIF greater than 10, or a tolerance (1/VIF) less than 0.1, indicates the
presence of multicollinearity. The variance inflation factors (VIF) and tolerances here are
within these limits, so no variable is a near-perfect linear combination of the others; that is,
the variables are not capturing the same thing.
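Each VIF can be read as 1/(1 - R-squared), where the R-squared comes from regressing that explanatory variable on the other explanatory variables. The Python sketch below applies this relation and the rule of thumb to the reported VIF values; the variable names are chosen for this sketch.

```python
# VIF values copied from the vif output
vif = {"lagCrudeXport": 7.09, "leadCrudeProd": 6.27, "RealOilPrice": 1.96}

tolerance = {name: 1.0 / v for name, v in vif.items()}       # the 1/VIF column
aux_r2 = {name: 1.0 - t for name, t in tolerance.items()}    # R-squared on the other regressors
flagged = [name for name, v in vif.items()
           if v > 10 or tolerance[name] < 0.1]                # rule-of-thumb check
mean_vif = sum(vif.values()) / len(vif)

for name in vif:
    print(name, round(tolerance[name], 3), round(aux_r2[name], 3))
print(round(mean_vif, 2), flagged)
```

A VIF of 7.09 corresponds to an auxiliary R-squared of about 0.86, so the lagged export variable is highly but not perfectly explained by the other regressors.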


Test for Specification Error


A model specification error occurs when one or more relevant variables are omitted from the
model or one or more irrelevant variables are included in the model (Chen et al., 2003). The
linktest command checks the specification: if the model is specified correctly, the test should
not find any additional significant independent variable except by chance.
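. linktest

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  2,    30) =  498.11
       Model |  3.20427346     2  1.60213673           Prob > F      =  0.0000
    Residual |   .09649219    30  .003216406           R-squared     =  0.9708
-------------+------------------------------           Adj R-squared =  0.9688
       Total |  3.30076565    32  .103148927           Root MSE      =  .05671

------------------------------------------------------------------------------
  CrudeXport |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   .7272369   1.651505     0.44   0.663    -2.645586     4.10006
      _hatsq |   .0181491   .1098676     0.17   0.870    -.2062304    .2425286
       _cons |   1.022852   6.196664     0.17   0.870    -11.63242    13.67813
------------------------------------------------------------------------------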

The test creates two variables, _hat and _hatsq. _hat is the variable of prediction and _hatsq
is the variable of the squared prediction. The primary concern in this test is _hatsq, which
should not be significant if the model is specified correctly. Based on the link test, _hatsq is
not significant (p = 0.870), so the test fails to reject the assumption that the model is
specified correctly.
The ovtest command tests whether the model has omitted a variable that is essential and should
have been included.
. ovtest

Ramsey RESET test using powers of the fitted values of CrudeXport
       Ho:  model has no omitted variables
                 F(3, 26) =      0.82
                 Prob > F =      0.4969

The test fails to reject the null hypothesis that the model has no omitted variables, so there
is no evidence of omitted variables in the model.

Test for Autocorrelation
Lastly, a model that uses lag and lead variables in time-series data is prone to
autocorrelation. The commands estat bgodfrey (Breusch-Godfrey test) and estat dwatson
(Durbin-Watson test) test for serial correlation of the error term or disturbance.
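. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          4.867               1                   0.0274
---------------------------------------------------------------------------
                        H0: no serial correlation

. estat dwatson

Durbin-Watson d-statistic(  4,    33) =  2.55771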

The Breusch-Godfrey test rejects the null hypothesis of no serial correlation at the 5% level
(p = 0.0274), and the Durbin-Watson d-statistic of 2.56 lies above 2, which points to negative
serial correlation. The error term is therefore treated as serially correlated.

Prais-Winsten Regression
Serial correlation can be handled with the Prais-Winsten or Cochrane-Orcutt regression.
According to the Stata manual, prais uses the generalized least-squares method to estimate the
parameters in a linear regression model in which the errors are serially correlated.
Specifically, the errors are assumed to follow a first-order autoregressive process. Using the
Prais-Winsten regression, new estimates of the coefficients, adjusted for autocorrelation, can
be obtained.
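The transformation behind a Prais-Winsten regression can be sketched in Python as follows. This is a minimal illustration of the quasi-differencing step for a given rho, not a reproduction of Stata's prais routine, which also estimates rho iteratively. The data values and the helper name prais_winsten_transform are hypothetical, and rho is fixed at an illustrative value.

```python
import math

def prais_winsten_transform(series, rho):
    """Quasi-difference a series for AR(1) errors, keeping the first observation."""
    first = math.sqrt(1.0 - rho ** 2) * series[0]   # rescaled first observation
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest

rho = -0.3064                 # illustrative AR(1) coefficient
y = [7.0, 7.2, 7.1, 7.4]      # hypothetical dependent variable
x = [3.0, 3.1, 3.3, 3.2]      # hypothetical regressor
const = [1.0] * len(y)        # the intercept column is transformed as well

y_star = prais_winsten_transform(y, rho)
x_star = prais_winsten_transform(x, rho)
const_star = prais_winsten_transform(const, rho)

print([round(v, 4) for v in y_star])
print([round(v, 4) for v in const_star])
```

Running OLS of y_star on const_star and x_star (without an extra intercept) would give the generalized least-squares estimates for that rho. The Cochrane-Orcutt variant differs in dropping the first observation instead of rescaling it.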
. prais CrudeXport RealOilPrice lagCrudeXport leadCrudeProd, rhotype(theil)

Iteration 0:  rho = 0.0000
Iteration 1:  rho = -0.2901
Iteration 2:  rho = -0.3060
Iteration 3:  rho = -0.3064
Iteration 4:  rho = -0.3064
Iteration 5:  rho = -0.3064

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =      33
-------------+------------------------------           F(  3,    29) = 1611.70
       Model |  14.0513271     3   4.6837757           Prob > F      =  0.0000
    Residual |  .084277236    29  .002906112           R-squared     =  0.9940
-------------+------------------------------           Adj R-squared =  0.9934
       Total |  14.1356043    32  .441737635           Root MSE      =  .05391

------------------------------------------------------------------------------
  CrudeXport |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RealOilPrice |   .0497158   .0190028     2.62   0.014     .0108507    .0885809
lagCrudeXp~t |   .3543518   .0688691     5.15   0.000     .2134987    .4952049
leadCrudeP~d |   .6477729   .0662351     9.78   0.000     .5123069    .7832388
       _cons |  -.1344191   .2252445    -0.60   0.555    -.5950959    .3262576
-------------+----------------------------------------------------------------
         rho |  -.3063733
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    2.557710
Durbin-Watson statistic (transformed) 2.127848
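The reported statistics can be cross-checked with two standard relations: the Durbin-Watson statistic is approximately 2(1 - rho), and the Breusch-Godfrey statistic with one lag is compared against a chi-square distribution with 1 degree of freedom. The Python sketch below applies both to the numbers in the output; it is a rough consistency check under these approximations, not part of the estimation.

```python
import math

dw_original = 2.557710     # Durbin-Watson statistic before the transformation
dw_transformed = 2.127848  # Durbin-Watson statistic after the transformation
rho_final = -0.3063733     # rho reported by prais
bg_chi2 = 4.867            # Breusch-Godfrey statistic with 1 lag

# Rough first-order autocorrelation implied by d being approximately 2 * (1 - rho)
rho_from_dw = 1.0 - dw_original / 2.0

# Upper-tail probability of a chi-square variable with 1 degree of freedom
bg_p = math.erfc(math.sqrt(bg_chi2 / 2.0))

# The transformed statistic should sit closer to the benchmark value of 2
closer_to_two = abs(dw_transformed - 2.0) < abs(dw_original - 2.0)

print(round(rho_from_dw, 4), rho_final)
print(round(bg_p, 4), closer_to_two)
```

The implied rho is negative and of similar size to the estimated one, and the transformed Durbin-Watson statistic moves toward 2, which is what the adjustment is meant to achieve.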

In summary, the model passed the tests for normality of residuals, heteroscedasticity,
multicollinearity, and specification error (linktest and omitted-variable test). The model
failed the test for autocorrelation, so new estimates of the coefficients, adjusted for the
presence of autocorrelation, were obtained using the Prais-Winsten regression.

Findings
The p-value of the F-test is less than 0.01, so the model is significant at the 1% level. There
is evidence that at least one of the explanatory variables helps explain the variability of the
crude oil export of the UAE. Based on the R-squared, the independent variables explain about 99%
of the variability of crude oil export. Furthermore, based on the p-values of the t-tests, all
the explanatory variables are significant: RealOilPrice at the 5% level, and lagCrudeXport and
leadCrudeProd at the 1% level. Since the model has undergone diagnostic tests and the estimates
of the coefficients are adjusted for the presence of autocorrelation, inferences can now be made
from the estimates of the coefficients.
Based on the classical law of supply, real oil price is expected to have a positive sign, as are
future production of crude oil and past exportation of the country. The coefficient of
RealOilPrice is the supply elasticity of crude oil for the United Arab Emirates. The elasticity
of supply of 0.0497 is inelastic, which means that exportation of crude oil is not very
responsive to changes in the price of crude oil. Crude oil has no close substitute, so the
demand for this good is inelastic. The United Arab Emirates is a member of the Organization of
the Petroleum Exporting Countries (OPEC), which has the power of a monopoly to set prices; this
coincides with the estimate that the supply of crude oil is inelastic. A 1% decrease in price
would decrease exportation by only about 0.05%, since prices are dictated by OPEC itself. A 1%
increase in the previous year's exportation of crude oil of the UAE increases its present
exportation by 0.354%. The percentage increase at present is lower than the previous one because
sellers keep supply at a low level to maintain a higher price level, and also because crude oil
is a non-renewable resource. Lastly, if anticipated production increases by 1%, exportation of
crude oil increases by about 0.65%. The explanation is quite straightforward: when production of
the good increases, sellers have more to supply.
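The percentage interpretations above can be worked out directly from the Prais-Winsten coefficients. The Python sketch below computes the exact percent change in exports implied by a log-log coefficient for a given percent change in an explanatory variable, holding the other variables fixed; the function name pct_change_in_exports is an assumption of this sketch.

```python
def pct_change_in_exports(elasticity, pct_change_in_x):
    """Exact percent change in exports implied by a log-log coefficient."""
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0

# Coefficients copied from the Prais-Winsten output
price_elasticity = 0.0497158
lag_export_coef = 0.3543518
lead_prod_coef = 0.6477729

for_one_pct_price_rise = pct_change_in_exports(price_elasticity, 1.0)
for_ten_pct_price_rise = pct_change_in_exports(price_elasticity, 10.0)
for_one_pct_lag_export = pct_change_in_exports(lag_export_coef, 1.0)
for_one_pct_lead_prod = pct_change_in_exports(lead_prod_coef, 1.0)

print(round(for_one_pct_price_rise, 4), round(for_ten_pct_price_rise, 4))
print(round(for_one_pct_lag_export, 4), round(for_one_pct_lead_prod, 4))
```

For small changes the result is close to the coefficient times the percent change, which is why the coefficients can be read as percent responses to a 1% change.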


References
Chen, X., Ender, P., Mitchell, M. and Wells, C. (2003). Regression with Stata. Retrieved from
http://www.ats.ucla.edu/stat/stata/webbooks/reg/default.htm
International Energy Agency (IEA) (2014). Energy Supply Security 2014: Emergency Response of
IEA Countries, pp. 502-510.
Organization of the Petroleum Exporting Countries (OPEC) (2015). OPEC Annual Statistical
Bulletin, 50th Edition.
Wooldridge, J. (2009). Introductory Econometrics, Fourth Edition, pp. 339-435.