

Demand Estimation by Using Regression Analysis



Regression analysis is a statistical method used to establish the relationship between a
variable (the dependent variable) and the other factors that affect it (the independent variables).

This relationship can be expressed as a functional form:

$Q = a_0 + a_1 A + a_2 B + a_3 C$

Demand estimation for a product or service using regression analysis is important in the
business world, especially to corporate executives and managers, because it enables
them to make reasonable forecasts for their goods and services in the near future. The
manager can narrow down the factors that are important in influencing sales and
thereby formulate appropriate strategies or policies to achieve management objectives.

The actual process of regression analysis can be very complex, but it can be summarized
in FOUR important steps:
1. Model specification: set the objective and identify the important variables which
influence the dependent variable.
2. Data collection: collect data for all the variables specified.
3. Choice of a functional form, e.g. linear or non-linear.
4. Estimation and interpretation of results.

1. Model Specification
If we want to study the factors affecting the demand for automobiles (Qx) in the country, we
must identify the most important variables that are believed to affect that demand, e.g.
a) Price of the automobile (Px)
b) Per capita income (Yc)
c) Size of the working population (L)
d) Rate of interest (I), etc.
Qx = f(Px, Yc, L, I)



2. Data Collection on the Variables
There are 2 types of data:
a) Time-series data
Data are collected for each variable over time (yearly, quarterly, monthly, daily,
etc.).

b) Cross-sectional data
Data are collected for the same time period but from different sections or geographical
areas of the society.

The type of data to be used depends on the availability of data:
a) Primary data: data collected from the field through market surveys, sampling,
etc.

b) Secondary data: data published by a relevant authority, such as the
Statistical Department, Economic Reports, etc.

3. Specifying the Form of the Equation
i) The simplest model to deal with, and the one which is often also the most realistic,
is the linear model.

e.g. $Q_x = a_0 + a_1 P_x + a_2 Y_c + a_3 L + a_4 I + \dots + e$

where $a_0, a_1, \dots, a_4$ are parameters (coefficients) to be estimated
and $e$ is the disturbance term or error term.

ii) Non-linear model
Sometimes a non-linear form may fit the data better than a linear equation.

$Q_x = a_0 P_x^{a_1} Y_c^{a_2} L^{a_3} I^{a_4}$ (Power Function)
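
A brief worked step, added for illustration: taking natural logarithms of both sides of the power function gives an equation that is linear in the parameters, so it can still be estimated with the same linear regression technique (this assumes every variable takes strictly positive values).

```latex
\ln Q_x = \ln a_0 + a_1 \ln P_x + a_2 \ln Y_c + a_3 \ln L + a_4 \ln I
```

In this form each exponent $a_i$ is the coefficient on the logarithm of its variable and can be read as the elasticity of demand with respect to that variable.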

4. Testing the (Econometric) Results
To evaluate the regression results, several statistics are examined:
a) The sign of each estimated coefficient, which must be checked to see if it conforms to
what is expected on theoretical grounds
b) The coefficient of determination, $R^2$
c) The t-tests (on each coefficient)
d) The Durbin-Watson statistic, etc.
e) The F-statistic (F-stat)

Note: The statistical procedure for solving multiple regression problems can be very
complicated. Fortunately, many computer software packages are available for this purpose,
e.g. TSP (Time-Series Processor) or SPSS.
REGRESSION ANALYSIS

Regression analysis describes the way in which one variable is related to another. It derives
an equation that can be used to estimate the unknown values of one variable on the basis of
known values of another variable.

(a) Simple Regression Analysis

Y = a + bX, where Y is sales volume and X is advertising expenditure
Example 1
(Taken from ECO556 Manual Table 4.1, page 136 )

Year    Sales (Y)            Advertising Expenditure (X)
        (million dollars)    (million dollars)
1997    44                   10
1998    58                   13
1999    48                   11
2000    46                   12
2001    42                   11
2002    60                   15
2003    52                   12
2004    54                   13
2005    56                   14
2006    40                   9

The results from the computer printout:

LS // Dependent variable is SAL
SMPL range: 1997 - 2006
Number of observations: 10

Variable    Coefficient    Std. Error    T-Stat (computed t value)    2-Tail Sig.
C           7.6000000      6.332345      1.2001912                    0.264
ADV         3.5333333      0.5222813     6.751919                     0.000

R-squared             0.851212    Mean of dependent var    50.00000
Adjusted R-squared    0.832614    S.D. of dependent var    6.992059
S.E. of regression    2.860653    Sum of squared resid     65.46667
Durbin-Watson stat    1.224915    F-statistic              45.76782
Log likelihood        -23.58417

$\hat{Y} = \hat{a} + \hat{b}X$

=> $\hat{Y} = 7.60 + 3.53X$
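
The intercept and slope in the printout can be reproduced directly from the Table 4.1 data with the least-squares formulas. The following is a minimal sketch in plain Python, not the TSP or SPSS procedure itself; the function name `simple_ols` is illustrative and does not come from the manual.

```python
# Data from Example 1 (Table 4.1), in million dollars.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared X deviations.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


a_hat, b_hat = simple_ols(advertising, sales)
print(f"Y_hat = {a_hat:.2f} + {b_hat:.2f} X")
```

The estimates agree with the C and ADV coefficients reported in the printout.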
(b) Multiple Regression Analysis

$Y = a_1 + b_1 X_1 + b_2 X_2$

where Y is sales volume and $a_1$ is the intercept,
$X_1$ is advertising expenditure and $b_1 = \Delta Y / \Delta X_1$ is the marginal effect of advertising on sales,
$X_2$ is the price of the product and $b_2 = \Delta Y / \Delta X_2$ is the marginal effect of price on sales.

Example 2
(Taken from ECO556 Manual Table 4.3, page 141 )

Year    Sales (Y)            Advertising Expenditure (X1)    Price (X2)
        (million dollars)    (million dollars)               (million dollars)
1997    44                   10                              1
1998    58                   13                              1.2
1999    48                   11                              2
2000    46                   12                              1.8
2001    42                   11                              2.1
2002    60                   15                              0.8
2003    52                   12                              1.4
2004    54                   13                              2.0
2005    56                   14                              1.5
2006    40                   9                               1.0

The results from the computer printout:

LS // Dependent variable is SAL
SMPL range: 1997 - 2006
Number of observations: 10

Variable    Coefficient    Std. Error    T-Stat (computed t value)    2-Tail Sig.
C           11.60403       6.9633945     1.6665152                    0.140
ADV         3.4936051      0.5078770     6.8788413                    0.000
P           -2.3836921     1.9495316     -1.2226999                   0.261

R-squared             0.877397    Mean of dependent var    50.00000
Adjusted R-squared    0.842367    S.D. of dependent var    6.992059
S.E. of regression    2.776058    Sum of squared resid     53.94549
Durbin-Watson stat    1.41        F-statistic              25.04734

$\hat{Y} = \hat{a}_1 + \hat{b}_1 X_1 + \hat{b}_2 X_2$

=> $\hat{Y} = 11.60 + 3.49X_1 - 2.38X_2$
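
With two independent variables, the coefficients come from solving two normal equations written in deviations from the means. The sketch below does this by hand in plain Python using the Table 4.3 data; it is one possible way to reproduce the estimates, not the routine the statistical package uses, and the helper names (`deviations`, `dot`, `ols_two_regressors`) are illustrative.

```python
# Data from Example 2 (Table 4.3).
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]            # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]       # X1
price = [1, 1.2, 2, 1.8, 2.1, 0.8, 1.4, 2.0, 1.5, 1.0]      # X2


def mean(values):
    return sum(values) / len(values)


def deviations(values):
    """Deviations of each value from the sample mean."""
    m = mean(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols_two_regressors(x1, x2, y):
    """Return (a1, b1, b2) for the fitted plane y = a1 + b1*x1 + b2*x2."""
    d1, d2, dy = deviations(x1), deviations(x2), deviations(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    # Solve the 2x2 system of normal equations by Cramer's rule.
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a1 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a1, b1, b2


a1_hat, b1_hat, b2_hat = ols_two_regressors(advertising, price, sales)
print(f"Y_hat = {a1_hat:.2f} + {b1_hat:.2f} X1 + ({b2_hat:.2f}) X2")
```

Rounded to two decimal places, the three estimates match the fitted equation above.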

Evaluation of Results (Computer Printouts)

These are the important statistical results that should be interpreted:

a. The sign of each estimated coefficient
b. Coefficient of determination ($R^2$)
c. Standard error of estimate (SEE)
d. The t-statistics (t-stats)
e. The F-statistic (F-stat)

Interpretation:
a. The sign of each estimated coefficient must be checked to see if it conforms to what is
expected on theoretical grounds.

From Example 1: $\hat{Y} = 7.6 + 3.53X$

The estimated coefficient on advertising is positive (+3.53), so it conforms to
economic theory. Since both variables are measured in million dollars, each additional
$1 million spent on advertising (X) is associated with an increase in sales (Y) of
$3.53 million.

b. Coefficient of determination ($R^2$)
The value of $R^2$ ranges from 0 to 1.

$R^2 = 0$: none of the variation in the dependent variable is explained by the
independent variables.
$R^2 = 1$: all of the variation in the dependent variable is explained by the
variation in the independent variables.
$R^2 = 0.85$ (Example 1): 85% of the variation in the dependent variable (sales) is
explained by the variation in the independent variable, advertising expenditure. The other
15% cannot be explained by the regression. This may be due to the
omission of some important independent variables.
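
To show where the R-squared figure comes from, the sketch below computes it for Example 1 as one minus the ratio of unexplained to total variation. It also computes the adjusted R-squared shown in the printout, using the standard degrees-of-freedom correction, which the notes do not spell out and is stated here as an assumption. Variable names are illustrative.

```python
# Data from Example 1 (Table 4.1), in million dollars.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X

n = len(sales)   # number of observations
k = 2            # coefficients estimated: intercept and slope

# Fit the simple regression line by least squares.
x_mean = sum(advertising) / n
y_mean = sum(sales) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(advertising, sales))
         / sum((x - x_mean) ** 2 for x in advertising))
intercept = y_mean - slope * x_mean

fitted = [intercept + slope * x for x in advertising]
ss_resid = sum((y - f) ** 2 for y, f in zip(sales, fitted))   # unexplained variation
ss_total = sum((y - y_mean) ** 2 for y in sales)              # total variation

r_squared = 1 - ss_resid / ss_total
# Assumed standard correction: penalise R-squared for the number of coefficients.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)

print(f"R-squared = {r_squared:.6f}, adjusted R-squared = {adj_r_squared:.6f}")
```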





c. Standard error of estimate (SEE)
It is a measure of the dispersion of the data points around the line of best fit (regression line).
Actual points do not lie exactly on the regression line but are dispersed above and below
it. Thus, a value predicted by the regression line is subject to error, and the SEE
measures the probable size of that error.

For example, in Table 4.1, when advertising expenditure is $9 million, actual sales are
$40 million. The regression equation predicts sales of 7.60 + 3.53(9) = $39.37 million, so the
predicted value contains an error.
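The standard error of estimate can be calculated using the following formula, where
$Y_t$ is the actual value and $\hat{Y}_t$ is the value predicted by the regression:

$$SEE = \sqrt{\frac{\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}{n - k}}$$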
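
As an illustration of the formula, the sketch below computes the SEE for Example 1 from the Table 4.1 data. It assumes k = 2 (intercept and slope); the function names `fit_line` and `standard_error_of_estimate` are illustrative, not from the manual.

```python
import math

# Data from Example 1 (Table 4.1), in million dollars.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
    return y_mean - slope * x_mean, slope


def standard_error_of_estimate(actual, predicted, k):
    """Square root of the sum of squared residuals divided by n - k."""
    n = len(actual)
    ss_resid = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(ss_resid / (n - k))


a_hat, b_hat = fit_line(advertising, sales)
predicted_sales = [a_hat + b_hat * x for x in advertising]
see = standard_error_of_estimate(sales, predicted_sales, k=2)
print(f"SEE = {see:.6f}")
```

The result corresponds to the "S.E. of regression" line in the Example 1 printout.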

The SEE is useful for estimating the range within which the dependent variable will lie at a
specified probability. At 95% probability, the dependent variable will lie in the
predicted interval:

$$\hat{Y} \pm t_{n-k} \cdot SEE$$

where $\hat{Y}$ is the predicted value of the dependent variable based on the regression,
$n - k$ is the degrees of freedom (df) used to look up the critical value of
Student's t distribution, n is the number of observations and k is the number of
coefficients estimated.







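Example:
SEE = 2.86. For the 95% confidence interval of sales when advertising expenditure (X) = 9,
the predicted value is 39.37 and the critical value for n - k = 8 d.f. is 2.306, so

$$\hat{Y} \pm t_{n-k} \cdot SEE = 39.37 \pm (2.306)(2.86) = 39.37 \pm 6.595$$

Thus, at the 95% confidence level, when advertising expenditure is $9 million, sales are
expected to range from $32.78 million to $45.97 million.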
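
The interval arithmetic can be checked with a few lines of Python. This sketch follows the simplified interval used in these notes (predicted value plus or minus critical t times SEE) and takes the rounded inputs quoted in the example; the function name `prediction_interval` is illustrative.

```python
def prediction_interval(y_hat, t_critical, see):
    """Return (lower, upper) bounds of y_hat +/- t_critical * SEE."""
    margin = t_critical * see
    return y_hat - margin, y_hat + margin


# Inputs as quoted in the notes: predicted sales at X = 9, critical t for
# 8 degrees of freedom at the 95% level, and the SEE from the printout.
lower, upper = prediction_interval(y_hat=39.37, t_critical=2.306, see=2.86)
print(f"Sales expected between {lower:.2f} and {upper:.2f} million dollars")
```

Any difference from the figures in the example in the last decimal place comes only from rounding the margin before subtracting.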

d. T-Statistics
The t-statistic is used in a t-test to determine whether there is a significant relationship between
the dependent variable and each of the independent variables. To do this test, we need the standard
error of the coefficient ($S_b$) to calculate the t value. We then compare the calculated t value
with the critical t value from the Student's t distribution table.

The t value is calculated by dividing the value of the coefficient (b) by $S_b$:

$$\text{Calculated } t = \frac{b}{S_b}$$

i.e. Calculated t = 3.53 / 0.52 = 6.79

To find the critical value from the Student's t distribution table:
n - k = 10 - 2 = 8 d.f.; at the 95% confidence level (two-tailed), critical t = 2.306.
Since computed t (6.79) > critical t (2.306), advertising expenditure is statistically significant in
explaining the variation in sales at the 95% confidence level.
Note: if there is more than one independent variable, the significance test has to be carried out
for each of the independent variables.
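
The test can be written as a short routine and applied to every coefficient in turn, as the note requires. In this sketch the Example 1 inputs are the rounded figures used above and the Example 2 inputs are taken from its printout. The critical value 2.365 for 7 degrees of freedom is the usual two-tailed 5% table value and is an assumption here, since the notes only quote the value for 8 degrees of freedom. Function names are illustrative.

```python
def t_statistic(coefficient, std_error):
    """Calculated t = b / S_b."""
    return coefficient / std_error


def is_significant(t_value, t_critical):
    """Two-tailed test: significant when |t| exceeds the critical value."""
    return abs(t_value) > t_critical


# Example 1: n - k = 10 - 2 = 8 d.f., critical t = 2.306 (from the notes).
t_adv = t_statistic(3.53, 0.52)
adv_significant = is_significant(t_adv, 2.306)

# Example 2: n - k = 10 - 3 = 7 d.f., critical t = 2.365 (assumed table value).
t_adv_multi = t_statistic(3.4936051, 0.5078770)
t_price = t_statistic(-2.3836921, 1.9495316)
adv_multi_significant = is_significant(t_adv_multi, 2.365)
price_significant = is_significant(t_price, 2.365)

print(f"Example 1 ADV: t = {t_adv:.2f}, significant = {adv_significant}")
print(f"Example 2 ADV: t = {t_adv_multi:.2f}, significant = {adv_multi_significant}")
print(f"Example 2 P:   t = {t_price:.2f}, significant = {price_significant}")
```

In Example 2 advertising remains significant while price does not, which is consistent with the 2-tail significance values of 0.000 and 0.261 in the printout.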








e. Durbin-Watson Statistic

It indicates the presence or absence of autocorrelation, a problem that can arise in
regression analysis with time-series data.

There are 3 situations in which an autocorrelation or multicollinearity problem can
arise:
- When independent variables are interrelated or duplicated
- Where independent variables have been misspecified
- Where important independent variables are missing
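
The notes do not give the formula for this statistic. Under the standard definition (an assumption added here), it is the sum of squared differences between successive residuals divided by the sum of squared residuals, and values close to 2 suggest no first-order autocorrelation. The sketch below applies that definition to the Example 1 residuals; function names are illustrative.

```python
# Data from Example 1 (Table 4.1), in time order 1997-2006.
sales = [44, 58, 48, 46, 42, 60, 52, 54, 56, 40]        # Y
advertising = [10, 13, 11, 12, 11, 15, 12, 13, 14, 9]   # X


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
    return y_mean - slope * x_mean, slope


def durbin_watson(residuals):
    """Sum of squared successive residual differences over sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


a_hat, b_hat = fit_line(advertising, sales)
residuals = [y - (a_hat + b_hat * x) for x, y in zip(advertising, sales)]
dw = durbin_watson(residuals)
print(f"Durbin-Watson = {dw:.6f}")
```

The value agrees with the "Durbin-Watson stat" line of the Example 1 printout.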


f. F-Statistic
It is another test of the overall explanatory power of the regression. (Refer to page 147 of the manual.)
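
The F-statistics in both printouts can be recovered from the reported R-squared values. The relationship used below is the standard one, F = (R-squared / (k - 1)) / ((1 - R-squared) / (n - k)); it is not stated in these notes and is included as an assumption, with `f_statistic` an illustrative name.

```python
def f_statistic(r_squared, n, k):
    """F = explained variation per regressor / unexplained variation per d.f."""
    explained = r_squared / (k - 1)
    unexplained = (1 - r_squared) / (n - k)
    return explained / unexplained


# R-squared values as reported in the two printouts; n = 10 observations.
f_example_1 = f_statistic(0.851212, n=10, k=2)   # intercept + advertising
f_example_2 = f_statistic(0.877397, n=10, k=3)   # intercept + advertising + price

print(f"Example 1: F = {f_example_1:.2f}")
print(f"Example 2: F = {f_example_2:.2f}")
```

As with the t-test, the computed F is compared with a critical value from the F table, here with k - 1 and n - k degrees of freedom.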


----end of short notes on demand estimation----
