
Econometrics Student Notes Module 2

The Simple Regression Model


The Basic Concept

Regression is a statistical technique for determining the (strength of the) relationship between a dependent variable (also called the explained variable, response variable or regressand) and one or more independent variables (explanatory variables, predictor variables or regressors). It is a process of estimating the relationship among variables: it examines the dependence of the regressand on the regressors. Unlike correlation, regression tells us how, and by how much, the dependent variable changes in response to changes in the independent variable(s). Sometimes we also need to predict the values of the dependent variable from the values of the regressors.

Consider the demand function. Demand theory suggests that the quantity demanded depends on various variables such as price, the consumer's income, tastes, the prices of other goods, etc. We want to know how the quantity demanded may change due to changes in some of the independent variables, such as price.

Usually we denote the dependent variable by Y and the regressors by X (or X1, X2, etc. in the case of multiple regressors). In regression analysis, we try to explain the variable Y in terms of the variable X. Remember that X may not be the only factor affecting Y. Also, the relationship may not be exact: for the same value of X we may observe different values of Y. Each observation is a pair of X and Y values. We handle this by looking at averages: we try to determine how the values of Y change, on average, in response to changes in X.

Before performing regression, we also need an idea of the nature of the functional relationship between the variables. The relationship may be linear, quadratic, exponential, etc. There are many regression models, and we select the model that most closely approximates the relationship among the variables. We can get an idea of the type of relationship by looking at what we call a scatter diagram.

Scatter Diagram

A scatter diagram shows the pairs of actual observations. We usually plot the dependent variable against an explanatory variable to see whether a pattern is visible. If the pattern suggests a linear relation, we use a linear regression model.

In the first diagram, expenditure on food is a direct (increasing) function of the income level. The dots plotting the pairs of observations resemble a linear shape (straight line): the points do not lie exactly on a straight line but are scattered around a hypothetical one. In the second diagram, annual sales seem to be inversely related to the price of the commodity, because the dots are scattered around a (hypothetical) straight line that is negatively sloped.
Remember that straight lines are described by equations of the form 𝑌 = 𝑎 + 𝑏 𝑋, where 𝑎 is the Y-intercept (the point where the straight line intersects the Y-axis) and 𝑏 is the slope of the line (the change in the variable Y due to a one-unit change in the variable X, or Δ𝑌/Δ𝑋).
In simple regression, we try to estimate the best (explained later) values of 𝑎 and 𝑏 by applying appropriate techniques. One such technique is called Ordinary Least Squares (OLS).
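As a quick sketch (the intercept and slope values here are made up for illustration), the slope 𝑏 is the change in Y per one-unit change in X:

```python
a, b = 2.0, 0.5  # hypothetical intercept and slope

def line(x):
    """Value of the straight line Y = a + b*X at a given X."""
    return a + b * x

# the slope is ΔY/ΔX: raising X by one unit raises Y by exactly b
print(line(4) - line(3))  # 0.5
```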
Simple Regression Line by OLS
• The relationship seems to be ‘linear’, so it can be captured with the equation of a straight line (Y = a + b X)
• We may need to predict Y when the value of X is given
• We capture the relation by writing a ‘simple regression equation’
𝑌 = 𝑎 + 𝑏 𝑋 + 𝑒  OR  𝑌 = 𝛽0 + 𝛽1𝑋 + 𝑒
Residual: Note that we have added 𝑒, which is called an error term or residual. We add this because the actual values do not lie exactly on a straight line but may be scattered around it. To account for this difference, we capture it in the residual 𝑒. When we estimate the parameters ‘a’ and ‘b’, they do not provide exact values of the dependent variable; the difference is called the error term or residual.
𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝑒𝑖 (with subscripts)
Subscript: The subscript 𝑖 shows that the variable may have multiple observations (as you learnt in summation algebra). 𝛽0 and 𝛽1 are written instead of 𝑎 and 𝑏 to follow the tradition of regression analysis.
In both the diagrams above and below, a straight line has been imposed on the scatter diagram to show how the points are scattered around it; moving along the straight line approximates the relation between Y and X. A good technique applied to an appropriate situation may approximate the relationship well (with small residuals 𝑒𝑖).

Regression Explained

The Population Regression Equation is an assumed equation describing the relationship in the population. We use samples to estimate the parameters 𝛽0 and 𝛽1, since the whole population may not be available or observed.
𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝑒𝑖
Here
𝑌𝑖 is the dependent variable or explained variable, 𝑋𝑖 is the independent variable or explanatory variable, and 𝛽0 and 𝛽1 are the parameters we need to estimate.
Regression equation estimation
𝑌𝑖 = 𝛽0 + 𝛽1𝑋𝑖 + 𝑒𝑖 is the population regression equation.
Let 𝑎 and 𝑏 be the estimated values of 𝛽0 and 𝛽1 respectively. We estimate 𝑎 and 𝑏 from a sample. The ‘estimated value’ of Y based on the estimated regression equation is
𝑌̂𝑖 = 𝑎 + 𝑏 𝑋𝑖
and the residual is the difference 𝑒𝑖 = 𝑌𝑖 − 𝑌̂𝑖.

It is good to have small errors (residuals). Negative and positive errors cancel each other out, and we want to ‘magnify’ larger errors, so we focus on the ‘square of the errors’ and try to minimize their sum. In least squares estimation, we minimize the ‘Sum of Squared Residuals’ (SSR), also called the ‘Sum of Squared Errors’.

We try to estimate the parameters ‘a’ and ‘b’ that give the minimum possible Sum of Squared Residuals.

Any other values of ‘a’ and ‘b’ would give a larger SSR.
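A minimal sketch of this least squares property (the small data set here is made up for illustration): the OLS estimates are computed from the closed-form formulas, and perturbing either estimate makes the sum of squared residuals larger.

```python
# hypothetical observations (X, Y)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

# closed-form OLS estimates (these follow from the normal equations)
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

def ssr(a_, b_):
    """Sum of squared residuals for a candidate intercept a_ and slope b_."""
    return sum((y - (a_ + b_ * x)) ** 2 for x, y in zip(xs, ys))

# any other (a, b) pair gives a larger SSR
print(ssr(a, b) < ssr(a + 0.1, b) and ssr(a, b) < ssr(a, b - 0.1))  # True
```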


• Finding the values of ‘a’ and ‘b’ in the regression equation is a minimization problem

Also remember that

• For ‘optimization’ we take the first derivative and set it equal to zero
Important: Although X and Y are the variables here, for this minimization problem only we treat ‘a’ and ‘b’ as the unknowns, since it is their values we are trying to estimate.
The minimization problem is
Minimize SSR = ∑(𝑌𝑖 − 𝑎 − 𝑏𝑋𝑖)²
Taking the first derivatives with respect to 𝑎 and 𝑏 and setting them equal to zero gives the ‘normal equations’:
∑𝑌𝑖 = 𝑛𝑎 + 𝑏 ∑𝑋𝑖
∑𝑋𝑖𝑌𝑖 = 𝑎 ∑𝑋𝑖 + 𝑏 ∑𝑋𝑖²
To find the values of ‘a’ and ‘b’ we need some observations of X and Y. Substituting the sums computed from the data into the normal equations and solving them simultaneously gives the values of the parameters 𝑎 and 𝑏.
Example:
Consider the following example, where X = income (thousand rupees) and Y = expenditure on food items (thousand rupees). From the five observations (n = 5), the required sums are ∑𝑋 = 175, ∑𝑌 = 145, ∑𝑋𝑌 = 5280 and ∑𝑋² = 6375.

Substituting these values into the normal equations gives us:

145 = 5𝑎 + 175 𝑏
5280 = 175 𝑎 + 6375 𝑏
Solving them simultaneously gives us:
a = 0.3 and b = 0.82
We can write the estimated regression line as
𝑌̂ = 0.3 + 0.82 𝑋
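The arithmetic can be checked in a few lines of Python, using only the sums given above (a sketch, not part of the original notes):

```python
n, sum_y, sum_x, sum_xy, sum_xx = 5, 145, 175, 5280, 6375

# solve the normal equations
#   sum_y  = n*a     + b*sum_x
#   sum_xy = a*sum_x + b*sum_xx
# simultaneously for a and b
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(round(a, 2), round(b, 2))  # 0.3 0.82
```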
Interpretation
The value of 𝑎 is the Y-intercept and the value of 𝑏 is the slope of the line (the rate of change of Y with respect to X, i.e. the derivative of Y w.r.t. X). Here 𝑏 = 0.82 means that a one-unit (thousand-rupee) increase in income raises expenditure on food by 0.82 thousand rupees, on average.

Trend Values and Errors:

We can substitute the values of X into the estimated regression equation to find the trend (fitted) values. The first trend value is computed as 𝑌̂ = 0.3 + 0.82 (25) = 20.8, and so on. If you change the values of 𝑎 and 𝑏 and compute the new sum of squared errors, the new value will be larger than the one obtained here (the least squares property).
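The trend values can be generated the same way; only the first income value (X = 25) from the example is used here:

```python
a, b = 0.3, 0.82  # OLS estimates from the example

def trend(x):
    """Fitted (trend) value of food expenditure for a given income x."""
    return a + b * x

print(round(trend(25), 1))  # 20.8
```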
