
Introduction to Simple Linear Regression

Regression model
• Relation between variables where changes in some variables may “explain” or possibly “cause” changes in other variables.
• Explanatory variables are termed the independent variables and the variables to be explained are termed the dependent variables.
• Regression model estimates the nature of the relationship between the independent and dependent variables.
– Change in the dependent variable that results from changes in the independent variables, i.e. size of the relationship.
– Strength of the relationship.
– Statistical significance of the relationship.
Examples
• Dependent variable is retail price of gasoline in Regina –
independent variable is the price of crude oil.
• Dependent variable is employment income – independent
variables might be hours of work, education, occupation, sex,
age, region, years of experience, unionization status, etc.

• Price of a product and quantity produced or sold:
– Quantity sold affected by price. Dependent variable is quantity of product sold – independent variable is price.
– Price affected by quantity offered for sale. Dependent variable is price – independent variable is quantity sold.
For example, imagine you’re a researcher studying any of the following:
• Do socio-economic status and race affect educational achievement?
• Do education and IQ affect earnings?
• Do exercise habits and diet affect weight?
• Are drinking coffee and smoking cigarettes related to mortality risk?
• Does a particular exercise intervention have an impact on bone density that is distinct from the effect of other physical activities?
[Figure: Monthly time series, 1981M01–2008M01. Crude oil price index (1997=100), left axis; regular gasoline prices, Regina (cents per litre), right axis.]
Source: CANSIM II Database (Vector v1576530 and v735048 respectively)
Bivariate and multivariate models
Bivariate or simple regression model:
(Education) x → y (Income)
Multivariate or multiple regression model:
(Education) x1, (Sex) x2, (Experience) x3, (Age) x4 → y (Income)
Model with a simultaneous relationship:
Price of wheat ↔ Quantity of wheat produced
Bivariate or simple linear regression (ASW, 466)
• x is the independent variable
• y is the dependent variable
• The regression model is
y = β0 + β1x + ε
• The model has two variables, the independent or explanatory
variable, x, and the dependent variable y, the variable whose
variation is to be explained.
• The relationship between x and y is a linear or straight line
relationship.
• Two parameters to estimate – the slope of the line β1 and the
y-intercept β0 (where the line crosses the vertical axis).
• ε is the unexplained, random, or error component. Much
more on this later.
Regression line
• The regression model is y = β0 + β1x + ε
• Data about x and y are obtained from a sample.
• From the sample of values of x and y, estimates b0 of
β0 and b1 of β1 are obtained using the least squares
or another method.
• The resulting estimate of the model is
ŷ = b0 + b1x
• The symbol ŷ is termed “y hat” and refers to the
predicted values of the dependent variable y that are
associated with values of x, given the linear model.
Relationships
• Economic theory specifies the type and structure of
relationships that are to be expected.
• Historical studies.
• Studies conducted by other researchers – different
samples and related issues.
• Speculation about possible relationships.
• Correlation and causation.
• Theoretical reasons for estimation of regression
relationships; empirical relationships need to have
theoretical explanation.
Example 1
• Sales data for 10 months from a coffee house situated at a prime location in a city, comprising the number of customers (in hundreds) and monthly sales (in thousand rupees), are given below:
x (customers, hundreds)   y (sales, thousand Rs)
6.0                       1
6.1                       6
6.2                       8
6.3                       10
6.5                       11
7.1                       20
7.6                       21
7.8                       22
8.0                       23
8.1                       25

x      y      xy       x^2
6.0    1      6.0      36.00
6.1    6      36.6     37.21
6.2    8      49.6     38.44
6.3    10     63.0     39.69
6.5    11     71.5     42.25
7.1    20     142.0    50.41
7.6    21     159.6    57.76
7.8    22     171.6    60.84
8.0    23     184.0    64.00
8.1    25     202.5    65.61
Σx = 69.7   Σy = 147   Σxy = 1086.4   Σx^2 = 492.21
• Model: Y = f(X)
• Equation: Y = a + bX + e
• b = (nΣxy - Σx·Σy) / (nΣx^2 - (Σx)^2)
• a = ȳ - b·x̄ (i.e. a = Σy/n - b·Σx/n)
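• A minimal Python sketch (not part of the original slides; the variable names are illustrative) that plugs the column sums from the table above into these formulas:

n = 10
sum_x, sum_y = 69.7, 147.0          # Σx and Σy from the table
sum_xy, sum_x2 = 1086.4, 492.21     # Σxy and Σx^2 from the table

# slope and intercept by least squares
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n       # a = ybar - b*xbar

print(f"b = {b:.3f}, a = {a:.3f}")  # roughly b ≈ 9.66, a ≈ -52.60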


Example 2
• The following data relate to advertising expenditure (in lakhs) and the corresponding sales (in crores):
• Adv Exp: 10 12 15 23 20
• Sales: 14 17 23 25 21
A) Find the equation of the least squares line fitting the data.
B) Estimate the value of sales corresponding to an advertising expenditure of 30 lakh.
The fitted line is ŷ = 8.608 + 0.712x, giving the fitted values and residuals below:

X (Adv Exp)   Y (Sales)   Fitted value ŷ   Residual (y - ŷ)
10            14          15.728           -1.728
12            17          17.152           -0.152
15            23          19.288            3.712
23            25          24.984            0.016
20            21          22.848           -1.848
• Y = 8.608 + 0.712 × (10) = 15.728
• Y = 8.608 + 0.712 × (12) = 17.152
• Y = 8.608 + 0.712 × (15) = 19.288
………
• Y = 8.608 + 0.712 × (30) = 29.968, i.e. estimated sales of about 29.97 crores
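• The same fit and the prediction at x = 30 can be checked with a short Python sketch (not part of the original example; names are illustrative):

x = [10, 12, 15, 23, 20]   # advertising expenditure (lakhs)
y = [14, 17, 23, 25, 21]   # sales (crores)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # ≈ 0.712
a = sum_y / n - b * sum_x / n                                  # ≈ 8.61

print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"estimated sales at x = 30: {a + b * 30:.2f} crores")   # ≈ 29.97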


Uses of regression
• Amount of change in a dependent variable that results
from changes in the independent variable(s) – can be
used to estimate elasticities, returns on investment in
human capital, etc.
• Attempt to determine causes of phenomena.
• Prediction and forecasting of sales, economic growth,
etc.
• Support or negate theoretical model.
• Modify and improve theoretical models and
explanations of phenomena.
Relationship between Correlation & Regression
(i) Correlation studies the linear relationship between two variables, while regression analysis is a mathematical measure of the average relationship between two or more variables.
(ii) Correlation has limited application because it only gives the strength of a linear relationship, while the purpose of regression is to "predict" the value of the dependent variable for given values of one or more independent variables.
(iii) Correlation makes no distinction between independent and dependent variables, while linear regression does: in regression analysis one variable is treated as the dependent variable and the other(s) as independent variable(s).
Multiple R
• This is the correlation coefficient. It tells you how strong the linear relationship is.
• For example, a value of 1 means a perfect positive relationship and a value of zero means no relationship at all.
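• As an illustration (not from the slides), the correlation coefficient for the advertising example above can be computed directly:

import numpy as np

x = np.array([10, 12, 15, 23, 20], dtype=float)   # advertising expenditure
y = np.array([14, 17, 23, 25, 21], dtype=float)   # sales

r = np.corrcoef(x, y)[0, 1]        # Pearson correlation coefficient
print(f"Multiple R = {r:.3f}")     # ≈ 0.86, a fairly strong positive linear relationship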
R Square
• R-squared is a statistical measure of how close
the data are to the fitted regression line.
• It is also known as the coefficient of
determination, or the coefficient of multiple
determination for multiple regression.
• It is the percentage of the response variable variation that is explained by the linear model.
• R-squared = Explained variation / Total variation.
• R-squared is always between 0% and 100%: 0% indicates that the model explains none of the variability of the response data around its mean.
• 100% indicates that the model explains all the variability of the response data around its mean.
• In general, the higher the R-squared, the better the model fits your data.
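• A small Python sketch (illustrative only, reusing the advertising example above) of R-squared as explained variation divided by total variation:

x = [10, 12, 15, 23, 20]
y = [14, 17, 23, 25, 21]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]                       # fitted values
ss_explained = sum((yh - y_bar) ** 2 for yh in y_hat)    # explained variation
ss_total = sum((yi - y_bar) ** 2 for yi in y)            # total variation

print(f"R-squared = {ss_explained / ss_total:.3f}")      # ≈ 0.75 here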
Standard Error
• Measure of the statistical accuracy of an
estimate, equal to the standard deviation of
the theoretical distribution of a large
population of such estimates.
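• In simple linear regression output, the standard error usually reported is the standard error of the estimate; as a reference (this formula is not on the original slide) it is

\[
s_e = \sqrt{\frac{\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}}{n - 2}}
\]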
ŷ = 2461 + 297x

R2 = 0.311
Significance = 0.0031
Outliers
• Rare, extreme values may distort the
outcome.
– Could be an error.
– Could be a very important observation.
• Outlier: more than 3 standard deviations from
the mean.
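• A minimal Python sketch of this 3-standard-deviation rule, using made-up values (not from the slides):

from statistics import mean, stdev

data = [48, 50, 51, 49, 52, 50, 47, 53, 50, 49,
        51, 48, 52, 50, 49, 51, 50, 47, 53, 120]   # hypothetical sample with one extreme value

m, s = mean(data), stdev(data)
outliers = [v for v in data if abs(v - m) > 3 * s]
print(outliers)   # flags the extreme value 120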

[Figure: “GPA vs. Time Online” scatter plot – GPA (50–100) on the horizontal axis, time online (0–12) on the vertical axis.]
[Figure: “GPA vs. Time Online” scatter plot – GPA (50–100) on the horizontal axis, time online (0–6) on the vertical axis.]
[Figure: Scatter plot of regular gasoline prices, Regina (cents per litre, 0–160) against the crude oil price index (1997=100, 0–600). Correlation = 0.8703.]
Source: CANSIM II Database (Vector v1576530 and v735048 respectively)
[Figure: “U-Shaped Relationship” – scatter plot of Y against X (both roughly 0–12) showing a U-shaped pattern. Correlation = +0.12.]