5 views

Uploaded by SoniyaKanwalG

Simple Linear Regression Collection and statistics

- Chapter 3 Complete Solutions
- ECO 740
- Research-Can Potato Be Substitute for Our Staple Food
- kumpulan jurnal.pdf
- Multiple Regression.doc
- G Statiscis Chapter 8
- Regression
- Airline Results
- RELTIQUES_V1N1
- csae-wps-2012-18
- Analysis of Ambulance service
- Sample Midterm
- Answer to Econometric for Finance.pdf
- Regression Analysis
- Lesson 4 Theory
- SMChap003.docx
- Nov_2014
- Regressi On
- O2
- SMChap003

You are on page 1of 14

variables are associated with one another, but it does not give any

ideas of the kind of relationship.

Regression models help investigating bivariate and multivariate

relationships between variables, where we can hypothesize that 1

variable depends on another variable or a combination of other

variables.

Normally relationships between variables in political science and

economics are not exact unless true by definition, but relationships

include most often a non-structural or random component, due to

the probabilistic nature of theories and hypotheses in PolSci,

measurement errors etc.

Regression analysis enables to find average relationships that may

not be obvious by just eye-balling the data explicit formulation of

structural and random components of a hypothesized relationship

between variables.

Example: positive relationship between unemployment and

government spending

Linear relationship between x (explanatory variable) and y

(dependent variable)

Epsilon describes the random component of the linear relationship

between x and y

-10

-5

10

15

yi x i i

-2

2

x

yi x i i

(spending) in observation i (e.g. in the UK)

Y is determined by 2 components:

1. the non-random/ structural component alpha+beta*xi

where x is the independent/ explanatory variable

(unemployment) in observation i (UK) and alpha and

beta are fixed quantities, the parameters of the

model; alpha is called constant or intercept and

measures the value where the regression line

crosses the y-axis; beta is called coefficient/ slope,

and measures the steepness of the regression line.

2. the random component called disturbance or error

term epsilon in observation i

A simple example:

x has 10 observations: 0,1,2,3,4,5,6,7,8,9

The true relationship between y and x is: y=5+1*x, thus, the true y

takes on the values: 5,6,7,8,9,10,11,12,13,14

There is some disturbance e.g. a measurement error, which is

standard normally distributed: thus the y we can measure takes

on the values: 6.95,5.22,6.36,7.03,9.71,9.67,10.69,13.85,

13.21,14.82 which are close to the true values, but for any given

observation the observed values are a little larger or smaller than

the true values.

the relationship between x and y should hold on average true but

is not exact

When we do our analysis, we dont know the true relationship and

the true y, we just have the observed x and y.

We know that the relationship between x and y should have the

following form: y=alpha+beta*x+epsilon (we hypothesize a linear

relationship)

The regression analysis estimates the parameters alpha and

beta by using the given observations for x and y.

The simplest form of estimating alpha and beta is called ordinary

least squares (OLS) regression

alpha

10

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

y

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

OLS-Regression:

Draw a line through the scatter plot in a way to minimize the deviations of

the single observations from the line:

y7

epsilon7

yhat7

y1

yhat1

5

x

10

Minimize the sum of all squared deviations from the line (squared residuals)

y i x i i i yi x i

Fitted values

the values of the dependent variable (values on the line) are called

predicted values of the regression (yhat): 4.97,6.03,7.10,8.16,9.22,

10.28,11.34,12.41,13.47,14.53 these are very close to the true values;

the estimated alpha = 4.97 and beta = 1.06

OLS regression

Ordinary least squares regression: minimizes the squared residuals

yi x i i

i yi x i

y i x i i

n

(Y

Y

)

)

i i i min

Components:

DY: y; at least 1 IV: x

Constant or

intercept term: alpha

Regression coefficient,

slope: beta

Error term, residuals: epsilon

-5

0

5

10

i 1

-10

i 1

15

-2

2

x

The relationship between x and y is described by the function:

y i x i i

The difference between the dependent variable y and the estimated systematic

influence of x on y is named the residual:

ei yi x i

To receive the optimal estimates for alpha and beta we need a choice-criterion;

in the case of OLS this criterion is the sum of squared residuals: we calculate

alpha and beta for the case in which the sum of all squared deviations

(residuals) is minimal

n

min

ei

,

i 1

min

,

i 1

yi x i

S ,

Taking the squares of the residual is necessary since a) positive and negative

deviation do not cancel each other out, b) positive and negative estimation

errors enter with the same weight due to the squaring down, it is therefore

irrelevant whether the expected value for observation yi is underestimated or

overestimates

Since the measure is additive no value is of outmost relevance.

Especially large residuals receive a stronger weight due to squaring.

Minimizing the function requires to calculate the first order conditions with

respect to alpha and beta and set them zero:

S ,

I:

y

i 1

S ,

II :

x i 0

y

n

i 1

x i x i 0

This is just a linear system of two equations with two unknowns alpha and

beta, which we can mathematically solve for alpha:

n

I : yi x i 0

i 1

yi x i y x

i 1

and beta:

n

II : yi x i x i 0

i 1

n

yi x i x i x i2 0

i 1

yi x i y x x i x i2 0

i 1

n

yi x i yx i xx i x i2 0

i 1

n

yi y x x i 0

i 1

y xi x

i 1

n

i 1

n

i 1

i 1

i 1

y x x i x i 0

n

yi y x x i

i 1

xi x

i 1

y

n

Cov x, y

V ar x

X 'X 1X ' y

minimize the sum of squared residuals and satisfy the

second order conditions of the minimizing problem.

Thus we need the second derivatives of the two

functions with respect to alpha and beta which are

given by the so called Hessian matrix (matrix of

second derivatives). (I spare the mathematical

derivation)

The Hessian matrix has to be positive definite (the

determinant must be larger than 0) so that and

globally minimize the sum of squared residuals. Only

in this case alpha and beta are optimal estimates for

the relationship between the dependent variable y and

the independent variable x.

Regression coefficient:

n

yx

(x

i 1

x)(yi y)

(x

i 1

x)

divided by the variance of x.

reg y x

Source |

SS

df

MS

-------------+---------------------------------------------Model | 1248.96129 1 1248.96129

Residual | 1363.2539 98 13.9107541

-------------+---------------------------------------------Total | 2612.21519 99 26.386012

Number of obs =

F( 1, 98)

=

Prob > F

=

R-squared

=

Adj R-squared =

Root MSE

=

100

89.78

0.0000

0.4781

0.4728

3.7297

---------------------------------------------------------------------------------------------------y|

Coef.

Std. Err.

t

P>|t|

[95% Conf. Interval]

-------------+------------------------------------------------------------------------------------x | 1.941914 .2049419 9.48 0.000 1.535213 2.348614

_cons | .8609647 .4127188 2.09 0.040 .0419377 1.679992

---------------------------------------------------------------------------------------------------If x increases by 1 unit, y increases by 1.94 units: the interpretation is linear and

straightforward

Interpretation: example

alpha

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

alpha=4.97, beta=1.06

Education and earnings: no education gives you a

minimal hourly wage of around 5 pounds. Each

additional year of education increases the hourly wage

by app. 1 pound:

beta=1.06

4

y

5

x

Fitted values

10

Since alpha and beta are estimates of the unknown parameters,

y i x i estimates the mean function or the systematic part of

the regression equation. Since a random variable can be predicted

best by the mean function (under the mean squared error criterion),

yhat can be interpreted as the best prediction of y. the difference

between the dependent variable y and its least squares prediction is

the least squares residual: e=y-yhat =y-(alpha+beta*x).

A large residual e can either be due to a poor estimation of the

parameters of the model or to a large unsystematic part of the

regression equation

For the OLS model to be the best estimator of the relationship

between x and y several conditions (full ideal conditions, GaussMarkov conditions) have to be met.

If the full ideal conditions are met one can argue that the OLSestimator imitates the properties of the unknown model of the

population. This means e.g. that the explanatory variables and the

error term are uncorrelated.

- Chapter 3 Complete SolutionsUploaded byVictor
- ECO 740Uploaded byAhmad Asifa Nural Anuar
- Research-Can Potato Be Substitute for Our Staple FoodUploaded byImran525
- kumpulan jurnal.pdfUploaded byyuda
- Multiple Regression.docUploaded bykuashask2
- G Statiscis Chapter 8Uploaded bySonnet Bhowmik
- RegressionUploaded byManish Rauniyar
- Airline ResultsUploaded byandri00
- RELTIQUES_V1N1Uploaded bygustavo orozco
- csae-wps-2012-18Uploaded byBianca Nascimento
- Analysis of Ambulance serviceUploaded byBilalSheikh
- Sample MidtermUploaded byArafath Cherukuri
- Answer to Econometric for Finance.pdfUploaded byJame
- Regression AnalysisUploaded byGabriel Sim
- Lesson 4 TheoryUploaded byuniversoexacto
- SMChap003.docxUploaded byMohamedBanoon
- Nov_2014Uploaded byLuca Allevi
- Regressi OnUploaded byfansuri80
- O2Uploaded byYash Maurya
- SMChap003Uploaded byPooja Jha Kashyap
- Ch24 AnswersUploaded byamisha2562585
- Software Testing Defect Prediction Model-A Practical ApproachUploaded byInternational Journal of Research in Engineering and Technology
- Child Mortality Reduction a Closer ViewUploaded byLusi Yang
- Lewis Price DispersionUploaded byHa Minh
- EC15_Exam1_Answers_s10.pdfUploaded by6doit
- Topic 4 Panel Regression Model WbleUploaded byHan Yong
- Transportation Statistics: table 04 03Uploaded byBTS
- burgin distance vs acceleration regressionUploaded byapi-249668435
- correlationandregression-111121213514-phpapp02Uploaded byVrajeshBhavsar
- 2013.Test.V1Uploaded byAnonymous gUySMcpSq

- An Introduction to Probability Theory by William FellerUploaded bySushil Kumar Gautam
- RegressionUploaded byluispedro1985
- Anoka_Hennepin_Probability_and_Statistics-4 2013 NEW_4.pdfUploaded bySoniyaKanwalG
- Simple Linear Regression CollectionUploaded byAdriana Nicoleta
- regression.pdfUploaded bySyaza Ibrahim
- total-selection-leftover-seats-morning (1).pdfUploaded bySoniyaKanwalG
- total-selection-leftover-seats-morning (1).pdfUploaded bySoniyaKanwalG
- Sl RegressionUploaded bySoniyaKanwalG
- First List Sf f16Uploaded bySoniyaKanwalG
- English-Vocabulary-in-Use-Advanced-with-Answers.pdfUploaded bycoolbaby
- Quant1_Week8_OLS.pdfUploaded bySoniyaKanwalG
- e6-34Uploaded bySoniyaKanwalG
- Simple RegressionUploaded bySoniyaKanwalG
- Chapter 9Uploaded byAniket Batra
- Notes Part 2Uploaded byAXA2000
- Hec List 2015Uploaded bySoniyaKanwalG
- An Introduction to Probability Theory by William FellerUploaded bySoniyaKanwalG
- Asaan Lughat e Quran-.pdfUploaded bySoniyaKanwalG
- bdmpi14pcUploaded bySoniyaKanwalG
- List 4 Merit f16Uploaded bySoniyaKanwalG
- Chapter 9Uploaded bySoniyaKanwalG
- ch04Uploaded bySoniyaKanwalG
- Ch2 MO TheoryUploaded bywadelkrid1270
- ECONOMICS BOOKUploaded byKhokanuzzaman Khokan
- What's Chemical EngineeringUploaded bylia murti
- 5bFRSUploaded bySoniyaKanwalG
- Lecture 25Uploaded bySoniyaKanwalG
- 1 Inorganic Chemistry7Uploaded bySoniyaKanwalG
- Chapter 9Uploaded byAniket Batra

- Meteor ShowersUploaded byJenniferNovabosBercero
- U (5)Uploaded byrudra_mazumdar
- Creative Approaches to Writing MaterialsUploaded bySelma Caballero Hernandez
- probUploaded byKashif Jawaid
- ImageProcessing TutorialUploaded byoicfbd
- Programming and Data Management for IBM SPSS Statistics 20Uploaded byMartín Pérez H
- Upgrading From HP EVA Storage to 3PAR StoreServUploaded byReinaldo Guzman
- Bea - Understanding Macondo Well Failures-BB_DHSG-Jan 2011Uploaded byDr K Gallowglaich
- unwind analysis essayUploaded byapi-251842097
- Referrals OpeningsUploaded byvijblr
- Reliance Case StudyUploaded bySaurabh Hasija
- math 4 curriculum guide draft 2016-2017Uploaded byapi-291159151
- SQL Injection Pocket ReferenceUploaded byhackmeyou
- Cost of Quality_Tip of the IcebergUploaded byLyne Lerin
- 00 anderson_2009_businesssimulationsandcognitivelearning.pdfUploaded byIkrom
- Strategy Implementation and Project ManagementUploaded byapi-3707091
- DWMUploaded byMohammed Aatif
- Ternary simulationUploaded byChelo Zubieta
- NbvLwU Camunda BPM Compared to Alternatives 08 2015Uploaded bymhenley8059
- peters resumeUploaded byapi-320084026
- Activated Carbon FilterUploaded bythirumalai
- Vivienne's Personal Philosophy of DanceUploaded byVivienne Grace
- MTH 209 Week 3 CheckpointUploaded byuop2018
- National Science Foundation: minutes 10 03Uploaded byNSF
- 1. Fodor Jerry Representations Philosophical Essays on the Foundations of Cognitive Science Harvester Studies in Cognitive Science 1981Uploaded byXimena Cano
- GokuUploaded byGianMarcoOnofreCcora
- 0.5. Sap Netweaver GatewayUploaded bySathish B Sathish
- Business EthicsUploaded byjob ocay
- Arthur Cayley Research PaperUploaded byWilliam Li
- 4.0 Trait Approach to Leadership MZ 04 (1)Uploaded byAnonymous xMXvFt