Introduction To Linear Regression: Js-Ssnce

Scatter Plots and Correlation

Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
No causal effect is implied
Only concerned with the strength of the relationship
A scatter plot (or scatter diagram) is used to show the relationship between two variables
Scatter Plot Examples

[Figure: example scatter plots of y against x in three panels: linear relationships vs. curvilinear relationships; strong relationships vs. weak relationships; no relationship.]
Correlation Coefficient
The population correlation coefficient ρ (rho) measures the strength of the association between the variables
The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations
Features of ρ and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
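The unit-free property can be checked numerically. The short R sketch below (R being the software used at the end of these notes) reuses the tree data from the calculation example; the unit conversions are hypothetical and only serve to rescale the values, since the slides do not state the units of either variable.

```r
# Tree data from the calculation example: trunk diameter x, tree height y
x <- c(8, 9, 7, 6, 13, 7, 11, 12)
y <- c(35, 49, 27, 33, 60, 21, 45, 51)

r_original <- cor(x, y)

# Hypothetical changes of units (plus an arbitrary offset on y):
# a positive linear rescaling of either variable leaves r unchanged
x_cm <- x * 2.54
y_m  <- y * 0.3048 + 10
r_rescaled <- cor(x_cm, y_m)

# Reversing the direction of one variable only flips the sign of r
r_flipped <- cor(-x, y)

print(c(original = r_original, rescaled = r_rescaled, flipped = r_flipped))
```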
Examples of Approximate r Values

[Figure: scatter plots of y against x illustrating approximate values r = -1, r = -.6, r = 0, r = +.3, and r = +1.]
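A small R illustration of the extreme cases, on made-up data (the x values and the lines are arbitrary choices for illustration): points lying exactly on an increasing line give r = +1, points on a decreasing line give r = -1, and a perfect but U-shaped (curvilinear) relationship can still give r = 0, because r measures only linear association.

```r
x <- -3:3   # arbitrary x values, symmetric around 0

r_pos  <- cor(x, 2 * x + 1)      # points exactly on an increasing line
r_neg  <- cor(x, -0.5 * x + 4)   # points exactly on a decreasing line
r_quad <- cor(x, x^2)            # exact U-shaped (curvilinear) relationship

print(round(c(r_pos = r_pos, r_neg = r_neg, r_quad = r_quad), 3))
```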
Calculating the Correlation Coefficient

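Sample correlation coefficient:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$
or the algebraic equivalent:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$
where:
r = sample correlation coefficient
n = sample size
x = value of the independent variable
y = value of the dependent variable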
Calculation Example
Tree Height (y)   Trunk Diameter (x)   xy     y²      x²
35                8                    280    1225    64
49                9                    441    2401    81
27                7                    189    729     49
33                6                    198    1089    36
60                13                   780    3600    169
21                7                    147    441     49
45                11                   495    2025    121
51                12                   612    2601    144
Sum: 321          73                   3142   14111   713
Calculation Example
(continued)
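[Figure: scatter plot of Tree Height, y (0 to 70) against Trunk Diameter, x (0 to 14).]
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = \frac{1703}{\sqrt{(375)(9847)}} \approx 0.886 $$
r = 0.886: a relatively strong positive linear association between x and y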
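The same arithmetic in R, as a sketch: it rebuilds the column totals of the table from the raw data and evaluates both the deviation form and the algebraic form of r. R's built-in cor() is included only as a cross-check.

```r
x <- c(8, 9, 7, 6, 13, 7, 11, 12)        # trunk diameter
y <- c(35, 49, 27, 33, 60, 21, 45, 51)   # tree height
n <- length(x)

# Column totals from the calculation table
sum_x  <- sum(x)
sum_y  <- sum(y)
sum_xy <- sum(x * y)
sum_x2 <- sum(x^2)
sum_y2 <- sum(y^2)

# Algebraic (computational) formula
r_comp <- (n * sum_xy - sum_x * sum_y) /
  sqrt((n * sum_x2 - sum_x^2) * (n * sum_y2 - sum_y^2))

# Definitional formula based on deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)
r_def <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(round(c(r_comp = r_comp, r_def = r_def, cor = cor(x, y)), 3))
```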
Excel Output
Excel correlation output (Tools / Data Analysis / Correlation):

                  Tree Height   Trunk Diameter
Tree Height       1
Trunk Diameter    0.886231      1

Correlation between Tree Height and Trunk Diameter: 0.886231
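In R, a rough counterpart of Excel's Correlation tool is cor() applied to a data frame, which likewise returns the full correlation matrix (R prints both triangles rather than only the lower one). The name trees_df is illustrative; it simply avoids clashing with R's built-in trees data set.

```r
trees_df <- data.frame(
  "Tree Height"    = c(35, 49, 27, 33, 60, 21, 45, 51),
  "Trunk Diameter" = c(8, 9, 7, 6, 13, 7, 11, 12),
  check.names = FALSE   # keep the spaces in the column names
)

# Pairwise correlations between all columns
corr_matrix <- cor(trees_df)
print(round(corr_matrix, 6))
```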
Introduction to Regression Analysis

Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain the dependent variable
Simple Linear Regression Model

Relationship between x and y is described by a linear function
Changes in y are assumed to be caused by changes in x
Only one independent variable, x
Types of Regression Models

[Figure: example scatter plots illustrating a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship.]
Population Linear Regression

The population regression model:
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
where:
y = dependent variable
x = independent variable
β0 = population y intercept
β1 = population slope coefficient
β0 + β1x = linear component
ε = random error term, or residual (random error component)
Linear Regression Assumptions

The underlying relationship between the x variable and the y variable is linear
Error values (ε) are statistically independent
Error values are normally distributed for any given value of x
The probability distribution of the errors has constant variance
Population Linear Regression

(continued)
[Figure: the population regression line y = β0 + β1x, with intercept β0 and slope β1. At a given xi, the observed value of y differs from the predicted value of y on the line by the random error εi for that x value.]
Estimated Regression Model
The sample regression line provides an estimate of the population regression line:
$$ \hat{y}_i = b_0 + b_1 x_i $$
where:
ŷ = estimated (or predicted) y value
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
x = independent variable
The individual random error terms e_i have a mean of zero
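To illustrate how a sample regression line estimates the population line, the sketch below simulates data from a population model whose parameters are chosen arbitrarily (beta0 = 10, beta1 = 2, error standard deviation 3; all hypothetical, not from these notes) and fits it with lm(). The estimates b0 and b1 should land close to, but not exactly on, the population values, and the residuals of a least squares fit with an intercept average to zero (up to rounding).

```r
set.seed(1)                           # make the simulation reproducible

# Hypothetical population parameters
beta0 <- 10
beta1 <- 2
sigma <- 3

n <- 200
x <- runif(n, min = 0, max = 10)
# Errors: independent, normal, mean 0, constant variance
eps <- rnorm(n, mean = 0, sd = sigma)
y <- beta0 + beta1 * x + eps          # population model y = beta0 + beta1*x + eps

fit <- lm(y ~ x)                      # sample regression line yhat = b0 + b1*x
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])
e <- resid(fit)                       # residuals e_i = y_i - yhat_i

print(c(b0 = b0, b1 = b1, mean_residual = mean(e)))
```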
Least Squares Criterion

b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals:
$$ \sum e^2 = \sum (y - \hat{y})^2 = \sum \bigl(y - (b_0 + b_1 x)\bigr)^2 $$
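A sketch of the criterion on the house price data from the Simple Linear Regression Example: it defines the sum of squared residuals for any candidate line (the helper name sse is just for illustration), takes b0 and b1 from lm(), and checks that moving either coefficient away from those values makes the sum larger.

```r
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)   # $1000s
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)

# Sum of squared residuals for a candidate line yhat = b0 + b1 * sqft
sse <- function(b0, b1) sum((price - (b0 + b1 * sqft))^2)

fit <- lm(price ~ sqft)
b0 <- unname(coef(fit)[1])
b1 <- unname(coef(fit)[2])
sse_ls <- sse(b0, b1)

# Move each coefficient a little in each direction
sse_nudged <- c(
  sse(b0 + 5, b1), sse(b0 - 5, b1),
  sse(b0, b1 + 0.01), sse(b0, b1 - 0.01)
)

print(sse_ls)
print(sse_nudged)   # every entry is larger than sse_ls
```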
The Least Squares Equation

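The formulas for b1 and b0 are:
$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
algebraic equivalent:
$$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}} $$
and
$$ b_0 = \bar{y} - b_1 \bar{x} $$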
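Applying both formulas for b1, and then b0 = ȳ - b1x̄, to the house price data from the Simple Linear Regression Example gives the same coefficients that lm() reports. A minimal R sketch:

```r
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)   # y, $1000s
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)  # x
x <- sqft
y <- price
n <- length(x)

# Deviation form of the slope
b1_dev <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Algebraic (computational) form of the slope
b1_alg <- (sum(x * y) - sum(x) * sum(y) / n) / (sum(x^2) - sum(x)^2 / n)
# Intercept from the means
b0 <- mean(y) - b1_dev * mean(x)

print(round(c(b0 = b0, b1_dev = b1_dev, b1_alg = b1_alg), 5))
print(coef(lm(y ~ x)))   # cross-check with R's least squares fit
```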
Interpretation of the Slope and the Intercept
b1 is the estimated change in the average value of y as a result of a one-unit change in x
b0 is the estimated average value of y when the value of x is zero
Finding the Least Squares Equation
The coefficients b0 and b1 will usually be found using computer software, such as Excel or Minitab
Other regression measures will also be computed as part of computer-based regression analysis
Simple Linear Regression Example

A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (y) = house price in $1000s
Independent variable (x) = square feet
Sample Data for House Price Model
House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
Graphical Presentation
House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s, 0 to 450) against Square Feet (0 to 3000) with the fitted regression line; slope = 0.10977, intercept = 98.248.]
$$ \text{house price} = 98.24833 + 0.10977\,(\text{square feet}) $$
Interpretation of the Intercept, b0
b0 is the estimated average value of y when the value of x is zero (if x = 0 is in the range of observed x values)
Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
Interpretation of the Slope Coefficient, b1
b1 measures the estimated change in the average value of y as a result of a one-unit change in x
Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77 for each additional square foot of size
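The slope interpretation can be checked directly in R: predictions for two houses that differ by one square foot differ by exactly b1. The sizes 1500 and 1501 square feet are arbitrary values chosen within the observed range.

```r
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)   # $1000s
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)

fit <- lm(price ~ sqft)
b1 <- unname(coef(fit)["sqft"])

# Two hypothetical houses one square foot apart
p <- predict(fit, newdata = data.frame(sqft = c(1500, 1501)))
change_per_sqft <- unname(p[2] - p[1])   # equals b1, in $1000s

# Convert from $1000s to dollars
dollars_per_sqft <- round(1000 * change_per_sqft, 2)
print(c(b1 = b1, change = change_per_sqft, dollars = dollars_per_sqft))
```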
Example: House Prices
Estimated Regression Equation:
$$ \text{house price} = 98.25 + 0.1098\,(\text{sq.ft.}) $$
Predict the price for a house with 2000 square feet
Example: House Prices (continued)
Predict the price for a house with 2000 square feet:
$$ \text{house price} = 98.25 + 0.1098\,(\text{sq.ft.}) = 98.25 + 0.1098(2000) = 317.85 $$
The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850
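The same prediction in R, using predict() on the fitted model. With unrounded coefficients the prediction comes out at about 317.78 ($1000s); the 317.85 in the worked example comes from the rounded coefficients 98.25 and 0.1098.

```r
price <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)   # $1000s
sqft  <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)

fit <- lm(price ~ sqft)

# Prediction from the full-precision least squares coefficients
pred_exact <- unname(predict(fit, newdata = data.frame(sqft = 2000)))
# Prediction from the coefficients as rounded on the slide
pred_rounded <- 98.25 + 0.1098 * 2000

print(round(c(exact = pred_exact, rounded = pred_rounded), 2))
```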
Summary
Introduced correlation analysis
Discussed correlation to measure the strength of a linear association
Introduced simple linear regression analysis
Calculated the coefficients for the simple linear regression equation
Summary
(continued)
Described inference about the slope
Addressed estimation of mean values and prediction of individual values
Discussed residual analysis
R software regression
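# House price data as (price in $1000s, square feet) pairs
yx=c(245,1400, 312,1600, 279,1700, 308,1875, 199,1100,
     219,1550, 405,2350, 324,2450, 319,1425, 255,1700)
mx=matrix(yx,10,2, byrow=TRUE)  # 10 houses: column 1 = price, column 2 = square feet
hprice=mx[,1]
sqft=mx[,2]
reg1=lm(hprice~sqft)            # least squares fit: hprice = b0 + b1*sqft
summary(reg1)                   # coefficients, standard errors, t tests, R-squared
plot(reg1)                      # residual diagnostic plots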