A scatter plot (or scatter diagram) is used to show the relationship between two variables
[Figure: example scatter plots of y against x]
Correlation Coefficient
(continued)
The population correlation coefficient ρ (rho) measures the strength of the association between the variables
The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations
Features of ρ and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
[Figure: example scatter plots of y against x for r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]
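Sample correlation coefficient:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$
or, in the algebraically equivalent computational form:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable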
Calculation Example
Tree Height, y   Trunk Diameter, x   xy         y²          x²
35               8                   280        1225        64
49               9                   441        2401        81
27               7                   189        729         49
33               6                   198        1089        36
60               13                  780        3600        169
21               7                   147        441         49
45               11                  495        2025        121
51               12                  612        2601        144
Σ = 321          Σ = 73              Σ = 3142   Σ = 14111   Σ = 713
Calculation Example
(continued)
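[Figure: scatter plot of Tree Height, y against Trunk Diameter, x]
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$
r = 0.886 indicates a relatively strong positive linear association between x and y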
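The calculation above can be checked with a short R sketch. The x and y vectors are not printed in the extracted table; they are recovered here as the positive square roots of the x² and y² columns, which is an assumption of this sketch. The variable names are illustrative.

```r
# Tree data recovered from the x^2 and y^2 columns of the worked example
# (assumption: heights and diameters are the positive square roots)
y <- c(35, 49, 27, 33, 60, 21, 45, 51)  # tree height
x <- c(8, 9, 7, 6, 13, 7, 11, 12)       # trunk diameter
n <- length(x)

# Column totals used by the computational formula
sum_xy <- sum(x * y)
sum_x2 <- sum(x^2)
sum_y2 <- sum(y^2)

# Computational form of the sample correlation coefficient
r_comp <- (n * sum_xy - sum(x) * sum(y)) /
  sqrt((n * sum_x2 - sum(x)^2) * (n * sum_y2 - sum(y)^2))

# Deviation form, which should give the same value
r_dev <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

print(round(c(computational = r_comp, deviation = r_dev), 3))
```

Both forms should agree with the 0.886 reported on the slide, since they are algebraically the same quantity.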
Excel Output
Excel Correlation Output
Tools / data analysis / correlation

                 Tree Height   Trunk Diameter
Tree Height      1
Trunk Diameter   0.886231      1

Correlation between Tree Height and Trunk Diameter: 0.886231
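For readers working in R rather than Excel, a comparable correlation matrix can be produced with the base function `cor()`. This is a minimal sketch: the data frame and its column names are illustrative, and the data values are the ones recovered from the calculation example. The last lines illustrate the "unit free" property with an arbitrary rescaling factor of 2.54, which is an assumption chosen only for demonstration.

```r
# Tree data from the calculation example (column names are illustrative)
trees <- data.frame(
  Tree.Height    = c(35, 49, 27, 33, 60, 21, 45, 51),
  Trunk.Diameter = c(8, 9, 7, 6, 13, 7, 11, 12)
)

# Full correlation matrix, analogous to the Excel output
cor_mat <- cor(trees)
print(round(cor_mat, 6))

# r is unit free: rescaling a variable by a positive constant
# (2.54 is an arbitrary illustrative factor) leaves r unchanged
r_rescaled <- cor(trees$Tree.Height, 2.54 * trees$Trunk.Diameter)
```

Unlike the Excel output, `cor()` fills in both triangles of the matrix; the off-diagonal entry is the same correlation in either position.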
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
where x is the independent variable, β0 + β1x is the linear component, and ε is the random error component
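To make the two components of the population model concrete, the following R sketch simulates data from it. Every number here is hypothetical: the values of `beta0`, `beta1`, the x grid, and the error standard deviation are made up for illustration, and drawing the errors from a normal distribution is an assumption of the sketch rather than something stated on this slide.

```r
set.seed(1)                     # fixed seed so the sketch is reproducible

beta0 <- 100                    # hypothetical population intercept
beta1 <- 0.1                    # hypothetical population slope
x <- seq(1000, 2500, by = 50)   # hypothetical values of the independent variable

# Linear component: the part of y determined exactly by x
linear_part <- beta0 + beta1 * x

# Random error component (assumed normal with mean 0 for this sketch)
epsilon <- rnorm(length(x), mean = 0, sd = 20)

# Observed y is the sum of the two components
y <- linear_part + epsilon
```

Plotting `y` against `x` would show points scattered around the straight line `linear_part`, with the vertical gaps equal to `epsilon`.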
The underlying relationship between the x variable and the y variable is linear
[Figure: population regression line y = β0 + β1x with Intercept = β0 and Slope = β1, showing the random error ε_i for the observation at x_i]
The sample regression line provides an estimate of the population regression line
$$ \hat{y}_i = b_0 + b_1 x $$
where b1 is the estimate of the regression slope and x is the independent variable
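b0 and b1 are the values that minimize the sum of squared residuals:
$$ \sum (y - \hat{y})^2 = \sum (y - (b_0 + b_1 x))^2 $$
The slope is
$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
algebraic equivalent:
$$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$
and
$$ b_0 = \bar{y} - b_1 \bar{x} $$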
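The slope and intercept formulas can be tried out in R. As an illustration only, this sketch reuses the tree data from the correlation example and regresses height on diameter, which the slides themselves do not do. The helper `sse()` and the perturbation sizes are hypothetical choices made to show that moving away from the least squares coefficients increases the sum of squared residuals.

```r
# Tree data from the correlation example, reused here for illustration
y <- c(35, 49, 27, 33, 60, 21, 45, 51)  # tree height
x <- c(8, 9, 7, 6, 13, 7, 11, 12)       # trunk diameter
n <- length(x)

# Slope (computational form) and intercept
b1 <- (sum(x * y) - sum(x) * sum(y) / n) / (sum(x^2) - sum(x)^2 / n)
b0 <- mean(y) - b1 * mean(x)

# Sum of squared residuals for a candidate line a + b * x (hypothetical helper)
sse <- function(a, b) sum((y - (a + b * x))^2)

sse_ls          <- sse(b0, b1)        # at the least squares coefficients
sse_shift_slope <- sse(b0, b1 + 0.5)  # slope perturbed
sse_shift_int   <- sse(b0 + 1, b1)    # intercept perturbed

# Cross-check against R's built-in least squares fit
fit <- lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = unname(coef(fit))))
```

Two perturbations do not prove minimality, but they show the direction of the effect; the agreement with `lm()` is the stronger check.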
Other regression measures will also be computed as part of computer-based regression analysis
The coefficients b0 and b1 will usually be found using computer software, such as Excel or Minitab
House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
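Although the coefficients are normally obtained from software, they can also be computed directly from the formulas for b1 and b0. The sketch below does this for the house price data, using the same variable names (`hprice`, `sqft`) as the R code at the end of this deck. The results should match the slope and intercept reported later for this example.

```r
# House price data: price in $1000s (y) and size in square feet (x)
hprice <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)
sqft   <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)
n <- length(sqft)

# Slope from the computational formula
b1 <- (sum(sqft * hprice) - sum(sqft) * sum(hprice) / n) /
  (sum(sqft^2) - sum(sqft)^2 / n)

# Intercept from the sample means
b0 <- mean(hprice) - b1 * mean(sqft)

print(round(c(b0 = b0, b1 = b1), 5))
```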
Graphical Presentation
House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s) against Square Feet with the fitted regression line; Slope = 0.10977, Intercept = 98.248]
b0 is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values)
Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
b1 measures the estimated change in the average value of Y as a result of a one-unit change in X
Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977 ($1000) = $109.77 for each additional square foot of size
House Price in $1000s (y): 245 312 279 308 199 219 405 324 319 255
Square Feet (x): 1400 1600 1700 1875 1100 1550 2350 2450 1425 1700
Using the rounded coefficients b0 ≈ 98.25 and b1 ≈ 0.1098:
$$ \hat{y} = 98.25 + 0.1098(2000) = 317.85 $$
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850
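The same prediction can be obtained in R with `predict()`. This sketch refits the house price model and predicts at 2000 square feet. Note that 317.85 follows from coefficients rounded to 98.25 and 0.1098; with the unrounded fitted coefficients the prediction comes out slightly lower, at roughly 317.78. The object names are illustrative.

```r
# House price data: price in $1000s and size in square feet
hprice <- c(245, 312, 279, 308, 199, 219, 405, 324, 319, 255)
sqft   <- c(1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700)

fit <- lm(hprice ~ sqft)

# Prediction from the unrounded fitted coefficients
pred_full <- predict(fit, newdata = data.frame(sqft = 2000))

# Prediction from coefficients rounded as in the hand calculation
pred_rounded <- 98.25 + 0.1098 * 2000

print(round(c(full = unname(pred_full), rounded = pred_rounded), 2))
```

The difference of a few hundredths (in $1000s) is purely a rounding effect, not a disagreement between methods.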
Summary
Introduced correlation analysis
Discussed correlation to measure the strength of a linear association
Introduced simple linear regression analysis
Calculated the coefficients for the simple linear regression equation
Summary
(continued)
Described inference about the slope
Addressed estimation of mean values and prediction of individual values
Discussed residual analysis
R software regression
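# House price (in $1000s) and square feet, entered as (price, sqft) pairs
yx = c(245,1400, 312,1600, 279,1700, 308,1875, 199,1100,
       219,1550, 405,2350, 324,2450, 319,1425, 255,1700)
mx = matrix(yx, 10, 2, byrow=TRUE)   # 10 rows, 2 columns: price, sqft
hprice = mx[,1]
sqft = mx[,2]
reg1 = lm(hprice ~ sqft)   # simple linear regression of price on square feet
summary(reg1)              # coefficients, standard errors, R-squared
plot(reg1)                 # diagnostic plots for the fitted model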