Scatter Diagrams
A scatter plot, also called a scatter diagram, is a graph that shows the relationship between two variables.
[Figure: five scatter plot panels showing (a) Linear, (b) Linear, (c) Curvilinear, (d) Curvilinear, (e) No Relationship]
Correlation
The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables. It ranges from -1.0 to +1.0. A correlation of +1.0 or -1.0 indicates a perfect linear relationship (positive or negative, respectively), whereas a correlation of 0 indicates no linear relationship.
SAMPLE CORRELATION COEFFICIENT
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
SAMPLE CORRELATION COEFFICIENT (algebraic equivalent)
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
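The two forms are the same quantity written differently. A short derivation, offered as a sketch: it assumes $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$, which the slides use but do not spell out.

```latex
\begin{align*}
\sum (x - \bar{x})(y - \bar{y})
  &= \sum xy - \bar{x}\sum y - \bar{y}\sum x + n\bar{x}\bar{y}
   = \sum xy - \frac{\sum x \sum y}{n} \\
\sum (x - \bar{x})^2
  &= \sum x^2 - \frac{(\sum x)^2}{n},
  \qquad
  \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} \\
r &= \frac{\sum xy - \frac{\sum x \sum y}{n}}
          {\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]
                 \left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}
   = \frac{n\sum xy - \sum x \sum y}
          {\sqrt{\left[n\sum x^2 - (\sum x)^2\right]
                 \left[n\sum y^2 - (\sum y)^2\right]}}
\end{align*}
```

The last step multiplies numerator and denominator by n. The algebraic form is convenient for hand calculation because it needs only the column totals.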
(Example 11-1)
(Table 11-1)
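| Sales y | Years x | yx | y² | x² |
|---|---|---|---|---|
| 487 | 3 | 1,461 | 237,169 | 9 |
| 445 | 5 | 2,225 | 198,025 | 25 |
| 272 | 2 | 544 | 73,984 | 4 |
| 641 | 8 | 5,128 | 410,881 | 64 |
| 187 | 2 | 374 | 34,969 | 4 |
| 440 | 6 | 2,640 | 193,600 | 36 |
| 346 | 7 | 2,422 | 119,716 | 49 |
| 238 | 1 | 238 | 56,644 | 1 |
| 312 | 4 | 1,248 | 97,344 | 16 |
| 269 | 2 | 538 | 72,361 | 4 |
| 655 | 9 | 5,895 | 429,025 | 81 |
| 563 | 6 | 3,378 | 316,969 | 36 |
| Σy = 4,855 | Σx = 55 | Σyx = 26,091 | Σy² = 2,240,687 | Σx² = 329 |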
(Example 11-1)
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = 0.8325 $$
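The 0.8325 result for Example 11-1 can be reproduced from the Table 11-1 data. The following is a minimal Python sketch rather than the method used in the slides; the function names `correlation_sums` and `correlation_deviations` are made up for illustration, and the two functions correspond to the algebraic and the deviation forms of the sample correlation coefficient.

```python
from math import sqrt

# Sales (y) and years of experience (x), copied from Table 11-1.
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]


def correlation_sums(x, y):
    """Algebraic form: needs only n and the column totals."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def correlation_deviations(x, y):
    """Deviation form: uses deviations from the sample means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r = correlation_sums(years, sales)
print(round(r, 4))  # 0.8325
```

Both functions return the same value up to floating-point rounding, which is a quick check that the two formulas agree.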
Spurious correlation occurs when there is a correlation between two otherwise unrelated variables.
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
where:
y = Value of the dependent variable
x = Value of the independent variable
$\beta_0$ = Population's y-intercept
$\beta_1$ = Slope of the population regression line
$\varepsilon$ = Error term, or residual
[Figure: plot of y against x; axis tick values omitted]
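$$ \hat{y}_i = b_0 + b_1 x $$
where:
$\hat{y}_i$ = Estimated, or predicted, y value
$b_0$ = Estimate of the regression intercept
$b_1$ = Estimate of the regression slope
x = Value of the independent variable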
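$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
algebraic equivalent:
$$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$
and
$$ b_0 = \bar{y} - b_1 \bar{x} $$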
$$ SSE = \sum y^2 - b_0 \sum y - b_1 \sum xy $$
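This shortcut follows from the definition of SSE as the sum of squared residuals, $\sum (y - \hat{y})^2$. The sketch below relies on two properties of the least squares estimates that are assumed here rather than derived in the slides: the residuals sum to zero, and the residuals weighted by x also sum to zero.

```latex
\begin{align*}
SSE &= \sum (y - \hat{y})^2
     = \sum y\,(y - \hat{y}) - \sum \hat{y}\,(y - \hat{y}) \\
\sum \hat{y}\,(y - \hat{y})
    &= b_0 \sum (y - \hat{y}) + b_1 \sum x\,(y - \hat{y}) = 0 \\
SSE &= \sum y\,(y - b_0 - b_1 x)
     = \sum y^2 - b_0 \sum y - b_1 \sum xy
\end{align*}
```

The shortcut therefore holds only when $b_0$ and $b_1$ are the least squares estimates; for any other line the middle term does not vanish.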
(Table 11-3)
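| Sales y | Years x | xy | y² | x² |
|---|---|---|---|---|
| 487 | 3 | 1,461 | 237,169 | 9 |
| 445 | 5 | 2,225 | 198,025 | 25 |
| 272 | 2 | 544 | 73,984 | 4 |
| 641 | 8 | 5,128 | 410,881 | 64 |
| 187 | 2 | 374 | 34,969 | 4 |
| 440 | 6 | 2,640 | 193,600 | 36 |
| 346 | 7 | 2,422 | 119,716 | 49 |
| 238 | 1 | 238 | 56,644 | 1 |
| 312 | 4 | 1,248 | 97,344 | 16 |
| 269 | 2 | 538 | 72,361 | 4 |
| 655 | 9 | 5,895 | 429,025 | 81 |
| 563 | 6 | 3,378 | 316,969 | 36 |
| Σy = 4,855 | Σx = 55 | Σxy = 26,091 | Σy² = 2,240,687 | Σx² = 329 |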
$$ b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$
$$ \hat{y} = 175.8288 + 49.9101(x) $$
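The fitted equation can be checked by applying the algebraic least squares formulas to the Table 11-3 data. This is a minimal Python sketch under that assumption; `least_squares` is an illustrative name, not a function from the slides or from any library.

```python
# Sales (y) and years of experience (x), copied from Table 11-3.
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]


def least_squares(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # Slope: algebraic form of the least squares estimate.
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(years, sales)
print(f"y_hat = {b0:.4f} + {b1:.4f} x")  # y_hat = 175.8288 + 49.9101 x

# Predicted average sales for someone with 4 years of experience.
print(round(b0 + b1 * 4, 2))
```

The slope says that each additional year of experience is associated with an increase of about 49.91 in average sales, in whatever units the sales column uses.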
| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept (b0) | 175.8288191 | 54.98988674 | 3.197475563 | 0.00953244 | 53.30369475 | 298.3539434 | 53.30369475 | 298.3539434 |
| Years x (b1) | 49.91007584 | 10.50208428 | 4.752397191 | 0.000777416 | 26.50996978 | 73.3101819 | 26.50996978 | 73.3101819 |
SUM OF RESIDUALS
$$ \sum (y - \hat{y}) = 0 $$
SUM OF SQUARED RESIDUALS
$$ \sum (y - \hat{y})^2 $$
$$ TSS = \sum (y - \bar{y})^2 $$
where:
TSS = Total sum of squares
n = Sample size
y = Values of the dependent variable
$\bar{y}$ = Average value of the dependent variable
$$ SSE = \sum (y - \hat{y})^2 $$
where:
SSE = Sum of squares error
n = Sample size
y = Values of the dependent variable
$\hat{y}$ = Estimated value for the average of y for the given x value
$$ SSR = \sum (\hat{y} - \bar{y})^2 $$
where:
SSR = Sum of squares regression
$\bar{y}$ = Average value of the dependent variable
y = Values of the dependent variable
$\hat{y}$ = Estimated value for the average of y for the given x value
$$ R^2 = \frac{SSR}{TSS} $$
69.31% of the variation in the sales data for this sample can be explained by the linear relationship between sales and years of experience.
$$ R^2 = r^2 $$
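The 69.31% figure and the identity between R² and r² can both be checked numerically on the sales data. The sketch below is a plain Python illustration with variable names chosen for this example; it also confirms that the total sum of squares splits into the error and regression parts.

```python
from math import sqrt

# Sales (y) and years of experience (x) from the example data.
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

# Deviation sums used by both the regression and the correlation.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
s_xx = sum((x - x_bar) ** 2 for x in years)
s_yy = sum((y - y_bar) ** 2 for y in sales)

# Least squares estimates and fitted values.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in years]

tss = sum((y - y_bar) ** 2 for y in sales)              # total variation
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))  # unexplained
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained

r_squared = ssr / tss
r = s_xy / sqrt(s_xx * s_yy)

print(round(r_squared, 4))  # 0.6931
print(round(r ** 2, 4))     # 0.6931
```

The equality of R² and r² is specific to regression with a single independent variable, which is the case covered here.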
Residual Analysis
Before using a regression model for description or prediction, check whether the assumptions about the error terms (normal distribution and constant variance) are satisfied. One way to do this is with residual plots.
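A residual plot needs the residuals themselves, so a first step is to compute them. The following Python sketch, using the sales example data and illustrative variable names, lists each residual next to its x value and fitted value; the same columns could be passed to any plotting tool. It is a starting point for the check, not a formal test of the assumptions.

```python
# Sales (y) and years of experience (x) from the example data.
sales = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
years = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

# Least squares estimates for y_hat = b0 + b1 * x.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
      / sum((x - x_bar) ** 2 for x in years))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in years]
residuals = [y - f for y, f in zip(sales, fitted)]

# List residuals in order of x, as they would appear along a residual plot.
print(f"{'x':>3} {'fitted':>9} {'residual':>9}")
for x, f, e in sorted(zip(years, fitted, residuals)):
    print(f"{x:>3} {f:>9.2f} {e:>9.2f}")

# Least squares residuals sum to zero, up to floating-point rounding.
print(f"sum of residuals: {sum(residuals):.6f}")
```

When reading the listing or the corresponding plot, a spread of residuals that widens or narrows as x increases would suggest non-constant variance, and a curved pattern would suggest that a straight line is not an adequate model.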
Key Terms
Coefficient of Determination
Correlation Coefficient
Dependent Variable
Independent Variable
Least Squares Criterion
Regression Coefficients
Regression Slope Coefficient
Residual
Scatter Plot
Simple Linear Regression Analysis
Spurious Correlation