
DCC 3132 - STATISTICS

CHAPTER 6
CORRELATION AND REGRESSION

6.1 CORRELATION

- So far we have considered the statistics of one variable.


- In this chapter, we study whether there is a linear relation between two random variables.
- A linear relation is one where a unit change in one variable corresponds to a constant change in
the other variable.

6.1.1 SCATTER DIAGRAM

- The two variables studied in correlation are together known as the bivariate.
- A quick way to check whether there is a linear correlation between the bivariate is to
represent the two variables on a scatter diagram. The figure below shows the types of scatter
diagram.

[Figure: scatter diagrams showing no correlation, positive correlation, strong positive
correlation, negative correlation, strong negative correlation, perfect positive correlation
and perfect negative correlation.]


EXAMPLE 6.1

The scores obtained by 10 students in Mathematics and Physics are shown in the table below.

Student             A   B   C   D   E   F   G   H   I   J
Mathematics score  20  25  37  43  45  56  70  75  80  90
Physics score      15  20  30  35  30  43  50  60  70  78

Draw a scatter diagram to show the scores obtained by the ten students. From your point of
view, is there a linear relation between Mathematics and Physics scores?
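
As a rough illustration of what the scatter diagram for this example looks like, the sketch below draws the ten points on a character grid. It is not part of the original notes: the function name `text_scatter` and the choice of 5 marks per grid cell are assumptions made for this example, and a hand-drawn graph or any plotting tool would serve equally well.

```python
# Scores from Example 6.1 (Mathematics on the x-axis, Physics on the y-axis).
maths = [20, 25, 37, 43, 45, 56, 70, 75, 80, 90]
physics = [15, 20, 30, 35, 30, 43, 50, 60, 70, 78]


def text_scatter(xs, ys, step=5, top=100):
    """Return a character-grid scatter diagram; each cell spans `step` marks."""
    size = top // step + 1
    grid = [[" "] * size for _ in range(size)]
    for x, y in zip(xs, ys):
        grid[y // step][x // step] = "*"
    # Emit the highest y row first so that the y-axis points upwards.
    lines = []
    for row in range(size - 1, -1, -1):
        lines.append(f"{row * step:3d} |" + "".join(grid[row]))
    lines.append("    +" + "-" * size)
    return "\n".join(lines)


diagram = text_scatter(maths, physics)
print(diagram)
```

The stars climb from the bottom left to the top right of the grid, which is the pattern to look for when judging whether the two sets of scores are linearly related.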


6.1.2 CORRELATION COEFFICIENT

- The scatter diagram gives a visual idea of the relationship between the bivariate and the
type of linear correlation, so we can tell whether a linear correlation exists or not.
- However, we also need to know the degree of linear correlation between the bivariate, and
this calls for a numerical value.
- The correlation coefficient is a numerical value which indicates the degree of linear
correlation between the bivariate.
- The correlation coefficient takes a value between -1 and 1 inclusive.
- A correlation coefficient of 1 indicates a perfect positive linear correlation, a correlation
coefficient of -1 indicates a perfect negative linear correlation, and a correlation coefficient
of zero indicates no linear correlation between the bivariate.
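- The correlation coefficient for the bivariate X and Y is defined as:

$$ r = \frac{\mathrm{Covariance}(X, Y)}{\sqrt{\mathrm{Variance}(X) \cdot \mathrm{Variance}(Y)}} = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}} $$

where

$$ S_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n^2} $$
$$ S_{xx} = \frac{n\sum x^2 - (\sum x)^2}{n^2} $$
$$ S_{yy} = \frac{n\sum y^2 - (\sum y)^2}{n^2} $$

- The following notation is used to rewrite the above formulae:

$$ S'_{xy} = n^2 S_{xy} = n\sum xy - (\sum x)(\sum y) $$
$$ S'_{xx} = n^2 S_{xx} = n\sum x^2 - (\sum x)^2 $$
$$ S'_{yy} = n^2 S_{yy} = n\sum y^2 - (\sum y)^2 $$

Pearson correlation coefficient,

$$ r = \frac{S'_{xy}}{\sqrt{S'_{xx} \cdot S'_{yy}}} $$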
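
The primed form of the formula can be checked numerically. The sketch below is a minimal Python illustration and not part of the original notes; the function name `pearson_r` and the sample data are assumptions chosen for the example. Because the factor n² appears in both the numerator and the denominator of r, it cancels, so the function works with S'xy, S'xx and S'yy directly.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from the primed sums S'xy, S'xx and S'yy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    s_xx = n * sum(x * x for x in xs) - sum_x ** 2
    s_yy = n * sum(y * y for y in ys) - sum_y ** 2
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, not from the notes: points lying exactly on a line.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # y = 2x + 1
falling = [10 - 3 * x for x in xs]  # y = 10 - 3x

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
print(r_rising, r_falling)
```

Points on a rising line give r = 1 and points on a falling line give r = -1, which matches the two extreme cases described in the bullet points.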


EXAMPLE 6.2

Find the correlation coefficient for the bivariate x and y.


x 25 47 35 20 37 10 12 42
y 2.4 5.8 3.2 2.5 4.0 0.5 1.8 5.6

SOLUTION
Construct the table of values for x, y, x², y² and xy

      x      y     x²       y²      xy
     25    2.4    625     5.76      60
     47    5.8   2209    33.64   272.6
     35    3.2   1225    10.24     112
     20    2.5    400     6.25      50
     37      4   1369       16     148
     10    0.5    100     0.25       5
     12    1.8    144     3.24    21.6
     42    5.6   1764    31.36   235.2
∑   228   25.8   7836   106.74   904.4

S’xy = n∑xy – (∑x)(∑y) =

S’xx = n∑x² – (∑x)² =

S’yy = n∑y² – (∑y)² =

Pearson correlation coefficient, r =
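
One way to check a hand calculation for this example is to substitute the column totals into the three formulas. The sketch below does this in Python; it is an illustration only, and the variable names are assumptions rather than notation from the notes.

```python
import math

# Column totals from the table in Example 6.2.
n = 8
sum_x, sum_y = 228, 25.8
sum_x2, sum_y2, sum_xy = 7836, 106.74, 904.4

s_xy = n * sum_xy - sum_x * sum_y    # S'xy
s_xx = n * sum_x2 - sum_x ** 2       # S'xx
s_yy = n * sum_y2 - sum_y ** 2       # S'yy
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"S'xy = {s_xy:.2f}, S'xx = {s_xx}, S'yy = {s_yy:.2f}, r = {r:.3f}")
```

With these totals r comes out at roughly 0.95, which indicates a strong positive linear correlation between x and y.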


EXAMPLE 6.3

Find the Pearson correlation coefficient for the bivariate x and y


x 32 66 69 68 77 72
y 19 70 32 83 36 84

SOLUTION
Construct the table of values for x, y, x², y² and xy

      x     y      x²      y²      xy
     32    19    1024     361     608
     66    70    4356    4900    4620
     69    32    4761    1024    2208
     68    83    4624    6889    5644
     77    36    5929    1296    2772
     72    84    5184    7056    6048
∑   384   324   25878   21526   21900

S’xy = n∑xy – (∑x)(∑y) =

S’xx = n∑x² – (∑x)² =

S’yy = n∑y² – (∑y)² =

Pearson correlation coefficient, r =
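
The table itself can also be built programmatically, which is a useful check on the arithmetic in each row. The following sketch, which is an illustration with assumed variable names and not the method prescribed by the notes, forms one row per data pair, sums the columns, and then evaluates r.

```python
import math

# Data from Example 6.3.
xs = [32, 66, 69, 68, 77, 72]
ys = [19, 70, 32, 83, 36, 84]

# One row per data pair: (x, y, x², y², xy), as in the table.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
# Summing each column gives the totals row.
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

n = len(rows)
s_xy = n * sum_xy - sum_x * sum_y    # S'xy
s_xx = n * sum_x2 - sum_x ** 2       # S'xx
s_yy = n * sum_y2 - sum_y ** 2       # S'yy
r = s_xy / math.sqrt(s_xx * s_yy)
print(s_xy, s_xx, s_yy, round(r, 3))
```

Here r is roughly 0.51, a much weaker positive correlation than in Example 6.2, even though both data sets trend upwards.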


6.2 REGRESSION
- The scatter diagram shows how closely a set of linearly correlated points is distributed
about a straight line.
- This line is known as the regression line.

6.2.1 REGRESSION LINE BY GRAPH

- The regression line can be obtained by the following steps:

i. Plot the pairs of points on Cartesian coordinates.
ii. Calculate the means of x and y.
iii. Plot the mean point (x̄, ȳ) on the Cartesian coordinates.
iv. Draw a line that passes through the mean point and follows the trend of the plotted points.
v. From the graph, read off the gradient m of the line and the point c where it intercepts the y-axis.
vi. The general equation of the regression line is y = mx + c.

EXAMPLE 6.4

Consider the values of the bivariate x and y in the table below.


x 2 4 5 6 8 9 10 11
y 1 2 3 3.5 4.5 5 6 7

Find the equation of the regression line, then use the equation to find the value of y when
x = 55 and the value of x when y = 4.2.
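
A line drawn by eye will differ slightly from one person to the next, so a numerical cross-check can be helpful. The sketch below computes the mean point from steps ii and iii and then, as an assumption that goes beyond the graphical method, takes the gradient from the least-squares formula S'xy / S'xx instead of reading it from a graph. The helper names `predict_y` and `predict_x` are hypothetical.

```python
# Data from Example 6.4.
xs = [2, 4, 5, 6, 8, 9, 10, 11]
ys = [1, 2, 3, 3.5, 4.5, 5, 6, 7]
n = len(xs)

# Steps ii and iii: the mean point that the line must pass through.
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Assumption: the gradient m is taken from the least-squares formula
# S'xy / S'xx instead of being read from a hand-drawn line.
s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
s_xx = n * sum(x * x for x in xs) - sum(xs) ** 2
m = s_xy / s_xx
c = mean_y - m * mean_x  # forces the line through the mean point


def predict_y(x):
    """Value of y on the line for a given x."""
    return m * x + c


def predict_x(y):
    """Value of x on the line for a given y (y = mx + c rearranged)."""
    return (y - c) / m


print(f"mean point = ({mean_x}, {mean_y}), y = {m:.3f}x + ({c:.3f})")
```

The mean point is (6.875, 4), and a gradient and intercept read from a carefully drawn graph should be close to the values printed here. Calling `predict_y` and `predict_x` then answers the two parts of the question.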


6.2.2 REGRESSION LINE BY FORMULAE

I. Regression line y on x

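- The regression line of y on x has the form y = A + Bx.
- Since

$$ S'_{xy} = n^2 S_{xy} = n\sum xy - (\sum x)(\sum y) $$
$$ S'_{xx} = n^2 S_{xx} = n\sum x^2 - (\sum x)^2 $$

- therefore

$$ B = \frac{S'_{xy}}{S'_{xx}}, \qquad A = \bar{y} - B\bar{x} $$

where x̄ and ȳ are the means of x and y.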

EXAMPLE 6.5

Find the regression line of y on x for the bivariate x and y that are shown in the table below

x 22 52 20 46 29 24 53 61 30 55

y 76 51 73 42 88 32 26 24 54 37

Construct the table of values for x, y, x² and xy

    x    y     x²     xy
   22   76    484   1672
   52   51   2704   2652
   20   73    400   1460
   46   42   2116   1932
   29   88    841   2552
   24   32    576    768
   53   26   2809   1378
   61   24   3721   1464
   30   54    900   1620
   55   37   3025   2035

∑x = 392   ∑y = 503   ∑x² = 17576   ∑xy = 17533

x̄ = ∑x/n = 392 / 10 = 39.2
ȳ = ∑y/n = 503 / 10 = 50.3

S’xy =
S’xx =
B =
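
The remaining quantities follow by substituting the totals into the formulas for B and A. The sketch below is an illustration with assumed variable names (`a` and `b` stand for A and B) and is intended as a way to check a hand calculation, not as the worked solution from the notes.

```python
# Totals from the table in Example 6.5.
n = 10
sum_x, sum_y = 392, 503
sum_x2, sum_xy = 17576, 17533

s_xy = n * sum_xy - sum_x * sum_y   # S'xy
s_xx = n * sum_x2 - sum_x ** 2      # S'xx

b = s_xy / s_xx                     # slope B
a = sum_y / n - b * (sum_x / n)     # intercept A = mean(y) - B * mean(x)
print(f"S'xy = {s_xy}, S'xx = {s_xx}, y = {a:.2f} + ({b:.3f})x")
```

The slope is negative (about -0.99), so y tends to fall as x rises, and the intercept is about 89.06.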


II. Regression line x on y

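- The regression line of x on y has the form x = C + Dy.
- Since

$$ S'_{xy} = n^2 S_{xy} = n\sum xy - (\sum x)(\sum y) $$
$$ S'_{yy} = n^2 S_{yy} = n\sum y^2 - (\sum y)^2 $$

- therefore

$$ D = \frac{S'_{xy}}{S'_{yy}}, \qquad C = \bar{x} - D\bar{y} $$

where x̄ and ȳ are the means of x and y.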

EXAMPLE 6.6

Find the regression line of x on y for the bivariate x and y that are shown in the table below

x 73 64 90 83 10 12 10 86 39 35

y 13 27 36 15 76 83 39 42 18 57

Construct the table of values for x, y, y² and xy

    x    y    y²    xy
   73   13
   64   27
   90   36
   83   15
   10   76
   12   83
   10   39
   86   42
   39   18
   35   57

∑x = 502   ∑y = 406   ∑y² = 21942   ∑xy = 15617
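
The y² and xy columns and the coefficients D and C can be checked with a short computation from the raw data. The sketch below is an illustration with assumed variable names (`c` and `d` stand for C and D); it recomputes the totals instead of copying them, so it also confirms the totals row.

```python
# Data from Example 6.6.
xs = [73, 64, 90, 83, 10, 12, 10, 86, 39, 35]
ys = [13, 27, 36, 15, 76, 83, 39, 42, 18, 57]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

s_xy = n * sum_xy - sum_x * sum_y   # S'xy
s_yy = n * sum_y2 - sum_y ** 2      # S'yy

d = s_xy / s_yy                     # slope D
c = sum_x / n - d * (sum_y / n)     # intercept C = mean(x) - D * mean(y)
print(f"x = {c:.2f} + ({d:.3f})y")
```

The computed totals agree with the totals row above, and the line comes out at roughly x = 85.64 - 0.873y. Note that the divisor is S'yy here, not S'xx, because x is being predicted from y.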
