
# DCC 3132 - STATISTICS

CHAPTER 6
CORRELATION AND REGRESSION

6.1 CORRELATION

- So far we have considered the statistics of one variable.
- In this chapter, we study whether there is a linear relation between two random variables.
- A linear relation is one where a unit change in one variable corresponds to a constant change in the other variable.

## 6.1.1 SCATTER DIAGRAM

- The two variables in a correlation study are together known as the bivariate.
- A quick way to check whether there is a linear correlation between the bivariate is to plot the two variables on a scatter diagram. The figure below shows the types of scatter diagram.


[Figure: types of scatter diagram. Panel titles: No correlation; Perfect positive correlation; Perfect negative correlation; No correlation; No correlation]


EXAMPLE 6.1

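The scores obtained by 10 students in Mathematics and Physics are shown in the table below.

| Student | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Mathematics score | 20 | 25 | 37 | 43 | 45 | 56 | 70 | 75 | 80 | 90 |
| Physics score | 15 | 20 | 30 | 35 | 30 | 43 | 50 | 60 | 70 | 78 |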

Draw a scatter diagram to show the scores obtained by the ten students. From your point of
view, is there a linear relation between Mathematics and Physics scores?
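
A scatter diagram is normally drawn on graph paper or with a plotting tool. As a rough stand-in, the sketch below prints a text version of the diagram for the scores above using only the Python standard library. The function name `text_scatter` and the grid size are illustrative choices, not part of the course material, and the function assumes the x values (and the y values) are not all identical.

```python
def text_scatter(xs, ys, width=46, height=16):
    """Return a rough text scatter diagram with one '*' per (x, y) pair."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value onto a column or row index of the character grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is printed first, so flip y
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

maths = [20, 25, 37, 43, 45, 56, 70, 75, 80, 90]     # horizontal axis
physics = [15, 20, 30, 35, 30, 43, 50, 60, 70, 78]   # vertical axis
diagram = text_scatter(maths, physics)
print(diagram)
```

If the points rise from the bottom left to the top right and stay close to a straight line, that suggests a positive linear relation between the two sets of scores.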


## 6.1.2 CORRELATION COEFFICIENT

- The scatter diagram provides a visual idea of the relationship between the bivariate and the
type of linear correlation. Hence, we can determine whether there is a linear correlation
between the bivariate or not.
- However, we need to know the degree of linear correlation between the bivariate.
- Thus, a numerical value can fulfill this need.
- The correlation coefficient is a numerical value which indicates the degree of linear
correlation between the bivariate.
- The correlation coefficient takes a value between -1 and 1 inclusive.
- A correlation coefficient of 1 indicates a perfect positive linear correlation, a correlation coefficient of -1 indicates a perfect negative linear correlation, and a correlation coefficient of zero indicates no linear correlation between the bivariate.
- Correlation coefficient for the bivariate X and Y is defined as:

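$$r = \frac{\operatorname{Covariance}(X, Y)}{\sqrt{\operatorname{Variance}(X) \cdot \operatorname{Variance}(Y)}} = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$$

where

$$S_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n^2}$$

$$S_{xx} = \frac{n\sum x^2 - (\sum x)^2}{n^2}$$

$$S_{yy} = \frac{n\sum y^2 - (\sum y)^2}{n^2}$$

- The following notation is used to rewrite the above formulae:

$$S'_{xy} = n^2 S_{xy} = n\sum xy - (\sum x)(\sum y)$$

$$S'_{xx} = n^2 S_{xx} = n\sum x^2 - (\sum x)^2$$

$$S'_{yy} = n^2 S_{yy} = n\sum y^2 - (\sum y)^2$$

so that the factors of $n^2$ cancel and

$$r = \frac{S'_{xy}}{\sqrt{S'_{xx} \cdot S'_{yy}}}$$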
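
The formula for r can be turned into a short function. The sketch below is one possible way to do it in plain Python; the name `pearson_r` and the made-up test points are illustrative assumptions, not part of the notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r = S'xy / sqrt(S'xx * S'yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    s_xy = n * sum_xy - sum_x * sum_y      # S'xy
    s_xx = n * sum_x2 - sum_x ** 2         # S'xx
    s_yy = n * sum_y2 - sum_y ** 2         # S'yy
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when x or y is constant")
    return s_xy / sqrt(s_xx * s_yy)

# Sanity checks on made-up points that lie exactly on a straight line.
xs = [1, 2, 3, 4, 5]
r_rising = pearson_r(xs, [2 * x + 1 for x in xs])     # points on y = 2x + 1
r_falling = pearson_r(xs, [10 - 3 * x for x in xs])   # points on y = 10 - 3x
print(r_rising, r_falling)
```

Points on a rising line should give r = 1 and points on a falling line r = -1, which matches the interpretation of the extreme values of r.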


EXAMPLE 6.2

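Find the correlation coefficient for the bivariate x and y.

| x | 25 | 47 | 35 | 20 | 37 | 10 | 12 | 42 |
|---|---|---|---|---|---|---|---|---|
| y | 2.4 | 5.8 | 3.2 | 2.5 | 4.0 | 0.5 | 1.8 | 5.6 |

SOLUTION
Construct the table of values for x, y, x², y² and xy.

| x | y | x² | y² | xy |
|---|---|---|---|---|
| 25 | 2.4 | 625 | 5.76 | 60 |
| 47 | 5.8 | 2209 | 33.64 | 272.6 |
| 35 | 3.2 | 1225 | 10.24 | 112 |
| 20 | 2.5 | 400 | 6.25 | 50 |
| 37 | 4.0 | 1369 | 16 | 148 |
| 10 | 0.5 | 100 | 0.25 | 5 |
| 12 | 1.8 | 144 | 3.24 | 21.6 |
| 42 | 5.6 | 1764 | 31.36 | 235.2 |
| ∑x = 228 | ∑y = 25.8 | ∑x² = 7836 | ∑y² = 106.74 | ∑xy = 904.4 |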
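
The table for Example 6.2 and the resulting coefficient can be checked with a short script. This is a minimal sketch in plain Python; the variable names are illustrative, and the printed column layout is only one possible choice.

```python
from math import sqrt

x = [25, 47, 35, 20, 37, 10, 12, 42]
y = [2.4, 5.8, 3.2, 2.5, 4.0, 0.5, 1.8, 5.6]
n = len(x)

# One table row per data pair: (x, y, x^2, y^2, xy).
rows = [(xi, yi, xi * xi, yi * yi, xi * yi) for xi, yi in zip(x, y)]
for row in rows:
    print("{:>4} {:>5} {:>6} {:>7.2f} {:>7.1f}".format(*row))

# Column totals: sum x, sum y, sum x^2, sum y^2, sum xy.
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

s_xy = n * sum_xy - sum_x * sum_y     # S'xy
s_xx = n * sum_x2 - sum_x ** 2        # S'xx
s_yy = n * sum_y2 - sum_y ** 2        # S'yy
r = s_xy / sqrt(s_xx * s_yy)
print(f"S'xy = {s_xy:.2f}, S'xx = {s_xx}, S'yy = {s_yy:.2f}, r = {r:.3f}")
```

The script gives r ≈ 0.953, which indicates a strong positive linear correlation between x and y.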


EXAMPLE 6.3

Find the Pearson correlation coefficient for the bivariate x and y.

| x | 32 | 66 | 69 | 68 | 77 | 72 |
|---|---|---|---|---|---|---|
| y | 19 | 70 | 32 | 83 | 36 | 84 |

SOLUTION
Construct the table of values for x, y, x², y² and xy.

| x | y | x² | y² | xy |
|---|---|---|---|---|
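
For Example 6.3, r can be computed in two ways: from the covariance and variance definition, and from the S' shortcut. The sketch below, which assumes only the Python standard library, does both to show that they agree because the factors of n² cancel. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pvariance

x = [32, 66, 69, 68, 77, 72]
y = [19, 70, 32, 83, 36, 84]
n = len(x)

# Route 1: the definition, using population covariance and variances.
products = [xi * yi for xi, yi in zip(x, y)]
cov_xy = mean(products) - mean(x) * mean(y)          # Sxy
r_definition = cov_xy / sqrt(pvariance(x) * pvariance(y))

# Route 2: the S' shortcut, which avoids dividing by n^2.
s_xy = n * sum(products) - sum(x) * sum(y)           # S'xy
s_xx = n * sum(xi * xi for xi in x) - sum(x) ** 2    # S'xx
s_yy = n * sum(yi * yi for yi in y) - sum(y) ** 2    # S'yy
r_shortcut = s_xy / sqrt(s_xx * s_yy)

print(f"r (definition) = {r_definition:.3f}")
print(f"r (shortcut)   = {r_shortcut:.3f}")
```

Both routes give r ≈ 0.508, a moderate positive linear correlation.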


6.2 REGRESSION
- The scatter diagram shows how closely a set of linearly correlated points is distributed about a straight line.
- This line is known as the regression line.
- The regression line can be obtained graphically as follows:

i. Plot the pairs of points on Cartesian coordinates.
ii. Calculate the means of x and y, written x̄ and ȳ.
iii. Plot the mean point (x̄, ȳ) on the Cartesian coordinates.
iv. Draw the straight line that passes through the mean point and lies as close as possible to the plotted points.
v. From the graph, read off the gradient, m, and the point where the line intercepts the y-axis, c.
vi. The general equation of the regression line is y = mx + c.

EXAMPLE 6.4

Consider the values of the bivariate x and y in the table below.

| x | 2 | 4 | 5 | 6 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|
| y | 1 | 2 | 3 | 3.5 | 4.5 | 5 | 6 | 7 |

Find the equation of the regression line, then use it to find the value of y when x = 55 and the value of x when y = 4.2.
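
The graphical method depends on a line drawn by eye, so its answer varies from one drawing to another. The sketch below only illustrates the arithmetic around such a drawing for the data of Example 6.4: it computes the mean point, takes a gradient `m` as if it had been read off the graph (the value 0.64 is an assumed, illustrative reading, not a figure from the notes), and derives the intercept from the requirement that the line passes through the mean point.

```python
x = [2, 4, 5, 6, 8, 9, 10, 11]
y = [1, 2, 3, 3.5, 4.5, 5, 6, 7]

# The mean point that the drawn line must pass through.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# ASSUMPTION: m is a gradient read off a hand-drawn line; 0.64 is only an
# illustrative value, not a figure from the source.
m = 0.64

# A line through (x_bar, y_bar) with gradient m has intercept c = y_bar - m * x_bar.
c = y_bar - m * x_bar

def predict_y(x_value):
    """Value of y on the line y = mx + c for a given x."""
    return m * x_value + c

def predict_x(y_value):
    """Value of x on the line y = mx + c for a given y."""
    return (y_value - c) / m

print(f"mean point = ({x_bar}, {y_bar})")
print(f"y = {m}x + ({c:.2f})")
print(f"y at x = 55: {predict_y(55):.2f}")
print(f"x at y = 4.2: {predict_x(4.2):.2f}")
```

A different reading of the gradient would change c and both predictions, which is one reason a formula-based line is usually preferred to a line drawn by eye.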


## 6.2.2 REGRESSION LINE BY FORMULAE

I. Regression line y on x

- Regression line of y on x: $y = A + Bx$
- Since

$$S'_{xy} = n^2 S_{xy} = n\sum xy - (\sum x)(\sum y)$$

$$S'_{xx} = n^2 S_{xx} = n\sum x^2 - (\sum x)^2$$

- Therefore,

$$B = \frac{S'_{xy}}{S'_{xx}}, \qquad A = \bar{y} - B\bar{x}$$

EXAMPLE 6.5

Find the regression line of y on x for the bivariate x and y shown in the table below.

| x | 22 | 52 | 20 | 46 | 29 | 24 | 53 | 61 | 30 | 55 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 76 | 51 | 73 | 42 | 88 | 32 | 26 | 24 | 54 | 37 |

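Construct the table of values for x, y, x² and xy.

| x | y | x² | xy |
|---|---|---|---|
| 22 | 76 | 484 | 1672 |
| 52 | 51 | 2704 | 2652 |
| 20 | 73 | 400 | 1460 |
| 46 | 42 | 2116 | 1932 |
| 29 | 88 | 841 | 2552 |
| 24 | 32 | 576 | 768 |
| 53 | 26 | 2809 | 1378 |
| 61 | 24 | 3721 | 1464 |
| 30 | 54 | 900 | 1620 |
| 55 | 37 | 3025 | 2035 |
| ∑x = 392 | ∑y = 503 | ∑x² = 17576 | ∑xy = 17533 |

$$\bar{x} = \frac{\sum x}{n} = \frac{392}{10} = 39.2, \qquad \bar{y} = \frac{\sum y}{n} = \frac{503}{10} = 50.3$$

$$S'_{xy} = 10(17533) - (392)(503) = -21846$$

$$S'_{xx} = 10(17576) - (392)^2 = 22096$$

$$B = \frac{S'_{xy}}{S'_{xx}} = \frac{-21846}{22096} \approx -0.9887$$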
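
The quantities in Example 6.5 can be checked numerically. The sketch below, in plain Python with illustrative variable names, recomputes the column totals from the raw data and then evaluates S'xy, S'xx, B and A using the formulae for the regression line of y on x.

```python
x = [22, 52, 20, 46, 29, 24, 53, 61, 30, 55]
y = [76, 51, 73, 42, 88, 32, 26, 24, 54, 37]
n = len(x)

# Column totals of the table.
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

s_xy = n * sum_xy - sum_x * sum_y   # S'xy
s_xx = n * sum_x2 - sum_x ** 2      # S'xx

B = s_xy / s_xx                     # gradient of the line of y on x
A = sum_y / n - B * (sum_x / n)     # intercept: y-bar minus B times x-bar

print(f"S'xy = {s_xy}, S'xx = {s_xx}")
print(f"y = {A:.2f} + ({B:.4f})x")
```

With these data the script gives B ≈ -0.9887 and A ≈ 89.06, so the regression line of y on x is approximately y = 89.06 - 0.9887x.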


II. Regression line x on y

- Regression line of x on y: $x = C + Dy$
- Since

$$S'_{xy} = n^2 S_{xy} = n\sum xy - (\sum x)(\sum y)$$

$$S'_{yy} = n^2 S_{yy} = n\sum y^2 - (\sum y)^2$$

- Therefore,

$$D = \frac{S'_{xy}}{S'_{yy}}, \qquad C = \bar{x} - D\bar{y}$$

EXAMPLE 6.6

Find the regression line of x on y for the bivariate x and y shown in the table below.

| x | 73 | 64 | 90 | 83 | 10 | 12 | 10 | 86 | 39 | 35 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 13 | 27 | 36 | 15 | 76 | 83 | 39 | 42 | 18 | 57 |

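Construct the table of values for x, y, y² and xy.

| x | y | y² | xy |
|---|---|---|---|
| 73 | 13 | 169 | 949 |
| 64 | 27 | 729 | 1728 |
| 90 | 36 | 1296 | 3240 |
| 83 | 15 | 225 | 1245 |
| 10 | 76 | 5776 | 760 |
| 12 | 83 | 6889 | 996 |
| 10 | 39 | 1521 | 390 |
| 86 | 42 | 1764 | 3612 |
| 39 | 18 | 324 | 702 |
| 35 | 57 | 3249 | 1995 |
| ∑x = 502 | ∑y = 406 | ∑y² = 21942 | ∑xy = 15617 |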
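
The remaining steps of Example 6.6 follow the formulae for the regression line of x on y. The sketch below wraps them in a small function; the name `regress_x_on_y` is an illustrative choice, and the code assumes plain Python with no external libraries.

```python
def regress_x_on_y(xs, ys):
    """Return (C, D) for the line x = C + D*y, using D = S'xy / S'yy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y   # S'xy
    s_yy = n * sum(y * y for y in ys) - sum_y ** 2                  # S'yy
    if s_yy == 0:
        raise ValueError("D is undefined when every y value is the same")
    D = s_xy / s_yy
    C = sum_x / n - D * (sum_y / n)   # x-bar minus D times y-bar
    return C, D

x = [73, 64, 90, 83, 10, 12, 10, 86, 39, 35]
y = [13, 27, 36, 15, 76, 83, 39, 42, 18, 57]
C, D = regress_x_on_y(x, y)
print(f"x = {C:.2f} + ({D:.4f})y")
```

Here S'xy = 10(15617) - (502)(406) = -47642 and S'yy = 10(21942) - (406)² = 54584, so D ≈ -0.8728 and C ≈ 85.64, giving the line x ≈ 85.64 - 0.8728y.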