
Chapter 4

Correlation and Regression

Agenda
Objective
Introduction
r-Pearson's Coefficient of Correlation
Spearman's Coefficient of Rank Correlation
Linear Regression
The End

Objective
At the end of this chapter, you should be able to:
1. Calculate r-Pearson's coefficient of correlation and determine its type.
2. Calculate Spearman's coefficient of rank correlation and determine its type.
3. Find the regression line and estimate y given x.

Introduction
In this chapter, you are going to study the relationship between two variables in mathematical form by understanding the concepts of correlation and regression.
For the coefficient of correlation, you'll learn two methods of calculation:
i. r-Pearson's coefficient of correlation
ii. Spearman's coefficient of rank correlation
For regression, you'll learn how to obtain a straight-line equation, called the regression equation, using the least squares method.

Introduction (cont.)
Once a set of data on two variables has been collected, for example the height (x-value) and weight (y-value) of adults, one would want to see if there is any relationship between the two variables.
By plotting the points on a graph called a scatter diagram (Figure 1), we can see the distribution pattern of the data. By analyzing the pattern, one can deduce whether there is any relationship between the variables and, if a relationship exists, whether it is positive linear, negative linear or non-linear.

Figure 1: Scatter diagram

Introduction (cont.)
A scatter diagram can only show whether any correlation exists and its type. To measure the strength of the relationship between the two variables, a statistic called the coefficient of correlation is used.
There are two methods to calculate the coefficient of correlation:
1. r-Pearson's coefficient of correlation
2. Spearman's coefficient of rank correlation

The values of the coefficient of correlation lie between -1.0 and 1.0.

A value that is very close to -1.0 means the two variables have a strong negative relationship. A negative correlation means that as the x-value increases, the y-value decreases.

A value that is very close to 1.0 means the two variables have a strong positive relationship. A positive correlation means that as the x-value increases, the y-value also increases.
12/01/15

Copyright UniKL 2005

Introduction (cont.)

Figure 2: Scatter diagrams for various values of r
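
The reading of the coefficient described above (sign gives the direction, closeness to -1.0 or 1.0 gives the strength) can be sketched in Python. The function name `describe_correlation` and the 0.8 cut-off for "strong" are assumptions made for illustration; the chapter does not fix a threshold.

```python
def describe_correlation(r, strong=0.8):
    """Describe a correlation coefficient r in words.

    The default cut-off of 0.8 for "strong" is an illustrative
    assumption, not a value taken from the chapter.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and 1.0")
    if r == 0:
        return "no linear correlation"
    # The sign of r gives the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    # The closer |r| is to 1.0, the stronger the relationship.
    strength = "strong" if abs(r) >= strong else "weaker"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.96))   # strong positive correlation
print(describe_correlation(-0.35))  # weaker negative correlation
```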

r-Pearson's Coefficient of Correlation

This method is used to measure the strength of the linear relationship between two variables for quantitative data only. It is suitable when the data contain no extreme (too small or too big) values. The formula is,
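
In its usual computational form, Pearson's coefficient can be written as below. The notation is assumed here: n is the number of (x, y) pairs and each sum runs over all pairs.

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```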

Example 1
Calculate r-Pearson's coefficient of correlation for the following data. Determine the type of correlation obtained.
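
A minimal Python sketch of this kind of calculation, using the sums form of Pearson's formula. The function name `pearson_r` and the x and y values are hypothetical placeholders for illustration; they are not the data of Example 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from sums of x, y, xy, x^2 and y^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


# Hypothetical sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Since r is positive here, the made-up data show a positive correlation.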

Spearman's Coefficient of Rank Correlation

Spearman's coefficient of rank correlation measures the strength of a linear or non-linear relationship between the two variables. It can also be used to measure the degree of relationship for qualitative variables that can be ranked. The formula is
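
In its commonly used form, Spearman's coefficient can be written as below. The notation is assumed here: n is the number of pairs, and d is the difference between the rank of x and the rank of y for each pair.

```latex
r_s = 1 - \frac{6\sum d^{2}}{n\left(n^{2} - 1\right)},
\qquad d = r_x - r_y
```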

Example 2
Calculate Spearman's coefficient of rank correlation for the following data. Then, determine its type.

Solution:
Rank x from the smallest (rx = 1) to the largest x. If two or more values of x are equal, the rank for each is the average of the ranks they occupy. There are 3 values of x = 107, which occupy ranks 3, 4 and 5. To assign the correct rank for x = 107, obtain the average rank for the three values of x. Hence, each x = 107 is given the rank (3 + 4 + 5)/3 = 4.

Solution:
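
The ranking rule and the coefficient can be sketched in Python as follows. The function names and the x and y values are hypothetical; the x list is built to contain three tied values of 107 at ranks 3, 4 and 5, as in the text, but it is not the data of Example 2. The sketch assumes the formula r_s = 1 - 6Σd²/(n(n² - 1)), which is only approximate when ties are present.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) to largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Average of the ranks pos+1 .. end+1 occupied by the tied values.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rs(xs, ys):
    """Spearman's coefficient from squared rank differences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical sample data, for illustration only.
x = [101, 104, 107, 107, 107, 112]
y = [12, 15, 14, 18, 17, 20]
print(average_ranks(x))             # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0]
print(round(spearman_rs(x, y), 4))  # 0.8286
```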

Linear Regression
When two variables have a strong relationship between them, we can estimate the value of one variable given the value of the other variable. The method used for estimation is the least squares regression line. The line has the form y = ax + b.
In this equation, y is called the dependent variable while x is the independent variable.
a is the gradient of the straight line and b is the y-intercept.

Least Squares Linear Regression

The least squares linear regression line is the best-fit line on the scatter diagram. It is the line nearest to all plotted points, in the sense that it minimises the sum of the squared vertical distances between the points and the line.
The equation of the regression line of y on x is y = ax + b.
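
Assuming n paired observations (x, y), the standard least squares expressions for the gradient a and the y-intercept b are as below. The bar notation for the means of x and y is introduced here for convenience.

```latex
a = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
b = \bar{y} - a\,\bar{x}
```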

Example 5-3
Find the least squares linear regression line of y on x for the following data. Then estimate the value of y if x = 5.

Solution:
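
The steps of such a solution (compute the sums, obtain a and b, then substitute x = 5) can be sketched in Python. The function name `least_squares_line` and the data values are hypothetical placeholders, not the data of this example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the gradient is undefined when all x are equal")
    a = (n * sum_xy - sum_x * sum_y) / denominator  # gradient
    b = sum_y / n - a * sum_x / n                   # y-intercept: mean(y) - a*mean(x)
    return a, b


# Hypothetical sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
estimate = a * 5 + b  # estimated y when x = 5
print(round(a, 4), round(b, 4), round(estimate, 4))  # 0.6 2.2 5.2
```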

TRY
Refer to the table.
a. Find the least squares regression line for the data on incomes and food expenditures of seven households given in the table above. Use income as the independent variable and food expenditure as the dependent variable.
[a = 0.2642, b = 1.1414; y = 0.2642x + 1.1414]

b. Calculate the correlation coefficient for the example on incomes and food expenditures of seven households.
[r = 0.96 (strong positive correlation)]

Websites:
http://www.public.iastate.edu/~dnett/S401/ncorrelation.pdf
