
UNIT III

CORRELATION
Previously we studied the characteristics of only one variable, but there are many situations where we need to study two variables simultaneously, say x and y. For example, the variables may be (i) income and expenditure, (ii) heights and weights of a group of persons, (iii) price and demand of a commodity; each of these pairs exhibits joint movement of two variables.
Such data reveal a relationship between the two variables: a change in one variable results in a change in the other, and a greater change in one leads to a greater change in the other. We are therefore interested in measuring numerically the strength of this relationship between the variables. Such a relationship between variables is known as correlation.
MEASURES OF CORRELATION:

Correlation may be measured in two ways:
1. Scatter diagram
2. Coefficient of correlation, computed differently for
   a. quantitative data, and
   b. qualitative data.
SCATTER DIAGRAM
It is the simplest way of diagrammatic representation of bivariate data. For the bivariate distribution (x, y), if the values of the variables X and Y are plotted along the x-axis and y-axis respectively in the x-y plane, the diagram of dots so obtained is known as a scatter diagram. From the scatter diagram one can form a fairly good, though vague, idea of whether the variables are correlated: if the points are very dense, i.e. very close to each other, a fairly high degree of correlation between the variables may be expected, while if the points are widely scattered, a poor correlation is expected.

Figure: Scatter diagrams showing different types and degrees of correlation.

Demerits:

1. Not suitable if the number of observations is large.
2. It gives only a vague idea and not the amount of relationship.
QUANTITATIVE DATA
a. COEFFICIENT OF CORRELATION (or product moment correlation coefficient)
The degree of linear relationship between two variables x and y is known as the coefficient of correlation. It was given by Karl Pearson, so it is also known as Karl Pearson's coefficient of correlation.
The correlation coefficient between two variables X and Y, usually denoted by r(X, Y) or simply r_{XY}, is a numerical measure of the linear relationship between them and is defined as:

r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}

r_{XY} = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \, \sigma_y}

r_{XY} = \frac{\frac{1}{n}\sum xy - \bar{x}\,\bar{y}}{\sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2}\;\sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2}}

Properties of Correlation Coefficient:


1. The correlation coefficient r is independent of change of origin and scale.

2. The correlation coefficient r is a pure number and is independent of the units of measurement. This means that if x represents height in inches and y weight in kgs, then the correlation coefficient between x and y will be neither in inches nor in kgs, but only a number.

3. The correlation coefficient r lies between -1 and +1, i.e. r cannot exceed 1 numerically:

-1 \le r \le +1
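The last form of the formula above lends itself to direct computation. A minimal sketch in Python (the function name `pearson_r` and the sample data are illustrative, not from the text):

```python
import math

def pearson_r(x, y):
    """Karl Pearson's product moment correlation coefficient, using
    r = [(1/n)*sum(xy) - x_bar*y_bar] / (sigma_x * sigma_y)
    with population standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - x_bar * y_bar
    sigma_x = math.sqrt(sum(a * a for a in x) / n - x_bar ** 2)
    sigma_y = math.sqrt(sum(b * b for b in y) / n - y_bar ** 2)
    return cov / (sigma_x * sigma_y)

# Perfectly linear data gives r = 1 (property 3's upper bound).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Replacing x by 2x + 5 leaves r unchanged, illustrating property 1 (independence of origin and scale).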
b. QUALITATIVE DATA
RANK CORRELATION (or Spearman's rank correlation)
The product moment correlation coefficient (r) is calculated by using the values of the variables. But many situations arise in which either precise measurement is not available, or the characteristics cannot be measured at all. For example, if we consider the relation between intelligence and beauty, neither quality admits exact measurement, and it is not necessary that a beautiful individual is also intelligent. Such problems are resolved by arranging the individuals in order of merit or proficiency in the possession of the quality; the position so assigned is called a rank. This method was given by Spearman, hence the measure is called Spearman's rank correlation coefficient (\rho):

\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}, \quad \text{where } d_i = \text{rank of } x_i - \text{rank of } y_i

The rank correlation coefficient also lies between -1 and +1.


TIED RANKS
If some of the individuals receive the same rank in a ranking of merit, they are said to be tied; each tied individual is given the average of the ranks they would otherwise occupy. The formula is then corrected as:

\rho = 1 - \frac{6\left(\sum d_i^2 + T_X + T_Y\right)}{n(n^2 - 1)}

where T_X = \frac{m(m^2 - 1)}{12} and T_Y = \frac{p(p^2 - 1)}{12}, m and p being the numbers of items tied at a given rank in the X-series and Y-series respectively (summed over all groups of ties).
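The tie-corrected formula can be sketched as follows, assigning tied values the average of the ranks they span as described above (the function names and sample data are our own):

```python
from collections import Counter

def average_ranks(values):
    """Ranks 1..n; tied values share the average of their rank positions."""
    ordered = sorted(values)
    first = {}
    for i, v in enumerate(ordered):
        first.setdefault(v, i + 1)  # first 1-based position of each value
    counts = Counter(ordered)
    return [first[v] + (counts[v] - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of m*(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) for m in Counter(values).values() if m > 1) / 12

def spearman_rho(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * (d2 + tie_correction(x) + tie_correction(y)) / (n * (n * n - 1))

# One tie in x: ranks become [1, 2.5, 2.5, 4], so d^2 = 0.5 and T_X = 0.5.
print(spearman_rho([1, 2, 2, 4], [1, 2, 3, 4]))  # 0.9
```

Without ties the correction terms vanish and the formula reduces to the basic Spearman form.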

REGRESSION

The word regression is used to denote the estimation or prediction of the average value of one variable for a specified value of the other variable. The estimation is done by means of suitable equations, derived on the basis of the available bivariate data. Such an equation is known as a regression equation.
Definition: Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

In regression analysis there are two types of variables. The variable whose value is influenced or is to be predicted is called the dependent variable (also the regressed or explained variable), and the variable which influences the values or is used for prediction is called the independent variable (also the regressor, predictor or explanatory variable).
REGRESSION LINES:
There are always two lines of regression: one of Y on X and the other of X on Y. The line of regression of Y on X is used to estimate or predict the value of Y for any given value of X, i.e. Y is the dependent variable and X is the independent variable. The estimate so obtained will be the best in the sense that it has the minimum possible error as defined by the principle of least squares. We could also obtain an estimate of X for a given value of Y from the line of Y on X, but that estimate would not be best, since the line is obtained by minimizing the sum of squares of errors of estimates in Y and not in X. Hence, to estimate or predict X for any given value of Y, we use the regression equation of X on Y, in which X is dependent and Y is independent.
REGRESSION EQUATIONS
Regression equations can be obtained by either of the following methods.

a. Based on the least square principle

Y on X: y = a + bx

To estimate the constants a and b, the normal equations obtained from the least square principle are used:

\sum y = na + b \sum x
\sum xy = a \sum x + b \sum x^2

X on Y: x = a' + b'y

To estimate the constants a' and b', the normal equations are:

\sum x = na' + b' \sum y
\sum xy = a' \sum y + b' \sum y^2
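Solving the two normal equations for Y on X gives a and b directly. A minimal sketch (the function name and the sample data are illustrative):

```python
def fit_y_on_x(x, y):
    """Solve the normal equations for y = a + b*x:
       sum(y)  = n*a + b*sum(x)
       sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Data generated from y = 1 + 2x is recovered exactly.
print(fit_y_on_x([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

The X on Y line is obtained the same way with the roles of x and y interchanged.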
b. Based on mathematical averages

Y on X:

(y - \bar{y}) = b_{yx}(x - \bar{x})

Here b_{yx} is called the regression coefficient of Y on X and is given by

b_{yx} = r \, \frac{\sigma_y}{\sigma_x}

X on Y:

(x - \bar{x}) = b_{xy}(y - \bar{y})

Here b_{xy} is called the regression coefficient of X on Y and is given by

b_{xy} = r \, \frac{\sigma_x}{\sigma_y}
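Since b_{yx} = Cov(X, Y)/\sigma_x^2 and b_{xy} = Cov(X, Y)/\sigma_y^2, both coefficients can be computed from the covariance and variances. A sketch (names and data are our own) that also exhibits the geometric-mean property listed below:

```python
import math

def regression_coefficients(x, y):
    """Return (b_yx, b_xy, r) computed from Cov(X, Y) and the variances:
    b_yx = r*sigma_y/sigma_x = cov/var_x, b_xy = r*sigma_x/sigma_y = cov/var_y."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    cov = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / n
    var_x = sum((a - xb) ** 2 for a in x) / n
    var_y = sum((b - yb) ** 2 for b in y) / n
    r = cov / math.sqrt(var_x * var_y)
    return cov / var_x, cov / var_y, r

b_yx, b_xy, r = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b_yx * b_xy, r * r)  # equal: r is the geometric mean of the two
```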

PROPERTIES OF REGRESSION COEFFICIENTS


1. The coefficient of correlation is the geometric mean of the two coefficients of regression: r^2 = b_{yx} \, b_{xy}.
2. The arithmetic mean of the coefficients of regression is not less than the coefficient of correlation.
3. The numerical values of both coefficients of regression cannot be greater than unity simultaneously.
4. The covariance, the coefficient of correlation and the two regression coefficients all have the same sign.
5. The regression lines always intersect at the point (\bar{x}, \bar{y}).
6. The angle \theta between the two regression lines depends on the correlation coefficient r: when r = 0 the two lines are perpendicular to each other, i.e. \theta = 90°; when r = +1 or r = -1 they coincide, i.e. \theta = 0° or 180°.

MULTIPLE AND PARTIAL CORRELATION & REGRESSION


In a bivariate distribution, when the values of one variable are influenced by those of another, the simple correlation coefficient r provides a measure of the degree of relationship between the two variables, and estimates of the value of one variable are obtained from the appropriate linear regression equation. But very often the values of a variable are influenced not only by one other variable, but by several others. For example:

i. The yield of a crop depends on rainfall, temperature, the amount of fertilizer applied, etc.;
ii. The weight of a person depends on his height as well as his chest measurement;
iii. The yield of dry fibre from jute plants depends on the height of the plant, the diameter of the stem, etc.

Multiple Regression: Estimating the value of one variable from those of several others is known as multiple regression.

Let us consider the case of three variables x_1, x_2, x_3. The simplest kind of relationship which can be used for estimating x_1 in terms of x_2 and x_3 is an equation of the form

x_1 = a + b x_2 + c x_3

where a, b and c are constants determined by applying the method of least squares. This equation is usually written as

(x_1 - \bar{x}_1) = b_{12.3}(x_2 - \bar{x}_2) + b_{13.2}(x_3 - \bar{x}_3)

where \bar{x}_1, \bar{x}_2, \bar{x}_3 are the means of the variables x_1, x_2, x_3 respectively, and b_{12.3}, b_{13.2} are the partial regression coefficients.
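Applying least squares to x_1 = a + b x_2 + c x_3 yields three normal equations in a, b, c. A self-contained sketch that solves them with Cramer's rule (function name and data are illustrative):

```python
def fit_x1_on_x2_x3(x1, x2, x3):
    """Least-squares fit of x1 = a + b*x2 + c*x3 by solving the
    3x3 system of normal equations with Cramer's rule."""
    n = len(x1)
    s23 = sum(u * v for u, v in zip(x2, x3))
    A = [[n,       sum(x2),                    sum(x3)],
         [sum(x2), sum(v * v for v in x2),     s23],
         [sum(x3), s23,                        sum(v * v for v in x3)]]
    rhs = [sum(x1),
           sum(u * v for u, v in zip(x1, x2)),
           sum(u * v for u, v in zip(x1, x3))]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    # Replace column j of A by the right-hand side and take determinants.
    sol = []
    for j in range(3):
        mj = [[rhs[i] if k == j else A[i][k] for k in range(3)] for i in range(3)]
        sol.append(det3(mj) / d)
    return tuple(sol)  # (a, b, c)

# Data generated from x1 = 1 + 2*x2 + 3*x3 is recovered exactly.
print(fit_x1_on_x2_x3([9, 8, 19, 18], [1, 2, 3, 4], [2, 1, 4, 3]))
```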


Multiple Correlation: Measuring the extent of the combined influence of a group of variables upon another variable is known as multiple correlation, denoted by R.

The multiple correlation coefficient of x_1 on x_2 and x_3 is given by

R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}

It is always taken as positive:

0 \le R_{1.23} \le 1
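The formula needs only the three pairwise correlations. A minimal sketch with hypothetical values of r_{12}, r_{13}, r_{23} (not taken from the text):

```python
import math

def multiple_corr(r12, r13, r23):
    """R_{1.23} = sqrt((r12^2 + r13^2 - 2*r12*r13*r23) / (1 - r23^2))."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23)
                     / (1 - r23 ** 2))

# Hypothetical pairwise correlations.
R = multiple_corr(0.6, 0.7, 0.65)
print(R)  # lies in [0, 1]
```

Note that R_{1.23} is never smaller than either simple correlation of x_1 with the predictors, since adding a predictor cannot reduce the explained variation.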

Partial Correlation: Examining the influence of one variable upon another after eliminating the effects of all other variables is called partial correlation, denoted by a coefficient such as r_{12.3}.

The partial correlation coefficient between x_1 and x_2, eliminating the effect of x_3, is given by

r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}

The partial correlation coefficient lies between -1 and +1.
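This, too, is a one-line computation from the pairwise correlations; the sketch below uses hypothetical values (not from the text):

```python
import math

def partial_corr(r12, r13, r23):
    """r_{12.3} = (r12 - r13*r23) / sqrt((1 - r13^2) * (1 - r23^2))."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical pairwise correlations.
print(partial_corr(0.6, 0.7, 0.65))
# When r12 equals r13*r23, the association between x1 and x2 is entirely
# explained by x3 and the partial correlation is zero.
print(partial_corr(0.35, 0.7, 0.5))
```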
