CORRELATION
Previously we studied the characteristics of a single variable, but in many situations we
need to study two variables simultaneously, say x and y. Examples of such pairs are (i)
income and expenditure, (ii) heights and weights of a group of persons, and (iii) price and
demand of a commodity. In each case the two variables move together.
Such data reveal a relationship between the two variables in which a change in one variable is
accompanied by a change in the other, and a greater change in one by a greater change in the
other. We are interested in measuring numerically the strength of this relationship. Such a
relationship between variables is known as correlation.
MEASURES OF CORRELATION:
Correlation is measured by:
a. Scatter diagram
b. Coefficient of correlation
   i. for quantitative data
   ii. for qualitative data
SCATTER DIAGRAM
It is the simplest diagrammatic representation of bivariate data. For the bivariate
distribution (x, y), if the values of the variables X and Y are plotted along the x-axis and
y-axis respectively in the x-y plane, the diagram of dots so obtained is known as a scatter
diagram. From the scatter diagram one can form a fairly good, though rough, idea of whether the
variables are correlated. If the points are very dense, i.e. very close to each other, a fairly
good amount of correlation between the variables is expected; if the points are widely
scattered, a poor correlation is expected.
Figure:
Demerits:
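COEFFICIENT OF CORRELATION
The correlation coefficient between X and Y is denoted by $r(x, y)$ or simply $r_{XY}$:
$$r_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
In terms of deviations from the means,
$$r_{XY} = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y}$$
or, in computational form,
$$r_{XY} = \frac{\frac{1}{n}\sum xy - \bar{x}\,\bar{y}}{\sqrt{\left(\frac{1}{n}\sum x^2 - \bar{x}^2\right)\left(\frac{1}{n}\sum y^2 - \bar{y}^2\right)}}$$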
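As an illustration of the computational form of the correlation coefficient, the following minimal Python sketch may help. The function name `pearson_r` and the sample data are assumptions made for this example only; they do not come from the notes.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the computational form:
    (mean of xy - mean_x * mean_y) / sqrt(var_x * var_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: (1/n) * sum(xy) - mean_x * mean_y
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # Variances: (1/n) * sum(x^2) - mean_x^2, and likewise for y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

For this sample the covariance is 1.2 and the variances are 2 and 1.2, so r is the square root of 0.6, roughly 0.77.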
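For ranked data, the rank correlation coefficient is
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
When ranks are tied, the formula is adjusted to
$$\rho = 1 - \frac{6\left(\sum d_i^2 + T_X + T_Y\right)}{n(n^2 - 1)}$$
Where, $T_X = \frac{m(m^2 - 1)}{12}$ and $T_Y = \frac{p(p^2 - 1)}{12}$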
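The rank correlation formula with the tie adjustment can be sketched in Python as follows. This is only an illustration under stated assumptions: tied values are given the average of the ranks they occupy, the correction m(m^2 - 1)/12 is summed over every group of tied values, and the function names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank values 1..n (largest value gets rank 1); tied values share
    the average of the positions they occupy."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over each group of m tied values (assumption:
    one correction term per tied group)."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def rank_correlation(xs, ys):
    """1 - 6(sum d^2 + T_X + T_Y) / (n(n^2 - 1)); reduces to the
    untied formula when there are no ties."""
    n = len(xs)
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))

# Hypothetical scores; the value 2 is repeated twice in the first series.
print(rank_correlation([1, 2, 2, 3], [1, 2, 3, 4]))
```

In this sample the sum of squared rank differences is 0.5 and the tie correction for the first series is also 0.5, so the coefficient comes out at about 0.9.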
REGRESSION
The word regression is used to denote the estimation or prediction of the average value of one
variable for a specified value of the other variable. The estimation is done by means of
suitable equations, derived from the available bivariate data. Such an equation is known as a
regression equation.
Definition: Regression analysis is a mathematical measure of the average
relationship between two or more variables in terms of original units of the data.
In regression analysis there are two types of variables. The variable whose value is influenced
or is to be predicted is called the dependent variable, also known as the regressed or explained
variable. The variable which influences the values or is used for prediction is called the
independent variable, also known as the regressor, predictor or explanatory variable.
REGRESSION LINES:
There are always two lines of regression: one of Y on X and the other of X on Y. The line of
regression of Y on X is used to estimate or predict the value of Y for any given value of X,
i.e. Y is the dependent variable and X is the independent variable. The estimate so obtained is
the best in the sense that it has the minimum possible error as defined by the principle of
least squares. We can also obtain an estimate of X for a given value of Y from the line of Y on
X, but that estimate will not be the best, since the line was obtained by minimizing the sum of
squares of the errors of estimate in Y and not in X. Hence, to estimate or predict X for any
given value of Y, we use the regression equation of X on Y, in which X is dependent and Y is
independent.
REGRESSION EQUATIONS
Regression equations can be obtained by using either of the following methods.
a. Based on least squares principle
Y on X
$$y = a + bx$$
To estimate the constants a and b, the normal equations obtained from the least squares
principle are used:
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
X on Y
$$x = a' + b'y$$
To estimate the constants a' and b', the normal equations obtained from the least squares
principle are used:
$$\sum x = na' + b'\sum y$$
$$\sum xy = a'\sum y + b'\sum y^2$$
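To make the normal equations concrete, here is a minimal Python sketch that solves them for the two constants. The function name `fit_line` and the sample data are assumptions for illustration; swapping the arguments gives the line of X on Y.

```python
def fit_line(xs, ys):
    """Solve the normal equations
        sum(y)  = n*a + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    for the constants a and b of the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Eliminating a from the two equations gives b; back-substitution gives a.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)            # Y on X: y = a + b*x
a_dash, b_dash = fit_line(ys, xs)  # X on Y: x = a' + b'*y
print(a, b, a_dash, b_dash)
```

For this sample the line of Y on X is approximately y = 2.2 + 0.6x and the line of X on Y is approximately x = -1 + y, which shows that the two regression lines are in general different.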
b. Based on Mathematical Average
Y on X
$$(y - \bar{y}) = b_{yx}(x - \bar{x})$$
Here $b_{yx}$ is the regression coefficient of Y on X:
$$b_{yx} = r\frac{\sigma_y}{\sigma_x}$$
X on Y
$$(x - \bar{x}) = b_{xy}(y - \bar{y})$$
Here $b_{xy}$ is the regression coefficient of X on Y:
$$b_{xy} = r\frac{\sigma_x}{\sigma_y}$$
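The mean-based form of the regression equations can be sketched in Python as below. The helper names and the sample data are assumptions for this example. The sketch also checks the consequence that the product of the two regression coefficients equals the square of r, which follows directly from the two formulas.

```python
import math

def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, r, b_yx, b_xy) using
    b_yx = r * sd_y / sd_x and b_xy = r * sd_x / sd_y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)
    return mean_x, mean_y, r, r * sd_y / sd_x, r * sd_x / sd_y

def predict_y(x, mean_x, mean_y, b_yx):
    """Estimate y from the line (y - mean_y) = b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mean_x, mean_y, r, b_yx, b_xy = regression_coefficients(xs, ys)

print(b_yx, b_xy)                           # the two regression coefficients
print(math.isclose(b_yx * b_xy, r ** 2))    # product equals r squared
print(predict_y(6, mean_x, mean_y, b_yx))   # estimate of y at x = 6
```

With these data the means are 3 and 4, the coefficient of Y on X is about 0.6, and the estimate of y at x = 6 is about 5.8.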
The yield of a crop depends on rainfall, temperature, amount of fertilizer applied, etc.
The weight of a person depends on his height as well as his chest measurement.
The yield of dry fiber from jute plants depends on the height of the plant, the diameter of
the stem, etc.
Multiple Regression: Estimating the value of one variable from the values of several
others is known as multiple regression.
The regression equation of $x_1$ in terms of $x_2$ and $x_3$ is
$$x_1 = a + b x_2 + c x_3$$
Where a, b and c are constants. Applying the method of least squares, the constants are
determined. This equation is usually written as
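$$(x_1 - \bar{x}_1) = b_{12.3}(x_2 - \bar{x}_2) + b_{13.2}(x_3 - \bar{x}_3)$$
Where $\bar{x}_1, \bar{x}_2, \bar{x}_3$ are the means of $x_1, x_2, x_3$ respectively, and
$b_{12.3}$ and $b_{13.2}$ are the partial regression coefficients.
The multiple correlation coefficient of $x_1$ on $x_2$ and $x_3$ is given by
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}$$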
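As a quick numerical illustration of the multiple correlation formula, the sketch below evaluates it in Python from the three pairwise correlation coefficients. The function name and the input values are hypothetical, chosen only to show the arithmetic.

```python
import math

def multiple_correlation(r12, r13, r23):
    """Multiple correlation of x1 on x2 and x3:
    sqrt((r12^2 + r13^2 - 2*r12*r13*r23) / (1 - r23^2))."""
    numerator = r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23
    denominator = 1 - r23 ** 2
    return math.sqrt(numerator / denominator)

# Hypothetical pairwise correlations, not taken from any data set.
R = multiple_correlation(r12=0.6, r13=0.5, r23=0.4)
print(round(R, 3))
```

Here the numerator is 0.37 and the denominator is 0.84, so the coefficient is the square root of about 0.44, which is larger than either 0.6 or 0.5 taken alone.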
Partial Correlation: Examining the influence of one variable upon another after
eliminating the effects of all other variables is called partial correlation.
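The partial correlation coefficient between $x_1$ and $x_2$, eliminating the effect of $x_3$,
is denoted by $r_{12.3}$ and is given by
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$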
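The partial correlation formula can be evaluated in the same way. The following Python sketch assumes the three pairwise correlation coefficients are already known; the function name and the input values are hypothetical.

```python
import math

def partial_correlation(r12, r13, r23):
    """Partial correlation between x1 and x2 with x3 eliminated:
    (r12 - r13*r23) / sqrt((1 - r13^2) * (1 - r23^2))."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations, not taken from any data set.
r12_3 = partial_correlation(r12=0.6, r13=0.5, r23=0.4)
print(round(r12_3, 3))
```

With these inputs the numerator is 0.4 and the denominator is the square root of 0.63, giving a value of roughly 0.5. When x3 is uncorrelated with both x1 and x2, the formula returns r12 unchanged.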