Correlation & Regression



Correlation:
Correlation is a relationship between two or more variables such that a change in one variable is accompanied by a change in the other. Variables related in this way are said to be correlated.
Correlation analysis is the study of the co-variation of two or more variables.
For example, the production of paddy depends on the rainfall. Here the production of paddy is the dependent variable and the rainfall is the independent variable.

Co-efficient of correlation:
The numerical value that measures the strength and the direction of the linear relationship between two variables
is called the co-efficient of correlation.

If $x_i$ and $y_i$ are two variables in a set of observations of size n, with arithmetic means $\bar{x}$ and $\bar{y}$ respectively, then the co-efficient of correlation between x and y is denoted by $r_{xy}$ and defined as,

\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}, \qquad -1 \le r_{xy} \le +1 \]
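The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the sample data are assumptions made for this example and do not come from the notes.

```python
import math

def pearson_r(x, y):
    """Coefficient of correlation r_xy computed from paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data (hypothetical, not taken from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here the sum of cross-products is 6 and the two sums of squares are 10 and 6, so r = 6 / sqrt(60).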


Properties of co-efficient of correlation:

1. Co-efficient of correlation is independent of change of origin and scale of measurement.
2. The value of coefficient of correlation lies between -1 to +1.
3. For two independent variables, the true (population) value of the co-efficient is zero.
4. Co-efficient of correlation is symmetrical with respect to x and y, i.e. $r_{xy} = r_{yx}$.

5. Co-efficient of correlation is the geometric mean of the regression co-efficients, i.e. $r_{xy} = \pm\sqrt{b_{yx} \, b_{xy}}$, taking the common sign of the regression co-efficients.
6. It is a unit free pure number.
7. It is much affected by the extreme values.
8. It is based upon all the observations.

Uses of Co-efficient of correlation:
1. To find the relationship between two variables.
2. To find the relationship between a dependent variable and the combined influence of a group of independent
variables.
3. To obtain the moments of a group of related variables.
4. To solve many problems in biology.
5. In social studies, such as the relationship between crime and education, correlation analysis has a definite role to
play.
6. Regression and ratio estimates of variables also depend upon the measure of correlation.
7. It is used especially in economics.

Difference between correlation and regression:

Correlation | Regression
1. The correlation co-efficient is independent of change of origin and scale of measurement. | 1. Regression co-efficients are independent of change of origin but dependent on scale of measurement.
2. The correlation co-efficient is a pure number. | 2. The regression co-efficient has the unit of the dependent variable per unit of the independent variable.
3. The correlation co-efficient is symmetrical with respect to x and y, i.e. $r_{xy} = r_{yx}$. | 3. The regression co-efficients are not symmetrical, i.e. $b_{yx} \ne b_{xy}$ in general.
4. The correlation co-efficient lies between -1 and +1. | 4. The regression co-efficient lies between $-\infty$ and $+\infty$.
5. There may be nonsense (spurious) correlation between two variables. | 5. There is no such thing as nonsense regression.
6. By correlation we measure the strength of the relationship between two random variables. | 6. By regression analysis we express the inter-relationship between the dependent variable and the independent variables by means of an equation.


Difference between correlation co-efficient and regression co-efficient:

Correlation co-efficient | Regression co-efficient
1. The correlation co-efficient is independent of change of both origin and scale. | 1. Regression co-efficients are independent of change of origin but not of scale.
2. It is a symmetrical function, i.e. $r_{xy} = r_{yx}$. | 2. Here, $b_{yx} \ne b_{xy}$ in general.
3. The correlation co-efficient lies between -1 and +1. | 3. The regression co-efficient lies between $-\infty$ and $+\infty$.
4. By correlation we measure the strength of the inter-relationship between two random variables. | 4. By regression analysis we express the inter-relationship between the dependent variable and the independent variables by means of an equation.
5. It is defined by $r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$. | 5. They are defined by $b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2}$.
6. $r = \pm\sqrt{b_{yx} \, b_{xy}}$. | 6. $b_{yx} \, b_{xy} = r^2$, so $0 \le b_{yx} \, b_{xy} \le 1$.
7. When r = 0 the variables are called uncorrelated. | 7. When r = 0 the two lines of regression are perpendicular to each other.

Regression:
Regression is the measure of the average relationship between two or more variables in terms of the original units
of the data.
The statistical technique by which we estimate the unknown value of one variable (the dependent variable)
from the known value of another variable (the independent variable) is called regression.

Let $y_i$ be the unknown value of one variable and $x_i$ the known value of another variable. Then the
regression line is $y_i = a + b x_i$, where b is the slope of the line and a is a constant (the intercept).

Example: Suppose the production of paddy, of amount y, depends on the rainfall, of amount x. Here x is the
independent variable and y is the dependent variable. First a mathematical relation is established between x and y.
Then, from known values of x, the values of y are estimated, i.e. for a particular rainfall the production of paddy
is estimated.
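The paddy and rainfall example can be sketched as a small least-squares fit. The numbers below are hypothetical, and so is the function name `fit_line`. The slope uses the regression co-efficient of y on x, and the intercept uses the standard least-squares relation a = ȳ − b·x̄, which the notes do not state explicitly and is assumed here.

```python
def fit_line(x, y):
    """Return (a, b) for the regression line y = a + b*x of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx            # slope: regression co-efficient of y on x
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical data: rainfall (x) and paddy production (y), arbitrary units
rainfall = [1, 2, 3, 4, 5]
paddy = [2, 4, 5, 4, 5]

a, b = fit_line(rainfall, paddy)
predicted = a + b * 6          # estimated production for a rainfall of 6 units
print(round(a, 2), round(b, 2), round(predicted, 2))  # 2.2 0.6 5.8
```

The estimate for x = 6 lies outside the observed range of rainfall, so it should be read as an extrapolation rather than a reliable forecast.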

Co-efficient of regression:
The mathematical measures of regression are called the co-efficients of regression. The slope of the line of regression of y on x is
called the co-efficient of regression of y on x. It represents the increment in the value of the dependent variable y
corresponding to a unit change in the value of the independent variable x. The co-efficient of regression of y on x is
denoted by $b_{yx}$ and defined as,

\[ b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad -\infty < b_{yx} < +\infty \]

Similarly, the co-efficient of regression of x on y is

\[ b_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2}, \qquad -\infty < b_{xy} < +\infty \]
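A minimal sketch of the two definitions above, using illustrative data; the function name `regression_coefficients` is an assumption made for this example.

```python
def regression_coefficients(x, y):
    """Return (b_yx, b_xy): regression co-efficients of y on x and of x on y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    # Same numerator, different denominators
    return s_xy / s_xx, s_xy / s_yy

# Illustrative data (hypothetical)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(x, y)
print(b_yx, b_xy)  # 0.6 1.0
```

The two values differ because the denominators differ, which is why the regression co-efficients are not symmetrical in x and y.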


Properties/Theorem:

1. Co-efficient of correlation is independent of origin and scale of measurement.
Or, $r_{xy} = r_{uv}$.

Proof: Let $x_i$ and $y_i$ be two variables in a set of observations of size n, with arithmetic means $\bar{x}$ and
$\bar{y}$ respectively. Then the co-efficient of correlation between x and y is denoted by $r_{xy}$ and defined as,

\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}, \qquad -1 \le r_{xy} \le +1 \qquad (1) \]




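Let the new variables be

\[ u_i = \frac{x_i - a}{c} \quad \text{and} \quad v_i = \frac{y_i - b}{k}, \]

where a is the origin and c is the scale of $x_i$, and b is the origin and k is the scale of $y_i$. Then

\[ c u_i = x_i - a \;\Rightarrow\; x_i = c u_i + a, \qquad k v_i = y_i - b \;\Rightarrow\; y_i = k v_i + b. \]

Taking the summation on both sides and dividing by n,

\[ \frac{\sum x_i}{n} = \frac{c \sum u_i}{n} + \frac{na}{n}, \qquad \frac{\sum y_i}{n} = \frac{k \sum v_i}{n} + \frac{nb}{n} \]

\[ \bar{x} = c\bar{u} + a, \qquad \bar{y} = k\bar{v} + b \]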


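From (1) we get, taking c > 0 and k > 0 so that $\sqrt{c^2 k^2} = ck$,

\[
\begin{aligned}
r_{xy} &= \frac{\sum (c u_i + a - c\bar{u} - a)(k v_i + b - k\bar{v} - b)}{\sqrt{\sum (c u_i + a - c\bar{u} - a)^2 \, \sum (k v_i + b - k\bar{v} - b)^2}} \\
&= \frac{\sum (c u_i - c\bar{u})(k v_i - k\bar{v})}{\sqrt{\sum (c u_i - c\bar{u})^2 \, \sum (k v_i - k\bar{v})^2}} \\
&= \frac{ck \sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{c^2 \sum (u_i - \bar{u})^2 \; k^2 \sum (v_i - \bar{v})^2}} \\
&= \frac{ck \sum (u_i - \bar{u})(v_i - \bar{v})}{ck \sqrt{\sum (u_i - \bar{u})^2 \, \sum (v_i - \bar{v})^2}} \\
&= \frac{\sum (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum (u_i - \bar{u})^2 \, \sum (v_i - \bar{v})^2}}
\end{aligned}
\]

\[ \therefore \; r_{xy} = r_{uv} \]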
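Property 1 can be checked numerically. The sketch below uses hypothetical data and arbitrary values for the origins (a, b) and the positive scales (c, k); `pearson_r` is a helper name chosen for this example. It also shows that a negative scale on one variable flips the sign of r, which is why the scales are taken as positive in the proof.

```python
import math

def pearson_r(x, y):
    """Coefficient of correlation from paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data (hypothetical)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, c = 10.0, 2.0   # origin and (positive) scale for x, arbitrary choices
b, k = -3.0, 5.0   # origin and (positive) scale for y, arbitrary choices
u = [(xi - a) / c for xi in x]
v = [(yi - b) / k for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(math.isclose(r_xy, r_uv))  # True

# With a negative scale on y the magnitude is unchanged but the sign flips
v_neg = [(yi - b) / (-k) for yi in y]
r_uv_neg = pearson_r(u, v_neg)
print(math.isclose(r_uv_neg, -r_xy))  # True
```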


2. The correlation co-efficient lies between -1 and +1, or $-1 \le r_{xy} \le +1$.

Proof: Let $x_i$ and $y_i$ be two variables in a set of observations of size n, with arithmetic means $\bar{x}$ and
$\bar{y}$ respectively. Then the co-efficient of correlation between x and y is denoted by $r_{xy}$ and defined as,

\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}, \qquad -1 \le r_{xy} \le +1 \qquad (1) \]

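Let the new variables be

\[ u_i = \frac{x_i - \bar{x}}{\sqrt{\sum (x_i - \bar{x})^2}} \quad \text{and} \quad v_i = \frac{y_i - \bar{y}}{\sqrt{\sum (y_i - \bar{y})^2}} \]

Squaring both sides and taking the summation,

\[ \sum u_i^2 = \frac{\sum (x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2} \;\Rightarrow\; \sum u_i^2 = 1 \]

\[ \sum v_i^2 = \frac{\sum (y_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} \;\Rightarrow\; \sum v_i^2 = 1 \]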

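Now, multiplying the new variables and taking the summation,

\[ \sum u_i v_i = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \, \sqrt{\sum (y_i - \bar{y})^2}} = r_{xy} \qquad \text{[from equation (1)]} \]

We know that any squared term is always non-negative, so

\[
\begin{aligned}
\sum (u_i + v_i)^2 &\ge 0 \\
\sum u_i^2 + \sum v_i^2 + 2 \sum u_i v_i &\ge 0 \\
1 + 1 + 2 r_{xy} &\ge 0 \qquad \left[\; \sum u_i^2 = 1, \; \sum v_i^2 = 1, \; \sum u_i v_i = r_{xy} \;\right] \\
2 + 2 r_{xy} &\ge 0 \\
2 (1 + r_{xy}) &\ge 0 \\
1 + r_{xy} &\ge 0 \\
r_{xy} &\ge -1
\end{aligned}
\]

Again,

\[
\begin{aligned}
\sum (u_i - v_i)^2 &\ge 0 \\
\sum u_i^2 + \sum v_i^2 - 2 \sum u_i v_i &\ge 0 \\
1 + 1 - 2 r_{xy} &\ge 0 \\
2 (1 - r_{xy}) &\ge 0 \\
1 - r_{xy} &\ge 0 \\
r_{xy} &\le 1
\end{aligned}
\]

\[ \therefore \; -1 \le r_{xy} \le +1 \qquad \text{(Proved)} \]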


3. Prove that the correlation co-efficient is the geometric mean of the regression co-efficients.
Or, $r_{xy} = \sqrt{b_{yx} \, b_{xy}}$.

Proof: Let $x_i$ and $y_i$ be two variables in a set of observations of size n, with arithmetic means $\bar{x}$ and
$\bar{y}$ respectively. Then the co-efficient of correlation between x and y is $r_{xy}$, defined as,

\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}, \qquad -1 \le r_{xy} \le +1 \qquad (1) \]

Now, by definition, the regression co-efficient of y on x is,

\[ b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad -\infty < b_{yx} < +\infty \qquad (2) \]

and the regression co-efficient of x on y is,

\[ b_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2}, \qquad -\infty < b_{xy} < +\infty \qquad (3) \]



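From equation (1), dividing the numerator and the denominator by the total number of observations n, we get,

\[
\begin{aligned}
r_{xy} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / n}{\sqrt{\sum (x_i - \bar{x})^2 / n \cdot \sum (y_i - \bar{y})^2 / n}} \\
&= \frac{\operatorname{cov}(x, y)}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n} \cdot \dfrac{\sum (y_i - \bar{y})^2}{n}}} \\
&= \frac{\operatorname{cov}(x, y)}{\sqrt{\sigma_x^2 \, \sigma_y^2}}
\end{aligned}
\]

\[ r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y} \]

\[ \therefore \; \operatorname{cov}(x, y) = r_{xy} \, \sigma_x \, \sigma_y \qquad (4) \]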

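From (2), dividing the numerator and the denominator by n, we get,

\[
\begin{aligned}
b_{yx} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / n}{\sum (x_i - \bar{x})^2 / n} \\
&= \frac{\operatorname{cov}(x, y)}{\sigma_x^2} \\
&= \frac{r_{xy} \, \sigma_x \, \sigma_y}{\sigma_x^2} \qquad \text{[by equation (4)]}
\end{aligned}
\]

\[ \therefore \; b_{yx} = \frac{r_{xy} \, \sigma_y}{\sigma_x} \qquad (5) \]

From (3), in the same way, we get,

\[
\begin{aligned}
b_{xy} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / n}{\sum (y_i - \bar{y})^2 / n} \\
&= \frac{\operatorname{cov}(x, y)}{\sigma_y^2} \\
&= \frac{r_{xy} \, \sigma_x \, \sigma_y}{\sigma_y^2} \qquad \text{[by equation (4)]}
\end{aligned}
\]

\[ \therefore \; b_{xy} = \frac{r_{xy} \, \sigma_x}{\sigma_y} \qquad (6) \]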
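Now, by the definition of the geometric mean of the two regression co-efficients,

\[
\begin{aligned}
\sqrt{b_{yx} \, b_{xy}} &= \sqrt{\frac{r_{xy} \, \sigma_y}{\sigma_x} \cdot \frac{r_{xy} \, \sigma_x}{\sigma_y}} \\
&= \sqrt{r_{xy}^2} \\
&= |r_{xy}|
\end{aligned}
\]

Since $b_{yx}$, $b_{xy}$ and $r_{xy}$ all carry the sign of $\operatorname{cov}(x, y)$, $r_{xy}$ takes the common sign of the two regression co-efficients.

\[ \therefore \; r_{xy} = \pm\sqrt{b_{yx} \, b_{xy}} \qquad \text{(Proved)} \]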
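As a quick numerical check of this result, the sketch below computes r and both regression co-efficients from the same sums and compares r² with their product. The data and the helper name `deviation_sums` are assumptions made for illustration.

```python
import math

def deviation_sums(x, y):
    """Return (s_xy, s_xx, s_yy): sums of products and squares of deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy, s_xx, s_yy

# Illustrative data (hypothetical)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy, s_xx, s_yy = deviation_sums(x, y)
r = s_xy / math.sqrt(s_xx * s_yy)
b_yx = s_xy / s_xx
b_xy = s_xy / s_yy

# r^2 equals the product of the regression co-efficients
print(math.isclose(r ** 2, b_yx * b_xy))  # True

# r is the geometric mean carrying the sign of the regression co-efficients
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(math.isclose(r, r_from_b))  # True
```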

4. Prove that the correlation co-efficient is not greater than the arithmetic mean of the two regression co-efficients.
Or, prove that the arithmetic mean of the two regression co-efficients is greater than or equal to the correlation co-efficient.
Or,

\[ r_{xy} \le \left( \frac{b_{yx} + b_{xy}}{2} \right) \]

Proof: Let $x_i$ and $y_i$ be two variables in a set of observations of size n, with arithmetic means $\bar{x}$ and $\bar{y}$
respectively. Then the co-efficient of correlation between x and y is $r_{xy}$, defined as,

\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}, \qquad -1 \le r_{xy} \le +1 \qquad (1) \]

Now, by definition, the regression co-efficient of y on x is,

\[ b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad -\infty < b_{yx} < +\infty \qquad (2) \]

and the regression co-efficient of x on y is,

\[ b_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2}, \qquad -\infty < b_{xy} < +\infty \qquad (3) \]
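From equation (1), dividing the numerator and the denominator by the total number of observations n, we get,

\[
\begin{aligned}
r_{xy} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / n}{\sqrt{\sum (x_i - \bar{x})^2 / n \cdot \sum (y_i - \bar{y})^2 / n}} \\
&= \frac{\operatorname{cov}(x, y)}{\sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n} \cdot \dfrac{\sum (y_i - \bar{y})^2}{n}}} \\
&= \frac{\operatorname{cov}(x, y)}{\sqrt{\sigma_x^2 \, \sigma_y^2}}
\end{aligned}
\]

\[ r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y} \]

\[ \therefore \; \operatorname{cov}(x, y) = r_{xy} \, \sigma_x \, \sigma_y \qquad (4) \]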

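From (2), dividing the numerator and the denominator by n, we get,

\[
\begin{aligned}
b_{yx} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / n}{\sum (x_i - \bar{x})^2 / n} \\
&= \frac{\operatorname{cov}(x, y)}{\sigma_x^2} \\
&= \frac{r_{xy} \, \sigma_x \, \sigma_y}{\sigma_x^2}
\end{aligned}
\]

\[ \therefore \; b_{yx} = \frac{r_{xy} \, \sigma_y}{\sigma_x} \qquad (5) \]

From (3), in the same way, we get,

\[
\begin{aligned}
b_{xy} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / n}{\sum (y_i - \bar{y})^2 / n} \\
&= \frac{\operatorname{cov}(x, y)}{\sigma_y^2} \\
&= \frac{r_{xy} \, \sigma_x \, \sigma_y}{\sigma_y^2}
\end{aligned}
\]

\[ \therefore \; b_{xy} = \frac{r_{xy} \, \sigma_x}{\sigma_y} \qquad (6) \]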

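We know that a squared term is always non-negative, so

\[
\begin{aligned}
(\sigma_x - \sigma_y)^2 &\ge 0 \\
\sigma_x^2 + \sigma_y^2 - 2 \sigma_x \sigma_y &\ge 0 \\
\sigma_x^2 + \sigma_y^2 &\ge 2 \sigma_x \sigma_y \\
\frac{\sigma_x^2}{\sigma_x \sigma_y} + \frac{\sigma_y^2}{\sigma_x \sigma_y} &\ge \frac{2 \sigma_x \sigma_y}{\sigma_x \sigma_y} \qquad \text{[dividing both sides by } \sigma_x \sigma_y \text{]} \\
\frac{\sigma_x}{\sigma_y} + \frac{\sigma_y}{\sigma_x} &\ge 2 \\
\frac{r_{xy} \, \sigma_x}{\sigma_y} + \frac{r_{xy} \, \sigma_y}{\sigma_x} &\ge 2 r_{xy} \qquad \text{[multiplying by } r_{xy} \text{, taking } r_{xy} \ge 0 \text{]} \\
b_{xy} + b_{yx} &\ge 2 r_{xy} \qquad \left[\; b_{xy} = \frac{r_{xy} \sigma_x}{\sigma_y}, \; b_{yx} = \frac{r_{xy} \sigma_y}{\sigma_x} \;\right] \\
\frac{b_{yx} + b_{xy}}{2} &\ge r_{xy}
\end{aligned}
\]

\[ \therefore \; r_{xy} \le \left( \frac{b_{yx} + b_{xy}}{2} \right) \qquad \text{(Proved)} \]

When $r_{xy} < 0$, multiplying by $r_{xy}$ reverses the inequality; the same argument then gives $|r_{xy}| \le \frac{|b_{yx}| + |b_{xy}|}{2}$.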
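The inequality can be checked on small data sets. The sketch below uses hypothetical data: one positively correlated pair, and the same y values reversed to obtain a negative correlation, where the comparison is made in absolute values. The helper name `r_and_slopes` is an assumption made for this example.

```python
import math

def r_and_slopes(x, y):
    """Return (r, b_yx, b_xy) computed from the same deviation sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy), s_xy / s_xx, s_xy / s_yy

# Positive correlation (hypothetical data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, b_yx, b_xy = r_and_slopes(x, y)
arithmetic_mean = (b_yx + b_xy) / 2
print(r <= arithmetic_mean)  # True

# Negative correlation: compare absolute values instead
y_rev = list(reversed(y))
r_n, b_yx_n, b_xy_n = r_and_slopes(x, y_rev)
print(abs(r_n) <= (abs(b_yx_n) + abs(b_xy_n)) / 2)  # True
```

For the first data set r = 6 / sqrt(60), about 0.775, while the arithmetic mean of 0.6 and 1.0 is 0.8.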


Rank Correlation:
In some situations it is difficult to measure the values of the variables of a bivariate distribution
numerically, but they can be ranked. The correlation co-efficient between two such sets of ranks is called the rank
correlation, and its numerical measure is called the rank correlation co-efficient, given by Spearman (1904).

Let $x_i$ and $y_i$ be two ranks in a set of observations, with sequences $x_1, x_2, \ldots, x_n$ and
$y_1, y_2, \ldots, y_n$ respectively. Then the rank correlation co-efficient is denoted by $\rho$ and defined as,

\[ \rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)} \]

where $d_i$ is the difference between the ranks $x_i$ and $y_i$, i.e. $d_i = x_i - y_i$, and n is the total number of
observations.


Example: beauty contest, honesty, intelligence, merit test etc.
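A minimal sketch of the rank correlation formula, assuming that the two inputs are already ranks 1..n with no ties. The function name `spearman_rho` and the judges' ranks are hypothetical.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation co-efficient for untied ranks 1..n."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("both rankings must have the same length (at least 2)")
    # d_i = x_i - y_i, the difference between the two ranks of item i
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical ranks given by two judges to five contestants
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_1, judge_2)
print(rho)  # 0.8
```

Here the rank differences are -1, 1, -1, 1, 0, so the sum of squared differences is 4 and rho = 1 - 24/120 = 0.8.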


5. Derivation of the rank correlation co-efficient:

\[ \rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)} \]

Proof: Let $x_i$ and $y_i$ be the two ranks of the i-th individual in two characteristics, assuming that no two
individuals are awarded the same rank, with sequences $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ respectively.
Each of the variables x and y takes the values 1, 2, ..., n.

We know that for the first n natural numbers the arithmetic mean is

\[ \bar{x} = \bar{y} = \frac{n + 1}{2} \]

and the variance is

\[ \sigma_x^2 = \sigma_y^2 = \frac{n^2 - 1}{12} \]
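Let the difference of the two ranks be

\[
\begin{aligned}
d_i &= x_i - y_i \\
&= (x_i - \bar{x}) + (\bar{x} - y_i) \\
&= (x_i - \bar{x}) - (y_i - \bar{y}) \qquad [\, \bar{x} = \bar{y} \,]
\end{aligned}
\]

Squaring both sides,

\[
\begin{aligned}
d_i^2 &= \{ (x_i - \bar{x}) - (y_i - \bar{y}) \}^2 \\
&= (x_i - \bar{x})^2 + (y_i - \bar{y})^2 - 2 (x_i - \bar{x})(y_i - \bar{y})
\end{aligned}
\]

Now, taking the summation on both sides and dividing by the total number of observations n,

\[
\begin{aligned}
\frac{\sum d_i^2}{n} &= \frac{\sum (x_i - \bar{x})^2}{n} + \frac{\sum (y_i - \bar{y})^2}{n} - \frac{2 \sum (x_i - \bar{x})(y_i - \bar{y})}{n} \\
&= \sigma_x^2 + \sigma_y^2 - 2 \operatorname{cov}(x, y)
\end{aligned}
\]

\[
\begin{aligned}
2 \operatorname{cov}(x, y) &= \sigma_x^2 + \sigma_y^2 - \frac{\sum d_i^2}{n} \\
&= \frac{n^2 - 1}{12} + \frac{n^2 - 1}{12} - \frac{\sum d_i^2}{n} \\
&= \frac{2 (n^2 - 1)}{12} - \frac{\sum d_i^2}{n} \\
&= \frac{n^2 - 1}{6} - \frac{\sum d_i^2}{n}
\end{aligned}
\]

\[ \operatorname{cov}(x, y) = \frac{n^2 - 1}{12} - \frac{\sum d_i^2}{2n} \qquad (1) \]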

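We know that the correlation co-efficient between two sets of ranks is called the rank correlation co-efficient. Dividing the numerator and the denominator of $r_{xy}$ by n,

\[
\begin{aligned}
r_{xy} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / n}{\sqrt{\sum (x_i - \bar{x})^2 / n \cdot \sum (y_i - \bar{y})^2 / n}}, \qquad -1 \le r_{xy} \le +1 \\
&= \frac{\operatorname{cov}(x, y)}{\sqrt{\sigma_x^2 \, \sigma_y^2}} \\
&= \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y} \\
&= \frac{\operatorname{cov}(x, y)}{\sigma_x^2} \qquad [\, \sigma_x = \sigma_y \,] \\
&= \frac{\dfrac{n^2 - 1}{12} - \dfrac{\sum d_i^2}{2n}}{\dfrac{n^2 - 1}{12}} \qquad \text{[by equation (1)]} \\
&= 1 - \frac{\sum d_i^2 / (2n)}{(n^2 - 1) / 12} \\
&= 1 - \frac{\sum d_i^2}{2n} \cdot \frac{12}{n^2 - 1} \\
&= 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)}
\end{aligned}
\]

The rank correlation co-efficient is denoted by $\rho$, so

\[ \rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)} \qquad \text{(Proved)} \]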
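The derivation says that, for untied ranks, the formula is simply the ordinary correlation co-efficient applied to the ranks. The sketch below checks this for every possible ranking of four items against the fixed ranking 1, 2, 3, 4; the function names are assumptions made for this example.

```python
import math
from itertools import permutations

def pearson_r(x, y):
    """Ordinary coefficient of correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rho(rank_x, rank_y):
    """Rank correlation formula, valid for untied ranks 1..n."""
    n = len(rank_x)
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

base = (1, 2, 3, 4)
# Compare both quantities on all 24 rankings of four items
all_agree = all(
    math.isclose(pearson_r(base, perm), spearman_rho(base, perm), abs_tol=1e-12)
    for perm in permutations(base)
)
print(all_agree)  # True
```

An absolute tolerance is used because some rankings give a correlation of exactly zero, where a purely relative comparison of floating-point values is not meaningful.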



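6. Regression co-efficients are independent of change of origin but dependent on scale of measurement.
Or, $b_{yx} = \frac{d}{c} \, b_{vu}$ and $b_{xy} = \frac{c}{d} \, b_{uv}$.

Proof: The regression co-efficient of y on x is $b_{yx}$,

\[ b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad -\infty < b_{yx} < +\infty \qquad (1) \]

Let the new variables be

\[ u_i = \frac{x_i - a}{c}, \qquad v_i = \frac{y_i - b}{d}, \]

where a is the origin and c is the scale of $x_i$, and b is the origin and d is the scale of $y_i$. Then

\[
\begin{aligned}
x_i - a &= c u_i & y_i - b &= d v_i \\
x_i &= c u_i + a & y_i &= d v_i + b \\
\frac{\sum x_i}{n} &= \frac{c \sum u_i}{n} + \frac{na}{n} & \frac{\sum y_i}{n} &= \frac{d \sum v_i}{n} + \frac{nb}{n} \\
\bar{x} &= c\bar{u} + a & \bar{y} &= d\bar{v} + b
\end{aligned}
\]

From (1) we get,

\[
\begin{aligned}
b_{yx} &= \frac{\sum (c u_i + a - c\bar{u} - a)(d v_i + b - d\bar{v} - b)}{\sum (c u_i + a - c\bar{u} - a)^2} \\
&= \frac{\sum (c u_i - c\bar{u})(d v_i - d\bar{v})}{\sum (c u_i - c\bar{u})^2} \\
&= \frac{cd \sum (u_i - \bar{u})(v_i - \bar{v})}{c^2 \sum (u_i - \bar{u})^2} \\
&= \frac{d \sum (u_i - \bar{u})(v_i - \bar{v})}{c \sum (u_i - \bar{u})^2}
\end{aligned}
\]

\[ \therefore \; b_{yx} = \frac{d}{c} \, b_{vu} \]

Similarly, the regression co-efficient of x on y is $b_{xy}$,

\[
\begin{aligned}
b_{xy} &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2} \\
&= \frac{\sum (c u_i + a - c\bar{u} - a)(d v_i + b - d\bar{v} - b)}{\sum (d v_i + b - d\bar{v} - b)^2} \\
&= \frac{\sum (c u_i - c\bar{u})(d v_i - d\bar{v})}{\sum (d v_i - d\bar{v})^2} \\
&= \frac{cd \sum (u_i - \bar{u})(v_i - \bar{v})}{d^2 \sum (v_i - \bar{v})^2}
\end{aligned}
\]

\[ \therefore \; b_{xy} = \frac{c}{d} \, b_{uv} \]

The origins a and b do not appear in either result, but the scales c and d do. (Proved)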
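Property 6 can also be checked numerically. In the sketch below the data, the origins (a, b) and the scales (c, d) are arbitrary values chosen for illustration, and `slope` is a hypothetical helper name.

```python
import math

def slope(dep, ind):
    """Regression co-efficient of the variable `dep` on the variable `ind`."""
    n = len(dep)
    mean_dep, mean_ind = sum(dep) / n, sum(ind) / n
    s_cross = sum((p - mean_ind) * (q - mean_dep) for p, q in zip(ind, dep))
    s_ind = sum((p - mean_ind) ** 2 for p in ind)
    return s_cross / s_ind

# Illustrative data (hypothetical)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, c = 10.0, 2.0   # origin and scale applied to x
b, d = -3.0, 5.0   # origin and scale applied to y
u = [(xi - a) / c for xi in x]
v = [(yi - b) / d for yi in y]

b_yx, b_xy = slope(y, x), slope(x, y)
b_vu, b_uv = slope(v, u), slope(u, v)

# The origins a and b drop out; only the ratio of the scales remains
print(math.isclose(b_yx, (d / c) * b_vu))  # True
print(math.isclose(b_xy, (c / d) * b_uv))  # True
```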
Types of correlation:
1. Simple correlation (the relation between two variables)
2. Partial correlation
3. Multiple correlation (the relation between one variable and two or more other variables taken together)

Simple correlation is further classified as:
1. Positive correlation
2. Negative correlation
3. Zero correlation
4. Linear and non-linear correlation.

1. Positive correlation:-
If an increase (decrease) in one variable results in a corresponding increase (decrease) in the other,
i.e. if the changes are in the same direction, the variables are positively correlated.

For example: In school, students' height increases with their age.

2. Negative correlation:-
If an increase (decrease) in one variable results in a corresponding decrease (increase) in the other,
then the variables are negatively correlated.

For example: If the price of a commodity increases, the demand for it decreases.

3. Zero correlation:-
If one variable increases (decreases) but the other variable remains the same, then they are not
correlated.
For example: At university, students' age increases but their height remains the same.

Partial correlation:
In partial correlation we measure the relationship between a dependent variable and one particular independent variable when all
other variables are held constant.


Q. What are the properties of regression co-efficient?

Ans: 1. Co-efficient of correlation is the geometric mean of the regression co-efficients, i.e. $r_{xy} = \sqrt{b_{yx} \, b_{xy}}$.

2. Arithmetic mean of the regression co-efficients is greater than or equal to the correlation co-efficient, i.e. $r_{xy} \le \left( \frac{b_{yx} + b_{xy}}{2} \right)$.

3. Regression co-efficients are independent of change of origin but not of scale.



Q. What are the uses of regression?

Ans: 1. Regression helps us to estimate the value of the dependent variable from the values of the independent variable.

2. With the help of regression analysis we can obtain a measure of the correlation that exists between the two
variables.
