
Welcome

. S. B. Bhattacharjee

Ch 8_1


What does the term Regression literally mean?
The term regression literally means stepping back
towards the average.


What is Regression?
Regression is a statistical tool to estimate (or predict)
the unknown values of one variable from known values
of another variable.
Example: If we know that advertising and sales are
correlated, we may find out the expected amount of
sales for a given advertising expenditure or the amount
of expenditure for achieving a fixed sales target.


What is the utility of studying regression?
The regression analysis is a branch of statistical theory
that is widely used in almost all the scientific disciplines.
In economics it is the basic technique for measuring or
estimating the relationship among economic variables
that constitute the essence of economic theory and
economic life.
Example: If two variables price (X) and demand (Y) are
closely related one can find out the most probable value
of X for a given value of Y or the most probable value of
Y for a given value of X.
Similarly, if the amount of tax and the rise in the price of
a commodity are closely related, it is possible to find out
the expected price for a certain amount of tax levy.

Distinguish between correlation and regression analysis.
The points of difference between correlation and
regression analysis are:
While correlation co-efficient is a measure of degree
of relationship between X and Y, the regression
analysis helps study the nature of relationship
between the variables.
The cause and effect relation is more clearly indicated
through regression analysis than by correlation.
Correlation is merely a tool of ascertaining the degree
of relationship between two variables and, therefore,
one can not say that one variable is the cause and the
other the effect, while regression shows the extent of
dependence of one variable on another.


What are Regression Lines?
If we take the case of two variables X and Y, we shall
have two regression lines: the regression line of X on Y
and the regression line of Y on X.
The regression line of Y on X gives most probable
values of Y for given values of X and the regression
line of X on Y gives most probable values of X for
given values of Y. Thus we have two regression lines.


Under what conditions can there be one regression line?
When there is either perfect positive or perfect
negative correlation between the two variables, the
two regression lines will coincide, i.e. we will have
one line.
The further the two regression lines are from each
other, the lesser is the degree of correlation and the
nearer the two regression lines to each other, the
higher is the degree of correlation. If variables are
independent, r is zero and the lines of regression are
at right angles i.e. parallel to X axis and Y axis.


What are regression equations?
Regression equations are algebraic expressions of the
regression lines. Since there are two regression lines,
there are two regression equations: the regression
equation of X on Y is used to describe the variations in
the values of X for given changes in Y, and the regression
equation of Y on X is used to describe the variations in
the values of Y for given changes in X.


Why is the regression line called the line of best fit?
The line of regression is the line which gives the best
estimate of the value of one variable for any specific
value of the other variable. Thus the line of regression is the
line of best fit and is obtained by the principle of
least squares.


What is the general form of the regression equation of Y on X?
The general form of the linear regression equation of Y
on X is expressed as follows:
Ye = a+bX, where
Ye = dependent variable to be estimated
X = independent variable.
In this equation a and b are two unknown constants
(fixed numerical values) which determine the position
of the line completely. The constants are called the
parameters of the line. If the value of either one or both
of them is changed, another line is determined.

What is the general form of the regression


equation of Y on X?
The parameter a determines the level of the fitted line
(i.e. the distance of the line directly, above or below the
origin). The parameter b determines the slope of the
line i.e. the change in Y for unit change in X.
To determine the values of a and b, the following two
normal equations are to be solved simultaneously:
$$\sum Y = Na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2$$

What is the general form of the regression


equation of X on Y?
The general form of the regression equation of X on Y
is expressed as follows:
X = a+bY
To determine the values of a and b, the following
two normal equations are to be solved simultaneously:
$$\sum X = Na + b\sum Y, \qquad \sum XY = a\sum Y + b\sum Y^2$$

What is the general form of the regression


equation of X on Y?
These equations are usually called the normal
equations. In the equations, $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$ and $\sum Y^2$ indicate
totals which are computed from the observed pairs of
values of the two variables X and Y to which the least
squares estimating line is to be fitted, and N is the
total number of observed pairs of values.
The geometrical presentation of the linear equation
Y = a + bX is shown in the diagram below:

[Figure: the straight line Y = a + bX, with Y-intercept a and slope b marked as the change in Y for 1 unit in X]


It is clear from this diagram that the height of the line
tells the average value of Y at a fixed value of X.
When X=0, the average value of Y is equal to a . The
value of a is called the Y- intercept since it is the
point at which the straight line crosses the Y- axis.
The slope of the line is measured by b, which gives
the average amount of change of Y per unit change
in the value of X. The sign of b also indicates the
type of relationship between Y and X.


How are the values of a and b obtained to determine a regression line completely?
The values of a and b are obtained by the method
of least squares, which states that the line should be
drawn through the plotted points in such a manner
that the sum of the squares of the vertical deviations
of the actual Y values from the estimated Y values is
the least. In other words, in order to obtain a line
which fits the points best, $\sum (Y - Y_e)^2$ should be a minimum.
Such a line is known as the line of best fit.

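The least-squares criterion can be sketched in a few lines of Python. This is only an illustration: the function names `sse` and `fit_line` and the five data points are assumptions made for the sketch, not part of the slides. It computes the sum of squared vertical deviations and checks that nearby lines give a larger sum than the fitted one.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical deviations of Y from Ye = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Return (a, b) from the closed-form least-squares solution."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)

# Shifting the intercept or the slope away from (a, b) increases the sum.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(a + da, b + db, xs, ys) > best
```

The check at the end is not a proof, only a numerical spot check of the minimum property stated above.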

How can the normal equations be arrived at?
Let $S = \sum (Y - Y_e)^2 = \sum (Y - a - bX)^2$ (since $Y_e = a + bX$).
Differentiating partially with respect to a and b and equating to zero,
$$\frac{\partial S}{\partial a} = 2\sum (Y - a - bX)(-1) = 0$$
$$\frac{\partial S}{\partial b} = 2\sum (Y - a - bX)(-X) = 0$$
Or,
$$\sum (Y - a - bX) = 0 \qquad (1)$$
$$\sum (Y - a - bX)X = 0 \qquad (2)$$
From (1), we have
$$\sum Y = \sum a + b\sum X = Na + b\sum X$$
From (2), we have
$$\sum XY = a\sum X + b\sum X^2$$


Example:
The following data give the hardness (X) and tensile
strength (Y) of 7 samples of metal in certain units. Find
the linear regression equation of Y on X.
X:  146  152  158  164  170  176  182
Y:   75   78   77   89   82   85   86

Solution:
Regression equation of Y on X is given by
Y = a + bX
The normal equations are:
$$\sum Y = Na + b\sum X \qquad (1)$$
$$\sum XY = a\sum X + b\sum X^2 \qquad (2)$$

Calculation of regression equations

  X      Y      X2       Y2      XY
 146     75    21316    5625    10950
 152     78    23104    6084    11856
 158     77    24964    5929    12166
 164     89    26896    7921    14596
 170     82    28900    6724    13940
 176     85    30976    7225    14960
 182     86    33124    7396    15652
Totals: ΣX = 1148, ΣY = 572, ΣX2 = 189280, ΣY2 = 46904, ΣXY = 94120


Here, N = 7
Substituting the values in equations (1) and (2), we
get
572 = 7a + 1148b            (3)
94120 = 1148a + 189280b     (4)
Multiplying equation (3) by 164, we get
93808 = 1148a + 188272b     (5)
Subtracting this equation from (4), we get
312 = 1008b, so b = 0.31
Putting this value of b in equation (3), we have
$$572 = 7a + 1148 \times 0.31$$
$$572 = 7a + 355.88$$
$$7a = 572 - 355.88 = 216.12$$
$$a = \frac{216.12}{7} = 30.87$$
The linear regression equation of Y on X is
Y = 30.87 + 0.31X


Calculate the regression equations of X on Y and Y on X from the following data
X:

Y:

Solution:
[Calculation table with columns X, Y, X2, Y2 and XY; only the column totals are fully legible]
ΣX = 15, ΣY = 25, ΣX2 = 55, ΣY2 = 151, ΣXY = 88


Regression equation of X on Y is given by X = a + bY.
The normal equations are $\sum X = Na + b\sum Y$
and $\sum XY = a\sum Y + b\sum Y^2$
Substituting the values, we get
15 = 5a + 25b      (1)
88 = 25a + 151b    (2)
Solving (1) and (2), we get a = 0.5 and b = 0.5


Hence, the regression equation of X on Y is given by
X = 0.5 + 0.5Y
The regression equation of Y on X is: Y = a + bX
The normal equations are
$$\sum Y = Na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Substituting the values, we get
25 = 5a + 15b      (iii)
88 = 15a + 55b     (iv)
Solving (iii) and (iv), we get
a = 1.10 and b = 1.30
Hence, the regression equation of Y on X is given by
Y = 1.10 + 1.30X

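The two pairs of normal equations in this example can be checked with a short Python sketch. The helper name `solve_normal_equations` and its parameter names are illustrative assumptions; the totals are the ones used in the worked solution, with N = 5 taken from the coefficient of a in the substituted equations.

```python
def solve_normal_equations(n, sum_u, sum_v, sum_uv, sum_uu):
    """Solve  sum_v = n*a + b*sum_u  and  sum_uv = a*sum_u + b*sum_uu.

    u is the independent variable, v the dependent one.
    Returns (a, b).
    """
    det = n * sum_uu - sum_u ** 2
    if det == 0:
        raise ValueError("normal equations have no unique solution")
    b = (n * sum_uv - sum_u * sum_v) / det
    a = (sum_v - b * sum_u) / n
    return a, b


# Totals from the example: ΣX = 15, ΣY = 25, ΣX2 = 55, ΣY2 = 151, ΣXY = 88
a_xy, b_xy = solve_normal_equations(5, sum_u=25, sum_v=15, sum_uv=88, sum_uu=151)  # X on Y
a_yx, b_yx = solve_normal_equations(5, sum_u=15, sum_v=25, sum_uv=88, sum_uu=55)   # Y on X

print(f"X = {a_xy:.2f} + {b_xy:.2f}Y")
print(f"Y = {a_yx:.2f} + {b_yx:.2f}X")
```

The same helper serves both regressions because only the roles of the two variables are swapped.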

What will be the forms of the regression equation of X on Y
and the regression equation of Y on X on the basis of
deviations taken from the arithmetic means of X and Y?
If we take the deviations of the X and Y series from their
respective means, the regression equation of Y on X will
take the form
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
The value of $b_{yx}$ can be easily obtained as follows:
$$b_{yx} = \frac{\sum xy}{\sum x^2}, \qquad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$
The two normal equations in terms of x and y will then
become
$$\sum y = Na + b\sum x \qquad (1)$$
$$\sum xy = a\sum x + b\sum x^2 \qquad (2)$$


Since $\sum x = \sum y = 0$ (deviations being taken from means),
Equation (1) reduces to
$$Na = 0, \quad \text{so } a = 0$$
Equation (2) reduces to
$$\sum xy = b\sum x^2, \quad \text{or} \quad b = b_{yx} = \frac{\sum xy}{\sum x^2}$$


After obtaining the value of $b_{yx}$, the regression
equation can easily be written in terms of X and Y by
substituting $Y - \bar{Y}$ for y and $X - \bar{X}$ for x.
Similarly, the regression equation X = a + bY is reduced
to $X - \bar{X} = b_{xy}(Y - \bar{Y})$, and the value of $b_{xy}$ can be
similarly obtained as
$$b_{xy} = \frac{\sum xy}{\sum y^2}$$


Example:
The following data give the hardness (X) and tensile
strength (Y) of 7 samples of metal in certain units. Find
the linear regression equation of Y on X.
X:  146  152  158  164  170  176  182
Y:   75   78   77   89   82   85   86


Hardness  Strength  x = X - X̄   y = Y - Ȳ     x2      y2       xy
  (X)       (Y)
  146       75        -18       -6.7142      324    45.08   120.86
  152       78        -12       -3.7142      144    13.80    44.57
  158       77         -6       -4.7142       36    22.22    28.29
  164       89          0        7.2858        0    53.08     0
  170       82          6        0.2858       36     0.08     1.71
  176       85         12        3.2858      144    10.80    39.43
  182       86         18        4.2858      324    18.37    77.14
N = 7, ΣX = 1148, ΣY = 572, Σx = 0, Σy = 0, Σx2 = 1008, Σy2 = 163.43, Σxy = 312


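$$\bar{X} = \frac{\sum X}{N} = \frac{1148}{7} = 164, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{572}{7} = 81.71$$
$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{312}{1008} = 0.309 \approx 0.31$$
The linear regression of Y on X is given as
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$Y - 81.71 = 0.31(X - 164)$$
$$Y - 81.71 = 0.31X - 50.84$$
$$Y = 0.31X - 50.84 + 81.71$$
$$Y = 0.31X + 30.87$$
i.e. Y = 30.87 + 0.31X
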
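The deviation method used in this example can be reproduced with a small Python sketch. The function name `regression_by_deviations` is an illustrative assumption, and the Y series is taken as listed in the calculation table.

```python
def regression_by_deviations(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]          # x = X - mean of X
    dy = [y - y_bar for y in ys]          # y = Y - mean of Y
    s_xy = sum(p * q for p, q in zip(dx, dy))
    s_xx = sum(p * p for p in dx)
    s_yy = sum(q * q for q in dy)
    return x_bar, y_bar, s_xy / s_xx, s_xy / s_yy


hardness = [146, 152, 158, 164, 170, 176, 182]
strength = [75, 78, 77, 89, 82, 85, 86]
x_bar, y_bar, b_yx, b_xy = regression_by_deviations(hardness, strength)
a = y_bar - b_yx * x_bar                  # intercept of the Y-on-X line
print(f"Y = {a:.2f} + {b_yx:.4f}X")
```

Without intermediate rounding the intercept comes out near 30.95 rather than 30.87; the difference arises only because the worked solution rounds the slope to 0.31 before computing the intercept.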

What are regression coefficients?
The quantity b in the regression equations (Y = a + bX
and X = a + bY) is called the regression coefficient
or slope coefficient. Since there are two regression
equations, there are two regression coefficients:
the regression coefficient of X on Y and
the regression coefficient of Y on X.


What is the regression coefficient of X on Y?
The regression coefficient of X on Y is represented by the
symbol $b_{xy}$ or $b_1$. It measures the amount of change in X
corresponding to a unit change in Y. The regression
coefficient of X on Y is given by
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
When deviations are taken from the means of X and Y, the
regression coefficient is obtained by
$$b_{xy} = \frac{\sum xy}{\sum y^2}$$


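What is the regression coefficient of X on Y?
When deviations are taken from assumed means, the
value of $b_{xy}$ is obtained as follows:
$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$
where $d_x$ and $d_y$ are the deviations of X and Y from their assumed means.
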

What are the assumptions of constructing a regression model?
The value of the dependent variable, Y, is dependent in
some degree upon the value of the independent variable,
X. The dependent variable is assumed to be a random
variable, but the values of X are assumed to be fixed
quantities that are selected and controlled by the
experimenter. The requirement that the independent
variable assumes fixed values, however, is not a critical
one. Useful results can still be obtained by regression
analysis in the case where both X and Y are random
variables.
The average relationship between X and Y can be
adequately described by a linear equation Y= a+bX whose
geometrical presentation is a straight line.

The height of the line tells the average value of Y at a


fixed value of X. When X= 0, the average value of Y is
equal to a. The value of a is called the Y intercept,
since it is the point at which the straight line crosses
the Y-axis. The slope of the line is measured by b,
which gives the average amount of change of Y per
unit change in the value of X. The sign of b also
indicates the type of relationship between Y and X.
Associated with each value of X there is a
sub-population of Y. The distribution of the
sub-population may be assumed to be normal or
non-specified in the sense that it is unknown. In any
event, the distribution of each sub-population Y is
conditional on the value of X.


The mean of each sub-population Y is called the
expected value of Y for a given X: $E(Y \mid X) = \mu_{yx}$.
Furthermore, under the assumption of a linear
relationship between X and Y, all values of $E(Y \mid X)$
or $\mu_{yx}$ must fall on a straight line. That is,
$$E(Y \mid X) = \mu_{yx} = a + bX$$
which is the population regression equation for our
bivariate linear model. In this equation a and b are
called the population regression coefficients.
An individual value in each sub-population Y may be
expressed as:
$$Y = E(Y \mid X) + e$$


where e is the deviation of a particular value of Y
from $\mu_{yx}$ and is called the error term or the stochastic
disturbance term. The errors are assumed to be
independent random variables because Ys are
random variables and independent. The expectations
of these errors are zero; E(e) = 0. Moreover, if Ys are
normal variables, the error can also be assumed to be
normal.
It is assumed that the variances of all sub
populations, called variances of the regression, are
identical.


What is the regression coefficient of Y on X?
The regression coefficient of Y on X is represented by
$b_{yx}$ or $b_2$. It measures the amount of change in Y
corresponding to a unit change in X. The value of $b_{yx}$ is
given by
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
When deviations are taken from the means of X and Y,
$$b_{yx} = \frac{\sum xy}{\sum x^2}$$


When deviations are taken from assumed means,
$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$

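As a rough check of the assumed-mean formula, the following Python sketch computes the coefficient of Y on X from deviations about arbitrary assumed means. The function name `b_yx_assumed_means` and the assumed means 70 and 15 are illustrative choices; the data are the advertisement and sales figures from the example in these slides.

```python
def b_yx_assumed_means(xs, ys, ax, ay):
    """Regression coefficient of Y on X from deviations dx = X - ax, dy = Y - ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    num = n * sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy)
    den = n * sum(p * p for p in dx) - sum(dx) ** 2
    return num / den


adv = [60, 62, 65, 70, 73, 75, 71]
sales = [10, 11, 13, 15, 16, 19, 14]

b_assumed = b_yx_assumed_means(adv, sales, ax=70, ay=15)   # arbitrary assumed means
b_actual = b_yx_assumed_means(adv, sales, ax=68, ay=14)    # the actual means
```

Both calls give the same coefficient, since the correction terms in the numerator and denominator remove the effect of the chosen origin.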

What are the properties of regression coefficients?
The coefficient of correlation is the geometric
mean of the two regression coefficients.
Symbolically,
$$r = \sqrt{b_{xy} \times b_{yx}}$$
If one of the regression coefficients is greater than
unity, the other must be less than unity, as the value
of the coefficient of correlation cannot exceed unity.
Example: if $b_{xy} = 1.2$ and $b_{yx} = 1.4$,
$$r = \sqrt{1.2 \times 1.4} = \sqrt{1.68} \approx 1.30,$$
which is not possible.


What are the properties of regression coefficients?
Both the regression coefficients will have the same
sign, i.e. they will be either both positive or both negative.
The coefficient of correlation will have the same
sign as that of the regression coefficients.
Example: if $b_{xy} = -0.2$ and $b_{yx} = -0.8$,
$$r = -\sqrt{0.2 \times 0.8} = -0.4$$


What are the properties of regression coefficients?
The average value of the two regression coefficients
would be greater than the value of the coefficient of
correlation. Symbolically,
$$\frac{b_{xy} + b_{yx}}{2} > r$$
Example: if $b_{xy} = 0.8$ and $b_{yx} = 0.4$, the average of
the two values would be
$$\frac{0.8 + 0.4}{2} = 0.6$$
The value of r would be $\sqrt{0.8 \times 0.4} = 0.566$, which is less than 0.6.

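The properties listed above can be illustrated with a minimal Python sketch. The function name `correlation_from_coefficients` and the decision to raise `ValueError` for impossible pairs are assumptions made for this illustration.

```python
import math


def correlation_from_coefficients(b_xy, b_yx):
    """Return r as the geometric mean of the two coefficients, with their common sign."""
    if b_xy * b_yx < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(b_xy * b_yx)
    if r > 1:
        raise ValueError("product of the coefficients exceeds 1, so r would exceed unity")
    return -r if b_xy < 0 else r


r = correlation_from_coefficients(0.8, 0.4)
average = (0.8 + 0.4) / 2
print(f"r = {r:.3f}, average of coefficients = {average:.1f}")
```

Calling the function with 1.2 and 1.4 raises an error, which mirrors the remark that both coefficients cannot exceed unity at once.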

Regression coefficients are independent of change
of origin but not of scale.


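Example:
Prove that the coefficient of correlation is the
geometric mean of the regression coefficients.
Proof: Let $b_{xy}$ be the regression coefficient of X on Y and $b_{yx}$ be the
regression coefficient of Y on X.
Now,
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}; \qquad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
$$b_{xy} \times b_{yx} = r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x} = r^2$$
$$\text{or, } r^2 = b_{xy} \times b_{yx}, \quad \text{so } r = \sqrt{b_{xy} \times b_{yx}}$$
The coefficient of correlation is the geometric mean
of the two regression coefficients.
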

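Prove that regression coefficients are independent
of change of origin but not of scale.
$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$
or
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \qquad (i)$$
Let
$$u = \frac{X - a}{h} \quad \text{and} \quad v = \frac{Y - b}{k}$$
so that
$$X = a + hu \quad \text{and} \quad Y = b + kv$$
and
$$\bar{X} = a + h\bar{u} \quad \text{and} \quad \bar{Y} = b + k\bar{v}$$
Subtracting, we get
$$X - \bar{X} = h(u - \bar{u}) \quad \text{and} \quad Y - \bar{Y} = k(v - \bar{v})$$
Substituting these values in the above formula, we get
$$b_{yx} = \frac{\sum h(u - \bar{u})\,k(v - \bar{v})}{\sum h^2 (u - \bar{u})^2} = \frac{k}{h} \cdot \frac{\sum (u - \bar{u})(v - \bar{v})}{\sum (u - \bar{u})^2} = \frac{k}{h}\, b_{vu}$$
Similarly, it can be shown that
$$b_{xy} = \frac{h}{k}\, b_{uv}$$
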
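The result of this proof can be spot-checked numerically. In the Python sketch below, the function name `slope` and the particular origins and scales (a = 65, h = 5, b = 14, k = 2) are arbitrary assumptions; the data are the advertisement and sales figures used elsewhere in these slides.

```python
def slope(xs, ys):
    """Regression coefficient of ys on xs: sum of products of deviations over sum of squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


xs = [60, 62, 65, 70, 73, 75, 71]
ys = [10, 11, 13, 15, 16, 19, 14]
a, h = 65, 5          # origin and scale for X (arbitrary)
b, k = 14, 2          # origin and scale for Y (arbitrary)

us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

b_yx = slope(xs, ys)
b_vu = slope(us, vs)
shifted_only = slope([x - a for x in xs], [y - b for y in ys])

assert abs(shifted_only - b_yx) < 1e-9        # change of origin leaves b_yx unchanged
assert abs(b_yx - (k / h) * b_vu) < 1e-9      # change of scale multiplies it by k/h
```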

Example:
The following figures relate to advertisement
expenditure and corresponding sales.
Advertisement (in lakhs of Taka):  60  62  65  70  73  75  71
Sales (in crores of Taka):         10  11  13  15  16  19  14
Estimate
i) The sales for advertisement expenditure of Tk. 80
lakhs and
ii) The advertisement expenditure for a sales target of
Tk. 25 crores


Let the advertisement expenditure be denoted by X and
sales by Y.
Calculation of regression equations

  X    x = X - X̄    x2     Y    y = Y - Ȳ    y2    xy
 60       -8        64    10       -4       16    32
 62       -6        36    11       -3        9    18
 65       -3         9    13       -1        1     3
 70        2         4    15        1        1     2
 73        5        25    16        2        4    10
 75        7        49    19        5       25    35
 71        3         9    14        0        0     0
ΣX = 476, Σx = 0, Σx2 = 196, ΣY = 98, Σy = 0, Σy2 = 56, Σxy = 100


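Here, N = 7;
$$\bar{X} = \frac{\sum X}{N} = \frac{476}{7} = 68 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{N} = \frac{98}{7} = 14$$
i) Regression equation of Y on X:
$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \qquad (i)$$
Here,
$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{100}{196} = 0.51$$
Now, from (i), we have
$$Y - 14 = 0.51(X - 68)$$
$$Y - 14 = 0.51X - 34.68$$
$$Y = 0.51X - 34.68 + 14$$
$$Y = 0.51X - 20.68$$
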

When advertisement expenditure, X = 80 lakhs,
Sales, Y = 0.51 × 80 - 20.68
= 40.8 - 20.68
= 20.12
The likely sales for advertisement expenditure of Tk.
80 lakhs = Tk. 20.12 crores.


ii) Regression equation of X on Y:
$$X - \bar{X} = b_{xy}(Y - \bar{Y}) \qquad (ii)$$
Here,
$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{100}{56} = 1.7857 \approx 1.79$$
Putting the values in equation (ii), we get
$$X - 68 = 1.79(Y - 14)$$
$$X - 68 = 1.79Y - 24.9998$$
$$X = 1.79Y - 24.9998 + 68$$
$$X = 1.79Y + 43$$


When sales target, Y = 25 crores,
Advertisement expenditure, X = 1.79 × 25 + 43
= 44.75 + 43
= 87.75
The likely advertisement expenditure for a sales
target of Tk. 25 crores = Tk. 87.75 lakhs.

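Both estimates in this example can be reproduced with a short Python sketch. The function name `regression_line` is an illustrative assumption; the data are the advertisement and sales figures given in the example.

```python
def regression_line(us, vs):
    """Regression of v on u: return (intercept, slope)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    slope = (sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
             / sum((u - u_bar) ** 2 for u in us))
    return v_bar - slope * u_bar, slope


adv = [60, 62, 65, 70, 73, 75, 71]      # X, lakhs of Taka
sales = [10, 11, 13, 15, 16, 19, 14]    # Y, crores of Taka

a_yx, b_yx = regression_line(adv, sales)    # Y on X: predicts sales
a_xy, b_xy = regression_line(sales, adv)    # X on Y: predicts advertisement

sales_at_80 = a_yx + b_yx * 80
adv_for_25 = a_xy + b_xy * 25
print(f"sales at X = 80: {sales_at_80:.2f} crores")
print(f"advertisement for Y = 25: {adv_for_25:.2f} lakhs")
```

With unrounded coefficients the second estimate is about 87.64 lakhs; the figure of 87.75 in the worked solution follows from rounding the coefficient to 1.79 before multiplying.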

Example:
The following data relate to advertising expenditure
(in lakhs of Taka) and their corresponding sales (in
crores of Taka):
Advertising Expenditure:  10  12  15  23  20
Sales:                    14  17  23  25  21
Estimate i) the sales corresponding to advertising
expenditure of Tk. 30 lakhs and
ii) the advertising expenditure for a sales
target of Tk. 35 crores.


Calculation of regression equations

  X    x = X - X̄    x2     Y    y = Y - Ȳ    y2     xy
 10       -6        36    14       -6       36    +36
 12       -4        16    17       -3        9    +12
 15       -1         1    23       +3        9     -3
 23       +7        49    25       +5       25    +35
 20       +4        16    21       +1        1     +4
ΣX = 80, Σx = 0, Σx2 = 118, ΣY = 100, Σy = 0, Σy2 = 80, Σxy = 84


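Here,
$$\bar{X} = \frac{\sum X}{N} = \frac{80}{5} = 16, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{100}{5} = 20$$
(i) Regression equation of Y on X:
$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \qquad (i)$$
Now,
$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{84}{118} = 0.712$$
From equation (i), we have
$$Y - 20 = 0.712(X - 16)$$
$$Y = 0.712X - 11.392 + 20$$
$$Y = 0.712X + 8.608$$
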

When the advertisement expenditure is Tk. 30 lakhs,
Sales, Y = 0.712 × 30 + 8.608 = 21.36 + 8.608 = 29.968
Thus the likely sales corresponding to advertisement
expenditure of Tk. 30 lakhs are Tk. 29.968 crores.
(ii) Regression equation of X on Y is given by
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{84}{80} = 1.05$$


$$X - 16 = 1.05(Y - 20)$$
$$X - 16 = 1.05Y - 21$$
$$X = 16 - 21 + 1.05Y$$
$$X = -5 + 1.05Y$$
When the sales target is Tk. 35 crores, the
advertising expenditure, X = -5 + 1.05 × 35
= -5 + 36.75 = 31.75
The advertising expenditure for a sales target of
Tk. 35 crores is Tk. 31.75 lakhs.


What is the standard error of estimate?


The measure of variation of the observations around the
computed regression line is referred to as the standard
error of estimate.
Just as the standard deviation is a measure of the
scatter of observations in a frequency distribution
around the mean of that distribution, the standard error
of estimate is a measure of the scatter of the observed
values of Y around the corresponding computed values
of Y on the regression line. It is computed as a standard
deviation, being also a square root of the mean of the
squared deviation. But the deviations here are not
deviations of the items from the arithmetic mean; they
are rather the vertical distances of every dot from the
line of average relationship.

What are the formulae for calculating the standard error of estimate?
$$S_{yx} = \sqrt{\frac{\sum (Y - Y_e)^2}{N - 2}}$$
Or
$$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{N - 2}}$$
where $S_{yx}$ = S.E. of estimate of the regression equation of Y on X.


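$$S_{xy} = \sqrt{\frac{\sum (X - X_e)^2}{N - 2}}$$
Or
$$S_{xy} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{N - 2}}$$
where a and b are the constants of the regression equation of X on Y.
The standard error of estimate can also be easily
calculated with the help of the following formulae:
i) $S_{xy} = \sigma_x \sqrt{1 - r^2}$
ii) $S_{yx} = \sigma_y \sqrt{1 - r^2}$
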
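The two N - 2 formulae for the standard error of estimate of Y on X can be compared with a brief Python sketch. The function name `standard_error_yx` is an illustrative assumption, and the data are the five advertising and sales pairs from the earlier example.

```python
import math


def standard_error_yx(xs, ys):
    """Return S_yx computed from residuals and from column totals (divisor N - 2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope of Y on X
    a = (sy - b * sx) / n                           # intercept of Y on X
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    from_residuals = math.sqrt(residual_ss / (n - 2))
    from_totals = math.sqrt((syy - a * sy - b * sxy) / (n - 2))
    return from_residuals, from_totals


adv = [10, 12, 15, 23, 20]
sales = [14, 17, 23, 25, 21]
s_resid, s_totals = standard_error_yx(adv, sales)
print(f"S_yx = {s_resid:.3f} (residuals), {s_totals:.3f} (totals)")
```

The two values agree up to floating-point rounding, since the totals formula is an algebraic rearrangement of the residual sum of squares.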

Significance of standard error


The standard error of estimate measures the accuracy
of the estimated figures. The smaller the value of
standard error of estimate, the closer will be dots to
the regression line and the better the estimates based
on the equation for this line.
If standard error of estimate is zero, then there is no
variation about the line and the correlation will be
perfect.
With the help of standard error of estimate, it is
possible for us to ascertain how good and
representative the regression line is as a description
of the average relationship between two series.

What is coefficient of determination?


The ratio of the unexplained variation to the total
variation represents the proportion of variation in Y that
is not explained by regression on X. Subtraction of this
proportion from 1.0 gives the proportion of variation in Y
that is explained by regression on X. The statistic used
to express this proportion is called the co-efficient of
determination and is denoted by R2. It may be written as
follows:
$$R^2 = 1 - \frac{\text{Variation in Y remaining after regression on X}}{\text{Total variation in Y}}$$
$$R^2 = 1 - \frac{\text{Error sum of squares}}{\text{Total sum of squares}}$$


The value of $R^2$ is the proportion of the variation in
the dependent variable Y explained by regression on
the independent variable X.
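The coefficient of determination can be computed directly from its definition, as in the following Python sketch. The function name `r_squared` is an illustrative assumption, and the data are the five advertising and sales pairs from the earlier example.

```python
def r_squared(xs, ys):
    """R2 = 1 - (error sum of squares) / (total sum of squares) for Y on X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    error_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    return 1 - error_ss / total_ss


adv = [10, 12, 15, 23, 20]
sales = [14, 17, 23, 25, 21]
r2 = r_squared(adv, sales)
print(f"R2 = {r2:.3f}")
```

For a single independent variable this value equals the product of the two regression coefficients, here 0.712 × 1.05, which is about 0.75.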