
# Welcome

. S. B. Bhattacharjee

Ch 8_1


## What does the term Regression literally mean?

The term regression literally means stepping back towards the average.


## What is Regression?

Regression is a statistical tool to estimate (or predict)
the unknown values of one variable from known values
of another variable.
Example: If we know that advertising and sales are
correlated, we may find out the expected amount of
sales for a given advertising expenditure or the amount
of expenditure for achieving a fixed sales target.


## What is the utility of studying regression?

The regression analysis is a branch of statistical theory
that is widely used in almost all the scientific disciplines.
In economics it is the basic technique for measuring or
estimating the relationship among economic variables
that constitute the essence of economic theory and
economic life.
Example: If two variables price (X) and demand (Y) are
closely related one can find out the most probable value
of X for a given value of Y or the most probable value of
Y for a given value of X.
Similarly, if the amount of tax and the rise in the price of
a commodity are closely related, it is possible to find out
the expected price for a certain amount of tax levy.

## Distinguish between correlation and regression analysis

The points of difference between correlation and regression analysis are:

While the correlation coefficient is a measure of the degree of relationship between X and Y, regression analysis helps study the nature of the relationship between the variables.

The cause and effect relation is more clearly indicated through regression analysis than by correlation. Correlation is merely a tool for ascertaining the degree of relationship between two variables and, therefore, one cannot say that one variable is the cause and the other the effect, while regression shows the extent of dependence of one variable on another.

## What are Regression Lines?

If we take the case of two variables X and Y, we shall have two regression lines: the regression line of X on Y and the regression line of Y on X.

The regression line of Y on X gives the most probable values of Y for given values of X, and the regression line of X on Y gives the most probable values of X for given values of Y.

## Under what conditions can there be one regression line?

When there is either perfect positive or perfect negative correlation between the two variables, the two regression lines will coincide, i.e. we will have one line.

The further the two regression lines are from each other, the lesser is the degree of correlation, and the nearer the two regression lines are to each other, the higher is the degree of correlation. If the variables are independent, r is zero and the lines of regression are at right angles, i.e. parallel to the X axis and the Y axis.

## What are regression equations?

Regression equations are algebraic expressions of the regression lines. Since there are two regression lines, there are two regression equations: the regression equation of X on Y is used to describe the variation in the values of X for given changes in Y, and the regression equation of Y on X is used to describe the variation in the values of Y for given changes in X.

## Why is the regression line called the line of best fit?

The line of regression is the line which gives the best estimate of the value of one variable for any specific value of the other variable. Thus the line of regression is the line of best fit and is obtained by the principle of least squares.

## What is the general form of the regression equation of Y on X?

The general form of the linear regression equation of Y on X is expressed as follows:

$$Y_e = a + bX$$

where $Y_e$ = dependent variable to be estimated and X = independent variable.

In this equation a and b are two unknown constants (fixed numerical values) which determine the position of the line completely. The constants are called the parameters of the line. If the value of either one or both of them is changed, another line is determined.

## What is the general form of the regression equation of Y on X?

The parameter a determines the level of the fitted line (i.e. the distance of the line directly above or below the origin). The parameter b determines the slope of the line, i.e. the change in Y for a unit change in X.

To determine the values of a and b, the following two normal equations are to be solved simultaneously:

$$\sum Y = Na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2$$
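
The two normal equations form a 2×2 linear system in a and b, so they can be solved directly once the totals are known. The following is a minimal Python sketch, assuming the observations arrive as two equal-length lists; the function name `fit_y_on_x` and the small data set are illustrative and do not come from the slides.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations for the line Y = a + bX.

    sum(Y)  = N*a + b*sum(X)
    sum(XY) = a*sum(X) + b*sum(X^2)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Determinant of the system; it is zero when every X is identical.
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on the line Y = 1 + 2X.
a, b = fit_y_on_x([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Swapping the two arguments gives the constants of the regression equation of X on Y.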


## What is the general form of the regression equation of X on Y?

The general form of the regression equation of X on Y is expressed as follows:

$$X = a + bY$$

To determine the values of a and b, the following two normal equations are to be solved simultaneously:

$$\sum X = Na + b\sum Y, \qquad \sum XY = a\sum Y + b\sum Y^2$$

## What is the general form of the regression equation of X on Y?

These equations are usually called the normal equations. In the equations, $\sum X$, $\sum Y$, $\sum XY$ and $\sum X^2$ indicate totals which are computed from the observed pairs of values of the two variables X and Y to which the least squares estimating line is to be fitted, and N is the total number of observed pairs of values.

The geometrical presentation of the linear equation $Y = a + bX$ is shown in the diagram below:

[Figure: the straight line $Y = a + bX$ drawn against the X and Y axes, with the slope b marked as the change in Y for 1 unit in X.]

It is clear from this diagram that the height of the line tells the average value of Y at a fixed value of X. When X = 0, the average value of Y is equal to a. The value of a is called the Y-intercept since it is the point at which the straight line crosses the Y-axis. The slope of the line is measured by b, which gives the average amount of change of Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X.

## How are the values of a and b obtained to determine a regression completely?

The values of a and b are obtained by the method of least squares, which states that the line should be drawn through the plotted points in such a manner that the sum of the squares of the vertical deviations of the actual Y values from the estimated Y values is the least. In other words, in order to obtain a line which fits the points best, $\sum (Y - Y_e)^2$ should be minimum. Such a line is known as the line of best fit.


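## How can the normal equations be arrived at?

Let $S = \sum (Y - Y_e)^2 = \sum (Y - a - bX)^2$ (since $Y_e = a + bX$).

Differentiating partially with respect to a and b, and equating to zero,

$$\frac{\partial S}{\partial a} = 2\sum (Y - a - bX)(-1) = 0, \qquad \frac{\partial S}{\partial b} = 2\sum (Y - a - bX)(-X) = 0$$

Or,

$$\sum (Y - a - bX) = 0 \quad \ldots (1)$$

$$\sum (Y - a - bX)X = 0 \quad \ldots (2)$$

From (1), we have

$$\sum Y = \sum a + b\sum X = Na + b\sum X$$

From (2), we have

$$\sum YX = a\sum X + b\sum X^2, \quad \text{i.e.} \quad \sum XY = a\sum X + b\sum X^2$$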
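
Equations (1) and (2) say that the residuals of the fitted line sum to zero and that their products with X also sum to zero. A short numerical check in Python follows, assuming hypothetical data and the closed-form values of a and b obtained by solving the two normal equations; none of the numbers come from the slides.

```python
# Hypothetical observations (not from the slides).
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.0, 3.5, 4.0, 6.5, 9.0]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Closed-form solution of the two normal equations.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

residuals = [y - a - b * x for x, y in zip(xs, ys)]
eq1 = sum(residuals)                             # sum(Y - a - bX)
eq2 = sum(e * x for e, x in zip(residuals, xs))  # sum((Y - a - bX) * X)


def sse(a0, b0):
    """Sum of squared vertical deviations from the line Y = a0 + b0*X."""
    return sum((y - a0 - b0 * x) ** 2 for x, y in zip(xs, ys))


print(eq1, eq2)  # both are zero up to floating-point rounding
# Nudging either constant away from the solution increases the sum of squares.
print(sse(a, b) < sse(a + 0.1, b), sse(a, b) < sse(a, b + 0.1))
```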

Example:
The following data give the hardness (X) and tensile strength (Y) of 7 samples of metal in certain units. Find the linear regression equation of Y on X.

| X: | 146 | 152 | 158 | 164 | 170 | 176 | 182 |
|---|---|---|---|---|---|---|---|
| Y: | 75 | 78 | 77 | 89 | 82 | 85 | 86 |

Solution:
Regression equation of Y on X is given by

$$Y = a + bX$$

The normal equations are:

$$\sum Y = Na + b\sum X \quad \ldots (1)$$

$$\sum XY = a\sum X + b\sum X^2 \quad \ldots (2)$$

## Calculation of regression equations

| X | Y | $X^2$ | $Y^2$ | XY |
|---|---|---|---|---|
| 146 | 75 | 21316 | 5625 | 10950 |
| 152 | 78 | 23104 | 6084 | 11856 |
| 158 | 77 | 24964 | 5929 | 12166 |
| 164 | 89 | 26896 | 7921 | 14596 |
| 170 | 82 | 28900 | 6724 | 13940 |
| 176 | 85 | 30976 | 7225 | 14960 |
| 182 | 86 | 33124 | 7396 | 15652 |
| $\sum X = 1148$ | $\sum Y = 572$ | $\sum X^2 = 189280$ | $\sum Y^2 = 46904$ | $\sum XY = 94120$ |

Here, N = 7.
Substituting the values in equations (1) and (2), we get

$$572 = 7a + 1148b \quad \ldots (3)$$

$$94120 = 1148a + 189280b \quad \ldots (4)$$

Multiplying equation (3) by 164, we get

$$93808 = 1148a + 188272b \quad \ldots (5)$$

Subtracting this equation from (4), we get

$$312 = 1008b, \quad \text{so} \quad b = 0.31$$

Putting this value of b in equation (3), we have

$$\begin{aligned}
572 &= 7a + 1148 \times 0.31 \\
572 &= 7a + 355.88 \\
7a &= 572 - 355.88 \\
7a &= 216.12 \\
a &= \frac{216.12}{7} = 30.87
\end{aligned}$$

The linear regression equation of Y on X is

$$Y = 30.87 + 0.31X$$
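
As a cross-check of the arithmetic, the same normal equations can be solved without rounding b first. This Python sketch assumes the first tensile strength value is 75, as in the calculation table and the total ΣY = 572; the variable names are illustrative.

```python
hardness = [146, 152, 158, 164, 170, 176, 182]   # X
strength = [75, 78, 77, 89, 82, 85, 86]          # Y (first value assumed to be 75)

n = len(hardness)
sum_x = sum(hardness)
sum_y = sum(strength)
sum_xy = sum(x * y for x, y in zip(hardness, strength))
sum_x2 = sum(x * x for x in hardness)

# Solve sum(Y) = N*a + b*sum(X) and sum(XY) = a*sum(X) + b*sum(X^2).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(b, 4), round(a, 2))
```

Carrying b at full precision gives b of about 0.3095 and a of about 30.95 rather than 30.87; the difference comes only from rounding b to 0.31 before substituting it into equation (3).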


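## Calculate the regression equations of X on Y and Y on X from the following data

| X: | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Y: | 2 | 5 | 3 | 8 | 7 |

Solution:

| X | Y | $X^2$ | $Y^2$ | XY |
|---|---|---|---|---|
| 1 | 2 | 1 | 4 | 2 |
| 2 | 5 | 4 | 25 | 10 |
| 3 | 3 | 9 | 9 | 9 |
| 4 | 8 | 16 | 64 | 32 |
| 5 | 7 | 25 | 49 | 35 |
| $\sum X = 15$ | $\sum Y = 25$ | $\sum X^2 = 55$ | $\sum Y^2 = 151$ | $\sum XY = 88$ |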

Regression equation of X on Y is given by $X = a + bY$.
The normal equations are

$$\sum X = Na + b\sum Y \quad \text{and} \quad \sum XY = a\sum Y + b\sum Y^2$$

Substituting the values, we get

$$15 = 5a + 25b \quad \ldots (1)$$

$$88 = 25a + 151b \quad \ldots (2)$$

Solving (1) and (2), we get a = 0.5 and b = 0.5.

Hence, the regression equation of X on Y is given by

$$X = 0.5 + 0.5Y$$

The regression equation of Y on X is $Y = a + bX$.
The normal equations are

$$\sum Y = Na + b\sum X$$

$$\sum XY = a\sum X + b\sum X^2$$

Substituting the values, we get

$$25 = 5a + 15b \quad \ldots (iii)$$

$$88 = 15a + 55b \quad \ldots (iv)$$

Solving (iii) and (iv), we get a = 1.10 and b = 1.30.
Hence, the regression equation of Y on X is given by

$$Y = 1.10 + 1.30X$$
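
Both pairs of normal equations in this example are 2×2 linear systems built from the same totals. A small Python sketch, assuming a hypothetical helper `solve_2x2` based on Cramer's rule, reproduces the constants of both regression equations.

```python
def solve_2x2(a11, a12, c1, a21, a22, c2):
    """Solve a11*p + a12*q = c1 and a21*p + a22*q = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    p = (c1 * a22 - a12 * c2) / det
    q = (a11 * c2 - c1 * a21) / det
    return p, q


# Totals taken from the example.
n, sum_x, sum_y, sum_x2, sum_y2, sum_xy = 5, 15, 25, 55, 151, 88

# X on Y: 15 = 5a + 25b and 88 = 25a + 151b
a_xy, b_xy = solve_2x2(n, sum_y, sum_x, sum_y, sum_y2, sum_xy)
# Y on X: 25 = 5a + 15b and 88 = 15a + 55b
a_yx, b_yx = solve_2x2(n, sum_x, sum_y, sum_x, sum_x2, sum_xy)

print(a_xy, b_xy)   # constants of X = a + bY
print(a_yx, b_yx)   # constants of Y = a + bX
```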

## What will be the forms of regression equation of X on Y and regression equation of Y on X on the basis of deviations taken from arithmetic means of X and Y?

If we take the deviations of the X and Y series from their respective means, the regression equation of Y on X will take the form

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

The value of $b_{yx}$ can be easily obtained as follows:

$$b_{yx} = \frac{\sum xy}{\sum x^2}, \quad \text{where } x = X - \bar{X} \text{ and } y = Y - \bar{Y}$$

The two normal equations in terms of x and y will then become

$$\sum y = Na + b\sum x \quad \ldots (1)$$

$$\sum xy = a\sum x + b\sum x^2 \quad \ldots (2)$$

Since $\sum x = \sum y = 0$ (deviations being taken from means), equation (1) reduces to

$$Na = 0 \;\Rightarrow\; a = 0$$

Equation (2) reduces to

$$\sum xy = b\sum x^2 \quad \text{or} \quad b = b_{yx} = \frac{\sum xy}{\sum x^2}$$

After obtaining the value of $b_{yx}$, the regression equation can easily be written in terms of X and Y by substituting $Y - \bar{Y}$ for y and $X - \bar{X}$ for x.

Similarly, the regression equation $X = a + bY$ is reduced to $X - \bar{X} = b_{xy}(Y - \bar{Y})$, and the value of $b_{xy}$ can be similarly obtained as

$$b_{xy} = \frac{\sum xy}{\sum y^2}$$
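
The deviation form can be sketched in a few lines of Python. The function names and the sample numbers below are assumptions made for illustration only; the function returns the two means and both slopes, from which either regression equation follows.

```python
def deviation_fit(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]          # x = X - mean(X)
    dy = [y - y_bar for y in ys]          # y = Y - mean(Y)
    sum_xy = sum(p * q for p, q in zip(dx, dy))
    sum_x2 = sum(p * p for p in dx)
    sum_y2 = sum(q * q for q in dy)
    b_yx = sum_xy / sum_x2                # slope of Y on X
    b_xy = sum_xy / sum_y2                # slope of X on Y
    return x_bar, y_bar, b_yx, b_xy


# Hypothetical data.
x_bar, y_bar, b_yx, b_xy = deviation_fit([2, 4, 6, 8, 10], [3, 7, 5, 11, 14])


def predict_y(x):
    """Y on X:  Y - y_bar = b_yx * (X - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


def predict_x(y):
    """X on Y:  X - x_bar = b_xy * (Y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)


print(b_yx, b_xy, predict_y(x_bar), predict_x(y_bar))
```

Because a = 0 in the deviation form, both lines pass through the point of means, which is why `predict_y(x_bar)` returns the mean of Y and `predict_x(y_bar)` returns the mean of X.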

Example:
The following data give the hardness (X) and tensile strength (Y) of 7 samples of metal in certain units. Find the linear regression equation of Y on X.

| X: | 146 | 152 | 158 | 164 | 170 | 176 | 182 |
|---|---|---|---|---|---|---|---|
| Y: | 75 | 78 | 77 | 89 | 82 | 85 | 86 |

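| Hardness (X) | Strength (Y) | $x = X - \bar{X}$ | $y = Y - \bar{Y}$ | $x^2$ | $y^2$ | $xy$ |
|---|---|---|---|---|---|---|
| 146 | 75 | -18 | -6.7142 | 324 | 45.08 | 120.86 |
| 152 | 78 | -12 | -3.7142 | 144 | 13.80 | 44.57 |
| 158 | 77 | -6 | -4.7142 | 36 | 22.22 | 28.29 |
| 164 | 89 | 0 | 7.2858 | 0 | 53.08 | 0 |
| 170 | 82 | 6 | 0.2858 | 36 | 0.08 | 1.71 |
| 176 | 85 | 12 | 3.2858 | 144 | 10.80 | 39.43 |
| 182 | 86 | 18 | 4.2858 | 324 | 18.37 | 77.14 |
| $\sum X = 1148$ | $\sum Y = 572$ | $\sum x = 0$ | $\sum y = 0$ | $\sum x^2 = 1008$ | $\sum y^2 = 163.43$ | $\sum xy = 312$ |

N = 7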

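$$\bar{X} = \frac{\sum X}{N} = \frac{1148}{7} = 164, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{572}{7} = 81.71$$

$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{312}{1008} = 0.309 \approx 0.31$$

The linear regression equation of Y on X is given as

$$\begin{aligned}
Y - \bar{Y} &= b_{yx}(X - \bar{X}) \\
Y - 81.71 &= 0.31(X - 164) \\
Y - 81.71 &= 0.31X - 50.84 \\
Y &= 0.31X - 50.84 + 81.71 \\
Y &= 0.31X + 30.87
\end{aligned}$$

i.e. $Y = 30.87 + 0.31X$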

## What are regression coefficients?

The quantity b in the regression equations ($Y = a + bX$ and $X = a + bY$) is called the regression coefficient or slope coefficient. Since there are two regression equations, there are two regression coefficients: the regression coefficient of X on Y and the regression coefficient of Y on X.

## What is regression coefficient of X on Y?

The regression coefficient of X on Y is represented by the symbol $b_{xy}$ or $b_1$. It measures the amount of change in X corresponding to a unit change in Y. The regression coefficient of X on Y is given by

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

When deviations are taken from the means of X and Y, the regression coefficient is obtained by

$$b_{xy} = \frac{\sum xy}{\sum y^2}$$

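## What is regression coefficient of X on Y?

When deviations are taken from assumed means, the value of $b_{xy}$ is obtained as follows:

$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$

where $d_x$ and $d_y$ are the deviations of X and Y from their assumed means.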

## What are the assumptions of constructing a regression model?

The value of the dependent variable, Y, is dependent in some degree upon the value of the independent variable, X. The dependent variable is assumed to be a random variable, but the values of X are assumed to be fixed quantities that are selected and controlled by the experimenter. The requirement that the independent variable assumes fixed values, however, is not a critical one. Useful results can still be obtained by regression analysis in the case where both X and Y are random variables.

The average relationship between X and Y can be adequately described by a linear equation $Y = a + bX$ whose geometrical presentation is a straight line.

The height of the line tells the average value of Y at a fixed value of X. When X = 0, the average value of Y is equal to a. The value of a is called the Y-intercept, since it is the point at which the straight line crosses the Y-axis. The slope of the line is measured by b, which gives the average amount of change of Y per unit change in the value of X. The sign of b also indicates the type of relationship between Y and X.

Associated with each value of X there is a sub-population of Y. The distribution of the sub-population may be assumed to be normal or non-specified in the sense that it is unknown. In any event, the distribution of each sub-population Y is conditional on the value of X.

The mean of each sub-population Y is called the expected value of Y for a given X: $E(Y \mid X) = \mu_{yx}$.

Furthermore, under the assumption of a linear relationship between X and Y, all values of $E(Y \mid X)$ or $\mu_{yx}$ must fall on a straight line. That is,

$$E(Y \mid X) = \mu_{yx} = a + bX$$

which is the population regression equation for our bivariate linear model. In this equation a and b are called the population regression coefficients.

An individual value in each sub-population Y may be expressed as:

$$Y = E(Y \mid X) + e$$

Here, e is the deviation of a particular value of Y from $\mu_{yx}$ and is called the error term or the stochastic disturbance term. The errors are assumed to be independent random variables because the Ys are random variables and independent. The expectations of these errors are zero: E(e) = 0. Moreover, if the Ys are normal variables, the errors can also be assumed to be normal.

It is assumed that the variances of all sub-populations, called variances of the regression, are identical.

## What is regression coefficient of Y on X?

The regression coefficient of Y on X is represented by $b_{yx}$ or $b_2$. It measures the amount of change in Y corresponding to a unit change in X. The value of $b_{yx}$ is given by

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

When deviations are taken from the means of X and Y,

$$b_{yx} = \frac{\sum xy}{\sum x^2}$$

When deviations are taken from assumed means,

$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$

## What are the properties of regression coefficients?

The coefficient of correlation is the geometric mean of the two regression coefficients. Symbolically,

$$r = \sqrt{b_{xy} \times b_{yx}}$$

If one of the regression coefficients is greater than unity, the other must be less than unity, as the value of the coefficient of correlation cannot exceed unity.

Example: if $b_{xy} = 1.2$ and $b_{yx} = 1.4$, r would be $\sqrt{1.2 \times 1.4} \approx 1.296$, which is not possible.

## What are the properties of regression coefficients?

Both the regression coefficients will have the same sign, i.e. they will be either both positive or both negative.

The coefficient of correlation will have the same sign as that of the regression coefficients.

Example: if $b_{xy} = -0.2$ and $b_{yx} = -0.8$, then $r = -\sqrt{0.2 \times 0.8} = -0.4$.

## What are the properties of regression coefficients?

The average value of the two regression coefficients would be greater than or equal to the value of the coefficient of correlation. Symbolically,

$$\frac{b_{xy} + b_{yx}}{2} \ge r$$

Example: if $b_{xy} = 0.8$ and $b_{yx} = 0.4$, the average of the two values would be $\frac{0.8 + 0.4}{2} = 0.6$. The value of r would be $\sqrt{0.8 \times 0.4} = 0.566$, which is less than 0.6.
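
These properties can be checked numerically. The following is a minimal Python sketch, assuming hypothetical data with a negative relationship and the deviation formulas $b_{yx} = \sum xy / \sum x^2$ and $b_{xy} = \sum xy / \sum y^2$; for negative coefficients the comparison of the average with r is made on absolute values.

```python
import math

# Hypothetical paired observations with a negative relationship.
xs = [1, 2, 3, 4, 5, 6]
ys = [12, 9, 10, 6, 5, 3]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sum_x2 = sum((x - x_bar) ** 2 for x in xs)
sum_y2 = sum((y - y_bar) ** 2 for y in ys)

b_yx = sum_xy / sum_x2
b_xy = sum_xy / sum_y2
r = sum_xy / math.sqrt(sum_x2 * sum_y2)      # coefficient of correlation

# Geometric mean of the two coefficients, carrying their common sign.
gm = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(r, gm)                                  # equal up to rounding
print(b_yx < 0, b_xy < 0, r < 0)              # all three share one sign
print((abs(b_yx) + abs(b_xy)) / 2 >= abs(r))  # average is not less than |r|
```

In this data set one coefficient exceeds unity in magnitude and the other falls below it, so their product stays at or below 1.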

Regression coefficients are independent of change of origin but not of scale.

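Example:
Prove that the coefficient of correlation is the geometric mean of the regression coefficients.

Proof: Let $b_{xy}$ be the regression coefficient of X on Y and $b_{yx}$ be the regression coefficient of Y on X.

Now, $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$ and $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$.

$$b_{xy} \times b_{yx} = r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x} = r^2$$

or, $r^2 = b_{xy} \times b_{yx}$, so $r = \sqrt{b_{xy} \times b_{yx}}$.

The coefficient of correlation is the geometric mean of the two regression coefficients.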

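## Prove that regression coefficients are independent of change of origin but not of scale.

$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2} \quad \text{or} \quad b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \quad \ldots (i)$$

Let $u = \frac{X - a}{h}$ and $v = \frac{Y - b}{k}$, so that $X = a + hu$ and $Y = b + kv$.

Taking means, $\bar{X} = a + h\bar{u}$ and $\bar{Y} = b + k\bar{v}$. Subtracting, we get

$$X - \bar{X} = h(u - \bar{u}) \quad \text{and} \quad Y - \bar{Y} = k(v - \bar{v})$$

Substituting these values in the above formula, we get

$$b_{yx} = \frac{\sum h(u - \bar{u})\,k(v - \bar{v})}{\sum h^2 (u - \bar{u})^2} = \frac{k}{h} \cdot \frac{\sum (u - \bar{u})(v - \bar{v})}{\sum (u - \bar{u})^2} = \frac{k}{h}\, b_{vu}$$

Similarly, it can be shown that $b_{xy} = \frac{h}{k}\, b_{uv}$.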
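
The result $b_{yx} = \frac{k}{h} b_{vu}$ can be illustrated numerically. In this Python sketch the data are the advertisement and sales figures used elsewhere in these slides, while the constants a, b, h, k and the function name `slope` are hypothetical choices.

```python
def slope(xs, ys):
    """Regression coefficient of the second series on the first."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


X = [60, 62, 65, 70, 73, 75, 71]
Y = [10, 11, 13, 15, 16, 19, 14]

a, h = 65, 5      # u = (X - a) / h
b, k = 14, 2      # v = (Y - b) / k
u = [(x - a) / h for x in X]
v = [(y - b) / k for y in Y]

b_yx = slope(X, Y)
b_vu = slope(u, v)

# A change of origin alone (h = k = 1) leaves the coefficient unchanged.
b_shifted = slope([x - a for x in X], [y - b for y in Y])

print(b_yx, (k / h) * b_vu, b_shifted)
```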

Example:
The following figures relate to advertisement expenditure and corresponding sales:

| Advertisement (in lakhs of Taka) | 60 | 62 | 65 | 70 | 73 | 75 | 71 |
|---|---|---|---|---|---|---|---|
| Sales (in crores of Taka) | 10 | 11 | 13 | 15 | 16 | 19 | 14 |

Estimate
i) the sales for an advertisement expenditure of Tk. 80 lakhs, and
ii) the advertisement expenditure for a sales target of Tk. 25 crores.

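Let the advertisement expenditure be denoted by X and sales by Y.

Calculation of regression equations

| X | $x = X - \bar{X}$ | $x^2$ | Y | $y = Y - \bar{Y}$ | $y^2$ | $xy$ |
|---|---|---|---|---|---|---|
| 60 | -8 | 64 | 10 | -4 | 16 | 32 |
| 62 | -6 | 36 | 11 | -3 | 9 | 18 |
| 65 | -3 | 9 | 13 | -1 | 1 | 3 |
| 70 | 2 | 4 | 15 | 1 | 1 | 2 |
| 73 | 5 | 25 | 16 | 2 | 4 | 10 |
| 75 | 7 | 49 | 19 | 5 | 25 | 35 |
| 71 | 3 | 9 | 14 | 0 | 0 | 0 |
| $\sum X = 476$ | $\sum x = 0$ | $\sum x^2 = 196$ | $\sum Y = 98$ | $\sum y = 0$ | $\sum y^2 = 56$ | $\sum xy = 100$ |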

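Here, N = 7;

$$\bar{X} = \frac{\sum X}{N} = \frac{476}{7} = 68 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{N} = \frac{98}{7} = 14$$

(i) Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \quad \ldots (i)$$

Here,

$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{100}{196} = 0.51$$

Now, from (i), we have

$$\begin{aligned}
Y - 14 &= 0.51(X - 68) \\
Y - 14 &= 0.51X - 34.68 \\
Y &= 0.51X - 34.68 + 14 \\
Y &= 0.51X - 20.68
\end{aligned}$$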

When advertisement expenditure X = 80 lakhs,

$$\text{Sales, } Y = 0.51 \times 80 - 20.68 = 40.8 - 20.68 = 20.12$$

The likely sales for an advertisement expenditure of Tk. 80 lakhs = Tk. 20.12 crores.

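(ii) Regression equation of X on Y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y}) \quad \ldots (ii)$$

Here,

$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{100}{56} = 1.7857 \approx 1.79$$

Putting the values in equation (ii), and taking $1.7857 \times 14 = 24.9998$ for the constant term, we get

$$\begin{aligned}
X - 68 &= 1.79(Y - 14) \\
X - 68 &= 1.79Y - 24.9998 \\
X &= 1.79Y - 24.9998 + 68 \\
X &= 1.79Y + 43
\end{aligned}$$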

When Y = 25 crores,

$$X = 1.79 \times 25 + 43 = 44.75 + 43 = 87.75$$

The likely advertisement expenditure for a sales target of Tk. 25 crores = Tk. 87.75 lakhs.
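
The two estimates in this example can be reproduced with a short Python sketch. It uses the data from the slides but keeps both coefficients unrounded, so the second result differs slightly from the hand calculation; the variable names are illustrative.

```python
advertisement = [60, 62, 65, 70, 73, 75, 71]   # X, lakhs of Taka
sales = [10, 11, 13, 15, 16, 19, 14]           # Y, crores of Taka

n = len(sales)
x_bar = sum(advertisement) / n
y_bar = sum(sales) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertisement, sales))
sum_x2 = sum((x - x_bar) ** 2 for x in advertisement)
sum_y2 = sum((y - y_bar) ** 2 for y in sales)

b_yx = sum_xy / sum_x2     # 100 / 196
b_xy = sum_xy / sum_y2     # 100 / 56

# (i) Y on X, used to estimate sales from advertisement expenditure.
sales_at_80 = y_bar + b_yx * (80 - x_bar)
# (ii) X on Y, used to estimate the expenditure needed for a sales target.
advertisement_for_25 = x_bar + b_xy * (25 - y_bar)

print(round(sales_at_80, 2), round(advertisement_for_25, 2))
```

With unrounded coefficients the sales estimate stays at about 20.12 crores, while the expenditure estimate comes to about 87.64 lakhs; the 87.75 of the hand calculation reflects rounding $b_{xy}$ to 1.79.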

Example:
The following data relate to advertising expenditure (in lakhs of Taka) and their corresponding sales (in crores of Taka):

| Advertising Expenditure | 10 | 12 | 15 | 23 | 20 |
|---|---|---|---|---|---|
| Sales | 14 | 17 | 23 | 25 | 21 |

Estimate
i) the sales corresponding to advertising expenditure of Tk. 30 lakhs, and
ii) the advertising expenditure for a sales target of Tk. 35 crores.

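## Calculation of regression equations

| X | $x = X - \bar{X}$ | $x^2$ | Y | $y = Y - \bar{Y}$ | $y^2$ | $xy$ |
|---|---|---|---|---|---|---|
| 10 | -6 | 36 | 14 | -6 | 36 | +36 |
| 12 | -4 | 16 | 17 | -3 | 9 | +12 |
| 15 | -1 | 1 | 23 | +3 | 9 | -3 |
| 23 | +7 | 49 | 25 | +5 | 25 | +35 |
| 20 | +4 | 16 | 21 | +1 | 1 | +4 |
| $\sum X = 80$ | $\sum x = 0$ | $\sum x^2 = 118$ | $\sum Y = 100$ | $\sum y = 0$ | $\sum y^2 = 80$ | $\sum xy = 84$ |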

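Here,

$$\bar{X} = \frac{\sum X}{N} = \frac{80}{5} = 16, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{100}{5} = 20$$

Regression equation of Y on X is given by

$$Y - \bar{Y} = b_{yx}(X - \bar{X}) \quad \ldots (1)$$

Now,

$$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{84}{118} = 0.712$$

From equation (1), we have

$$\begin{aligned}
Y - 20 &= 0.712(X - 16) \\
Y &= 0.712X - 11.392 + 20 \\
Y &= 0.712X + 8.608
\end{aligned}$$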

When the advertisement expenditure is Tk. 30 lakhs,

$$\text{Sales, } Y = 0.712 \times 30 + 8.608 = 21.36 + 8.608 = 29.968$$

Thus the likely sales corresponding to an advertisement expenditure of Tk. 30 lakhs is Tk. 29.968 crores.

Regression equation of X on Y is given by

$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$

$$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{84}{80} = 1.05$$

$$\begin{aligned}
X - 16 &= 1.05(Y - 20) \\
X - 16 &= 1.05Y - 21 \\
X &= 16 - 21 + 1.05Y \\
X &= -5 + 1.05Y
\end{aligned}$$

When the sales target is Tk. 35 crores, the advertising expenditure,

$$X = -5 + 1.05 \times 35 = -5 + 36.75 = 31.75$$

The advertising expenditure for a sales target of Tk. 35 crores is Tk. 31.75 lakhs.

## What is the standard error of estimate?

The measure of variation of the observations around the
computed regression line is referred to as the standard
error of estimate.
Just as the standard deviation is a measure of the
scatter of observations in a frequency distribution
around the mean of that distribution, the standard error
of estimate is a measure of the scatter of the observed
values of Y around the corresponding computed values
of Y on the regression line. It is computed as a standard
deviation, being also a square root of the mean of the
squared deviation. But the deviations here are not
deviations of the items from the arithmetic mean; they
are rather the vertical distances of every dot from the
line of average relationship.

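## What are the formulae for calculating the standard error of estimate?

$$S_{yx} = \sqrt{\frac{\sum (Y - Y_e)^2}{N - 2}}$$

where $S_{yx}$ = S.E. of estimate of the regression equation of Y on X.

Or

$$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{N - 2}}$$

Similarly, for the regression equation of X on Y,

$$S_{xy} = \sqrt{\frac{\sum (X - X_e)^2}{N - 2}}$$

Or

$$S_{xy} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{N - 2}}$$

In each case a and b are the constants of the corresponding regression equation.

The standard error of estimate can be easily calculated with the help of the following formulae:

i) $S_{xy} = \sigma_x\sqrt{1 - r^2}$
ii) $S_{yx} = \sigma_y\sqrt{1 - r^2}$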
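
The definition of $S_{yx}$ and its computational shortcut give the same value when a and b come from the normal equations. The following Python sketch works under that assumption, using the advertisement and sales figures from the earlier example; the variable names are illustrative.

```python
import math

X = [60, 62, 65, 70, 73, 75, 71]
Y = [10, 11, 13, 15, 16, 19, 14]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Constants of Y = a + bX from the normal equations.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Definition: squared vertical deviations from the line, divided by N - 2.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))
s_yx_direct = math.sqrt(sse / (n - 2))

# Shortcut: sqrt((sum(Y^2) - a*sum(Y) - b*sum(XY)) / (N - 2)).
s_yx_shortcut = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(s_yx_direct, s_yx_shortcut)
```

The shortcut avoids computing each estimated value, but it subtracts large totals from one another, so the direct form is the safer choice when the totals are very large.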

## Significance of standard error

The standard error of estimate measures the accuracy
of the estimated figures. The smaller the value of
standard error of estimate, the closer will be dots to
the regression line and the better the estimates based
on the equation for this line.
If standard error of estimate is zero, then there is no
variation about the line and the correlation will be
perfect.
With the help of standard error of estimate, it is
possible for us to ascertain how good and
representative the regression line is as a description
of the average relationship between two series.

## What is coefficient of determination?

The ratio of the unexplained variation to the total
variation represents the proportion of variation in Y that
is not explained by regression on X. Subtraction of this
proportion from 1.0 gives the proportion of variation in Y
that is explained by regression on X. The statistic used
to express this proportion is called the co-efficient of
determination and is denoted by $R^2$. It may be written as
follows:

$$R^2 = 1 - \frac{\text{Variation in Y remaining after regression on X}}{\text{Total variation in Y}}$$

$$R^2 = 1 - \frac{\text{Error sum of squares}}{\text{Total sum of squares}}$$

The value of $R^2$ is the proportion of the variation in the dependent variable Y explained by regression on the independent variable X.
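
As an illustration, $R^2$ can be computed directly from the error and total sums of squares. The Python sketch below reuses the advertisement and sales data from the earlier example; the function name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """R^2 = 1 - (error sum of squares / total sum of squares) for Y on X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy / sum_x2                   # slope of Y on X
    a = y_bar - b * x_bar                 # intercept of Y on X
    error_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    return 1 - error_ss / total_ss


X = [60, 62, 65, 70, 73, 75, 71]
Y = [10, 11, 13, 15, 16, 19, 14]
print(round(r_squared(X, Y), 4))
```

For a linear regression with one independent variable this value equals $r^2$, and hence the product of the two regression coefficients $b_{yx} \times b_{xy}$.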
