G Statistics Chapter 8

Welcome
S. B. Bhattacharjee
Ch 8_1
Ch 8_2
What does the term regression literally mean?
The term regression literally means stepping back
towards the average.
Ch 8_3
What is Regression?
Regression is a statistical tool to estimate (or predict)
the unknown values of one variable from known values
of another variable.
Example: If we know that advertising and sales are
correlated, we may find out the expected amount of
sales for a given advertising expenditure or the amount
of expenditure for achieving a fixed sales target.
Ch 8_4
What is the utility of studying regression?

Regression analysis is a branch of statistical theory
that is widely used in almost all scientific disciplines.
In economics it is the basic technique for measuring or
estimating the relationship among economic variables
that constitute the essence of economic theory and
economic life.
Example: If two variables, price (X) and demand (Y), are
closely related, one can find the most probable value
of X for a given value of Y or the most probable value of
Y for a given value of X.
Similarly, if the amount of tax and the rise in the price of
a commodity are closely related, it is possible to find
the expected price for a certain amount of tax levied.
Ch 8_5
Distinguish between correlation and regression analysis.
The points of difference between correlation and
regression analysis are:
i) While the correlation coefficient is a measure of the degree
of relationship between X and Y, regression
analysis helps study the nature of the relationship
between the variables.
ii) The cause and effect relation is more clearly indicated
through regression analysis than by correlation.
iii) Correlation is merely a tool for ascertaining the degree
of relationship between two variables and, therefore,
one cannot say that one variable is the cause and the
other the effect, while regression shows the extent of
dependence of one variable on another.
Ch 8_6
What are Regression Lines?

If we take the case of two variables X and Y, we
have two regression lines: the regression line of X on Y
and the regression line of Y on X.
The regression line of Y on X gives the most probable
values of Y for given values of X, and the regression
line of X on Y gives the most probable values of X for
given values of Y.
Ch 8_7
Under what conditions can there be one regression line?
When there is either perfect positive or perfect
negative correlation between the two variables, the
two regression lines coincide, i.e. we have only
one line.
The further the two regression lines are from each
other, the lower the degree of correlation; the
nearer the two regression lines are to each other, the
higher the degree of correlation. If the variables are
independent, r is zero and the lines of regression are
at right angles: the line of Y on X is parallel to the X axis
and the line of X on Y is parallel to the Y axis.
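A minimal Python sketch of this condition, offered as an illustration only (the function name `regression_slopes` and both small data sets are assumptions, not from the slides). Both regression lines pass through the point of means, so they coincide exactly when their slopes agree. Drawn in the same X-Y plane, the line of Y on X has slope b_yx and the line of X on Y has slope 1/b_xy, and these are equal only when b_yx·b_xy = r² = 1.

```python
def regression_slopes(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx = sxy / sxx  # slope of the line of Y on X
    b_xy = sxy / syy  # coefficient of X on Y; its line has slope 1 / b_xy in the X-Y plane
    r = sxy / (sxx * syy) ** 0.5
    return b_yx, b_xy, r

# Perfect correlation: Y = 3 + 2X exactly, so both slopes are 2 and the lines coincide
b_yx, b_xy, r = regression_slopes([1, 2, 3, 4], [5, 7, 9, 11])
print(b_yx, 1 / b_xy, r)

# Imperfect correlation: the two slopes differ, so the lines separate
b_yx, b_xy, r = regression_slopes([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
print(b_yx, 1 / b_xy, round(r, 3))
```

When r = 0 the sketch would give b_yx = 0 (a horizontal line) and b_xy = 0 (a vertical line in the X-Y plane), which is the right-angle case described above; the code does not handle that case because 1/b_xy would divide by zero.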
Ch 8_8
What are regression equations?

Regression equations are algebraic expressions of
regression lines. Since there are two regression lines,
there are two regression equations: the regression
equation of X on Y is used to describe the variations in
the values of X for given changes in Y, and the regression
equation of Y on X is used to describe the variations in
the values of Y for given changes in X.
Ch 8_9
Why is the regression line the line of best fit?

The line of regression is the line which gives the best
estimate of the value of one variable for any specific
value of the other variable. Thus the line of regression is the
line of best fit and is obtained by the principle of
least squares.
Ch 8_10
What is the general form of the regression equation of Y on X?
The general form of the linear regression equation of Y
on X is expressed as follows:
$Y_e = a + bX$, where
$Y_e$ = dependent variable to be estimated
$X$ = independent variable.
In this equation a and b are two unknown constants
(fixed numerical values) which determine the position
of the line completely. The constants are called the
parameters of the line. If either one or both of them
change, a different line is determined.
Ch 8_11
What is the general form of the regression equation of Y on X? (continued)
The parameter a determines the level of the fitted line
(i.e. the distance of the line directly above or below the
origin). The parameter b determines the slope of the
line, i.e. the change in Y for a unit change in X.
To determine the values of a and b, the following two
normal equations are solved simultaneously:
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Ch 8_12
What is the general form of the regression equation of X on Y?
The general form of the regression equation of X on Y
is expressed as follows:
$X = a + bY$
To determine the values of a and b, the following
two normal equations are solved simultaneously:
$\sum X = Na + b\sum Y$
$\sum XY = a\sum Y + b\sum Y^2$
Continued
Ch 8_13
What is the general form of the regression equation of X on Y? (continued)
These equations are usually called the normal
equations. In these equations $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$ and $\sum Y^2$
indicate totals which are computed from the observed pairs of
values of the two variables X and Y to which the least
squares estimating line is to be fitted, and N is the
total number of observed pairs of values.
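As a sketch of how the two normal equations can be solved once the totals are known, the Python function below applies Cramer's rule to the 2 x 2 system. This is a swapped-in solution method (the worked examples in this chapter eliminate a by hand instead), and the name `solve_normal_equations` is hypothetical. The same function serves both lines: for X on Y, pass the totals with the roles of X and Y exchanged.

```python
def solve_normal_equations(n, s_x, s_y, s_xx, s_xy):
    """Solve  s_y = n*a + b*s_x  and  s_xy = a*s_x + b*s_xx  for (a, b).

    For the line of Y on X pass (N, ΣX, ΣY, ΣX², ΣXY); for the line of
    X on Y exchange the roles of the variables: (N, ΣY, ΣX, ΣY², ΣXY).
    """
    det = n * s_xx - s_x ** 2  # determinant of the 2 x 2 system
    if det == 0:
        raise ValueError("the independent variable does not vary")
    b = (n * s_xy - s_x * s_y) / det     # Cramer's rule for the slope
    a = (s_y * s_xx - s_x * s_xy) / det  # Cramer's rule for the intercept
    return a, b

# Totals of the hardness / tensile-strength example (line of Y on X)
print(solve_normal_equations(7, 1148, 572, 189280, 94120))
# Totals of the five-pair example: N = 5, ΣX = 15, ΣY = 25, ΣX² = 55, ΣY² = 151, ΣXY = 88
print(solve_normal_equations(5, 15, 25, 55, 88))   # line of Y on X
print(solve_normal_equations(5, 25, 15, 151, 88))  # line of X on Y
```

For the hardness data this gives b ≈ 0.3095 and a ≈ 30.95 without intermediate rounding; the worked example rounds b to 0.31 before back-substituting, which is why it reports a = 30.87. The five-pair totals should reproduce a = 1.1, b = 1.3 (Y on X) and a = 0.5, b = 0.5 (X on Y).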
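The geometrical presentation of the linear equation $Y = a + bX$ is shown in the diagram below:
Ch 8_14
[Figure: the straight line Y = a + bX drawn on X and Y axes, with the intercept a marked on the Y axis and the slope b shown as the rise in Y for 1 unit in X.]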
Ch 8_15
It is clear from this diagram that the height of the line
gives the average value of Y at a fixed value of X.
When X = 0, the average value of Y is equal to a. The
value of a is called the Y-intercept since it is the
point at which the straight line crosses the Y-axis.
The slope of the line is measured by b, which gives
the average amount of change of Y per unit change
in the value of X. The sign of b also indicates the
type of relationship between Y and X: a positive b means Y
tends to rise with X, and a negative b means Y tends to fall.
Ch 8_16
How are the values of a and b obtained to determine a regression line completely?

The values of a and b are obtained by the method
of least squares, which states that the line should be
drawn through the plotted points in such a manner
that the sum of the squares of the vertical deviations
of the actual Y values from the estimated Y values is
the least; in other words, to obtain the line
which fits the points best, $\sum (Y - Y_e)^2$ should be a minimum.
Such a line is known as the line of best fit.
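The least squares criterion can be checked numerically. In this Python sketch (the helper names `sse` and `least_squares_line` are hypothetical), the line is fitted with the closed-form solution of the normal equations, b = Σxy/Σx² and a = Ȳ − bX̄, to the hardness and tensile-strength data used later in this chapter; nudging either parameter away from the fitted values should only increase Σ(Y − Ye)².

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical deviations Σ(Y - Ye)² for the line Ye = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Intercept and slope that solve the normal equations (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

xs = [146, 152, 158, 164, 170, 176, 182]
ys = [75, 78, 77, 89, 82, 85, 86]
a, b = least_squares_line(xs, ys)
best = sse(a, b, xs, ys)

# Moving either parameter away from the least-squares values raises the SSE
for da, db in [(0.5, 0), (-0.5, 0), (0, 0.01), (0, -0.01)]:
    assert sse(a + da, b + db, xs, ys) > best
print(round(best, 2))
```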
Ch 8_17
How can the normal equations be arrived at?
Continued
Ch 8_18
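Let $S = \sum (Y - Y_e)^2 = \sum (Y - a - bX)^2$, since $Y_e = a + bX$.
Differentiating partially with respect to a and b and setting each derivative equal to zero:
$\frac{\partial S}{\partial a} = -2\sum (Y - a - bX) = 0$, or $\sum (Y - a - bX) = 0$ ... (1)
$\frac{\partial S}{\partial b} = -2\sum X(Y - a - bX) = 0$, or $\sum X(Y - a - bX) = 0$ ... (2)
From (1), we have
$\sum Y = Na + b\sum X$
From (2), we have
$\sum XY = a\sum X + b\sum X^2$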
Continued
Ch 8_19
Example:
The following data give the hardness (X) and tensile
strength (Y) of 7 samples of metal in certain units. Find
the linear regression equation of Y on X.
X: 146 152 158 164 170 176 182
Y: 75 78 77 89 82 85 86
Solution:
The regression equation of Y on X is given by
$Y = a + bX$
The normal equations are:
$\sum Y = Na + b\sum X$ ... (1)
$\sum XY = a\sum X + b\sum X^2$ ... (2)
Continued
Ch 8_20
Calculation of regression equations

X      Y     X²       Y²      XY
146    75    21316    5625    10950
152    78    23104    6084    11856
158    77    24964    5929    12166
164    89    26896    7921    14596
170    82    28900    6724    13940
176    85    30976    7225    14960
182    86    33124    7396    15652
ΣX = 1148   ΣY = 572   ΣX² = 189280   ΣY² = 46904   ΣXY = 94120
Continued..
Ch 8_21
Here, N = 7.
Substituting the values in equations (1) and (2), we
get
572 = 7a + 1148b ... (3)
94120 = 1148a + 189280b ... (4)
Multiplying equation (3) by 164, we get
93808 = 1148a + 188272b ... (5)
Subtracting (5) from (4), we get
312 = 1008b, so b = 0.3095 ≈ 0.31
Putting this value of b in equation (3), we have
Continued..
Ch 8_22
$572 = 7a + 1148 \times 0.31$
$572 = 7a + 355.88$
$7a = 572 - 355.88$
$7a = 216.12$
$a = \frac{216.12}{7} = 30.87$
The linear regression equation of Y on X is
Y = 30.87 + 0.31X
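The hand calculation above can be replayed in Python as a check, assuming only the seven data pairs given in the example. The sketch builds the column totals of the calculation table, eliminates a the same way the slide does (multiplying equation (3) by ΣX/N = 164 and subtracting it from (4)), and then shows how rounding b to two decimals before back-substitution shifts a.

```python
xs = [146, 152, 158, 164, 170, 176, 182]  # hardness X
ys = [75, 78, 77, 89, 82, 85, 86]         # tensile strength Y
n = len(xs)

# Column totals of the calculation table
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Eliminate a: multiply (3) by sum_x / n (= 164) and subtract from (4)
k = sum_x / n
b = (sum_xy - k * sum_y) / (sum_x2 - k * sum_x)
a = (sum_y - b * sum_x) / n

# The worked example rounds b to two decimals before back-substituting
b_rounded = round(b, 2)
a_from_rounded = (sum_y - b_rounded * sum_x) / n

print(sum_x, sum_y, sum_x2, sum_xy)          # table totals
print(round(a, 2), round(b, 4))              # unrounded solution
print(round(a_from_rounded, 2), b_rounded)   # the slide's Y = 30.87 + 0.31X
```

Without intermediate rounding the line is Y ≈ 30.95 + 0.3095X; the difference from 30.87 comes entirely from using b = 0.31.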
Ch 8_23
Calculate the regression equations of X on Y and Y on X from the following data:
X: 1 2 3 4 5
Y: 2 5 3 8 7
Solution:
X   Y   X²   Y²   XY
1   2   1    4    2
2   5   4    25   10
3   3   9    9    9
4   8   16   64   32
5   7   25   49   35
ΣX = 15   ΣY = 25   ΣX² = 55   ΣY² = 151   ΣXY = 88
Continued..
Ch 8_24
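The regression equation of X on Y is given by X = a + bY.

The normal equations are $\sum X = Na + b\sum Y$
and $\sum XY = a\sum Y + b\sum Y^2$.
Substituting the values, we get
15 = 5a + 25b ... (1)
88 = 25a + 151b ... (2)
Multiplying (1) by 5 and subtracting it from (2) gives 13 = 26b.
Solving (1) and (2), we get a = 0.5 and b = 0.5.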
Ch 8_25
Hence, the regression equation of X on Y is given by

X = 0.5 + 0.5Y
The regression equation of Y on X is: Y = a + bX
The normal equations are
$\sum Y = Na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Substituting the values, we get
25 = 5a + 15b ... (iii)
88 = 15a + 55b ... (iv)
Multiplying (iii) by 3 and subtracting it from (iv) gives 13 = 10b.
Solving (iii) and (iv), we get
a = 1.10 and b = 1.30
Hence, the regression equation of Y on X is given by
Y = 1.10 + 1.30X
Continued..
Ch 8_26
What will be the forms of the regression equations of X on Y
and of Y on X when deviations are taken from the arithmetic means of X and Y?
If we take the deviations of the X and Y series from their
respective means, the regression equation of Y on X
takes the form
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
where the value of $b_{yx}$ can easily be obtained as
$b_{yx} = \frac{\sum xy}{\sum x^2}$, with $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
To see why, fit the line y = a + bx to the deviations. The two normal equations in terms of x and y
then become
$\sum y = Na + b\sum x$ ... (1)
$\sum xy = a\sum x + b\sum x^2$ ... (2)
Ch 8_27
Since $\sum x = \sum y = 0$ (deviations being taken from the means),
equation (1) reduces to
$Na = 0$, i.e. $a = 0$.
Equation (2) reduces to
$\sum xy = b\sum x^2$, i.e. $b_{yx} = \frac{\sum xy}{\sum x^2}$
Ch 8_28
After obtaining the value of $b_{yx}$, the regression
equation can easily be written in terms of X and Y by
substituting $Y - \bar{Y}$ for y and $X - \bar{X}$ for x.
Similarly, the regression equation X = a + bY reduces
to $X - \bar{X} = b_{xy}(Y - \bar{Y})$, and the value of $b_{xy}$ can be
similarly obtained as
$b_{xy} = \frac{\sum xy}{\sum y^2}$
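The deviation formulas can be sketched in Python as follows (the function name `deviation_regression` is hypothetical). It takes deviations about the actual means, forms b_yx = Σxy/Σx² and b_xy = Σxy/Σy², and rewrites each line in the form "constant + coefficient × variable". Applied to the hardness data, b_yx should agree with the slope from the normal equations, 312/1008.

```python
def deviation_regression(xs, ys):
    """Regression coefficients from deviations about the actual means.

    Returns (mean_x, mean_y, b_yx, b_xy) where b_yx = Σxy / Σx² and
    b_xy = Σxy / Σy², with x = X - mean_x and y = Y - mean_y.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(u * v for u, v in zip(dx, dy))
    b_yx = sum_xy / sum(u * u for u in dx)
    b_xy = sum_xy / sum(v * v for v in dy)
    return mean_x, mean_y, b_yx, b_xy

mean_x, mean_y, b_yx, b_xy = deviation_regression(
    [146, 152, 158, 164, 170, 176, 182], [75, 78, 77, 89, 82, 85, 86])

# Y - Ȳ = b_yx (X - X̄)  rewritten as  Y = (Ȳ - b_yx X̄) + b_yx X
print(f"Y = {mean_y - b_yx * mean_x:.2f} + {b_yx:.4f}X")
# X - X̄ = b_xy (Y - Ȳ)  rewritten as  X = (X̄ - b_xy Ȳ) + b_xy Y
print(f"X = {mean_x - b_xy * mean_y:.2f} + {b_xy:.4f}Y")
```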
Ch 8_29
Example:
The following data give the hardness (X) and tensile
strength (Y) of 7 samples of metal in certain units. Find
the linear regression equation of Y on X.
X: 146 152 158 164 170 176 182
Y: 75 78 77 89 82 85 86
Continued..
Ch 8_30
Hardness (X)  Strength (Y)  x = X - X̄  y = Y - Ȳ   x²    y²      xy
146           75            -18         -6.7143     324   45.08   120.86
152           78            -12         -3.7143     144   13.80   44.57
158           77            -6          -4.7143     36    22.22   28.29
164           89            0           7.2857      0     53.08   0.00
170           82            6           0.2857      36    0.08    1.71
176           85            12          3.2857      144   10.80   39.43
182           86            18          4.2857      324   18.37   77.14
N = 7   ΣX = 1148   ΣY = 572   Σx = 0   Σy = 0   Σx² = 1008   Σy² = 163.43   Σxy = 312
Continued..
Ch 8_31
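$\bar{X} = \frac{\sum X}{N} = \frac{1148}{7} = 164$
$\bar{Y} = \frac{\sum Y}{N} = \frac{572}{7} = 81.71$
$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{312}{1008} = 0.309 \approx 0.31$
The linear regression equation of Y on X is given as
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
$Y - 81.71 = 0.31(X - 164)$
$Y - 81.71 = 0.31X - 50.84$
$Y = 0.31X - 50.84 + 81.71$
$Y = 0.31X + 30.87$
i.e. $Y = 30.87 + 0.31X$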
Ch 8_32
What are regression coefficients?

The quantity b in the regression equations (Y = a + bX
and X = a + bY) is called the regression coefficient
or slope coefficient. Since there are two regression
equations, there are two regression coefficients: the
regression coefficient of X on Y and the
regression coefficient of Y on X.
Ch 8_33
What is the regression coefficient of X on Y?

The regression coefficient of X on Y is represented by the
symbol $b_{xy}$ or $b_1$. It measures the amount of change in X
corresponding to a unit change in Y. The regression
coefficient of X on Y is given by
$b_{xy} = r\frac{\sigma_x}{\sigma_y}$
When deviations are taken from the means of X and Y, the
regression coefficient is obtained by
$b_{xy} = \frac{\sum xy}{\sum y^2}$
Continued..
Ch 8_34
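What is the regression coefficient of X on Y? (continued)

When deviations are taken from assumed means, the
value of $b_{xy}$ is obtained as follows:
$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - (\sum d_y)^2}$
where $d_x$ and $d_y$ are the deviations of X and Y from their assumed means.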
Ch 8_35
What are the assumptions of constructing a regression model?
The value of the dependent variable, Y, is dependent in
some degree upon the value of the independent variable,
X. The dependent variable is assumed to be a random
variable, but the values of X are assumed to be fixed
quantities that are selected and controlled by the
experimenter. The requirement that the independent
variable assume fixed values, however, is not a critical
one. Useful results can still be obtained by regression
analysis in the case where both X and Y are random
variables.
The average relationship between X and Y can be
adequately described by a linear equation Y = a + bX whose
geometrical presentation is a straight line.
Ch 8_36
The height of the line gives the average value of Y at a
fixed value of X. When X = 0, the average value of Y is
equal to a. The value of a is called the Y-intercept,
since it is the point at which the straight line crosses
the Y-axis. The slope of the line is measured by b,
which gives the average amount of change of Y per
unit change in the value of X. The sign of b also
indicates the type of relationship between Y and X.
Associated with each value of X there is a
sub-population of Y. The distribution of the
sub-population may be assumed to be normal or
unspecified in the sense that it is unknown. In any
event, the distribution of each sub-population of Y is
conditional on the value of X.
Ch 8_37
The mean of each sub-population of Y is called the
expected value of Y for a given X: $E(Y \mid X) = \mu_{y \cdot x}$.
Furthermore, under the assumption of a linear
relationship between X and Y, all values of $E(Y \mid X)$,
or $\mu_{y \cdot x}$, must fall on a straight line. That is,
$E(Y \mid X) = \mu_{y \cdot x} = a + bX$,
which is the population regression equation for our
bivariate linear model. In this equation a and b are
called the population regression coefficients.
An individual value in each sub-population of Y may be
expressed as:
$Y = E(Y \mid X) + e$
Ch 8_38
where e is the deviation of a particular value of Y
from $\mu_{y \cdot x}$ and is called the error term or the stochastic
disturbance term. The errors are assumed to be
independent random variables because the Ys are
independent random variables. The expectation
of these errors is zero: E(e) = 0. Moreover, if the Ys are
normal variables, the errors can also be assumed to be
normal.
It is assumed that the variances of all
sub-populations, called the variances of the regression, are
identical.
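To make the population model concrete, the Python sketch below simulates it under stated assumptions: the X values are fixed in advance, each Y is drawn as a + bX + e with independent normal errors of mean zero and a common variance, and the line is then fitted by least squares. The parameter values (a = 2.0, b = 0.5, σ = 1) and the helper names are hypothetical, chosen only for illustration; with many observations the estimates should land close to the population values.

```python
import random

def simulate_sample(a, b, sigma, xs, seed=1):
    """One Y per fixed X from Y = a + bX + e, with independent e ~ N(0, sigma²)."""
    rng = random.Random(seed)
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]

def least_squares(xs, ys):
    """Least-squares intercept and slope from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Hypothetical population: a = 2.0, b = 0.5, common error variance sigma² = 1
xs = [i / 100 for i in range(1000)]  # fixed X values 0.00, 0.01, ..., 9.99
ys = simulate_sample(2.0, 0.5, 1.0, xs)
a_hat, b_hat = least_squares(xs, ys)
print(round(a_hat, 3), round(b_hat, 3))  # should lie close to 2.0 and 0.5
```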
Ch 8_39
What is the regression coefficient of Y on X?

The regression coefficient of Y on X is represented by
$b_{yx}$ or $b_2$. It measures the amount of change in Y
corresponding to a unit change in X. The value of $b_{yx}$ is
given by
$b_{yx} = r\frac{\sigma_y}{\sigma_x}$
When deviations are taken from the means of X and Y,
$b_{yx} = \frac{\sum xy}{\sum x^2}$
Continued..
Ch 8_40
When deviations are taken from assumed means,
$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - (\sum d_x)^2}$
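A short Python sketch of the assumed-mean shortcut (the function names `b_yx_assumed` and `b_xy_assumed` are hypothetical). Because the formula corrects for the totals Σdx and Σdy, any convenient assumed means should give the same coefficients as the actual means; with the hardness data and assumed means 160 and 80, b_yx should come out as 312/1008 and b_xy as 2184/1144.

```python
def b_yx_assumed(xs, ys, ax, ay):
    """b_yx = (NΣdxdy - ΣdxΣdy) / (NΣdx² - (Σdx)²), with dx = X - ax, dy = Y - ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    numerator = n * sum(u * v for u, v in zip(dx, dy)) - sum(dx) * sum(dy)
    return numerator / (n * sum(u * u for u in dx) - sum(dx) ** 2)

def b_xy_assumed(xs, ys, ax, ay):
    """b_xy: the same formula with X and Y exchanged, so NΣdy² - (Σdy)² is the denominator."""
    return b_yx_assumed(ys, xs, ay, ax)

xs = [146, 152, 158, 164, 170, 176, 182]
ys = [75, 78, 77, 89, 82, 85, 86]
# Different assumed means give the same coefficient
print(b_yx_assumed(xs, ys, 160, 80), b_yx_assumed(xs, ys, 0, 0))
print(b_xy_assumed(xs, ys, 160, 80))
```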
Ch 8_41
What are the properties of regression coefficients?

The coefficient of correlation is the geometric
mean of the two regression coefficients.
Symbolically,
$r = \pm\sqrt{b_{xy} \, b_{yx}}$
If one of the regression coefficients is numerically greater than
unity, the other must be numerically less than unity, as the value
of the coefficient of correlation cannot exceed unity.
Example: if $b_{xy} = 1.2$ and $b_{yx} = 1.4$,
r would be $\sqrt{1.2 \times 1.4} = \sqrt{1.68} \approx 1.296$, which is not possible.
Continued
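Ch 8_42
What are the properties of regression coefficients? (continued)
Both the regression coefficients will have the same
sign, i.e. they will be either both positive or both negative.
The coefficient of correlation will have the same
sign as that of the regression coefficients.
Example:
If $b_{xy} = 0.2$ and $b_{yx} = 0.8$,
$r = \sqrt{0.2 \times 0.8} = \sqrt{0.16} = 0.4$
(if both coefficients were negative, r would be -0.4).
Continued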
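Ch 8_43
What are the properties of regression coefficients? (continued)
The average value of the two regression coefficients
is at least as large as the coefficient of correlation
(numerically). Symbolically, $\left|\frac{b_{xy} + b_{yx}}{2}\right| \geq |r|$,
with equality only when $b_{xy} = b_{yx}$.
Example:
If $b_{xy} = 0.8$ and $b_{yx} = 0.4$, the average of
the two values would be $\frac{0.8 + 0.4}{2} = 0.6$.
The value of r would be $\sqrt{0.8 \times 0.4} = \sqrt{0.32} \approx 0.566$, which is less than 0.6.
Continued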
Ch 8_44
Regression coefficients are independent of change of origin but not of scale.
Ch 8_45
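Example:
Prove that the coefficient of correlation is the
geometric mean of the regression coefficients.
Proof: Let $b_{xy}$ be the regression coefficient of X on Y and $b_{yx}$ the
regression coefficient of Y on X.
Now, $b_{xy} = r\frac{\sigma_x}{\sigma_y}$ and $b_{yx} = r\frac{\sigma_y}{\sigma_x}$
$\therefore b_{xy} \, b_{yx} = r\frac{\sigma_x}{\sigma_y} \cdot r\frac{\sigma_y}{\sigma_x} = r^2$
or, $r^2 = b_{xy} \, b_{yx}$, i.e. $r = \pm\sqrt{b_{xy} \, b_{yx}}$
The coefficient of correlation is the geometric mean
of the two regression coefficients.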
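The result can be checked numerically with a small Python sketch (the function name `r_from_coefficients` is hypothetical). It returns the geometric mean of the two coefficients with their common sign, rejects pairs with opposite signs or with a product above 1 (such as 1.2 and 1.4 from the properties above), and compares the result with r computed directly from the hardness data.

```python
import math

def r_from_coefficients(b_xy, b_yx):
    """r as the signed geometric mean of the two regression coefficients."""
    product = b_xy * b_yx  # equals r² when both coefficients come from the same data
    if product < 0:
        raise ValueError("the two regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("b_xy * b_yx = r² cannot exceed 1")
    return math.copysign(math.sqrt(product), b_xy)  # r takes their common sign

# Compare with r computed directly from the hardness / tensile-strength data
xs = [146, 152, 158, 164, 170, 176, 182]
ys = [75, 78, 77, 89, 82, 85, 86]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

r_direct = sxy / math.sqrt(sxx * syy)
r_via_b = r_from_coefficients(sxy / syy, sxy / sxx)
print(round(r_direct, 4), round(r_via_b, 4))
```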
Ch 8_46
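Prove that regression coefficients are independent of change of origin but not of scale.

Proof:
$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - (\sum X)^2}$
or
$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$ ... (i)
Let $u = \frac{X - a}{h}$ and $v = \frac{Y - b}{k}$, where a and b are the new origins and h and k are the scale factors
(here a and b are not the regression constants), so that
$X = a + hu$ and $Y = b + kv$.
Then $\bar{X} = a + h\bar{u}$ and $\bar{Y} = b + k\bar{v}$.
Subtracting, we get
$X - \bar{X} = h(u - \bar{u})$ and $Y - \bar{Y} = k(v - \bar{v})$
Substituting these values in (i), we get
$b_{yx} = \frac{\sum h(u - \bar{u}) \, k(v - \bar{v})}{\sum h^2 (u - \bar{u})^2} = \frac{k}{h} \cdot \frac{\sum (u - \bar{u})(v - \bar{v})}{\sum (u - \bar{u})^2} = \frac{k}{h} b_{vu}$
Similarly, it can be shown that $b_{xy} = \frac{h}{k} b_{uv}$.
The origins a and b drop out of the result, but the scale factors h and k do not.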
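A numerical illustration in Python, assuming an arbitrary change of origin and scale (a = 164, b = 80, h = 6, k = 2 are hypothetical values, and a, b here are the new origins, not regression constants). Shifting the origin alone should leave b_yx unchanged, while rescaling changes it by the factor h/k, so that (k/h)·b_vu recovers the original coefficient.

```python
def b_yx(xs, ys):
    """Regression coefficient of Y on X from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)

xs = [146, 152, 158, 164, 170, 176, 182]
ys = [75, 78, 77, 89, 82, 85, 86]

# Hypothetical change of origin (a, b) and scale (h, k): u = (X - a)/h, v = (Y - b)/k
a, b, h, k = 164, 80, 6, 2
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

print(b_yx(xs, ys))                                    # original coefficient
print(b_yx([x - a for x in xs], [y - b for y in ys]))  # origin shifted only: unchanged
print(b_yx(us, vs))                                    # b_vu: altered by the change of scale
print((k / h) * b_yx(us, vs))                          # (k/h) b_vu recovers b_yx
```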
Ch 8_47
Example:
The following figures relate to advertisement
expenditure and corresponding sales:
Advertisement (in lakhs of Taka): 60 62 65 70 73 75 71
Sales (in crores of Taka): 10 11 13 15 16 19 14
Estimate
i) the sales for an advertisement expenditure of Tk. 80
lakhs and
ii) the advertisement expenditure for a sales target of
Tk. 25 crores.
Ch 8_48
Let the advertisement expenditure be denoted by X and
sales by Y.
Calculation of regression equations
X    x = X - X̄   x²    Y    y = Y - Ȳ   y²    xy
60   -8          64    10   -4          16    32
62   -6          36    11   -3          9     18
65   -3          9     13   -1          1     3
70   2           4     15   1           1     2
73   5           25    16   2           4     10
75   7           49    19   5           25    35
71   3           9     14   0           0     0
ΣX = 476   Σx = 0   Σx² = 196   ΣY = 98   Σy = 0   Σy² = 56   Σxy = 100
Ch 8_49
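Here,
N = 7; $\bar{X} = \frac{\sum X}{N} = \frac{476}{7} = 68$ and $\bar{Y} = \frac{\sum Y}{N} = \frac{98}{7} = 14$
i) Regression equation of Y on X:
$Y - \bar{Y} = b_{yx}(X - \bar{X})$ ... (i)
Here,
$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{100}{196} = 0.51$
Now, from (i), we have
$Y - 14 = 0.51(X - 68)$
$Y - 14 = 0.51X - 34.68$
$Y = 0.51X - 34.68 + 14$
$Y = 0.51X - 20.68$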
Ch 8_50
When the advertisement expenditure X = 80 lakhs,
Sales, $Y = 0.51 \times 80 - 20.68$
= 40.8 - 20.68
= 20.12
The likely sales for an advertisement expenditure of Tk.
80 lakhs = Tk. 20.12 crores.
Ch 8_51
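ii) Regression equation of X on Y:
$X - \bar{X} = b_{xy}(Y - \bar{Y})$ ... (ii)
Here,
$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{100}{56} = 1.7857 \approx 1.79$
Putting the values in equation (ii), we get
$X - 68 = 1.7857(Y - 14)$
$X - 68 = 1.7857Y - 24.9998$
$X = 1.7857Y - 24.9998 + 68$
$X \approx 1.79Y + 43$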
Ch 8_52
When the sales target
Y = 25 crores,
advertisement expenditure, $X = 1.79 \times 25 + 43$
= 44.75 + 43
= 87.75
The likely advertisement expenditure for a sales
target of Tk. 25 crores = Tk. 87.75 lakhs.
Ch 8_53
Example:
The following data relate to advertising expenditure
(in lakhs of Taka) and the corresponding sales (in
crores of Taka):
Advertising expenditure: 10 12 15 23 20
Sales: 14 17 23 25 21
Estimate i) the sales corresponding to an advertising
expenditure of Tk. 30 lakhs and
ii) the advertising expenditure for a sales
target of Tk. 35 crores.
Ch 8_54
Calculation of regression equations
X    x = X - X̄   x²    Y    y = Y - Ȳ   y²    xy
10   -6          36    14   -6          36    +36
12   -4          16    17   -3          9     +12
15   -1          1     23   +3          9     -3
23   +7          49    25   +5          25    +35
20   +4          16    21   +1          1     +4
ΣX = 80   Σx = 0   Σx² = 118   ΣY = 100   Σy = 0   Σy² = 80   Σxy = 84
Ch 8_55
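Here,
$\bar{X} = \frac{\sum X}{N} = \frac{80}{5} = 16$, $\bar{Y} = \frac{\sum Y}{N} = \frac{100}{5} = 20$
(1) Regression equation of Y on X: $Y - \bar{Y} = b_{yx}(X - \bar{X})$ ... (i)
Now,
$b_{yx} = \frac{\sum xy}{\sum x^2} = \frac{84}{118} = 0.712$
From equation (i), we have
$Y - 20 = 0.712(X - 16)$
$Y = 0.712X - 11.392 + 20$
$Y = 0.712X + 8.608$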
Ch 8_56
When the advertisement expenditure is Tk. 30 lakhs,
Sales, $Y = 0.712 \times 30 + 8.608 = 21.36 + 8.608 = 29.968$
Thus the likely sales corresponding to an advertisement
expenditure of Tk. 30 lakhs are Tk. 29.968 crores.
(2) The regression equation of X on Y is given by
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
$b_{xy} = \frac{\sum xy}{\sum y^2} = \frac{84}{80} = 1.05$
Ch 8_57
$X - 16 = 1.05(Y - 20)$
$X - 16 = 1.05Y - 21$
$X = 16 - 21 + 1.05Y$
$X = -5 + 1.05Y$
When the sales target is Tk. 35 crores, the
advertising expenditure, $X = -5 + 1.05 \times 35$
= -5 + 36.75 = 31.75
The advertising expenditure for a sales target of
Tk. 35 crores is Tk. 31.75 lakhs.
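Both advertising examples follow the same recipe, sketched below in Python with hypothetical helper names: estimate Y from the line of Y on X, and estimate X from the separate line of X on Y rather than by inverting the Y-on-X line. Coefficients are kept unrounded here, so the results differ slightly from the slides (about Tk. 87.64 lakhs instead of 87.75 in the first example, and about Tk. 29.97 crores instead of 29.968 in the second), because the slides round b_xy to 1.79 and b_yx to 0.712 before multiplying.

```python
def fit_both_lines(xs, ys):
    """Means and both regression coefficients by the deviation method."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / sxx, sxy / syy

def estimate_y(xs, ys, x0):
    """Estimate Y for a given X from the regression line of Y on X."""
    mx, my, b_yx, _ = fit_both_lines(xs, ys)
    return my + b_yx * (x0 - mx)

def estimate_x(xs, ys, y0):
    """Estimate X for a given Y from the regression line of X on Y."""
    mx, my, _, b_xy = fit_both_lines(xs, ys)
    return mx + b_xy * (y0 - my)

# First example: advertisement (X, lakhs of Taka) and sales (Y, crores of Taka)
adv1, sales1 = [60, 62, 65, 70, 73, 75, 71], [10, 11, 13, 15, 16, 19, 14]
print(round(estimate_y(adv1, sales1, 80), 2), round(estimate_x(adv1, sales1, 25), 2))

# Second example
adv2, sales2 = [10, 12, 15, 23, 20], [14, 17, 23, 25, 21]
print(round(estimate_y(adv2, sales2, 30), 2), round(estimate_x(adv2, sales2, 35), 2))
```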
Ch 8_58
What is the standard error of estimate?

The measure of variation of the observations around the
computed regression line is referred to as the standard
error of estimate.
Just as the standard deviation is a measure of the
scatter of observations in a frequency distribution
around the mean of that distribution, the standard error
of estimate is a measure of the scatter of the observed
values of Y around the corresponding computed values
of Y on the regression line. It is computed as a standard
deviation, being also the square root of the mean of the
squared deviations. But the deviations here are not
deviations of the items from the arithmetic mean; they
are rather the vertical distances of every dot from the
line of average relationship.
Ch 8_59
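What are the formulae for calculating the standard error of estimate?

$S_{yx} = \sqrt{\frac{\sum (Y - Y_e)^2}{N - 2}}$
or
$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{N - 2}}$
where $S_{yx}$ = S.E. of estimate of the regression equation of Y on X.
Ch 8_60
$S_{xy} = \sqrt{\frac{\sum (X - X_e)^2}{N - 2}}$
or
$S_{xy} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{N - 2}}$
where $S_{xy}$ = S.E. of estimate of the regression equation of X on Y, and a and b are the constants of that equation.
The standard error of estimate can also be calculated easily with the help of the following formulae:
i) $S_{xy} = \sigma_x\sqrt{1 - r^2}$
ii) $S_{yx} = \sigma_y\sqrt{1 - r^2}$
(These shortcut forms divide the residual sum of squares by N rather than N - 2, so for small samples they give slightly smaller values.)
Continued.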
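The three routes to the standard error of estimate can be compared in a Python sketch on the hardness data (an illustration only; the variable names are hypothetical). The residual definition and the computational form ΣY² − aΣY − bΣXY should agree, while the shortcut σ_y√(1 − r²) corresponds to dividing the same residual sum of squares by N instead of N − 2.

```python
import math

xs = [146, 152, 158, 164, 170, 176, 182]
ys = [75, 78, 77, 89, 82, 85, 86]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
b = sxy / sxx
a = my - b * mx  # fitted line of Y on X: Ye = a + bX

# Definition: squared residuals about the line, divided by N - 2
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_yx = math.sqrt(residual_ss / (n - 2))

# Computational form: (ΣY² - aΣY - bΣXY) / (N - 2)
alt = math.sqrt((sum(y * y for y in ys) - a * sum(ys)
                 - b * sum(x * y for x, y in zip(xs, ys))) / (n - 2))

# Shortcut sigma_y * sqrt(1 - r²), with sigma_y using N in its denominator
r = sxy / math.sqrt(sxx * syy)
shortcut = math.sqrt(syy / n) * math.sqrt(1 - r ** 2)

print(round(s_yx, 3), round(alt, 3), round(shortcut, 3))
```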
Ch 8_61
Significance of standard error

The standard error of estimate measures the accuracy
of the estimated figures. The smaller the value of the
standard error of estimate, the closer the dots will be to
the regression line and the better the estimates based
on the equation for this line.
If the standard error of estimate is zero, then there is no
variation about the line and the correlation will be
perfect.
With the help of the standard error of estimate, it is
possible to ascertain how good and
representative the regression line is as a description
of the average relationship between two series.
Ch 8_62
What is coefficient of determination?

The ratio of the unexplained variation to the total
variation represents the proportion of variation in Y that
is not explained by regression on X. Subtraction of this
proportion from 1.0 gives the proportion of variation in Y
that is explained by regression on X. The statistic used
to express this proportion is called the co-efficient of
determination and is denoted by $R^2$. It may be written as
follows:
$R^2 = 1 - \frac{\text{Variation in Y remaining after regression on X}}{\text{Total variation in Y}}$
or
$R^2 = 1 - \frac{\text{Error sum of squares}}{\text{Total sum of squares}}$
Continued.
Ch 8_63
The value of $R^2$ is the proportion of the variation in
the dependent variable Y explained by regression on
the independent variable X.
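As a sketch of the definition, the Python code below computes R² = 1 − (error sum of squares)/(total sum of squares) for the hardness data. For a straight-line regression with an intercept, R² equals the square of the correlation coefficient r, which the code checks; for this data both come to 13/22 ≈ 0.5909.

```python
import math

xs = [146, 152, 158, 164, 170, 176, 182]
ys = [75, 78, 77, 89, 82, 85, 86]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
b = sxy / sxx
a = my - b * mx

error_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
total_ss = syy                                                  # total variation in Y
r_squared = 1 - error_ss / total_ss

r = sxy / math.sqrt(sxx * syy)
print(round(r_squared, 4), round(r ** 2, 4))
```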