A model of the relationship between a single regressor variable x and a response variable Y is called a simple linear regression model. The regressor variable x is controlled by the experimenter.
Suppose that the true relationship between Y and x is a straight line and that the observation Y at each x is a random variable. It is then reasonable to consider the model
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad (1)$$
where $\beta_0$ and $\beta_1$ are called the regression coefficients. These coefficients are unknown and are estimated from the observed data. Equation (1) is called the simple linear regression model. The symbol $\varepsilon$ denotes the random error in model (1); the random error is assumed to have mean zero, $E(\varepsilon) = 0$. Then
$$E(Y \mid x) = \beta_0 + \beta_1 x. \qquad (2)$$
Estimation of $\beta_0$ and $\beta_1$
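Suppose we have n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. These data are used to estimate the unknown parameters $\beta_0$ and $\beta_1$ of equation (1) so that the sum of the squares of the errors is as small as possible (minimum). From (1),
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$
$$\varepsilon_i = y_i - \beta_0 - \beta_1 x_i, \qquad i = 1, 2, \ldots, n.$$
That means
$$S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \qquad (3)$$
is to be minimized.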
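The objective S in equation (3) is easy to evaluate numerically for any trial pair of coefficients. The Python sketch below is a minimal illustration; the function name `least_squares_objective` and the four toy data points are assumptions chosen only so the numbers can be checked by hand.

```python
# Minimal sketch: evaluate the least squares objective S of equation (3).
# The function name and the toy data are illustrative assumptions.

def least_squares_objective(b0, b1, xs, ys):
    """Return S = sum of (y_i - b0 - b1 * x_i)^2 over all observed pairs."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # these points lie exactly on y = 2x

print(least_squares_objective(0.0, 2.0, xs, ys))  # prints 0.0 (exact fit)
print(least_squares_objective(1.0, 2.0, xs, ys))  # prints 4.0 (each residual is -1)
```

Least squares picks the pair $(\hat\beta_0, \hat\beta_1)$ at which this function is smallest.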
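Let $\hat\beta_0$ and $\hat\beta_1$ be the estimators of $\beta_0$ and $\beta_1$ at which S attains its minimum value. For S to be minimum we must have
$$\left.\frac{\partial S}{\partial \beta_0}\right|_{\hat\beta_0, \hat\beta_1} = 0 \quad \text{and} \quad \left.\frac{\partial S}{\partial \beta_1}\right|_{\hat\beta_0, \hat\beta_1} = 0.$$
Now, for $\partial S / \partial \beta_0 = 0$:
$$-2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$
$$\Rightarrow \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$
$$\Rightarrow \sum_{i=1}^{n} y_i - n\hat\beta_0 - \hat\beta_1 \sum_{i=1}^{n} x_i = 0$$
$$\Rightarrow n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \qquad (4)$$
and for $\partial S / \partial \beta_1 = 0$:
$$-2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)\, x_i = 0$$
$$\Rightarrow \sum_{i=1}^{n} (y_i x_i - \hat\beta_0 x_i - \hat\beta_1 x_i^2) = 0$$
$$\Rightarrow \sum_{i=1}^{n} y_i x_i - \hat\beta_0 \sum_{i=1}^{n} x_i - \hat\beta_1 \sum_{i=1}^{n} x_i^2 = 0$$
$$\Rightarrow \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i \qquad (5)$$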
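Equations (4) and (5) form a 2 × 2 linear system in the two unknown estimators. As a sketch, assuming the x values are not all equal (otherwise the system is singular), it can be solved with Cramer's rule; the function name `solve_normal_equations` and the toy data are illustrative, not taken from these notes.

```python
# Minimal sketch: solve the normal equations (4) and (5) with Cramer's rule.
# As a 2x2 linear system in the unknowns (b0, b1):
#   [ n       sum_x  ] [b0]   [ sum_y  ]
#   [ sum_x   sum_x2 ] [b1] = [ sum_xy ]
# The function name and toy data are illustrative assumptions.

def solve_normal_equations(xs, ys):
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        # All x_i are equal, so the slope cannot be determined.
        raise ValueError("the x values must not all be equal")

    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # exactly y = 1 + 2x
b0, b1 = solve_normal_equations(xs, ys)
print(b0, b1)  # prints 1.0 2.0
```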
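Equations (4) and (5) are called the least squares normal equations. Dividing equation (4) by n gives
$$\hat\beta_0 = \frac{1}{n}\sum_{i=1}^{n} y_i - \hat\beta_1 \frac{1}{n}\sum_{i=1}^{n} x_i,$$
that is,
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad (6)$$
where $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$.
Multiplying (4) by $\sum_{i=1}^{n} x_i$ and (5) by n, and then subtracting the first result from the second, we get
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}. \qquad (7)$$
The fitted (estimated) regression line is therefore
$$\hat y = \hat\beta_0 + \hat\beta_1 x. \qquad (8)$$
Equation (8) is used to predict the value of the response variable Y for a given value of the regressor x.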
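Formulas (6), (7) and (8) translate directly into code. The following Python sketch is one possible rendering, assuming `xs` and `ys` are equal-length lists of numbers; `fit_simple_regression` and `predict` are hypothetical helper names. It also checks that the fitted line passes through the mean point $(\bar x, \bar y)$.

```python
# Minimal sketch of equations (6), (7) and (8) in plain Python.
# Function names and the toy data are illustrative assumptions.

def fit_simple_regression(xs, ys):
    """Return (b0, b1) from the closed-form least squares formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    b1 = (sum_xy - sum_y * sum_x / n) / (sum_x2 - sum_x ** 2 / n)  # equation (7)
    b0 = sum_y / n - b1 * sum_x / n                                # equation (6)
    return b0, b1


def predict(b0, b1, x):
    """Fitted value y_hat = b0 + b1 * x, equation (8)."""
    return b0 + b1 * x


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = fit_simple_regression(xs, ys)
print(round(b0, 4), round(b1, 4))      # prints 0.15 1.94
print(round(predict(b0, b1, 2.5), 4))  # prints 5.0, i.e. y_bar at x_bar = 2.5
```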
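Note: The least squares fitted regression line (8) passes through the center point $(\bar x, \bar y)$ because of equation (6).
Let
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2,$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right).$$
Then, from equation (7), using the above notation, we have $\hat\beta_1 = S_{xy}/S_{xx}$.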
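The deviation form of $S_{xx}$ and $S_{xy}$ is the definition, while the raw-sums form needs only column totals; both give the same numbers. A minimal Python check on made-up data (the helper name `s_xx_s_xy` is hypothetical):

```python
# Minimal sketch: compute S_xx and S_xy in deviation form and in raw-sums
# form, then form b1 = S_xy / S_xx. Helper name and data are assumptions.

def s_xx_s_xy(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Deviation (definition) form.
    s_xx_dev = sum((x - x_bar) ** 2 for x in xs)
    s_xy_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Raw-sums (computational) form.
    s_xx_raw = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_xy_raw = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

    return (s_xx_dev, s_xx_raw), (s_xy_dev, s_xy_raw)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
(s_xx, s_xx_alt), (s_xy, s_xy_alt) = s_xx_s_xy(xs, ys)
b1 = s_xy / s_xx
print(round(s_xx, 4), round(s_xy, 4), round(b1, 4))  # prints 5.0 9.7 1.94
```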
Note: The least squares estimators $\hat\beta_0$ and $\hat\beta_1$ are random variables, since they are calculated as linear combinations of the observed values of the random variable Y.
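Let $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ denote the predicted value of the response variable Y at $x_i$, obtained from the regression line (8). Then the difference $e_i = y_i - \hat y_i$ is known as the residual (error) of Y at $x_i$. The sum of the squares of the residuals, or error sum of squares, is
$$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = \sum_{i=1}^{n} \left[(y_i - \bar y) - \hat\beta_1 (x_i - \bar x)\right]^2,$$
where the last step uses equation (6). Expanding the square and writing $SS_T = S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2$ for the total sum of squares gives $SS_E = SS_T - 2\hat\beta_1 S_{xy} + \hat\beta_1^2 S_{xx}$. Since $\hat\beta_1 S_{xx} = S_{xy}$, after simplification we get
$$SS_E = SS_T - \hat\beta_1 S_{xy}. \qquad (9)$$
Writing $SS_R$ for the regression sum of squares, the total sum of squares splits as $SS_T = SS_R + SS_E$, so
$$SS_R = SS_T - SS_E, \qquad (10)$$
$$SS_R = \hat\beta_1 S_{xy}. \qquad (11)$$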
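Equations (9) to (11) can be checked numerically by computing $SS_E$ directly from the residuals and comparing it with $SS_T - \hat\beta_1 S_{xy}$. This is a minimal Python sketch on a small made-up data set; the helper name `sums_of_squares` is an assumption.

```python
# Minimal sketch: check equations (9)-(11) numerically.
# Helper name and toy data are illustrative assumptions.

def sums_of_squares(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    ss_t = sum((y - y_bar) ** 2 for y in ys)                    # total SS
    ss_e = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))  # residual SS, by definition
    ss_r = b1 * s_xy                                            # equation (11)
    return ss_t, ss_r, ss_e, b1, s_xy


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
ss_t, ss_r, ss_e, b1, s_xy = sums_of_squares(xs, ys)
print(round(ss_t, 4), round(ss_r, 4), round(ss_e, 4))  # prints 18.9 18.818 0.082
```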
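The coefficient of determination is a measure of how well the regression line (8) represents the data. The coefficient of determination $R^2$ is defined by
$$R^2 = \frac{SS_R}{SS_T}$$
$$R^2 = \frac{SS_T - SS_E}{SS_T} \qquad \text{(using equation (10))}$$
$$R^2 = 1 - \frac{SS_E}{SS_T}.$$
So, using equation (11),
$$R^2 = \frac{\hat\beta_1 S_{xy}}{SS_T} = \frac{\hat\beta_1 S_{xy}}{\sum_{i=1}^{n} (y_i - \bar y)^2}.$$
It satisfies $0 \le R^2 \le 1$; a value close to 1 means the fitted line accounts for most of the variation in Y.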
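The expressions for $R^2$ above are algebraically equal, which a short numerical sketch confirms. The helper name `r_squared_three_ways` and the data are illustrative assumptions.

```python
# Minimal sketch: compute R^2 from SS_R/SS_T, 1 - SS_E/SS_T and
# b1*S_xy/SS_T. Helper name and toy data are illustrative assumptions.

def r_squared_three_ways(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    ss_t = sum((y - y_bar) ** 2 for y in ys)
    ss_e = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    ss_r = ss_t - ss_e  # equation (10)

    return ss_r / ss_t, 1 - ss_e / ss_t, b1 * s_xy / ss_t


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
values = r_squared_three_ways(xs, ys)
print([round(v, 4) for v in values])  # prints [0.9957, 0.9957, 0.9957]
```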
Remarks:
Correlation
Correlation:
Correlation is the relation between two random variables X and Y; it measures how the two variables are related. Examples include the correlation between:
The rainfall and the level of pollutant in a city.
Temperature and consumption of cold drinks.
Height and weight of kids of age 5.
Correlation Coefficient
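The correlation coefficient is a measure of the correlation between two random variables X and Y. It measures the strength and direction of the linear relation between X and Y. For the given values $(x_i, y_i)$, $i = 1, 2, \ldots, n$, of the pair of random variables (X, Y), the correlation coefficient is given by
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar x)^2 \sum_{i=1}^{n} (y_i - \bar y)^2}}.$$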
Properties of the correlation coefficient r:
The values of r lie in $-1 \le r \le 1$.
If $0 < r \le 1$, the correlation is said to be positive, and if $r = 1$, the correlation is perfectly positive.
If $-1 \le r < 0$, the correlation is said to be negative, and if $r = -1$, the correlation is perfectly negative.
If $r = 0$, there is no linear correlation between the random variables X and Y.
Coefficient of determination: $R^2 = r^2$.
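A minimal Python sketch of the definition of r, using only the standard `math` module. It shows the boundary cases $r = 1$ and $r = -1$ for points lying exactly on a line with positive or negative slope. For the third, made-up data set, $r^2$ equals that data set's coefficient of determination, $SS_R/SS_T = 18.818/18.9$. The function name is an assumption.

```python
import math

# Minimal sketch: r = S_xy / sqrt(S_xx * S_yy).
# Function name and data are illustrative assumptions.

def correlation_coefficient(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1.0, 2.0, 3.0, 4.0]
print(correlation_coefficient(xs, [2.0, 4.0, 6.0, 8.0]))  # prints 1.0 (perfectly positive)
print(correlation_coefficient(xs, [8.0, 6.0, 4.0, 2.0]))  # prints -1.0 (perfectly negative)

r = correlation_coefficient(xs, [2.1, 3.9, 6.2, 7.8])
print(round(r, 4), round(r ** 2, 4))  # r^2 equals R^2 of the fitted line
```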
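In terms of the raw sums, the correlation coefficient can also be computed as
$$r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]\left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$
$$= \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}.$$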
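The raw-sums form of r avoids computing deviations and is convenient for hand calculation from column totals. A short Python sketch, assuming made-up integer data, confirms that it agrees with the deviation form; both function names are hypothetical.

```python
import math

# Minimal sketch: deviation form vs raw-sums form of r.
# Function names and data are illustrative assumptions.

def r_deviation_form(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)


def r_raw_sums_form(xs, ys):
    """r from the totals n, sum x, sum y, sum x^2, sum y^2 and sum xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_deviation_form(xs, ys), 4), round(r_raw_sums_form(xs, ys), 4))
# prints 0.7746 0.7746
```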
Example: A study of the amount of rainfall and the quantity of air pollution removed produced the
following data:
Daily rainfall, x    Particulate removed, y (µg/m³)
4.3                  126
4.5                  121
5.9                  116
5.6                  118
6.1                  114
5.2                  118
3.8                  132
2.1                  141
7.5                  108
a) Plot a scatter diagram.
b) Find the equation of the regression line to predict the particulate removed from the amount of daily rainfall.
c) Estimate the amount of particulate removed when the daily rainfall is x = 4.8 units.
d) Find the coefficient of determination.
e) Find the correlation coefficient.
Answer:
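The fitted regression line is $\hat y = \hat\beta_0 + \hat\beta_1 x$, where $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, with
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 \quad \text{and} \quad S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right).$$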
x      y      x²       y²       xy
4.3    126    18.49    15876    541.8
4.5    121    20.25    14641    544.5
5.9    116    34.81    13456    684.4
5.6    118    31.36    13924    660.8
6.1    114    37.21    12996    695.4
5.2    118    27.04    13924    613.6
3.8    132    14.44    17424    501.6
2.1    141    4.41     19881    296.1
7.5    108    56.25    11664    810
Here $n = 9$, $\sum_{i=1}^{9} x_i = 45$, $\sum_{i=1}^{9} y_i = 1094$, $\sum_{i=1}^{9} x_i^2 = 244.26$, $\sum_{i=1}^{9} y_i^2 = 133786$, $\sum_{i=1}^{9} x_i y_i = 5348.2$.
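The column totals can be double-checked in Python. This sketch simply re-enters the nine data pairs from the table; the variable names `rainfall` and `particulate` are assumptions. Rounding is applied only for display, since decimal inputs such as 4.3 are not exact in binary floating point.

```python
# Minimal sketch: recompute the column totals from the raw data.
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]       # x
particulate = [126, 121, 116, 118, 114, 118, 132, 141, 108]  # y

n = len(rainfall)
sum_x = sum(rainfall)
sum_y = sum(particulate)
sum_x2 = sum(x * x for x in rainfall)
sum_y2 = sum(y * y for y in particulate)
sum_xy = sum(x * y for x, y in zip(rainfall, particulate))

# Round to hide tiny floating-point errors from the decimal inputs.
print(n, round(sum_x, 2), sum_y, round(sum_x2, 2), sum_y2, round(sum_xy, 2))
# prints 9 45.0 1094 244.26 133786 5348.2
```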
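$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{45}{9} = 5, \qquad \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1094}{9} \approx 121.56.$$
So,
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = 244.26 - \frac{1}{9}(45)^2 = 19.26$$
and
$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) = 5348.2 - \frac{1}{9}(45)(1094) = -121.80.$$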
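Now,
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{-121.80}{19.26} = -6.32$$
and, from equation (6),
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 121.56 - (-6.32)(5) = 153.16.$$
So the fitted regression line is $\hat y = 153.16 - 6.32x$.
(c) Estimate the amount of particulate removed when the daily rainfall is x = 4.8 units:
When $x = 4.8$,
$$\hat y = 153.16 - 6.32(4.8) \approx 122.82.$$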
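The slope, intercept and prediction can be reproduced in Python. This sketch carries full floating-point precision instead of rounding intermediate values, so the intercept comes out near 153.175 and the slope near −6.324; the prediction at x = 4.8 still rounds to 122.82. Variable names are illustrative assumptions.

```python
# Minimal sketch: fit the rainfall data at full precision and predict at x = 4.8.
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]       # x
particulate = [126, 121, 116, 118, 114, 118, 132, 141, 108]  # y

n = len(rainfall)
x_bar = sum(rainfall) / n
y_bar = sum(particulate) / n

# Computational forms of S_xx and S_xy.
s_xx = sum(x * x for x in rainfall) - sum(rainfall) ** 2 / n
s_xy = sum(x * y for x, y in zip(rainfall, particulate)) - sum(rainfall) * sum(particulate) / n

b1 = s_xy / s_xx          # slope, b1 = S_xy / S_xx
b0 = y_bar - b1 * x_bar   # intercept, equation (6)
y_hat = b0 + b1 * 4.8     # prediction at x = 4.8, equation (8)

print(round(b0, 3), round(b1, 3), round(y_hat, 2))
# prints 153.175 -6.324 122.82
```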
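(d) Coefficient of determination:
$$SS_T = S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 = 133786 - \frac{1}{9}(1094)^2 = 804.22$$
$$R^2 = \frac{\hat\beta_1 S_{xy}}{SS_T} = \frac{(-6.32)(-121.80)}{804.22} = 0.9572$$
Thus, $R^2 = 0.9572$ implies that 95.72% of the total variation in Y is explained by the fitted regression line.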
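A Python sketch of the same calculation at full precision. Because it does not round the slope to −6.32, it gives $R^2 \approx 0.9578$; plugging in the rounded slope reproduces 0.9572, so the difference comes only from rounding. Variable names are illustrative assumptions.

```python
# Minimal sketch: R^2 = b1 * S_xy / SS_T for the rainfall data.
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]       # x
particulate = [126, 121, 116, 118, 114, 118, 132, 141, 108]  # y

n = len(rainfall)
sum_x, sum_y = sum(rainfall), sum(particulate)
s_xx = sum(x * x for x in rainfall) - sum_x ** 2 / n
s_xy = sum(x * y for x, y in zip(rainfall, particulate)) - sum_x * sum_y / n
ss_t = sum(y * y for y in particulate) - sum_y ** 2 / n  # SS_T = S_yy

b1 = s_xy / s_xx              # full-precision slope
r_squared = b1 * s_xy / ss_t  # R^2 = b1 * S_xy / SS_T

# Same formula with the slope rounded to two decimals, as in a hand calculation.
r_squared_rounded_slope = (-6.32) * s_xy / ss_t

print(round(ss_t, 2), round(r_squared, 4), round(r_squared_rounded_slope, 4))
# prints 804.22 0.9578 0.9572
```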
(e) Correlation coefficient:
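$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$
$$= \frac{9(5348.2) - (45)(1094)}{\sqrt{\left[9(244.26) - 45^2\right]\left[9(133786) - 1094^2\right]}}$$
$$= \frac{-1096.20}{(13.17)(85.08)}$$
$$\approx -0.9783$$
Thus, $r = -0.9783$ implies that the random variable Y is strongly negatively correlated with the random variable X. That is, the amount of particulate removed tends to decrease as the daily rainfall increases.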
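The same raw-sums formula in Python, at full precision. The hand calculation rounds the two square roots to 13.17 and 85.08, so the unrounded value, about −0.9787, differs in the last digit. The sketch also confirms that $r^2$ equals $R^2 = \hat\beta_1 S_{xy}/SS_T$ for these data. Variable names are illustrative assumptions.

```python
import math

# Minimal sketch: correlation coefficient of the rainfall data.
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]       # x
particulate = [126, 121, 116, 118, 114, 118, 132, 141, 108]  # y

n = len(rainfall)
sum_x = sum(rainfall)
sum_y = sum(particulate)
sum_x2 = sum(x * x for x in rainfall)
sum_y2 = sum(y * y for y in particulate)
sum_xy = sum(x * y for x, y in zip(rainfall, particulate))

# Raw-sums formula for the correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Coefficient of determination of the fitted line, for comparison.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
ss_t = sum_y2 - sum_y ** 2 / n
r_squared = (s_xy / s_xx) * s_xy / ss_t

print(round(numerator, 2), round(r, 4), round(r ** 2, 4), round(r_squared, 4))
# prints -1096.2 -0.9787 0.9578 0.9578
```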