Scatter Plot Examples
[Figure: scatter plots of y against x illustrating strong relationships, weak relationships, and no relationship]
Correlation Coefficient
(continued)
[Figure: scatter plots of y against x for r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]
Calculating the Correlation Coefficient by using Karl Pearson's method
To measure the intensity of the relationship between the variables, Karl Pearson proposed a formula known as Karl Pearson's correlation coefficient:

r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}

or, in computational form,

r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
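As a quick sketch, the computational form of the formula can be evaluated in Python (the function name pearson_r and the toy data are our own illustration):

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient (computational form):
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))"""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

# perfectly linear data gives r = +1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```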
Assumptions of using Pearson’s
Correlation Coefficient
Pearson’s correlation coefficient is appropriate to
calculate when both variables ‘x’ and ‘y’ are measured
on an interval or a ratio scale
The significance of r can be judged by its probable error:

P.E.(r) = 0.6745 \, \frac{1 - r^2}{\sqrt{n}}

If r < 6 P.E.(r), then the value of 'r' is not significant.
If r > 6 P.E.(r), then the value of 'r' is significant.
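A minimal sketch of this significance check, assuming the 6·P.E.(r) rule above (helper names are our own):

```python
import math

def probable_error(r, n):
    # P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)
    return 0.6745 * (1 - r * r) / math.sqrt(n)

def is_significant(r, n):
    # 'r' is significant when r exceeds 6 * P.E.(r)
    return abs(r) > 6 * probable_error(abs(r), n)

print(is_significant(0.9, 25), is_significant(0.2, 10))  # True False
```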
Coefficient of Determination
The coefficient of determination is denoted by r².
It always has a value between 0 and 1.
Using the coefficient of determination we can find the strength of the relationship between variables, but we lose the information about its direction.
If r² = 0, then no variation in y can be explained by the variable x.
If r² = 1, then the values of y are completely explained by x.
Examples of Approximate R² Values
[Figure: scatter plots of y against x illustrating R² = 1 (perfect linear relationship, R = +1), 0 < R² < 1 (partial linear relationship), and R² = 0 (no linear relationship between x and y)]
r = \frac{10(10800) - (220)(450)}{\sqrt{[10(5600) - (220)^2][10(22100) - (450)^2]}} = 0.759014

[Figure: scatter plot of copier sales (Y) against sales calls (X)]

There is a positive relationship between the sales calls and the sales of the copier.
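The arithmetic can be verified with a short script using the same sums (n = 10, Σxy = 10800, Σx = 220, Σy = 450, Σx² = 5600, Σy² = 22100):

```python
import math

# sums from the copier example
n, sxy, sx, sy, sxx, syy = 10, 10800, 220, 450, 5600, 22100
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 6))  # 0.759014
```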
Calculation Example II

Tree Height (y)  Trunk Diameter (x)   xy     y²      x²
35               8                    280    1225    64
49               9                    441    2401    81
27               7                    189    729     49
33               6                    198    1089    36
60               13                   780    3600    169
21               7                    147    441     49
45               11                   495    2025    121
51               12                   612    2601    144
Σy = 321         Σx = 73              Σxy = 3142   Σy² = 14111   Σx² = 713
Calculation Example
(continued)

r = \frac{n \sum xy - \sum x \sum y}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}
  = \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}}
  = 0.886

r = 0.886 → relatively strong positive linear association between x and y

[Figure: scatter plot of Tree Height, y against Trunk Diameter, x]
Excel Output
Correlation between
Tree Height and Trunk Diameter
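Rather than relying on spreadsheet output, the tree example can be re-checked in Python from the raw columns; the assert confirms the table totals:

```python
import math

y = [35, 49, 27, 33, 60, 21, 45, 51]   # tree height
x = [8, 9, 7, 6, 13, 7, 11, 12]        # trunk diameter
n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2 = sum(a * a for a in x)
sy2 = sum(b * b for b in y)
assert (sx, sy, sxy, sx2, sy2) == (73, 321, 3142, 713, 14111)  # table totals
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 3))  # 0.886
```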
Ex: Pepsi Cola is studying the effect of its last advertising campaign. People chosen at random were called and asked how many cans of Pepsi Cola they had bought (X) in the past week and how many advertisements (Y) they had either read or seen in the past week.
X :3 7 4 2 0 4 1 2
Y :11 18 9 4 7 6 3 8
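As a sketch, one way to work this exercise is with Karl Pearson's computational formula; the printed value is our own calculation, not given in the slides:

```python
import math

X = [3, 7, 4, 2, 0, 4, 1, 2]    # cans of Pepsi bought
Y = [11, 18, 9, 4, 7, 6, 3, 8]  # advertisements read or seen
n = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(a * b for a, b in zip(X, Y))
sx2 = sum(a * a for a in X)
sy2 = sum(b * b for b in Y)
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 3))
```

This comes out at roughly r ≈ 0.79, a fairly strong positive association between advertisements seen and cans bought.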
Spearman's rank correlation coefficient is

\rho(x, y) = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}

where d = R_x - R_y and n is the number of pairs of observations.

When ranks are equal, we add a correction factor to \sum d^2:

\rho(x, y) = 1 - \frac{6[\sum d^2 + \text{correction factor}]}{n(n^2 - 1)}

where the correction factor is \frac{m(m^2 - 1)}{12} and m is the number of times an item is repeated.
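A sketch of the tie-corrected coefficient in Python (function names are our own; ranks are averaged within tied groups, and one m(m² − 1)/12 term is added per tied group in each series):

```python
from collections import Counter

def ranks(values):
    # 1-based ranks; tied values share the average of their rank positions
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def tie_correction(values):
    # one m(m^2 - 1)/12 term per group of m equal values
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    cf = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (d2 + cf) / (n * (n * n - 1))

print(spearman([1, 2, 2, 3], [1, 2, 3, 4]))  # 0.9
```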
Ten Competitors in a beauty contest are ranked by
three judges in the following order.
Judge I :1 6 5 10 3 2 4 9 7 8
Judge II :3 5 8 4 7 10 2 1 6 9
Judge III:6 4 9 8 1 2 3 10 5 7
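Applying the plain rank-correlation formula (there are no tied ranks here) to each pair of judges shows which two agree most closely; the printed values are our own computation:

```python
def rank_corr(rx, ry):
    # Spearman: rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), d = Rx - Ry
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

j1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
j2 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
j3 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

print(round(rank_corr(j1, j2), 3))  # -0.212
print(round(rank_corr(j1, j3), 3))  # 0.636
print(round(rank_corr(j2, j3), 3))  # -0.297
```

Judges I and III show the strongest agreement (ρ ≈ 0.64), i.e. the nearest approach to a common taste.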
No.Of Unemployed 15 12 13 11 12 12 19 26
Introduction to Regression Analysis

Population linear regression model:

y = \beta_0 + \beta_1 x + \varepsilon

[Figure: regression line through a scatter of points, marking the observed value of y for x_i, the predicted value of y for x_i, the random error \varepsilon_i for that x value, slope = \beta_1, and intercept = \beta_0]
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line
\hat{y}_i = b_0 + b_1 x

The coefficients b_0 and b_1 are chosen to minimize the sum of squared errors:

\sum e^2 = \sum (y - \hat{y})^2 = \sum \big(y - (b_0 + b_1 x)\big)^2
The Least Squares Equation
The formulas for b1 and b0 are:
b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}

algebraic equivalent:

b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}

and

b_0 = \bar{y} - b_1 \bar{x}
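These formulas translate directly into code (a minimal sketch; least_squares is our own name):

```python
def least_squares(x, y):
    # b1 = (Sxy - Sx*Sy/n) / (Sxx - Sx^2/n),  b0 = ybar - b1*xbar
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    b1 = (sxy - sx * sy / n) / (sx2 - sx * sx / n)
    b0 = sy / n - b1 * sx / n
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])  # data on the line y = 1 + 2x
print(b0, b1)  # 1.0 2.0
```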
Interpretation of the
Slope and the Intercept
ANOVA
             df    SS            MS            F         Significance F
Regression    1    18934.9348    18934.9348    11.0848   0.01039
Residual      8    13665.5652     1708.1957
Total         9    32600.5000

Intercept = 98.248, Slope = 0.10977

[Figure: scatter plot of house price ($1000s) against square feet, with the fitted regression line]

\hat{y} = 98.25 + 0.1098(2000) = 317.85

The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850
Example: Market Trend
In finance, it is of interest to look at the relationship between Y, a stock's average return, and X, the overall market return. The slope coefficient computed by linear regression is called the stock's beta by investment analysts. A beta greater than 1 indicates that the stock is relatively sensitive to changes in the market; a beta less than 1 indicates that the stock is relatively insensitive. For the following data, compute the beta and suggest the market trend.

Overall Market Return % (X): 10  12  8  15   9  11  8  10  13  11
Average Return % (Y):        11  15  3  18  10  12  6   7  18  13
Properties of regression lines and their coefficients:
1. The correlation coefficient is the geometric mean between the regression coefficients.
2. The sign of the correlation coefficient is the same as that of the regression coefficients.
3. Regression coefficients are independent of the change of origin but not of scale.
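Property 1 can be checked numerically: with deviation sums, b_yx = ΣXY/ΣX² and b_xy = ΣXY/ΣY², and |r| = √(b_yx · b_xy). A sketch with made-up data:

```python
import math

x = [2, 4, 5, 7, 9]
y = [3, 7, 6, 10, 12]
n = len(x)
xb, yb = sum(x) / n, sum(y) / n
sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
sxx = sum((a - xb) ** 2 for a in x)
syy = sum((b - yb) ** 2 for b in y)

byx = sxy / sxx                    # regression coefficient of y on x
bxy = sxy / syy                    # regression coefficient of x on y
r = sxy / math.sqrt(sxx * syy)     # correlation coefficient

# |r| is the geometric mean of the two regression coefficients
assert abs(abs(r) - math.sqrt(byx * bxy)) < 1e-12
print(round(r, 3))
```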
Problem
The following data give the ages and Blood
Pressure of 10 women. Find
1. Correlation Coefficient between age and BP
2. Determine the least square regression
equation of BP on age
3. Estimate the BP of a woman whose age is
45
Data / Calculations

AGE (x)  BP (y)  x²     y²      xy
56       147     3136   21609   8232
42       125     ...
...

\hat{y}_i = b_0 + b_1 x

b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}  and  b_0 = \bar{y} - b_1 \bar{x}

b_1 = 1.11,  b_0 = 83.755

The regression equation is y = 83.755 + 1.11x.
When x = 45: y = 83.755 + 1.11(45) = 133.705.
Multiple Regression Analysis
A linear regression equation with more than one independent variable is called a multiple regression model.
The linear regression equation with k independent variables takes the form:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \varepsilon

where
y is the value of the dependent variable to be estimated,
\beta_0 is a constant,
\beta_1, \beta_2, \ldots, \beta_k are the regression coefficients associated with each of the k independent variables,
\varepsilon is the random error due to chance.
Let the fitted linear regression equation be

\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k

which minimizes the sum of squared errors (SSE) = \sum (y - \hat{y})^2,

where
\hat{y} is the estimated value of the dependent variable y,
b_1, b_2, b_3, \ldots, b_k are partial regression coefficients obtained by the principle of least squares.
Let us consider the case of two independent variables and one dependent variable.
The multiple linear regression model involving two independent variables is:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon

where
y is the dependent variable,
x_1 and x_2 are the independent variables,
\varepsilon is the random error due to chance,
\beta_0 is the y-intercept,
\beta_1, \beta_2 are the regression coefficients.
Let the fitted multiple linear regression equation be

\hat{y} = b_0 + b_1 x_1 + b_2 x_2, or equivalently \hat{y} = b_0 + b_{y1.2} x_1 + b_{y2.1} x_2,

where
\hat{y} is the estimated value of the dependent variable y,
x_1, x_2 are the independent variables,
b_0, b_1, b_2 are unknown constants determined by the principle of least squares, which minimizes the sum of squared errors (SSE) = \sum (y - \hat{y})^2.

The values of b_0, b_1, b_2 can be determined by solving the following normal equations:

\sum y = n b_0 + b_{y1.2} \sum x_1 + b_{y2.1} \sum x_2
\sum y x_1 = b_0 \sum x_1 + b_{y1.2} \sum x_1^2 + b_{y2.1} \sum x_1 x_2
\sum y x_2 = b_0 \sum x_2 + b_{y1.2} \sum x_1 x_2 + b_{y2.1} \sum x_2^2
Let the fitted multiple linear regression equation be

\hat{y} = b_0 + b_1 x_1 + b_2 x_2,
i.e. \hat{y} = b_0 + b_{y1.2} x_1 + b_{y2.1} x_2  ----(1)

Averaging over all observations,

\bar{y} = b_0 + b_{y1.2} \bar{x}_1 + b_{y2.1} \bar{x}_2  ----(2)

Subtracting (2) from (1):

(y - \bar{y}) = b_{y1.2} (x_1 - \bar{x}_1) + b_{y2.1} (x_2 - \bar{x}_2),
i.e. Y = b_{y1.2} X_1 + b_{y2.1} X_2

Solving,

b_{y1.2} = \frac{\sum Y X_1 \sum X_2^2 - \sum Y X_2 \sum X_1 X_2}{\sum X_1^2 \sum X_2^2 - (\sum X_1 X_2)^2}

b_{y2.1} = \frac{\sum Y X_2 \sum X_1^2 - \sum Y X_1 \sum X_1 X_2}{\sum X_1^2 \sum X_2^2 - (\sum X_1 X_2)^2}

where Y = y - \bar{y}, X_1 = x_1 - \bar{x}_1, X_2 = x_2 - \bar{x}_2.
Relationship between partial regression coefficients and correlation coefficients:

b_{y1.2} = \frac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2} \cdot \frac{\sigma_y}{\sigma_1}

b_{y2.1} = \frac{r_{y2} - r_{y1} r_{12}}{1 - r_{12}^2} \cdot \frac{\sigma_y}{\sigma_2}

where

r_{y1} = \frac{\sum Y X_1}{\sqrt{\sum Y^2 \sum X_1^2}} is the correlation between y and x_1,

r_{y2} = \frac{\sum Y X_2}{\sqrt{\sum Y^2 \sum X_2^2}} is the correlation between y and x_2,

r_{12} = \frac{\sum X_1 X_2}{\sqrt{\sum X_1^2 \sum X_2^2}} is the correlation between x_1 and x_2.
A marketing manager of a company wants to predict demand for the product. He strongly believes that demand is highly influenced by the annual average price of the product (in units) and advertising expenditure (Rs in lakh). He has collected past data on the effect of these factors on demand, given below:

Y  :  4   6   7   9  13  15
X1 : 15  12   8   6   4   3
X2 : 30  24  20  14  10   4
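A sketch that fits this data with the deviation-form formulas for b_{y1.2} and b_{y2.1} given earlier in these notes (the printed coefficients are our own computation, not supplied in the slides):

```python
y  = [4, 6, 7, 9, 13, 15]     # demand
x1 = [15, 12, 8, 6, 4, 3]     # annual average price
x2 = [30, 24, 20, 14, 10, 4]  # advertising expenditure
n = len(y)
yb, x1b, x2b = sum(y) / n, sum(x1) / n, sum(x2) / n
Y  = [v - yb  for v in y]     # deviations from the means
X1 = [v - x1b for v in x1]
X2 = [v - x2b for v in x2]
s = lambda a, b: sum(p * q for p, q in zip(a, b))

den = s(X1, X1) * s(X2, X2) - s(X1, X2) ** 2
b1 = (s(Y, X1) * s(X2, X2) - s(Y, X2) * s(X1, X2)) / den   # b_y1.2
b2 = (s(Y, X2) * s(X1, X1) - s(Y, X1) * s(X1, X2)) / den   # b_y2.1
b0 = yb - b1 * x1b - b2 * x2b
print(round(b0, 2), round(b1, 2), round(b2, 2))
```

Under these formulas the fit comes out at roughly b0 ≈ 16.48, b1 ≈ 0.39, b2 ≈ −0.62.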
• The following results are obtained from measurements on length (in mm), volume (in cc) and weight (in gm) of 300 eggs:

\bar{x}_1 = 55.95, \bar{x}_2 = 51.48, \bar{y} = 56.03
\sigma_1 = 2.26, \sigma_2 = 4.39, \sigma_y = 4.41
r_{y1} = 0.578, r_{y2} = 0.581, r_{12} = 0.974

The multiple correlation coefficient is

R_{y.12} = \sqrt{\frac{r_{y1}^2 + r_{y2}^2 - 2 r_{y1} r_{y2} r_{12}}{1 - r_{12}^2}}
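Substituting the given correlations into the formula (a quick check; the printed value is our own computation):

```python
import math

ry1, ry2, r12 = 0.578, 0.581, 0.974
R = math.sqrt((ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2))
print(round(R, 3))
```

This gives R_{y.12} ≈ 0.583 for the egg data.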