CORRELATION
ANALYSIS
MBA A
Newish
Jashan
Jotdeep Singh
Yogesh
Introduction
Correlation is a linear association between two
random variables
Correlation analysis shows us how to determine
both the nature and strength of the relationship
between two variables
When variables are dependent on time, correlation
is applied
The correlation coefficient lies between -1 and +1
Types of Correlation
There are three types of correlation
Type 1: Positive, Negative, No correlation, Perfect correlation
Type 2: Linear, Non-linear
Type 3: Simple, Multiple, Partial
Correlation: Linear
Relationships
[Figure: two scatter plots of Symptom Index (vertical axis, 0 to 160) against a horizontal axis running from 0 to 250; one panel is captioned "Moderate fit"]
Coefficient of Correlation
A measure of the strength of the linear relationship
between two variables, defined as the (sample)
covariance of the variables divided by the product of
their (sample) standard deviations
Represented by r
r lies between -1 and +1
Magnitude and Direction
-1 ≤ r ≤ +1
The + and - signs are used for positive linear
correlations and negative linear correlations,
respectively
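$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$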
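The computational formula for r can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `pearson_r` and the five data points are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator

# Illustrative (made-up) data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

For points that lie exactly on an upward-sloping straight line, such as x = 1, 2, 3 and y = 2, 4, 6, the same function returns 1.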
Interpreting Correlation
Coefficient r
strong correlation: r > .70 or r < -.70
moderate correlation: r is between .30 and .70
or r is between -.30 and -.70
weak correlation: r is between 0 and .30
or r is between 0 and -.30
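The rule of thumb above can be written as a small helper. This is a sketch only: the function name `describe_correlation` is hypothetical, and treating a magnitude of exactly .30 or .70 as "moderate" is an assumption, since the slide does not say which band the boundary values belong to.

```python
def describe_correlation(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude > 0.70:
        strength = "strong"
    elif magnitude >= 0.30:   # assumption: boundary values count as moderate
        strength = "moderate"
    else:
        strength = "weak"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_correlation(-0.85))
print(describe_correlation(0.50))
```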
Coefficient of Determination
The coefficient of determination lies between 0 and 1
Represented by r²
The coefficient of determination is a measure of how
well the regression line represents the data
If the regression line passes exactly through every point
on the scatter plot, it is able to explain all of the
variation
The further the line is from the points, the less it is
able to explain
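One way to make "explained variation" concrete is to compare the squared distances of the points from the line with the total variation of y around its mean. The sketch below assumes the line is the least-squares line, in which case this ratio agrees with r². The function name and the data are made up for illustration.

```python
def coefficient_of_determination(xs, ys, intercept, slope):
    """Share of the variation in ys explained by y = intercept + slope * x."""
    mean_y = sum(ys) / len(ys)
    # Unexplained variation: squared vertical distances from points to the line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean
    sst = sum((y - mean_y) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("y is constant, so there is no variation to explain")
    return 1 - sse / sst

# Illustrative (made-up) data and its least-squares line y = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2_fitted = coefficient_of_determination(x, y, 2.2, 0.6)

# A line that passes through every point explains all of the variation
r2_perfect = coefficient_of_determination([1, 2, 3], [3, 5, 7], 1, 2)
print(round(r2_fitted, 4), r2_perfect)
```

Here the fitted line explains 60% of the variation in the made-up data, while the line through every point explains 100%.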
BUSINESS STATISTICS
PRESENTATION
ON
REGRESSION ANALYSIS
Dependent and Independent Variables
By simple linear regression, we mean models with just one
independent and one dependent variable. The variable whose
value is to be predicted is known as the dependent variable
and the one whose known value is used for prediction is
known as the independent variable.
By multiple regression, we mean models with just one
dependent and two or more independent variables. The
variable whose value is to be predicted is known as the
dependent variable and the ones whose known values are
used for prediction are known as independent variables.
METHOD-
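Experience (in years) and Income (in '000); the table is only partly recoverable, surviving values: 15, 150, 10, 120, 60, 40, 70, 90
[Figure: scatter plot of income (vertical axis, 30 to 240) against experience (horizontal axis)]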
2) ALGEBRAIC METHOD-
Regression equation (Linear).
A statistical technique used to explain or predict the behaviour of a dependent
variable
The general equation is given by-
y = a + bx
a is the intercept
b is the slope of the line
From the general equation we obtain the normal equations
Summing the general equation over all N observations
gives the first normal equation, i.e.
$$\sum Y = Na + b\sum X$$
To find the second normal equation we multiply the general
equation by x and then take the summation, i.e.
$$\sum XY = a\sum X + b\sum X^2$$
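As a worked step, the two normal equations can be solved for a and b in closed form. This is a sketch of the usual elimination, where N is the number of observations and every sum runs over all observations: the first equation is solved for a, substituted into the second, and the result is multiplied through by N.

```latex
\begin{aligned}
\sum Y &= Na + b\sum X \quad\Rightarrow\quad a = \frac{\sum Y - b\sum X}{N} \\
\sum XY &= a\sum X + b\sum X^2 \\
N\sum XY &= \sum X \sum Y - b\left(\sum X\right)^2 + Nb\sum X^2 \\
b &= \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}
\end{aligned}
```

Once b is known, a follows from the first line.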
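Regression equation (Multiple).
General equation => y = a + b1 x1 + b2 x2 + ......... + bn xn
Normal equations for multiple regression with two independent variables are:
$$\sum Y = Na + b_1\sum X_1 + b_2\sum X_2$$
$$\sum X_1 Y = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1 X_2$$
$$\sum X_2 Y = a\sum X_2 + b_1\sum X_1 X_2 + b_2\sum X_2^2$$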
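The three normal equations for two independent variables form a 3 x 3 linear system in a, b1 and b2. The sketch below builds that system from the sums and solves it with a small Gaussian elimination routine. The helper names and the data are hypothetical: the y values were generated exactly from y = 1 + 2 x1 + 3 x2, so the solver should recover those coefficients.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [A | b], copied so the inputs are not modified
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def fit_two_predictors(x1, x2, y):
    """Return (a, b1, b2) from the normal equations for y = a + b1*x1 + b2*x2."""
    n = len(y)
    matrix = [
        [n,       sum(x1),     sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    rhs = [sum(y), dot(x1, y), dot(x2, y)]
    return tuple(solve_linear_system(matrix, rhs))

# Made-up data generated from y = 1 + 2*x1 + 3*x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```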
Lines of Regression
There are two lines of regression- that of Y on X and X on Y.
The line of regression of Y on X is given by Y = a + bX where a and b
are unknown constants known as intercept and slope of the equation.
This is used to predict the unknown value of variable Y when value of
variable X is known.
On the other hand, the line of regression of X on Y is given by X = c +
dY which is used to predict the unknown value of variable X using the
known value of variable Y.
Often, only one of these lines makes sense.
Exactly which of these is appropriate for the analysis in hand
depends on which variable is labelled dependent and which independent in the
problem to be analyzed.
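The slopes of the two lines (the regression coefficients) can also be computed directly:
$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$
$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}$$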
Example: consider two variables X and Y.
Variable X is taken as driving experience (in years) and variable Y is
taken as the number of road accidents (in a year).
The number of road accidents is taken as the dependent variable, which
is related to the independent variable X, i.e. driving experience.
X (driving experience)    Y (no. of road accidents)
5                         64
2                         87
12                        50
9                         71
15                        44
6                         56
25                        42
16                        60
From the data we will find:
The estimated regression line for the data.
The number of road accidents taking place when the
driving experience is 10 years and 30 years.
The coefficient of determination (R²), which
tells us what percentage of the variation in the
dependent variable is explained by the independent
variable.
X      Y      X.Y     X²      Y²
5      64     320     25      4096
2      87     174     4       7569
12     50     600     144     2500
9      71     639     81      5041
15     44     660     225     1936
6      56     336     36      3136
25     42     1050    625     1764
16     60     960     256     3600
ΣX=90  ΣY=474  ΣXY=4739  ΣX²=1396  ΣY²=29642
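Substituting the totals into the normal equations $\sum Y = Na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$:
8a + 90b = 474 (Eq. 1)
90a + 1396b = 4739 (Eq. 2)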
Now solving both equations we get the values of a and b as
Value of a = 76.66
Value of b = -1.5476
The estimated regression line is
Y = 76.66 - 1.5476 X
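The values of a and b can be checked with a short script. This is a sketch under one assumption: the eight X values used below are the ones consistent with the X.Y and X² columns and with the total ΣX = 90 in the worked example. The variable names are chosen for illustration.

```python
# Driving experience (years) and number of road accidents for the 8 drivers
experience = [5, 2, 12, 9, 15, 6, 25, 16]
accidents = [64, 87, 50, 71, 44, 56, 42, 60]

n = len(experience)
sum_x = sum(experience)
sum_y = sum(accidents)
sum_xy = sum(x * y for x, y in zip(experience, accidents))
sum_x2 = sum(x * x for x in experience)

# Closed-form solution of the two normal equations
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(a, 2), round(b, 4))
```

The totals match the ones in the table, and the rounded coefficients agree with the estimated line.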
[Figure: scatter plot with downward-sloping trend line; no. of accidents (vertical axis, 10 to 80) against experience (horizontal axis, 3 to 27)]
Road accidents depend on driving experience: a new driver is
inexperienced and faces a higher risk of accident, so there is a
negative relationship between the two variables and the trend line
slopes downward in this case.
The value of a is 76.66, which means that a driver with 0 years of
experience is expected to have 76.66 road accidents in a year.
The value of b means that for every extra year of driving
experience, the expected number of road accidents decreases by 1.5476.
No. of accidents with 10 years of experience
Y = 76.66 - 1.5476 X
Y = 76.66 - 1.5476 (10)
Y = 61.184
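The same substitution can be done for any experience level, including the 30-year case the example sets out to answer. The sketch below uses the rounded coefficients from the estimated line, and the function name `predicted_accidents` is hypothetical. Note that 30 years lies outside the observed range of the data (the largest X is 25), so that prediction is an extrapolation.

```python
INTERCEPT = 76.66   # a, from the estimated regression line
SLOPE = -1.5476     # b, from the estimated regression line

def predicted_accidents(years_of_experience):
    """Predicted number of road accidents from Y = a + bX."""
    return INTERCEPT + SLOPE * years_of_experience

at_10_years = predicted_accidents(10)
at_30_years = predicted_accidents(30)   # extrapolation beyond the observed data
print(round(at_10_years, 3), round(at_30_years, 3))
```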
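$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2} = \frac{8(4739) - 90 \cdot 474}{8(1396) - (90)^2} = -1.547$$
$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2} = \frac{8(4739) - 90 \cdot 474}{8(29642) - (474)^2} = -0.381$$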
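Now
$$R^2 = b_{yx} \cdot b_{xy} = (-1.547)(-0.381) = 0.5894$$
So about 59% of the variation in the number of road accidents is explained by driving experience.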
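The two regression coefficients and their product can be recomputed from the totals of the worked example. This is a sketch with illustrative variable names. Without intermediate rounding the product comes out at about 0.5897, slightly above the slide's 0.5894, which is built from coefficients cut to three decimals.

```python
import math

# Totals taken from the worked example (N = 8 drivers)
n = 8
sum_x, sum_y = 90, 474
sum_xy, sum_x2, sum_y2 = 4739, 1396, 29642

shared_numerator = n * sum_xy - sum_x * sum_y
b_yx = shared_numerator / (n * sum_x2 - sum_x ** 2)   # slope of Y on X
b_xy = shared_numerator / (n * sum_y2 - sum_y ** 2)   # slope of X on Y

r_squared = b_yx * b_xy
# r carries the common sign of the two regression coefficients
r = math.copysign(math.sqrt(r_squared), b_yx)

print(round(b_yx, 4), round(b_xy, 4), round(r_squared, 4), round(r, 3))
```

The negative sign of r matches the downward-sloping relationship between experience and accidents.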
Scrips at BSE
o ACC
o AIRTEL
o BHEL
o DLF
o GRASIM
o GUJARAT AMBUJA
o HDFC
o HDFC BANK
o HINDALCO
o HUL
o ICICI BANK
o INFOSYS
o SUN PHARMA IND. LTD
o ITC
o L&T
o MARUTI
o MAHINDRA & MAHINDRA
o NTPC
o ONGC
o RANBAXY
o RELIANCE COMMUNICATION
o RELIANCE INFRASTRUCTURE
o RIL
o STERLITE INDUSTRIES LTD
o SBI
o TCS
o TATA MOTORS
o TATA STEEL
o TATA POWER COMPANY LTD
o WIPRO
How are the SENSEX 30 Stocks selected?
o Listing History
o Trading Frequency
o Rank based on the Market Cap (should be among top 100)
o Market Capitalization weight
o Industry / sector they belong to
o Historical Record
Methodology of SENSEX
SENSEX has been calculated since 1986. Initially
it was calculated using the Total Market
Capitalization methodology; the methodology
was changed in 2003 to Free Float Market
Capitalization.
Hence, these days, the SENSEX is based on
the free-float market cap of the 30 SENSEX
stocks traded on the BSE relative to the base
value, which is 100 (1978-79), and it is
calculated every 15 seconds
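The free-float idea can be illustrated with a simplified sketch: each stock contributes its price times shares outstanding times the fraction of shares available for trading, and the total is scaled against a base-period market cap so that the base period corresponds to 100. Everything below is hypothetical, including the function names, the three stocks, their free-float factors and the base market cap. It is not the exchange's actual procedure, which also handles adjustments such as corporate actions.

```python
def free_float_market_cap(constituents):
    """Sum of price x shares outstanding x free-float factor over all stocks."""
    return sum(price * shares * free_float
               for price, shares, free_float in constituents)

def index_value(constituents, base_market_cap, base_value=100):
    """Index level relative to the base-period market cap."""
    return free_float_market_cap(constituents) / base_market_cap * base_value

# Hypothetical constituents: (price, shares outstanding, free-float factor)
stocks = [
    (100.0, 1000, 0.50),
    (200.0, 500, 0.80),
    (50.0, 2000, 0.25),
]
hypothetical_base_cap = 1550.0
level = index_value(stocks, hypothetical_base_cap)
print(round(level, 2))
```

Under a total market capitalization method the free-float factor would be 1 for every stock, so closely held companies would weigh more heavily in the index.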