Correlation Final

CORRELATION
ANALYSIS
MBA A
Newish
Jashan
Jotdeep Singh
Yogesh
Introduction
Correlation is a LINEAR association between two random variables
Correlation analysis shows us how to determine both the nature and the strength of the relationship between two variables
When variables are dependent on time, correlation is applied
Correlation lies between -1 and +1
A zero correlation indicates that there is no linear relationship between the variables
A correlation of -1 indicates a perfect negative correlation
A correlation of +1 indicates a perfect positive correlation
Types of Correlation
Correlation can be classified in three ways:
Type 1: Positive, Negative, No, Perfect
Type 2: Linear, Non-linear
Type 3: Simple, Multiple, Partial
Type 1
Positive: if two related variables are such that when one increases (decreases), the other also increases (decreases)
Negative: if two variables are such that when one increases (decreases), the other decreases (increases)
No correlation: if both the variables are independent
Perfect: if all the points lie exactly on a straight line, so that r = +1 or r = -1
Type 2
Linear: when plotted on a graph, the points tend to lie on a straight line
Non-linear: when plotted on a graph, the points do not lie on a straight line
Type 3
Simple: one independent and one dependent variable
Multiple: one dependent and more than one independent variable
Partial: one dependent variable and more than one independent variable, but only one independent variable is considered and the other independent variables are held constant
Methods of Studying Correlation
Scatter Diagram Method
Karl Pearson's Coefficient of Correlation Method
Spearman's Rank Correlation Method
Correlation: Linear Relationships
Strong relationship = good linear fit
[Figure: Symptom Index plotted against dose (mg) for Drug A (very good fit) and Drug B (moderate fit).]
Points clustered closely around a line show a strong correlation. The line is a good predictor (a good fit) for the data. The more spread out the points, the weaker the correlation and the poorer the fit. The line is a REGRESSION line (Y = bX + a).
Coefficient of Correlation
A measure of the strength of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations
Represented by r
r lies between -1 and +1
Magnitude and Direction
-1 ≤ r ≤ +1
The + and - signs are used for positive and negative linear correlations, respectively

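$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$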
Shared variability of X and Y on the top (numerator)
Individual variability of X and Y on the bottom (denominator)
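As a minimal sketch (Python is an assumption, since the deck contains no code), the computational formula can be evaluated directly from the column sums. The function name `pearson_r` is hypothetical, and the sample data are the driving-experience and road-accident figures from the regression example later in this deck.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's r via the computational formula:
    r = (nΣXY - ΣXΣY) / sqrt([nΣX² - (ΣX)²][nΣY² - (ΣY)²])
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    # Numerator: shared variability of X and Y
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: individual variability of X and of Y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Driving experience (years) and road accidents from the regression example
experience = [5, 2, 12, 9, 15, 6, 25, 16]
accidents = [64, 87, 50, 71, 44, 56, 42, 60]
print(round(pearson_r(experience, accidents), 3))  # -0.768
```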
Interpreting Correlation Coefficient r
Strong correlation: r > 0.70 or r < -0.70
Moderate correlation: r is between 0.30 and 0.70, or between -0.30 and -0.70
Weak correlation: r is between 0 and 0.30, or between 0 and -0.30
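The rule of thumb above can be sketched as a small labeling function. The name `describe_strength` is hypothetical, and because the stated ranges share their end points, the sketch assumes that |r| = 0.70 and |r| = 0.30 both count as moderate.

```python
def describe_strength(r):
    """Label r with the rule of thumb (cut-offs 0.30 and 0.70 on |r|).

    Assumption: values exactly at 0.30 or 0.70 are treated as moderate.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size > 0.70:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


for value in (0.85, -0.768, 0.45, -0.12):
    print(value, describe_strength(value))
```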
Coefficient of Determination
The coefficient of determination lies between 0 and 1
Represented by r²
The coefficient of determination is a measure of how well the regression line represents the data
If the regression line passes exactly through every point on the scatter plot, it would be able to explain all of the variation
The further the line is from the points, the less it is able to explain
r² is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable
It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph
The coefficient of determination is the ratio of the explained variation to the total variation
The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength of the linear association between x and y
The coefficient of determination represents the percentage of the variation in y that is explained by the line of best fit
For example, if r = 0.922, then r² = 0.850
This means that 85% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation)
The other 15% of the total variation in y remains unexplained
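To illustrate that r² is the ratio of explained to total variation, the hypothetical function below fits a least-squares line, then divides the variation of the fitted values about the mean of y by the total variation of y. It assumes the driving-experience data from the regression example of this deck, and also checks the squaring in the example above.

```python
def coefficient_of_determination(x, y):
    """r² = explained variation / total variation for the least-squares line."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # Least-squares slope and intercept
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    mean_y = sum_y / n
    fitted = [a + b * xi for xi in x]
    explained = sum((f - mean_y) ** 2 for f in fitted)  # variation captured by the line
    total = sum((yi - mean_y) ** 2 for yi in y)         # total variation in y
    return explained / total


# Example above: squaring r = 0.922 gives roughly 0.850
print(round(0.922 ** 2, 3))  # 0.85

experience = [5, 2, 12, 9, 15, 6, 25, 16]
accidents = [64, 87, 50, 71, 44, 56, 42, 60]
print(round(coefficient_of_determination(experience, accidents), 3))  # 0.59
```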
Spearman's rank coefficient
Rank correlation is used to determine correlation when the data are not available in numerical form. The values of the two variables are converted to their ranks, and the correlation obtained from these ranks is known as rank correlation.
Computation of Rank Correlation
Spearman's rank correlation coefficient can be calculated when:
Actual ranks are given
Ranks are not given, but grades are given and are not repeated
Ranks are not given, and grades are given and are repeated
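The formula is not stated on the slide, but Spearman's coefficient is usually computed as ρ = 1 - 6Σd² / (n(n² - 1)), where d is the difference between paired ranks. The sketch below assumes that formula and, for repeated grades, the correction factor (m³ - m)/12 per group of m tied items that many business-statistics textbooks add to Σd²; tied items receive the average of the ranks they occupy. Libraries such as SciPy instead compute Pearson's r on the average ranks, which can differ slightly when ties are present. All function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank from 1 (largest value) upward; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        positions = [i + 1 for i, item in enumerate(ordered) if item == v]
        rank_of[v] = sum(positions) / len(positions)
    return [rank_of[v] for v in values]


def tie_correction(values):
    """Sum of (m³ - m) / 12 over every group of m repeated values (0 if no repeats)."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())


def spearman_rho(x, y):
    """rho = 1 - 6[Σd² + CF] / (n(n² - 1)), with CF the tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + cf) / (n * (n ** 2 - 1))


print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8 (no repeats)
print(spearman_rho([10, 20, 20, 30], [1, 2, 3, 4]))    # 0.9 (one pair repeated)
```

If actual ranks are given, they can be passed in directly: ranking both series in the same direction leaves every d² unchanged.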
BUSINESS STATISTICS
PRESENTATION
ON
REGRESSION ANALYSIS
OBJECTIVES OF THE PRESENTATION:
What is regression analysis?
Types and methods of regression analysis
Practical aspects of regression analysis, with an example
INTRODUCTION
Regression analysis is a statistical tool employed for forecasting or making estimates.
Here we use various mathematical formulas and assumptions to describe a real-world situation.
In every situation, estimation becomes easy once it is known that the variable to be estimated is related to, and dependent on, some other variable.
To make estimates, we first have to model the relationship between the variables involved.
Models can broadly be classified into:
Linear regression
Linear regression analysis is a powerful technique used for predicting the unknown value of a variable from the known value of another variable.
More precisely, if X and Y are two related variables, then linear regression analysis helps us predict the value of Y for a given value of X, or vice versa.
For example, the age of a human being and maturity are related variables. Linear regression analysis can then predict the level of maturity given the age of a human being.
Multiple regression
Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known values of two or more variables, also called the predictors.
Multiple regression analysis helps us predict the value of Y for given values of X1, X2, ..., Xk.
For example, the yield of rice per acre depends upon quality of seed, fertility of soil, fertilizer used, temperature and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique.
Dependent and Independent Variables
By linear regression, we mean models with just one independent and one dependent variable. The variable whose value is to be predicted is known as the dependent variable, and the one whose known value is used for prediction is known as the independent variable.
By multiple regression, we mean models with just one dependent and two or more independent variables. The variable whose value is to be predicted is known as the dependent variable, and the ones whose known values are used for prediction are known as independent variables.
Methods of solving regression models
1) GRAPHICAL METHOD
In the graphical method, the average relationship between the dependent variable and the independent variable is expressed by a line called the line of best fit.
Example: Experience (in years) and Income (in '000)
[Figure: income plotted against experience, with the line of best fit.]
2) ALGEBRAIC METHOD
In this method we make use of the regression equation and regression coefficients.
Regression equation (Linear)
A statistical technique used to explain or predict the behaviour of a dependent variable
The general equation is given by:
y = a + bx
where a is the intercept and b is the slope of the line
From this general equation we derive the normal equations. Summing the general equation over all N observations gives the first normal equation:
$$\sum Y = Na + b\sum X$$
Multiplying the general equation by x and then summing gives the second normal equation:
$$\sum XY = a\sum X + b\sum X^2$$
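The two normal equations form a 2 × 2 linear system in a and b, so they can be solved in closed form. A minimal sketch using Cramer's rule (the function name `fit_line` is hypothetical):

```python
def fit_line(x, y):
    """Solve the normal equations for a and b in y = a + bx.

        ΣY  = N·a + b·ΣX
        ΣXY = a·ΣX + b·ΣX²
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    det = n * sum_x2 - sum_x ** 2  # determinant of the 2 x 2 system
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    # Cramer's rule applied to the two normal equations
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0), i.e. y = 1 + 2x
```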
Regression equation (Multiple)
General equation: y = a + b1x1 + b2x2 + ... + bnxn
For two independent variables, the normal equations for multiple regression are:
$$\sum Y = Na + b_1\sum X_1 + b_2\sum X_2$$
$$\sum X_1Y = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2$$
$$\sum X_2Y = a\sum X_2 + b_1\sum X_1X_2 + b_2\sum X_2^2$$
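For two independent variables, the three normal equations form a 3 × 3 system in a, b1 and b2. The sketch below (hypothetical function names, plain Python rather than a numerical library) builds the sums and solves the system by Gaussian elimination; the small data set is made up so that y = 1 + 2x1 + 3x2 exactly.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (aug[r][size] - tail) / aug[r][r]
    return solution


def fit_two_predictors(x1, x2, y):
    """Build and solve the three normal equations for y = a + b1*x1 + b2*x2."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    matrix = [
        [n, s1, s2],     # ΣY   = N·a   + b1·ΣX1   + b2·ΣX2
        [s1, s11, s12],  # ΣX1Y = a·ΣX1 + b1·ΣX1²  + b2·ΣX1X2
        [s2, s12, s22],  # ΣX2Y = a·ΣX2 + b1·ΣX1X2 + b2·ΣX2²
    ]
    return solve_linear(matrix, [sy, s1y, s2y])


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
print([round(c, 6) for c in fit_two_predictors(x1, x2, y)])  # approximately [1.0, 2.0, 3.0]
```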
Lines of Regression
There are two lines of regression: that of Y on X and that of X on Y.
The line of regression of Y on X is given by Y = a + bX, where a and b are unknown constants known as the intercept and slope of the equation. It is used to predict the unknown value of variable Y when the value of variable X is known.
On the other hand, the line of regression of X on Y is given by X = c + dY, which is used to predict the unknown value of variable X from the known value of variable Y.
Often, only one of these lines makes sense. Which one is appropriate for the analysis at hand depends on which variable is labeled dependent and which independent in the problem being analyzed.
Regression coefficients
The coefficient of X in the line of regression of Y on X is called the regression coefficient of Y on X and is denoted by byx. It represents the change in the value of the dependent variable (Y) corresponding to a unit change in the value of the independent variable (X).
Similarly, the coefficient of Y in the line of regression of X on Y is called the regression coefficient of X on Y and is denoted by bxy.
The formulas for the two regression coefficients are:
$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2}$$
$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2}$$
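Both coefficients share the same numerator, so they can be computed together. The sketch below (hypothetical function name) applies them to the driving-experience data from the worked example. Their product equals r², and r itself takes the common sign of byx and bxy.

```python
def regression_coefficients(x, y):
    """Return (b_yx, b_xy) from the two formulas for the regression coefficients."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    shared = n * sum_xy - sum_x * sum_y           # common numerator
    b_yx = shared / (n * sum_x2 - sum_x ** 2)     # change in Y per unit change in X
    b_xy = shared / (n * sum_y2 - sum_y ** 2)     # change in X per unit change in Y
    return b_yx, b_xy


experience = [5, 2, 12, 9, 15, 6, 25, 16]
accidents = [64, 87, 50, 71, 44, 56, 42, 60]
b_yx, b_xy = regression_coefficients(experience, accidents)
print(round(b_yx, 4), round(b_xy, 4))  # -1.5476 -0.3811
print(round(b_yx * b_xy, 4))           # 0.5897, which equals r²
```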
How Good Is the Regression?

Once a regression equation has been constructed, we can check how good it is by examining the coefficient of determination (R²).
R² always lies between 0 and 1.
The closer R² is to 1, the better the model and its predictions.
PRACTICAL ASPECT OF REGRESSION ANALYSIS
Here we show a linear regression analysis between two variables, X and Y.
Variable X is driving experience (in years) and variable Y is the number of road accidents (in a year).
The number of road accidents is taken as the dependent variable, which is related to the independent variable X, driving experience.
X (driving experience, years):   5   2  12   9  15   6  25  16
Y (no. of road accidents):      64  87  50  71  44  56  42  60
From the data we will find:
1. The estimated regression line for the data.
2. The number of road accidents expected when driving experience is 10 years and 30 years.
3. The coefficient of determination (R²), which tells us what percentage of the variation in the dependent variable is explained by the independent variable.
The following is a tabular representation of the data on driving experience and number of road accidents.
X    Y    X·Y    X²     Y²
5    64   320    25     4096
2    87   174    4      7569
12   50   600    144    2500
9    71   639    81     5041
15   44   660    225    1936
6    56   336    36     3136
25   42   1050   625    1764
16   60   960    256    3600
ΣX = 90   ΣY = 474   ΣXY = 4739   ΣX² = 1396   ΣY² = 29642
Since the estimated regression line is given by Y = a + bX, we use the normal equations to calculate the values of a and b:
$$\sum Y = Na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Substituting N = 8 and the column totals:
8a + 90b = 474   (Eq. 1)
90a + 1396b = 4739   (Eq. 2)
Multiplying Eq. 1 by 90 and Eq. 2 by 8 and subtracting eliminates a, and solving both equations gives:
Value of b = (8 × 4739 - 90 × 474) / (8 × 1396 - 90²) = -4748 / 3068 = -1.5476
Value of a = (474 - 90b) / 8 = 76.66
The estimated regression line is
Y = 76.66 - 1.5476X
[Figure: trend line for Y = 76.66 - 1.5476X, plotting No. of accidents against driving experience.]
Road accidents depend on driving experience: a new driver is inexperienced and faces a higher risk of accidents. There is therefore a negative relationship between the two variables, and the trend line slopes downward in this case.
From the values of a and b: a = 76.66 means that for a driver with 0 years of experience, the estimated number of road accidents is 76.66.
b = -1.5476 means that for every extra year of driving experience, the estimated number of road accidents decreases by 1.5476.
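No. of accidents with 10 years' experience:
Y = 76.66 - 1.5476X
Y = 76.66 - 1.5476(10)
Y = 61.184
No. of accidents with 30 years' experience:
Y = 76.66 - 1.5476X
Y = 76.66 - 1.5476(30)
Y = 30.232
Since the observed experience values range only from 2 to 25 years, the 30-year figure is an extrapolation beyond the data.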
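As a check on the arithmetic, the normal equations of this example can be solved with exact fractions before any rounding. The sketch below uses Python's standard `fractions` module; the helper name `predict` is hypothetical.

```python
from fractions import Fraction

# Normal equations from the worked example:
#    8a +   90b =  474   (Eq. 1)
#   90a + 1396b = 4739   (Eq. 2)
n, sum_x, sum_y = 8, 90, 474
sum_xy, sum_x2 = 4739, 1396

det = Fraction(n * sum_x2 - sum_x ** 2)       # 3068
b = (n * sum_xy - sum_x * sum_y) / det        # exact slope, -4748/3068
a = (sum_y - b * sum_x) / n                   # exact intercept


def predict(years):
    """Estimated number of road accidents from Y = a + bX (hypothetical helper)."""
    return a + b * years


print(round(float(a), 2), round(float(b), 4))  # 76.66 -1.5476
for years in (10, 30):
    print(years, round(float(predict(years)), 3))
```

Carrying exact fractions gives b = -1187/767 and predictions of about 61.184 for 10 years and 30.233 for 30 years; the slide's 30.232 comes from using the rounded coefficients 76.66 and 1.5476.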
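Now we find the coefficient of determination for the data using the regression coefficients.
$$b_{yx} = \frac{N\sum XY - \sum X \sum Y}{N\sum X^2 - \left(\sum X\right)^2} = \frac{8(4739) - 90(474)}{8(1396) - (90)^2} = \frac{-4748}{3068} = -1.547$$
$$b_{xy} = \frac{N\sum XY - \sum X \sum Y}{N\sum Y^2 - \left(\sum Y\right)^2} = \frac{8(4739) - 90(474)}{8(29642) - (474)^2} = \frac{-4748}{12460} = -0.381$$
Now
$$R^2 = b_{yx} \cdot b_{xy} = (-1.547)(-0.381) = 0.5894$$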
From the above coefficient of determination we can say that almost 59% of the variance of the dependent variable is explained by the independent variable.
Conceptual Framework of SENSEX and Nifty
Stock Market Indices
Stock market performance is quantified by calculating an index using benchmark scrips. As is well known, the SENSEX (Sensitive Index) is associated with the Bombay Stock Exchange and the S&P CNX NIFTY is associated with the National Stock Exchange.
Bombay Stock Exchange
There are 23 stock exchanges in India. The Bombay Stock Exchange is the largest, with over 6,000 stocks listed. The BSE accounts for over two thirds of the total trading volume in the country.
Established in 1875, the exchange is also the oldest in Asia. Among the twenty-two stock exchanges recognized by the Government of India under the Securities Contracts (Regulation) Act, 1956, it was the first one to be recognized, and it is the only one that had the privilege of getting permanent recognition.
Scrips at BSE
o ACC
o AIRTEL
o BHEL
o DLF
o GRASIM
o GUJARAT AMBUJA
o HDFC
o HDFC BANK
o HINDALCO
o HUL
o ICICI BANK
o INFOSYS
o SUN PHARMA IND. LTD
o ITC
o L&T
o MARUTI
o MAHINDRA & MAHINDRA
o NTPC
o ONGC
o RANBAXY
o RELIANCE COMMUNICATIONS
o RELIANCE INFRASTRUCTURE
o RIL
o STERLITE INDUSTRIES LTD
o SBI
o TCS
o TATA MOTORS
o TATA STEEL
o TATA POWER COMPANY LTD
o WIPRO
National Stock Exchange
The National Stock Exchange (NSE), located in Bombay, is India's first debt market.
It was set up in 1993 to encourage stock exchange reform through system modernization and competition.
The instruments traded are treasury bills, government securities and bonds issued by public sector companies.
How are the SENSEX 30 stocks selected?
They are selected based on:
Listing history
Trading frequency
Rank based on market cap (should be among the top 100)
Market capitalization weight
Industry/sector they belong to
Historical record
Methodology of SENSEX
The SENSEX has been calculated since 1986. Initially it was calculated using the total market capitalization methodology; in 2003 the methodology was changed to free-float market capitalization.
Hence, today the SENSEX is based on the free-float market capitalization of the 30 SENSEX stocks traded on the BSE, relative to a base value of 100 (1978-79), and it is calculated every 15 seconds.
SENSEX is calculated using the "Free-float Market Capitalization" methodology, wherein the level of the index at any point of time reflects the free-float market value of 30 component stocks relative to a base period.
The market capitalization of a company is determined by multiplying the price of its stock by the number of shares issued by the company.
This market capitalization is further multiplied by the free-float factor to determine the free-float market capitalization.
How is the SENSEX calculated?
The formula for calculating the SENSEX is:
SENSEX = (Sum of free-float market cap of 30 benchmark stocks) × Index Factor
where
Index Factor = 100 / Market Cap Value in 1978-79
and 100 is the index value in 1978-79.
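A minimal sketch of this formula, assuming a made-up two-stock basket rather than the actual 30 SENSEX constituents. It multiplies price by shares issued and by the free-float factor, sums the results, and scales by the index factor (base value / base-period market cap). Real index maintenance, such as adjusting the base for corporate actions or changes in the constituent list, is not modeled, and all names and numbers here are hypothetical.

```python
def free_float_market_cap(price, shares_issued, free_float_factor):
    """Market cap (price x shares issued) scaled by the free-float factor."""
    return price * shares_issued * free_float_factor


def index_level(stocks, base_market_cap, base_value=100.0):
    """Index = (sum of free-float market caps) x Index Factor,
    where Index Factor = base_value / base_market_cap."""
    total = sum(free_float_market_cap(*stock) for stock in stocks)
    index_factor = base_value / base_market_cap
    return total * index_factor


# Hypothetical basket: (price, shares issued, free-float factor)
basket = [(100.0, 1_000, 0.5), (200.0, 500, 0.4)]
print(index_level(basket, base_market_cap=45_000.0))  # about 200
```

For the NIFTY, the same function would be called with base_value=1000 and the 1995 base-period market capitalization.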
How is the NIFTY calculated?
The National Stock Exchange (NSE) is associated with the NIFTY, which is calculated by the same methodology but with two key differences:
1. The base year is 1995 and the base value is 1000.
2. The NIFTY is calculated based on 50 stocks.
Formulae for valuation
$$\text{SENSEX} = \frac{\text{Free-float market capital}}{\text{Market capital in 1978-79}} \times \text{Base index points of 1978-79}$$
