
CORRELATION

AND LINEAR
REGRESSION
BHAVYA BHUVNESH
BHAVYA VAID
BHUPENDER SINGH
DIVYA RAMACHANDRAN
GAURAV KASHYAP
CORRELATION ANALYSIS

 "An analysis of the relation of two or more variables is usually called correlation." – A. M. Tuttle
Significance

 The advantages of measuring association (or correlation) between two or more variables are as under:
1. Aids in locating the critically important variables on which others depend.
2. Reduces the range of uncertainty of our predictions. Predictions based on correlation analysis are more reliable and closer to reality.
3. In economic theory we come across several types of variables which show some kind of relationship; for example, there exists a relationship between price, supply, and quantity demanded.
4. Convenience, amenities, and service are related to customer satisfaction.
5. In the area of health care, it shows how health problems are related to certain biological or environmental factors.
 A statistical technique used to analyze the strength (magnitude) and direction of the relationship between two quantitative variables is called correlation analysis.
 Coefficient of correlation – a number that indicates the strength and direction of the statistical relationship between two variables.
1. r – both the x and y variables are measured on an interval or ratio scale (numeric data).
 The correlation between two ratio-scale variables is represented by the letter "r", which takes on values between -1 and +1 only. This measure is sometimes called the Pearson product moment correlation or correlation coefficient.
Methods of correlation analysis
 Methods of finding the correlation coefficient between two variables X and Y:
1. Scatter diagram method
2. Karl Pearson's coefficient of correlation method
Question
The following data relate to the age of employees and the number of days they reported sick in a month. Calculate Karl Pearson's coefficient of correlation and interpret it.

Employee   Age   Sick days
1          30    1
2          32    0
3          35    2
4          40    5
5          48    2
6          50    4
7          52    6
8          55    5
9          57    7
10         61    8
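The calculation above can be sketched in code. This is a minimal illustration (the helper name pearson_r is ours, not from the slides), using the deviation form r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²) on the data in the table:

```python
import math

ages = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

def pearson_r(x, y):
    """Karl Pearson's product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))   # sum of products of deviations
    sxx = sum(a * a for a in dx)               # sum of squared x deviations
    syy = sum(b * b for b in dy)               # sum of squared y deviations
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(ages, sick_days)
print(round(r, 3))  # r ≈ 0.870
```

Since r ≈ 0.87 is close to +1, there is a strong positive correlation: older employees in this sample tend to report more sick days.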
Spearman's Rank correlation coefficient
 This method is applied in situations in which quantitative measures of certain qualitative factors such as judgment, brand personality, TV programmes, leadership, colour, or taste cannot be fixed, but individual observations can be arranged into a definite order (also called rank).

R = 1 - (6 Σd²) / (n(n² - 1))

where R = rank correlation coefficient, R1 = rank of observations with respect to the first variable, R2 = rank of observations with respect to the second variable, d = R1 - R2, the difference in a pair of ranks, and n = number of paired observations or individuals being ranked.
Advantages

 This method is easy to understand and its application is simpler than Pearson's method.
 This method is useful for correlation analysis when variables are expressed in qualitative terms like beauty, intelligence, honesty, efficiency, and so on.
 This method is appropriate for measuring the association between two variables when the data are at least ordinal scaled (ranked).
 The sample values of the two variables are converted into ranks, in either ascending or descending order, for calculating the degree of correlation between the two variables.
Question
An office has 12 clerks. The long-serving clerks feel that they should have a seniority increment, based on length of service, built into their salary structure. An assessment of their efficiency by their departmental manager and the personnel department produces a ranking of efficiency.

Rank a/c to length of service   Rank a/c to efficiency
1                               2
2                               3
3                               5
4                               1
5                               9
6                               10
7                               11
8                               12
9                               8
10                              7
11                              6
12                              4
Do the data support the clerks' claim for a seniority increment?
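A minimal sketch of Spearman's formula applied to the ranking above (assuming no tied ranks, so the simple d² formula applies; the function name is illustrative):

```python
service_rank = list(range(1, 13))  # ranks 1..12 by length of service
efficiency_rank = [2, 3, 5, 1, 9, 10, 11, 12, 8, 7, 6, 4]

def spearman_rho(r1, r2):
    """Spearman's rank correlation: R = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

rho = spearman_rho(service_rank, efficiency_rank)
print(round(rho, 3))  # R ≈ 0.378
```

With R ≈ 0.38, the association between length of service and efficiency is positive but weak, so the data give only weak support to the clerks' claim.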
REGRESSION ANALYSIS
 It reveals the average relationship between two or more variables and provides a mechanism for prediction or forecasting.
 There are two types of variables: independent and dependent.
 This is a linear relationship of the form Y = a + bX.
DIFFERENCE BETWEEN CORRELATION AND REGRESSION

1. In correlation, the degree and direction of the relationship between variables is studied; in regression, the nature of the relationship is studied.
2. In correlation, if the value of one variable is known, the value of the other variable cannot be estimated; in regression, this can be done using the functional relationship.
3. The correlation coefficient lies between -1 and +1; of the two regression coefficients, only one can be greater than 1.
4. The correlation coefficient is independent of change of origin and scale; a regression coefficient is independent of change of origin but not of scale.
WHAT ARE ITS USES?
 Estimating the values of the dependent variable from the values of the independent variable through the regression line.
 Obtaining a measure of the error involved in using the regression line for estimation.
 Calculating the correlation coefficient with the help of the regression coefficients.

REGRESSION LINES
If we take two variables X and Y, we have two regression lines: the regression line of X on Y and the regression line of Y on X.
 The regression line of Y on X gives the most probable values of Y for given values of X, in the form Y = a + bX.
 The regression line of X on Y gives the most probable values of X for given values of Y, in the form X = a + bY.
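The two lines can be illustrated in code. This sketch reuses the age/sick-days data from the earlier question (the helper name fit_line is ours) and fits each line by least squares:

```python
x = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]  # age (X)
y = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]            # sick days (Y)

def fit_line(u, v):
    """Least-squares intercept and slope for regressing v on u."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    b = sum((a - mu) * (c - mv) for a, c in zip(u, v)) \
        / sum((a - mu) ** 2 for a in u)
    return mv - b * mu, b  # (intercept a, slope b)

a_yx, b_yx = fit_line(x, y)  # regression line of Y on X: Y = a_yx + b_yx * X
a_xy, b_xy = fit_line(y, x)  # regression line of X on Y: X = a_xy + b_xy * Y
print(round(b_yx, 4), round(b_xy, 4))  # the two slopes differ in general
```

Note that the two lines are generally different: each minimizes deviations in its own dependent variable, so they coincide only when the correlation is perfect (r = ±1).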
TYPES OF REGRESSION MODELS
SIMPLE AND MULTIPLE REGRESSION MODELS
SIMPLE: If a regression model characterizes a dependent variable Y and only one independent variable X, then it is a simple regression model.
MULTIPLE: If more than one independent variable is associated with the dependent variable, then it is a multiple regression model.

LINEAR AND NON-LINEAR REGRESSION MODELS

LINEAR: If the value of the dependent variable Y changes in direct proportion to a change in the value of the independent variable X, then it is a linear model. It is expressed in the deterministic form
Y = β0 + β1X
or, with a random error term,
Y = β0 + β1X + e.
It is represented by a straight line.
NONLINEAR: Implies a varying absolute change in the dependent variable with respect to changes in the independent variable, for example in the form Y = a + bX². The line passing through the pairs of values of the variables is curvilinear.
ESTIMATION:
METHOD OF LEAST SQUARES
 To estimate the values of the regression coefficients β0 and β1, suppose a sample of observations (X1, Y1), ..., (Xn, Yn) is drawn from the population under study.
 The method that provides the best linear unbiased estimates of β0 and β1 should result in the straight line that best fits the data points.
 The straight line so drawn is referred to as the best-fitted regression line because the sum of the squares of the vertical deviations is as small as possible.
 Hence the best-fitted or estimated regression line is given by:

ŷ = b0 + b1x

where ŷ, called "y hat", is the value lying on the fitted regression line for a given x value, and eᵢ = yᵢ - ŷᵢ is called the residual, which describes the error in fitting the regression line to the observation yᵢ. The fitted value ŷ is called the predicted value of y because, if the actual value of y is not known, it would be predicted for a given value of x using the estimated regression line.
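The least-squares estimates can be written as b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A short sketch using the age/sick-days data from the earlier question as an illustration:

```python
x = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]  # age
y = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]            # sick days

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# slope b1 minimizes the sum of squared vertical deviations
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
     / sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x  # intercept: the line passes through (x̄, ȳ)

# residuals e_i = y_i - yhat_i; least squares makes sum(e_i ** 2) minimal
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 2), round(b1, 3))  # b0 ≈ -5.69, b1 ≈ 0.211
```

A useful sanity check of the fit: with an intercept in the model, the residuals always sum to zero, which is exactly the "vertical deviations balance out" property of the least-squares line.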
Assumptions for a simple linear
regression model

1. The relationship between the dependent variable y and the independent variable x exists and is linear. The average relationship between x and y can be described by a simple linear regression equation y = a + bx + e.
2. For every value of the independent variable x, there is an expected or mean value of the dependent variable y, and these values are normally distributed. The means of these normally distributed values fall on the line of regression.
3. The dependent variable y is a continuous random variable, whereas the values of the independent variable x are fixed values and are not random.
4. The sampling error associated with the expected value of the dependent variable y is assumed to be an independent random variable, distributed normally with mean zero and constant standard deviation.
5. The standard deviation and variance of the expected values of the dependent variable y about the regression line are constant for all values of the independent variable x within the range of the sample data.
6. The value of the dependent variable cannot be estimated for a value of the independent variable lying outside the range of values in the sample data.
Parameters of simple linear
regression model
 The device used for estimating the values of one variable from the values of the other consists of a line through the points, drawn in such a manner as to represent the average relationship between the two variables. Such a line is called the line of regression.

 The two variables x and y, when expressed in terms of each other in the form of straight-line equations, give what are called the regression equations.
Regression Coefficients

 To estimate the values of the population parameters β0 and β1, under certain assumptions, the fitted or estimated equation representing the straight-line regression model is written as:

ŷ = a + bx
where ŷ = estimated average value of the dependent variable y for a given value of the independent variable x,
a (or b0) = y-intercept, which represents the average value of ŷ when x = 0,
b = slope of the regression line, which represents the expected change in the value of y for a unit change in the value of x.
Properties of regression
coefficients
1. The correlation coefficient is the geometric mean of the two regression coefficients, that is, r = ±√(byx × bxy).
2. If one regression coefficient is greater than one, then the other regression coefficient must be less than one, because the value of the correlation coefficient r cannot exceed one.
3. Both regression coefficients must have the same sign (either positive or negative). This property rules out the case of opposite signs of the two regression coefficients.
4. The correlation coefficient will have the same sign (either positive or negative) as the two regression coefficients.
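Property 1 can be checked numerically. This sketch (reusing the age/sick-days data from the earlier question; the helper name slope is ours) computes both regression coefficients and takes their geometric mean:

```python
import math

x = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]  # age
y = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]            # sick days

def slope(u, v):
    """Regression coefficient of v on u: sum((u-ū)(v-v̄)) / sum((u-ū)²)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (c - mv) for a, c in zip(u, v)) \
           / sum((a - mu) ** 2 for a in u)

b_yx = slope(x, y)  # regression coefficient of y on x
b_xy = slope(y, x)  # regression coefficient of x on y

# Property 1: r is the geometric mean of the two regression coefficients,
# with the sign shared by b_yx and b_xy (property 3 guarantees they agree).
r = math.sqrt(b_yx * b_xy)
print(round(r, 3))  # matches Pearson's r ≈ 0.870 computed directly
```

Here b_yx ≈ 0.211 and b_xy ≈ 3.594: one coefficient exceeds one, so the other must be below one (property 2), and their product b_yx × b_xy = r² never exceeds one.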
METHODS TO DETERMINE
REGRESSION COEFFICIENTS

 Least Squares Normal Equation

 Deviation Method

 Regression coefficients for grouped sample data
