
CORRELATION AND REGRESSION ANALYSIS


SUBJECT: Analytical and Numerical Methods for Structural Engineers (ANSE), 3712013

Prepared by: PIUS NYANZI (1807020006), Student, M.E. Civil (Structural)
Presented to: Dr Subhanshu Goyal, Head, Dept. of Mathematics (MEFGI)

MEFGI - GTU 10/12/2018


CONTENT

▪ Introduction
▪ Scatter diagrams
▪ Correlation analysis
o Pearson correlation coefficient with example
o Spearman rank correlation coefficient with example
o Kendall’s rank correlation coefficient with example
o Differences between Spearman and Kendall’s tau
▪ Regression Analysis
o Regression (curve fitting)
o Methods of regression
o Multiple regression model
▪ Some statistical software packages for correlation and regression analysis
▪ Conclusion
CORRELATION AND REGRESSION – Introduction

▪ Scientists and engineers often need to estimate the value of a dependent variable y at an intermediate value of the independent variable x, given discrete data points (x, y).

The available data fall into two main categories:

1. Values of well-defined functions, e.g. log tables, trigonometric tables, interest tables.
2. Data values from experiments, e.g. the relationship between stress and strain in a metal strip, between the voltage applied and the speed of a fan, or between the drag force and the velocity of a falling body. Here the relationship is not known in advance as a well-defined function.


SCATTER DIAGRAMS

▪ A scatter diagram plots the paired values of two variables X and Y as points, showing how the two variables relate to each other.


Scatter diagrams – Example: steel bar

Temp. x (°C):    67   69   85   83   74   81   97   97  114   85
Length y (mm):  120  125  140  160  130  180  150  140  200  130

[Figure: scatter diagram of bar length y (mm) against temperature x (°C)]
CORRELATION
▪ Correlation is a bivariate analysis that measures the strength of the relationship (association) between two variables and the direction of that relationship.

▪ It is used to find the relationship between two quantitative variables.

▪ Correlation coefficient: a statistic showing the degree of relation between two variables.


Correlation coefficient

▪ In terms of the strength of the relationship, the value of the correlation coefficient varies between −1 and +1.
▪ The direction of the relationship is indicated by the sign of the coefficient: a + sign indicates a positive relationship and a − sign indicates a negative relationship.
▪ Three commonly used correlation coefficients in statistics are:

i. Pearson correlation
ii. Spearman correlation
iii. Kendall rank correlation


Pearson correlation (r)

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$

▪ The value of r ranges between −1 and +1.

▪ The magnitude of r denotes the strength of the relationship; the sign denotes its direction.


Pearson correlation (r) – interpretation

[Figure: scale of r from −1 (perfect indirect correlation) through 0 (no relation) to +1 (perfect direct correlation), with strong, intermediate and weak bands separated at |r| = 0.75 and |r| = 0.25]

If r = 0, there is no linear association (correlation) between the two variables.

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
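The interpretation bands above can be written as a small helper. This is a minimal sketch: the function name `describe_r` is hypothetical, and applying the same bands to |r| for negative values is an assumption read from the symmetric scale.

```python
def describe_r(r: float) -> str:
    """Label a correlation coefficient with the strength bands on this slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no correlation"
    size = abs(r)  # assumption: bands are applied to |r|; the sign gives direction
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1.0:
        strength = "strong"
    else:
        strength = "perfect"
    direction = "direct" if r > 0 else "indirect"
    return f"{strength} {direction} correlation"


print(describe_r(0.759))  # strong direct correlation
print(describe_r(-0.14))  # weak indirect correlation
```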
Example 1 – Pearson correlation
A sample of 6 concrete cubes was selected, and their age in days and strength in N/mm2 were recorded as shown in the following table. Find the correlation between age and strength.

Serial no.   Age (days)   Strength (N/mm2)
1            7            12
2            6            8
3            8            12
4            5            10
5            6            11
6            9            13
Example 1 – Pearson correlation

▪ Independent variable (x) – Age
▪ Dependent variable (y) – Strength
▪ Simple correlation coefficient: Pearson's r


• Pearson correlation coefficient

Serial no.   Age x (days)   Strength y (N/mm2)   xy          x²          y²
1            7              12                   84          49          144
2            6              8                    48          36          64
3            8              12                   96          64          144
4            5              10                   50          25          100
5            6              11                   66          36          121
6            9              13                   117         81          169
Total        ∑x = 41        ∑y = 66              ∑xy = 461   ∑x² = 291   ∑y² = 742
Example 1 – Pearson correlation

$$ r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{41^2}{6}\right)\left(742 - \dfrac{66^2}{6}\right)}} = \frac{10}{\sqrt{10.833 \times 16}} \approx 0.76 $$

• r ≈ 0.76 (strong direct correlation)

Interpretation
• There is a strong positive correlation between the age of the concrete cubes (in days) and their strength, since r lies in the strong band (0.75 ≤ r < 1).
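As a check on the hand calculation, the computational formula can be evaluated directly on the Example 1 data. A minimal pure-Python sketch; the function name `pearson_r` is hypothetical, and in practice a library routine such as `scipy.stats.pearsonr` would normally be used instead.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r via the computational (sum) formula used on the slides."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, n >= 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = sxy - sx * sy / n
    # The denominator is zero if either variable is constant.
    den = sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
    return num / den


age = [7, 6, 8, 5, 6, 9]            # x, days
strength = [12, 8, 12, 10, 11, 13]  # y, N/mm2
r = pearson_r(age, strength)
print(round(r, 3))  # 0.76
```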
Spearman correlation coefficient (rs)
• It is a non-parametric measure of correlation that uses the two sets of ranks assigned to the variables:

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

where di is the difference between the two ranks of the i-th pair and n is the number of pairs.
• The Spearman rank correlation coefficient can be computed in the following cases:
I. Both variables are quantitative.
II. Both variables are qualitative ordinal.
III. One variable is quantitative and the other is qualitative ordinal.
Spearman correlation coefficient – Procedure

▪ Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample. Tied values receive the average of the ranks they would occupy.
▪ Rank the values of Y from 1 to n in the same way.
▪ Compute di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
▪ Square each di and compute ∑di², the sum of the squared values.
▪ Substitute ∑di² and n into the formula for rs.


Example 2 – Spearman correlation coefficient

In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
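Example 2 – Spearman correlation coefficient

Education is ranked from the highest level (University) to the lowest (Illiterate), and income from the highest (60) to the lowest (8); tied values share the average rank.

Sample   Education (X)   Income (Y)   Rank X   Rank Y   di     di²
A        Preparatory     25           5        3        2      4
B        Primary         10           6        5.5      0.5    0.25
C        University      8            1.5      7        −5.5   30.25
D        Secondary       10           3.5      5.5      −2     4
E        Secondary       15           3.5      4        −0.5   0.25
F        Illiterate      50           7        2        5      25
G        University      60           1.5      1        0.5    0.25
                                                        ∑di² = 64

$$ r_s = 1 - \frac{6 \times 64}{7(7^2 - 1)} = 1 - \frac{384}{336} \approx -0.14 $$

rs ≈ −0.14: a weak negative (indirect) correlation.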
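The ranking and the rs formula can be reproduced in a few lines. A minimal sketch: the names `average_ranks` and `spearman_rs`, and the numeric codes used to order the education levels (Illiterate < Primary < Preparatory < Secondary < University, matching the ranks in the table), are assumptions for illustration. Like the slide, it applies the shortcut formula, which ignores the correction for tied ranks.

```python
def average_ranks(values, descending=True):
    """Rank values 1..n (largest first by default); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(rank_x, rank_y):
    """Shortcut formula rs = 1 - 6*sum(d^2) / (n(n^2 - 1)); no tie correction."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ordinal codes: a larger number means a higher level of education.
LEVEL = {"Illiterate": 0, "Primary": 1, "Preparatory": 2, "Secondary": 3, "University": 4}
education = ["Preparatory", "Primary", "University", "Secondary",
             "Secondary", "Illiterate", "University"]  # samples A..G
income = [25, 10, 8, 10, 15, 50, 60]

rank_x = average_ranks([LEVEL[e] for e in education])
rank_y = average_ranks(income)
print(rank_x)  # [5.0, 6.0, 1.5, 3.5, 3.5, 7.0, 1.5]
print(rank_y)  # [3.0, 5.5, 7.0, 5.5, 4.0, 2.0, 1.0]
print(round(spearman_rs(rank_x, rank_y), 2))  # -0.14
```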
Kendall rank correlation coefficient, tau
• Kendall rank correlation is a non-parametric test that measures the degree of concordance between two columns of ranked data.

• Like r and rs, it ranges from −1.0 to +1.0.

• A pair of observations is concordant if the observation with the higher rank on X also has the higher rank on Y, and discordant if it has the lower rank on Y.

• Kendall's tau = (C − D) / (C + D)
C – number of concordant pairs
D – number of discordant pairs

• Compared with the Spearman coefficient, Kendall's tau reflects the strength of dependence more directly: among pairs not tied on either variable, it is the proportion of concordant pairs minus the proportion of discordant pairs.
Example 3 – Kendall's tau

Using the ranks from Example 2, the samples are ordered by Rank X. For each row, C counts the later rows with a higher Rank Y and D counts the later rows with a lower Rank Y; pairs tied on either rank are counted in neither.

Sample   Education (X)   Income (Y)   Rank X   Rank Y   C    D
C        University      8            1.5      7        0    5
G        University      60           1.5      1        5    0
D        Secondary       10           3.5      5.5      0    2
E        Secondary       15           3.5      4        1    2
A        Preparatory     25           5        3        1    1
B        Primary         10           6        5.5      0    1
F        Illiterate      50           7        2        –    –
Total                                                   7    11

tau = (C − D) / (C + D) = (7 − 11) / (7 + 11) ≈ −0.22 (negative, weak relationship)

Spearman, for comparison: rs ≈ −0.14
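The pair counting can be checked by comparing every pair of samples exactly once. A minimal sketch: `kendall_tau` is a hypothetical helper implementing the (C − D)/(C + D) form used on these slides, with pairs tied on either rank counted as neither concordant nor discordant. Library routines such as `scipy.stats.kendalltau` default to the tie-adjusted tau-b, so they can report a slightly different value for tied data.

```python
from itertools import combinations


def kendall_tau(rank_x, rank_y):
    """Return (C, D, tau) with tau = (C - D) / (C + D); tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(rank_x)), 2):
        # Positive product: both rankings order the pair the same way.
        sign = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie on X or Y: the pair is skipped.
    if concordant + discordant == 0:
        raise ValueError("every pair is tied; tau is undefined")
    return concordant, discordant, (concordant - discordant) / (concordant + discordant)


rank_x = [5, 6, 1.5, 3.5, 3.5, 7, 1.5]  # samples A..G
rank_y = [3, 5.5, 7, 5.5, 4, 2, 1]
c, d, tau = kendall_tau(rank_x, rank_y)
print(c, d, round(tau, 2))  # 7 11 -0.22
```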
Pearson vs Spearman rs vs Kendall's tau
▪ Pearson's r is a parametric statistic; Spearman's rs and Kendall's tau are non-parametric statistics.

▪ In most cases rs is larger than tau in absolute value, although this is not guaranteed for every data set.

▪ When their assumptions hold (e.g. a linear relationship and approximately normally distributed variables), parametric methods produce more accurate and precise estimates than non-parametric methods.
Regression Analysis
▪ Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable (y) and one or more independent (predictor) variables (x).

▪ The technique is used for forecasting and for studying cause-effect relationships between the variables.

▪ For example:
1) the relationship between the strength of concrete and the number of curing days;
2) the relationship between the strength of a road subgrade and lime content, ground temperature and delay in compaction.


Methods of regression
1. Graphical methods
2. Method of group averages
3. Method of moments
4. Method of least squares

▪ The graphical method and the method of group averages do not determine the unknown constants uniquely or accurately, because the result depends on how the line is drawn or how the data are grouped; the other methods do.
▪ The method of least squares is the most widely used way to fit a unique curve to given data, and it is easily implemented on a computer.


Graphical methods



Method of group averages

Method of group averages – Example

Fitted straight line r = a + bt:  r = 1090.26 − 0.534t
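The slide gives only the fitted line, not the working. The sketch below follows one common textbook form of the method, which is an assumption here: sort the data by x, split it into two groups, and require the residuals in each group to sum to zero, which forces the line through the mean point of each group. The name `group_average_line` and the half-and-half split are illustrative choices; since the data for the r = a + bt example are not available, the usage applies the method to the steel-bar data from the scatter diagram slide.

```python
def group_average_line(xs, ys):
    """Fit y = a + bx by group averages: two groups, split at the middle of the sorted x."""
    pairs = sorted(zip(xs, ys))  # order the observations by x
    half = len(pairs) // 2
    groups = (pairs[:half], pairs[half:])
    # Mean point (x-bar, y-bar) of each group; the fitted line passes through both.
    (x1, y1), (x2, y2) = [
        (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) for g in groups
    ]
    if x1 == x2:
        raise ValueError("group means of x coincide; slope is undefined")
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


temp = [67, 69, 85, 83, 74, 81, 97, 97, 114, 85]             # x, °C
length = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]  # y, mm
a, b = group_average_line(temp, length)
print(f"y = {a:.2f} + {b:.4f}x")
```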


Method of moments

Method of moments – Example


Method of least squares

• We need to minimise the sum of the squares of the errors, where the error at a data point (xi, yi) is its vertical distance from the fitted curve y = f(x):

$$ E = \sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2 $$
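For the straight line f(x) = a + bx, write the sum of squared errors as E(a, b) and set both partial derivatives to zero; rearranging gives the two normal equations used in linear regression:

$$
\begin{aligned}
E(a,b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial E}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \;\Rightarrow\; \sum y_i = n a + b \sum x_i \\
\frac{\partial E}{\partial b} &= -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0 \;\Rightarrow\; \sum x_i y_i = a \sum x_i + b \sum x_i^2
\end{aligned}
$$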


Method of least squares (MLS)

▪ The aim is to minimise the sum of the squares of the errors.

MLS can be used to fit the data in the following situations:
1. The relationship is linear: y = f(x) = a + bx
2. The relationship is a polynomial: f(x) = a + bx + cx²
3. The relationship is transcendental, e.g. f(x) = ae^(bx)
4. Multiple linear regression


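Method of least squares (MLS) – linear regression

For the linear relationship y = f(x) = a + bx, minimising the sum of the squares of the errors gives the normal equations:

$$ \sum y = na + b \sum x \qquad \text{(1)} $$

$$ \sum xy = a \sum x + b \sum x^2 \qquad \text{(2)} $$

Solving (1) and (2) simultaneously gives a and b.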
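Solving the two normal equations in closed form gives b = (n∑xy − ∑x∑y)/(n∑x² − (∑x)²) and a = (∑y − b∑x)/n. A minimal sketch with the hypothetical helper `fit_line`; the usage reuses the age and strength data of Example 1 purely for illustration (the slides do not fit a line to it).

```python
def fit_line(xs, ys):
    """Least-squares line y = a + bx from normal equations (1) and (2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denom = n * sxx - sx ** 2  # zero if all x values are equal
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b


age = [7, 6, 8, 5, 6, 9]            # x, days
strength = [12, 8, 12, 10, 11, 13]  # y, N/mm2
a, b = fit_line(age, strength)
print(f"y = {a:.3f} + {b:.3f}x")  # y = 4.692 + 0.923x
```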




Method of least squares (MLS) – polynomial relationship (second order) – Example

y = a1 + a2x + a3x²

The normal equations are:

$$ \sum y = n a_1 + a_2 \sum x + a_3 \sum x^2 $$

$$ \sum xy = a_1 \sum x + a_2 \sum x^2 + a_3 \sum x^3 $$

$$ \sum x^2 y = a_1 \sum x^2 + a_2 \sum x^3 + a_3 \sum x^4 $$
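The three normal equations form a 3 × 3 linear system in a1, a2, a3. A minimal sketch that builds the sums and solves the system by Cramer's rule, which is adequate for this small case; the helper names `det3` and `fit_quadratic` are hypothetical. The usage checks it on data generated exactly from y = 1 + 2x + 3x², not on data from the slides.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares y = a1 + a2*x + a3*x^2 via the three normal equations."""
    def sx(p):  # sum of x^p
        return sum(x ** p for x in xs)

    def sxy(p):  # sum of x^p * y
        return sum(x ** p * y for x, y in zip(xs, ys))

    A = [[len(xs), sx(1), sx(2)],
         [sx(1),   sx(2), sx(3)],
         [sx(2),   sx(3), sx(4)]]
    rhs = [sxy(0), sxy(1), sxy(2)]
    d = det3(A)
    if d == 0:
        raise ValueError("normal equations are singular (need >= 3 distinct x)")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` of A with the right-hand side.
        Ai = [row[:col] + [r] + row[col + 1:] for row, r in zip(A, rhs)]
        coeffs.append(det3(Ai) / d)
    return coeffs


xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
print(fit_quadratic(xs, ys))  # [1.0, 2.0, 3.0]
```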


Multiple linear regression model
A multiple linear regression model describes the relationship between several independent (predictor) variables and a dependent (criterion) variable.

Example: studying the relationship between the strength of a road subgrade (Y) and lime content (A), ground temperature (B) and delay in compaction (C).


Multiple linear regression model – Example

Subgrade strength   Lime content   Ground temperature   Delay in compaction
(CBR) Y             (%) A          (°C) B               (hrs) C
68.5                2              25                   0.25
98.9                4              30                   0.5
102.5               6              35                   0.75
120.5               8              40                   1
99.8                10             45                   1.25
99.9                12             50                   1.5
85                  14             55                   1.75

Using SPSS, a regression model was obtained as:
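A multiple linear regression model Y = b0 + b1A + b2B + b3C can be fitted by least squares through the normal equations (XᵀX)b = XᵀY. One caution about the table above: B = 20 + 2.5A and C = A/8 hold exactly in every row, so the three predictors are perfectly collinear and their separate effects cannot be estimated from these data alone (the normal equations are singular). The sketch below therefore uses hypothetical, non-collinear data generated from Y = 1 + 2A − B + 0.5C; the helper names `solve` and `fit_multiple` are illustrative, and this is not the SPSS procedure.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:  # stop on a (numerically) zero pivot
            raise ValueError("singular normal equations (collinear predictors?)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0, *row] for row in rows]  # leading 1 gives the intercept b0
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical (A, B, C) rows, deliberately not collinear.
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (2, 1, 3)]
y = [1 + 2 * a - b + 0.5 * c for a, b, c in rows]
print([round(v, 6) for v in fit_multiple(rows, y)])  # [1.0, 2.0, -1.0, 0.5]
```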


Some statistical packages for correlation and regression analysis
▪ MS Excel
▪ SPSS
▪ MATLAB
▪ Stata
▪ Statistica
▪ StatXact
▪ Systat


Conclusion
▪ A correlation coefficient measures the strength and direction of the relationship between two variables.
▪ The Pearson correlation coefficient suits parametric statistics, whereas the Spearman coefficient suits non-parametric statistics.
▪ The method of least squares minimises the sum of the squares of the errors (the vertical distances of the data points from the regression line). It is preferred over the other methods of fitting.
▪ A multiple regression model gives the relationship between one dependent variable (Y) and several independent variables (A, B, C).

