CORRELATION AND REGRESSION ANALYSIS
(3712013)

SUBJECT: Analytical and Numerical Methods for Structural Engineers (ANSE)

Presented to:

Dr Subhanshu Goyal

Prepared by:

PIUS NYANZI (1807020006)

STUDENT: M.E CIVIL (STRUCTURAL)

MEFGI - GTU

10/12/2018
CONTENT

o Introduction
o Scatter diagrams
o Correlation analysis: Pearson correlation coefficient with example
o Spearman rank correlation coefficient with example
o Kendall’s rank correlation coefficient with example
o Differences between Spearman and Kendall’s tau
o Regression analysis: regression (curve fitting), methods of regression, multiple regression model
o Some statistical software packages for regression analysis
o Conclusion

CORRELATION AND REGRESSION

Introduction

Scientists and engineers often need to estimate the value of a dependent variable y at an intermediate value of the independent variable x, given discrete data points (x, y).

The available data fall into two main categories:

1. Values of well-defined functions, e.g. log tables, trigonometric tables and interest tables.
2. Values from experiments, e.g. the relationship between stress and strain in a metal strip, between applied voltage and fan speed, or between drag force and the velocity of a falling body. Here the relationship is not well defined.

SCATTER DIAGRAMS

A scatter diagram plots the paired values of two variables X and Y, showing how the two variables relate to each other.

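Scatter diagrams: steel bar example

| Temp. x (°C) | Length y (mm) |
|---|---|
| 67 | 120 |
| 69 | 125 |
| 85 | 140 |
| 83 | 160 |
| 74 | 130 |
| 81 | 180 |
| 97 | 150 |
| 97 | 140 |
| 114 | 200 |
| 85 | 130 |

[Figure: scatter diagram of steel bar length y (mm) against temperature x (°C)]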
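A scatter diagram is normally drawn with a spreadsheet or plotting package. As a dependency-free sketch, the hypothetical helper `text_scatter` below places one mark per (temperature, length) pair of the steel bar data on a character grid. The grid size and the rounding to the nearest cell are illustrative choices, not part of the original slides.

```python
def text_scatter(xs, ys, width=40, height=12, mark="*"):
    """Return the rows of a character-grid scatter diagram.

    x increases to the right and y increases upwards; pairs that land in
    the same grid cell share one mark.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero for constant data
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = mark  # top text row holds the largest y
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


temp = [67, 69, 85, 83, 74, 81, 97, 97, 114, 85]             # x, deg C
length = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]  # y, mm

for line in text_scatter(temp, length):
    print(line)
```

The points drift up and to the right, which is the visual cue that correlation analysis then quantifies.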

CORRELATION

Correlation is a bivariate analysis that measures the strength of the relationship (association) between two variables and the direction of that relationship. It is used to find the relationship between two quantitative variables.

Correlation coefficient: a statistic showing the degree of relation between two variables.

Correlation coefficient

In terms of strength, the value of the correlation coefficient varies between -1 and +1.

The direction of the relationship is indicated by the sign of the coefficient: a + sign indicates a positive relationship and a − sign indicates a negative relationship.

The following types of correlation are commonly used in statistics:

i. Pearson correlation
ii. Spearman correlation
iii. Kendall rank correlation
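Pearson correlation (r)

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$

where n is the number of (x, y) pairs.

The value of r ranges between -1 and +1.

The magnitude of r denotes the strength of the relationship, and the sign denotes its direction.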
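The summation formula for r translates directly into code. This is a minimal sketch: the function name `pearson_r` is hypothetical, and the function simply accumulates the same five sums (Σx, Σy, Σxy, Σx², Σy²) that a hand calculation uses.

```python
import math


def pearson_r(xs, ys):
    """Pearson r using the summation form of the formula for r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

On Python 3.10 or later, `statistics.correlation(xs, ys)` computes the same Pearson coefficient and can serve as a cross-check. The summation form suits hand calculation, but it can lose precision when the values are large; subtracting the means first is numerically safer.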

[Figure: scale of r from −1 (perfect indirect correlation) through 0 (no relation) to +1 (perfect direct correlation), marked at ±0.25 and ±0.75 into weak, intermediate and strong bands on each side]

If r = 0, there is no association (correlation) between the two variables.
If 0 < |r| < 0.25: weak correlation.
If 0.25 ≤ |r| < 0.75: intermediate correlation.
If 0.75 ≤ |r| < 1: strong correlation.
If |r| = 1: perfect correlation.
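The strength bands can be captured in a small labelling function. This sketch rests on two assumptions: the name `describe_r` is hypothetical, and the same thresholds are taken to apply to |r| on the negative (indirect) side, as the scale figure indicates.

```python
def describe_r(r):
    """Label a correlation coefficient using the weak/intermediate/strong bands.

    Assumes the thresholds apply to |r|; the sign only sets the direction.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    direction = "direct" if r > 0 else "indirect"
    return f"{strength} {direction} correlation"


print(describe_r(0.76))   # strong direct correlation
print(describe_r(-0.14))  # weak indirect correlation
```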
Example 1 - Pearson correlation

A sample of 6 concrete cubes was selected, and their age in days and strength in N/mm² were recorded as shown in the following table. It is required to find the correlation between age and strength.

| Serial No | Age (days) | Strength (N/mm²) |
|---|---|---|
| 1 | 7 | 12 |
| 2 | 6 | 8 |
| 3 | 8 | 12 |
| 4 | 5 | 10 |
| 5 | 6 | 11 |
| 6 | 9 | 13 |

Independent variable (x): age (days)

Dependent variable (y): strength (N/mm²)

Simple correlation coefficient: r, computed with the Pearson formula.

Pearson correlation coefficient

| Serial No | Age x (days) | Strength y (N/mm²) | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 7 | 12 | 84 | 49 | 144 |
| 2 | 6 | 8 | 48 | 36 | 64 |
| 3 | 8 | 12 | 96 | 64 | 144 |
| 4 | 5 | 10 | 50 | 25 | 100 |
| 5 | 6 | 11 | 66 | 36 | 121 |
| 6 | 9 | 13 | 117 | 81 | 169 |
| Total | Σx = 41 | Σy = 66 | Σxy = 461 | Σx² = 291 | Σy² = 742 |

Example 1 - Pearson correlation

$$ r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{(41)^2}{6}\right)\left(742 - \dfrac{(66)^2}{6}\right)}} = \frac{10}{\sqrt{10.833 \times 16}} \approx 0.76 $$

r ≈ 0.76 (strong direct correlation)

Interpretation

There is a strong positive correlation between the age of the concrete cubes (in days) and the strength of the concrete, since r ≈ 0.76 lies in the strong band (0.75 ≤ r < 1).
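The column totals and the value of r in Example 1 can be checked with a few lines of plain Python. This sketch only re-does the arithmetic of the table and the formula; the variable names are illustrative.

```python
import math

age = [7, 6, 8, 5, 6, 9]              # x, days
strength = [12, 8, 12, 10, 11, 13]    # y, N/mm^2
n = len(age)

sum_x = sum(age)
sum_y = sum(strength)
sum_xy = sum(x * y for x, y in zip(age, strength))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in strength)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)   # 41 66 461 291 742

s_xy = sum_xy - sum_x * sum_y / n     # 10.0
s_xx = sum_x2 - sum_x ** 2 / n        # about 10.833
s_yy = sum_y2 - sum_y ** 2 / n        # 16.0
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))                    # 0.76
```

The unrounded value is about 0.7596, which is why it is reported as r ≈ 0.76.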
Spearman correlation coefficient ($r_s$)

It is a non-parametric measure of correlation that uses the two sets of ranks assigned to the variables:

$$ r_s = 1 - \frac{6 \sum (d_i)^2}{n(n^2 - 1)} $$

where $d_i$ is the difference between the ranks of the i-th pair and n is the number of pairs.

The Spearman rank correlation coefficient can be computed in the following cases:

I. Both variables are quantitative.
II. Both variables are qualitative ordinal.
III. One variable is quantitative and the other is qualitative ordinal.
Spearman correlation coefficient - procedure

1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample. Tied values receive the average of the ranks they occupy.
2. Rank the values of Y from 1 to n in the same way.
3. Compute $d_i$ for each pair of observations by subtracting the rank of $Y_i$ from the rank of $X_i$.
4. Square each $d_i$ and compute $\sum (d_i)^2$, the sum of the squared values.
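The four steps and the rank-difference formula can be sketched in Python as follows. The helper names `average_ranks` and `spearman_rs` are hypothetical. Ranking from the largest value down (`descending=True`) is an assumption chosen to match Example 2. Ranking both variables the other way gives the same $\sum (d_i)^2$, because every $d_i$ only changes sign.

```python
def average_ranks(values, descending=True):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of equal values in the sorted order.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rs(xs, ys):
    """Steps 1-4, then rs = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    rank_x = average_ranks(xs)                        # step 1
    rank_y = average_ranks(ys)                        # step 2
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]   # step 3
    sum_d2 = sum(di ** 2 for di in d)                 # step 4
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Example 2 data, samples A..G. Education is coded with hypothetical scores
# (University 5, Secondary 4, Preparatory 3, Primary 2, Illiterate 1).
education = [3, 2, 5, 4, 4, 1, 5]
income = [25, 10, 8, 10, 15, 50, 60]
print(average_ranks(education))                   # [5.0, 6.0, 1.5, 3.5, 3.5, 7.0, 1.5]
print(round(spearman_rs(education, income), 2))   # -0.14
```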

Example 2 - Spearman correlation coefficient

In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.

| Sample | Level of education (X) | Income (Y) |
|---|---|---|
| A | Preparatory | 25 |
| B | Primary | 10 |
| C | University | 8 |
| D | Secondary | 10 |
| E | Secondary | 15 |
| F | Illiterate | 50 |
| G | University | 60 |

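Example 2 - Spearman correlation coefficient (continued)

Education is ranked from the highest level down (University = 1) and income from the highest value down (60 = 1). Tied values share the average of the ranks they occupy.

| Sample | Level of education (X) | Income (Y) | Rank X | Rank Y | d_i | d_i² |
|---|---|---|---|---|---|---|
| A | Preparatory | 25 | 5 | 3 | 2 | 4 |
| B | Primary | 10 | 6 | 5.5 | 0.5 | 0.25 |
| C | University | 8 | 1.5 | 7 | -5.5 | 30.25 |
| D | Secondary | 10 | 3.5 | 5.5 | -2 | 4 |
| E | Secondary | 15 | 3.5 | 4 | -0.5 | 0.25 |
| F | Illiterate | 50 | 7 | 2 | 5 | 25 |
| G | University | 60 | 1.5 | 1 | 0.5 | 0.25 |

$\sum (d_i)^2 = 64$

$$ r_s = 1 - \frac{6 \times 64}{7(7^2 - 1)} = 1 - \frac{384}{336} \approx -0.14 $$

A weak negative (indirect) correlation.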
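The rank-difference formula for $r_s$ is exact only when there are no tied ranks. Example 2 has ties (1.5, 3.5, 5.5), so as a cross-check this sketch also computes $r_s$ from its definition: the Pearson coefficient applied to the ranks themselves. Both values use the ranks as tabulated in Example 2.

```python
import math

# Ranks exactly as tabulated in Example 2 (samples A..G).
rank_x = [5, 6, 1.5, 3.5, 3.5, 7, 1.5]
rank_y = [3, 5.5, 7, 5.5, 4, 2, 1]
n = len(rank_x)

# Rank-difference (shortcut) formula, exact only without ties.
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
rs_shortcut = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Definition: Pearson r computed on the ranks.
mean_x = sum(rank_x) / n
mean_y = sum(rank_y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
s_xx = sum((a - mean_x) ** 2 for a in rank_x)
s_yy = sum((b - mean_y) ** 2 for b in rank_y)
rs_exact = s_xy / math.sqrt(s_xx * s_yy)

print(round(rs_shortcut, 3), round(rs_exact, 3))  # -0.143 -0.174
```

Both values indicate a weak negative correlation, so the interpretation is unchanged. On Python 3.12 or later, `statistics.correlation(x, y, method='ranked')` should give the tie-corrected value directly.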
Kendall rank correlation coefficient, tau

Kendall rank correlation is a non-parametric test that measures the degree of concordance between two columns of ranked data.

Like r and $r_s$, tau ranges from -1.0 to +1.0.

Kendall's tau = (C − D) / (C + D)

where C is the number of concordant pairs and D is the number of discordant pairs. Two observations form a concordant pair if their ranks on X and on Y are in the same order, and a discordant pair if the orders are opposite.

Compared with the Spearman coefficient, Kendall's rank correlation is intended to reflect the strength of the dependence between the variables being compared.
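Counting concordant and discordant pairs is a simple loop over all n(n − 1)/2 pairs. This is a sketch: `concordance_counts` and `kendall_tau` are hypothetical names, and pairs tied on either rank are skipped. Without ties, C + D = n(n − 1)/2 and the ratio is Kendall's tau (tau-a). When tied pairs are skipped, (C − D)/(C + D) is strictly the Goodman-Kruskal gamma. Statistical packages usually report tau-b, which corrects for ties differently (about −0.21 for these ranks).

```python
from itertools import combinations


def concordance_counts(xs, ys):
    """Return (C, D): concordant and discordant pairs among all n(n-1)/2 pairs.

    A pair tied on x or on y is counted as neither.
    """
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            c += 1   # both rankings order the pair the same way
        elif direction < 0:
            d += 1   # the rankings order the pair in opposite ways
    return c, d


def kendall_tau(xs, ys):
    """tau = (C - D) / (C + D), with tied pairs skipped."""
    c, d = concordance_counts(xs, ys)
    if c + d == 0:
        raise ValueError("no untied pairs to compare")
    return (c - d) / (c + d)


rank_x = [5, 6, 1.5, 3.5, 3.5, 7, 1.5]   # Example 2 ranks, samples A..G
rank_y = [3, 5.5, 7, 5.5, 4, 2, 1]
print(concordance_counts(rank_x, rank_y))     # (7, 11)
print(round(kendall_tau(rank_x, rank_y), 2))  # -0.22
```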

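Example 3 - Kendall's tau

Using the education and income data of Example 2:

| Sample | Educ. level (X) | Income (Y) | Rank X | Rank Y |
|---|---|---|---|---|
| A | Preparatory | 25 | 5 | 3 |
| B | Primary | 10 | 6 | 5.5 |
| C | University | 8 | 1.5 | 7 |
| D | Secondary | 10 | 3.5 | 5.5 |
| E | Secondary | 15 | 3.5 | 4 |
| F | Illiterate | 50 | 7 | 2 |
| G | University | 60 | 1.5 | 1 |

Spearman: $r_s \approx -0.14$

The pairs are sorted by Rank X, and each row is compared with every row below it. A pair is concordant (C) if Rank Y moves in the same direction as Rank X, and discordant (D) if it moves in the opposite direction. A pair tied on Rank X or on Rank Y counts as neither.

| Rank X | Rank Y | C | D |
|---|---|---|---|
| 1.5 | 7 | 0 | 5 |
| 1.5 | 1 | 5 | 0 |
| 3.5 | 5.5 | 0 | 2 |
| 3.5 | 4 | 1 | 2 |
| 5 | 3 | 1 | 1 |
| 6 | 5.5 | 0 | 1 |
| 7 | 2 | - | - |
| Total | | 7 | 11 |

tau = (C − D) / (C + D) = (7 − 11) / (7 + 11) ≈ −0.22 (weak negative relationship)

Pearson vs Spearman $r_s$ vs Kendall's tau

Pearson r is a parametric statistic. When their assumptions hold (e.g. approximately normal, linearly related data), parametric methods produce more accurate and precise estimates than non-parametric methods.

Spearman $r_s$ and Kendall's tau are non-parametric statistics. In most cases $|r_s|$ is greater than |tau|. This small sample with many ties is an exception: $r_s \approx -0.14$ and tau ≈ −0.22.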

Regression Analysis

Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable (y) and one or more independent (predictor) variables (x).

The technique is used for forecasting and for investigating cause-and-effect relationships between variables.

For example:

1) the relationship between the strength of concrete and the number of curing days;
2) the relationship between the strength of a road subgrade and lime content, ground temperature and delay in compaction.

Methods of regression

1. Graphical methods
2. Method of group averages
3. Method of moments
4. Method of least squares

The graphical method and the method of group averages do not determine the unknown constants uniquely and accurately, whereas the other methods do.

The method of least squares gives a unique best-fitting curve for given data, best in the sense of the smallest sum of squared errors. It is also widely used in applications and is easily implemented on a computer.

Graphical methods

[Figures: graphical methods, three slides; content not recoverable from the source]

Method of group averages

Method of group averages - Example: fitting r = a + bt gives r = 1090.26 − 0.534t
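The method of group averages splits the data into two groups and makes the line pass through the mean point of each group. The (t, r) data behind the slide's example did not survive, so the usage line below uses made-up points on y = 2 + 3x. In this sketch the name `group_averages_line` and the rule for splitting an odd number of points are assumptions. That arbitrary grouping choice is exactly why the method does not give the constants uniquely.

```python
def group_averages_line(xs, ys):
    """Fit y = a + b*x by the method of group averages.

    The points are sorted by x and split into a lower and an upper group;
    the line is forced through the mean point (x-bar, y-bar) of each group.
    With an odd number of points the middle one goes to the upper group here,
    which is one arbitrary choice among several.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    means = []
    for group in (pairs[:half], pairs[half:]):
        means.append((sum(x for x, _ in group) / len(group),
                      sum(y for _, y in group) / len(group)))
    (x1, y1), (x2, y2) = means
    if x1 == x2:
        raise ValueError("group means share the same x; slope is undefined")
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


print(group_averages_line([1, 2, 3, 4], [5, 8, 11, 14]))  # (2.0, 3.0)
```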

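Method of moments

[Figures: method of moments and worked example; slide content not recoverable from the source]

Method of least squares

We need to minimise the sum of the squares of the errors. The error at a data point $(x_i, y_i)$ is its vertical distance from the fitted curve $y = f(x)$:

$$ e_i = y_i - f(x_i), \qquad S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2 $$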

Method of least squares (MLS)

MLS minimises the sum of the squares of the errors. It can be used to fit data in the following situations:

1. The relationship is linear: $y = f(x) = a + bx$
2. The relationship is a polynomial, e.g. $f(x) = a + bx + cx^2$
3. The relationship is transcendental, e.g. $f(x) = a e^{bx}$
4. Multiple linear regression
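For the transcendental case $y = a e^{bx}$, a common approach is to take logarithms, $\ln y = \ln a + bx$, and fit a straight line to $(x, \ln y)$. This sketch (the name `fit_exponential` is hypothetical) does that. Note that it minimises squared errors in $\ln y$ rather than in y, so it only approximates a least-squares fit of the original curve. The data in the usage lines are made up.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by fitting a straight line to (x, ln y).

    Taking logs gives ln y = ln a + b x, a linear least-squares problem.
    Note: this minimises squared errors in ln y, not in y itself.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    if any(y <= 0 for y in ys):
        raise ValueError("every y must be positive to take logarithms")
    n = len(xs)
    ln_y = [math.log(y) for y in ys]
    sum_x, sum_ln = sum(xs), sum(ln_y)
    sum_x2 = sum(x * x for x in xs)
    sum_x_ln = sum(x * v for x, v in zip(xs, ln_y))
    # Straight-line normal equations for ln y = A + b x, with A = ln a.
    b = (n * sum_x_ln - sum_x * sum_ln) / (n * sum_x2 - sum_x ** 2)
    big_a = (sum_ln - b * sum_x) / n
    return math.exp(big_a), b


xs = [0, 1, 2, 3]
ys = [2 * math.exp(0.5 * x) for x in xs]   # made-up data on y = 2 e^(0.5x)
a, b = fit_exponential(xs, ys)
print(round(a, 6), round(b, 6))            # 2.0 0.5
```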

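Method of least squares (MLS): linear regression

The relationship is linear: $y = f(x) = a + bx$. Setting $\partial S/\partial a = 0$ and $\partial S/\partial b = 0$ for $S = \sum (y_i - a - bx_i)^2$ gives the normal equations:

$$ \sum y = na + b\sum x \qquad \text{eqn (1)} $$

$$ \sum xy = a\sum x + b\sum x^2 \qquad \text{eqn (2)} $$

Solving eqn (1) and eqn (2) simultaneously gives a and b.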
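The two normal equations of the straight-line fit can be solved in closed form. This sketch (the name `fit_line` is hypothetical) does so, and the usage lines apply it to the Example 1 cubes for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x from the two normal equations:
         sum(y)  = n*a      + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2   # zero only if all x are equal
    if det == 0:
        raise ValueError("all x values are equal; slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n
    return a, b


age = [7, 6, 8, 5, 6, 9]             # x, days (Example 1)
strength = [12, 8, 12, 10, 11, 13]   # y, N/mm^2
a, b = fit_line(age, strength)
print(round(a, 3), round(b, 3))      # 4.692 0.923
```

The resulting line, strength ≈ 4.69 + 0.923 × age, is computed here for illustration only and is not part of the original example.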

Method of least squares (MLS): polynomial relationship (second order) - Example

For $y = a_1 + a_2 x + a_3 x^2$, the normal equations are:

$$ \sum y = n a_1 + a_2 \sum x + a_3 \sum x^2 $$

$$ \sum xy = a_1 \sum x + a_2 \sum x^2 + a_3 \sum x^3 $$

$$ \sum x^2 y = a_1 \sum x^2 + a_2 \sum x^3 + a_3 \sum x^4 $$

[Worked example: slide content not recoverable from the source]
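The three normal equations for $y = a_1 + a_2 x + a_3 x^2$ form a 3 × 3 linear system. This sketch solves that system with Cramer's rule, which is convenient at this size (Gaussian elimination is preferred for larger systems). The names `det3` and `fit_quadratic` are hypothetical. The slide's example numbers were lost, so the usage lines use made-up points on $y = 1 + 2x + 3x^2$.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares y = a1 + a2*x + a3*x^2 via the three normal equations,
    solved with Cramer's rule."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    s = [sum(x ** k for x in xs) for k in range(5)]                  # s[k] = sum of x^k (s[0] = n)
    t = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x^k * y
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    d = det3(normal)
    if d == 0:
        raise ValueError("singular normal equations: need at least 3 distinct x values")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side t.
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]          # made-up points on y = 1 + 2x + 3x^2
print(fit_quadratic(xs, ys))     # (1.0, 2.0, 3.0)
```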
Multiple linear regression model

Multiple regression helps us learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. An example is the strength of a road subgrade (Y) as a function of lime content (A), ground temperature (B) and delay in compaction (C).

Multiple linear regression model - Example

| Strength (CBR), Y | Lime content (%), A | Ground temperature (°C), B | Delay in compaction (hrs), C |
|---|---|---|---|
| 68.5 | 2 | 25 | 0.25 |
| 98.9 | 4 | 30 | 0.5 |
| 102.5 | 6 | 35 | 0.75 |
| 120.5 | 8 | 40 | 1 |
| 99.8 | 10 | 45 | 1.25 |
| 99.9 | 12 | 50 | 1.5 |
| 85 | 14 | 55 | 1.75 |

Using SPSS, a regression model was obtained as: [model equation not recoverable from the source]
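The SPSS model itself did not survive extraction. This sketch therefore only shows how a multiple linear regression can be fitted from the normal equations $(X^T X)\,b = X^T y$ in plain Python. `solve` and `fit_multiple` are hypothetical helpers, and the first data set is made up from y = 1 + 2x1 − x2 + 0.5x3. The second check shows that in the table above, ground temperature and delay in compaction are exact linear functions of lime content (B = 20 + 2.5A, C = A/8). With such perfectly collinear predictors the normal equations are singular, so their separate effects cannot be estimated from this table alone.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (aug[r][n] - tail) / aug[r][r]
    return coef


def fit_multiple(predictors, y):
    """Least-squares y = b0 + b1*x1 + ... + bk*xk from (X^T X) b = X^T y."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 - x2 + 0.5*x3 (not from the slides).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [3, 1, 4, 1, 5, 9]
x3 = [2, 7, 1, 8, 2, 8]
y = [1, 7.5, 3.5, 12, 7, 8]
print([round(c, 6) for c in fit_multiple([x1, x2, x3], y)])  # [1.0, 2.0, -1.0, 0.5]

# The slide's predictors rise in lockstep: B = 20 + 2.5*A and C = A/8.
lime = [2, 4, 6, 8, 10, 12, 14]
ground_temp = [25, 30, 35, 40, 45, 50, 55]
delay = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75]
print(all(t == 20 + 2.5 * a and d == a / 8 for a, t, d in zip(lime, ground_temp, delay)))  # True
```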
Some statistical packages for correlation and regression analysis

▪ MS Excel
▪ SPSS
▪ MATLAB
▪ Stata
▪ Statistica
▪ StatXact
▪ Systat
Conclusion

The correlation coefficient measures the strength and direction of the relationship between two variables, i.e. how they are related.

The Pearson correlation coefficient suits parametric statistics, whereas the Spearman (and Kendall) coefficients suit non-parametric statistics.

The method of least squares minimises the sum of the squares of the errors (the vertical distances of the data points from the regression line). Unlike the graphical method and the method of group averages, it determines the unknown constants uniquely.

A multiple regression model gives the relationship between one dependent variable (Y) and several independent variables (A, B, C).
