
CORRELATION AND REGRESSION ANALYSIS

SUBJECT: Analytical and Numerical Methods for Structural Engineers- ANSE

Presented to:

Dr Subhanshu Goyal

Head, Dept. of Mathematics (MEFGI)

Prepared by: PIUS NYANZI (1807020006)

STUDENT: M.E CIVIL (STRUCTURAL)

MEFGI - GTU

10/12/2018


CONTENT

Introduction

Scatter diagrams

Correlation analysis

  Pearson correlation coefficient with example

  Spearman rank correlation coefficient with example

  Kendall's rank correlation coefficient with example

  Differences between Spearman and Kendall's tau

Regression analysis

  Regression (curve fitting)

  Methods of regression

  Multiple regression model

Some Statistical software Packages for regression analysis

Conclusion


CORRELATION AND REGRESSION

Introduction

Scientists and engineers often face the task of estimating the value of a dependent variable y for an intermediate value of the independent variable x, given a set of discrete data points (x, y).

The available data fall into two main categories:

1. Values of well-defined functions, e.g. log tables, trigonometric tables, interest tables.

2. Data values from experiments, e.g. the relationship between stress and strain in a metal strip, between the voltage applied and the speed of a fan, or between drag force and the velocity of a falling body. Here the relationship is not well defined.


SCATTER DIAGRAMS

A scatter diagram is a diagram that shows the values of two variables X and Y , along with the way in which these two variables relate to each other.


Scatter diagrams

Steel bar: temperature and length

Temp. (x), °C     67   69   85   83   74   81   97   97  114   85
Length (y), mm   120  125  140  160  130  180  150  140  200  130

[Figure: scatter diagram of Length (y), mm against Temp. (x), °C]

CORRELATION

Correlation is a bivariate analysis that measures the strength of relationship or association between two variables and the direction of the relationship.

Finding the relationship between two quantitative variables

Correlation coefficient:

Statistic showing the degree of relation between two variables


Correlation coefficient

In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and -1.

The direction of the relationship is indicated by the sign of the coefficient: a + sign indicates a positive relationship and a - sign indicates a negative relationship.

Three types of correlation are commonly used in statistics:

Pearson correlation

Spearman correlation

Kendall rank correlation

Pearson correlation (r)

\[
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
\]

The value of r ranges between -1 and +1.

The magnitude of r denotes the strength of the relationship; the sign denotes its direction.

Pearson correlation (r)

[Figure: scale of r from -1 to +1. The ends, r = -1 and r = +1, mark perfect correlation and r = 0 marks no relation. Negative values are indirect, positive values are direct. Moving away from 0 in either direction, the bands are weak (up to 0.25), intermediate (0.25 to 0.75) and strong (0.75 to 1).]

If r = 0, there is no association or correlation between the two variables.
If 0 < |r| < 0.25, the correlation is weak.
If 0.25 ≤ |r| < 0.75, the correlation is intermediate.
If 0.75 ≤ |r| < 1, the correlation is strong.
If |r| = 1, the correlation is perfect.
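The strength bands above can be written as a small lookup. This is a minimal sketch, assuming the bands apply to the magnitude of r and that the sign only sets the direction; the function name `describe_correlation` is illustrative and not from the slides.

```python
def describe_correlation(r: float) -> str:
    """Label r using the slide's strength bands (applied to |r|) and its sign."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    # Positive r is called "direct" on the slide, negative r "indirect".
    direction = "direct" if r > 0 else "indirect"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.759))  # strong direct correlation
print(describe_correlation(-0.1))   # weak indirect correlation
```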
Example 1 - Pearson correlation

A sample of 6 concrete cubes was selected, and their age in days and strength in N/mm² were recorded as shown in the following table. It is required to find the correlation between age and strength.

Serial No.   Age (days)   Strength (N/mm²)
1            7            12
2            6            8
3            8            12
4            5            10
5            6            11
6            9            13

Example 1 - Pearson correlation

Independent variable (x): Age

Dependent variable (y): Strength

Simple correlation coefficient:

\[
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
\]

Pearson correlation coefficient

Serial No.   Age (days), x   Strength (N/mm²), y   xy          x²          y²
1            7               12                    84          49          144
2            6               8                     48          36          64
3            8               12                    96          64          144
4            5               10                    50          25          100
5            6               11                    66          36          121
6            9               13                    117         81          169
Total        ∑x = 41         ∑y = 66               ∑xy = 461   ∑x² = 291   ∑y² = 742

Example 1 - Pearson correlation

\[
r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{(41)^2}{6}\right)\left(742 - \dfrac{(66)^2}{6}\right)}}
\]

r = 0.759 (strong direct correlation)

Interpretation

There is a strong positive correlation between the age of the concrete cubes in days and the strength of the concrete, since r lies above 0.75.
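The hand calculation in Example 1 can be checked with a few lines of code. This is a minimal sketch of the sums-based Pearson formula using only the standard library; the function name `pearson_r` is illustrative.

```python
from math import sqrt

# Example 1 data: age of the cubes (days) and strength (N/mm^2).
age = [7, 6, 8, 5, 6, 9]
strength = [12, 8, 12, 10, 11, 13]


def pearson_r(x, y):
    """Pearson r from the raw sums, term by term as in the slide formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = pearson_r(age, strength)
print(f"r = {r:.4f}")  # about 0.76, in line with the slide's 0.759
```

The intermediate sums it uses (41, 66, 461, 291, 742) are the totals of the calculation table.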
Spearman correlation coefficient (r_s)

It is a non-parametric measure of correlation that makes use of the two sets of ranks assigned to the variables.

\[
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
\]

The Spearman rank correlation coefficient can be computed in the following cases:

I. Both variables are quantitative.

II. Both variables are qualitative ordinal.

III. One variable is quantitative and the other is qualitative ordinal.

Spearman correlation coefficient

Procedure

Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.

Rank the values of Y from 1 to n.

Compute the value of d_i for each pair of observations by subtracting the rank of Y_i from the rank of X_i.

Square each d_i and compute ∑(d_i)², which is the sum of the squared values.

Example 2 - Spearman correlation coefficient

In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60

Example 2 - Spearman correlation coefficient

Sample   (X)           (Y)   Rank X   Rank Y   d_i    d_i²
A        Preparatory   25    5        3        2      4
B        Primary       10    6        5.5      0.5    0.25
C        University    8     1.5      7        -5.5   30.25
D        Secondary     10    3.5      5.5      -2     4
E        Secondary     15    3.5      4        -0.5   0.25
F        Illiterate    50    7        2        5      25
G        University    60    1.5      1        0.5    0.25

∑(d_i)² = 64

r_s = 1 - 6(64) / (7(7² - 1)) = 1 - 384/336 ≈ -0.14, i.e. about -0.1

A negative (indirect) weak correlation
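The ranking procedure and the r_s formula can be reproduced in code. The sketch below is illustrative only and makes two assumptions that are not stated on the slides: the education levels are scored illiterate < primary < preparatory < secondary < university (the ordering implied by the Rank X column), and tied values share the average of their ranks. The simplified formula is strictly exact only when there are no ties; it is applied here as on the slide. All names are hypothetical.

```python
# Assumed ordinal scores, used only to rank the education levels.
EDUCATION_ORDER = {"illiterate": 0, "primary": 1, "preparatory": 2,
                   "secondary": 3, "university": 4}

# Example 2 data for samples A..G.
education = ["preparatory", "primary", "university", "secondary",
             "secondary", "illiterate", "university"]
income = [25, 10, 8, 10, 15, 50, 60]


def average_ranks_desc(values):
    """Rank 1 goes to the largest value; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end + 2) / 2  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_simplified(rank_x, rank_y):
    """Return (sum of d_i^2, r_s) using r_s = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rank_x = average_ranks_desc([EDUCATION_ORDER[level] for level in education])
rank_y = average_ranks_desc(income)
sum_d2, r_s = spearman_simplified(rank_x, rank_y)
print(rank_x)                 # [5.0, 6.0, 1.5, 3.5, 3.5, 7.0, 1.5]
print(rank_y)                 # [3.0, 5.5, 7.0, 5.5, 4.0, 2.0, 1.0]
print(sum_d2, round(r_s, 2))  # 64.0 -0.14
```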

Kendall rank correlation coefficient, tau

Kendall rank correlation is a non-parametric test that measures the degree of concordance between two columns of ranked data.

Its range is -1.0 to +1.0, just like r and r_s.

Kendall's tau = (C - D) / (C + D)

C = number of concordant pairs
D = number of discordant pairs

Kendall's rank correlation improves upon the Spearman coefficient by reflecting the strength of the dependence between the variables being compared.

Example 3 - Kendall's tau

Sample   Educ. level (X)   Income (Y)   Rank X   Rank Y
A        Preparatory       25           5        3
B        Primary           10           6        5.5
C        University        8            1.5      7
D        Secondary         10           3.5      5.5
E        Secondary         15           3.5      4
F        Illiterate        50           7        2
G        University        60           1.5      1

Spearman, r_s = -0.1

Rows ordered by Rank X, with the concordant (C) and discordant (D) counts for each row:

Rank X   Rank Y   C    D
1.5      7        0    6
1.5      1        5    0
3.5      5.5      0    3
3.5      4        1    2
5        3        1    2
6        5.5      0    1
7        2
Total             7    14

tau = (C - D) / (C + D) = (7 - 14) / (7 + 14) = -0.33 (negative, weak relationship)
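A pairwise count is one way to check a hand tally of C and D. The sketch below is an illustration under one stated assumption and is not the slide's procedure: a pair tied in either ranking is counted as neither concordant nor discordant. Under that convention the counts, and therefore tau, can differ from a tally made by scanning down the Rank Y column, where rows tied on Rank X are still compared. The function name is hypothetical.

```python
from itertools import combinations

# Ranks from Example 3, samples A..G.
rank_x = [5, 6, 1.5, 3.5, 3.5, 7, 1.5]
rank_y = [3, 5.5, 7, 5.5, 4, 2, 1]


def concordance_counts(x, y):
    """Count concordant, discordant and tied pairs over all i < j."""
    concordant = discordant = tied = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:      # both rankings order the pair the same way
            concordant += 1
        elif product < 0:    # the rankings disagree on the pair
            discordant += 1
        else:                # tie in X or in Y: skipped (assumption)
            tied += 1
    return concordant, discordant, tied


c, d, ties = concordance_counts(rank_x, rank_y)
tau = (c - d) / (c + d)
print(c, d, ties, round(tau, 2))  # 7 11 3 -0.22
```

With this tie convention the Example 3 ranks give C = 7, D = 11 and three tied pairs, so tau is about -0.22, which is still a weak negative relationship.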

Pearson vs Spearman r_s vs Kendall's tau

Pearson r: parametric statistic. Parametric methods produce more accurate and precise estimates than non-parametric methods, provided their distributional assumptions hold.

Spearman r_s and Kendall's tau: non-parametric statistics. In most cases r_s is greater than tau; in Examples 2 and 3, r_s = -0.1 and tau = -0.33, with tau = (C - D) / (C + D).

Regression Analysis

Regression analysis is a predictive modelling technique which investigates the relationship between a dependent variable (y) and an independent variable (x), the predictor.

The technique is used for forecasting and for finding the cause-effect relationship between the variables.

For example:

1) Relationship between the strength of concrete and the number of curing days

2) Relationship between the strength of road subgrade and lime content, ground temperature and delay in compaction

Methods of regression

1. Graphical methods

2. Method of group averages

3. Method of moments

4. Method of least squares

The graphical method and the method of group averages fail to give the values of the unknown constants uniquely and accurately, while the other methods do.

The method of least squares is the best for fitting a unique curve to given data. It is also widely used in applications and can be easily implemented on a computer.

Graphical methods

[Figures: graphical methods]

Method of group averages

[Figure: method of group averages]

Method of group averages - Example

r = a + bt, r = 1090.26 - 0.534t

Method of moments

[Figure: method of moments]

Method of moments - Example

[Figure: worked example]

Method of least squares

We need to minimise the sum of squares of the errors.

The error at each data point is the vertical distance between the point (xi, yi) and the fitted curve.

[Figure: data points, fitted line and the error at each point]

Method of least squares (MLS)

The aim is to minimise the sum of the squares of the errors.

MLS can be used to fit the data in the following situations:

1. Relationship is linear: y = f(x) = a + bx

2. Relationship is a polynomial: f(x) = a + bx + cx²

3. Relationship is transcendental: f(x) = ae^(bx)

4. Multiple linear regression

Method of least squares (MLS) - linear regression

Relationship is linear: y = f(x) = a + bx

[Equations (1) and (2) and the worked slide: images, not recovered]
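For a straight line y = a + bx, minimising the sum of squared errors leads to the standard normal equations ∑y = na + b∑x and ∑xy = a∑x + b∑x². The sketch below solves them in closed form. It reuses the age and strength data of Example 1 purely as an illustration, since the slides do not fit a line to those data, and the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    # Eliminating a from the two normal equations gives b, then a follows.
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


age = [7, 6, 8, 5, 6, 9]             # days
strength = [12, 8, 12, 10, 11, 13]   # N/mm^2
a, b = fit_line(age, strength)
print(f"strength = {a:.3f} + {b:.3f} * age")  # strength = 4.692 + 0.923 * age
```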

Method of least squares (MLS) - polynomial relationship (second order) - Example

y = a1 + a2 x + a3 x²

Normal equations are as below:

\[
\begin{aligned}
\sum y &= n a_1 + a_2 \sum x + a_3 \sum x^2 \\
\sum xy &= a_1 \sum x + a_2 \sum x^2 + a_3 \sum x^3 \\
\sum x^2 y &= a_1 \sum x^2 + a_2 \sum x^3 + a_3 \sum x^4
\end{aligned}
\]

[Worked example: images, not recovered]
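A second-order fit y = a1 + a2 x + a3 x² reduces to three normal equations in a1, a2 and a3, whose coefficients are the power sums ∑x^p and ∑x^p y. The sketch below builds and solves that 3 by 3 system with plain Gaussian elimination. The data points are hypothetical, generated from y = 1 + 2x + 3x² so that the answer is known in advance; they are not the slide's example, and the function names are illustrative.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_quadratic(x, y):
    """Return [a1, a2, a3] for y = a1 + a2*x + a3*x^2 via the normal equations."""
    power_sums = [sum(xi ** p for xi in x) for p in range(5)]  # index 0 is n
    moment_sums = [sum((xi ** p) * yi for xi, yi in zip(x, y)) for p in range(3)]
    matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    return solve_linear_system(matrix, moment_sums)


# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
a1, a2, a3 = fit_quadratic(xs, ys)
print(round(a1, 6), round(a2, 6), round(a3, 6))
```

Because the points lie exactly on the generating curve, the fitted coefficients should come back as 1, 2 and 3 up to rounding error.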

Multiple linear regression model

Helps to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.

Example: to study the relationship between the strength of road subgrade (Y) and lime content (A), ground temperature (B) and delay in compaction (C).

Multiple linear regression model - Example

Subgrade strength (CBR), Y   Lime content (%), A   Ground temperature (°C), B   Delay in compaction (hrs), C
68.5                         2                     25                           0.25
98.9                         4                     30                           0.5
102.5                        6                     35                           0.75
120.5                        8                     40                           1
99.8                         10                    45                           1.25
99.9                         12                    50                           1.5
85                           14                    55                           1.75

Using SPSS, a regression model was obtained as: [model equation: image, not recovered]

Some statistical packages for correlation and regression analysis

MS Excel

SPSS

MATLAB

Stata

Statistica

StatXact

Systat

Conclusion

The correlation coefficient measures the strength and direction with which two variables are related.

The Pearson correlation coefficient is better suited to parametric statistics, whereas the Spearman coefficient is better suited to non-parametric statistics.

The method of least squares minimises the sum of the squares of the errors, i.e. the vertical distances around the regression line. It is the best of the methods compared.

A multiple regression model gives the relationship between one dependent variable (y) and several independent variables A, B, C.
