Correlation & Regression (Students)

Correlation and Regression Analysis
Many engineering design and analysis problems involve factors that are
interrelated and dependent, e.g., (1) runoff volume and rainfall; (2) evaporation,
temperature, and wind speed; (3) peak discharge, drainage area, and rainfall
intensity; (4) crop yield, irrigation water, and fertilizer.
Due to the inherent complexity of system behaviors and the lack of full
understanding of the processes involved, the relationships among the various
relevant factors or variables are established empirically or semi-empirically.
Regression analysis is a useful and widely used statistical tool dealing with
investigation of the relationship between two or more variables related in a
non-deterministic fashion.
Suppose a variable Y is related to several variables $X_1, X_2, \ldots, X_K$ and
that the relationship can be expressed, in general, as
$$Y = g(X_1, X_2, \ldots, X_K)$$
where g(.) = general expression for a function;
Y = dependent (or response) variable;
$X_1, X_2, \ldots, X_K$ = independent (or explanatory) variables.
Correlation
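When a problem involves two dependent random variables, the degree of
linear dependence between the two can be measured by the correlation
coefficient $\rho(X,Y)$, which is defined as
$$\rho(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$
where Cov(X,Y) is the covariance between random variables X and Y, defined
as
$$\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E(XY) - \mu_X \mu_Y$$
where $-\infty < \mathrm{Cov}(X,Y) < \infty$
and
$-1 \le \rho(X,Y) \le 1$.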
Various correlation coefficients have been developed in statistics for measuring
the degree of association between random variables. The one defined above is
called the Pearson product-moment correlation coefficient, or simply the
correlation coefficient.
If the two random variables X and Y are independent, then
$\rho(X,Y) = \mathrm{Cov}(X,Y) = 0$. However, the reverse statement is not
necessarily true.
Cases of Correlation
[Figure: four scatter plots, one per case.]
(1) Perfectly linearly correlated in opposite directions
(2) Uncorrelated in linear fashion
(3) Strongly and positively correlated in linear fashion
(4) Perfectly correlated in nonlinear fashion, but uncorrelated linearly
Calculation of Correlation Coefficient

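Given a set of n paired sample observations of two random variables,
$(x_i, y_i)$, $i = 1, 2, \ldots, n$, the sample correlation coefficient (r) can be
calculated as
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ are the sample means.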
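A minimal Python sketch of the sample correlation coefficient, assuming the usual product-moment form (sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `sample_correlation` and the small data sets are hypothetical and chosen only to mirror the cases of correlation listed earlier.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r of paired observations (x_i, y_i)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and of squared deviations from the sample means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical illustrative data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_same = sample_correlation(x, [2.0, 4.0, 6.0, 8.0, 10.0])      # linear, same direction
r_opposite = sample_correlation(x, [5.0, 4.0, 3.0, 2.0, 1.0])   # linear, opposite direction
# y = x^2 on a symmetric range: perfectly related, yet linearly uncorrelated.
r_parabola = sample_correlation([-2.0, -1.0, 0.0, 1.0, 2.0], [4.0, 1.0, 0.0, 1.0, 4.0])
print(r_same, r_opposite, r_parabola)
```

The last case illustrates why zero correlation does not imply independence: y is fully determined by x, but the linear measure r vanishes.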
Auto-correlation
Consider the following daily stream flows (in 1000 m^3) in June 2001 at Chung Mei
Upper Station (610 ha), located upstream of a river feeding Plover Cove
Reservoir. Determine the 1-day auto-correlation coefficient, i.e., $\rho(Q_t, Q_{t+1})$.
Day (t)  Flow Q(t)    Day (t)  Flow Q(t)    Day (t)  Flow Q(t)
   1        8.35        11      313.89        21       20.06
   2        6.78        12      480.88        22       17.52
   3        6.32        13      151.28        23      116.13
   4       17.36        14       83.92        24       68.25
   5      191.62        15       44.58        25      280.22
   6       82.33        16       36.58        26      347.53
   7      524.45        17       33.65        27      771.30
   8      196.77        18       26.39        28      124.20
   9      785.09        19       22.98        29       58.00
  10      562.05        20       21.92        30       44.08
(Flow Q(t) in 1000 m^3)
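29 pairs: $\{(Q_t, Q_{t+1})\} = \{(Q_1, Q_2), (Q_2, Q_3), \ldots, (Q_{29}, Q_{30})\}$

Relevant sample statistics: n = 29
$\bar{Q}_t = 186.22$; $S_{Q_t} = 230.06$; $\bar{Q}_{t+1} = 187.45$; $S_{Q_{t+1}} = 229.17$
The 1-day auto-correlation is 0.439. (This value corresponds to averaging the
cross-products over n = 29 while the standard deviations above are based on
n - 1 = 28; with n - 1 used throughout, the coefficient is about 0.455.)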
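The example can be checked with a short Python sketch. It assumes the 30 tabulated flows are typed in correctly and pairs each day with the next one; the helper name `lag_statistics` is hypothetical. It reports the means, the standard deviations (n - 1 divisor), the correlation with consistent divisors, and the variant whose covariance is averaged over n, for comparison with the value quoted above.

```python
import math

# Daily flows (1000 m^3), June 2001, Chung Mei Upper Station, days 1 to 30.
flows = [
    8.35, 6.78, 6.32, 17.36, 191.62, 82.33, 524.45, 196.77, 785.09, 562.05,
    313.89, 480.88, 151.28, 83.92, 44.58, 36.58, 33.65, 26.39, 22.98, 21.92,
    20.06, 17.52, 116.13, 68.25, 280.22, 347.53, 771.30, 124.20, 58.00, 44.08,
]


def lag_statistics(series, lag=1):
    """Statistics of the pairs (Q_t, Q_{t+lag}); lag must be at least 1."""
    x = series[:-lag]   # Q_t
    y = series[lag:]    # Q_{t+lag}
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    r = s_xy / math.sqrt(s_xx * s_yy)   # same divisor in numerator and denominator
    return {
        "n": n,
        "mean_x": x_bar,
        "mean_y": y_bar,
        "std_x": math.sqrt(s_xx / (n - 1)),
        "std_y": math.sqrt(s_yy / (n - 1)),
        "r": r,
        "r_mixed": r * (n - 1) / n,     # covariance over n, std devs over n - 1
    }


stats = lag_statistics(flows, lag=1)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```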
[Figure: Chung Mei Upper Daily Flow. Flow (1000 cubic meters) versus day, and scatter plot of Q(t+1) versus Q(t), both in 1000 m^3.]
[Figure: Autocorrelation for June 2001 Daily Flows at Chung Mei Upper, HK. Autocorrelation versus time lags (days).]
Regression Models
Due to the presence of uncertainties, a deterministic functional
relationship is generally not very appropriate or realistic.
The deterministic model form can be modified to account for
uncertainties in the model as
$$Y = g(X_1, X_2, \ldots, X_K) + \varepsilon$$
where $\varepsilon$ = model error term with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$.
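In engineering applications, functional forms commonly used for
establishing empirical relationships are
Additive: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \varepsilon$
Multiplicative: $Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \cdots X_K^{\beta_K} + \varepsilon$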
Least Squares Method

Suppose that there are n pairs of data, $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, n$, and a
plot of these data appears as
[Figure: scatter plot of y versus x.]
What is a plausible mathematical model describing the relation between x and y?
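Least Squares Method

Consider an arbitrary straight line, $y = \beta_0 + \beta_1 x$, to be fitted through
these data points. The question is: which line is the most representative?
[Figure: data points with a fitted line $\hat{y} = \beta_0 + \beta_1 x$ (intercept $\beta_0$, slope $\beta_1$); at each $x_i$ the observed $y_i$ and the fitted $\hat{y}_i$ are marked.]
$e_i = y_i - \hat{y}_i$ = error (residual)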
Least Squares Criterion

What are the values of $\beta_0$ and $\beta_1$ such that the resulting line best fits
the data points?
But wait! What goodness-of-fit criterion should be used to choose among
all possible combinations of $\beta_0$ and $\beta_1$?
The least squares (LS) criterion states that the sum of the squares of the
errors (or residuals, deviations) is to be minimized. Mathematically, the LS
criterion can be written as:
$$\min_{\beta_0, \beta_1} D = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
Are there any other criteria that could be used?
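To make the criterion concrete, the sketch below evaluates the sum of squared errors D for a few candidate lines and picks the smallest. The data points and the candidate (intercept, slope) pairs are hypothetical, and a scan over a handful of candidates is only an illustration of the criterion, not a way to find the true minimum.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """LS criterion D for the candidate line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data and candidate (intercept, slope) pairs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
candidates = [(0.0, 2.0), (0.5, 1.8), (1.0, 1.5)]

for b0, b1 in candidates:
    print(f"b0={b0:.2f}, b1={b1:.2f}, D={sum_squared_errors(xs, ys, b0, b1):.3f}")

# Among these candidates, the preferred line is the one with the smallest D.
best = min(candidates, key=lambda c: sum_squared_errors(xs, ys, c[0], c[1]))
print("smallest D among candidates:", best)
```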
Normal Equations for LS Criterion

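The necessary conditions for the minimum value of $D = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$ are:
$$\frac{\partial D}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial D}{\partial \beta_1} = 0$$
Writing $\hat{\beta}_0$ and $\hat{\beta}_1$ for the values that satisfy these conditions:
$$\frac{\partial D}{\partial \beta_0} = 2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)(-1) = 0$$
$$\frac{\partial D}{\partial \beta_1} = 2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)(-x_i) = 0$$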
Expanding the above equations

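$$\sum_{i=1}^{n} y_i - n \hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i = 0$$
$$\sum_{i=1}^{n} x_i y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0$$
Normal equations:
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$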
LS Solution (2 Unknowns)
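$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$$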
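A minimal Python sketch of the two-unknown LS solution, assuming the standard closed form (slope from the sums of x, y, xy and x^2; intercept from the two means). The function name `fit_line` and the four data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of (y_i - b0 - b1 * x_i)^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = sum_xx - sum_x ** 2 / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (sum_xy - sum_x * sum_y / n) / denom   # slope
    b0 = sum_y / n - b1 * sum_x / n             # intercept: y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
# The normal equations require both of these sums to vanish (up to rounding).
print(sum(residuals), sum(x * e for x, e in zip(xs, residuals)))
```

Checking that the residuals sum to zero, and that they are orthogonal to x, is a quick way to confirm that a fitted line satisfies the normal equations.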
Fitting a Polynomial Eq. By LS Method

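$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
LS criterion:
$$\min_{\beta_0, \beta_1, \ldots, \beta_k} D = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i - \beta_2 x_i^2 - \cdots - \beta_k x_i^k \right)^2$$
Set
$$\frac{\partial D}{\partial \beta_j} = 0, \quad \text{for } j = 0, 1, 2, \ldots, k$$
Normal Equations are:
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i + \hat{\beta}_2 \sum_{i=1}^{n} x_i^2 + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_i^k = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 + \hat{\beta}_2 \sum_{i=1}^{n} x_i^3 + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_i^{k+1} = \sum_{i=1}^{n} y_i x_i$$
$$\vdots$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_i^k + \hat{\beta}_1 \sum_{i=1}^{n} x_i^{k+1} + \hat{\beta}_2 \sum_{i=1}^{n} x_i^{k+2} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_i^{2k} = \sum_{i=1}^{n} y_i x_i^k$$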
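The polynomial normal equations can be assembled directly from power sums of x. The sketch below is one way to do it in plain Python, under the assumption that a small Gaussian elimination routine is an acceptable solver for a low-degree fit; the names `solve_linear_system` and `fit_polynomial` and the test data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def fit_polynomial(xs, ys, k):
    """LS coefficients [b0, b1, ..., bk] of a degree-k polynomial."""
    # Power sums S_p = sum of x_i^p for p = 0 .. 2k (S_0 = n).
    power_sums = [sum(x ** p for x in xs) for p in range(2 * k + 1)]
    # Row j of the normal equations: sum over m of b_m * S_{j+m} = sum of y_i * x_i^j.
    lhs = [[power_sums[j + m] for m in range(k + 1)] for j in range(k + 1)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(k + 1)]
    return solve_linear_system(lhs, rhs)


# Hypothetical data generated from y = 1 + 2x + 0.5x^2 without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 0.5 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, k=2)
print(coeffs)
```

For high degrees or widely spread x values the power sums become badly conditioned, so this direct approach is best kept to low-order polynomials.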
Fitting a Linear Function of Several Variables

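$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
LS criterion:
$$\min_{\beta_0, \beta_1, \ldots, \beta_k} D = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik} \right)^2$$
Set
$$\frac{\partial D}{\partial \beta_j} = 0, \quad \text{for } j = 0, 1, 2, \ldots, k$$
Normal equations:
$$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik} = \sum_{i=1}^{n} y_i$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_{i1} + \hat{\beta}_1 \sum_{i=1}^{n} x_{i1}^2 + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{i1} x_{ik} = \sum_{i=1}^{n} y_i x_{i1}$$
$$\vdots$$
$$\hat{\beta}_0 \sum_{i=1}^{n} x_{ik} + \hat{\beta}_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \cdots + \hat{\beta}_k \sum_{i=1}^{n} x_{ik}^2 = \sum_{i=1}^{n} y_i x_{ik}$$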
Matrix Form of Multiple Regression by LS

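$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
(Note: $x_{ij}$ = ith observation of the jth independent variable)

or, in short,
$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
LS criterion is:
$$\min_{\boldsymbol{\beta}} D = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}' \boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})' (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})$$
Set $\partial D / \partial \boldsymbol{\beta} = \mathbf{0}$, which results in:
$$\mathbf{X}' (\mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}}) = \mathbf{0}$$
The LS solutions are:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}' \mathbf{X})^{-1} \mathbf{X}' \mathbf{y}$$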
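The matrix form translates almost line by line into code. The following plain-Python sketch builds X with a leading column of ones, forms X'X and X'y, and solves the resulting system by Gaussian elimination rather than forming the inverse explicitly, which is a common numerical preference and an assumption of this sketch rather than something the slides prescribe. The names `solve_linear_system` and `least_squares` and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; columns of X are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def least_squares(x_rows, y):
    """Solve the normal equations X'X b = X'y; returns [b0, b1, ..., bk]."""
    X = [[1.0] + list(row) for row in x_rows]   # prepend the intercept column
    cols = list(zip(*X))                         # columns of X = rows of X'
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * yi for u, yi in zip(ci, y)) for ci in cols]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 2 + 3*x1 - 1*x2 without noise.
x_rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 7.0)]
y = [2.0 + 3.0 * x1 - 1.0 * x2 for x1, x2 in x_rows]
beta_hat = least_squares(x_rows, y)
print(beta_hat)
```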
Measure of Goodness-of-Fit
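$R^2$ = Coefficient of Determination
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where $\hat{\varepsilon}_i = y_i - \hat{y}_i$ is the ith residual and $\bar{y}$ is the sample mean of y.
= 1 - fraction of variation in the dependent variable, y, unexplained by
the regression equation;
= fraction of variation in the dependent variable, y, explained by the
regression equation.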
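A short Python sketch of the coefficient of determination, assuming fitted values are already available. The function name `r_squared` is hypothetical, and the data and the line 0.15 + 1.94x (an LS fit to those four hypothetical points) are illustrative only.

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y has no variation; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical data with a fitted line y_hat = 0.15 + 1.94 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
y_hats = [0.15 + 1.94 * x for x in xs]
r2 = r_squared(ys, y_hats)
print(f"R^2 = {r2:.4f}")

# Predicting the sample mean everywhere explains none of the variation.
r2_mean_only = r_squared(ys, [sum(ys) / len(ys)] * len(ys))
print(f"R^2 (mean only) = {r2_mean_only:.4f}")
```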
Example 1 (LS Method)
LS Example
LS Example (Matrix Approach)
LS Example (by Minitab w/ $\beta_0$)
LS Example (by Minitab w/o $\beta_0$)
LS Example (Output Plots)
