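$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i\right)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[\left(y_i - a_0 - a_1 x_i\right) x_i\right]$$
Setting these derivatives to zero, the equations become:
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$
The normal equations are
$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$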
Dr Muhammad Al-Salamah, Industrial Engineering, KFUPM
The slope and the y-intercept are given by
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$
$$a_0 = \bar{y} - a_1 \bar{x}$$
Example
Fit a straight line to the data
x_i    y_i
1      0.5
2      2.5
3      2
4      4
5      3.5
6      6
7      5.5
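The fit for this data can be checked with a short Python sketch. It applies the closed-form slope and intercept formulas directly; the function name `linear_fit` is only an illustrative choice, not something from the lecture.

```python
def linear_fit(xs, ys):
    """Least-squares line y = a0 + a1*x from the closed-form formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean(y) - a1 * mean(x)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


xs = [1, 2, 3, 4, 5, 6, 7]
ys = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
a0, a1 = linear_fit(xs, ys)
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}")  # a0 = 0.0714, a1 = 0.8393
```

The fitted line is therefore approximately y = 0.0714 + 0.8393 x.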
Quantification of Error of Linear Regression
To quantify the error reduction due to describing the
data in terms of a straight line, we use the coefficient of
determination which is defined as
$$r^2 = \frac{S_t - S_r}{S_t}, \quad \text{where } S_t = \sum \left(y_i - \bar{y}\right)^2$$
It represents the fraction of variability in y that can be
explained by the variability in x (how close the points are
to the line).
For $r^2 = 1$, the line explains 100% of the variability of the data.
Example
Compute the coefficient of determination for the linear regression in the previous example.
Answer
$S_t = 22.7143$
$S_r = 2.9911$
$r^2 = 0.868$
This indicates that 86.8% of the original uncertainty is explained by the linear model.
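These values can be reproduced with the sketch below. It refits the line using the mean-centered form of the slope formula, which is algebraically equivalent to the closed-form expression, and then evaluates S_t, S_r and r^2 from their definitions. The function name `fit_statistics` is an illustrative assumption.

```python
def fit_statistics(xs, ys):
    """Return (S_t, S_r, r^2) for a least-squares straight-line fit."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Mean-centered slope formula (equivalent to the closed-form one)
    a1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
    a0 = y_mean - a1 * x_mean
    # S_t: spread of the data around its mean
    s_t = sum((y - y_mean) ** 2 for y in ys)
    # S_r: spread of the data around the fitted line
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    return s_t, s_r, (s_t - s_r) / s_t


xs = [1, 2, 3, 4, 5, 6, 7]
ys = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
s_t, s_r, r2 = fit_statistics(xs, ys)
print(f"S_t = {s_t:.4f}, S_r = {s_r:.4f}, r^2 = {r2:.3f}")
# S_t = 22.7143, S_r = 2.9911, r^2 = 0.868
```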
Linearization of Nonlinear Relationships
Transformations can be used to express the data in a
form that is compatible with linear regression.
Suppose the relationship between x and y is
$$y = a_1 e^{b_1 x}$$
It can be linearized by taking the ln of both sides:
$$\ln y = \ln a_1 + b_1 x$$
Consider
$$y = a_2 x^{b_2}$$
It can be transformed into the linear form
$$\log y = \log a_2 + b_2 \log x$$
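As an illustration of the exponential case, the sketch below regresses ln y on x and then recovers the two model parameters from the intercept and slope. The data are hypothetical and noise-free, generated from assumed values a = 2.0 and b = 0.5 purely so the recovery can be checked; the names `a_est` and `b_est` are also illustrative.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = c0 + c1*x; returns (c0, c1)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    c1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c0 = sum_y / n - c1 * sum_x / n
    return c0, c1


# Hypothetical data (an assumption for this sketch): y = 2.0 * exp(0.5 * x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# ln y = ln a + b*x, so the intercept is ln a and the slope is b
intercept, slope = linear_fit(xs, [math.log(y) for y in ys])
a_est = math.exp(intercept)
b_est = slope
print(f"a = {a_est:.4f}, b = {b_est:.4f}")
```

With real, noisy data the recovered parameters minimize the squared error in ln y, not in y itself, which is worth keeping in mind when comparing fits.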
Consider
$$y = a_3 \frac{x}{b_3 + x}$$
It can be linearized by inverting both sides
$$\frac{1}{y} = \frac{b_3 + x}{a_3 x} = \frac{b_3}{a_3} \left(\frac{1}{x}\right) + \frac{1}{a_3}$$
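The inversion can be sketched in Python as follows: regress 1/y on 1/x, read the intercept as 1/a_3 and the slope as b_3/a_3, then solve for the two parameters. The data are hypothetical and noise-free, generated from assumed values a_3 = 4.0 and b_3 = 2.0 so the result can be verified; they are not from the lecture.

```python
def linear_fit(xs, ys):
    """Least-squares line y = c0 + c1*x; returns (c0, c1)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    c1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c0 = sum_y / n - c1 * sum_x / n
    return c0, c1


# Hypothetical data (an assumption for this sketch): y = 4.0 * x / (2.0 + x)
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [4.0 * x / (2.0 + x) for x in xs]

# 1/y = (b3/a3) * (1/x) + 1/a3
intercept, slope = linear_fit([1.0 / x for x in xs], [1.0 / y for y in ys])
a3_est = 1.0 / intercept
b3_est = slope * a3_est
print(f"a3 = {a3_est:.4f}, b3 = {b3_est:.4f}")
```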
Example
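Fit
$$y = a_2 x^{b_2}$$
to the data

x    y
1    0.5
2    1.7
3    3.4
4    5.7
5    8.4

Answer
Transformed data, with x' = log x and y' = log y:

x'       y'
0        -0.301
0.301    0.230
0.477    0.531
0.602    0.756
0.699    0.924

Linear regression on the transformed data gives
$$y' = 1.75 x' - 0.300$$
so
$$b_2 = 1.75, \quad a_2 = 10^{-0.300} = 0.5$$
$$y = 0.5 x^{1.75}$$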
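The same steps can be sketched in Python: take base-10 logarithms of both columns, fit a straight line, and convert the intercept back with a power of 10. The helper name `linear_fit` is an illustrative choice.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = c0 + c1*x; returns (c0, c1)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    c1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c0 = sum_y / n - c1 * sum_x / n
    return c0, c1


xs = [1, 2, 3, 4, 5]
ys = [0.5, 1.7, 3.4, 5.7, 8.4]

# log y = log a2 + b2 * log x
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]
intercept, b2 = linear_fit(log_x, log_y)
a2 = 10 ** intercept
print(f"a2 = {a2:.2f}, b2 = {b2:.2f}")  # approximately a2 = 0.50, b2 = 1.75
```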
Polynomial Regression
When the data are not well described by a straight line, a
polynomial can be fitted to them using polynomial regression.
A second-order polynomial or quadratic fit is
$$y = a_0 + a_1 x + a_2 x^2 + e$$
The sum of the squares of the residuals:
$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
Differentiate $S_r$ with respect to all parameters:
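With $S_r = \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$, applying the chain rule to each coefficient gives the following partial derivatives (a reconstruction from the definition of $S_r$; all sums run over i = 1, ..., n):

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$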
Set the partial derivatives to zero and rearrange:
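For the quadratic model $y = a_0 + a_1 x + a_2 x^2$, collecting the unknowns on the left-hand side gives the standard form below (a reconstruction; all sums run over i = 1, ..., n):

$$n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$$
$$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$$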
These equations are called the normal equations.
They form a system of linear equations with 3 equations
and 3 unknowns.
In general, an mth order polynomial requires solving a
system of m+1 linear equations.
Example
Fit a second-order polynomial to the data
x y
0 2.1
1 7.7
2 14
3 27
4 41
5 61
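One possible way to carry out this fit is sketched below in plain Python. It builds the m+1 normal equations for an mth-order polynomial and solves them by Gaussian elimination with partial pivoting. The function names `solve_linear_system` and `poly_fit` are illustrative assumptions, and the elimination routine is a minimal one meant for small, well-conditioned systems like this example.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot candidate
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x


def poly_fit(xs, ys, m):
    """Coefficients [a0, ..., am] of the least-squares mth-order polynomial."""
    # Normal matrix entry (j, k) is sum(x^(j+k)); right-hand side j is sum(x^j * y)
    A = [[sum(x ** (j + k) for x in xs) for k in range(m + 1)]
         for j in range(m + 1)]
    b = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(m + 1)]
    return solve_linear_system(A, b)


xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 7.7, 14, 27, 41, 61]
a0, a1, a2 = poly_fit(xs, ys, 2)
print(f"y = {a0:.4f} + {a1:.4f} x + {a2:.4f} x^2")
```

A quick sanity check on any solution is that the residuals are orthogonal to 1, x and x^2, which is exactly what the normal equations state.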
Multiple Linear Regression
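The function y is a linear function of 2 or more
independent variables, such as
$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$
The sum of the squares of the residuals is
$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)^2$$
To minimize $S_r$, differentiate it with respect to each coefficient and set the partial derivatives to zero.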
The normal equations are
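For the two-variable model $y = a_0 + a_1 x_1 + a_2 x_2 + e$, they can be written in matrix form as follows (a reconstruction; $x_{1i}$ and $x_{2i}$ denote the i-th observations of the two independent variables, and all sums run over i = 1, ..., n):

$$\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{bmatrix}$$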
A system of 3 linear equations and 3 unknowns
Example
See the example and the solution in the book
Regression in Matlab
Use the polyfit function
Regression in Excel
Use Add Trendline
Additional Example
The natural gas consumption for electric power generation
in the Kingdom from 1977 to 2000 is shown in the graph
below.
[Figure: natural gas consumption in million cubic meters (axis 0 to 12000) versus year (axis 1975 to 2005)]
Observations from the data:
There is an upward trend in the observations.
The relation between gas consumption and year appears to be
linear; i.e. the general trend of the data is linear.
Can regression be used?
Yes, because the gas consumption values are not
precise (there are errors in the measurements).
We can assume that normality holds.
[Figure: natural gas consumption in million cubic meters (axis 0 to 12000) versus year (axis 1976 to 1996)]
Using the equations:
$a_1 = 393.94$
$a_0 = -777828$
The coefficient of determination = 0.8811
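As a small usage sketch, the fitted line can be evaluated for a year inside the data range. The year 1990 is an arbitrary illustrative choice, the function name `predicted_consumption` is an assumption, and the result is the value on the fitted line (in million cubic meters), not an observed data point.

```python
# Coefficients reported above for the gas consumption data
a0 = -777828.0
a1 = 393.94


def predicted_consumption(year):
    """Value of the fitted line a0 + a1*year, in million cubic meters."""
    return a0 + a1 * year


estimate_1990 = predicted_consumption(1990)
print(f"{estimate_1990:.1f}")  # 6112.6
```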