8 Mathematical Model

Mathematical Model

Ideal gas law: $PV = NRT$
Q1: Is this relationship true?
Q2: What is the value of the constant $R$?
Answer these questions by a set of measurements.

Equation, formula
$E = mc^2$
$V = IR$
y = x 2 3 cos 2 (log )
Mathematical Model
$PV = NRT$
$P$: pressure, $V$: volume, $T$: temperature, $N$: number of moles, $R$: universal gas constant

Measurements: $(P_i, V_i, T_i, N_i)$
$$R_i = \frac{P_i V_i}{N_i T_i}$$

Assumptions: ideal gas, static and closed environment.
Errors due to unknown outside factors exist.
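The idea of answering Q1 and Q2 by measurement can be sketched as follows. The measurement values below are hypothetical (SI units, chosen only for illustration and not taken from the slides); each one gives a ratio $R_i = P_i V_i / (N_i T_i)$, and roughly constant ratios would support the relationship while their average estimates $R$.

```python
# Hypothetical measurements (P in Pa, V in m^3, N in mol, T in K).
# These numbers are illustrative assumptions, not data from the slides.
measurements = [
    (101325.0, 0.02446, 1.0, 298.15),
    (202650.0, 0.01225, 1.0, 298.15),
    (101325.0, 0.04483, 2.0, 273.15),
]


def estimate_R(data):
    """Return the per-measurement ratios R_i = P V / (N T) and their average."""
    ratios = [P * V / (N * T) for P, V, N, T in data]
    return ratios, sum(ratios) / len(ratios)


ratios, r_mean = estimate_R(measurements)
for i, r in enumerate(ratios, start=1):
    print(f"R_{i} = {r:.4f}")
print(f"average R = {r_mean:.4f}")
```

The ratios are not exactly equal because each measured quantity carries some error, which is the motivation for moving from a mathematical model to a statistical one.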
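Statistical Model

Observed data:
$$p = P + \varepsilon_p, \quad v = V + \varepsilon_v, \quad t = T + \varepsilon_t, \quad n = N + \varepsilon_n$$
$\varepsilon_p, \varepsilon_v, \varepsilon_t, \varepsilon_n$: unobserved measurement errors (random)

Ideal gas law:
$$(p - \varepsilon_p)(v - \varepsilon_v) = (n - \varepsilon_n) R (t - \varepsilon_t)$$
$$pv = nRt + (p\varepsilon_v + v\varepsilon_p - \varepsilon_p\varepsilon_v - Rt\varepsilon_n - Rn\varepsilon_t + R\varepsilon_n\varepsilon_t)$$

Statistical Model = Systematic component + Random errors
Systematic component: $nRt$. Random errors: the bracketed term.
Model parameter: unknown parameter in the systematic component, e.g. the universal gas constant $R$.

Analysis of Variance Model (ANOVA)

One-way ANOVA
Compare multiple populations

Population               Data
$N(\mu_1, \sigma^2)$     $Y_{11}, Y_{12}, \ldots, Y_{1n_1}$
$N(\mu_2, \sigma^2)$     $Y_{21}, Y_{22}, \ldots, Y_{2n_2}$
..
$N(\mu_a, \sigma^2)$     $Y_{a1}, Y_{a2}, \ldots, Y_{an_a}$

Assumptions
1. Normal
2. Equal Variances
3. Independence

Total sample size: $N = \sum_{i=1}^{a} n_i$

One-way ANOVA
ANOVA model:
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad j = 1, 2, \ldots, n_i, \quad i = 1, 2, \ldots, a$$
$$\varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$$

Overall population mean (grand mean): $\mu = \frac{1}{N} \sum_{i=1}^{a} n_i \mu_i$
$i$th treatment effect: $\alpha_i = \mu_i - \mu$, with $\sum_{i=1}^{a} n_i \alpha_i = 0$
Random errors: $\varepsilon_{ij} = Y_{ij} - \mu - \alpha_i = Y_{ij} - \mu_i$

Between group       Within group
$\mu + \alpha_1$    $\mu + \alpha_1 + \varepsilon_{1j} = Y_{1j}$
$\mu + \alpha_2$    $\mu + \alpha_2 + \varepsilon_{21} = Y_{21}$, $\mu + \alpha_2 + \varepsilon_{22} = Y_{22}$, ..., $\mu + \alpha_2 + \varepsilon_{2n_2} = Y_{2n_2}$
..
$\mu + \alpha_a$    $\mu + \alpha_a + \varepsilon_{aj} = Y_{aj}$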
Test for Treatment Effects

$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ (there is no treatment effect; the populations are all the same)
vs
$H_1$: some $\alpha_i \neq 0$ (there are some treatment effects; the populations are not all the same)

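Breakdown of sum of squares

$i$th sample mean: $\bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$
Overall sample mean: $\bar{Y} = \frac{1}{N} \sum_{i=1}^{a} \sum_{j=1}^{n_i} Y_{ij} = \frac{1}{N} \sum_{i=1}^{a} n_i \bar{Y}_i$

$$Y_{ij} - \bar{Y} = (\bar{Y}_i - \bar{Y}) + (Y_{ij} - \bar{Y}_i)$$
$$\sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{a} n_i (\bar{Y}_i - \bar{Y})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$$
$$SS_T = SS_A + SS_E$$

Total sum of squares: $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$
Treatment sum of squares (between group variation): $SS_A = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{a} n_i (\bar{Y}_i - \bar{Y})^2$
Error sum of squares (within group variation): $SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$

Treatment mean squares: $MS_A = \frac{SS_A}{a - 1} = \frac{1}{a - 1} \sum_{i=1}^{a} n_i (\bar{Y}_i - \bar{Y})^2$
Error mean squares: $MS_E = \frac{SS_E}{N - a} = \frac{1}{N - a} \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$

If $H_1$ is true ($\mu_i$ not all the same), the variation of $\bar{Y}_i$ around $\bar{Y}$ is large, so $MS_A$ tends to be large.
$MS_E$ is unaffected by the population means.

$$F = \frac{MS_A}{MS_E}$$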
F Distribution

$X \sim F(r_1, r_2)$
$$f(x) = \frac{\Gamma\left(\frac{r_1 + r_2}{2}\right)}{\Gamma\left(\frac{r_1}{2}\right) \Gamma\left(\frac{r_2}{2}\right)} \left(\frac{r_1}{r_2}\right)^{r_1/2} x^{r_1/2 - 1} \left(1 + \frac{r_1 x}{r_2}\right)^{-(r_1 + r_2)/2}, \quad x > 0$$
$$E(X) = \frac{r_2}{r_2 - 2} \quad (r_2 > 2)$$
$$Var(X) = \frac{2 r_2^2 (r_1 + r_2 - 2)}{r_1 (r_2 - 2)^2 (r_2 - 4)} \quad (r_2 > 4)$$

[Figure: F Densities for $(r_1, r_2)$ = (2, 4), (4, 6), (9, 9), (12, 12), plotted for $x$ from 0 to 5]

Test statistic: $F = MS_A / MS_E$
Reject $H_0$ if $F$ is too large.
Reject $H_0$ if $F_{obs} > F(a - 1, N - a, \alpha)$, where $F(a - 1, N - a, \alpha)$ is obtained from the F distribution table.

F Distribution
[Figure: density of $F(r_1, r_2)$ with the upper critical value $F(r_1, r_2, \alpha)$ marked]

F Distribution Table
$F(3, 4, 0.05) = 6.59$
$F(4, 6, 0.01) = 9.15$
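The density, mean and variance of the F distribution can be evaluated directly from the formulas above. The sketch below uses only the standard library; the function names `f_pdf`, `f_mean` and `f_var` are illustrative choices, not part of the slides.

```python
import math


def f_pdf(x, r1, r2):
    """Density of F(r1, r2) at x, following the formula on the slide."""
    if x <= 0:
        return 0.0
    coef = math.gamma((r1 + r2) / 2) / (math.gamma(r1 / 2) * math.gamma(r2 / 2))
    return (coef * (r1 / r2) ** (r1 / 2) * x ** (r1 / 2 - 1)
            * (1 + r1 * x / r2) ** (-(r1 + r2) / 2))


def f_mean(r1, r2):
    """E(X) = r2 / (r2 - 2), defined for r2 > 2."""
    if r2 <= 2:
        raise ValueError("mean requires r2 > 2")
    return r2 / (r2 - 2)


def f_var(r1, r2):
    """Var(X) = 2 r2^2 (r1 + r2 - 2) / (r1 (r2 - 2)^2 (r2 - 4)), for r2 > 4."""
    if r2 <= 4:
        raise ValueError("variance requires r2 > 4")
    return 2 * r2 ** 2 * (r1 + r2 - 2) / (r1 * (r2 - 2) ** 2 * (r2 - 4))


# Midpoint-rule check that the F(4, 6) density integrates to about 1.
step = 0.01
area = sum(f_pdf((k + 0.5) * step, 4, 6) * step for k in range(40000))

print(f_pdf(2.0, 2, 4), f_mean(4, 6), f_var(4, 6), round(area, 4))
```

For $r_1 = 2$, $r_2 = 4$ the density simplifies to $(1 + x/2)^{-3}$, which gives a quick hand check of the implementation.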
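ANOVA Table

$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ vs $H_1$: some $\alpha_i \neq 0$
Test statistic: $F = \frac{MS_A}{MS_E}$
Reject $H_0$ if $F_{obs} > F(a - 1, N - a, \alpha)$.

Source      SS    d.f.   MS            F-ratio
Treatment   SSA   a-1    SSA/(a - 1)   MSA / MSE
Error       SSE   N-a    SSE/(N - a)
Total       SST   N-1

Computational Formulae

$i$th total: $T_i = \sum_{j=1}^{n_i} Y_{ij}$
Overall total: $T_{..} = \sum_{i=1}^{a} T_i$

$$SS_A = \sum_{i=1}^{a} n_i (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{a} \frac{T_i^2}{n_i} - \frac{T_{..}^2}{N}$$
$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{a} \sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{a} \frac{T_i^2}{n_i}$$
$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{a} \sum_{j=1}^{n_i} Y_{ij}^2 - \frac{T_{..}^2}{N}$$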
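To see that the computational formulae agree with the defining sums of squares, the following sketch evaluates both on a small data set. The three groups are hypothetical values made up for illustration, and the function names are assumptions of this sketch.

```python
def ss_by_definition(groups):
    """SSA, SSE, SST from deviations about the group and overall means."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_a = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_e = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    ss_t = sum((y - grand_mean) ** 2 for g in groups for y in g)
    return ss_a, ss_e, ss_t


def ss_by_totals(groups):
    """SSA, SSE, SST from group totals T_i and the overall total T.."""
    n_total = sum(len(g) for g in groups)
    totals = [sum(g) for g in groups]
    sum_sq = sum(y * y for g in groups for y in g)
    between = sum(t * t / len(g) for t, g in zip(totals, groups))
    correction = sum(totals) ** 2 / n_total
    return between - correction, sum_sq - between, sum_sq - correction


# Hypothetical unbalanced data (not from the slides).
groups = [[3, 5, 4], [8, 9], [6, 7, 5, 6]]
by_def = ss_by_definition(groups)
by_tot = ss_by_totals(groups)
print(by_def)
print(by_tot)
```

The totals-based version needs only one pass over the data, which is why it was the preferred hand-calculation route.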
One-way ANOVA
Example: Color brightness of films
Brands: Kodak, Agfa, Fuji
$a = 3$, $n_1 = n_2 = n_3 = 15$, $N = 45$

Data
Sample 1: 32, 34, 31, 30, 37, 28, 28, 27, 30, 32, 26, 29, 27, 30, 31    $T_1 = 452$
Sample 2: 41, 44, 50, 32, 32, 38, 38, 43, 47, 32, 36, 35, 34, 40, 36    $T_2 = 578$
Sample 3: 23, 24, 25, 21, 26, 25, 27, 26, 22, 25, 27, 30, 25, 25, 27    $T_3 = 378$

$T_{..} = 452 + 578 + 378 = 1408$
$\sum_{i=1}^{3} \sum_{j=1}^{15} Y_{ij}^2 = 46040$

$H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_1$: not $H_0$, $\alpha = 0.05$

$$SS_A = \sum_{i=1}^{3} \frac{T_i^2}{n_i} - \frac{T_{..}^2}{N} = \frac{452^2}{15} + \frac{578^2}{15} + \frac{378^2}{15} - \frac{1408^2}{45} = 1363.38$$
$$SS_T = \sum_{i=1}^{3} \sum_{j=1}^{15} Y_{ij}^2 - \frac{T_{..}^2}{N} = 46040 - \frac{1408^2}{45} = 1985.24$$
$$SS_E = SS_T - SS_A = 1985.24 - 1363.38 = 621.86$$

One-way ANOVA
Source      SS        d.f.   MS       F-ratio
Treatment   1363.38   2      681.69   46.03
Error       621.86    42     14.81
Total       1985.24   44

From F distribution table: $F(2, 42, 0.05) \approx F(2, 40, 0.05) = 3.23$
F ratio = 46.03 > 3.23
Reject $H_0$ at $\alpha = 0.05$.
The color brightness of the three brands of film is significantly different.
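The film example can be reproduced with a short script. This is a sketch under one assumption: the second sample is read from a garbled line of the slide and was checked only against the stated total $T_2 = 578$ and the stated sum of squares 46040. The critical value 3.23 is the table value quoted on the slide.

```python
# Brightness readings, one list per brand sample (second list reconstructed,
# see the note above; its total and the overall sum of squares match the slide).
samples = [
    [32, 34, 31, 30, 37, 28, 28, 27, 30, 32, 26, 29, 27, 30, 31],
    [41, 44, 50, 32, 32, 38, 38, 43, 47, 32, 36, 35, 34, 40, 36],
    [23, 24, 25, 21, 26, 25, 27, 26, 22, 25, 27, 30, 25, 25, 27],
]

a = len(samples)                      # number of treatments
N = sum(len(s) for s in samples)      # total sample size
totals = [sum(s) for s in samples]    # T_i
grand_total = sum(totals)             # T..
sum_sq = sum(y * y for s in samples for y in s)

between = sum(t * t / len(s) for t, s in zip(totals, samples))
correction = grand_total ** 2 / N

ss_a = between - correction           # treatment sum of squares
ss_t = sum_sq - correction            # total sum of squares
ss_e = ss_t - ss_a                    # error sum of squares
ms_a = ss_a / (a - 1)
ms_e = ss_e / (N - a)
f_ratio = ms_a / ms_e

f_crit = 3.23  # F(2, 40, 0.05) as quoted on the slide
print(f"SSA={ss_a:.2f} SSE={ss_e:.2f} SST={ss_t:.2f}")
print(f"MSA={ms_a:.2f} MSE={ms_e:.2f} F={f_ratio:.2f} reject H0: {f_ratio > f_crit}")
```

The unrounded F ratio is about 46.04; the slide's 46.03 comes from dividing the rounded mean squares 681.69 and 14.81.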
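Estimation

Treatment effect: $\alpha_i$
Point: $\bar{Y}_i - \bar{Y}$
Interval: $(\bar{Y}_i - \bar{Y}) \pm t_{N-a, \alpha/2} \sqrt{MS_E \left(\frac{1}{n_i} - \frac{1}{N}\right)}$

Difference in treatment effects: $\alpha_i - \alpha_j$
Point: $\bar{Y}_i - \bar{Y}_j$
Interval: $(\bar{Y}_i - \bar{Y}_j) \pm t_{N-a, \alpha/2} \sqrt{MS_E \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$

Estimation
Example: Color brightness of films

$\bar{Y}_1 = \frac{452}{15} = 30.13$, $\bar{Y}_2 = \frac{578}{15} = 38.53$, $\bar{Y}_3 = \frac{378}{15} = 25.2$, $\bar{Y} = \frac{1408}{45} = 31.29$

95% C.I. for $\alpha_1$: $(\bar{Y}_1 - \bar{Y}) \pm t_{42, 0.025} \sqrt{MS_E \left(\frac{1}{n_1} - \frac{1}{N}\right)} = (30.13 - 31.29) \pm 2.021 \sqrt{14.81 \left(\frac{1}{15} - \frac{1}{45}\right)} = [-2.80, 0.48]$
95% C.I. for $\alpha_2 - \alpha_3$: $(\bar{Y}_2 - \bar{Y}_3) \pm t_{42, 0.025} \sqrt{MS_E \left(\frac{1}{n_2} + \frac{1}{n_3}\right)} = (38.53 - 25.2) \pm 2.021 \sqrt{14.81 \left(\frac{1}{15} + \frac{1}{15}\right)} = [10.49, 16.17]$, so $\mu_2 > \mu_3$
95% C.I. for $\alpha_1 - \alpha_2$: $[-11.24, -5.56]$, so $\mu_1 < \mu_2$
95% C.I. for $\alpha_1 - \alpha_3$: $[2.09, 7.77]$, so $\mu_1 > \mu_3$

$\mu_2 > \mu_1 > \mu_3$
Overall confidence < 95%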
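The intervals in the film example can be checked numerically. The sketch below takes the sample totals, $MS_E = 14.81$ and the critical value 2.021 as given on the slides (2.021 is the table value for 40 degrees of freedom, which the slides appear to use in place of 42); the helper names are illustrative.

```python
import math

n = {1: 15, 2: 15, 3: 15}            # sample sizes
totals = {1: 452, 2: 578, 3: 378}    # sample totals T_i
N = sum(n.values())
mse = 14.81                           # error mean square from the ANOVA table
t_crit = 2.021                        # t critical value as used on the slide

means = {i: totals[i] / n[i] for i in n}
grand_mean = sum(totals.values()) / N


def effect_ci(i):
    """95% C.I. for the treatment effect alpha_i."""
    centre = means[i] - grand_mean
    half = t_crit * math.sqrt(mse * (1 / n[i] - 1 / N))
    return centre - half, centre + half


def difference_ci(i, j):
    """95% C.I. for alpha_i - alpha_j."""
    centre = means[i] - means[j]
    half = t_crit * math.sqrt(mse * (1 / n[i] + 1 / n[j]))
    return centre - half, centre + half


ci_a1 = effect_ci(1)
ci_23 = difference_ci(2, 3)
ci_12 = difference_ci(1, 2)
ci_13 = difference_ci(1, 3)
for label, ci in [("a1", ci_a1), ("a2-a3", ci_23), ("a1-a2", ci_12), ("a1-a3", ci_13)]:
    print(f"{label}: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Each interval has 95% confidence on its own, which is why the confidence that all of them hold simultaneously is below 95%.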
Two-way ANOVA

Example: Brightness of synthetic fabric

                 Temperature
Time (cycles)    350F          375F          400F
40               38, 32, 30    37, 35, 40    36, 39, 43
50               40, 45, 36    39, 42, 46    39, 48, 47

Two-way factorial ANOVA model:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}, \quad i = 1, 2, \quad j = 1, 2, 3, \quad k = 1, 2, 3$$
$$\sum_i \alpha_i = \sum_j \beta_j = \sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0$$
$$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$$

MTB > print 'Bright' 'Time' 'Temp'

Data Display

Row   Bright   Time   Temp
  1       38     40    350
  2       32     40    350
  3       30     40    350
  4       37     40    375
  5       35     40    375
  6       40     40    375
  7       36     40    400
  8       39     40    400
  9       43     40    400
 10       40     50    350
 11       45     50    350
 12       36     50    350

MTB > ANOVA 'Bright' = Time Temp Time*Temp.

Analysis of Variance (Balanced Designs)

Factor   Type    Levels   Values
Time     fixed        2   40  50
Temp     fixed        3   350  375  400

Analysis of Variance for Bright

Source      DF       SS       MS      F       P
Time         1   150.22   150.22   9.69   0.009   (significant)
Temp         2    80.78    40.39   2.61   0.115
Time*Temp    2     3.44     1.72   0.11   0.896
Error       12   186.00    15.50
Total       17   420.44

Interaction
[Figure: group mean against temperature, one line for Time = 40 and one for Time = 50]
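The Minitab table for the fabric example can be reproduced from the cell data with the usual balanced-design sums of squares. This is a plain-Python sketch, not the Minitab computation itself; the variable names are illustrative.

```python
# Brightness readings indexed by (time in cycles, temperature in F).
cells = {
    (40, 350): [38, 32, 30], (40, 375): [37, 35, 40], (40, 400): [36, 39, 43],
    (50, 350): [40, 45, 36], (50, 375): [39, 42, 46], (50, 400): [39, 48, 47],
}
times = [40, 50]
temps = [350, 375, 400]
r = 3                                   # replicates per cell
N = len(times) * len(temps) * r

grand_total = sum(sum(v) for v in cells.values())
correction = grand_total ** 2 / N

ss_total = sum(y * y for v in cells.values() for y in v) - correction
ss_time = sum(sum(sum(cells[(t, c)]) for c in temps) ** 2
              for t in times) / (len(temps) * r) - correction
ss_temp = sum(sum(sum(cells[(t, c)]) for t in times) ** 2
              for c in temps) / (len(times) * r) - correction
ss_cells = sum(sum(v) ** 2 for v in cells.values()) / r - correction
ss_inter = ss_cells - ss_time - ss_temp   # Time*Temp interaction
ss_error = ss_total - ss_cells

df_time, df_temp = len(times) - 1, len(temps) - 1
df_inter = df_time * df_temp
df_error = N - len(times) * len(temps)
ms_error = ss_error / df_error

f_time = (ss_time / df_time) / ms_error
f_temp = (ss_temp / df_temp) / ms_error
f_inter = (ss_inter / df_inter) / ms_error
print(f"Time      SS={ss_time:.2f} F={f_time:.2f}")
print(f"Temp      SS={ss_temp:.2f} F={f_temp:.2f}")
print(f"Time*Temp SS={ss_inter:.2f} F={f_inter:.2f}")
print(f"Error     SS={ss_error:.2f} MS={ms_error:.2f}")
```

A small interaction F ratio, as here, is what an additive pattern in the group means looks like numerically.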
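[Figure: interaction plots of group mean against Temperature (350, 375, 400); panels labelled Non-additive and Additive]

Regression
Sir Francis Galton (1822-1911)
[Figure: scatterplot of Height of Son against Height of Father]
Height of the sons of fathers regressed towards the mean height of the population.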
Regression
Model the relationship between a dependent variable and independent variable(s).

Simple Linear Regression Examples
Dependent variable (Y)    Independent variable (X)
Job performance           Extent of training
Return of a stock         Risk of the stock
Overall CGA               A-Level Score
Tree age (by C14)         Tree age (by tree rings)

Simple Linear Regression
Simple: one independent variable
Scatterplot
Regression line: a line that fits the data well

Simple Linear Regression Model

Data: $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ (observed)
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$$
Assumptions: $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
$\alpha$, $\beta$: unobserved. $\varepsilon_i$ (random error): unobserved.

Simple Linear Regression Model

Example: $Y$ = Height of son (in cm), $X$ = Height of father (in cm)
Suppose the true relation is given by $Y = 0.9X + 15$.
Fathers with same heights would then have sons with same heights. Unrealistic!
More reasonable relationship: $E(Y) = 0.9X + 15$, that is, $Y = 0.9X + 15 + \varepsilon$.

X (observed)   E(Y) = 0.9X + 15 (unobserved)   Y (observed)   Random error (unobserved)
170            168                              169.3           1.3
175            172.5                            171.7          -0.8
180            177                              174.6          -2.4
185            181.5                            182.2           0.7

Estimate the regression line: fit a regression line to the observed data.
Estimation of Model Parameters

Sample statistics
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
$$S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$
$$S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2$$
$$S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}$$

$$\hat{\beta} = b = \frac{S_{xy}}{S_{xx}}, \quad \hat{\alpha} = a = \bar{Y} - b\bar{X}$$

Fitted regression line: $\hat{Y} = a + bX$
True regression line: $E(Y) = \alpha + \beta X$

Fitting Regression Line

Example: Study of how wheat yield depends on fertilizer.
$X$ = Fertilizer (in lb/acre)
$Y$ = Yield (in bu/acre)

X   100   200   300   400   500   600   700
Y    40    50    50    70    65    65    80

$n = 7$, $\bar{X} = 400$, $\bar{Y} = 60$
$\sum_{i=1}^{7} X_i^2 = 1400000$, $\sum_{i=1}^{7} Y_i^2 = 26350$, $\sum_{i=1}^{7} X_i Y_i = 184500$

$S_{xx} = \sum X_i^2 - n\bar{X}^2 = 1400000 - (7)(400)^2 = 280000$
$S_{xy} = \sum X_i Y_i - n\bar{X}\bar{Y} = 184500 - (7)(400)(60) = 16500$
$S_{yy} = \sum Y_i^2 - n\bar{Y}^2 = 26350 - (7)(60)^2 = 1150$

$b = \frac{S_{xy}}{S_{xx}} = \frac{16500}{280000} = 0.059$
$a = \bar{Y} - b\bar{X} = 60 - \frac{16500}{280000}(400) = 36.43$

Fitted regression line: $\hat{Y} = 36.43 + 0.059X$

Prediction
$X_0 = 400$: $\hat{Y}_0 = 36.43 + (0.059)(400) = 60.03$
$X_0 = 650$: $\hat{Y}_0 = 36.43 + (0.059)(650) = 74.78$
$X_0 = 0$: $\hat{Y}_0 = 36.43$ ?
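The least squares fit for the wheat example can be reproduced from the seven data points. A minimal sketch in plain Python; `predict` is an illustrative helper name.

```python
X = [100, 200, 300, 400, 500, 600, 700]   # fertilizer (lb/acre)
Y = [40, 50, 50, 70, 65, 65, 80]          # yield (bu/acre)

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Corrected sums of squares and cross-products.
s_xx = sum(x * x for x in X) - n * x_bar ** 2
s_yy = sum(y * y for y in Y) - n * y_bar ** 2
s_xy = sum(x * y for x, y in zip(X, Y)) - n * x_bar * y_bar

b = s_xy / s_xx            # slope estimate
a = y_bar - b * x_bar      # intercept estimate


def predict(x0):
    """Point prediction on the fitted line a + b * x0."""
    return a + b * x0


print(f"Sxx={s_xx:.0f} Sxy={s_xy:.0f} Syy={s_yy:.0f}")
print(f"b={b:.4f} a={a:.2f}")
print(f"prediction at X0=650: {predict(650):.2f}")
```

With unrounded coefficients the prediction at $X_0 = 650$ is about 74.73; the slide's 74.78 uses $b$ rounded to 0.059.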
Danger of Extrapolation

[Figures: a sequence of "SARS Trend" plots of No. of Cases against Date and No. of patients in hospital against Date, starting from 28-Feb or 10-Mar and extended in steps to 19-Apr, 19-May and 8-Jun]

Nonlinear Relationships
Association ≠ Causation

Example: Price and Demand for gas

Year   Price   Demand     Year   Price   Demand
1960      30      134     1970      58       65
1961      31      112     1971      58       56
1962      37      136     1972      60       58
1963      42      109     1973      73       55
1964      43      105     1974      88       49
1965      45       87     1975      89       39
1966      50       56     1976      92       36
1967      54       43     1977      97       46
1968      54       77     1978     100       40
1969      57       35     1979     102       42

Fitted regression line: Demand = 139.24 - 1.11 Price
Low demand is due to high price?

Simpson's Paradox
[Figure: Demand against Price, with points grouped by Year: 1960-1965, 1966-1973, 1974-1979]
Test For Regression Effect

Test $H_0: \beta = 0$ vs $H_1: \beta \neq 0$

Fitted values: $\hat{Y}_i = a + bX_i$
Residuals: $r_i = Y_i - \hat{Y}_i$
Random error: $\varepsilon_i = Y_i - \alpha - \beta X_i$

Decomposition of Variation
$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$
Variation of $Y$ = Explained variation + Unexplained variation

Breakdown of sum of squares
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$SS_T = SS_R + SS_E$$

Total sum of squares: $SS_T = S_{yy}$
Regression sum of squares: $SS_R = \sum_{i=1}^{n} (a + bX_i - a - b\bar{X})^2 = b^2 \sum_{i=1}^{n} (X_i - \bar{X})^2 = b^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}}$
Error sum of squares: $SS_E = SS_T - SS_R = S_{yy} - b^2 S_{xx} = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$

$H_0: \beta = 0$ vs $H_1: \beta \neq 0$
$MS_R = \frac{SS_R}{1} = SS_R$, $MS_E = \frac{SS_E}{n - 2}$
Test statistic: $F = \frac{MS_R}{MS_E}$
Reject $H_0$ if $F_{obs} > F(1, n - 2, \alpha)$.

ANOVA table
Source       SS    d.f.   MS            F-ratio
Regression   SSR   1      SSR           MSR / MSE
Error        SSE   n-2    SSE/(n - 2)
Total        SST   n-1

Example: Wheat yield example
Regression line: $\hat{Y} = 36.43 + 0.059X$
$S_{xx} = 280000$, $S_{yy} = 1150$, $S_{xy} = 16500$

$SS_R = b^2 S_{xx} = (0.059)^2 (280000) = 974.68$
$SS_T = S_{yy} = 1150$
$SS_E = SS_T - SS_R = 1150 - 974.68 = 175.32$

Source       SS       d.f.   MS       F-ratio
Regression   974.68   1      974.68   27.805
Error        175.32   5      35.064
Total        1150     6

$F(1, 5, 0.05) = 6.61 < 27.805$
Reject $H_0$ at $\alpha = 0.05$.
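The regression F test for the wheat example can be recomputed from $S_{xx}$, $S_{xy}$ and $S_{yy}$. The sketch below uses $SS_R = S_{xy}^2 / S_{xx}$ without rounding $b$, so its figures differ slightly from the slide's table, which is based on $b$ rounded to 0.059; the critical value 6.61 is the one quoted on the slide.

```python
n = 7
s_xx, s_xy, s_yy = 280000.0, 16500.0, 1150.0

ss_r = s_xy ** 2 / s_xx       # regression sum of squares (unrounded slope)
ss_t = s_yy                   # total sum of squares
ss_e = ss_t - ss_r            # error sum of squares

ms_r = ss_r / 1
ms_e = ss_e / (n - 2)
f_obs = ms_r / ms_e

f_crit = 6.61                 # F(1, 5, 0.05) as quoted on the slide
print(f"SSR={ss_r:.2f} SSE={ss_e:.2f} MSE={ms_e:.3f} F={f_obs:.2f}")
print("reject H0" if f_obs > f_crit else "do not reject H0")
```

The unrounded values are $SS_R \approx 972.32$ and $F \approx 27.36$; the conclusion (reject $H_0$) is the same either way.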
Coefficient of Determination

$$R^2 = \frac{SS_R}{SS_T} = \frac{\text{Explained variation}}{\text{Total variation}}, \quad 0 \le R^2 \le 1$$

$R^2 = 0$: no linear relationship
$R^2 = 1$: perfect linear relationship
High $R^2$: strong relationship, high prediction power

Example: $R^2 = \frac{974.68}{1150} = 84.8\%$
C.I. For Regression Parameters

$100(1 - \alpha)\%$ C.I. for $\beta$:
$$b \pm t_{n-2, \alpha/2} \sqrt{\frac{MS_E}{S_{xx}}}$$

$100(1 - \alpha)\%$ C.I. for $\alpha$:
$$a \pm t_{n-2, \alpha/2} \sqrt{MS_E \left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)}$$

Large $S_{xx}$: more accurate estimates
Demonstration

C.I. For Regression Parameters

Example: Wheat yield example
Regression line: $\hat{Y} = 36.43 + 0.059X$

Source       SS       d.f.   MS       F-ratio
Regression   974.68   1      974.68   27.805
Error        175.32   5      35.064
Total        1150     6

95% C.I. for $\beta$: $b \pm t_{5, 0.025} \sqrt{\frac{MS_E}{S_{xx}}} = 0.059 \pm 2.57 \sqrt{\frac{35.064}{280000}} = [0.0302, 0.0878]$
95% C.I. for $\alpha$: $a \pm t_{5, 0.025} \sqrt{MS_E \left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)} = 36.43 \pm 2.57 \sqrt{35.064 \left(\frac{1}{7} + \frac{(400)^2}{280000}\right)} = [23.57, 49.29]$
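The two interval formulas can be evaluated with the rounded summary values used on the slides ($a = 36.43$, $b = 0.059$, $MS_E = 35.064$, $t_{5, 0.025} = 2.57$). This is a sketch of the arithmetic only; the variable names are illustrative.

```python
import math

n = 7
x_bar = 400.0
s_xx = 280000.0
a, b = 36.43, 0.059      # rounded estimates from the fitted line
mse = 35.064             # error mean square from the ANOVA table
t_crit = 2.57            # t(5, 0.025) as used on the slide

# Slope: b +/- t * sqrt(MSE / Sxx)
half_b = t_crit * math.sqrt(mse / s_xx)
ci_slope = (b - half_b, b + half_b)

# Intercept: a +/- t * sqrt(MSE * (1/n + x_bar^2 / Sxx))
half_a = t_crit * math.sqrt(mse * (1 / n + x_bar ** 2 / s_xx))
ci_intercept = (a - half_a, a + half_a)

print(f"95% C.I. for slope:     [{ci_slope[0]:.4f}, {ci_slope[1]:.4f}]")
print(f"95% C.I. for intercept: [{ci_intercept[0]:.2f}, {ci_intercept[1]:.2f}]")
```

Because $S_{xx}$ appears in the denominator of both standard errors, spreading the $X$ values more widely narrows both intervals.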
Prediction

Predict the value of $Y_0$ at a fixed value of $X = X_0$.
Point prediction: $\hat{Y}_0 = a + bX_0$

$100(1 - \alpha)\%$ prediction interval (P.I.):
$$\hat{Y}_0 \pm t_{n-2, \alpha/2} \sqrt{MS_E \left(1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{xx}}\right)}$$

Prediction

Example: Wheat yield example
Regression line: $\hat{Y} = 36.43 + 0.059X$

Source       SS       d.f.   MS       F-ratio
Regression   974.68   1      974.68   27.805
Error        175.32   5      35.064
Total        1150     6

At $X_0 = 450$, $\hat{Y}_0 = 36.43 + (0.059)(450) = 62.98$

90% prediction interval:
$$\hat{Y}_0 \pm t_{5, 0.05} \sqrt{MS_E \left(1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{xx}}\right)} = 62.98 \pm 2.02 \sqrt{35.064 \left(1 + \frac{1}{7} + \frac{(450 - 400)^2}{280000}\right)} = 62.98 \pm 12.837 = [50.143, 75.817]$$
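The 90% prediction interval at $X_0 = 450$ can be checked with the slide's rounded inputs. The critical value 2.02 is taken as an assumption from the slide's arithmetic (the half-width 12.837 is consistent with it); the function name is illustrative.

```python
import math


def prediction_interval(x0, a, b, mse, n, x_bar, s_xx, t_crit):
    """Point prediction and prediction interval for a new Y at X = x0."""
    y0 = a + b * x0
    half = t_crit * math.sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y0, y0 - half, y0 + half


y0, lower, upper = prediction_interval(
    x0=450, a=36.43, b=0.059, mse=35.064,
    n=7, x_bar=400, s_xx=280000, t_crit=2.02,
)
print(f"Y0 = {y0:.2f}, 90% P.I. = [{lower:.3f}, {upper:.3f}]")
```

The leading 1 under the square root accounts for the random error of the new observation itself, so a prediction interval is always wider than the corresponding interval for the mean response.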
Multiple Linear Regression

Example: Fuel consumption data

$$FUEL = \beta_0 + \beta_1 TAX + \beta_2 DLIC + \beta_3 INC + \beta_4 ROAD + \varepsilon$$

Data Display

Row   State     POP     TAX   NLIC     INC     ROAD   FUELC      DLIC
  1   ME       1029    9.00    540   3.571    1.976     557   52.4781
  2   NH        771    9.00    441   4.092    1.250     404   57.1984
  3   VT        462    9.00    268   3.865    1.586     259   58.0087
  4   MA       5787    7.50   3060   4.870    2.351    2396   52.8771
  5   RI        968    8.00    527   4.399    0.431     397   54.4422
  6   CN       3082   10.00   1760   5.342    1.333    1408   57.1058
  7   NY      18366    8.00   8278   5.319   11.868    6312   45.0724
  8   NJ       7367    8.00   4074   5.126    2.138    3439   55.3007
  9   PA      11926    8.00   6312   4.447    8.577    5528   52.9264
 10   OH      10783    7.00   5948   4.512    8.507    5375   55.1609
 11   IN       5291    8.00   2804   4.391    5.939    3068   52.9957
 12   IL      11251    7.50   5903   5.126   14.186    5301   52.4664
..

Multiple Linear Regression

Example: Fuel consumption data

Regression Analysis

The regression equation is
FUEL = 37.7 - 3.48 TAX + 1.34 DLIC - 6.65 INC - 0.242 ROAD

Predictor      Coef    Stdev   t-ratio       p
Constant      37.68    18.57      2.03   0.049
TAX          -3.478    1.298     -2.68   0.010
DLIC         1.3366   0.1924      6.95   0.000
INC          -6.651    1.723     -3.86   0.000
ROAD        -0.2417   0.3391     -0.71   0.480

s = 6.633    R-sq = 67.8%    R-sq(adj) = 64.9%

Analysis of Variance

SOURCE       DF        SS       MS       F       p
Regression    4   3991.92   997.98   22.68   0.000
Error        43   1892.05    44.00
Total        47   5883.96

Unusual Observations
Obs.   TAX     FUEL      Fit   Stdev.Fit   Residual   St.Resid
 37    5.0   63.963   64.758       3.723     -0.795     -0.14 X
 40    7.0   96.812   73.371       2.102     23.441      3.73R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
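The summary quantities in the Minitab output are linked by simple identities, which the sketch below checks using only the numbers printed in the output (the raw data for all 48 observations is not shown on the slides, so the fit itself is not recomputed here).

```python
import math

# Sums of squares and degrees of freedom from the Analysis of Variance table.
ss_reg, ss_err, ss_tot = 3991.92, 1892.05, 5883.96
df_reg, df_err = 4, 43
df_tot = df_reg + df_err

ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err
f_ratio = ms_reg / ms_err                    # overall F statistic
s = math.sqrt(ms_err)                        # residual standard deviation
r_sq = ss_reg / ss_tot                       # R-sq
r_sq_adj = 1 - ms_err / (ss_tot / df_tot)    # R-sq(adj)

# t-ratio = Coef / Stdev for each predictor in the coefficient table.
coefficients = {
    "Constant": (37.68, 18.57),
    "TAX": (-3.478, 1.298),
    "DLIC": (1.3366, 0.1924),
    "INC": (-6.651, 1.723),
    "ROAD": (-0.2417, 0.3391),
}
t_ratios = {name: coef / sd for name, (coef, sd) in coefficients.items()}

print(f"F={f_ratio:.2f} s={s:.3f} R-sq={100 * r_sq:.1f}% R-sq(adj)={100 * r_sq_adj:.1f}%")
for name, t in t_ratios.items():
    print(f"{name:8s} t-ratio = {t:.2f}")
```

ROAD has a t-ratio of about -0.71 (p = 0.480), so it is the one predictor that does not look significant in this fit.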