Equation, formula
$E = mc^2$
$V = IR$
y = x 2 3 cos 2 (log )
Mathematical Model
Ideal gas law: $PV = nRT$
P: pressure, V: volume, T: temperature, n: number of moles, R: universal gas constant
Q1: Is this relationship true?
Q2: What is the value of the constant R?
Answer these questions with a set of measurements $(P_i, V_i, T_i, n_i)$:
$$R_i = \frac{P_i V_i}{n_i T_i}$$
Statistical Model
Observed data: $p = P + \varepsilon_p$ (observed value = true value + random error)
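The idea of answering Q2 from noisy measurements can be sketched in a few lines. This is only an illustration under stated assumptions: the measurements are synthetic (generated from $PV = nRT$ with a hypothetical 0.5% relative error on the observed pressure), and the names `simulate_measurement` and `estimate_R` are made up for this sketch.

```python
import random

R_TRUE = 8.314  # J/(mol K); used only to generate the synthetic data


def simulate_measurement(rng, rel_sd=0.005):
    """Return one hypothetical noisy measurement (p_obs, V, T, n)."""
    n = rng.uniform(0.5, 2.0)        # moles
    T = rng.uniform(280.0, 350.0)    # kelvin
    V = rng.uniform(0.01, 0.05)      # cubic metres
    P_true = n * R_TRUE * T / V      # true pressure from PV = nRT
    # Observed pressure = true pressure + random error (assumed Gaussian).
    p_obs = P_true * (1.0 + rng.gauss(0.0, rel_sd))
    return p_obs, V, T, n


def estimate_R(measurements):
    """Average the per-measurement estimates R_i = P_i V_i / (n_i T_i)."""
    r_values = [p * v / (n * t) for p, v, t, n in measurements]
    return sum(r_values) / len(r_values), r_values


rng = random.Random(0)
data = [simulate_measurement(rng) for _ in range(50)]
r_hat, r_values = estimate_R(data)
print(f"estimated R = {r_hat:.3f} from {len(r_values)} measurements")
```

The individual $R_i$ scatter around the constant because of the measurement error; averaging them is one simple way to estimate R.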
One-way ANOVA
Data: independent samples from $a$ populations
$N(\mu_1, \sigma_1^2),\; N(\mu_2, \sigma_2^2),\; \ldots,\; N(\mu_a, \sigma_a^2)$
Assumptions
1. Normal
Total sample size
$$N = \sum_{i=1}^{a} n_i$$
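ANOVA model
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1,2,\ldots,a, \quad j = 1,2,\ldots,n_i$$
$\mu$: overall population mean (grand mean), $\mu = \frac{1}{N}\sum_{i=1}^{a} n_i \mu_i$
$\alpha_i$: ith treatment effect, $\alpha_i = \mu_i - \mu$, so that $\sum_{i=1}^{a} n_i \alpha_i = 0$
$\varepsilon_{ij}$: random errors, $\varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$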
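A small numerical check of the constraint on the treatment effects may help. The population means and group sizes below are hypothetical, chosen only to show that defining the grand mean as the $n_i$-weighted average forces $\sum n_i \alpha_i = 0$.

```python
def treatment_effects(mu, n):
    """Return (grand mean, effects) with alpha_i = mu_i - mu.

    The grand mean is the n_i-weighted average of the group means.
    """
    N = sum(n)
    grand = sum(n_i * mu_i for n_i, mu_i in zip(n, mu)) / N
    alpha = [mu_i - grand for mu_i in mu]
    return grand, alpha


mu = [10.0, 12.0, 15.0]   # hypothetical population means
n = [5, 10, 5]            # hypothetical group sizes
grand, alpha = treatment_effects(mu, n)
weighted_sum = sum(n_i * a_i for n_i, a_i in zip(n, alpha))
print(grand, alpha, weighted_sum)
```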
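Between group: the group means $\mu + \alpha_1,\; \mu + \alpha_2,\; \ldots,\; \mu + \alpha_a$ differ through the treatment effects.
Within group: observations in the same group differ only through the random errors, e.g. for group 2:
$\mu + \alpha_2 + \varepsilon_{21} = Y_{21}$
$\mu + \alpha_2 + \varepsilon_{22} = Y_{22}$
$\ldots$
$\mu + \alpha_2 + \varepsilon_{2n_2} = Y_{2n_2}$
$$\varepsilon_{ij} = Y_{ij} - \mu_i = Y_{ij} - \mu - \alpha_i$$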
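Group sample mean: $\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$
Overall sample mean: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij} = \frac{1}{N}\sum_{i=1}^{a} n_i \bar{Y}_i$
Total sum of squares:
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2$$
Treatment sum of squares (between-group variation):
$$SS_A = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{a} n_i (\bar{Y}_i - \bar{Y})^2$$
Error sum of squares (within-group variation):
$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$$
Mean squares:
$$MS_A = \frac{SS_A}{a-1} = \frac{1}{a-1}\sum_{i=1}^{a} n_i (\bar{Y}_i - \bar{Y})^2$$
$$MS_E = \frac{SS_E}{N-a} = \frac{1}{N-a}\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2$$
When $H_1$ is true, $MS_A$ tends to be large relative to $MS_E$.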
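The three sums of squares are linked by the standard decomposition $SS_T = SS_A + SS_E$. The slides do not spell out the derivation; the following is a sketch of the usual argument, using only the definitions of $\bar{Y}_i$ and $\bar{Y}$.

```latex
\begin{align*}
SS_T &= \sum_{i=1}^{a}\sum_{j=1}^{n_i} \bigl[(Y_{ij}-\bar{Y}_i) + (\bar{Y}_i-\bar{Y})\bigr]^2 \\
     &= \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij}-\bar{Y}_i)^2
      + \sum_{i=1}^{a} n_i(\bar{Y}_i-\bar{Y})^2
      + 2\sum_{i=1}^{a} (\bar{Y}_i-\bar{Y}) \sum_{j=1}^{n_i} (Y_{ij}-\bar{Y}_i) \\
     &= SS_E + SS_A + 0,
\end{align*}
% The cross term vanishes because, within each group,
% \sum_{j=1}^{n_i} (Y_{ij}-\bar{Y}_i) = n_i\bar{Y}_i - n_i\bar{Y}_i = 0.
```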
F Distribution
$X \sim F(r_1, r_2)$
$$f(x) = \frac{\Gamma\!\left(\frac{r_1+r_2}{2}\right)\left(\frac{r_1}{r_2}\right)^{r_1/2} x^{r_1/2-1}}{\Gamma\!\left(\frac{r_1}{2}\right)\Gamma\!\left(\frac{r_2}{2}\right)\left(1+\frac{r_1 x}{r_2}\right)^{(r_1+r_2)/2}}, \quad x > 0$$
$$E(X) = \frac{r_2}{r_2-2} \quad (r_2 > 2)$$
$$Var(X) = \frac{2 r_2^2 (r_1+r_2-2)}{r_1 (r_2-2)^2 (r_2-4)} \quad (r_2 > 4)$$
[Figure: F densities for $(r_1, r_2)$ = (2, 4), (4, 6), (9, 9), (12, 12), plotted for $0 \le x \le 5$]
Test statistic: compare with the critical value $F(a-1, N-a, \alpha)$
F Distribution Table
$F(r_1, r_2, \alpha)$: upper $\alpha$ point of $F(r_1, r_2)$
$F(3, 4, 0.05) = 6.59$
$F(4, 6, 0.01) = 9.15$
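The density, mean and variance formulas can be checked numerically. The sketch below uses only the Python standard library; the function names are made up for this illustration, and the distribution function is approximated with a plain midpoint rule (an assumption of this sketch, not a method from the slides), which is accurate enough to compare against the two table values.

```python
import math


def f_pdf(x, r1, r2):
    """Density of the F(r1, r2) distribution at x (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((r1 + r2) / 2) - math.lgamma(r1 / 2)
             - math.lgamma(r2 / 2) + (r1 / 2) * math.log(r1 / r2))
    log_f = (log_c + (r1 / 2 - 1) * math.log(x)
             - ((r1 + r2) / 2) * math.log1p(r1 * x / r2))
    return math.exp(log_f)


def f_mean(r2):
    """E(X) = r2 / (r2 - 2), valid for r2 > 2."""
    return r2 / (r2 - 2)


def f_var(r1, r2):
    """Var(X), valid for r2 > 4."""
    return 2 * r2**2 * (r1 + r2 - 2) / (r1 * (r2 - 2)**2 * (r2 - 4))


def f_cdf(x, r1, r2, steps=20000):
    """Approximate P(X <= x) by the midpoint rule on (0, x)."""
    h = x / steps
    return h * sum(f_pdf((k + 0.5) * h, r1, r2) for k in range(steps))


# Probability below each tabulated critical value, about 1 - alpha.
print(round(f_cdf(6.59, 3, 4), 3), round(f_cdf(9.15, 4, 6), 3))
```

For $r_1 = 2$, $r_2 = 4$ the density simplifies to $1/(1 + x/2)^3$, which gives a quick hand check of `f_pdf`.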
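ANOVA Table
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ vs $H_1$: some $\alpha_i \neq 0$
Test statistic $F = \dfrac{MS_A}{MS_E}$

Source      SS      d.f.    MS              F
Treatment   SS_A    a - 1   SSA/(a - 1)     MSA / MSE
Error       SS_E    N - a   SSE/(N - a)
Total       SS_T    N - 1

Computational Formulae
ith total: $T_i = \sum_{j=1}^{n_i} Y_{ij}$
overall total: $T_{..} = \sum_{i=1}^{a} T_i$
$$SS_A = \sum_{i=1}^{a} n_i (\bar{Y}_i - \bar{Y})^2 = \sum_{i=1}^{a} \frac{T_i^2}{n_i} - \frac{T_{..}^2}{N}$$
$$SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 = \sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{a} \frac{T_i^2}{n_i}$$
$$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}^2 - \frac{T_{..}^2}{N}$$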
Example: Color brightness of films
$a = 3$ brands, $n_1 = n_2 = n_3 = 15$, $N = 45$
Data
Brand 1 (Kodak): 32, 34, 31, 30, 37, 28, 28, 27, 30, 32, 26, 29, 27, 30, 31
Brand 2 (Fuji): 41, 44, 50, 32, 32, 38, 38, 43, 47, 32, 36, 35, 34, 40, 36
Brand 3 (Agfa): 23, 24, 25, 21, 26, 25, 27, 26, 22, 25, 27, 30, 25, 25, 27
$T_1 = 452$, $T_2 = 578$, $T_3 = 378$, $T_{..} = 452 + 578 + 378 = 1408$
$\sum_{i=1}^{3}\sum_{j=1}^{15} Y_{ij}^2 = 46040$
$H_0: \mu_1 = \mu_2 = \mu_3$ vs $H_1$: not $H_0$, $\alpha = 0.05$
$SS_T = 46040 - \dfrac{1408^2}{45} = 1985.24$
$SS_A = 1363.38$
$SS_E = 621.86$

Source      SS        d.f.
Treatment   1363.38   2
Error       621.86    42
Total       1985.24   44
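The computational formulae can be applied directly to the film data. In this sketch the three data rows are as recovered from the slide; the assignment of brand names to rows is inferred from the totals 452, 578 and 378 and should be treated as an assumption. The function name `one_way_anova` is hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA via the computational formulae (totals T_i and T..)."""
    n = [len(g) for g in groups]
    N, a = sum(n), len(groups)
    totals = [sum(g) for g in groups]                    # T_i
    grand_total = sum(totals)                            # T..
    sum_sq = sum(y * y for g in groups for y in g)       # sum of Y_ij^2
    correction = grand_total**2 / N                      # T..^2 / N
    between = sum(t * t / n_i for t, n_i in zip(totals, n))
    ss_a = between - correction
    ss_e = sum_sq - between
    ss_t = sum_sq - correction
    ms_a, ms_e = ss_a / (a - 1), ss_e / (N - a)
    return {"totals": totals, "ss_a": ss_a, "ss_e": ss_e, "ss_t": ss_t,
            "df": (a - 1, N - a, N - 1), "ms_a": ms_a, "ms_e": ms_e,
            "f": ms_a / ms_e}


# Brand labels are an assumption inferred from the row totals.
kodak = [32, 34, 31, 30, 37, 28, 28, 27, 30, 32, 26, 29, 27, 30, 31]
fuji = [41, 44, 50, 32, 32, 38, 38, 43, 47, 32, 36, 35, 34, 40, 36]
agfa = [23, 24, 25, 21, 26, 25, 27, 26, 22, 25, 27, 30, 25, 25, 27]

result = one_way_anova([kodak, fuji, agfa])
for key in ("ss_a", "ss_e", "ss_t", "ms_a", "ms_e", "f"):
    print(key, round(result[key], 2))
```

The resulting F-ratio is then compared with $F(2, 42, 0.05)$ from an F table to decide whether to reject $H_0$.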
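Estimation
Treatment effect: $\alpha_i$
Point estimate: $\bar{Y}_i - \bar{Y}$
Interval estimate:
$$(\bar{Y}_i - \bar{Y}) \pm t_{N-a,\,\alpha/2} \sqrt{MS_E\left(\frac{1}{n_i} - \frac{1}{N}\right)}$$
Example: Color brightness of films
$\bar{Y}_1 = \frac{452}{15} = 30.13$, $\bar{Y}_2 = \frac{578}{15} = 38.53$, $\bar{Y}_3 = \frac{378}{15} = 25.20$, $\bar{Y} = \frac{1408}{45} = 31.29$
95% C.I. for $\alpha_1$:
$$(30.13 - 31.29) \pm 2.021\sqrt{14.81\left(\frac{1}{15} - \frac{1}{45}\right)} = -1.16 \pm 1.64 = [-2.80,\ 0.48]$$
95% C.I. for $\alpha_2 - \alpha_3$:
$$(\bar{Y}_2 - \bar{Y}_3) \pm t_{N-a,\,0.025}\sqrt{MS_E\left(\frac{1}{n_2} + \frac{1}{n_3}\right)} = (38.53 - 25.20) \pm 2.021\sqrt{14.81\left(\frac{1}{15} + \frac{1}{15}\right)} = 13.33 \pm 2.84 = [10.49,\ 16.17]$$
so $\alpha_2 > \alpha_3$.
95% C.I. for $\alpha_1 - \alpha_2$: $[-11.24,\ -5.56]$, so $\alpha_1 < \alpha_2$.
95% C.I. for $\alpha_1 - \alpha_3$: $[2.09,\ 7.77]$, so $\alpha_1 > \alpha_3$.
Conclusion: $\alpha_2 > \alpha_1 > \alpha_3$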
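The intervals in the film example can be reproduced from the summary statistics. This sketch takes $MS_E = 14.81$ and the critical value 2.021 as given in the slides rather than computing them; the function names are hypothetical, and brand 2 is taken to be the group with total 578.

```python
import math


def effect_interval(mean_i, grand_mean, n_i, N, ms_e, t_crit):
    """Interval for alpha_i: (Ybar_i - Ybar) +/- t * sqrt(MS_E (1/n_i - 1/N))."""
    est = mean_i - grand_mean
    half = t_crit * math.sqrt(ms_e * (1 / n_i - 1 / N))
    return est - half, est + half


def difference_interval(mean_i, mean_k, n_i, n_k, ms_e, t_crit):
    """Interval for alpha_i - alpha_k, using sqrt(MS_E (1/n_i + 1/n_k))."""
    est = mean_i - mean_k
    half = t_crit * math.sqrt(ms_e * (1 / n_i + 1 / n_k))
    return est - half, est + half


MS_E, T_CRIT = 14.81, 2.021          # values quoted in the slides
means = [452 / 15, 578 / 15, 378 / 15]
grand_mean = 1408 / 45

ci_alpha1 = effect_interval(means[0], grand_mean, 15, 45, MS_E, T_CRIT)
ci_12 = difference_interval(means[0], means[1], 15, 15, MS_E, T_CRIT)
ci_13 = difference_interval(means[0], means[2], 15, 15, MS_E, T_CRIT)
ci_23 = difference_interval(means[1], means[2], 15, 15, MS_E, T_CRIT)
for name, ci in [("alpha1", ci_alpha1), ("alpha1-alpha2", ci_12),
                 ("alpha1-alpha3", ci_13), ("alpha2-alpha3", ci_23)]:
    print(name, [round(v, 2) for v in ci])
```

An interval for a difference that excludes zero indicates which of the two effects is larger; an interval for a single effect that contains zero (as for $\alpha_1$ here) does not separate that brand from the overall mean.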
Data (response: Bright)

            350F          375F          400F
Time = 40   38, 32, 30    37, 35, 40    36, 39, 43
Time = 50   40, 45, 36    39, 42, 46    39, 48, 47

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
$$\sum_i \alpha_i = \sum_j \beta_j = \sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$$
$$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$$

Analysis of Variance for Bright

Source      DF   SS       MS
Time        1    150.22   150.22
Temp        2    80.78    40.39
Time*Temp   2    3.44     1.72
Error       12   186.00   15.50
Total       17   420.44

Time: significant

Interaction
[Figure: group mean against Temperature (350, 375, 400) for Time = 40 and Time = 50, contrasting additive and non-additive patterns]
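The two-factor table can be reproduced from the 18 observations. The sketch below assumes a balanced design (the same number of replicates in every cell) and uses the usual totals-based formulas; the function name `two_way_anova` and the dictionary layout are choices made for this illustration.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (level_a, level_b) -> list of replicate observations.
    """
    levels_a = sorted({key[0] for key in cells})
    levels_b = sorted({key[1] for key in cells})
    r = len(next(iter(cells.values())))
    if any(len(obs) != r for obs in cells.values()):
        raise ValueError("this sketch assumes a balanced design")
    a, b = len(levels_a), len(levels_b)
    N = a * b * r
    grand = sum(sum(obs) for obs in cells.values())
    cf = grand**2 / N                                   # correction factor
    ss_total = sum(y * y for obs in cells.values() for y in obs) - cf
    ss_a = sum(sum(sum(cells[(i, j)]) for j in levels_b)**2
               for i in levels_a) / (b * r) - cf
    ss_b = sum(sum(sum(cells[(i, j)]) for i in levels_a)**2
               for j in levels_b) / (a * r) - cf
    ss_cells = sum(sum(obs)**2 for obs in cells.values()) / r - cf
    ss_ab = ss_cells - ss_a - ss_b
    ss_e = ss_total - ss_cells
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1),
          "Error": a * b * (r - 1), "Total": N - 1}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_e, "Total": ss_total}
    return ss, df


bright = {
    (40, 350): [38, 32, 30], (40, 375): [37, 35, 40], (40, 400): [36, 39, 43],
    (50, 350): [40, 45, 36], (50, 375): [39, 42, 46], (50, 400): [39, 48, 47],
}
ss, df = two_way_anova(bright)   # factor A = Time, factor B = Temp
for source in ("A", "B", "AB", "Error", "Total"):
    print(source, df[source], round(ss[source], 2))
```

Dividing each sum of squares by its degrees of freedom gives the MS column, and dividing each effect's MS by the error MS gives the F-ratios.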
Regression
Sir Francis Galton (1822-1911)
[Figure: Height of Son against Height of Father]
The heights of the sons of fathers regressed towards the mean height of the population.
Simple Linear Regression
Examples

Dependent variable (Y)    Independent variable (X)
Job performance           Extent of training
Return of a stock         Risk of the stock
Overall CGA               A-Level Score
Tree age (by C14)         Tree age (by tree rings)

Data: $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$
Model:
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \quad i = 1,2,\ldots,n$$
$\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ (random error)
Observed Y    Regression line (unobserved)    Random error (unobserved)
169.3                                         1.3
171.7         172.5                           -0.8
174.6         177                             -2.4
182.2         181.5                           0.7

Unrealistic! The regression line and the random errors are unobserved.
Estimate the regression line: fit a regression line to the observed data.
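$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
$$S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$$
$$S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2$$
$$S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}$$
Least squares estimates:
$$\hat{\beta} = b = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\alpha} = a = \bar{Y} - b\bar{X}$$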
Example
$\bar{X} = 400$, $\bar{Y} = 60$
$\sum_{i=1}^{n} X_i^2 = 1400000$, $\sum_{i=1}^{n} Y_i^2 = 26350$, $\sum_{i=1}^{n} X_i Y_i = 184500$
$E(Y) = \alpha + \beta X$
Fitted regression line: $\hat{Y} = 36.43 + 0.059X$
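The fitted line can be recovered from the summary statistics alone. The sample size is not stated next to these sums; $n = 7$ is assumed here because the later ANOVA table for this example has 6 total degrees of freedom. The function name is hypothetical.

```python
def least_squares_from_sums(n, x_bar, y_bar, sum_x2, sum_xy):
    """Return (a, b, S_xx, S_xy) using S_xx = sum X^2 - n Xbar^2, etc."""
    s_xx = sum_x2 - n * x_bar**2
    s_xy = sum_xy - n * x_bar * y_bar
    b = s_xy / s_xx            # slope estimate
    a = y_bar - b * x_bar      # intercept estimate
    return a, b, s_xx, s_xy


# n = 7 is an assumption inferred from the total d.f. of 6.
a, b, s_xx, s_xy = least_squares_from_sums(
    n=7, x_bar=400, y_bar=60, sum_x2=1400000, sum_xy=184500)
print(f"S_xx = {s_xx}, S_xy = {s_xy}, Y-hat = {a:.2f} + {b:.3f} X")
```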
Prediction
$X_0 = 650$ ($\bar{X} = 400$)?
$X_0 = 0$: $\hat{Y}_0 = 36.43$?
Danger of Extrapolation
SARS Trend
[Figures: No. of cases against date, and No. of patients in hospital against date, with fitted trends extended over date ranges from 28-Feb to 8-Jun]
Nonlinear Relationships
Association $\neq$ Causation
Example: Price and Demand for gas

Year   Price   Demand      Year   Price   Demand
1960   30      134         1970   58      65
1961   31      112         1971   58      56
1962   37      136         1972   60      58
1963   42      109         1973   73      55
1964   43      105         1974   88      49
1965   45      87          1975   89      39
1966   50      56          1976   92      36
1967   54      43          1977   97      46
1968   54      77          1978   100     40
1969   57      35          1979   102     42

Fitted regression line: Demand = 139.24 - 1.11 Price
Low demand is due to high price?

Simpson's Paradox
[Figure: Demand against Price, with years grouped as 1960-1965, 1966-1973 and 1974-1979]
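$H_0: \beta = 0$ vs $H_1: \beta \neq 0$
Fitted value: $\hat{Y}_i = a + bX_i$
Residual: $r_i = Y_i - \hat{Y}_i$
Random error: $\varepsilon_i = Y_i - \alpha - \beta X_i$
Decomposition of Variation
$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$
Variation of Y = explained variation + unexplained variation
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
$$SS_T = SS_R + SS_E$$
$SS_T = S_{yy}$
$$SS_R = \sum_{i=1}^{n} (a + bX_i - a - b\bar{X})^2 = b^2 \sum_{i=1}^{n} (X_i - \bar{X})^2 = b^2 S_{xx} = \frac{S_{xy}^2}{S_{xx}}$$
$$SS_E = SS_T - SS_R = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$$
$$MS_R = \frac{SS_R}{1}, \qquad MS_E = \frac{SS_E}{n-2}$$
Test statistic: $F = \dfrac{MS_R}{MS_E}$, which follows $F(1, n-2)$ under $H_0$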
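Source       SS      d.f.    MS             F-ratio
Regression   SS_R    1       SS_R           MS_R / MS_E
Error        SS_E    n - 2   SSE/(n - 2)
Total        SS_T    n - 1

Regression line: $\hat{Y} = 36.43 + 0.059X$

Source       SS       d.f.   MS       F-ratio
Regression   974.68   1      974.68   27.805
Error        175.32   5      35.064
Total        1150     6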
Coefficient of Determination
$$R^2 = \frac{SS_R}{SS_T} = \frac{\text{explained variation}}{\text{total variation}}, \qquad 0 \le R^2 \le 1$$
$R^2$ close to 1: strong linear relationship
$R^2 = 0$: no linear relationship
Example: $R^2 = \dfrac{974.68}{1150} = 84.8\%$
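The regression ANOVA quantities and $R^2$ follow from $S_{yy}$, $S_{xx}$ and the slope. One detail is worth seeing numerically: the slide value $SS_R = 974.68$ is what $b^2 S_{xx}$ gives with the rounded slope $b = 0.059$, whereas the unrounded slope $S_{xy}/S_{xx}$ gives a slightly smaller value. The sketch assumes $n = 7$ (from the total d.f. of 6) and uses a hypothetical function name.

```python
def regression_anova(s_yy, s_xx, b, n):
    """SS_R = b^2 S_xx, SS_E = S_yy - SS_R, F = MS_R / MS_E, R^2 = SS_R / SS_T."""
    ss_t = s_yy
    ss_r = b * b * s_xx
    ss_e = ss_t - ss_r
    ms_e = ss_e / (n - 2)
    return {"ss_r": ss_r, "ss_e": ss_e, "ss_t": ss_t,
            "ms_e": ms_e, "f": ss_r / ms_e, "r2": ss_r / ss_t}


n = 7                                   # assumed from the total d.f. of 6
s_yy = 26350 - n * 60**2                # = 1150
s_xx = 1400000 - n * 400**2             # = 280000
s_xy = 184500 - n * 400 * 60            # = 16500

rounded = regression_anova(s_yy, s_xx, b=0.059, n=n)       # slope as on the slides
exact = regression_anova(s_yy, s_xx, b=s_xy / s_xx, n=n)   # unrounded slope
for label, res in (("rounded b", rounded), ("exact b", exact)):
    print(label, {k: round(v, 3) for k, v in res.items()})
```

With either slope, $R^2$ is roughly 0.85, so the conclusion of a strong linear relationship does not depend on the rounding.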
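Confidence interval for the intercept:
$$a \pm t_{n-2,\,\alpha/2} \sqrt{MS_E\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{xx}}\right)}$$
Confidence interval for the slope:
$$b \pm t_{n-2,\,\alpha/2} \sqrt{\frac{MS_E}{S_{xx}}}$$
A large $S_{xx}$ gives narrower intervals; in the example, $S_{xx} = 280000$.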
Prediction
Predict the value of $Y_0$ at a fixed value $X = X_0$.
Point prediction: $\hat{Y}_0 = a + bX_0$
Prediction interval:
$$\hat{Y}_0 \pm t_{n-2,\,\alpha/2} \sqrt{MS_E\left(1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{S_{xx}}\right)}$$
Example: Wheat yield
Regression line: $\hat{Y} = 36.43 + 0.059X$, with $MS_E = 35.064$ on 5 d.f.
At $X_0 = 450$,
$$\hat{Y}_0 = 36.43 + (0.059)(450) = 62.98$$
$$62.98 \pm 2.02\sqrt{35.064\left(1 + \frac{1}{7} + \frac{(450 - 400)^2}{280000}\right)} = 62.98 \pm 12.837 = [50.143,\ 75.817]$$
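The prediction interval for the wheat yield example can be checked with a short function. The inputs are the rounded values shown on the slides ($a = 36.43$, $b = 0.059$, $MS_E = 35.064$, $n = 7$, $\bar{X} = 400$, $S_{xx} = 280000$); the critical value 2.02 is taken as given rather than computed, and the function name is hypothetical.

```python
import math


def prediction_interval(a, b, x0, n, x_bar, s_xx, ms_e, t_crit):
    """Point prediction a + b*x0 with half-width
    t * sqrt(MS_E * (1 + 1/n + (x0 - x_bar)^2 / S_xx))."""
    y0 = a + b * x0
    half = t_crit * math.sqrt(ms_e * (1 + 1 / n + (x0 - x_bar)**2 / s_xx))
    return y0, half, (y0 - half, y0 + half)


y0, half, (low, high) = prediction_interval(
    a=36.43, b=0.059, x0=450, n=7, x_bar=400,
    s_xx=280000, ms_e=35.064, t_crit=2.02)
print(f"Y0-hat = {y0:.2f}, interval = [{low:.3f}, {high:.3f}]")
```

Because of the $(X_0 - \bar{X})^2 / S_{xx}$ term, the interval widens as $X_0$ moves away from $\bar{X}$, which is one reason predictions far outside the observed range are unreliable.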
Row   State   POP     TAX     NLIC   INC     ROAD     FUELC   DLIC
1     ME      1029    9.00    540    3.571   1.976    557     52.4781
2     NH      771     9.00    441    4.092   1.250    404     57.1984
3     VT      462     9.00    268    3.865   1.586    259     58.0087
4     MA      5787    7.50    3060   4.870   2.351    2396    52.8771
5     RI      968     8.00    527    4.399   0.431    397     54.4422
6     CN      3082    10.00   1760   5.342   1.333    1408    57.1058
7     NY      18366   8.00    8278   5.319   11.868   6312    45.0724
8     NJ      7367    8.00    4074   5.126   2.138    3439    55.3007
9     PA      11926   8.00    6312   4.447   8.577    5528    52.9264
10    OH      10783   7.00    5948   4.512   8.507    5375    55.1609
11    IN      5291    8.00    2804   4.391   5.939    3068    52.9957
12    IL      11251   7.50    5903   5.126   14.186   5301    52.4664
...

$$FUEL = \beta_0 + \beta_1\,TAX + \beta_2\,DLIC + \beta_3\,INC + \beta_4\,ROAD + \varepsilon$$

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.