Objectives
To know the concept of variance analysis.
To be able to perform simple analyses with 1 and 2 input factors.
To be able to determine the mathematical model.
To be able to check the model prerequisites.
To determine the practical significance.
To know the concept of blocking and be able to use simple Randomized Block Designs.
To be able to perform the ANOVA in Minitab and interpret the results.
[Chart: selecting a hypothesis test. Data type: Variable Data or Attributive Data. Quantity compared: Mean Value, Variation, or Ratio. Comparison: against target, 2 distributions, or more than 2 distributions. Tests shown: t-Test, ANOVA, χ²-Test, F-test, Levene's test, Bartlett's test.]
$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$
ANOVA -- Underlying Assumptions
The F distribution is also used for testing the equality of more than two means using a technique called analysis of variance (ANOVA). ANOVA requires the following conditions:
The populations being sampled are normally distributed.
The populations have equal standard deviations.
The samples are randomly selected and are independent.
Are the average distances achieved with each dimple pattern the same? Do the 4 samples come from the same population?
$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$
Are some of the 4 population means different?
NOTE
If there are k populations being sampled, then df (numerator) = k - 1.
If there are a total of N sample points, then df (denominator) = N - k.
The test statistic is computed by:
$$F = \frac{SST/(k-1)}{SSE/(N-k)}$$
SST represents the treatment sum of squares.
SSE represents the error sum of squares.
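As a minimal sketch of the computation in this note, the following Python function evaluates the F statistic from the two sums of squares. The function name `f_statistic` and its parameter names are illustrative and not part of the original material; the sample values are the sums of squares reported for the diet example in this document.

```python
def f_statistic(sst, sse, k, n_total):
    """Return (F, df_numerator, df_denominator) for a one-way ANOVA."""
    df_num = k - 1          # k populations (treatments)
    df_den = n_total - k    # N sample points in total
    mst = sst / df_num      # mean square for treatments
    mse = sse / df_den      # mean square for error
    return mst / mse, df_num, df_den

# Sums of squares from the diet example: 4 diets, 24 animals
f_value, df_num, df_den = f_statistic(sst=228.0, sse=112.0, k=4, n_total=24)
print(round(f_value, 2), df_num, df_den)  # 13.57 3 20
```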
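where:
$$SS(\text{total}) = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$SST = \sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{n}$$
Let $T_c$ represent the column totals, $n_c$ the number of observations (sample size) for each treatment, and $\sum X$ the sum of all the observations; n is the total number of observations (written N above).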
We are using the example of Diet. Twenty-four animals were fed using one of four diets. Diet is the input variable (factor); blood clotting time is the output variable (response). The diets were assigned to the animals randomly. Blood samples were taken and tested in a random sequence. Why?
DIET A   DIET B   DIET C   DIET D
62       63       68       56
60       67       66       62
63       71       71       60
59       64       67       61
         65       68       63
         66       68       64
                           63
                           59
[Figure: Coagtime plotted against Diet (1 to 4).]
CoagTime 62 60 63 59 63 67 71 64 65 66 68 66 71 67 68 68 56 62 60 61 63 64 63 59
Diet 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4
ANOVA Table
The ANOVA table is an important result of ANOVA.

One-Way Analysis of Variance
Analysis of Variance on CoagTime
Source   DF   SS       MS      F       P
Diet      3   228.00   76.00   13.57   0.000
Error    20   112.00    5.60
Total    23   340.00
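The entries of this table can be reproduced from the raw data. The following is a minimal sketch in plain Python, not Minitab's own procedure; the function name `one_way_anova` and the dictionary keys are illustrative. The p-value is left out because it requires the F-distribution, which the standard library does not provide.

```python
from statistics import mean

# Blood clotting times from the diet example, grouped by diet (1 to 4)
groups = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

def one_way_anova(groups):
    """Return DF, SS, MS and F of a one-way ANOVA as a dict."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Between groups: group mean against grand mean, weighted by group size
    ss_factor = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within groups: each observation against its own group mean
    ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df_factor": df_factor, "df_error": df_error,
        "ss_factor": ss_factor, "ss_error": ss_error,
        "ss_total": ss_factor + ss_error,
        "ms_factor": ms_factor, "ms_error": ms_error,
        "f": ms_factor / ms_error,
    }

table = one_way_anova(groups)
print(f"Diet   {table['df_factor']:>2} {table['ss_factor']:8.2f} {table['ms_factor']:7.2f} {table['f']:6.2f}")
print(f"Error  {table['df_error']:>2} {table['ss_error']:8.2f} {table['ms_error']:7.2f}")
print(f"Total  {table['df_factor'] + table['df_error']:>2} {table['ss_total']:8.2f}")
```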
If the p-value is less than 5%, the mean value of at least one group differs from the others. In this case we reject the null hypothesis, which states that the mean values of all groups are equal: the mean value of at least one diet is different from the others. An F-value of this magnitude may also occur by chance, but only on the order of once in 10,000 occasions. That is roughly as rare as getting heads thirteen times in a row with a fair coin.
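The coin comparison can be checked with a short calculation. For a fair coin, the probability of thirteen heads in a row is

$$\left(\tfrac{1}{2}\right)^{13} = \frac{1}{8192} \approx 1.2 \times 10^{-4},$$

which is of the same order as 1 in 10,000.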
The F-value is near 1.00 when the group mean values are similar. In this case the F-value is much higher.
F-Distribution
The following displays the F-distribution for our example. It shows the distribution of F-values that would have occurred if all 4 diets produced the same blood clotting time. Note that the F-value obtained in our experiment lies near the far end of the distribution, so such a value would be very unlikely if the diets did not differ.
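This reference distribution can be approximated by simulation. The sketch below is an illustration under stated assumptions, not the method used for the original figure: it draws all four groups from one normal population (mean 64 and standard deviation 2.4 are assumed values chosen to resemble the example), keeps the group sizes of the diet example, and records the F-value of each simulated experiment.

```python
import random
from statistics import mean

def f_value(groups):
    """F = MS(Factor) / MS(Error) for a list of groups."""
    values = [y for g in groups for y in g]
    grand = mean(values)
    ss_factor = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ms_factor = ss_factor / (len(groups) - 1)
    ms_error = ss_error / (len(values) - len(groups))
    return ms_factor / ms_error

random.seed(1)          # fixed seed so the run is repeatable
sizes = [4, 6, 6, 8]    # group sizes of the diet example
n_sim = 2000            # number of simulated experiments (assumed)

null_f = []
for _ in range(n_sim):
    # Assumption: every diet has the same population (mean 64, sd 2.4)
    simulated = [[random.gauss(64, 2.4) for _ in range(n)] for n in sizes]
    null_f.append(f_value(simulated))

median_f = sorted(null_f)[n_sim // 2]
share_above = sum(f >= 13.57 for f in null_f) / n_sim
print("median simulated F:", round(median_f, 2))
print("share of simulated F-values >= 13.57:", share_above)
```

Most simulated F-values should fall near 1, and values as large as 13.57 should be extremely rare, which is what the figure expresses.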
[Figure: F-distribution for the example. Vertical axis: Prob.; horizontal axis: F-value; the 1% mark is indicated.]
[Figure: Coagtime (61 to 68) against Diet (1 to 4).]
Interval Plots
The interval plot is another type of display. Minitab: Stat > ANOVA > Interval Plot... Create this diagram using the option Confidence interval.
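The intervals in such a plot can be approximated by hand. The sketch below assumes a 95% confidence interval for each group mean based on that group's own standard deviation; Minitab's plot may use different settings (for example a pooled standard deviation), so the numbers need not match the plot exactly. The t quantiles are standard table values entered by hand, and the names `T_975` and `confidence_interval` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Blood clotting times from the diet example, grouped by diet (1 to 4)
groups = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

# Two-sided 95% t quantiles for df = n - 1 (standard table values)
T_975 = {3: 3.182, 5: 2.571, 7: 2.365}

def confidence_interval(ys):
    """95% confidence interval for the mean of one group."""
    half_width = T_975[len(ys) - 1] * stdev(ys) / sqrt(len(ys))
    return mean(ys) - half_width, mean(ys) + half_width

intervals = {diet: confidence_interval(ys) for diet, ys in groups.items()}
for diet, (low, high) in intervals.items():
    print(f"Diet {diet}: {low:.2f} to {high:.2f}")
```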
[Figure: Confidence interval-Plot of Coagtime against Diet (1 to 4).]
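$$\underbrace{\sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij}-\bar{y})^2}_{SS(Total)} = \underbrace{\sum_{j=1}^{g} n_j(\bar{y}_j-\bar{y})^2}_{SS(Factor)} + \underbrace{\sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_j)^2}_{SS(Error)}$$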
SS(Total) = Total Sum of Squares of the Experiment (individuals - Grand Mean)
SS(Factor) = Sum of Squares of the Factor (Group Mean - Grand Mean)
SS(Error) = Sum of Squares within the Group (individuals - Group Mean)
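As a worked check with the diet example: the group mean values are 61, 66, 68 and 61 (with 4, 6, 6 and 8 animals), and the grand mean is 64. The sums of squares are then

$$SS(Factor) = 4(61-64)^2 + 6(66-64)^2 + 6(68-64)^2 + 8(61-64)^2 = 36 + 24 + 96 + 72 = 228$$

$$SS(Error) = 10 + 40 + 14 + 48 = 112$$

$$SS(Total) = SS(Factor) + SS(Error) = 228 + 112 = 340$$

where the four terms of SS(Error) are the sums of squared deviations from the group mean within diets 1 to 4.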
To determine whether to reject or fail to reject the null hypothesis, we must calculate the test statistic (F-ratio) using the Analysis of Variance as shown in the table below.
SOURCE    SS           df
BETWEEN   SS(Factor)   $g - 1$
WITHIN    SS(Error)    $\sum_{j=1}^{g} (n_j - 1)$
TOTAL     SS(Total)    $\sum_{j=1}^{g} n_j - 1$
Standard format
Why is Source Within called the Error or Noise? In practical terms what is the F-ratio telling us? What do you think large F-ratios mean?
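$$\sum_{j=1}^{g}\sum_{i=1}^{k}(x_{ij}-\bar{x})^2 = k\sum_{j=1}^{g}(\bar{x}_j-\bar{x})^2 + \sum_{j=1}^{g}\sum_{i=1}^{k}(x_{ij}-\bar{x}_j)^2$$
Here k is the number of observations in each group (equal group sizes).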
[Figure: Coagtime against Diet (1 to 4), with the group mean values $\bar{X}$ marked for each diet.]
ANOVA Table
Let's return to the ANOVA table. We want to create the ANOVA table manually for a very simple case.
Factor A A B B
Data 29 31 39 41
Analysis of Variance for Data
Source   DF   SS       MS       F       P
Factor    1   100.00   100.00   50.00   0.019
Error     2     4.00     2.00
Total     3   104.00
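The manual calculation for this small case can be written out step by step. The group mean values are $\bar{x}_A = 30$ and $\bar{x}_B = 40$, and the grand mean is $\bar{x} = 35$:

$$SS(Factor) = 2(30-35)^2 + 2(40-35)^2 = 100$$

$$SS(Error) = (29-30)^2 + (31-30)^2 + (39-40)^2 + (41-40)^2 = 4$$

$$SS(Total) = (29-35)^2 + (31-35)^2 + (39-35)^2 + (41-35)^2 = 104 = 100 + 4$$

$$MS(Factor) = \frac{100}{1} = 100, \qquad MS(Error) = \frac{4}{2} = 2, \qquad F = \frac{100}{2} = 50$$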
ANOVA Table
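The ANOVA table content is produced as follows:
Source   DF    SS           MS                              F
Factor   a-1   SS(Factor)   MS(Factor) = SS(Factor)/(a-1)   MS(Factor)/MS(Error)
Error    N-a   SS(Error)    MS(Error) = SS(Error)/(N-a)
Total    N-1   SS(Total)
Here a is the number of factor levels and N is the total number of observations.
The test statistic is the F-value (signal/noise ratio): F = MS(Factor)/MS(Error).
MS(Error) is the pooled error variance (remaining variation).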
Conventional:
Cat 3 73 75 68 72
Cat 4 75 74 72 75
We enter the data in Minitab as follows: two columns for the input variables, and one column for the output variable (the response).
Mathematical model
$$y_{ti} = \mu + \beta_i + \tau_t + \varepsilon_{ti}$$
Ho: the β_i's = 0     Ha: the β_i's ≠ 0
Ho: the τ_t's = 0     Ha: the τ_t's ≠ 0
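A minimal sketch of how this model splits the variation, assuming one observation per combination of treatment t and block i. The data are hypothetical values invented for illustration (they are not the catalyst data of this document), and the function name `randomized_block_anova` is illustrative. The function also returns the F-value that would result if the block term were left out, so the effect of blocking on the error term is visible.

```python
from statistics import mean

def randomized_block_anova(y):
    """y[t][i] is the response for treatment t in block i (one value per cell)."""
    n_treat, n_block = len(y), len(y[0])
    grand = mean([v for row in y for v in row])
    treat_means = [mean(row) for row in y]
    block_means = [mean([y[t][i] for t in range(n_treat)]) for i in range(n_block)]

    ss_treat = n_block * sum((m - grand) ** 2 for m in treat_means)   # tau_t
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)   # beta_i
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_error = ss_total - ss_treat - ss_block                         # epsilon_ti

    df_treat, df_block = n_treat - 1, n_block - 1
    df_error = df_treat * df_block
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    # Without blocking, the block variation stays in the error term
    ms_error_unblocked = (ss_error + ss_block) / (df_error + df_block)
    return {
        "ss_treat": ss_treat, "ss_block": ss_block, "ss_error": ss_error,
        "f_treat": ms_treat / ms_error,
        "f_block": (ss_block / df_block) / ms_error,
        "f_treat_unblocked": ms_treat / ms_error_unblocked,
    }

# Hypothetical data: 3 treatments (rows) measured in 3 blocks (columns)
y = [
    [10, 20, 30],
    [12, 21, 33],
    [14, 25, 33],
]
result = randomized_block_anova(y)
print("F treatment with blocking:   ", round(result["f_treat"], 2))
print("F treatment without blocking:", round(result["f_treat_unblocked"], 2))
```

In this made-up data set the blocks differ strongly, so removing the block sum of squares from the error term raises the treatment F-value considerably.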
Blocks are formed so that results within each block are as consistent as possible, while any differences show up between blocks.
Without blocking (Catalyst only):
Source     DF   SS       MS      F      P
Catalyst    3    32.75   10.92   1.75   0.211
Error      12    75.00    6.25
Total      15   107.75

Analysis of Variance for Yield (with Batch as block):
Source     DF   SS       MS      F      P
Catalyst    3                    6.24   0.014
Batch       3                           0.002
Error       9
Total      15   107.75
This probability is now small enough to reject the null hypothesis, meaning that the variable Catalyst is significant.
By taking the batch-to-batch variation into account, we can reduce the Mean Square Error term (noise). This makes the Catalyst effect significant.