An Ova

Analysis of Variance (ANOVA)
Objectives
To know the concept of variance analysis.
To be able to perform simple analyses with 1 and 2 input factors.
To be able to determine the mathematical model.
To be able to check the model prerequisites.
To determine the practical significance.
To know the concept of blocking and be able to use simple Randomized Block Designs.
To be able to perform the ANOVA in Minitab and interpret the results.
Hypothesis Test: Roadmap

[Figure: hypothesis test roadmap, branching by data type and by the parameter being tested]
Variable data, mean value: against target: t-Test; 2 distributions: t-Test; > 2 distributions: ANOVA
Variable data, variation: against target: χ²-Test; 2 distributions: F, Levene's; > 2 distributions: Bartlett, Levene's
Attributive data, ratio: χ²-Test
ANOVA (Variance Analysis)

Previously, we discussed the testing of hypotheses using 2 mean values (T-Test). ANOVA is used to test hypotheses with 2 or more mean values.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_A$: At least one $\mu_k$ is different

Advantage:
To test the NULL HYPOTHESIS (all 4 mean values are equal), we would have to test hypotheses for 6 combinations using the technique previously described (t-test). Using the ANOVA technique, we can decide whether to reject the null hypothesis or keep the null hypothesis with a single test.
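The count of six comparisons follows from choosing 2 of the 4 groups. A minimal sketch, assuming the four groups are simply labeled 1 to 4 (the labels are illustrative, not from the source):

```python
from itertools import combinations

# Four group means to compare; the labels are illustrative placeholders.
groups = [1, 2, 3, 4]

# Every unordered pair of groups would need its own t-test.
pairs = list(combinations(groups, 2))

print(len(pairs))  # number of pairwise t-tests
print(pairs)
```

ANOVA replaces this list of pairwise tests with a single F-test.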
ANOVA -- Underlying Assumptions
The F distribution is also used for testing the equality of more than two means using a technique called analysis of variance (ANOVA). ANOVA requires the following conditions:
The populations being sampled are normally distributed.
The populations have equal standard deviations.
The samples are randomly selected and are independent.
Questions Asked by ANOVA
Are the average distances achieved with each dimple pattern the same? Do the 4 samples come from the same population?
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Are some of the 4 population means different?
$H_a$: At least one $\mu_k$ is different
Analysis of Variance Procedure

The Null Hypothesis: the population means are the same.
The Alternative Hypothesis: at least one of the means is different.
The Test Statistic: $F = \dfrac{\text{between-sample variance}}{\text{within-sample variance}}$
Decision rule: For a given significance level $\alpha$, reject the null hypothesis if F (computed) is greater than F (table) with the numerator and denominator degrees of freedom given in the note.
NOTE
If there are k populations being sampled, then df (numerator) = k - 1.
If there are a total of N sample points, then df (denominator) = N - k.
The test statistic is computed by: $F = \dfrac{SST/(k-1)}{SSE/(N-k)}$
SST represents the treatment sum of squares. SSE represents the error sum of squares.
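Formula
$$SS(\text{total}) = \sum X^2 - \frac{\left(\sum X\right)^2}{n}$$
$$SST = \sum \frac{T_c^2}{n_c} - \frac{\left(\sum X\right)^2}{n}$$
$$SSE = SS(\text{total}) - SST$$
Where: $T_c$ represents the column totals, $n_c$ represents the number of observations (sample size) for each treatment, $\sum X$ represents the sum of all the observations, and n is the total number of observations (N in the note above).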
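The computing formulas can be tried out directly. The following is a minimal sketch, not a Minitab procedure; it assumes the coagulation times of the diet example that follows, grouped as in its CoagTime/Diet worksheet, and the variable names are illustrative.

```python
# Coagulation times grouped by diet, as listed in the CoagTime/Diet worksheet.
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

all_obs = [x for group in data.values() for x in group]
n = len(all_obs)                    # total number of observations (N)
k = len(data)                       # number of treatments
correction = sum(all_obs) ** 2 / n  # (sum X)^2 / n

# SS(total) = sum X^2 - (sum X)^2 / n
ss_total = sum(x * x for x in all_obs) - correction
# SST = sum over treatments of T_c^2 / n_c, minus the same correction term
sst = sum(sum(g) ** 2 / len(g) for g in data.values()) - correction
sse = ss_total - sst

f_stat = (sst / (k - 1)) / (sse / (n - k))
print(ss_total, sst, sse, round(f_stat, 2))
```

The three sums of squares and the F value can be compared with the Minitab output for the same data.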
Example: Comparing More than Two Groups
We are using the example of Diet. Twenty-four animals were fed using one of four diets. Diet is the input variable (factor); blood clotting time is the output variable (response). The diets were assigned to the animals randomly. Blood samples were taken and tested in a random sequence. Why?
DIET A  62 60 63 59
DIET B  63 67 71 64 65 66
DIET C  68 66 71 67 68 68
DIET D  56 62 60 61 63 64 63 59
Example: Comparing More than Two Groups

First, we create a plot or box plot of the data. Are there differences in the 4 diets?
Plot "Coagtime" by "Diet"
[Figure: plot and box plot of Coagtime by Diet; x-axis: Diet (1 to 4), y-axis: Coagtime (55 to 70)]
Worksheet data:
CoagTime 62 60 63 59 63 67 71 64 65 66 68 66 71 67 68 68 56 62 60 61 63 64 63 59
Diet     1  1  1  1  2  2  2  2  2  2  3  3  3  3  3  3  4  4  4  4  4  4  4  4
Performing ANOVA in Minitab

We perform ANOVA in Minitab
Stat>ANOVA>One-way
One-way Analysis of Variance

Analysis of Variance for Coagtime
Source   DF       SS      MS       F       P
Diet      3   228.00   76.00   13.57   0.000
Error    20   112.00    5.60
Total    23   340.00

Individual 95% CIs For Mean Based on Pooled StDev
Level    N     Mean   StDev
1        4   61.000   1.826
2        6   66.000   2.828
3        6   68.000   1.673
4        8   61.000   2.619

Pooled StDev = 2.366
[Character plot of the individual 95% confidence intervals; axis from 59.5 to 70.0]
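The StDev column, the pooled StDev, and the individual confidence intervals in this output can be retraced by hand. A minimal sketch, assuming the intervals are mean ± t × (pooled StDev)/√n, with the t value 2.086 (two-sided 95%, 20 degrees of freedom) taken from a t table rather than from the source:

```python
from math import sqrt
from statistics import mean, stdev

# Coagulation times grouped by diet (levels 1 to 4).
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

n_total = sum(len(g) for g in data.values())
df_error = n_total - len(data)  # N - k = 20

# Pooled variance: within-group sums of squares divided by the error df.
ss_error = sum((len(g) - 1) * stdev(g) ** 2 for g in data.values())
pooled_sd = sqrt(ss_error / df_error)

T_CRIT = 2.086  # assumption: table value of t for 95% two-sided, 20 df

for level, g in data.items():
    half_width = T_CRIT * pooled_sd / sqrt(len(g))
    print(level, len(g), mean(g), round(stdev(g), 3),
          round(mean(g) - half_width, 2), round(mean(g) + half_width, 2))
print("Pooled StDev =", round(pooled_sd, 3))
```

The pooled StDev is the square root of MS(Error), which is why every interval uses the same spread and only the sample size changes its width.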
ANOVA Table
The ANOVA table is an important result of ANOVA.

One-Way Analysis of Variance
Analysis of Variance on CoagTime
Source   DF       SS      MS       F       P
Diet      3   228.00   76.00   13.57   0.000
Error    20   112.00    5.60
Total    23   340.00
If the p-value is less than 5%, the mean value of at least one group differs from the others. In this case we reject the null hypothesis, which states that the mean values of all groups are equal: the mean value of at least one diet is different from the others. An F-value of this magnitude may also occur by chance, but only at a frequency of roughly 1 per 10,000 occasions. That corresponds to getting heads thirteen times in a row with a fair coin.
The F-value is near 1.00 when the group mean values are similar. In this case the F-value is much higher.
F-Distribution
The following displays the F-distribution for our example. It shows the distribution of F-values that would have occurred if all 4 diets produced the same blood clotting time. Note that the F-value obtained in our experiment lies far out in the tail of the distribution, which makes it very unlikely under the null hypothesis.
[Figure: F-distribution for 3 and 20 degrees of freedom; x-axis: F-value (0 to 14), y-axis: probability density (0.0 to 0.7); the 10% mark, 5% mark, 1% mark, and the observed value are indicated]
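The tail area behind the observed F-value can be approximated without a table. This is a rough sketch that integrates the F density numerically with Simpson's rule; the function name, the integration limit, and the step count are illustrative choices, and 3.10 is used as the tabulated 5% mark for 3 and 20 degrees of freedom.

```python
from math import gamma

def upper_tail(f_value, d1, d2, upper=2000.0, steps=100_000):
    """Approximate P(F > f_value) by Simpson's rule on [f_value, upper]."""
    # Normalizing constant of the F density with d1 and d2 degrees of freedom.
    const = ((d1 / d2) ** (d1 / 2) * gamma((d1 + d2) / 2)
             / (gamma(d1 / 2) * gamma(d2 / 2)))

    def pdf(x):
        return const * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

    h = (upper - f_value) / steps  # steps must be even for Simpson's rule
    total = pdf(f_value) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(f_value + i * h)
    return total * h / 3

p_observed = upper_tail(13.57, 3, 20)    # area beyond the observed F-value
p_at_5pct_mark = upper_tail(3.10, 3, 20)  # should be close to 0.05

print(p_observed, p_at_5pct_mark)
```

The area beyond the 5% mark comes out near 0.05, while the area beyond the observed value is smaller by several orders of magnitude, which is what Minitab reports as P 0.000.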
Main Effects Plots

We use the main effect plot to display our results. It is displayed only if there is a significant difference. Minitab: Stat > ANOVA > Main Effects Plot...
[Figure: Main Effects Plot - Data Means for Coagtime; x-axis: Diet (1 to 4), y-axis: Coagtime (61 to 68)]
Caution: the line connecting the means is without warranty.
Interval Plots
The interval plot is another type of display. Minitab: Stat > ANOVA>Interval Plot... Create this diagram using the option Confidence interval.
[Figure: Confidence interval plot; x-axis: Diet (1 to 4), y-axis: Coagtime (58 to 68)]
Analysis of Variance
Recall, ANOVA looks at three sources of variability:

Total = total variability among all observations (SS total)
Between = variation between group means (factor, SST)
Within = random (chance) variation within each group (noise or statistical error, SSE)
(Analogy with control charts: between-subgroup variation and within-subgroup variation.)
Total = between + within
Understanding the Fundamentals - Sums of Squares

[Figure: Response (55 to 70) plotted against Factor (groups 1 to 4), marking the individual measurements, the group means, and the grand mean]

$\bar{y}_j$ = mean of group j
$\bar{y}$ = grand mean of the experiment
$y_{ij}$ = individual measurement
i = represents a data point within the jth group
j = represents the jth group
g = total # of groups
$n_j$ = number of data points in the jth group

$$\sum_{j=1}^{g}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}\right)^2 = \sum_{j=1}^{g} n_j\left(\bar{y}_j-\bar{y}\right)^2 + \sum_{j=1}^{g}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_j\right)^2$$
SS(Total) = SS(Factor) + SS(Error)

The computer will take care of this for us...

SS(Tot) = Total Sum of Squares of the Experiment (individuals - Grand Mean)
SS(Factor) = Sum of Squares of the Factor (Group Mean - Grand Mean)
SS(Error) = Sum of Squares within the Group (individuals - Group Mean)
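The three sums of squares can also be computed straight from their definitions as squared deviations. A minimal sketch, assuming the diet data from the coagulation example; the variable names are illustrative:

```python
# Coagulation times grouped by diet (groups j = 1..4).
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

all_obs = [y for g in data.values() for y in g]
grand_mean = sum(all_obs) / len(all_obs)
group_means = {j: sum(g) / len(g) for j, g in data.items()}

# individuals - grand mean
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
# group mean - grand mean, weighted by the group size n_j
ss_factor = sum(len(g) * (group_means[j] - grand_mean) ** 2
                for j, g in data.items())
# individuals - own group mean
ss_error = sum((y - group_means[j]) ** 2
               for j, g in data.items() for y in g)

print(grand_mean, group_means)
print(ss_total, ss_factor, ss_error)
```

Adding the factor and error terms reproduces the total, which is the identity SS(Total) = SS(Factor) + SS(Error).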
Developing the ANOVA Table Using Sums of Squares

Hypothesis Test
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: At least one $\mu_k$ is different
To determine whether we reject or fail to reject the null hypothesis, we must calculate the test statistic (F-ratio) using the Analysis of Variance as shown in the table below.

Standard format:
SOURCE    SS           df                              MS (= SS/df)                                F (= MS(Factor)/MS(Error))
BETWEEN   SS(Factor)   $g-1$                           SS(Factor)/$(g-1)$                          MS(Factor)/MS(Error)
WITHIN    SS(Error)    $\sum_{j=1}^{g}(n_j-1)$         SS(Error)/$\sum_{j=1}^{g}(n_j-1)$
TOTAL     SS(Total)    $\sum_{j=1}^{g} n_j - 1$

The MS of the WITHIN row is the pooled error variance.
Why is Source Within called the Error or Noise? In practical terms what is the F-ratio telling us? What do you think large F-ratios mean?
Basic Model for ANOVA

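The following always applies to the sum of squares:
SS(Total) = SS(Factor) + SS(Error)
$$\sum_{j=1}^{g}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}\right)^2 = \sum_{j=1}^{g} n_j\left(\bar{x}_j-\bar{x}\right)^2 + \sum_{j=1}^{g}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2$$
When every group contains the same number k of observations, the factor $n_j = k$ can be written in front of the middle sum.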
Basic Model for ANOVA

[Figure: Coagtime (55 to 70) by Diet (1 to 4), showing the total mean value, the group mean values, and the deviation $\tau$ of each group mean from the total mean]
$\tau$ = influence of the factor (diet in this case)
ANOVA Table
Let's return to the ANOVA table. We want to create the ANOVA table manually for a very simple case.
This is our data:
Factor   Data
A        29
A        31
B        39
B        41

This is the completed variance analysis table from Minitab:
Analysis of Variance for Data
Source   DF       SS       MS       F       P
Factor    1   100.00   100.00   50.00   0.019
Error     2     4.00     2.00
Total     3   104.00
ANOVA Table
The ANOVA table content is produced as follows:
Source   DF      SS           MS                       F
Factor   a - 1   SS(Factor)   SS(Factor)/DF(Factor)    MS(Factor)/MS(Error)
Error    N - a   SS(Error)    SS(Error)/DF(Error)
Total    N - 1   SS(Total)

The test statistic is the F-value (signal/noise ratio).
MS(Error) is the pooled error variance (remaining variation).
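As a worked check, the entries for the simple case can be computed by hand. The sketch below assumes the two groups A = (29, 31) and B = (39, 41), so a = 2 and N = 4; the last line uses the closed form of the F tail area for 1 and 2 degrees of freedom, which is a general property of that distribution and not taken from the source.

```latex
\begin{aligned}
\bar{y}_A &= 30, \qquad \bar{y}_B = 40, \qquad \bar{y} = 35 \\
SS(\text{Factor}) &= 2(30-35)^2 + 2(40-35)^2 = 100, & DF &= a - 1 = 1 \\
SS(\text{Error}) &= (29-30)^2 + (31-30)^2 + (39-40)^2 + (41-40)^2 = 4, & DF &= N - a = 2 \\
SS(\text{Total}) &= 100 + 4 = 104, & DF &= N - 1 = 3 \\
F &= \frac{MS(\text{Factor})}{MS(\text{Error})} = \frac{100/1}{4/2} = 50 \\
P &= 1 - \sqrt{\frac{F}{F+2}} = 1 - \sqrt{\frac{50}{52}} \approx 0.019
\end{aligned}
```

These values agree with the Minitab table for the same four data points.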
Mathematical Model for ANOVA

The mathematical model for this case is:
$y_{ti} = \mu + \tau_t + \varepsilon_{ti}$

Where:
$y_{ti}$ = a single measurement from group (treatment) t
$\mu$ = the total mean value
$\tau_t$ = effect of treatment t
$\varepsilon_{ti}$ = random error
$H_0$ assumes that the treatment effect is zero.
Mathematical: $H_0$: all $\tau_t = 0$; $H_a$: at least one $\tau_k \neq 0$
Conventional: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$; $H_a$: at least one $\mu_k$ is different
Mathematical Model for ANOVA

We can calculate the mathematical model in Minitab and store the results in a worksheet.
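Outside Minitab, the model terms can be estimated in a few lines. A minimal sketch, assuming the diet data and assuming that the stored results of interest are the fits (total mean plus treatment effect) and the residuals; the variable names are illustrative:

```python
# Coagulation times grouped by diet (treatment t = 1..4).
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

all_obs = [y for g in data.values() for y in g]

mu = sum(all_obs) / len(all_obs)                        # total mean value
tau = {t: sum(g) / len(g) - mu for t, g in data.items()}  # treatment effects
fits = {t: mu + tau[t] for t in data}                   # fitted value per group
residuals = {t: [y - fits[t] for y in g] for t, g in data.items()}

print("mu  =", mu)
print("tau =", tau)
print("residuals for diet 1 =", residuals[1])
```

Each residual is the part of a measurement that neither the total mean nor the diet explains; within every group the residuals sum to zero.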
Randomized Block Design, Example

An engineer wants to test 4 catalysts. For time reasons, he can only run 4 tests per batch. In what sequence do you perform the experiment and why?
          Cat 1   Cat 2   Cat 3   Cat 4
Batch 1   69      72      73      75
Batch 2   72      75      75      74
Batch 3   68      67      68      72
Batch 4   71      72      72      75
Randomized Block Design, Example

We want to test all 4 catalysts in each batch. The output is Yield, the input is the Catalyst, and the block variable or variation is the Batch. We want to separate the effect of the catalyst from the block effect.
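One way to lay out such a plan is to run every catalyst once per batch and to randomize the run order inside each batch. The sketch below is an illustration under that assumption, not the engineer's actual plan; the seed is fixed only so that the output is reproducible.

```python
import random

catalysts = [1, 2, 3, 4]
batches = [1, 2, 3, 4]

rng = random.Random(1)  # fixed seed only to make the sketch reproducible

run_plan = {}
for batch in batches:
    order = catalysts[:]   # every catalyst appears exactly once per batch
    rng.shuffle(order)     # random run order inside the block
    run_plan[batch] = order

for batch, order in run_plan.items():
    print("Batch", batch, "run order:", order)
```

Because each batch contains all four catalysts, batch-to-batch differences affect every catalyst equally and can be separated from the catalyst effect.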
Catalyst 1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4
Batch    1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4
Yield    69 72 68 71 72 75 67 72 73 75 68 72 75 74 72 75
We enter the data in Minitab as follows: two columns for the input variables, and one column for the output variable (the response).
Randomized Block Designs
Randomized block designs contain two types of input variables:

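The controllable process variable (the main interest)
Blocks (variations) whose influence can be eliminated: day to day, batch to batch, shift to shift
Mathematical model
$y_{ti} = \mu + \beta_i + \tau_t + \varepsilon_{ti}$
where $\beta_i$ is the effect of block i and $\tau_t$ is the effect of treatment t.
Blocks: $H_0$: all $\beta_i = 0$; $H_a$: at least one $\beta_i \neq 0$
Treatments: $H_0$: all $\tau_t = 0$; $H_a$: at least one $\tau_t \neq 0$
Blocks should provide consistent conditions within each block and, where possible, show differences between blocks.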
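The terms of the block model can be estimated from row and column means. A minimal sketch, assuming the catalyst yield data of the example (catalysts as treatments, batches as blocks); the variable names are illustrative:

```python
# Yield by catalyst (keys) and batch (list positions = batches 1 to 4).
yields = {
    1: [69, 72, 68, 71],
    2: [72, 75, 67, 72],
    3: [73, 75, 68, 72],
    4: [75, 74, 72, 75],
}
n_batches = 4
n_catalysts = len(yields)

mu = sum(sum(row) for row in yields.values()) / (n_catalysts * n_batches)

# Treatment effects: catalyst mean minus total mean.
tau = {t: sum(row) / n_batches - mu for t, row in yields.items()}
# Block effects: batch mean minus total mean.
beta = {i + 1: sum(row[i] for row in yields.values()) / n_catalysts - mu
        for i in range(n_batches)}
# Residuals: what remains after removing mean, block and treatment effects.
residuals = {(t, i + 1): row[i] - (mu + beta[i + 1] + tau[t])
             for t, row in yields.items() for i in range(n_batches)}

print("mu   =", mu)
print("tau  =", tau)
print("beta =", beta)
```

The batch effects are of the same size as the catalyst effects or larger, which is why leaving them in the error term hides the catalyst effect.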
One Way ANOVA, False Analysis

Perform a one-way ANOVA with Yield as Output and Catalyst as Input. Minitab: Stat>ANOVA>Oneway
One-Way Analysis of Variance
Analysis of Variance on Yield
Source     DF       SS      MS      F       P
Catalyst    3    32.75   10.92   1.75   0.211
Error      12    75.00    6.25
Total      15   107.75
We see no significant influence from the catalyst.
Balanced ANOVA, Correct Analysis

Perform a Balanced ANOVA, with Yield as Output, and Catalyst and Batch as Input. What are your conclusions?
Analysis of Variance for Yield
Source     DF        SS       MS       F       P
Catalyst    3    32.750   10.917    6.24   0.014
Batch       3    59.250   19.750   11.29   0.002
Error       9    15.750    1.750
Total      15   107.750
We see a significant influence from catalyst and batch.
Comparing the Analyses

One-Way Analysis of Variance
Analysis of Variance on Yield
Source     DF       SS      MS      F       P
Catalyst    3    32.75   10.92   1.75   0.211
Error      12    75.00    6.25
Total      15   107.75
This probability (P = 0.211) is not small enough to reject the null hypothesis.

Analysis of Variance for Yield
Source     DF        SS       MS       F       P
Catalyst    3    32.750   10.917    6.24   0.014
Batch       3    59.250   19.750   11.29   0.002
Error       9    15.750    1.750
Total      15   107.750
This probability (P = 0.014) is now small enough to reject the null hypothesis, meaning that the variable Catalyst is significant.
Compare the MSEs of both analyses. Note the relationships in SS:
75.00 = 59.25(Batch) + 15.75
By taking the batch-to-batch variation into account, we can reduce the Mean Square Error term (noise). This makes the Catalyst effect significant.
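The relationship between the two error terms can be reproduced from the raw yields. A minimal sketch under the assumption of one observation per catalyst and batch, so that the blocked error has (4 - 1)(4 - 1) = 9 degrees of freedom; the variable names are illustrative:

```python
# Yield by catalyst (keys) and batch (list positions = batches 1 to 4).
yields = {
    1: [69, 72, 68, 71],
    2: [72, 75, 67, 72],
    3: [73, 75, 68, 72],
    4: [75, 74, 72, 75],
}
n_cat = len(yields)
n_batch = 4
n_obs = n_cat * n_batch
grand = sum(sum(row) for row in yields.values()) / n_obs

ss_total = sum((y - grand) ** 2 for row in yields.values() for y in row)
ss_cat = n_batch * sum((sum(row) / n_batch - grand) ** 2
                       for row in yields.values())
batch_means = [sum(row[i] for row in yields.values()) / n_cat
               for i in range(n_batch)]
ss_batch = n_cat * sum((m - grand) ** 2 for m in batch_means)

ms_cat = ss_cat / (n_cat - 1)

# One-way analysis: batch variation stays inside the error term.
sse_oneway = ss_total - ss_cat
mse_oneway = sse_oneway / (n_obs - n_cat)

# Blocked analysis: batch variation is taken out of the error term.
sse_block = sse_oneway - ss_batch
mse_block = sse_block / ((n_cat - 1) * (n_batch - 1))

print(f"one-way: SSE={sse_oneway:.2f} MSE={mse_oneway:.2f} F={ms_cat / mse_oneway:.2f}")
print(f"blocked: SSE={sse_block:.2f} MSE={mse_block:.2f} F={ms_cat / mse_block:.2f}")
```

The catalyst mean square is identical in both analyses; only the denominator of the F ratio shrinks once the batch sum of squares is removed from the error.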
