
Analysis of Variance (ANOVA)

Objectives
To know the concept of variance analysis.
To be able to perform simple analyses with 1 and 2 input factors.
To be able to determine the mathematical model.
To be able to check the model prerequisites.
To determine the practical significance.
To know the concept of blocking and be able to use simple Randomized Block Designs.
To be able to perform the ANOVA in Minitab and interpret the results.

Hypothesis Test: Roadmap


Hypothesis Test
  Variable Data
    Mean Value
      Against target: t-Test
      2 distributions: t-Test
      > 2 distributions: ANOVA
    Variation
      Against target: χ²-Test
      2 distributions: F, Levene's
      > 2 distributions: Bartlett, Levene's
  Attributive Data
    Ratio (against target, 2 distributions, > 2 distributions): χ²-Test

ANOVA (Variance Analysis)


Previously, we discussed the testing of hypotheses using 2 mean values (T-Test). ANOVA is used to test hypotheses with 2 or more mean values.

H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$

Ha: At least one $\mu_k$ is different


Advantage:
To test the NULL HYPOTHESIS (all 4 mean values are equal), we would have to test hypotheses for 6 combinations using the technique previously described (t-test). Using the ANOVA technique, we can decide whether to reject the null hypothesis or keep the null hypothesis with a single test.
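The count of six pairwise tests follows from choosing 2 groups out of 4. A minimal Python sketch of that arithmetic (the group labels are placeholders, not names from the slides):

```python
from itertools import combinations

# Four groups, labelled 1 to 4 purely for illustration.
groups = [1, 2, 3, 4]

# Every unordered pair of groups would need its own t-test.
pairs = list(combinations(groups, 2))

print(pairs)
print(len(pairs))  # 6
```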

ANOVA -- Underlying Assumptions
The F distribution is also used for testing the equality of more than two means using a technique called analysis of variance (ANOVA). ANOVA requires the following conditions:
The populations being sampled are normally distributed.
The populations have equal standard deviations.
The samples are randomly selected and are independent.

Questions Asked by ANOVA

Are the average distances achieved with each dimple pattern the same? Do the 4 samples come from the same population?

H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$

Are some of the 4 population means different?

Ha: At least one $\mu_k$ is different

Analysis of Variance Procedure


The Null Hypothesis: the population means are the same.
The Alternative Hypothesis: at least one of the means is different.
The Test Statistic: F = (between-sample variance) / (within-sample variance)
Decision rule: For a given significance level $\alpha$, reject the null hypothesis if F (computed) is greater than F (table) with the corresponding numerator and denominator degrees of freedom.

NOTE
If there are k populations being sampled, then df (numerator) = k - 1.
If there are a total of N sample points, then df (denominator) = N - k.
The test statistic is computed by: F = [SST / (k - 1)] / [SSE / (N - k)]
SST represents the treatment sum of squares.
SSE represents the error sum of squares.

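Formula

$SS(\text{total}) = \sum X^2 - \frac{(\sum X)^2}{n}$

$SST = \sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{n}$

$SSE = SS(\text{total}) - SST$

Where: $T_c$ represents the column totals, $n_c$ represents the number of observations (sample size) for each treatment, $\sum X$ represents the sum of all the observations, and $n$ is the total number of observations (N in the note above).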
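The shortcut formulas can be checked numerically. The following is a minimal Python sketch, not Minitab's implementation; it uses the coagulation times from the diet example in this deck, and the function name `anova_shortcut` is made up for illustration.

```python
# Coagulation times per diet (levels 1 to 4), taken from the diet example.
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

def anova_shortcut(groups):
    """Return SS(total), SST, SSE and F using the column-total formulas."""
    values = [x for column in groups.values() for x in column]
    n = len(values)                      # total number of observations (N)
    k = len(groups)                      # number of treatments
    correction = sum(values) ** 2 / n    # (sum of X)^2 / n
    ss_total = sum(x * x for x in values) - correction
    sst = sum(sum(col) ** 2 / len(col) for col in groups.values()) - correction
    sse = ss_total - sst
    f_value = (sst / (k - 1)) / (sse / (n - k))
    return ss_total, sst, sse, f_value

ss_total, sst, sse, f_value = anova_shortcut(data)
print(ss_total, sst, sse, round(f_value, 2))  # 340.0 228.0 112.0 13.57
```

The three sums of squares and the F-value agree with the Minitab table shown for the diet example.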

Example: Comparing More than Two Groups

We are using the example of Diet. Twenty-four animals were fed using one of four diets. Diet is the input variable (factor); blood clotting time is the output variable (response). The diets were assigned to the animals randomly. Blood samples were taken and tested in a random sequence. Why?

DIET A  62 60 63 59
DIET B  63 67 71 64 65 66
DIET C  68 66 71 67 68 68
DIET D  56 62 60 61 63 64 63 59

Example: Comparing More than Two Groups


First, we create a plot or box plot of the data. Are there differences in the 4 diets?
[Figure: plot and box plot of "Coagtime" by "Diet". x-axis: Diet (1 to 4); y-axis: Coagtime (55 to 70).]

Minitab worksheet (columns CoagTime and Diet), grouped here by diet:
Diet  CoagTime
1     62 60 63 59
2     63 67 71 64 65 66
3     68 66 71 67 68 68
4     56 62 60 61 63 64 63 59

Performing ANOVA in Minitab


We perform the ANOVA in Minitab:
Stat > ANOVA > One-way

One-way Analysis of Variance
Analysis of Variance for Coagtime
Source  DF      SS     MS      F      P
Diet     3  228.00  76.00  13.57  0.000
Error   20  112.00   5.60
Total   23  340.00

Individual 95% CIs For Mean Based on Pooled StDev
Level  N    Mean  StDev
1      4  61.000  1.826
2      6  66.000  2.828
3      6  68.000  1.673
4      8  61.000  2.619
Pooled StDev = 2.366
[Character plot: 95% confidence interval for each level mean on an axis from 59.5 to 70.0]
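The level summaries and the pooled standard deviation in this output can be reproduced by hand. A minimal sketch using only the Python standard library, assuming the worksheet values listed earlier for the diet example:

```python
from math import sqrt
from statistics import mean, stdev

# Coagulation times per diet level, from the worksheet of the diet example.
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

# Per-level sample size, mean, and sample standard deviation.
summary = {level: (len(x), mean(x), stdev(x)) for level, x in data.items()}

# Pooled variance: within-level sums of squares divided by the error df.
df_error = sum(n - 1 for n, _, _ in summary.values())
pooled_var = sum((n - 1) * s ** 2 for n, _, s in summary.values()) / df_error
pooled_sd = sqrt(pooled_var)

for level, (n, m, s) in summary.items():
    print(level, n, f"{m:.3f}", f"{s:.3f}")
print(f"Pooled StDev = {pooled_sd:.3f}")
```

The pooled variance equals the error mean square in the ANOVA table (5.60), which is why the pooled standard deviation is its square root.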

ANOVA Table
The ANOVA table is an important result of ANOVA.

One-Way Analysis of Variance
Analysis of Variance on CoagTime
Source  DF      SS     MS      F      P
Diet     3  228.00  76.00  13.57  0.000
Error   20  112.00   5.60
Total   23  340.00

If the p-value is less than 5%, there is a difference in the mean value of at least one group. In this case we reject the null hypothesis that the mean values of all groups are equal: the mean value of at least one diet is different from the others. An F-value of this magnitude may also occur by chance, but fewer than 1 time in 10,000. That is roughly comparable to getting heads thirteen times in a row with a fair coin.

The F-value is near 1.00 when the group mean values are similar. In this case the F-value is much higher.

F-Distribution
The following displays the F-distribution for our example. It shows the distribution of F-values that would have occurred if all 4 diets produced the same blood clotting time. Note that the F-value obtained in our experiment lies far out in the tail of the distribution, so such a value would be very unlikely if the diets had no effect.

[Figure: F-distribution for 3 and 20 degrees of freedom. x-axis: F-value (0 to 14); y-axis: probability density (0.0 to 0.7). The 10%, 5%, and 1% marks and the observed value are indicated.]
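How far out in the tail the observed value lies can be estimated numerically. The sketch below is one possible approach, not what Minitab does internally: it expresses the tail probability of the F-distribution through the regularized incomplete beta function and integrates it with Simpson's rule. The function name `f_tail_probability` is made up, and the sketch assumes a positive F-value and at least 2 denominator degrees of freedom.

```python
from math import exp, lgamma, log

def f_tail_probability(f_value, df_num, df_den, steps=2000):
    """Approximate P(F >= f_value) for an F(df_num, df_den) distribution.

    Uses P(F >= f) = I_x(df_den/2, df_num/2), x = df_den / (df_den + df_num*f),
    where I_x is the regularized incomplete beta function.
    Assumes f_value > 0 and df_den >= 2 (integrand bounded on [0, x]).
    """
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f_value)
    log_beta = lgamma(a) + lgamma(b) - lgamma(a + b)

    def integrand(t):
        if t <= 0.0:
            return 1.0 if a == 1.0 else 0.0   # limit of t**(a-1) at t = 0
        return exp((a - 1.0) * log(t) + (b - 1.0) * log(1.0 - t))

    h = x / steps                             # steps must be even
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (total * h / 3.0) / exp(log_beta)

# Observed F = MS(Diet) / MS(Error) = 76.00 / 5.60 with 3 and 20 df.
p = f_tail_probability(76.0 / 5.6, 3, 20)
print(f"{p:.2e}")  # about 4.66e-05
```

Minitab prints such a value as P 0.000 because it rounds to three decimals.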

Main Effects Plots


We use the main effects plot to display our results. It should be shown only if there is a significant difference. Minitab: Stat > ANOVA > Main Effects Plot...
[Figure: Main Effects Plot - Data Means for Coagtime. x-axis: Diet (1 to 4); y-axis: Coagtime (61 to 68).]
Caution: the line connecting the points comes without warranty; it should not be read as a trend between diets.

Interval Plots
The interval plot is another type of display. Minitab: Stat > ANOVA > Interval Plot... Create this diagram using the option Confidence interval.
[Figure: confidence interval plot of Coagtime by Diet. x-axis: Diet (1 to 4); y-axis: Coagtime (58 to 68).]

Analysis of Variance
Recall, ANOVA looks at three sources of variability:

Total = total variability among all observations (SS total)
Between = variation between group means (factor, SST)
Within = random (chance) variation within each group (noise, or statistical error, SSE)

Analogy with control charts: between-subgroup variation and within-subgroup variation.

Total = between + within

Understanding the Fundamentals - Sums of Squares


[Figure: Response (55 to 70) plotted against Factor levels 1 to 4, marking the group means and the grand mean.]

$\bar{y}_j$ = mean of group j
$\bar{\bar{y}}$ = grand mean of the experiment
$y_{ij}$ = individual measurement

i = represents a data point within the jth group
j = represents the jth group
g = total number of groups
$n_j$ = number of data points in the jth group

$\sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij}-\bar{\bar{y}})^2 = \sum_{j=1}^{g} n_j(\bar{y}_j-\bar{\bar{y}})^2 + \sum_{j=1}^{g}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_j)^2$

SS(Total) = SS(Factor) + SS(Error)

The computer will take care of this for us...

SS(Total) = Total Sum of Squares of the Experiment (individuals - Grand Mean)
SS(Factor) = Sum of Squares of the Factor (Group Mean - Grand Mean)
SS(Error) = Sum of Squares within the Group (individuals - Group Mean)
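The decomposition can be verified directly from its definition, working with deviations from the grand mean and the group means. A minimal sketch on the coagulation data of the diet example (the variable names are illustrative only):

```python
# Coagulation times per diet (group j = 1..4), from the diet example.
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

values = [y for group in data.values() for y in group]
grand_mean = sum(values) / len(values)
group_means = {j: sum(g) / len(g) for j, g in data.items()}

# individuals - grand mean
ss_total = sum((y - grand_mean) ** 2 for y in values)
# group mean - grand mean, weighted by the group size n_j
ss_factor = sum(len(g) * (group_means[j] - grand_mean) ** 2
                for j, g in data.items())
# individuals - group mean
ss_error = sum((y - group_means[j]) ** 2 for j, g in data.items() for y in g)

print(ss_total, ss_factor, ss_error)  # 340.0 228.0 112.0
```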

Developing the ANOVA Table Using Sums of Squares


Hypothesis Test

H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
Ha: At least one $\mu_k$ is different

To determine whether to reject or fail to reject the null hypothesis, we must calculate the Test Statistic (F-ratio) using the Analysis of Variance as shown in the table below (standard format).

SOURCE   SS          df                             MS (= SS/df)                              F (= MS(Factor)/MS(Error))
BETWEEN  SS(Factor)  $g - 1$                        SS(Factor) / $(g - 1)$                    MS(Factor) / MS(Error)
WITHIN   SS(Error)   $\sum_{j=1}^{g}(n_j - 1)$      SS(Error) / $\sum_{j=1}^{g}(n_j - 1)$
TOTAL    SS(Total)   $(\sum_{j=1}^{g} n_j) - 1$

MS(Error), the mean square of the WITHIN row, is the Pooled Error Variance.

Why is Source Within called the Error or Noise? In practical terms what is the F-ratio telling us? What do you think large F-ratios mean?

Basic Model for ANOVA


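The following always applies to the sum of squares:

SS(Total) = SS(Factor) + SS(Error)

$\sum_{j=1}^{a}\sum_{i=1}^{k}(x_{ij}-\bar{\bar{x}})^2 = k\sum_{j=1}^{a}(\bar{x}_j-\bar{\bar{x}})^2 + \sum_{j=1}^{a}\sum_{i=1}^{k}(x_{ij}-\bar{x}_j)^2$

Here a is the number of groups and k is the number of observations in each group (equal group sizes); the summation limits were lost in the original slide and are filled in on that assumption.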

Basic Model for ANOVA


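[Figure: Coagtime (55 to 70) by Diet (1 to 4), showing the total mean value, the group mean values, and a brace labelled $\tau$ between each group mean and the total mean.]

$\tau$ = influence of the factor (diet in this case)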

ANOVA Table
Let's return to the ANOVA table. We want to manually create the ANOVA table for a very simple case.

This is our data:

Factor  Data
A       29
A       31
B       39
B       41

This is the completed variance analysis table from Minitab:

Analysis of Variance for Data
Source  DF      SS      MS      F      P
Factor   1  100.00  100.00  50.00  0.019
Error    2    4.00    2.00
Total    3  104.00
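For this small data set the table entries can be worked out by hand. The following LaTeX block is a worked sketch of that arithmetic, using the two group means (30 and 40) and the total mean (35):

```latex
\begin{aligned}
\bar{x}_A &= \tfrac{29 + 31}{2} = 30, \qquad
\bar{x}_B = \tfrac{39 + 41}{2} = 40, \qquad
\bar{\bar{x}} = \tfrac{29 + 31 + 39 + 41}{4} = 35 \\
SS(\text{Factor}) &= 2(30 - 35)^2 + 2(40 - 35)^2 = 100 \\
SS(\text{Error}) &= (29 - 30)^2 + (31 - 30)^2 + (39 - 40)^2 + (41 - 40)^2 = 4 \\
SS(\text{Total}) &= (29 - 35)^2 + (31 - 35)^2 + (39 - 35)^2 + (41 - 35)^2 = 104 \\
MS(\text{Factor}) &= 100 / 1 = 100, \qquad MS(\text{Error}) = 4 / 2 = 2 \\
F &= 100 / 2 = 50
\end{aligned}
```

The degrees of freedom are 2 - 1 = 1 for the factor and 4 - 2 = 2 for the error, which gives the F-value of 50.00 in the Minitab table.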

ANOVA Table
The ANOVA table content is produced as follows:

Source  DF     SS          MS                      F
Factor  a - 1  SS(Factor)  SS(Factor)/DF(Factor)   MS(Factor)/MS(Error)
Error   N - a  SS(Error)   SS(Error)/DF(Error)
Total   N - 1  SS(Total)

The test statistic is the F-value (signal-to-noise ratio).
MS(Error) is the pooled error variance (remaining variation).

Mathematical Model for ANOVA


The mathematical model for this case is:

$y_{ti} = \mu + \tau_t + \varepsilon_{ti}$

Where:
$y_{ti}$ = a single measurement from group (treatment) t
$\mu$ = the total mean value
$\tau_t$ = effect of treatment t
$\varepsilon_{ti}$ = random error

H0 assumes that the treatment effect is zero.

Mathematical: H0: all $\tau_t = 0$; Ha: at least one $\tau_k \neq 0$

Conventional: H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$; Ha: at least one $\mu_k$ is different

Mathematical Model for ANOVA


We can calculate the mathematical model in Minitab and store the results in a worksheet.
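Outside Minitab, the terms of the model can be estimated in a few lines. This is a minimal sketch that assumes the usual estimates (total mean for $\mu$, group mean minus total mean for $\tau_t$, and the remainder as residual), applied to the coagulation data of the diet example:

```python
# Coagulation times per diet (treatment t = 1..4), from the diet example.
data = {
    1: [62, 60, 63, 59],
    2: [63, 67, 71, 64, 65, 66],
    3: [68, 66, 71, 67, 68, 68],
    4: [56, 62, 60, 61, 63, 64, 63, 59],
}

values = [y for obs in data.values() for y in obs]
mu = sum(values) / len(values)                               # total mean
tau = {t: sum(obs) / len(obs) - mu for t, obs in data.items()}  # effects
residuals = {t: [y - mu - tau[t] for y in obs] for t, obs in data.items()}

print(mu)   # 64.0
print(tau)  # {1: -3.0, 2: 2.0, 3: 4.0, 4: -3.0}

# Every observation is recovered as mu + tau_t + residual.
rebuilt = {t: [mu + tau[t] + e for e in residuals[t]] for t in data}
```

The residuals are what one would store in the worksheet to check the model prerequisites.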

Randomized Block Design, Example


An engineer wants to test 4 catalysts. For time reasons, he can only run 4 tests per batch. In what sequence do you perform the experiment and why?

         Cat 1  Cat 2  Cat 3  Cat 4
Batch 1  69     72     73     75
Batch 2  72     75     75     74
Batch 3  68     67     68     72
Batch 4  71     72     72     75
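One common answer to the sequencing question is to run every catalyst once in each batch and to randomize the order within each batch. A minimal sketch of such a run plan, assuming that approach (the seed value and the name `run_plan` are arbitrary choices for illustration):

```python
import random

catalysts = [1, 2, 3, 4]
batches = [1, 2, 3, 4]

# A fixed seed is used only so that the plan can be reproduced.
rng = random.Random(2024)

# Each batch (block) contains all four catalysts, in its own random order.
run_plan = {batch: rng.sample(catalysts, k=len(catalysts)) for batch in batches}

for batch, order in run_plan.items():
    print(f"Batch {batch}: catalyst order {order}")
```

Randomizing within the batch keeps any drift inside a batch from lining up with one particular catalyst.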

Randomized Block Design, Example


We want to test all 4 catalysts in each batch. The output is Yield, the input is the Catalyst, and the block variable or variation is the Batch. We want to separate the effect of the catalyst from the block effect.
Catalyst   1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4
Batch      1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4
Yield     69 72 68 71 72 75 67 72 73 75 68 72 75 74 72 75

We enter the data in Minitab as follows: two columns for the input variables, and one column for the output variable (the response).

Randomized Block Designs

Randomized block designs contain two types of input variables:


The controllable process variable (the main interest)
Blocks (variations) whose influence can be eliminated:
  Day to day
  Batch to batch
  Shift to shift

Mathematical model

$y_{ti} = \mu + \beta_i + \tau_t + \varepsilon_{ti}$

Blocks: H0: all $\beta_i = 0$; Ha: at least one $\beta_i \neq 0$
Treatments: H0: all $\tau_t = 0$; Ha: at least one $\tau_t \neq 0$

Results should be consistent within each block; differences between blocks are allowed and may well show up.
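The terms of the block model can be estimated from the row and column means. This is a minimal sketch for the catalyst example, assuming the usual estimates for a design with one observation per catalyst and batch (block mean minus total mean for $\beta_i$, treatment mean minus total mean for $\tau_t$):

```python
# yields[batch][catalyst], from the catalyst example.
yields = {
    1: {1: 69, 2: 72, 3: 73, 4: 75},
    2: {1: 72, 2: 75, 3: 75, 4: 74},
    3: {1: 68, 2: 67, 3: 68, 4: 72},
    4: {1: 71, 2: 72, 3: 72, 4: 75},
}
batches = list(yields)
catalysts = list(yields[1])

mu = sum(yields[i][t] for i in batches for t in catalysts) / 16   # total mean
beta = {i: sum(yields[i].values()) / 4 - mu for i in batches}     # block effects
tau = {t: sum(yields[i][t] for i in batches) / 4 - mu for t in catalysts}

# What is left after removing the total mean, block and treatment effects.
residual = {(i, t): yields[i][t] - mu - beta[i] - tau[t]
            for i in batches for t in catalysts}
ss_error = sum(e * e for e in residual.values())

print(mu)        # 71.875
print(beta)      # {1: 0.375, 2: 2.125, 3: -3.125, 4: 0.625}
print(tau)       # {1: -1.875, 2: -0.375, 3: 0.125, 4: 2.125}
print(ss_error)  # 15.75
```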

One-Way ANOVA, Incorrect Analysis


Perform a one-way ANOVA with Yield as Output and Catalyst as Input. Minitab: Stat > ANOVA > One-way

One-Way Analysis of Variance
Analysis of Variance on Yield
Source    DF      SS     MS     F      P
Catalyst   3   32.75  10.92  1.75  0.211
Error     12   75.00   6.25
Total     15  107.75

We see no significant influence from the catalyst.

Balanced ANOVA, Correct Analysis


Perform a Balanced ANOVA, with Yield as Output, and Catalyst and Batch as Input. What are your conclusions?

Analysis of Variance for Yield
Source    DF       SS      MS      F      P
Catalyst   3   32.750  10.917   6.24  0.014
Batch      3   59.250  19.750  11.29  0.002
Error      9   15.750   1.750
Total     15  107.750

We see a significant influence from catalyst and batch.

Comparing the Analyses


One-Way Analysis of Variance
Analysis of Variance on Yield
Source    DF      SS     MS     F      P
Catalyst   3   32.75  10.92  1.75  0.211
Error     12   75.00   6.25
Total     15  107.75

This probability is not small enough to reject the null hypothesis.

Analysis of Variance for Yield
Source    DF       SS      MS      F      P
Catalyst   3   32.750  10.917   6.24  0.014
Batch      3   59.250  19.750  11.29  0.002
Error      9   15.750   1.750
Total     15  107.750

This probability is now small enough to reject the null hypothesis, meaning that the variable Catalyst is significant.

Compare the MSEs of both analyses. Note the relationships in SS:

75.00 = 59.25(Batch) + 15.75

By taking the batch-to-batch variation into account, we can reduce the Mean Square Error term (noise). This makes the Catalyst effect significant.
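The effect of blocking on the error term can be reproduced from the raw yields. The sketch below is a hand calculation in plain Python, not Minitab output; it computes both analyses for the catalyst data, and all variable names are illustrative.

```python
# yields[batch][catalyst], from the catalyst example.
yields = {
    1: {1: 69, 2: 72, 3: 73, 4: 75},
    2: {1: 72, 2: 75, 3: 75, 4: 74},
    3: {1: 68, 2: 67, 3: 68, 4: 72},
    4: {1: 71, 2: 72, 3: 72, 4: 75},
}
batches, catalysts = list(yields), list(yields[1])
values = [yields[i][t] for i in batches for t in catalysts]
correction = sum(values) ** 2 / len(values)

ss_total = sum(v * v for v in values) - correction
ss_catalyst = sum(sum(yields[i][t] for i in batches) ** 2 / 4
                  for t in catalysts) - correction
ss_batch = sum(sum(yields[i].values()) ** 2 / 4 for i in batches) - correction

# One-way analysis: batch variation stays in the error term (df = 16 - 4).
mse_one_way = (ss_total - ss_catalyst) / 12
f_one_way = (ss_catalyst / 3) / mse_one_way

# Blocked analysis: batch variation is removed from the error (df = 9).
mse_blocked = (ss_total - ss_catalyst - ss_batch) / 9
f_catalyst = (ss_catalyst / 3) / mse_blocked
f_batch = (ss_batch / 3) / mse_blocked

print(mse_one_way, round(f_one_way, 2))   # 6.25 1.75
print(mse_blocked, round(f_catalyst, 2), round(f_batch, 2))  # 1.75 6.24 11.29
```

The catalyst sum of squares is identical in both analyses; only the error mean square in the denominator changes.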
