Kruskal-Wallis Test

The Kruskal-Wallis test was developed jointly by Kruskal and Wallis (1952) and is named
after them. It is a nonparametric (distribution-free) test, used when the assumptions of
one-way ANOVA are not met. Both tests assess whether a continuous dependent variable
differs significantly across the levels of a grouping independent variable (with three or
more groups). ANOVA assumes that the scores in each group are normally distributed and
that the groups have approximately equal variance. The Kruskal-Wallis test makes neither
assumption. The trade-off is power: when the ANOVA assumptions do hold, the
Kruskal-Wallis test is not as powerful as ANOVA.
Hypothesis:
Null hypothesis: The samples come from identical populations.
Alternative hypothesis: At least one sample comes from a different population.
Procedure:
1. Arrange the data of all samples in a single series in ascending order.
2. Assign ranks to them in ascending order. In the case of a repeated value, or a tie,
assign each tied value the average of the rank positions they occupy.
3. Sum the ranks within each group to obtain T_1, T_2, T_3, ..., one rank sum per group.
4. To calculate the test statistic, apply the following formula:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(N+1)$$

Where,
H = Kruskal-Wallis test statistic
N = total number of observations in all samples
k = number of samples (groups)
T_i = sum of the ranks assigned to the ith sample
n_i = number of observations in the ith sample
The Kruskal-Wallis test statistic approximately follows a chi-square distribution with
k-1 degrees of freedom, where each ni should be greater than 5. If the calculated value of
the Kruskal-Wallis test is less than the critical chi-square value, then the null
hypothesis cannot be rejected. If the calculated value of the Kruskal-Wallis test is
greater than the critical chi-square value, then we can reject the null hypothesis and say
that at least one sample comes from a different population.
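The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `average_ranks` and `kruskal_wallis_h` are made up for this sketch, the data in the usage line are hypothetical, and no correction for ties is applied to H beyond averaging the tied ranks.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` across the block of values tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(T_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        t_i = sum(ranks[start:start + len(group)])  # rank sum of this group
        weighted += t_i ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical data: three fully separated groups of three observations each.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For the hypothetical data the rank sums are 6, 15 and 24, so H = (12 / 90) * 279 - 30 = 7.2.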
Assumptions
1. We assume that the samples drawn from the population are random.
2. We also assume that the cases of each group are independent.
3. The measurement scale for the dependent variable should be at least ordinal.
Sample question: A shoe company wants to know if three groups of workers have
different salaries:
Women: 23K, 41K, 54K, 66K, 78K.
Men: 45K, 55K, 60K, 70K, 72K
Minorities: 18K, 30K, 34K, 40K, 44K.
Step 1: Sort the data for all groups/samples into ascending order in one combined
set.
18K
23K
30K
34K
40K
41K
44K
45K
54K
55K
60K
66K
70K
72K
78K
Step 2: Assign ranks to the sorted data points. Give tied values the average rank.
18K 1
23K 2
30K 3
34K 4
40K 5
41K 6
44K 7
45K 8
54K 9
55K 10
60K 11
66K 12
70K 13
72K 14
78K 15
Step 3: Add up the different ranks for each group/sample.
Women (23K, 41K, 54K, 66K, 78K): 2 + 6 + 9 + 12 + 15 = 44.
Men (45K, 55K, 60K, 70K, 72K): 8 + 10 + 11 + 13 + 14 = 56.
Minorities (18K, 30K, 34K, 40K, 44K): 1 + 3 + 4 + 5 + 7 = 20.
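Steps 1 to 3 can be checked with a short script. This is only a sketch using the salaries from the sample question (in thousands); the variable names are illustrative, and the rank lookup relies on the fact that this data set has no ties.

```python
salaries = {
    "Women": [23, 41, 54, 66, 78],
    "Men": [45, 55, 60, 70, 72],
    "Minorities": [18, 30, 34, 40, 44],
}

# Steps 1-2: pool and sort; without ties, the rank is the position in the sorted list.
pooled = sorted(x for group in salaries.values() for x in group)
assert len(set(pooled)) == len(pooled)  # the shortcut below is only valid without ties
rank_of = {value: position for position, value in enumerate(pooled, start=1)}

# Step 3: add up the ranks within each group.
rank_sums = {name: sum(rank_of[x] for x in group) for name, group in salaries.items()}
print(rank_sums)  # {'Women': 44, 'Men': 56, 'Minorities': 20}
```

As a sanity check, the three rank sums add up to 120, which equals 15 * 16 / 2, the sum of the ranks 1 to 15.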
Step 4: Calculate the test statistic:

$$H = \left[ \frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j} \right] - 3(n+1)$$

Where:
n = sum of sample sizes for all samples,
c = number of samples,
Tj = sum of ranks in the jth sample,
nj = size of the jth sample.

H = 6.72
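For reference, the arithmetic behind this value can be written out by substituting the rank sums from Step 3 (44, 56 and 20), the group size of 5, and n = 15 into the standard Kruskal-Wallis statistic:

```latex
\begin{aligned}
H &= \frac{12}{15 \cdot 16}\left(\frac{44^2}{5} + \frac{56^2}{5} + \frac{20^2}{5}\right) - 3 \cdot 16 \\
  &= 0.05 \times (387.2 + 627.2 + 80) - 48 \\
  &= 0.05 \times 1094.4 - 48 \\
  &= 54.72 - 48 = 6.72
\end{aligned}
```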
Step 5: Find the critical chi-square value, with c-1 degrees of freedom. For 3 - 1 = 2
degrees of freedom and an alpha level of .05, the critical chi-square value is 5.9915.
Step 6: Compare the H value from Step 4 to the critical chi-square value from Step 5.
If the critical chi-square value is less than the test statistic, reject the null
hypothesis that the medians are equal.
Here the critical chi-square value (5.9915) is less than the test statistic (6.72), so we
reject the null hypothesis: there is evidence that the median salaries of the three groups
are not all equal.
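The comparison can be reproduced without a chi-square table. The sketch below assumes three groups, so c - 1 = 2 degrees of freedom, and uses the fact that for exactly 2 degrees of freedom the chi-square upper-tail probability is exp(-x / 2). That shortcut does not hold for other degrees of freedom, where a table or a statistics library would be needed.

```python
import math

h = 6.72       # test statistic from Step 4
alpha = 0.05
df = 3 - 1     # c - 1 with c = 3 groups
assert df == 2  # the closed forms below are valid for df = 2 only

# For df = 2: P(X > x) = exp(-x / 2), so the critical value is -2 * ln(alpha).
critical = -2.0 * math.log(alpha)
p_value = math.exp(-h / 2.0)

print(round(critical, 4))  # 5.9915
print(round(p_value, 4))   # 0.0347
print(h > critical)        # True
```

With H = 6.72 above the critical value of 5.9915, the null hypothesis is rejected at the .05 level.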

The Wald Wolfowitz Run Test for Two Small Samples

This nonparametric test evaluates whether two continuous cumulative distributions are
significantly different. For example, if the assumption is that two production lines
making the same product create the same resulting dimensions, comparing a set of samples
from each line may reveal whether that hypothesis holds.
The concept looks at an ascending ordered list of the data from the two distributions (in
our case lines), keeping with each data point an identifier of the line it came from. Then
we count the runs, that is, the unbroken stretches of values from the same line. If there
are too few runs, the two samples most likely come from different distributions.
Small samples here means the number of samples from each source is 10 or fewer. With
more samples we can use a normal distribution approximation of the expected count of
runs, which will be discussed in a separate article.
Example: determining the number of runs
Let's say we have two sets of 8 samples from the two production lines and we measure
output voltage at a key test point of the individual products (of course you should
measure something of importance to the final product quality or reliability). We have line
A and line B with the following data:
A: 17.65  12.95  20.20  25.00  15.50  12.75  27.05  25.20
B: 27.45  25.10  17.95  15.70  27.25  25.30  10.30  10.90
Let's order the values and tag each value with the line letter so we can keep track of
which value is from which line.
10.30 B  10.90 B  12.75 A  12.95 A  15.50 A  15.70 B  17.65 A  17.95 B
20.20 A  25.00 A  25.10 B  25.20 A  25.30 B  27.05 A  27.25 B  27.45 B
A run is an unbroken stretch of values from the same line. In this example the two lowest
values are both from line B, and those two values form one run. Then there are three
values from line A (12.75, 12.95, and 15.50), which form another run. Continuing in this
way gives a total of 11 runs.
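The run count can be verified with a few lines of Python. This is a minimal sketch using the line A and line B data above; it assumes no value appears in both samples, since a cross-sample tie would make the ordering of the tags ambiguous.

```python
line_a = [17.65, 12.95, 20.20, 25.00, 15.50, 12.75, 27.05, 25.20]
line_b = [27.45, 25.10, 17.95, 15.70, 27.25, 25.30, 10.30, 10.90]

# Tag each value with its line, then sort the pooled data in ascending order.
tagged = sorted([(v, "A") for v in line_a] + [(v, "B") for v in line_b])
labels = [label for _, label in tagged]

# A new run starts wherever the tag differs from the previous tag.
runs = 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)

print("".join(labels))  # BBAAABABAABABABB
print(runs)             # 11
```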

How many runs is too few?


Now, let's say the two lines really were very different and produced results that were
dramatically different. We may have all the line A values centered tightly around 20 and
all of the values of line B centered around 50. Ordering the values would place the 8
values of line A first, followed by the line B values, creating two runs.

If the two lines were a little closer, such that they just overlapped with two values, we
may have 4 runs: A, B, A, B. Yet, if that was 4 As, followed by 4 Bs, then 4 As and the
remaining 4 Bs, that is pretty close to being an overlap of the two distributions. Or is it?
The Wald Wolfowitz approach is to compute the probability of each possible number of runs
using a combinatorial approach built on binomial coefficients. We can then tally the
probabilities of the run counts until we reach a reasonable critical value that defines
the threshold for making a decision.

The Wald Wolfowitz 2 (small) Sample Run Test


The null hypothesis is that the two samples are from the same distribution.

$$H_0: F(x) = G(x)$$

The alternative hypothesis is that the two samples are not from the same distribution.

$$H_1: F(x) \neq G(x)$$
The test statistic takes some work to determine. We need to calculate the probability of 2
runs, then 3, 4, 5, and so on. We can do this until we reach the number of observed runs,
or reach the critical value of interest.
The test statistic is calculated by summing the probabilities of observing each possible
count of runs. For an even number of runs use:

$$P(R = 2k) = \frac{2 \binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k - 1}}{\binom{n_1 + n_2}{n_1}}$$

Where R is the number of runs, here even and equal to 2k, k is a positive integer, and
n1 and n2 are the number of samples from the two sources.
For an odd number of runs use:

$$P(R = 2k + 1) = \frac{\binom{n_1 - 1}{k} \binom{n_2 - 1}{k - 1} + \binom{n_2 - 1}{k} \binom{n_1 - 1}{k - 1}}{\binom{n_1 + n_2}{n_1}}$$

That is a lot of calculating when there is a large number of samples, thus we'll use a
normal approximation for samples larger than 10 from each source. Yet, here we have only 8
samples from each source, thus we need to calculate the probabilities.
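The even and odd formulas translate directly into code. The following is a minimal sketch; the function name `run_probability` is made up for this illustration, and it relies on `math.comb` from the Python standard library (Python 3.8 or later), which returns 0 when the lower index exceeds the upper one.

```python
from math import comb


def run_probability(r, n1, n2):
    """P(R = r) under the null hypothesis, for n1 and n2 samples from the two sources."""
    if r < 2 or r > n1 + n2:
        return 0.0
    arrangements = comb(n1 + n2, n1)  # number of equally likely orderings of the tags
    if r % 2 == 0:
        k = r // 2  # r = 2k
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        k = (r - 1) // 2  # r = 2k + 1
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n2 - 1, k) * comb(n1 - 1, k - 1))
    return ways / arrangements


p_two_runs = run_probability(2, 8, 8)  # 2 / 12,870
```

A useful check on any implementation of these formulas is that the probabilities over all possible run counts, from 2 up to n1 + n2, sum to 1.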
In this case, with n1 and n2 equal to 8, the calculation for the probability of just two
runs, R = 2 and therefore k = 1, is:

$$P(R = 2) = \frac{2 \binom{8 - 1}{1 - 1} \binom{8 - 1}{1 - 1}}{\binom{8 + 8}{8}} = \frac{2}{12{,}870} = 0.00016$$

And the calculation for R = 3, and therefore k = 1 again, is:

$$P(R = 3) = \frac{\binom{8 - 1}{1} \binom{8 - 1}{1 - 1} + \binom{8 - 1}{1} \binom{8 - 1}{1 - 1}}{\binom{8 + 8}{8}} = \frac{14}{12{,}870} = 0.00109$$

For R = 4, k = 2, P(R = 4) = 0.00761
For R = 5, k = 2, P(R = 5) = 0.02284
And, for R = 6, k = 3, P(R = 6) = 0.06853.
Let's tally these up and see where we are for cumulative probabilities of being equal to
or less than a number of runs.

$$P(R \le 3) = 0.00016 + 0.00109 = 0.00125$$

And,

$$P(R \le 4) = 0.00016 + 0.00109 + 0.00761 = 0.00886$$

And,

$$P(R \le 5) = 0.00016 + 0.00109 + 0.00761 + 0.02284 = 0.03170$$

And,

$$P(R \le 6) = 0.00016 + 0.00109 + 0.00761 + 0.02284 + 0.06853 = 0.10023$$
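The tally can be reproduced with exact integer counts, which avoids accumulating rounding error. This is a sketch for the n1 = n2 = 8 case only; the variable names are illustrative. Because it divides once at the end, its value for two or three runs is 16 / 12,870, which rounds to 0.00124 rather than the 0.00125 obtained by adding already-rounded terms.

```python
from itertools import accumulate
from math import comb

n1 = n2 = 8
arrangements = comb(n1 + n2, n1)  # 12,870 equally likely orderings

counts = []
for r in range(2, 7):
    k = r // 2  # works for both r = 2k and r = 2k + 1
    if r % 2 == 0:
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n2 - 1, k) * comb(n1 - 1, k - 1))
    counts.append(ways)

cumulative_counts = list(accumulate(counts))
cumulative_probs = [round(c / arrangements, 4) for c in cumulative_counts]

print(counts)             # [2, 14, 98, 294, 882]
print(cumulative_counts)  # [2, 16, 114, 408, 1290]
print(cumulative_probs)   # [0.0002, 0.0012, 0.0089, 0.0317, 0.1002]
```

The entries correspond to R at most 2, 3, 4, 5 and 6 respectively.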

We can now select our critical value, that is, the probability under the null hypothesis
of observing the given run count or fewer. For example, if we want to take a relatively
small risk, say a 5% risk (95% confidence), of concluding that the two distributions are
different when they are actually the same, we select 0.05 as the critical value.
If the count of runs is 5 or less, the test statistic is 0.0317 given 8 samples from each
source, and the statistic is 0.1002 for R = 6 or less. Thus if we actually have 5 or fewer
runs, we have 95% confidence that the two sources, in this case two production lines, are
different.
In this case we have 11 runs, which is greater than the 5 or fewer associated with the
critical value, thus we cannot conclude there is sufficient evidence that the two lines
are different.
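The decision rule can be sketched as a search for the largest run count whose cumulative probability stays at or below the chosen risk level. The function names here are made up for this illustration, and the strict "at or below alpha" cutoff is an assumption of this sketch; published tables built with other rounding conventions may differ by one run for some sample sizes.

```python
from math import comb


def run_probability(r, n1, n2):
    """P(R = r) under the null hypothesis, using the even and odd run formulas."""
    k = r // 2
    if r % 2 == 0:
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n2 - 1, k) * comb(n1 - 1, k - 1))
    return ways / comb(n1 + n2, n1)


def critical_runs(n1, n2, alpha=0.05):
    """Largest r with P(R <= r) <= alpha, or None if no run count qualifies."""
    cumulative = 0.0
    critical = None
    for r in range(2, n1 + n2 + 1):
        cumulative += run_probability(r, n1, n2)
        if cumulative > alpha:
            break
        critical = r
    return critical


observed_runs = 11
threshold = critical_runs(8, 8)  # 5 for eight samples from each line
reject_null = threshold is not None and observed_runs <= threshold
print(threshold, reject_null)  # 5 False
```

With 11 observed runs against a threshold of 5, the sketch reaches the same conclusion as the text: the null hypothesis is not rejected.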

Tables to make this quicker


This approach requires quite a bit of calculation to determine the test statistic. Yet the
values are independent of the actual values measured, as we use only the count of runs.
Thus, we can calculate a table for various numbers of samples and specific confidence
levels or risk thresholds.
Of course this has been done already, and here is one example with a critical value of
0.05 (a 95% one-sided confidence):
n1   n2   Critical R
10   10   7
10   9    6
10   8    6
10   7    6
10   6    5
10   5    4
10   4    4
10   3    3
10   2    (none)
Thus if we have 10 samples from one source and 8 from another, and the number of runs is 6
or less, then we reject the null hypothesis that the two sources create the same results
(in other words, they are not the same). Note there is no critical R value for 10 and 2
samples, as it is not possible to conclude with any count of runs whether the two sources
are different with 95% one-sided confidence.
n1   n2   Critical R
9    9    6
9    8    6
9    7    5
9    6    5
9    5    4
9    4    4
9    4    3
9    3    (none)
n1   n2   Critical R
8    8    5
8    7    5
8    6    4
8    5    4
8    4    4
8    3    3
8    2    (none)
n1   n2   Critical R
7    7    4
7    6    4
7    5    4
7    4    3
7    3    3
7    2    (none)
n1   n2   Critical R
6    6    4
6    5    4
6    4    3
6    3    3
6    2    (none)
n1   n2   Critical R
5    5    3
5    4    3
5    3    (none)
5    2    (none)
With fewer than 8 total samples we are not able to make a determination using this
method.