Web Chapter 19 Statistical Aids to Hypothesis Testing and Gross Errors
[Figure: relative frequency, dN/N, plotted against analytical result, x_i, for curves labeled A and B, with the bias marked.]
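$$\sum x_i = 0.112 + 0.118 + 0.115 + 0.119 = 0.464$$

$$\bar{x} = \frac{0.464}{4} = 0.116\%\ \mathrm{S}$$

$$\bar{x} - x_t = 0.116 - 0.123 = -0.007\%\ \mathrm{S}$$

$$\sum x_i^2 = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854$$

$$s = \sqrt{\frac{0.053854 - (0.464)^2/4}{4 - 1}} = \sqrt{\frac{0.000030}{3}} = 0.0032$$

$$t = \frac{\bar{x} - x_t}{s/\sqrt{N}} = \frac{-0.007}{0.0032/\sqrt{4}} = -4.375$$

The magnitude of the test value of t is therefore 4.375.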
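The arithmetic above can be checked with a short Python sketch. The four results and the accepted value of 0.123% S are taken from the sums and differences shown above; the variable names are illustrative. Because the sketch carries the unrounded standard deviation, it gives a magnitude of about 4.43 for t rather than the 4.375 obtained with s rounded to 0.0032.

```python
import math
import statistics

results = [0.112, 0.118, 0.115, 0.119]  # % S, replicate results
x_true = 0.123                           # % S, accepted value x_t

n = len(results)
mean = statistics.mean(results)
s = statistics.stdev(results)            # sample standard deviation (N - 1 denominator)
t = (mean - x_true) / (s / math.sqrt(n)) # test value of t

T_CRIT_95_DF3 = 3.1825  # two-tailed critical t, 3 degrees of freedom, 95% level
significant = abs(t) > T_CRIT_95_DF3

print(f"mean = {mean:.3f}, s = {s:.4f}, t = {t:.3f}, significant = {significant}")
```

Either way the magnitude of t exceeds the critical value, so the conclusion about the difference from the accepted value does not depend on the rounding.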
19A Statistical Aids To Hypothesis Testing
The probability of a difference this large
occurring because of only random errors
can be obtained from the Excel function
TDIST(x, deg_freedom, tails), where x is
the test value of t (4.375), deg_freedom is 3
for our case, and tails = 2. The result is
TDIST(4.375,3,2) = 0.022. Hence, there is
only a 2.2% probability of getting a value this
large because of random errors. The critical
value of t for a given confidence level
can be obtained in Excel from
TINV(probability, deg_freedom). In
our case, TINV(0.05,3) = 3.1825.
If it was confirmed by further experiments
that the method always gave low results, we
would say that the method had a negative
bias.
Even if a mean value is shown to be equal to
the true value at a given confidence level,
we cannot conclude that there is no
systematic error in the data.
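For readers without Excel, the two quantities quoted in the note on TDIST and TINV can be approximated with the Python standard library. The sketch below is an assumption-laden stand-in, not Excel's algorithm: it integrates the Student's t density numerically with Simpson's rule and finds the critical value by bisection. The function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Probability density of Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density from 0 to |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # area between -|t| and +|t| (density is symmetric)
    return 1 - central

def critical_t(p, df, lo=0.0, hi=100.0):
    """Two-tailed critical value: the t for which two_tailed_p equals p."""
    for _ in range(60):  # bisection; the p-value falls as t grows
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_value = two_tailed_p(4.375, 3)   # compare with TDIST(4.375,3,2)
t_crit = critical_t(0.05, 3)       # compare with TINV(0.05,3)
print(f"p = {p_value:.3f}, t_crit = {t_crit:.4f}")
```

With 3 degrees of freedom the results should agree with the Excel values quoted in the note to the digits shown there.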
72795_02_ch19_p001-008.qxp 3/23/11 1:01 PM Page 3
means of two sets of identical analyses is real and constitutes evidence that the
samples are different or whether the discrepancy is simply a consequence of
random errors in the two sets. To illustrate, let us assume that $N_1$ replicate
analyses of material 1 yielded a mean value of $\bar{x}_1$ and that $N_2$ analyses
of material 2 obtained by the same method gave a mean of $\bar{x}_2$. If the data
were collected in an identical way, it is usually safe to assume that the standard
deviations of the two sets of measurements are the same. We can then modify
Equation 19-2 to take into account that one set of results is being compared with
a second rather than with the true mean of the data, $x_t$.
In this case, as with the previous one, we invoke the null hypothesis that the
samples are identical and that the observed difference in the results,
$(\bar{x}_1 - \bar{x}_2)$, is the result of random errors. To test this hypothesis
statistically, we modify Equation 19-2 in the following way. First, we substitute
$\bar{x}_2$ for $x_t$, thus making the left side of the equation the numerical
difference between the two means, $\bar{x}_1 - \bar{x}_2$. We know from Equation 6-5
that the standard deviation of the mean $\bar{x}_1$ is

$$s_{m1} = \frac{s_1}{\sqrt{N_1}}$$

and likewise for $\bar{x}_2$,

$$s_{m2} = \frac{s_2}{\sqrt{N_2}}$$

Thus, the variance $s_d^2$ of the difference $(d = \bar{x}_1 - \bar{x}_2)$ between
the means is given by

$$s_d^2 = s_{m1}^2 + s_{m2}^2$$

By substituting the values of $s_d$, $s_{m1}$, and $s_{m2}$ into this equation, we have

$$\left(\frac{s_d}{\sqrt{N}}\right)^2 = \left(\frac{s_{m1}}{\sqrt{N_1}}\right)^2 + \left(\frac{s_{m2}}{\sqrt{N_2}}\right)^2$$

If we then assume that the pooled standard deviation $s_{pooled}$ is a good estimate of
both $s_{m1}$ and $s_{m2}$, then

$$\left(\frac{s_d}{\sqrt{N}}\right)^2 = \left(\frac{s_{pooled}}{\sqrt{N_1}}\right)^2 + \left(\frac{s_{pooled}}{\sqrt{N_2}}\right)^2 = s_{pooled}^2\left(\frac{N_1 + N_2}{N_1 N_2}\right)$$

and

$$\frac{s_d}{\sqrt{N}} = s_{pooled}\sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$

Substituting this equation into Equation 19-2 (and also $\bar{x}_2$ for $x_t$), we find that

$$\bar{x}_1 - \bar{x}_2 = \pm t\, s_{pooled}\sqrt{\frac{N_1 + N_2}{N_1 N_2}} \tag{19-3}$$

or the test value of t is given by

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}} \tag{19-4}$$

We then compare our test value of t with the critical value obtained from the
table for the particular confidence level desired. The number of degrees of
freedom for finding the critical value of t in Table 3-6 is $N_1 + N_2 - 2$. If the
absolute value of the test statistic is smaller than the critical value, the null
hypothesis is accepted and no significant difference between the means has been
demonstrated. A test value of t greater than the critical value of t indicates that
there is a significant difference between the means.
If a good estimate of $\sigma$ is available, Equation 19-3 can be modified by inserting
z for t and $\sigma$ for s.
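Equation 19-4 can be sketched in Python for the case where the raw replicate results are available. This section does not give the formula for the pooled standard deviation, so the sketch assumes the usual definition (the summed squared deviations of both sets divided by N1 + N2 - 2, then the square root). The function name and the two small data sets are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def pooled_t(set1, set2):
    """Return (t, s_pooled, degrees of freedom) for two sets of replicates."""
    n1, n2 = len(set1), len(set2)
    mean1 = sum(set1) / n1
    mean2 = sum(set2) / n2
    # Sum of squared deviations of each set about its own mean
    ss1 = sum((x - mean1) ** 2 for x in set1)
    ss2 = sum((x - mean2) ** 2 for x in set2)
    # Assumed definition of the pooled standard deviation
    s_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    # Equation 19-4
    t = (mean1 - mean2) / (s_pooled * math.sqrt((n1 + n2) / (n1 * n2)))
    return t, s_pooled, n1 + n2 - 2

material_1 = [10.1, 10.3, 10.2]  # hypothetical replicate results
material_2 = [10.6, 10.4, 10.5]  # hypothetical replicate results
t, s_pooled, dof = pooled_t(material_1, material_2)
print(f"t = {t:.3f}, s_pooled = {s_pooled:.3f}, degrees of freedom = {dof}")
```

The absolute value of the returned t is what gets compared with the tabulated critical value for the returned number of degrees of freedom.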
Example 19-2
Two barrels of wine were analyzed for their alcohol content to determine whether
they were from different sources. On the basis of six analyses, the average content
of the first barrel was established to be 12.61% ethanol. Four analyses of the
second barrel gave a mean of 12.53% alcohol. The ten analyses yielded a pooled
value of s = 0.070%. Do the data indicate a difference between the wines?
Here we employ Equation 19-4 to calculate the test statistic t.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}} = \frac{12.61 - 12.53}{0.07\sqrt{\dfrac{6 + 4}{6 \times 4}}} = 1.771$$

The critical value of t at the 95% confidence level for 10 - 2 = 8 degrees of
freedom is 2.31. Since 1.771 < 2.31, we accept the null hypothesis at the 95%
confidence level and conclude that there is no difference in the alcohol content of the
wines. The probability of getting a t value of 1.771 may be calculated using the
Excel function TDIST() and is TDIST(1.771,8,2) = 0.11. Hence there is a better
than 10% chance that a value this large could occur just because of random error.
In Example 19-2, no significant difference between the alcohol content of the
two wines was indicated at the 95% confidence level. Note that this statement is
equivalent to saying that $\bar{x}_1$ is equal to $\bar{x}_2$ with a certain probability, but the tests do
not prove that the wines come from the same source. Indeed, it is conceivable that
one wine is a red and the other is a white. To establish with a reasonable probability
that the two wines are from the same source would require extensive testing of other
characteristics, such as taste, color, odor, and refractive index as well as tartaric acid,
sugar, and trace element content. If no significant differences are revealed by all
these tests and by others, it might be possible to judge the two wines as having a
common origin. In contrast, the finding of one significant difference in any test
would clearly show that the two wines are different. Thus, the establishment of a
significant difference by a single test is much more revealing than the establishment
of an absence of difference.
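The calculation in Example 19-2 starts from summary statistics rather than raw data, which a short Python sketch can mirror. The means, pooled standard deviation, numbers of analyses, and critical value are those given in the example; the function name is illustrative.

```python
import math

def t_from_summary(mean1, mean2, s_pooled, n1, n2):
    """Test value of t from Equation 19-4, using summary statistics."""
    return (mean1 - mean2) / (s_pooled * math.sqrt((n1 + n2) / (n1 * n2)))

t = t_from_summary(12.61, 12.53, 0.070, 6, 4)

T_CRIT_95_DF8 = 2.31  # critical t, 8 degrees of freedom, 95% confidence level
reject_null = abs(t) > T_CRIT_95_DF8

print(f"t = {t:.3f}, reject null hypothesis: {reject_null}")
```

Because the test value is below the critical value, the sketch reports that the null hypothesis is not rejected, in line with the example.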
19B DETECTING GROSS ERRORS
A data point that differs excessively from the mean in a data set is termed an
outlier. When a set of data contains an outlier, the decision must be made whether to
retain or reject it. The choice of criterion for the rejection of a suspected result
has its perils. If we set a stringent standard that makes the rejection of a
questionable measurement difficult, we run the risk of retaining results that are spurious
and have an inordinate effect on the mean of the data. If we set lenient limits on
precision and thereby make the rejection of a result easy, we are likely to discard
measurements that rightfully belong in the set, thus introducing a bias to the data.
It is an unfortunate fact that no universal rule can be invoked to settle the
question of retention or rejection.¹
Outliers are the result of gross errors.
19B-1 Using the Q Test
The Q test is a simple and widely used statistical test.² In this test, the absolute
value of the difference between the questionable result $x_q$ and its nearest
neighbor $x_n$ is divided by the spread w of the entire set to give the quantity $Q_{exp}$:

$$Q_{exp} = \frac{|x_q - x_n|}{w} = \frac{|x_q - x_n|}{x_{high} - x_{low}} \tag{19-5}$$

This ratio is then compared with rejection values $Q_{crit}$ found in Table 19-1. If
$Q_{exp}$ is greater than $Q_{crit}$, the questionable result can be rejected with the
indicated degree of confidence (see Figure 19-2).
[Figure: six results, x_1 through x_6, marked along the x axis, with d = x_6 - x_5, w = x_6 - x_1, and Q_exp = d/w. If Q_exp > Q_crit, reject x_6.]
Figure 19-2 The Q test for outliers.
Table 19-1
Critical Values for the Rejection Quotient Q

                 Q_crit (Reject if Q_exp > Q_crit)
Number of
Observations     90% Confidence   95% Confidence   99% Confidence
 3               0.941            0.970            0.994
 4               0.765            0.829            0.926
 5               0.642            0.710            0.821
 6               0.560            0.625            0.740
 7               0.507            0.568            0.680
 8               0.468            0.526            0.634
 9               0.437            0.493            0.598
10               0.412            0.466            0.568

Source: Reproduced from D. B. Rorabacher, Anal. Chem., 1991, 63, 139. By courtesy of the
American Chemical Society.
Example 19-3
The analysis of a calcite sample yielded CaO percentages of 55.95, 56.00,
56.04, 56.08, and 56.23. The last value appears anomalous; should it be
retained or rejected?
The difference between 56.23 and 56.08 is 0.15%. The spread (56.23 - 55.95)
is 0.28%. Thus,

$$Q_{exp} = \frac{0.15}{0.28} = 0.54$$

For five measurements, $Q_{crit}$ at the 90% confidence level is 0.64. Because
0.54 < 0.64, we must retain the outlier at the 90% confidence level.

¹ J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds.,
Part I, Vol. 1 (New York: Wiley, 1978), pp. 282-289.
² R. B. Dean and W. J. Dixon, Anal. Chem., 1951, 23, 636.
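The Q test of Equation 19-5 and Table 19-1 can be sketched in Python as follows. The critical values are copied from Table 19-1 and the data are those of Example 19-3. The function name is hypothetical, and one detail is an assumption of this sketch rather than part of the text: the suspect value is taken to be whichever extreme lies farther from its nearest neighbor.

```python
# Table 19-1: number of observations -> Q_crit at (90%, 95%, 99%) confidence
Q_CRIT = {
    3: (0.941, 0.970, 0.994),
    4: (0.765, 0.829, 0.926),
    5: (0.642, 0.710, 0.821),
    6: (0.560, 0.625, 0.740),
    7: (0.507, 0.568, 0.680),
    8: (0.468, 0.526, 0.634),
    9: (0.437, 0.493, 0.598),
    10: (0.412, 0.466, 0.568),
}
LEVEL_INDEX = {90: 0, 95: 1, 99: 2}

def q_test(data, confidence=90):
    """Return (suspect value, Q_exp, Q_crit, reject?) for a small data set."""
    values = sorted(data)
    n = len(values)
    if n not in Q_CRIT:
        raise ValueError("Table 19-1 covers 3 to 10 observations only")
    spread = values[-1] - values[0]          # w = x_high - x_low
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    gap_low = values[1] - values[0]
    gap_high = values[-1] - values[-2]
    # Assumption: test the extreme that is farther from its nearest neighbor
    if gap_high >= gap_low:
        suspect, gap = values[-1], gap_high
    else:
        suspect, gap = values[0], gap_low
    q_exp = gap / spread                     # Equation 19-5
    q_crit = Q_CRIT[n][LEVEL_INDEX[confidence]]
    return suspect, q_exp, q_crit, q_exp > q_crit

cao = [55.95, 56.00, 56.04, 56.08, 56.23]    # % CaO, Example 19-3
suspect, q_exp, q_crit, reject = q_test(cao, confidence=90)
print(f"suspect = {suspect}, Q_exp = {q_exp:.2f}, Q_crit = {q_crit}, reject = {reject}")
```

For the calcite data the sketch flags 56.23 as the suspect value and indicates retention at the 90% confidence level, as in the example.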
19B-2 A Word of Caution about Rejecting Outliers
Several other statistical tests have been developed to provide criteria for rejection
or retention of outliers. Such tests, like the Q test, assume that the distribution of
the population data is normal, or Gaussian. Unfortunately, this condition cannot
be proved or disproved for samples that have many fewer than 50 results.
Consequently, statistical rules, which are perfectly reliable for normal
distributions of data, should be used with extreme caution when applied to samples
containing only a few data. J. Mandel, in discussing treatment of small sets of data,
writes, "Those who believe that they can discard observations with statistical
sanction by using statistical rules for the rejection of outliers are simply deluding
themselves."³ Thus, statistical tests for rejection should be used only as aids to
common sense when small samples are involved.
The blind application of statistical tests to retain or reject a suspect
measurement in a small set of data is not likely to be much more fruitful than an arbitrary
decision. The application of good judgment based on broad experience with an
analytical method is usually a sounder approach. In the end, the only valid reason
for rejecting a result from a small set of data is the sure knowledge that a mistake
was made in the measurement process. Without this knowledge, a cautious
approach to rejection of an outlier is wise.
19B-3 How Do We Deal with Outliers?
Recommendations for the treatment of a small set of results that contains a
suspect value are:
1. Re-examine carefully all data relating to the outlying result to see if a gross
error could have affected its value. This recommendation demands a properly
kept laboratory notebook containing careful notations of all observations
(see Section 18I).
2. If possible, estimate the precision that can be reasonably expected from the
procedure to be sure that the outlying result actually is questionable.
3. Repeat the analysis if sufficient sample and time are available. Agreement
between the newly acquired data and those of the original set that appear to be
valid will lend weight to the notion that the outlying result should be rejected.
Furthermore, if retention is still indicated, the questionable result will have a
smaller effect on the mean of the larger set of data.
4. If more data cannot be obtained, apply the Q test to the existing set to see if
the doubtful result should be retained or rejected on statistical grounds.
5. If the Q test indicates retention, consider reporting the median of the set
rather than the mean. The median has the great virtue of allowing inclusion
of all data in a set without undue influence from an outlying value. In
addition, the median of a normally distributed set containing three measurements
provides a better estimate of the correct value than the mean of the set after
the outlying value has been discarded.
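The effect described in recommendation 5 can be seen with the calcite data of Example 19-3, for which the Q test indicated retention. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

cao = [55.95, 56.00, 56.04, 56.08, 56.23]   # % CaO, Example 19-3

mean_all = statistics.mean(cao)              # pulled upward by 56.23
median_all = statistics.median(cao)          # middle value of the sorted set
mean_without = statistics.mean(cao[:-1])     # mean if 56.23 were discarded

print(f"mean (all data)      = {mean_all:.2f}")
print(f"median (all data)    = {median_all:.2f}")
print(f"mean (56.23 removed) = {mean_without:.2f}")
```

The median keeps all five results in the set yet lies closer to the mean of the four unquestioned values than the mean of all five does.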
Use extreme caution when rejecting data for
any reason.
³ J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds.,
Part I, Vol. 1 (New York: Wiley, 1978), p. 282.
19C QUESTIONS AND PROBLEMS
19-1. Lord Rayleigh prepared nitrogen samples by
several different methods. The density of each
sample was measured as the mass of gas
required to fill a particular flask at a certain
temperature and pressure. Masses of nitrogen
samples prepared by decomposition of various
nitrogen compounds were 2.29890, 2.29940,
2.29849, and 2.30054 g. Masses of nitrogen
prepared by removing oxygen from air in
various ways were 2.31001, 2.31163, and 2.31028
g. Is the density of nitrogen prepared from
nitrogen compounds significantly different from that
prepared from air? What are the chances of the
conclusion being in error? (Study of this
difference led to the discovery of the inert gases by
Sir William Ramsay and Lord Rayleigh.)
19-2. Apply the Q test to the following data sets to
determine whether the outlying result should be
retained or rejected at the 95% confidence level.
(a) 41.27, 41.61, 41.84, 41.70
(b) 7.295, 7.284, 7.388, 7.292
19-3. Apply the Q test to the following data sets to
determine whether the outlying result should be
retained or rejected at the 95% confidence level.
(a) 85.10, 84.62, 84.70
(b) 85.10, 84.62, 84.65, 84.70