72795_02_ch19_p001-008.qxp 3/23/11 1:01 PM Page 1
Web Chapter 19
Statistical Aids to Hypothesis Testing and Gross Errors
Experimentalists use statistical calculations to sharpen their judgments concerning the quality of experimental measurements. In this web chapter, we consider several of the most common applications of statistical tests to the treatment of analytical results. These applications include:
1. Estimating the probability that (a) an experimental mean and a true value or (b) two experimental means are different, that is, whether the difference is real or simply the result of random error. This test is particularly important for discovering systematic errors in a method and for determining whether two samples come from the same source.
2. Deciding whether what appears to be an outlier in a set of replicate measurements is, with a stated probability, the result of a gross error and can thus be rejected, or whether it is a legitimate result that must be retained in calculating the mean of the set.
19A STATISTICAL AIDS TO HYPOTHESIS TESTING
Much of scientific and engineering endeavor is based on hypothesis testing. Thus, to explain an observation, a hypothetical model is advanced and tested experimentally to determine its validity. If the results from these experiments do not support the model, we reject it and seek a new hypothesis. If agreement is found, the hypothetical model serves as the basis for further experiments. When the hypothesis is supported by sufficient experimental data, it becomes recognized as a useful theory until such time as data are obtained that refute it.
Experimental results seldom agree exactly with those predicted from a theoretical model. Consequently, scientists and engineers frequently must judge whether a numerical difference is real or merely a manifestation of the random errors inevitable in all measurements. Certain statistical tests are useful in sharpening these judgments.
Tests of this kind make use of a null hypothesis, which assumes that the numerical quantities being compared are, in fact, the same. The probability that the observed differences arose from random error alone is then computed from a probability distribution. Usually, if the observed difference is greater than or equal to the difference that would occur 5 times in 100 (the 5% probability level), the null hypothesis is considered questionable and the difference is judged to be significant. Other probability levels, such as 1 in 100 or 10 in 100, may also be adopted, depending on the certainty desired in the judgment. These probability levels are often called significance levels and are given the symbol $\alpha$ in statistics. The confidence level as a percentage is related to $\alpha$ by $(1 - \alpha) \times 100\%$; for example, $\alpha = 0.05$ corresponds to the 95% confidence level.
In statistics, a null hypothesis postulates that two observed quantities are the same.
The kinds of testing that chemists use most often include the comparison of (1) the mean $\bar{x}$ of an experimental data set with what is believed to be the true value $\mu$; (2) the means $\bar{x}_1$ and $\bar{x}_2$, or the standard deviations $s_1$ and $s_2$, from two sets of data; and (3) the mean with a predicted or theoretical value. The sections that follow consider some of the methods for making these comparisons.
19A-1 Comparing an Experimental Mean with the True Value
A common way of testing for bias in an analytical method is to use the method to analyze a sample whose composition is accurately known. Bias in an analytical method is illustrated by the two curves shown in Figure 19-1, which show the frequency distribution of replicate results in the analysis of identical samples by two analytical methods having random errors of exactly the same size. Method A has no bias, so the population mean $\mu_A$ is the true value $x_t$. Method B has a systematic error, or bias, that is given by
$$\text{bias} = \mu_B - x_t = \mu_B - \mu_A \tag{19-1}$$
Note that bias affects all the data in the set in the same way and that it can be either positive or negative.
[Figure 19-1 Illustration of bias: bias $= \mu_B - x_t = \mu_B - \mu_A$. Axes: relative frequency, $dN/N$, versus analytical result, $x_i$; curve A is centered at $x_t = \mu_A$ and curve B at $\mu_B$.]
In testing for bias by analyzing a sample whose analyte concentration is known exactly, it is likely that the experimental mean $\bar{x}$ will differ from the accepted value $x_t$, as shown in the figure; the judgment must then be made whether this difference is the consequence of random error or, alternatively, of a systematic error.
In treating this type of problem statistically, the difference $\bar{x} - x_t$ is compared with the difference that could be caused by random error. If the observed difference is less than that computed for a chosen probability level, the null hypothesis that $\bar{x}$ and $x_t$ are the same cannot be rejected; that is, no significant systematic error has been demonstrated. It is important to realize, however, that this statement does not say that there is no systematic error; it says only that whatever systematic error is present is so small that it cannot be distinguished from random error. If $|\bar{x} - x_t|$ exceeds the critical value, we may conclude that the difference is real and that the systematic error is significant.
The critical value for rejecting the null hypothesis is calculated by rewriting Equation 3-18 (Chapter 3) in the form
$$\bar{x} - x_t = \pm \frac{ts}{\sqrt{N}} \tag{19-2}$$
where $N$ is the number of replicate measurements used in the test, $s$ is their standard deviation, and $t$ is the value of Student's $t$ for $N - 1$ degrees of freedom at the chosen confidence level. If a good estimate of $\sigma$ is available, Equation 19-2 can be modified by replacing $t$ with $z$ and $s$ with $\sigma$. Example 19-1 illustrates the use of a hypothesis test to determine whether there is bias in a method.
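The test built on Equation 19-2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the helper name `bias_t_test` is hypothetical, and the critical value still has to be looked up (for example, in Table 3-6) and passed in by the caller. The sketch uses the sulfur data of Example 19-1 and computes $s$ without intermediate rounding, so its test statistic (about −4.43) differs slightly from the −4.375 obtained in the example with $s$ rounded to 0.0032; the conclusion at the 95% level is the same.

```python
import math
import statistics


def bias_t_test(results, true_value, t_crit):
    """Hypothetical helper: two-tailed test of an experimental mean against x_t.

    Returns the test statistic t = (mean - x_t) / (s / sqrt(N)) and whether the
    null hypothesis (no bias) is rejected, i.e. whether |t| > t_crit.
    t_crit is assumed to come from a t table for N - 1 degrees of freedom.
    """
    n = len(results)
    if n < 2:
        raise ValueError("need at least two replicate results")
    mean = statistics.mean(results)
    s = statistics.stdev(results)  # sample standard deviation (N - 1 in the denominator)
    t = (mean - true_value) / (s / math.sqrt(n))
    return t, abs(t) > t_crit


# Sulfur results of Example 19-1 (% S); t_crit = 3.18 for 3 degrees of freedom at 95%.
sulfur = [0.112, 0.118, 0.115, 0.119]
t, reject = bias_t_test(sulfur, true_value=0.123, t_crit=3.18)
print(f"t = {t:.3f}, reject null hypothesis: {reject}")
# prints: t = -4.427, reject null hypothesis: True
```

If SciPy happens to be available, `scipy.stats.ttest_1samp(sulfur, 0.123)` returns the same statistic together with a two-tailed p-value, which plays the role of the Excel TDIST result discussed with Example 19-1.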
Example 19-1
A new procedure for the rapid determination of sulfur in kerosenes was tested on a sample known from its method of preparation to contain 0.123% S ($x_t$). The results were %S = 0.112, 0.118, 0.115, and 0.119. Do the data indicate that there is bias in the method?
$$\sum x_i = 0.112 + 0.118 + 0.115 + 0.119 = 0.464$$
$$\bar{x} = \frac{0.464}{4} = 0.116\%\ \text{S}$$
$$\bar{x} - x_t = 0.116 - 0.123 = -0.007\%\ \text{S}$$
$$\sum x_i^2 = 0.012544 + 0.013924 + 0.013225 + 0.014161 = 0.053854$$
$$s = \sqrt{\frac{0.053854 - (0.464)^2/4}{4 - 1}} = \sqrt{\frac{0.000030}{3}} = 0.0032$$
From Table 3-6 (Chapter 3), we find that at the 95% confidence level, $t$ has a value of 3.18 for three degrees of freedom. Thus, we can calculate a test value of $t$ from our data and compare it with the values given in the tables at the desired confidence level. The values of $t$ from the tables are often called critical values and symbolized $t_{\text{crit}}$. The test value is calculated from
$$t = \frac{\bar{x} - x_t}{s/\sqrt{N}}$$
If $|t| > t_{\text{crit}}$, we reject the null hypothesis at the confidence level chosen. The absolute value of $t$ is used because we are interested only in testing whether there is a difference between our mean and the true value and do not care about the sign of the difference. This type of test is often called a two-tailed test. In our case
$$t = \frac{-0.007}{0.0032/\sqrt{4}} = -4.375$$
Since $|t| = 4.375 > 3.18$, the critical value of $t$ at the 95% confidence level, we conclude that a difference this large is significant and reject the null hypothesis. At the 99% confidence level, $t_{\text{crit}} = 5.84$ (Table 3-6). Since $4.375 < 5.84$, we would accept the null hypothesis at the 99% confidence level and conclude that no significant difference between the results has been demonstrated. Note that the probability (significance) level (0.05 or 0.01) is the probability of making an error by rejecting the null hypothesis when it is in fact true.
The probability of a difference this large occurring because of random errors alone can be obtained from the Excel function TDIST(x, deg_freedom, tails), where x is the test value of t (4.375), deg_freedom is 3 for our case, and tails = 2. The result is TDIST(4.375,3,2) = 0.022. Hence, there is only a 2.2% probability of obtaining a value this large because of random errors. The critical value of t for a given confidence level can be obtained in Excel from TINV(probability,deg_freedom). In our case TINV(0.05,3) = 3.1824.
If it were confirmed by further experiments that the method always gave low results, we would say that the method had a negative bias.
Even if a mean value cannot be distinguished from the true value at a given confidence level, we cannot conclude that there is no systematic error in the data.
19A-2 Comparing Two Experimental Means
The results of chemical analyses are frequently used to determine whether two materials are identical. Here, the chemist must judge whether a difference in the means of two sets of identical analyses is real and constitutes evidence that the samples are different or whether the discrepancy is simply a consequence of random errors in the two sets. To illustrate, let us assume that $N_1$ replicate analyses of material 1 yielded a mean value of $\bar{x}_1$ and that $N_2$ analyses of material 2 obtained by the same method gave a mean of $\bar{x}_2$. If the data were collected in an identical way, it is usually safe to assume that the standard deviations of the two sets of measurements are the same. We can then modify Equation 19-2 to take into account that one set of results is being compared with a second rather than with the true mean of the data, $x_t$.
In this case, as with the previous one, we invoke the null hypothesis that the samples are identical and that the observed difference in the results, $(\bar{x}_1 - \bar{x}_2)$, is the result of random errors. To test this hypothesis statistically, we modify Equation 19-2 in the following way. First, we substitute $\bar{x}_2$ for $x_t$, thus making the left side of the equation the numerical difference between the two means, $\bar{x}_1 - \bar{x}_2$. Next, we need the standard deviation of this difference. From Equation 6-5, the standard deviation of the mean $\bar{x}_1$ is
$$s_{m1} = \frac{s_1}{\sqrt{N_1}}$$
and likewise for $\bar{x}_2$,
$$s_{m2} = \frac{s_2}{\sqrt{N_2}}$$
Thus, the variance $s_d^2$ of the difference $(d = \bar{x}_1 - \bar{x}_2)$ between the means is given by
$$s_d^2 = s_{m1}^2 + s_{m2}^2$$
By substituting the values of $s_{m1}$ and $s_{m2}$ into this equation, we have
$$s_d^2 = \left(\frac{s_1}{\sqrt{N_1}}\right)^2 + \left(\frac{s_2}{\sqrt{N_2}}\right)^2$$
If we then assume that the pooled standard deviation $s_{\text{pooled}}$ is a good estimate of both $s_1$ and $s_2$, then
$$s_d^2 = \left(\frac{s_{\text{pooled}}}{\sqrt{N_1}}\right)^2 + \left(\frac{s_{\text{pooled}}}{\sqrt{N_2}}\right)^2 = s_{\text{pooled}}^2 \left(\frac{N_1 + N_2}{N_1 N_2}\right)$$
and
$$s_d = s_{\text{pooled}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
The quantity $s_d$ plays the role of $s/\sqrt{N}$ in Equation 19-2. Substituting it into Equation 19-2 (and also $\bar{x}_2$ for $x_t$), we find that
$$\bar{x}_1 - \bar{x}_2 = \pm t\, s_{\text{pooled}} \sqrt{\frac{N_1 + N_2}{N_1 N_2}} \tag{19-3}$$
or the test value of $t$ is given by
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}} \sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}} \tag{19-4}$$
We then compare our test value of $t$ with the critical value obtained from the table for the particular confidence level desired. The number of degrees of freedom for finding the critical value of $t$ in Table 3-6 is $N_1 + N_2 - 2$. If the absolute value of the test statistic is smaller than the critical value, the null hypothesis is accepted and no significant difference between the means has been demonstrated. A test value of $|t|$ greater than the critical value of $t$ indicates that there is a significant difference between the means.
If a good estimate of $\sigma$ is available, Equation 19-3 can be modified by inserting $z$ for $t$ and $\sigma$ for $s$.
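Equation 19-4 is straightforward to evaluate in code. The sketch below assumes that the pooled standard deviation is computed in the usual way, by summing the squared deviations of each set about its own mean and dividing by $N_1 + N_2 - 2$ (the same number of degrees of freedom used to look up $t_{\text{crit}}$); the helper names `pooled_sd` and `two_mean_t` are hypothetical. Applied to the summary statistics of Example 19-2, it reproduces the test value of about 1.771.

```python
import math


def pooled_sd(set1, set2):
    """Hypothetical helper: pooled standard deviation of two data sets.

    Assumes the conventional estimate with N1 + N2 - 2 degrees of freedom.
    """
    if len(set1) < 2 or len(set2) < 2:
        raise ValueError("each set needs at least two results")
    mean1 = sum(set1) / len(set1)
    mean2 = sum(set2) / len(set2)
    # Squared deviations of each set are taken about that set's own mean.
    ss = sum((x - mean1) ** 2 for x in set1) + sum((x - mean2) ** 2 for x in set2)
    return math.sqrt(ss / (len(set1) + len(set2) - 2))


def two_mean_t(mean1, mean2, s_pooled, n1, n2):
    """Test value of t from Equation 19-4."""
    return (mean1 - mean2) / (s_pooled * math.sqrt((n1 + n2) / (n1 * n2)))


# Summary statistics of Example 19-2; t_crit = 2.31 for 6 + 4 - 2 = 8 degrees of freedom at 95%.
t = two_mean_t(12.61, 12.53, s_pooled=0.070, n1=6, n2=4)
print(f"t = {t:.3f}, significant at 95%: {abs(t) > 2.31}")
# prints: t = 1.771, significant at 95%: False
```

Because `pooled_sd` takes raw results, it can be used when only the individual measurements are at hand, as in Problem 19-1. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with equal standard deviations performs the same pooled comparison from summary statistics.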
Example 19-2
Two barrels of wine were analyzed for their alcohol content to determine whether they were from different sources. On the basis of six analyses, the average content of the first barrel was established to be 12.61% ethanol. Four analyses of the second barrel gave a mean of 12.53% alcohol. The ten analyses yielded a pooled value of $s = 0.070\%$. Do the data indicate a difference between the wines?
Here we employ Equation 19-4 to calculate the test statistic $t$:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}} \sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}} = \frac{12.61 - 12.53}{0.070 \sqrt{\dfrac{6 + 4}{6 \times 4}}} = 1.771$$
The critical value of $t$ at the 95% confidence level for $10 - 2 = 8$ degrees of freedom is 2.31. Since $1.771 < 2.31$, we accept the null hypothesis at the 95% confidence level and conclude that no significant difference in the alcohol content of the wines has been demonstrated. The probability of getting a $t$ value of 1.771 may be calculated using the Excel function TDIST() and is TDIST(1.771,8,2) = 0.11. Hence there is a better than 10% chance that a value this large could occur just because of random error.
In Example 19-2, no significant difference between the alcohol content of the two wines was indicated at the 95% confidence level. Note that this statement means only that $\bar{x}_1$ and $\bar{x}_2$ could not be distinguished at that confidence level; the test does not prove that the wines come from the same source. Indeed, it is conceivable that one wine is a red and the other is a white. To establish with a reasonable probability that the two wines are from the same source would require extensive testing of other characteristics, such as taste, color, odor, and refractive index, as well as tartaric acid, sugar, and trace element content. If no significant differences are revealed by all these tests and by others, it might be possible to judge the two wines as having a common origin. In contrast, the finding of one significant difference in any test would clearly show that the two wines are different. Thus, the establishment of a significant difference by a single test is much more revealing than the establishment of an absence of difference.
19B DETECTING GROSS ERRORS
A data point that differs excessively from the mean in a data set is termed an outlier. When a set of data contains an outlier, the decision must be made whether to retain or reject it. The choice of criterion for the rejection of a suspected result has its perils. If we set a stringent standard that makes the rejection of a questionable measurement difficult, we run the risk of retaining results that are spurious and have an inordinate effect on the mean of the data. If we set lenient limits on precision and thereby make the rejection of a result easy, we are likely to discard measurements that rightfully belong in the set, thus introducing a bias to the data. It is an unfortunate fact that no universal rule can be invoked to settle the question of retention or rejection.¹
Outliers are the result of gross errors.
19B-1 Using the Q Test
The Q test is a simple and widely used statistical test.² In this test, the absolute value of the difference between the questionable result $x_q$ and its nearest neighbor $x_n$ is divided by the spread $w$ of the entire set to give the quantity $Q_{\text{exp}}$:
$$Q_{\text{exp}} = \frac{|x_q - x_n|}{w} = \frac{|x_q - x_n|}{x_{\text{high}} - x_{\text{low}}} \tag{19-5}$$
This ratio is then compared with rejection values $Q_{\text{crit}}$ found in Table 19-1. If $Q_{\text{exp}}$ is greater than $Q_{\text{crit}}$, the questionable result can be rejected with the indicated degree of confidence (see Figure 19-2).
[Figure 19-2 The Q test for outliers: results $x_1, \ldots, x_6$ ranked along the $x$ axis; $d = x_6 - x_5$, $w = x_6 - x_1$, $Q_{\text{exp}} = d/w$. If $Q_{\text{exp}} > Q_{\text{crit}}$, reject $x_6$.]
Table 19-1
Critical Values for the Rejection Quotient Q (Reject if $Q_{\text{exp}} > Q_{\text{crit}}$)
Number of Observations   Q_crit, 90% Confidence   Q_crit, 95% Confidence   Q_crit, 99% Confidence
3                        0.941                    0.970                    0.994
4                        0.765                    0.829                    0.926
5                        0.642                    0.710                    0.821
6                        0.560                    0.625                    0.740
7                        0.507                    0.568                    0.680
8                        0.468                    0.526                    0.634
9                        0.437                    0.493                    0.598
10                       0.412                    0.466                    0.568
Source: Reproduced from D. B. Rorabacher, Anal. Chem., 1991, 63, 139. By courtesy of the American Chemical Society.
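The procedure of Equation 19-5 and Table 19-1 can be sketched as follows. This is an illustration under stated assumptions: the function tests whichever end value lies farther from its nearest neighbor (the text leaves the choice of suspect value to the analyst), the names `q_test` and `Q_CRIT` are hypothetical, and the critical values are copied from Table 19-1, so only 3 to 10 observations and the 90%, 95%, and 99% levels are supported. Rejection uses a strict comparison, following the table's "reject if $Q_{\text{exp}} > Q_{\text{crit}}$".

```python
# Critical values of Q copied from Table 19-1, keyed by confidence level, then by N.
Q_CRIT = {
    0.90: {3: 0.941, 4: 0.765, 5: 0.642, 6: 0.560, 7: 0.507, 8: 0.468, 9: 0.437, 10: 0.412},
    0.95: {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466},
    0.99: {3: 0.994, 4: 0.926, 5: 0.821, 6: 0.740, 7: 0.680, 8: 0.634, 9: 0.598, 10: 0.568},
}


def q_test(data, confidence=0.90):
    """Hypothetical helper: apply Equation 19-5 to the more isolated end value.

    Returns (suspect value, Q_exp, Q_crit, reject?).
    """
    if confidence not in Q_CRIT:
        raise ValueError("confidence must be 0.90, 0.95, or 0.99")
    n = len(data)
    if n not in Q_CRIT[confidence]:
        raise ValueError("Table 19-1 covers 3 to 10 observations only")
    ranked = sorted(data)
    w = ranked[-1] - ranked[0]  # spread of the entire set
    if w == 0:
        raise ValueError("all results are identical; there is no outlier to test")
    gap_low = ranked[1] - ranked[0]     # lowest value vs. its nearest neighbor
    gap_high = ranked[-1] - ranked[-2]  # highest value vs. its nearest neighbor
    if gap_high >= gap_low:
        suspect, gap = ranked[-1], gap_high
    else:
        suspect, gap = ranked[0], gap_low
    q_exp = gap / w
    q_crit = Q_CRIT[confidence][n]
    return suspect, q_exp, q_crit, q_exp > q_crit


# CaO results of Example 19-3 (%), tested at the 90% confidence level.
suspect, q_exp, q_crit, reject = q_test([55.95, 56.00, 56.04, 56.08, 56.23])
print(f"suspect = {suspect}, Q_exp = {q_exp:.2f}, Q_crit = {q_crit}, reject: {reject}")
# prints: suspect = 56.23, Q_exp = 0.54, Q_crit = 0.642, reject: False
```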
Example 19-3
The analysis of a calcite sample yielded CaO percentages of 55.95, 56.00, 56.04, 56.08, and 56.23. The last value appears anomalous; should it be retained or rejected?
The difference between 56.23 and 56.08 is 0.15%. The spread ($56.23 - 55.95$) is 0.28%. Thus,
$$Q_{\text{exp}} = \frac{0.15}{0.28} = 0.54$$
For five measurements, $Q_{\text{crit}}$ at the 90% confidence level is 0.64. Because $0.54 < 0.64$, we must retain the outlier at the 90% confidence level.
¹ J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1 (New York: Wiley, 1978), pp. 282–289.
² R. B. Dean and W. J. Dixon, Anal. Chem., 1951, 23, 636.
19B-2 A Word of Caution about Rejecting Outliers
Several other statistical tests have been developed to provide criteria for rejection or retention of outliers. Such tests, like the Q test, assume that the distribution of the population data is normal, or Gaussian. Unfortunately, this condition cannot be proved or disproved for samples that have many fewer than 50 results. Consequently, statistical rules, which are perfectly reliable for normal distributions of data, should be used with extreme caution when applied to samples containing only a few data. J. Mandel, in discussing treatment of small sets of data, writes, "Those who believe that they can discard observations with statistical sanction by using statistical rules for the rejection of outliers are simply deluding themselves."³ Thus, statistical tests for rejection should be used only as aids to common sense when small samples are involved.
The blind application of statistical tests to retain or reject a suspect measurement in a small set of data is not likely to be much more fruitful than an arbitrary decision. The application of good judgment based on broad experience with an analytical method is usually a sounder approach. In the end, the only valid reason for rejecting a result from a small set of data is the sure knowledge that a mistake was made in the measurement process. Without this knowledge, a cautious approach to rejection of an outlier is wise.
19B-3 How Do We Deal with Outliers?
Recommendations for the treatment of a small set of results that contains a suspect value are:
1. Re-examine carefully all data relating to the outlying result to see if a gross error could have affected its value. This recommendation demands a properly kept laboratory notebook containing careful notations of all observations (see Section 18I).
2. If possible, estimate the precision that can be reasonably expected from the procedure to be sure that the outlying result actually is questionable.
3. Repeat the analysis if sufficient sample and time are available. Agreement between the newly acquired data and those of the original set that appear to be valid will lend weight to the notion that the outlying result should be rejected. Furthermore, if retention is still indicated, the questionable result will have a smaller effect on the mean of the larger set of data.
4. If more data cannot be obtained, apply the Q test to the existing set to see if the doubtful result should be retained or rejected on statistical grounds.
5. If the Q test indicates retention, consider reporting the median of the set rather than the mean. The median has the great virtue of allowing inclusion of all data in a set without undue influence from an outlying value. In addition, the median of a normally distributed set containing three measurements provides a better estimate of the correct value than the mean of the set after the outlying value has been discarded.
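Recommendation 5 can be illustrated with the CaO data of Example 19-3, where the Q test retained 56.23 at the 90% level. The short sketch below uses only Python's standard-library `statistics` module; it compares the mean and median of the full set with the mean after the suspect value is dropped, and then makes the suspect value hypothetically more extreme (57.23, an invented value for illustration only) to show that the mean moves while the median does not. It illustrates the median's insensitivity to a single outlying value; it is not a rule for choosing between the two.

```python
import statistics

# CaO results of Example 19-3 (%); 56.23 is the suspect value.
cao = [55.95, 56.00, 56.04, 56.08, 56.23]

mean_all = statistics.mean(cao)      # pulled upward by the suspect value
median_all = statistics.median(cao)  # middle value of the ranked set
mean_without = statistics.mean([x for x in cao if x != 56.23])  # suspect value discarded

# Hypothetical: move the suspect value farther out and recompute both estimates.
more_extreme = [55.95, 56.00, 56.04, 56.08, 57.23]

print(f"mean (all data)           = {mean_all:.3f}")
print(f"median (all data)         = {median_all:.3f}")
print(f"mean (56.23 discarded)    = {mean_without:.4f}")
print(f"mean (suspect = 57.23)    = {statistics.mean(more_extreme):.3f}")
print(f"median (suspect = 57.23)  = {statistics.median(more_extreme):.3f}")
```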
Use extreme caution when rejecting data for any reason.
³ J. Mandel, in Treatise on Analytical Chemistry, 2nd ed., I. M. Kolthoff and P. J. Elving, Eds., Part I, Vol. 1 (New York: Wiley, 1978), p. 282.
19C QUESTIONS AND PROBLEMS
19-1. Lord Rayleigh prepared nitrogen samples by several different methods. The density of each sample was measured as the mass of gas required to fill a particular flask at a certain temperature and pressure. Masses of nitrogen samples prepared by decomposition of various nitrogen compounds were 2.29890, 2.29940, 2.29849, and 2.30054 g. Masses of nitrogen prepared by removing oxygen from air in various ways were 2.31001, 2.31163, and 2.31028 g. Is the density of nitrogen prepared from nitrogen compounds significantly different from that prepared from air? What are the chances of the conclusion being in error? (Study of this difference led to the discovery of the inert gases by Sir William Ramsay and Lord Rayleigh.)
19-2. Apply the Q test to the following data sets to determine whether the outlying result should be retained or rejected at the 95% confidence level.
(a) 41.27, 41.61, 41.84, 41.70
(b) 7.295, 7.284, 7.388, 7.292
19-3. Apply the Q test to the following data sets to determine whether the outlying result should be retained or rejected at the 95% confidence level.
(a) 85.10, 84.62, 84.70
(b) 85.10, 84.62, 84.65, 84.70