Beruflich Dokumente
Kultur Dokumente
Environmental Protection
Agency
Office of
Research and
Development
Office of Solid
Waste and
Emergency
Response
EPA/600/S-97/006
December 1997
The H-statistic
The Jackknife procedure
The Bootstrap procedure
The Central Limit Theorem
The Chebychev Theorem
(10)
(6)
and
(7)
(8)
(11)
(9)
(13)
Jackknife Estimation
In the jackknife approach, n estimates of 2 are
computed by deleting one observation at a time.
Specifically, for each index, i, denote by 2^ (i) the
estimate of 2 (computed similarly as 2^ given
above) when the ith observation is omitted from
the original sample of size n, and denote the
arithmetic mean of these estimates by
(14)
(17)
Step 1. Let (xi1, xi2, ... , xin) represent the ith sample
of size n with replacement from the
original data set (x1, x2, ..., Xn). Then
compute the sample mean and denote it
by 0 I.
(18)
(20)
(19)
(21)
10
(23)
(24)
(26)
(28)
12
_
y = 4.887, sy = 0.966, CVy = 0.20.
Jackknife
Standard Bootstrap
Pivotal Bootstrap
CLT
Adjusted CLT
Chebychev
609.75
584.32
651.52
596.16
618.51
927.27
1085.17
994.40
1869.90
4150.96
Jackknife
Standard Bootstrap
Pivotal Bootstrap
CLT
Adjusted CLT
Chebychev
Notice that the 95% UCL computed from the Hstatistic (4150.96) exceeds the estimated 95th
percentile (2685.56) of an assumed lognormal
distribution. The H-UCL is also an order of
magnitude larger than the true mean, 400, of the
mixture of two normal populations.
265.79
258.21
292.17
260.96
271.57
378.80
Jackknife
Standard Bootstrap
Chebychev
H-UCL
289.30
281.22
448.41
427.62
13
Adjusted CLT
Chebychev
In the next example, the variance of the logtransformed variable is increased slightly with a
corresponding increase in CV and skewness.
Jackknife
Standard Bootstrap
Chebychev
H-UCL
1534.94
1363.26
2708.63
4570.27
919.81
1327.75
905.88
882.82
977.18
887.82
14
= 2595.18, CV = 4.12.
The generated data are:
16.5197, 235.4977, 1860.4443, 74.5825, 3.9684,
1498.6146.
874.22
854.51
1003.00
865.64
932.36
1331.95
1062.35
1088.94
1725.15
1792.54
Jackknife
Standard Bootstrap
Pivotal Bootstrap
CLT
Chebychev
749.31
721.07
862.51
731.14
1173.74
Jackknife
Standard Bootstrap
Chebychev
H-UCL
15
1176.39
1141.95
2059.47
4613.32
The sample size and the mean of the logtransformed variable in examples 4.2, 4.3, and 4.5
are held constant at 15 and 5, respectively,
whereas the standard deviation (sd) of the logtransformed variable are 1.0, 1.5, and 1.7,
respectively. From these examples alone, it can be
seen that as soon as the sd of the log-transformed
variable becomes greater than 1.0, the H-statisticbased UCL becomes orders of magnitude higher
than the largest concentrations observed, even
when the data were obtained from a lognormal
population. Thus, even though the H-UCL is
theoretically sound and possesses optimal
properties for truly lognormal populations such as
being MVUE, the practical merit of the use of HUCL in environmental applications is questionable
when the sd of the log-transformed variable starts
exceeding 1.0. This is especially true for small
sample sizes (e.g., n <30). As seen in the
examples discussed here, the use of the lognormal
distribution and the H-UCL in some
circumstances tends to hide contamination rather
than find it, which is contrary to one of the main
objectives in many environmental applications.
Actually, under the assumption of lognormal
distribution, one can get away with very little or
no cleanup, (Bowers, Neil, and Murphy 1994), at
a polluted site.
3250, 259.
16
Aluminum
3268.22
3125.56
5286.63
3186.47
3675.94
5482.64
Manganese
1405.83
1354.15
1968.03
1376.82
1503.84
2191.81
Aluminum
3283.34
3663.20
5314.99
9102.73
Manganese
1889.52
1821.55
3291.95
5176.16
17
_
x = 16478.33, sx = 58510.78, CVx = 3.55.
Aluminum
12327.40
12246.67
15161.90
12216.95
12895.10
16522.94
Toluene
37431.95
33494.25
152221.00
36547.89
47316.80
71013.85
Aluminum
13542.11
13579.18
19693.40
16503.51
Toluene
62263.37
278888.51
105757.50
18444955.15
Notice
19
References
20