Beruflich Dokumente
Kultur Dokumente
Indian Standard
CRITERIA FOR THE REJECTION OF
OUTLYING OBSERVATIONS
Chairman Representing
DR P. K. BOSE University of Calcutta, Calcutta
Members
SHRI B. ANANTHAKRISHNANAND National Productivity Council, New Delhi
SHRIR. S. GUPTA ( Alternate )
SHRI M. G. BHADE Tata Iron and Steel Co Ltd, Jamshedpur
DIRECTOR Institute of Agricultural Research Statistics (ICAR),
New Delhi
DR S. S. PILLAI ( Alternate)
SHRI D. DUTTA The Indian Tube Co Ltd, Jamshedpur
SARI Y. GHOUSEKHAN NGEF, Bangalore
SHRI C. RAJANA ( Alternate )
SHRI S. K. GUPTA Central Statistical Organization, New Delhi
SHRI A. LAHIRI Indian Jute Industries’ Research Association,
Calcutta
SHRI U. DUTTA ( Alternate )
SHRI P. LAKSHMANAN Indian Statistical Institute, Calcutta
SHRI S. MONDOL National Test House, Calcutta
SHRI S. K. BANERJEII.(Alternate )
DR S. P. MUKHERJEE Indian Association for Productivity, Quality and
Reliability, Calcutta
SHRI B. H~ATSINGKA ( Alternate )
SHRI Y. P. RAJPUT Army Statistical Organization ( Ministry of De-
fence), New Delhi
SHRI C. L. VERMA ( Alternate )
SHRI RAMESHSHANKER Directorate General of Inspection ( Ministry of
Defence ), New Delhi
SHRI N. S. SENGAR ( Alternate )
SHRI T. V. RATNAM The South India Textile Research Association,
Coimbatore
DRD. RAY Defence Research and Development Organization
( Ministry of Defence ), New Delhi
SHRI S. RANGANATHAN ( Alternate )
REPRESENTATIVE Indian Institute of Technology, Kharagpur
SHRI P. R. SENGUPTA Tea Board, Calcutta
SHRI N. RAMADURAI ( Alternate )
( Continued on page 2 )
(Continuedfrom fiage 1)
Members Rejresenting
SHRI S. SUBRAMU Steel Authority of India Ltd, New Delhi
SHRI S. N. VOHRA Directorate General of Supplies and Disposals, New
Delhi
DR B. N. SINGH, Director General, IS1 ( Ex-ofjcio Member )
Director ( Stat )
Secretary
SHRI Y. K. BHAT
Deputy Director (Stat), IS1
Convener
DR P. K. BOSE University of Calcutta, Calcutta; and Indian
Institute of Social Welfare and Business
Management, Calcutta
Members
DIRECTOR Institute of Agricultural Research Statistics ( ICAR ),
New Delhi
DR B. B. P. S. GOEL ( Alternate )
SHRI S. K. GUPTA Central Statistical Organization, New Delhi
SHRI S. B. PANDEY Imperial Chemical Industries ( India ) Private Ltd,
Calcutta
SHRI Y. P. RAJPUT Army Statistical Organization ( Ministry of Defence ),
New Delhi
SHRI C. L. VERMA ( Alternate)
DR D. RAY Defence Research and Development Organization
( Ministry of Defence ), New Delhi
REPRESENTATIVE Indian Institute of Technology, Kharagpur
SHRI B. K. SARKAR Indian Statistical Institute, Calcutta
SHRI D. R. SEN Delhi Cloth & General Mills Co Ltd, Delhi
SHRI K. N. VALI National Sample Survey Organization, New Delhi
2
1S t 8900 - 1978
Indian Standard
CRITERIA FOR THE REJECTION OF
~OUTLYING OBSERVATIONS
0. FOREWORD
0.1 This Indian Standard was adopted by the Indian Standards Insti-
tution on 25 July 1978, after the draft finalized by the Quality Control
and Industrial Statistics Sectional Committee had been approved by the
Executive Committee.
0.2 An outlying observation or an ‘outlier’ is one that appears to deviate
markedly from the other observations of the sample in which it occurs.
An outlier may arise merely because of an extreme manifestation of the
random variability inherent in the data or because of the non-random
errors, such as gross deviation from the prescribed experimental procedure,
mistakes in calculations, errors in recording numerical values, other
human errors, loss of the calibration of an instrument, change of measuring
instruments, etc. If it is known that a mistake has occurred, the outlying
observation must be rejected irrespective of its magnitude. If, however,
only a suspicion exists, it may be desirable to determine whether such an
observatien may be rejected or whether it may be accepted as part of the
normal variation expected.
0.3 The procedure consists in testing the statistical significance of the
outlier(s). A null hypothesis ( assumption ) is made that all the obser-
vations including the suspect observations come from thesame population
( or lot ) as the other observations in the sample. A statistical test is then
applied to determine whether this null hypothesis can be rejected at the
specified level of significance ( see 2.8 ). If so, the outliers can then be
taken to have come from a population(s) different from that of the other
observations in the sample and hence the outlier(s) can be rejected.
0.3.1 This standard provides certain statistical criteria which would be
useful for the identification of the outlier(s) on an objective basis. When
so identified, necessary investigations may also be initiated wherever pos-
sible to find *out. the assignable causes which gave_ rise to the .outlier(s).
zh~Foother objectlves of the statrstrcal tests for locatmg the outher may
e :
a) screen the data before statistical analysis;
b) sound an alarm that outliers are present, thereby indicating the
need for a closer study of the data generating process; and
c) pin-point the observations which may be of special interest just
because they are extremes.
3
IS : 8900 - 1978
0.4 When a test method recommends more than one determination for
reporting the average value of the characteristic under consideration, the
precision ( repeatability and reproducibility ) of the test method is found
to be useful in detecting observations which deviate unduly from the rest.
For further guidance in this regard, reference may be made to IS : 5420
( Part I )-1969*. When the -precision of the test method is not known
quantitatively, one or more of the procedures covered in the standard
may be found helpful. The tests outlined in this standard primarily apply
to the observations in a single random sample or the experimental data
as given by the replicate measures of some property of a given material.
0.5.1 In the case of a single outlier ( the smallest or the largest suspect
observation ), two tests are available. One test is based on the standard
deviation whereas the other test is based on the ratio of differences bet-
ween certain order statistics, that is, the observations when they are
arranged in the ascending or descending order of magnitude. The
latter test would be more useful in those cases where the calculation of
standard deviation is to be avoided or a quick judgement is called for.
0.5.2 In the case of two or more outliers at either end of the ordered
sample observations, the test given is based on the ratio of the sample
sum of squares when the doubtful observations are omitted to the sample
sum of squares when the doubtful observations are included, the sample
sum of squares being defined as the sum of the squares of the deviations
of the observations from the corresponding mean. If simplicity in
calculation is the prime requirement then *the test based on the order
statistics for the case of single outliers may be used by actually omitting
one suspect observation in the sample at a time. However, this test is to
be applied with caution because the overall level of significance of the test
may change due to its repeated applications.
0.53 In the case of two or more outliers such that one is at least at each
end of the ordered sample observations, two tests have been given. One
test is based on the ratio of the range to the standard deviation whereas
the other test is based on the largest residuals, a residual being defined as
the deviation of an observation from the corresponding mean. It may be
mentioned that the former test is applicable when only two observations,
namely, the largest and the smallest observations, are suspect whereas the
latter test is much more general.
4
IS : 8900 - 1978
1. SCOPE
1.1 This standard lays down the criteria for the detection of outlying
observations in the following three situations:
a) Single outlier ( at either end of the ordered sample observations ),
b) Two or more outliers ( at either end of the ordered sample
observations ), and
c) Two or more outliers ( one at least at each of the two ends of the
ordered sample observations ).
1.2 In situations other than those enumerated in 1.1, it may be more
appropriate to conduct test for non-normality of observations which are
to be covered in a separate standard.
2. TERMINOLOGY
2.0 For the purpose of this standard, the following definitions shall apply.
2.1 Sample -Collection of items from a lot.
2.2 Sample Size (n) - Number of items in a sample.
2.3 Mean ( Arithmetic ) ( ?? ) - Sum of the values of the observations
divided by the number of observations.
2.4 Standard Deviation ( s) - The square root of the quotient obtained
by dividing sum of the squares of deviations of the observations from their
mean by one less than the number of observations in the sample.
2.5 Variance - Square of the standard deviation.
5
IS : 8900 - 1978
2.6 Range -The difference between the largest and the smallest obser-
vations in a sample.
2.7 Null Hypothesis - Hypothesis ( or assumption ) that all the obser-
vations in the sample come from the same parent population, distribution
or lot.
2.8 Level of Significance - The probability ( or risk ) of rejecting the
null hypothesis when it is true. Conventionally it is taken to be 5 or 1
percent.
2.9 Statistic - A function calculated from the observationsin the sample.
2.10 Critical Value - The value of the appropriate statistic which would
be exceeded by chance with some small probability equivalent to the level
of significance chosen.
2.11 Degrees of Freedom- The number of independent component
values which are necessary to determine a statistic.
3.0 There are many situations when, in a given sample of size n, one of
the observations, which is either the largest or the smallest, is suspect. To
detect such outliers, two tests have been given. The first one needs the
calculation of the average and the standard deviation of the sample
observations whereas the second one is based on order statistics and hence
is easier to operate.
3.1 Let x1, x2.. . . . .x, be the n sample observations arranged in the ascending
order of magnitude so that xl < x2 < ..,...... <x,,. If the smallest obser-
R - X1
vation x1 is suspect, the test criterion 7-r = - may be used, where the
S
/ ; (Xi---R)2
f= i=l
d n- 1
If the calculated value of Z-, is greater than or equal to the corres-
ponding tabulated ( critical ) value of II in Table 1 for sample size x
and the chosen level of significance, x1 shall be treated as an outlier and
rejected; otherwise no doubt would be cast on the validity of x1.
6
IS : 8900 - 1978
7
IS : 8900 - 1978
n
s* = 2(l~i-Z)~ = (n-l )s2
Leaving out the k suspect observations, the mean (A-k ) and the
sum of squares ( S*n_k ) of the remaining ( n-k ) observations are calcula-
ted as:
S2 n-k
The ratio Lk = sB which is the test criterion, is then computed.
The calculated value of Lk is compared with the corresponding critical
value given in Table 3 for the sample size n and the chosen level of
significance. If the calculated value of Lk is found to be less than the
corresponding tabulated value, then the k largest observations will be
considered as outliers, not otherwise.
It may be noted that for the criterion Lk and the Table 3, the smaller
values denote the significance and not the larger values as in the case of the
earlier criteria and Tables 1 and 2.
8
IS : 8900 - 1978
9
IS : 8900 I 1978
If the smallest and the largest observations, namely, 87.5 and 105.7,
are suspected to be outliers, then from all the 15 observations range ( R )
and standard deviation ( s ) are obtained as:
R = 105.7 - 87.5 = 18.2
15
I Z (xl--E)s
1
S= = 4.32
14
18.2
Hence R/s = 4.y2 = 4.21
5.2 Test for More Than Two Outliers (At Least One Outlier at
Each End ) - Let x1, x, . . . . . . xn be the sample observations arranged
in the ascending order of magnitude. If two or more ( say k ) of these
observations are suspect such that there is at least one suspect observation
at each end of the ordered observations in the sample, the procedure as
given below may be followed:
The mean ( z ) of the sample observations is first calculated. Then
the residuals, that is, the absolute deviations of the observations from
their mean are calculated as:
I *1- 3 I , Ixz--zq, Ix,-21 . . . . . . )%I-21
10
IS : 8988 - 1978
extreme values, as also the sum of squares ( U2n _ k ) are calculated as:
?Z- k
z G
i- 1
zn-k =
12- k
fl-
U’,_k = I: k ( r?l - ;,,_k )”
i = 1
u2 n-k
The ratio & = u2- , which is the test criterion, is then compared
with the critical value given in Table 5 for the sample size 12, the value of
k and the desired level of significance. If the calculated value of & is
found to be smaller than the corresponding tabulated value, then the
k original suspect values are considered to be outliers, not otherwise.
It may be noted that smaller values of & denote significance as in
the case of Lk in Table 3.
5.2.1 Exarr@e 5- In the data on shearing strength given in Example 4,
if the smallest two observations 87’5 and 88.7 and the largest observation
105.7 are suspect to be outliers,,then the mean (2) is first calculated as
95.2.
The residuals, that is, the absolute deviations of the sample observa-
tions from their mean are then found out as 7.7, 6.5, 2.3, 1.9, 1*6,0*7, 0.5,
0.2, 0, 0.2, 0.9, 2.0, 3.1, 4.8 and 10.5.
If these residuals (zr ) are arranged in the ascending order of
magnitude, we get 0, 0.2, 0.2, 0.5, 0.7, 0.9, 1.6, 1.9, 2.0, 2.3, 3.1, 4.8, 6.5,
7.7 and 10.5.
Now the 3 residuals corresponding to the 3 original suspected outliers
in the new sample are the last three values, namely, 6.5, 7.7 and 10.5
which would be suspect and tested for their significance.
From all the 15 residuals we get the mean ( i1s ) as
Z?rs= 2.86
15
u2,s = z (ZI - 215 )” = 138.84
i = 1
Leaving out the 3 largest residuals ( Z values ), we get from the
remaining 12 values:
Lr2 I= 1.52
12
u212= I: (Zi -&,)2 = 22.14
i= 1
BP= u212 ~
22.14 = 0.159
Hence Es
U%S 138.84
11
IS : 8900 - 1978
3 1.153 1.155
4 1.463 1.492
5 1.672 I.749
G 1.822 1,944
7 1.938 2.097
8 2032 2’221
9 2.110 2.323
10 2.176 2.410
11 2.234 2,485
12 2.285 2.550
13 2.331 2.607
14 2.371 2.659
15 2.409 2.705
16 2.443 2,747
17 2.475 2.785
18 2.504 2.821
19 2.532 2.854
20 2.557 2.884
21 2.580 2.912
22 2.603 2.939
23 2.624 2.963
24 2.644 2.987
25 2.663 3.009
30 2.745 3.103
35 2.811 3.178
40 2866 3240
45 2.914 3.292
50 2956 3.336
12
IS : 8900 - 1978
: Clause 3.2 j
13
IS I 8900 - 1978
-
9
- -I-
-i- 7-
T
5% 1% ~5% 1% 5% 10’
ro
50’
10 10’
/0 10’
/0 5% 10 ’
‘0 5% 54’.d
-- -- - - -_ - -_ _----_ - --
4 PO01 PO00
: 1.018 PO04
: CPO55 PO21 0.010 cI.002
7 CPI06 bO47 0.032 cPO10
8 PO76 0.064 I.028 0022 0.008
9
10
E
C
I.112 0.099
t.142 0.129
:I.048 0.045
C1.070 0.070
tiola
0032 (PO34 0.012
I.363
t I.387
l.267
J.294
0.250
0.276
:I L-147
I.172
I.194
@I50
0.174
PI97
0.094
0.113
0.132
PO98
: l.122
(PI40
0.056
0.072
0.090
I9.060
I0.079
IPO97
3.033
DO42
>057
I.050
PO66
PO27
PO37
(P410 I.311 0.300 l.219 0.151 pi59 pi08 I0.115 I.072 YO82 >049 D*O55 I3,030
I.427 l.338 0.322 :.b.237 0.171 : P181 0.126 I3.136 PO91 )*I00 PO64 0.072 I3044
i 1.447 l.358 0.331 ( k260 0.192 (l.200 @140 IPl54 3.104 )_076 0086 IDO53 3.062
I l.462
l.484
I.366
P387
0.354
0377
(.I.272
cP300
k201
0.231
I.209
: P238
I 0154
t 0.175
I3.168
I3.188
>I18
1.136
PllG
1.130 1.088
1.104
0.099
w115
ID-064
IPO78
I.074
PO88
I(I.056
PO46
PO58 JO66
Pl50
I(1.550
l.599
I.642
P488
l.526
0.450
0506
P377
:b434
0.374
0.434
0.308
0.369
I.312
i J.376
’ 0246
’ 0.312
IP262
I3,327
3204
1.36a
I.222
I.283
l.168
I.229
0.184
0245
I0144
IP196
3’154
I.212
(PI12
(1166
0.126
0183
I.574 0.554 ( k484 @482 0.418 (I.424 0.364 I3376 3.321 l.334 P282 0.297 I3*250 J’264 Y220 0235
40 (X.72 1.608 0.588 j.522 0,523 0460 (J.468 0408 I0.421 Y364 I.378 l.324 0.342 I0.292 I.310 ; ;I.262 0.280
Il.696
I.722
l.636
I.668
0618
0.646
I l.558
cI.592
0556
0588
0.498
0,531
P502
: I.535
0444
0483
I3.456
I3.490
1.399 I.417 I.361 0.382
0.414
I‘3.328
12 368
I.350
3’383
(I.296
11.336
0.320
0.356
:; 1438 P450 I*400
1
-_
14
1s t 8900 - 1978
3 2-00 2.00
+ 2.43 2.45
5 2.75 2.80
li 3.01 +i+lU
7 3.22 3.34
8 3-40 3.54
9 3.55 3.72
10 3.68 3.88
11 3.80 4.01
12 3.91 413
13 4-00 4’24
14 4.09 4.34 ,
15 4.17 443
16 424 451
17 4,.31 4.59
18 4.38 4.66
19 4.43 473
20 4.49 *79
30 4.89 5.25
40 5.15 5.54
50 535 577
15
rt
,
TABLE 5 CRITICAL VALUES OF Ek FOR THE DETECTION OF k ( SOME SMALL AND OTHERS LARGE) OUTLIERS
( Clauses 5.2 and 5.2.1 )
-F
-
k 2 -I
-.
3 I
---.-f-----
4 I
.
5
- --
6 T 7 T
_-
8
-
9
--
1 10
5% I 1%, 5% 1% 5% 1% 5% 1% 5% 1% 5% 1% 5% 1% 5% 1%
- - - -
O-001 0.000 /
0.010 0.002
0034 O-012 1.004 @OOl
O-065 0.028
0.099 0050
I.016
(I.034
I 0.006
0.014 0.010 0.004
0.137 0.078 (I.057 0.026 0021 0009
0.172 0.101 (I.083 0.037 0037 O-013 0.014
16 0.340 0.263 J.227 0166 PI53 O-107 0.102 PO67 I.041 l-024
17
18
0362
O-382
0.290
0.306
f(j.248
l.267
0.188
0.206
0.170
3.187
O-122
0141
0.116
0.132
II.068
j.079 PO78
PO91
0.040
0052
0062
I.050
I.062
PO32
I.041
YO24
l.032
l.014
1018
PO26 PO14
19
20
0.398
0.416
0.323
0.339
I.287
: j.302
0.219
0.236
P203
3,221
0.156
0.170
0146
0.163
:(PO94
I.108
l.121
b105
b119
0074
3.086
PO74
I.085
I.050
1058
PO41
1.050
I.059
I.026
I.032
PO40
PO33
l-041
3020
3.026 PO28
25 0493 0.418 b381 0.320 3.298 0.245 0236 1’188 P186 3146 I.146 I.110 l.114 I.087 I.089 3066 PO68 0.050
30 0549 0482 : j.443 0.386 3.364 0308 0.298 : I.250 j.246 0,204 I.203 I.166 P166 l-132 I.137 I.108 l.112 O-087
35 0.596 0.533 P495 0.435 3.417 0364 o-351 I.299 i.298 3.252 1.254 1.211 I.214 P132 P181 Y149 l.154 0.124
40 0.629 0.574 : j.534 0.480 3.458 0.408 0.395 : I.347 I.343 3298 1297 I.258 I.259 PI77 1,223 Y190 I.195 0.164
45 0.658 0.607 P567 0.518 3492 0.446 0.433 j.386 I.381 3.336 p337 1.294 I.299 l.220 I.263 1.228 Y233 O-200
50 0.684 0636 : ). 599 0.550 a.529 0482 0.468 : I.424 v417 3.376 I.373 1.334 l.334 b.257 I.299 .264 >2G8 0.235
16