
MISSING DATA

Learning Objectives

At the end of this lesson, the participants should be able to:


• identify the major problems that arise when data are missing; and
• determine the appropriate remedial measure for handling missing data.

One of the most common problems in conducting an experiment is missing observations, that is, cases where valid measurements cannot be obtained on some of the experimental units. Missing observations that are not a result of the treatments arise when plants are destroyed, animals die, errors are made in recording or transcription, or recorded data are lost.

Missing data cause two major difficulties: loss of information, and the inapplicability of the standard analysis of variance, whose usual computations assume that every treatment is observed in every replication.

It is important to stress that the definition of missing values and the suggested remedies
specifically exclude the case where recordings are not made because of a treatment effect.

Examples of treatment-related missing observations, which should be considered as zero:

1. Due to virus/bacteria/disease infection when the experiment is to assess resistance of different varieties to virus/bacteria/disease.

2. Due to weed competition and/or herbicide phytotoxicity when the experiment is to evaluate different weed control treatments.

3. Destruction of experimental plants due to soil deficiency when the experiment is to evaluate varietal resistance to such deficiency.

4. Due to insect/rat damage when the experiment is to evaluate varietal resistance to insect/rat.

5. Due to toxicity of high rates of a fertilizer treatment.



Examples of non-treatment-related missing observations:

1. Lost samples.

2. Rat/insect damage when the experiment is not to evaluate varietal resistance to insect/rat.

3. Improper treatment.

4. Due to weed competition and/or herbicide phytotoxicity when the experiment is not to
evaluate different weed control treatments.

5. Destruction of experimental units, e.g., through poor germination, physical damage during crop culture, grazing by stray cattle, or vandalism by thieves.

6. Characters whose measurement depends on the existence of some “yield”, such as 100-grain weight and panicle length; % infection (in the case of detached leaves); number of days to insect adulthood (when the insect died); and root/shoot ratio (when the seedling did not germinate or died).

7. Outlying data. These are usually recognized after the data have been recorded and transcribed. A value may be considered outlying if it is too extreme to fall within the possible range of normal behavior of the experimental material. Common errors that produce outlying data are misread observations, incorrect transcription, improper application of sampling techniques, and misuse of measuring instruments.
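Screening for outlying values before analysis can be sketched as a simple range check. This is a minimal sketch under stated assumptions: the function name `flag_outliers` is hypothetical, the plausible range (`low`, `high`) must be supplied by the researcher from knowledge of the experimental material, and the 1,000 to 10,000 kg/ha range and yields in the example are made up for illustration. Flagged values are set to `None` so they can be handled as missing data.

```python
from typing import List, Optional, Tuple


def flag_outliers(
    values: List[Optional[float]], low: float, high: float
) -> Tuple[List[Optional[float]], List[int]]:
    """Set values outside the plausible range [low, high] to None and return their positions.

    None marks a missing value; existing None entries are left untouched.
    """
    cleaned: List[Optional[float]] = []
    flagged: List[int] = []
    for idx, value in enumerate(values):
        if value is not None and not (low <= value <= high):
            cleaned.append(None)  # to be treated as missing once its cause is confirmed
            flagged.append(idx)
        else:
            cleaned.append(value)
    return cleaned, flagged


# Hypothetical plot yields (kg/ha); 54,130 looks like a transcription slip.
cleaned, flagged = flag_outliers([5113, 54130, 5307, None], low=1000, high=10000)
print(flagged)  # [1]
print(cleaned)  # [5113, None, 5307, None]
```

A fixed range is only a first screen: a flagged value should be traced to its cause (misreading, transcription, sampling, or instrument error) before it is treated as missing.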

Remedial Measures

Missing data can be estimated if they occur in fewer than 40% of the replications of a particular treatment. Otherwise, the missing data are non-estimable and the treatments involved should be deleted. Before handling missing data, first determine whether each missing value is a legitimate case, i.e., that it is not treatment-related.

1. Analysis with missing data

a. For estimable missing data (< 40% of the number of replications of a given
treatment are missing):

i) Compute the appropriate estimate of each missing data value and run ANOVA
on the augmented data set (with proper adjustment of degrees of freedom and
error mean square).



ii) Compute the treatment means based on both observed data and the estimate(s)
of missing data. In making the test of significance, adjustment of standard
errors of mean differences for unequal replications must be made.

b. For non-estimable missing data (40% or more of the number of replications of a given treatment are missing):

i) Single-factor experiment (RCB). Delete the affected treatments and perform the analysis of variance in the usual way.

ii) Multi-factor experiment. Delete, from one of the factors, the level in which the missing observations occurred. The choice of which factor to reduce depends on the major interest of the researcher.

2. Replications with missing data

a. < 40% of the number of treatments: Handle each missing value as discussed above.

b. > 40% of the number of treatments: Delete the affected replication from the
analysis.
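The 40% rules above can be written as a small decision helper. This is a sketch under stated assumptions: the function names are hypothetical; a treatment with exactly 40% of its replications missing is treated as non-estimable (following the "Otherwise" in the rule); and because the text leaves the exact 40% case for replications open, a replication with exactly 40% of its treatments missing is assumed to be deleted.

```python
THRESHOLD = 0.40  # cut-off proportion from the remedial-measures rules


def treatment_action(n_missing: int, n_reps: int) -> str:
    """Suggested remedy for one treatment with n_missing of its n_reps values missing."""
    if n_missing == 0:
        return "no action"
    if n_missing / n_reps < THRESHOLD:
        return "estimate the missing values"
    # Non-estimable: drop the treatment (single-factor RCB) or a factor level (multi-factor)
    return "delete the treatment or factor level"


def replication_action(n_missing: int, n_treatments: int) -> str:
    """Suggested remedy for one replication with n_missing of its n_treatments values missing."""
    if n_missing == 0:
        return "no action"
    if n_missing / n_treatments < THRESHOLD:
        return "handle each missing value as a treatment case"
    # Assumption: exactly 40% is handled like "more than 40%"
    return "delete the replication"


# One missing value out of 4 replications (25%) is estimable; 2 of 4 (50%) is not.
print(treatment_action(1, 4))    # estimate the missing values
print(treatment_action(2, 4))    # delete the treatment or factor level
print(replication_action(3, 6))  # delete the replication
```

The helper only counts; deciding whether each missing value is a legitimate (non-treatment-related) case still has to be done first.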



Estimation of Missing Data

The formulas for estimating a single missing value in some common experimental designs are summarized below. In each formula, the totals are computed from observed values only.

RCB Design

$$X = \frac{aT + bR - G}{(a - 1)(b - 1)}$$

where
X = estimate of the missing value
a = number of treatments
b = number of replications
T = total of the treatment with the missing value
R = total of the replication with the missing value
G = grand total of all observed values
Latin Square

$$X = \frac{t(R + C + T) - 2G}{(t - 1)(t - 2)}$$

where
t = number of treatments
R = total of the row with the missing value
C = total of the column with the missing value
T = total of the treatment with the missing value
G = grand total of all observed values
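Split-Plot

$$X = \frac{rM + bT - P}{(b - 1)(r - 1)}$$

where
b = number of levels of the subplot factor
r = number of replications
M = total of the specific main plot with the missing value
T = total of the treatment combination with the missing value
P = total of all main plots having the same main-plot treatment as that of the missing value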
Split-Split Plot

$$X = \frac{rM + cT - P}{(c - 1)(r - 1)}$$

where
c = number of levels of the sub-subplot factor
r = number of replications
M = total of the specific subplot with the missing value
T = total of the treatment combination with the missing value
P = total of all subplots having the same treatments (main-plot and subplot levels) as that of the missing value
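Strip Plot

$$X = \frac{a(bT - P) + r(aH + bV - B) - bL + S}{(a - 1)(b - 1)(r - 1)}$$

where
a = number of levels of the horizontal factor
b = number of levels of the vertical factor
r = number of replications
T = total of the treatment combination with the missing value
P = total of the specific level of the horizontal factor with the missing value
H = total of the horizontal strip (within the replication) with the missing value
V = total of the vertical strip (within the replication) with the missing value
B = total of the replication with the missing value
L = total of the specific level of the vertical factor with the missing value
S = grand total of all observed values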
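The five formulas can be coded directly, as in the sketch below. The function names are hypothetical; the argument names follow the symbols defined above, every total is assumed to be computed from observed values only (the missing cell excluded), and each function handles exactly one missing value.

```python
def rcb_missing(a, b, T, R, G):
    """RCB: a treatments, b replications."""
    return (a * T + b * R - G) / ((a - 1) * (b - 1))


def latin_square_missing(t, R, C, T, G):
    """Latin square with t treatments (t rows and t columns)."""
    return (t * (R + C + T) - 2 * G) / ((t - 1) * (t - 2))


def split_plot_missing(b, r, M, T, P):
    """Split plot: b subplot levels, r replications."""
    return (r * M + b * T - P) / ((b - 1) * (r - 1))


def split_split_plot_missing(c, r, M, T, P):
    """Split-split plot: c sub-subplot levels, r replications."""
    return (r * M + c * T - P) / ((c - 1) * (r - 1))


def strip_plot_missing(a, b, r, T, P, H, V, B, L, S):
    """Strip plot: a horizontal levels, b vertical levels, r replications."""
    numerator = a * (b * T - P) + r * (a * H + b * V - B) - b * L + S
    return numerator / ((a - 1) * (b - 1) * (r - 1))


# RCB example from this lesson: 6 treatments, 4 replications.
print(round(rcb_missing(6, 4, T=14_560, R=26_453, G=114_199)))  # 5265
```

A convenient check on such code: each estimate is the value that leaves a zero residual in the design's error term, so if error-free additive data are built and one cell is deleted, the function should return the deleted value exactly.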



Example

The estimation of a missing value in an RCB design, where the observation for the fourth treatment (100 kg seed/ha) in replication II is missing, is illustrated in the following example:

Table 1. Data from an RCB design with one missing observation.

Treatment,    Grain yield, kg/ha                          Treatment
kg seed/ha    Rep I     Rep II         Rep III   Rep IV   total
25            5,113     5,398          5,307     4,678    20,496
50            5,346     5,952          4,719     4,264    20,281
75            5,272     5,713          5,483     4,749    21,217
100           5,164     ?              4,986     4,410    14,560 (= T)
125           4,804     4,848          4,432     4,748    18,832
150           5,254     4,542          4,919     4,098    18,813
Rep total     30,953    26,453 (= R)   29,846    26,947
Grand total                                               114,199 (= G)

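The missing value in a randomized complete block design is estimated by

$$X = \frac{aT + bR - G}{(a - 1)(b - 1)}$$

With a = 6 treatments, b = 4 replications, T = 14,560, R = 26,453, and G = 114,199:

$$X = \frac{6(14{,}560) + 4(26{,}453) - 114{,}199}{(6 - 1)(4 - 1)} = \frac{78{,}973}{15} \approx 5{,}265 \text{ kg/ha}$$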

The missing value is replaced by X = 5,265 kg/ha, and all sums of squares in the analysis of variance are then computed as usual. However, the total and error d.f. are each reduced by 1, the number of missing values.
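The arithmetic of this example can be checked with a short sketch that also derives T, R, and G from the table itself. The list layout of `yields` and the function name `estimate_rcb_missing` are illustrative assumptions, not part of the original procedure; `None` marks the missing plot.

```python
from typing import List, Optional, Tuple

# Table 1 grain yields (kg/ha): rows = 25, 50, 75, 100, 125, 150 kg seed/ha;
# columns = Rep I to Rep IV; None marks the missing plot.
yields: List[List[Optional[float]]] = [
    [5113, 5398, 5307, 4678],
    [5346, 5952, 4719, 4264],
    [5272, 5713, 5483, 4749],
    [5164, None, 4986, 4410],
    [4804, 4848, 4432, 4748],
    [5254, 4542, 4919, 4098],
]


def estimate_rcb_missing(data: List[List[Optional[float]]]) -> Tuple[Tuple[int, int], float]:
    """Estimate the single missing value of an RCB table (rows = treatments, columns = replications)."""
    a, b = len(data), len(data[0])  # a treatments, b replications
    missing = [(i, j) for i in range(a) for j in range(b) if data[i][j] is None]
    if len(missing) != 1:
        raise ValueError("this sketch handles exactly one missing value")
    i, j = missing[0]
    T = sum(v for v in data[i] if v is not None)              # treatment total, observed only
    R = sum(row[j] for row in data if row[j] is not None)     # replication total, observed only
    G = sum(v for row in data for v in row if v is not None)  # grand total, observed only
    return (i, j), (a * T + b * R - G) / ((a - 1) * (b - 1))


position, X = estimate_rcb_missing(yields)
print(position, round(X))  # (3, 1) 5265
```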



Table 1. Data with the missing observation replaced by the estimated value.

Treatment,    Grain yield, kg/ha                      Treatment
kg seed/ha    Rep I     Rep II    Rep III   Rep IV    total
25            5,113     5,398     5,307     4,678     20,496
50            5,346     5,952     4,719     4,264     20,281
75            5,272     5,713     5,483     4,749     21,217
100           5,164     5,265     4,986     4,410     19,825
125           4,804     4,848     4,432     4,748     18,832
150           5,254     4,542     4,919     4,098     18,813
Rep total     30,953    31,718    29,846    26,947
Grand total                                           119,464

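Replacing the missing value by its estimate biases the treatment sum of squares upward, so a correction for bias is subtracted from both the treatment and the total sums of squares:

$$\text{Bias} = \frac{[R - (t - 1)X]^2}{t(t - 1)}$$

where t = a = 6 is the number of treatments and R = 26,453 is the observed total of the replication with the missing value. Thus

$$\text{Bias} = \frac{[26{,}453 - (6 - 1)(5{,}265)]^2}{6(6 - 1)} = \frac{128^2}{30} \approx 546$$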

The resulting ANOVA is as follows:

Table 2. ANOVA (RCB design) of data with one missing value estimated.

Source of       DF    SS            MS           Computed
variation                                        F
Replication     3     2,188,739     729,580
Treatment       5     1,139,955     227,991      2.07ns
Error           14    1,540,726     110,052
Total           22    4,869,420

ns = not significant
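The whole analysis, including the bias correction and the loss of one error d.f., can be reproduced in a short script. This is a sketch assuming the rows-by-columns layout of Table 1 and a hypothetical helper name `rcb_anova_one_missing`; because it carries unrounded intermediate values, its sums of squares agree with Table 2 to within one or two units (for example, about 1,139,954 for the treatment SS).

```python
from typing import List, Optional

# Table 1 data: rows = 25, 50, ..., 150 kg seed/ha; columns = Rep I to Rep IV; None = missing.
yields: List[List[Optional[float]]] = [
    [5113, 5398, 5307, 4678],
    [5346, 5952, 4719, 4264],
    [5272, 5713, 5483, 4749],
    [5164, None, 4986, 4410],
    [4804, 4848, 4432, 4748],
    [5254, 4542, 4919, 4098],
]


def rcb_anova_one_missing(data: List[List[Optional[float]]], x: float) -> dict:
    """RCB ANOVA with exactly one missing value (None) replaced by its estimate x."""
    t, r = len(data), len(data[0])  # treatments, replications
    missing = [(i, j) for i in range(t) for j in range(r) if data[i][j] is None]
    if len(missing) != 1:
        raise ValueError("this sketch handles exactly one missing value")
    _, j = missing[0]
    r_obs = sum(row[j] for row in data if row[j] is not None)  # observed total of that replication
    filled = [[x if v is None else v for v in row] for row in data]

    cf = sum(map(sum, filled)) ** 2 / (t * r)  # correction factor
    ss_total = sum(v * v for row in filled for v in row) - cf
    ss_rep = sum(sum(row[k] for row in filled) ** 2 for k in range(r)) / t - cf
    ss_trt = sum(sum(row) ** 2 for row in filled) / r - cf
    ss_err = ss_total - ss_rep - ss_trt  # unaffected by the bias correction

    bias = (r_obs - (t - 1) * x) ** 2 / (t * (t - 1))
    ss_trt -= bias
    ss_total -= bias

    df_trt = t - 1
    df_err = (t - 1) * (r - 1) - 1  # one error d.f. lost for the missing value
    f_value = (ss_trt / df_trt) / (ss_err / df_err)
    return {"ss_rep": ss_rep, "ss_trt": ss_trt, "ss_err": ss_err, "ss_total": ss_total,
            "df_err": df_err, "df_total": t * r - 2, "bias": bias, "F": f_value}


result = rcb_anova_one_missing(yields, 5265)
print(round(result["bias"]), result["df_err"], round(result["F"], 2))  # 546 14 2.07
```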



