
Filling Holes in Your Data:

Multiple Imputation
in Education Research
Paul T. von Hippel
Harvard Graduate School of Education
Larsen G-06
Wednesday, April 22, 1-2:30 pm

I. Background
II. New Results

I. Background

Education Data

Are Full of Holes

Listwise Deletion

aka Case deletion, Complete-case analysis

Impute the Mean

Regression Imputation

Impute the conditional mean

Random Regression Imputation

Conditional mean + random variation
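
A minimal Python sketch of this idea, assuming a small hypothetical data set (the variable names sat and gpa are illustrative, not from the talk): each missing value is replaced by its regression prediction plus a normally distributed residual, so the imputations keep roughly the right spread.

```python
# A minimal illustrative sketch (not the presenter's code) of random
# regression imputation: conditional mean plus a random residual.
# Variable names (sat, gpa) are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def random_regression_impute(df, target, predictor):
    """Impute missing `target` values from a regression on `predictor`."""
    obs = df.dropna(subset=[target, predictor])
    slope, intercept = np.polyfit(obs[predictor], obs[target], deg=1)
    resid = obs[target] - (intercept + slope * obs[predictor])
    sigma = resid.std(ddof=2)                     # residual standard deviation

    out = df.copy()
    miss = out[target].isna() & out[predictor].notna()
    cond_mean = intercept + slope * out.loc[miss, predictor]
    # Conditional mean + random variation restores the lost residual spread.
    out.loc[miss, target] = cond_mean + rng.normal(0, sigma, size=miss.sum())
    return out

df = pd.DataFrame({"sat": [400, 500, 600, 700, 800],
                   "gpa": [2.0, np.nan, 3.0, np.nan, 3.8]})
print(random_regression_impute(df, target="gpa", predictor="sat"))
```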

Add Extra Regressors

Graduation rate. Sector (public vs private).

Multiple Imputation
Rubin 1987

Steps:
1. Replication
2. Imputation
3. Analysis
4. Recombination
(A schematic sketch of these four steps appears below.)
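
A schematic Python sketch of the four steps, offered only as an illustration of the structure: the toy one-variable "imputer" just redraws from the observed values, standing in for a real imputation model, and step 4 uses the standard combining rules described later in the deck.

```python
# Schematic sketch of the four MI steps; the "imputer" here is a crude
# stand-in (redraws from observed values), not a recommended model.
import numpy as np

rng = np.random.default_rng(1)
y = np.array([2.1, np.nan, 3.0, 2.7, np.nan, 3.5, 2.9])
M = 5                                              # number of imputations

estimates, variances = [], []
for m in range(M):                                 # 1. replication: M copies
    y_m = y.copy()
    miss = np.isnan(y_m)
    # 2. imputation: fill each hole with a random draw (stand-in model)
    y_m[miss] = rng.choice(y[~miss], size=miss.sum())
    # 3. analysis: run the usual analysis on each completed data set
    estimates.append(y_m.mean())
    variances.append(y_m.var(ddof=1) / len(y_m))   # squared standard error

# 4. recombination: pool the M estimates and standard errors
point = np.mean(estimates)
within = np.mean(variances)
between = np.var(estimates, ddof=1)
se = np.sqrt(within + (1 + 1 / M) * between)
print(point, se)
```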

1&2. Replication & Imputation


The imputed variable is not the original variable.


They just have similar statistical properties.

3&4. Analysis & Recombination

MI Point Estimate
MI Standard Error
(Rubin's combining rules, written out below.)
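
For reference, these are the standard combining rules (Rubin 1987), with $\hat{Q}_m$ the estimate and $U_m$ its squared standard error from the $m$-th completed data set:

$$
\bar{Q}=\frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m,\qquad
\bar{U}=\frac{1}{M}\sum_{m=1}^{M}U_m,\qquad
B=\frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{Q}_m-\bar{Q}\right)^2
$$

$$
\text{MI point estimate}=\bar{Q},\qquad
\text{MI standard error}=\sqrt{\bar{U}+\left(1+\tfrac{1}{M}\right)B}
$$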

How many imputations?

The larger M, the better. But how large is large enough?
Often enough (Rubin 1987; von Hippel 2005): M = 3 to 10

Surely enough (Bodner 2008):
M ≈ 100 × the fraction of missing information

Models used for imputation

SAS's MI procedure

Multivariate normal

Normal
Linear

Stata's ice command (Royston 2006)

Alternating regression, a.k.a. chained equations (see the sketch after this list)

Logistic, Poisson, normal

Other models (not widely implemented)

R's mix and pan packages (Schafer 1997)


Resampling
Non-normal models (He & Raghunathan 2008)
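
A rough Python sketch of the alternating-regression (chained equations) idea, assuming two continuous variables with hypothetical names. It is not Stata's ice itself, which also handles logistic and Poisson models, and in practice the whole procedure is repeated M times to produce M imputations.

```python
# Rough sketch of alternating regression (chained equations); an
# illustration only, not Stata's ice. Variable names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

def chained_impute(df, n_cycles=10):
    out = df.copy()
    for col in out:                              # start from crude mean fills
        out[col] = out[col].fillna(out[col].mean())
    for _ in range(n_cycles):
        for col in df:                           # re-impute one variable at a time
            miss = df[col].isna()
            others = [c for c in df.columns if c != col]
            X = np.column_stack([np.ones(len(out)), out[others]])
            beta, *_ = np.linalg.lstsq(X[~miss], out.loc[~miss, col], rcond=None)
            resid = out.loc[~miss, col] - X[~miss] @ beta
            sigma = resid.std(ddof=len(beta))
            # Conditional mean + random draw, as in random regression imputation.
            out.loc[miss, col] = X[miss] @ beta + rng.normal(0, sigma, miss.sum())
    return out

df = pd.DataFrame({"read": [52.0, np.nan, 61.0, 47.0, np.nan, 58.0],
                   "math": [50.0, 55.0, np.nan, 45.0, 60.0, np.nan]})
print(chained_impute(df))
```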

II. New results:


A. Non-normality
1. Discrete variables (Horton et al. 2003; Allison 2005)
2. Skew (von Hippel 2008)
B. Missing Y (von Hippel 2007)
C. Nonlinearity
1. Interactions (von Hippel 2009; Allison 2002)
2. Curves (von Hippel 2009)
Theme:
Your data can look wrong,
so long as your estimates are right.

IIA. Non-normality

IIA1. Rounding discrete variables


Common advice (sketched below):
Impute the dummy as though normal
Round the normal imputations to 0 and 1

Horton, Lipsitz & Parzen 2003; Allison 2005; Bollen & Barb 1981
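
A small Python sketch of the procedure being described, using a hypothetical 0/1 variable. The cited papers discuss how the final rounding step can bias estimates such as the proportion of 1s, which is why the unrounded imputations are also kept here for comparison.

```python
# Sketch of the "common advice": impute a 0/1 dummy as though normal, then
# round the imputations to 0 and 1. The rounding step is what Horton et al.
# (2003) and Allison (2005) warn about; an illustration only.
import numpy as np

rng = np.random.default_rng(3)
dummy = np.array([1.0, 0.0, 1.0, 1.0, np.nan, 0.0, np.nan, 1.0, np.nan, 0.0])

obs = dummy[~np.isnan(dummy)]
miss = np.isnan(dummy)

# Impute as though normal: observed mean plus normally distributed noise.
normal_imputes = rng.normal(obs.mean(), obs.std(ddof=1), size=miss.sum())

rounded = dummy.copy()
rounded[miss] = np.clip(np.round(normal_imputes), 0, 1)   # the rounding step

unrounded = dummy.copy()
unrounded[miss] = normal_imputes                 # "wrong-looking" imputations, left unedited

print("proportion with rounded imputations:  ", rounded.mean())
print("proportion with unrounded imputations:", unrounded.mean())
```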

IIA2. Rounding skewed variables


Skewed variable.

Impute as though normal.

Truncate implausible values.


(von Hippel 2008)

IIA2. Transforming skewed variables

(von Hippel 2008)

IIA. Non-normality: Summary


Best: impute using a model that fits
Often OK: impute as though normal
Bad: trying to make the data normal; editing the imputations

Principle
Imputed data ≠ original data
Imputed estimates = original estimates

IIB. Missing Y

IIB. Missing Y
Missing Ys are useless for regression
But cases with missing Ys have information about X
Little 1992

von Hippel 2007

IIB. Missing Y:
Multiple Imputation, then Deletion (MID)
Steps:

von Hippel 2007

1. Replication
2. Imputation
2′. Deletion [of cases with imputed Y]
3. Analysis
4. Recombination
(A minimal sketch of MID appears below.)
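
A minimal Python sketch of the MID sequence, assuming a toy two-variable data set and a crude stand-in imputer. The point is only the ordering of the steps: cases with an imputed Y help impute X, but are deleted before the analysis.

```python
# Minimal sketch of Multiple Imputation, then Deletion (MID): impute, drop
# the cases whose Y was imputed, analyze, recombine. The imputer is a crude
# stand-in (redraws from observed values); an illustration only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
data = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, 5.0, np.nan],
                     "y": [2.2, np.nan, 3.1, 4.0, np.nan, 5.9]})
M = 5
slopes, variances = [], []

for m in range(M):                                   # 1. replication
    d = data.copy()
    y_was_missing = d["y"].isna()
    for col in d:                                    # 2. imputation (stand-in)
        miss = d[col].isna()
        d.loc[miss, col] = rng.choice(d.loc[~miss, col], size=miss.sum())
    d = d.loc[~y_was_missing]                        # 2'. deletion of imputed-Y cases

    slope, intercept = np.polyfit(d["x"], d["y"], deg=1)   # 3. analysis
    resid = d["y"] - (intercept + slope * d["x"])
    se2 = resid.var(ddof=2) / ((d["x"] - d["x"].mean()) ** 2).sum()
    slopes.append(slope)
    variances.append(se2)

point = np.mean(slopes)                              # 4. recombination
se = np.sqrt(np.mean(variances) + (1 + 1 / M) * np.var(slopes, ddof=1))
print(point, se)
```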

IIC. Non-linearity

IIC1. Nonlinearity: Interactions


(Allison 2002, von Hippel 2009)

Complete data. Y regressed on X, D and DX

Impute, then Interact?

Impute (X,Y,D) as though linear (no interaction).


Then regress Y on X, D, and DX.

Stratify, then Impute

(Allison 2002, von Hippel 2009)

2 strata: public schools and private schools.


Impute (X,Y) as linear within each stratum.

Interact, then Impute!


Impute the interaction, like any other variable.
Then regress on the imputed interaction
(Allison 2002, von Hippel 2009).

IIC2. Nonlinearity: Curves


(von Hippel 2009)

Complete data. Y regressed on X and X²

Impute, then Square?

Impute (X,Y) as though linear (with other variables).


Then regress Y on X and X²?

Square, Then Impute (von Hippel 2009)


Impute the square like any other variable.
Then use the imputed square in regression

IIC. Nonlinearity: Summary

Transform, then impute. (von Hippel 2009)


1. Calculate transformation (square, interaction, etc).
2. Impute like any other variable.
3. Use the imputed transformation in the analysis (sketched below).
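
A minimal Python sketch of the transform-then-impute recipe, with hypothetical columns d, x, y: the square and the interaction are computed before imputation, imputed like ordinary variables, and then used as-is in the analysis, even though the imputed x2 and dx need not equal x² and d×x exactly. The imputer is again a crude stand-in.

```python
# Minimal sketch of "transform, then impute": square and interaction computed
# first, imputed as ordinary columns, then used as-is in the analysis. The
# imputer is a crude stand-in; column names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
df = pd.DataFrame({"d": [0.0, 1.0, 0.0, 1.0, 1.0, 0.0],
                   "x": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
                   "y": [2.0, 3.5, np.nan, 7.9, 9.1, np.nan]})

# 1. calculate the transformations on the incomplete data
df["x2"] = df["x"] ** 2
df["dx"] = df["d"] * df["x"]

# 2. impute every column, transformations included (stand-in imputer)
imputed = df.copy()
for col in imputed:
    miss = imputed[col].isna()
    imputed.loc[miss, col] = rng.choice(imputed.loc[~miss, col], size=miss.sum())

# 3. use the imputed x2 and dx in the analysis; do NOT recompute them
X = np.column_stack([np.ones(len(imputed)), imputed[["x", "x2", "d", "dx"]]])
beta, *_ = np.linalg.lstsq(X, imputed["y"], rcond=None)
print(beta)
```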

Principle
Imputed data ≠ real data
Imputed statistics = real statistics

Conclusions
Plausible estimates more important than
plausible data
Normal imputations are versatile, but messy
Future research and software
Resampling (approximate Bayesian bootstrap)

Alternatives to imputation
full-information maximum likelihood estimation

Data quality

References

Allison, P. (2001). Missing Data. Thousand Oaks, CA: Sage.
Allison, P. (2005). Imputation of Categorical Variables with PROC MI. SAS Users Group International Conference, Philadelphia, PA, April 10-13.
Barnard, J., & Rubin, D. B. (1999). Small-Sample Degrees of Freedom with Multiple Imputation. Biometrika 86(4), 948-955.
He, Y., & Raghunathan, T. E. (2006). Tukey's gh Distribution for Multiple Imputation. The American Statistician 60(3), 251-256.
Horton, N. J., Lipsitz, S. P., & Parzen, M. (2003). A Potential for Bias When Rounding in Multiple Imputation. The American Statistician 57(4), 229-232.
Horton, N. J., & Kleinman, K. P. (2007). Much Ado About Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models. The American Statistician 61(1), 79-90.
Kim, J. K. (2004). Finite Sample Properties of Multiple Imputation Estimators. The Annals of Statistics 32(2), 766-783.
Little, R. J. A. (1992). Regression with Missing X's: A Review. Journal of the American Statistical Association 87(420), 1227-1237.
Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd ed.). Hoboken, NJ: Wiley.
Meng, X. L. (1995). Multiple Imputation Inferences with Uncongenial Sources of Input. Statistical Science 10, 538-73.
Rubin, D. B. (1987). Multiple Imputation for Survey Nonresponse. New York: Wiley.
Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. New York: Chapman & Hall.
von Hippel, P. T. (2004). Biases in SPSS 12.0 Missing Value Analysis. The American Statistician 58(2), 160-164.
von Hippel, P. T. (2005). How Many Imputations Are Needed? A Comment on Hershberger and Fisher (2003). Structural Equation Modeling 12(2), 334-335.
von Hippel, P. T. (2007). Regression with Missing Ys. Sociological Methodology.
von Hippel, P. T. (2008). Imputing Skewed Variables. Under review.
von Hippel, P. T. (2009). How to Impute Squares and Interactions. Sociological Methodology, in press.

Assumption:
Ignorable missingness

missing at random (MAR), noninformative


The missing values are like the observed values in similar
cases.

Full information maximum likelihood (ML)

Suppose Y has a missing value.
Estimate the distribution of possible Y values.
In MI: impute 3-10 values from this distribution.

In ML: integrate across the full distribution of possible Y values.
[Figure: estimated density of the possible Y values, plotted against Weight (roughly 75 to 225).]
Like MI with an infinite number of imputations.
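
One standard way to write the contrast being drawn, for a case $i$ with $x_i$ observed and $y_i$ missing (general full-information ML notation, not specific to this weight example): MI approximates the integral below with a handful of random draws, while ML evaluates it exactly, which is why ML behaves like MI with infinitely many imputations.

$$
L_i(\theta)\;=\;\int f(x_i, y;\theta)\,dy\;=\;f(x_i;\theta)
$$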

ML in AMOS
[Screenshots: how to run the model and how to view the results.]
