
Statistical Models for Enhancing Cross-Population Comparability

A. Tandon, C.J.L. Murray, J.A. Salomon, and G. King
Global Programme on Evidence for Health Policy Discussion Paper No. 42
World Health Organization, Geneva, Switzerland
January 23, 2002

Introduction

Measuring the health state of individuals is important for the evaluation of health interventions, monitoring individual health progress, and as a critical step in measuring the health of populations. Self-report responses in household survey data are widely used for assessing the non-fatal health status of populations. These data typically take the form of ordered categorical (ordinal) responses. Over the past three decades, there has been great progress in developing instruments to measure the multiple domains of health that are reliable and demonstrate within-population validity [31],[22]. One key analytical issue is that these self-report ordinal responses are not comparable across populations, primarily because of response category cut-point shifts. If the observed responses are conceptualized as resulting from a mapping between an underlying unobserved latent variable (e.g., ability on the domain of mobility) and categorical response categories, then cut-points are the threshold levels on the latent variable that characterize the transition from one observed categorical response to the next. If cut-points differ systematically across populations, or even across sociodemographic groups within a population, then the observed ordinal responses are not cross-population comparable since they will not imply the same level on the underlying latent variable that we are trying to measure (Figure 1). Another way of characterizing this problem is that, for the same level of the latent variable on any given domain, the probability of an individual responding in any given response category is different across populations. This issue of cross-population comparability is not limited to health surveys: it is of equal relevance to self-report surveys on responsiveness of health systems, as well as to numerous other questions that rely on ordinal responses. One example of self-report health data comes from the WHO Multi-Country Household Survey Study on Health and Responsiveness [28].
The main self-report question on the domain of mobility is: Overall in the past 30 days, how much difficulty did you have with moving around? Respondents are asked to classify themselves using one of five response categories: 1=Extreme/Cannot do; 2=Severe difficulty; 3=Moderate difficulty; 4=Mild difficulty; 5=No difficulty.

[Figure 1: Mapping from unobserved latent variable to observed response categories. Panels A and B, with cut-points marked along the latent mobility scale. N = None, Mi = Mild, Mo = Moderate, S = Severe, E = Extreme.]

We can hypothesize that cut-points may vary between populations because of different cultural or other expectations for domains of health. Cut-points are also likely to vary within a cultural or sociodemographic group. The cut-points for older individuals may shift as their expectations for a domain diminish with age. Men may be more likely to deny declines in health so that their cut-points may be systematically shifted as compared to women. Contact with health services may influence expectations for a domain and thus shift cut-points [20]. Empirical examples suggesting cross-population cut-point shifts in health surveys abound [23]. For instance, in Australian national health surveys comparing the self-reported health status of Aboriginals with that of the general population, only around 12% of the Aboriginal population characterized their own health status as fair or poor, while more than 20% of the general population rated their health in these low categories. By any other major indicator of mortality and morbidity, the Aboriginal population fares much worse than the general population, which suggests that there may be important differences in the interpretation of categorical responses in the different sub-populations due to shifts in response category cut-points. Residents of the state of Kerala in India, which has the lowest rates of infant and child mortality and the highest rates of literacy in India, consistently report the highest incidence of morbidity in the country [19].

The object of this document is to elaborate on several statistical models used in the analysis of survey data. First, we focus on off-the-shelf models that are widely available as part of any standard statistical software. In particular, we demonstrate the problems of inference that arise from these standard methods when the underlying data are not cross-population comparable.
In later sections, we introduce methods that modify these standard routines to enhance the cross-population comparability of survey analyses.
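To make the cut-point problem described in this introduction concrete, the following is a minimal Python sketch. The cut-point values, the latent score, and the helper `categorize` are hypothetical and chosen only for illustration; they are not taken from the WHO survey or from the simulation used later in the paper.

```python
from bisect import bisect_right

LABELS = ["Extreme", "Severe", "Moderate", "Mild", "None"]

def categorize(latent, cut_points):
    """Map a latent value to a category 1..5 given four increasing cut-points.

    Category k is chosen when cut_points[k-2] <= latent < cut_points[k-1],
    with the outermost thresholds taken as minus and plus infinity.
    """
    return bisect_right(cut_points, latent) + 1

# Hypothetical cut-points on an arbitrary latent mobility scale.
cuts_a = [-30.0, -10.0, 10.0, 30.0]
cuts_b = [-20.0, 0.0, 20.0, 40.0]  # population B applies stricter thresholds

latent = 15.0  # the same true mobility in both populations
for name, cuts in (("A", cuts_a), ("B", cuts_b)):
    k = categorize(latent, cuts)
    print(name, k, LABELS[k - 1])
# prints "A 4 Mild" and then "B 3 Moderate"
```

The same latent level yields different reported categories, which is why raw category frequencies cannot be compared across the two populations.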

Models for Analyzing Ordinal Survey Responses

We begin by describing the application of existing statistical models for the analysis of ordinal survey data. These models serve as the building blocks for the methodological innovations introduced in subsequent sections. In particular, the focus is on two off-the-shelf methods: (a) the ordered probit model (widely used by econometricians and other social scientists), and (b) the partial credit model (from psychometrics). Both these models are used in the analysis of ordered categorical response data. The partial credit model is a multiple-category generalization of the Rasch model and is part of a large body of literature often referred to as item response theory (IRT), which has its roots in educational testing using standardized exams. One needs to be careful, though, in using these standard models in the analysis of data that may not be cross-population comparable. In other words, if there are good reasons to believe that respondents saying they are in good health in Ethiopia and in Denmark mean very different things in terms of an underlying latent variable measure, then the use of these methods without correction may lead to very misleading conclusions regarding the actual levels of health in these two populations. In order to better demonstrate this point, and to subsequently introduce some methodological innovations dealing with cross-population comparability, a simulated dataset is utilized. The simulated dataset consists of 1,000 respondents each from two hypothetical populations (countries A and B) for which the level of health on a domain, say mobility, is to be estimated based on self-report categorical responses to three questions (one core question, and two auxiliary questions). These questions are:^1

Main Question: Overall in the past 30 days, how much difficulty did you have moving around?
Auxiliary Question 1: Overall in the past 30 days, how much difficulty did you have standing for long periods such as 30 minutes?
Auxiliary Question 2: Overall in the past 30 days, how much difficulty did you have climbing several flights of stairs or walking up a steep hill?

Each of the questions asks the respondents to pick one of five responses:

1 = Extreme/Cannot Do
2 = Severe
3 = Moderate
4 = Mild
5 = None

Since this is simulated data, the true mobility levels are known for each respondent. This enables a comparison of the estimated mobility levels versus truth for the different models.

1 The questions mirror those in the WHO Multi-Country Study.

The simulated data are generated based on the assumption that true mobility is a function of age, sex, education, and country of residence for each respondent. An individual-level random effect term is also added to represent other individual-specific unobserved factors that might affect mobility. Table 1 reports the mean age, education level, and sex distribution in the simulated sample.

Table 1: Descriptive statistics (simulated data)

Country   Mean Age   Mean Education   Female   N
A         38.72      4.72             500      1,000
B         38.63      7.33             492      1,000

In addition, the simulation allows cut-points for each question to differ by sociodemographic group. The response category cut-points are generated as functions of age, sex, education, and country of residence. Figure 2 plots the distribution of the simulated observed categorical responses for the three questions for countries A and B.^2 At first glance, the distribution of self-report responses in the two countries does not look very different.
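The data-generating process described above can be sketched in Python as follows. This is only an illustration under stated assumptions: every coefficient, the baseline cut-points, the covariate ranges, and the function name `simulate_respondent` are hypothetical and are not the values used to produce the paper's simulated dataset.

```python
import random
from bisect import bisect_right

BASE_CUTS = [-30.0, -15.0, 0.0, 15.0]  # hypothetical baseline cut-points

def simulate_respondent(rng, country_b):
    """Draw one respondent; country_b is 1 for country B and 0 for country A."""
    age = rng.randint(18, 80)
    male = rng.randint(0, 1)
    educ = rng.randint(0, 16)
    # True mobility: covariate effects plus an individual-level random effect.
    mobility = (10.0 - 0.5 * (age - 18) + 2.0 * male + 1.0 * educ
                + 5.0 * country_b + rng.gauss(0.0, 10.0))
    # Cut-points are shifted by the same covariates (hypothetical coefficients).
    shift = -0.2 * (age - 18) + 1.0 * male + 0.8 * educ + 8.0 * country_b
    cuts = [c + shift for c in BASE_CUTS]
    # Question-specific stochastic error; sd 4 corresponds to a variance of 16.
    perceived = mobility + rng.gauss(0.0, 4.0)
    response = bisect_right(cuts, perceived) + 1  # category 1..5
    return {"age": age, "male": male, "educ": educ, "country_b": country_b,
            "mobility": mobility, "cuts": cuts, "response": response}

rng = random.Random(42)
sample = [simulate_respondent(rng, b) for b in (0, 1) for _ in range(1000)]
```

Because both true mobility and the cut-points depend on the same covariates, the observed response frequencies alone do not reveal which of the two is driving a difference between countries.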
[Figure 2: Distribution of responses for three self-report questions in countries A and B. Panels for each country: main question, auxiliary question 1, auxiliary question 2; x-axis: response category (1 to 5); y-axis: fraction of respondents.]

In the next two sub-sections, these data are analyzed using both the ordered probit model and the Rasch-based partial credit model. It is assumed that the data analyst has access to the self-report categorical responses as well as standard demographic variables such as age, sex, education, and country of residence for each of the respondents. The goal is to estimate mobility levels in the two simulated populations using these data. In later sections, we introduce models that allow response category cut-points also to be functions of covariates. In such models, the direction of shift for the response category cut-points is also of substantive interest (e.g., to test the hypotheses that more educated respondents have higher cut-points indicative of higher norms, or that older individuals respond based on norms for their age category, and so on). Of course, such models can also be used for testing hypotheses relating to causal inferences and other tests of statistical significance.

2 In generating the categorical responses, a stochastic error term with a variance ranging from 15 to 25 units was used (assumed different across questions, with auxiliary question 2 being the noisiest question).

2.1 The Ordered Probit Model

The ordered probit model assumes there is an unobserved latent variable $Y_i$ (mobility) distributed with mean $\mu_i$ and variance 1, where $i$ refers to the respondent.^3 The mean level of the latent variable is a function of individual-level sociodemographic characteristics such as age, sex, education, and country of residence,

\[
Y_i \sim N(\mu_i, 1), \qquad \mu_i = Z_i'\beta, \qquad i = 1, \ldots, N.
\]

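Let $y_i$ be the observed categorical response of individual $i$ to the main self-report question. The ordered probit model stipulates an observation mechanism such that:

\[
y_i = k \quad \text{if} \quad \tau^{k-1} \le Y_i < \tau^{k}, \qquad \text{with } \tau^{0} = -\infty,\ \tau^{5} = \infty,\ \forall i \text{ and } k = 1, \ldots, 5.
\]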

Also, it follows from the set-up of the model that $\tau^1 < \tau^2 < \tau^3 < \tau^4$. Given this structure, the probabilities of responding in any given category $k = 1, \ldots, 5$, conditional on a vector of covariates $Z_i$, can be derived as:

\[
\Pr(y_i = k) =
\begin{cases}
F(\tau^1 - Z_i'\beta), & k = 1 \\
F(\tau^2 - Z_i'\beta) - F(\tau^1 - Z_i'\beta), & k = 2 \\
F(\tau^3 - Z_i'\beta) - F(\tau^2 - Z_i'\beta), & k = 3 \\
F(\tau^4 - Z_i'\beta) - F(\tau^3 - Z_i'\beta), & k = 4 \\
1 - F(\tau^4 - Z_i'\beta), & k = 5,
\end{cases}
\tag{1}
\]

where $F(\cdot)$ is the standard normal cumulative distribution function. If the observations are assumed independent across individuals, then the likelihood function is simply the product of the probabilities of observing each value of $y_i$ in the dataset. Estimates of the vector $\beta$ as well as the cut-points $\tau^k$ may then be obtained using maximum likelihood methods. It is important to note that the standard ordered probit model assumes the same set of cut-points for the entire sample. Table 2 reports the results from a run of the ordered probit model for our simulated data for the main question in both countries.

3 Since the latent variable is unobserved, the variance of the latent variable conditional on determinants is arbitrarily set to 1 in the ordered probit model. In addition, in order to identify the model, the constant term is set to 0. These conventions produce a scale that is unique up to any positive affine transformation, i.e., the latent scale has so-called interval properties.

Table 2: Estimation results: ordered probit

Variable          Coefficient   (Std. Err.)
Age 30-44         -0.079        (0.065)
Age 45-59         -0.166        (0.077)
Age 60+           -0.498        (0.088)
Male              -0.062        (0.053)
1 < Educ ≤ 6       0.124        (0.091)
6 < Educ ≤ 11      0.245        (0.096)
Educ > 11          0.344        (0.113)
Country B         -0.232        (0.056)
$\tau^1$          -1.612        (0.102)
$\tau^2$          -1.335        (0.1)
$\tau^3$          -1.01         (0.098)
$\tau^4$          -0.365        (0.096)

[Figure 3: Predicted versus true cut-points: ordered probit for main question. x-axis: predicted cut-points; y-axis: true cut-points; series: first, second, third, and fourth cut-points.]

Figure 3 plots the cut-points estimated from the ordered probit model versus true cut-points for the main question. Because the true cut-points may vary across individuals but the model assumes that they are fixed, each predicted cut-point is associated with a range of different true values. Figure 4 is a plot of true mobility versus estimated average mobility using the standard ordered probit model. As reported in the graph, the R-squared value is only about 0.011. Not only does the ordered probit model predict the mean mobility poorly, it also predicts that the average mobility is lower in country B (see coefficient on country B in Table 2) even though the true level of mobility is higher in country B in the simulated data. The basic point of this simulation experiment is simple: if there are significant cut-point shifts in the underlying data-generating mechanism, then using standard procedures such as the ordered probit model to analyze the data can be very misleading. Since the ordered probit model is a probability model, we can also obtain the predicted probabilities of responding in each of the five categories for the main question, given any particular level on the underlying latent variable scale (Figure 5).

[Figure 4: Predicted versus true mobility: ordered probit for main question. x-axis: predicted mobility; y-axis: true mobility. R-squared = .011, RMSE = 21.264.]

[Figure 5: Predicted probabilities: ordered probit for main question. x-axis: latent mobility scale; curves: Pr(k = 1) through Pr(k = 5).]

We have used only the main question for analyzing the data using the ordered probit model. One way to analyze multiple questions using this model would be to pool the data and allow for a dummy variable per question (since the cut-points will be assumed to be the same for all questions). However, doing this will yield a different mean value of the latent variable per question for each individual. Running the model in this way is potentially confusing, since we assume that an individual has a single value on the latent variable of interest that informs answers to all three questions, but this procedure would allow estimates of this latent variable to differ by question.
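The category probabilities in equation (1) and the resulting log-likelihood can be sketched in Python as below. This is an illustrative sketch, not the routine used to produce Table 2: the function names are hypothetical, and the only numbers taken from the paper are the four estimated cut-points and the country B coefficient reported in Table 2.

```python
import math
from statistics import NormalDist

F = NormalDist().cdf  # standard normal cumulative distribution function

def ordered_probit_probs(mu, taus):
    """Return Pr(y = k) for k = 1..len(taus)+1 given the mean mu = Z'beta."""
    cdf = [0.0] + [F(t - mu) for t in taus] + [1.0]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

def log_likelihood(responses, mus, taus):
    """Sum of log Pr(y_i) over independent respondents."""
    return sum(math.log(ordered_probit_probs(mu, taus)[y - 1])
               for y, mu in zip(responses, mus))

# Estimated cut-points from Table 2.
taus = [-1.612, -1.335, -1.01, -0.365]

# Reference respondent (all dummies zero) in country A versus country B.
probs_a = ordered_probit_probs(0.0, taus)
probs_b = ordered_probit_probs(-0.232, taus)
print([round(p, 3) for p in probs_a])
print([round(p, 3) for p in probs_b])
```

Note that a single vector `taus` is shared by every respondent, which is exactly the restriction that breaks down when cut-points shift across groups.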

2.2 The Partial Credit Model

A second model that is often used in the analysis of ordinal data is the partial credit model from item response theory. This is basically a polytomous extension of the binary-response Rasch model [16],[17],[18].^4 Suppose there are $N$ respondents, each answering $J$ questions on a given domain. Individual $i = 1, \ldots, N$ chooses response category $k = 1, \ldots, 5$ for question $j = 1, \ldots, J$. The partial credit model conceptualizes the ordinal nature of the categorical data as a series of dichotomies or steps.^5 These dichotomies are modeled such that the probability that a respondent chooses response category $k$, given the choice between response category $k-1$ or $k$, is:

\[
\pi_{ij}^{k} = \frac{\Pr(y_{ij} = k)}{\Pr(y_{ij} = k-1) + \Pr(y_{ij} = k)} = \frac{\exp(\theta_i - \delta_j^{k-1})}{1 + \exp(\theta_i - \delta_j^{k-1})}, \qquad k = 2, \ldots, 5.
\]

Here, $\Pr(y_{ij} = k)$ is the probability that individual $i$ responds in category $k$ for question $j$, and $\pi_{ij}^{k}$ is the corresponding probability of responding in category $k$ conditional on responding either in category $k-1$ or $k$. $\theta_i$ is the ability of individual $i$, and $\delta_j^{k-1}$ is the difficulty associated with the step from category $k-1$ to category $k$ in question $j$. In other words, the probability of responding in category $k$, conditional on responding either in category $k-1$ or $k$, is modeled as a positive function of a person's ability and a negative function of the difficulty for the question category. Making use of the condition that the probabilities of responding in a category must sum to 1 across all five categories for each individual $i$ and question $j$, i.e., $\Pr(y_{ij} = 1) + \Pr(y_{ij} = 2) + \Pr(y_{ij} = 3) + \Pr(y_{ij} = 4) + \Pr(y_{ij} = 5) = 1$, a general expression for the probability of responding in the $k$-th category (where $k = 1, \ldots, 5$) can be derived:

\[
\Pr(y_{ij} = k) = \frac{\exp\left[(k-1)\theta_i - \sum_{m=0}^{k-1} \delta_j^{m}\right]}{\sum_{s=1}^{5} \exp\left[(s-1)\theta_i - \sum_{m=0}^{s-1} \delta_j^{m}\right]},
\]

where, for notational convenience, $\delta_j^{0} \equiv 0$. For the case of five categories, the probabilities of responding in each category can be written as:

\[
\Pr(y_{ij} = k) =
\begin{cases}
1/A, & k = 1 \\
\exp(\theta_i - \delta_j^1)/A, & k = 2 \\
\exp(2\theta_i - \delta_j^1 - \delta_j^2)/A, & k = 3 \\
\exp(3\theta_i - \delta_j^1 - \delta_j^2 - \delta_j^3)/A, & k = 4 \\
\exp(4\theta_i - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4)/A, & k = 5,
\end{cases}
\tag{2}
\]

where $A$ is the expression

\[
A \equiv 1 + \exp(\theta_i - \delta_j^1) + \exp(2\theta_i - \delta_j^1 - \delta_j^2) + \exp(3\theta_i - \delta_j^1 - \delta_j^2 - \delta_j^3) + \exp(4\theta_i - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4).
\]
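Equation (2) can be sketched in Python as follows. The function name `pcm_probs` and the numerical values of the ability and step difficulties are hypothetical and serve only to illustrate the formula; they are not estimates from the paper.

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit probabilities for categories 1..len(deltas)+1.

    theta  : ability of the respondent
    deltas : step difficulties for one question, ordered from step 1 upward
    """
    # Exponent for category k is (k-1)*theta minus the sum of the first k-1 difficulties.
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + theta - d)
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)  # plays the role of A (up to the common scaling factor)
    return [w / total for w in weights]

# Hypothetical ability and step difficulties for a single question.
p = pcm_probs(0.5, [-1.0, 0.0, 0.5, 1.2])
print([round(x, 3) for x in p])
```

Two properties of the model can be checked directly with this sketch: the adjacent-category ratio p[k] / (p[k-1] + p[k]) is a logistic function of ability minus the step difficulty, and two adjacent categories are equally likely when ability equals that step's difficulty.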

For a fixed number of questions, the unconditional estimation of the likelihood function yields difficulty parameters that are inconsistent [16],[3]. Consistent estimates of the difficulty parameters can be obtained by conditioning on the raw score (i.e., on the sum of responses across questions for each individual). So, for example, the conditional probability that a person responds in category 2 for all 3 questions is calculated as the joint probability divided by the probability of getting a raw score $r$ of 6 across the questions:

\[
\frac{\Pr(y_{i1} = 2)\,\Pr(y_{i2} = 2)\,\Pr(y_{i3} = 2)}{\Pr(r = 6)}.
\]

The likelihood written in this manner is free of the ability parameter $\theta$. Once the difficulty parameters have been estimated using the conditional approach, estimates of $\theta_r$ can be obtained using the unconditional likelihood derived from:

\[
\Pr(y_{ij} = k) =
\begin{cases}
1/A, & k = 1 \\
\exp(\theta_r - \delta_j^1)/A, & k = 2 \\
\exp(2\theta_r - \delta_j^1 - \delta_j^2)/A, & k = 3 \\
\exp(3\theta_r - \delta_j^1 - \delta_j^2 - \delta_j^3)/A, & k = 4 \\
\exp(4\theta_r - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4)/A, & k = 5,
\end{cases}
\]

with $A$ evaluated at $\theta_r$.

4 The Rasch model is a fixed-effect logit model and can also be reformulated as a quasi-symmetry loglinear model [27],[8].
5 In this sense, the partial credit model can be viewed as an adjacent category logit model.

The notation changes to $\theta_r$ because this method requires only one estimate of ability for every possible sum score of responses across all questions. In the partial credit model, the difficulty parameters are points on the latent variable scale where the probabilities of responding in one category or the next are equal. Alternatively, the difficulty parameters are points where the probability of responding in category $k$, conditional on responding in categories $k-1$ or $k$, is 0.5. The ability parameters can be thought of as estimates of the individual's underlying latent variable. The estimates of ability levels can be compared to true mobility for the simulated data to assess the performance of this model. This simple version of the partial credit model assumes that the difficulty parameters do not vary by sociodemographic characteristics, which in the language of psychometrics is akin to saying that it assumes there is no differential item functioning. Table 3 reports the difficulty parameters for the simulated data obtained by running the conditional likelihood procedure in STATA (for identification, $\delta_j^1$ is set to zero for the main question).^6 Figure 6 plots the estimated ability parameters versus the true mobility. As with the ordered probit model, Figure 7 reports the predicted probabilities from the model for given values of ability. The predicted probabilities are quite similar to those that are predicted by the ordered probit model (Figure 5). As the value of the latent variable increases, the probability of responding in the lowest category becomes small and the probability of responding in higher categories increases. The partial credit model does better than the ordered probit model in predicting the true level of mobility: the R-squared value is much higher (0.221 versus 0.011). However, the comparison between the two models is not entirely fair since we only use one question for the ordered probit model and all three questions in the partial credit model.
In the formulation introduced here, the partial credit model uses no extraneous information (i.e., covariates such as sex, age, and education) in the estimation of the abilities.

6 Estimates of the difficulty and ability parameters using STATA were of the same magnitude as those obtained using IRT software such as WINMIRA and RUMM.
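The claim that conditioning on the raw score removes the ability parameter can be checked numerically with the short Python sketch below. The step difficulties and function names are hypothetical; the sketch enumerates all response patterns with the same raw score, which is feasible only for a small number of questions and is not the conditional likelihood routine used for Table 3.

```python
import math
from itertools import product

def pcm_probs(theta, deltas):
    """Partial credit probabilities for categories 1..len(deltas)+1."""
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + theta - d)
    total = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / total for e in exponents]

def joint_prob(pattern, theta, deltas_by_question):
    """Probability of a full response pattern, assuming independence across questions."""
    return math.prod(pcm_probs(theta, d)[k - 1]
                     for k, d in zip(pattern, deltas_by_question))

def conditional_prob(pattern, theta, deltas_by_question):
    """Probability of the pattern given its raw score (sum of responses)."""
    r = sum(pattern)
    same_score = (p for p in product(range(1, 6), repeat=len(pattern))
                  if sum(p) == r)
    score_prob = sum(joint_prob(p, theta, deltas_by_question) for p in same_score)
    return joint_prob(pattern, theta, deltas_by_question) / score_prob

# Hypothetical step difficulties for three questions.
DELTAS = [[-1.0, -0.5, 0.3, 1.0], [-0.8, 0.0, 0.6, 1.4], [-0.2, 0.4, 1.1, 2.0]]

low = conditional_prob((2, 2, 2), -1.0, DELTAS)
high = conditional_prob((2, 2, 2), 2.0, DELTAS)
print(low, high)  # the two values agree up to floating-point error
```

The ability term enters every pattern with the same raw score through the same factor, so it cancels in the ratio, leaving an expression in the difficulties alone.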

[Figure 6: Predicted versus true mobility: two-stage partial credit. x-axis: predicted mobility; y-axis: true mobility. R-squared = .221, RMSE = 17.437.]

[Figure 7: Predicted probabilities: two-stage partial credit for main question. x-axis: latent mobility scale; curves: Pr(k = 1) through Pr(k = 5).]

Table 3: Estimation results: two-stage partial credit

Variable            Coefficient   (Std. Err.)
$\delta^1$
  Dummy Aux 1        0.207        (0.183)
  Dummy Aux 2        1.615        (0.178)
$\delta^2$
  Dummy Aux 1        0.225        (0.186)
  Dummy Aux 2        0.723        (0.183)
  Main question     -0.795        (0.267)
$\delta^3$
  Dummy Aux 1        1.277        (0.154)
  Dummy Aux 2        1.797        (0.151)
  Main question     -0.933        (0.187)
$\delta^4$
  Dummy Aux 1       -1.267        (0.11)
  Dummy Aux 2        1.291        (0.131)
  Main question     -0.544        (0.175)

In the next subsection, we present an alternative specification of the model that includes covariates.

2.3 The Partial Credit Model with Covariates

The partial credit model can be reformulated so that instead of having a dummy variable per individual ($\theta_i$), variables such as age, sex, education, and country of residence can be introduced. Such a modification to the partial credit model is especially useful in the analysis of health survey data given that sociodemographic variables are usually collected in such surveys. Equation (2) with covariates can be written as the probability that individual $i$ responds in category $k$ for each of the questions $j$, conditional on a vector of covariates $Z_i$:

\[
\Pr(y_{ij} = k) =
\begin{cases}
1/A, & k = 1 \\
\exp(Z_i'\beta - \delta_j^1)/A, & k = 2 \\
\exp(2Z_i'\beta - \delta_j^1 - \delta_j^2)/A, & k = 3 \\
\exp(3Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3)/A, & k = 4 \\
\exp(4Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4)/A, & k = 5,
\end{cases}
\tag{3}
\]

where $A$ is the expression

\[
A \equiv 1 + \exp(Z_i'\beta - \delta_j^1) + \exp(2Z_i'\beta - \delta_j^1 - \delta_j^2) + \exp(3Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3) + \exp(4Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4).
\]

Assuming independence across observations and questions, estimates can be computed using maximum likelihood. The mean predicted level of mobility versus truth is plotted in Figure 8 and the estimates are in Table 4.
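Under the independence assumption just stated, the log-likelihood implied by equation (3) is a double sum over respondents and questions. The Python sketch below illustrates this; the data layout (`rows` as pairs of a covariate vector and a tuple of responses), the function names, and the example values are assumptions made for illustration, and the maximization step itself is omitted.

```python
import math

def pcm_probs(index, deltas):
    """Partial credit probabilities with the linear index Z'beta in place of ability."""
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + index - d)
    total = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / total for e in exponents]

def log_likelihood(rows, beta, deltas_by_question):
    """rows: iterable of (z, responses); one response category (1..5) per question."""
    total = 0.0
    for z, responses in rows:
        index = sum(b * x for b, x in zip(beta, z))  # Z_i' beta
        for k, deltas in zip(responses, deltas_by_question):
            total += math.log(pcm_probs(index, deltas)[k - 1])
    return total

# Two hypothetical respondents, two covariates, three questions.
rows = [([1.0, 0.0], (3, 4, 5)), ([0.0, 1.0], (1, 2, 2))]
flat = [[0.0] * 4 for _ in range(3)]
print(log_likelihood(rows, [0.0, 0.0], flat))  # equals 6 * log(1/5)
```

With all coefficients and difficulties at zero every category has probability 1/5, which gives a convenient check on the implementation before handing the function to an optimizer.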

Table 4: Estimation results: partial credit with covariates

Variable            Coefficient   (Std. Err.)
Mean
  Age 30-44         -0.134        (0.024)
  Age 45-59         -0.203        (0.028)
  Age 60+           -0.336        (0.032)
  Male              -0.077        (0.019)
  1 < Educ ≤ 6       0.049        (0.033)
  6 < Educ ≤ 11      0.109        (0.034)
  Educ > 11          0.16         (0.041)
  Country B         -0.075        (0.02)
$\delta^1$
  Dummy Aux 1        0.274        (0.185)
  Dummy Aux 2        1.261        (0.163)
  Main question      0.272        (0.144)
$\delta^2$
  Dummy Aux 1        0.092        (0.185)
  Dummy Aux 2       -0.076        (0.166)
  Main question     -0.747        (0.14)
$\delta^3$
  Dummy Aux 1        1.261        (0.151)
  Dummy Aux 2        1.247        (0.126)
  Main question     -1.124        (0.1)
$\delta^4$
  Dummy Aux 1       -1.319        (0.109)
  Dummy Aux 2        0.746        (0.099)
  Main question     -1.202        (0.066)

The mean level of the estimated latent variable that is plotted in Figure 8 does not account for the fact that the deterministic variation in the latent variable will be imperfectly captured by the limited set of included covariates. In the absence of a random effect, the model will overestimate the amount of stochastic variability in the data. The next subsection introduces a method for accounting for this by using Bayes' theorem to estimate the predicted mobility.

2.4 Random Effects and Latent Variable Estimation using Bayes' Theorem

[Figure 8: Predicted versus true mobility: partial credit with covariates. x-axis: predicted mobility; y-axis: true mobility. R-squared = .055, RMSE = 20.787.]

If there is an individual-level random effect in the data, i.e., when covariates in our model do not capture all the systematic variation in the latent variable, then there remains information content in the set of responses across questions for each individual that has not been fully exploited. The partial credit model with covariates and a random effect $u_i$ with mean zero and variance $\sigma^2$ can be written out as follows:

\[
\Pr(y_{ij} = k) =
\begin{cases}
1/A, & k = 1 \\
\exp[(Z_i'\beta + u_i) - \delta_j^1]/A, & k = 2 \\
\exp[2(Z_i'\beta + u_i) - \delta_j^1 - \delta_j^2]/A, & k = 3 \\
\exp[3(Z_i'\beta + u_i) - \delta_j^1 - \delta_j^2 - \delta_j^3]/A, & k = 4 \\
\exp[4(Z_i'\beta + u_i) - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4]/A, & k = 5,
\end{cases}
\tag{4}
\]

where $A$ is the expression

\[
A \equiv 1 + \exp[(Z_i'\beta + u_i) - \delta_j^1] + \exp[2(Z_i'\beta + u_i) - \delta_j^1 - \delta_j^2] + \exp[3(Z_i'\beta + u_i) - \delta_j^1 - \delta_j^2 - \delta_j^3] + \exp[4(Z_i'\beta + u_i) - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4].
\]

In order to exploit the information content in the set of responses we can make use of Bayes' theorem to obtain estimates of the mean level of mobility conditional on the observed set of responses. That is, writing $\theta_i = Z_i'\beta + u_i$ for the latent mobility of individual $i$, we can estimate $\Pr(\theta_i \mid y_i)$ using Bayes' formula:

\[
\Pr(\theta_i \mid y_i) = \frac{\Pr(y_i \mid \theta_i)\,\Pr(\theta_i)}{\int \Pr(y_i \mid \theta_i)\,\Pr(\theta_i)\, d\theta_i},
\tag{5}
\]

where $y_i$ represents the vector of categorical responses on all questions for individual $i$. The way this can be implemented is as follows. First, we use the model with a random effect and estimate all the parameters including the variance of the random effect. This estimate of the variance can be used to simulate 100 different values of $\theta_i$ around the predicted $Z_i'\beta$ of the latent variable for each individual in the sample. Hence, for each simulated value of $\theta_i$, $\Pr(\theta_i)$ can be calculated. $\Pr(y_i \mid \theta_i)$ can be calculated using the probability specifications given in equation (4). Integrating over all simulated values of $\theta_i$ for each individual gives us the denominator of equation (5). In the absence of a model that estimates the variance of this individual-specific random effect, one can assume that the random effect captures about 50% of the variation in estimated variance of the error term. Under this assumption, the Bayesian prediction of mobility conditional on the observed pattern of responses is plotted in Figure 9 for the partial credit model with covariates.^7 It is quite remarkable that the Bayesian correction significantly improves the estimation of mobility (Figure 9) when compared with the estimation of abilities using the two-step conditional procedure for the partial credit model (Figure 6), as judged by the R-squared values (0.334 versus 0.221). In other words, if the goal of the analyst is to estimate the underlying latent variable, then a modification of the partial credit model that allows for covariates and a random effect outperforms the simple version of the partial credit model.

[Figure 9: Predicted versus true mobility: partial credit with covariates (Bayesian). x-axis: predicted mobility; y-axis: true mobility. R-squared = .334, RMSE = 17.441.]
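The simulation-based use of Bayes' theorem described in this subsection can be sketched in Python as follows. This is one common Monte Carlo approximation and may differ from the authors' implementation: the draws are taken from the normal prior centered on the linear prediction, so the prior density is already reflected in the sampling and each draw is weighted by its likelihood only. The function names, step difficulties, prior mean, and standard deviation are all hypothetical.

```python
import math
import random

def pcm_probs(theta, deltas):
    """Partial credit probabilities for categories 1..len(deltas)+1."""
    exponents = [0.0]
    for d in deltas:
        exponents.append(exponents[-1] + theta - d)
    total = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / total for e in exponents]

def posterior_mean(responses, zb, sigma, deltas_by_question, n_draws=100, seed=0):
    """Approximate E[latent | responses] using draws from N(zb, sigma^2)."""
    rng = random.Random(seed)
    draws = [rng.gauss(zb, sigma) for _ in range(n_draws)]
    # Likelihood of the observed response vector at each simulated latent value.
    weights = [math.prod(pcm_probs(t, d)[k - 1]
                         for k, d in zip(responses, deltas_by_question))
               for t in draws]
    denominator = sum(weights)  # Monte Carlo analogue of the integral in (5)
    return sum(t * w for t, w in zip(draws, weights)) / denominator

# Hypothetical step difficulties for three questions.
DELTAS = [[-1.0, -0.5, 0.3, 1.0], [-0.8, 0.0, 0.6, 1.4], [-0.2, 0.4, 1.1, 2.0]]

best = posterior_mean((5, 5, 5), zb=0.0, sigma=1.0, deltas_by_question=DELTAS)
worst = posterior_mean((1, 1, 1), zb=0.0, sigma=1.0, deltas_by_question=DELTAS)
print(worst, best)
```

Two respondents with identical covariates but different response patterns receive different predicted mobility, which is the additional information that the covariate-only prediction discards.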

2.5 Ordered Probit versus Partial Credit

We have introduced two basic types of models that are widely used in the analysis of categorical data, namely the ordered probit model and the partial credit model (with ability dummies and with covariates). Fundamentally, both models assume some sort of latent variable that gives rise to an observation mechanism governed by the probabilities given in equations (1) and (2). Viewed this way, the two models are quite similar, differing only in the functional form of the data-generating mechanism and in how the probabilities are modeled: the ordered probit model works with cumulative probabilities, whereas the partial credit model focuses on adjacent categories. Apart from poor predictions of the underlying latent variable, both the ordered probit and the partial credit models suffer from the problem that one cannot allow the response category cut-points ($\tau$'s), or the so-called difficulty parameters ($\delta$'s), to be functions of the same covariates as the mean value of the latent variable. This is because there will be a clear identification problem if one does so: in the absence of additional exogenous information, neither model will be able to detect whether the effects of the covariates are on the mean value of the latent variable or on the cut-points or difficulties. This is easy to see from the equations for the predicted probabilities [equations (1) and (2)]. This is likely to be a serious shortcoming of both models in estimating cross-population comparable differences in the latent variable of interest. In simple terms, these models do not allow for a world in which the Danish not only have a higher health status, but also have different expectations for their health status relative to Ethiopians. In the next section, we introduce an innovation to both the ordered probit and partial credit models that allows for the introduction of exogenous information in the form of vignettes. Analyzing the self-report questions in conjunction with responses to vignettes allows us to identify the model such that the same set of covariates can be used to assess differences in the mean level of the underlying latent variable as well as in cut-points or difficulties.

7 We have developed working versions of the models with random effects. However, they are very slow to run and we are currently trying to improve the speed of estimation.
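The identification problem can be made explicit with a short worked derivation for the ordered probit case. The derivation assumes, purely for illustration, that every cut-point shifts by the same linear function of the covariates; the coefficient vector $\gamma$ is hypothetical notation introduced here and does not appear in the original text.

```latex
% Assumption for illustration: \tau_i^k = \tau^k + Z_i'\gamma for every k,
% where \gamma is a hypothetical vector of cut-point shift coefficients.
\begin{align*}
\Pr(y_i \le k \mid Z_i) &= F(\tau_i^k - Z_i'\beta) \\
                        &= F(\tau^k + Z_i'\gamma - Z_i'\beta) \\
                        &= F\bigl(\tau^k - Z_i'(\beta - \gamma)\bigr).
\end{align*}
```

Every category probability in equation (1) is a difference of such terms, so the likelihood depends on $\beta$ and $\gamma$ only through $\beta - \gamma$. Self-report data alone therefore cannot separate a covariate's effect on the latent variable from its effect on the cut-points, which is the gap that the vignette responses are meant to fill.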

Vignettes

We now introduce the use of vignettes as a means of correction of self-report responses in order to make them cross-population comparable. A vignette is a description of a concrete level of ability on a given domain that respondents are asked to evaluate with relation to the same main question and on the same categorical response scale as the main self-report question [24]. The vignette fixes the level of ability such that variations in categorical responses are attributable to variations in response category cut-points. This introduction of exogenous information in the form of responses to vignettes allows us to identify the effects of a set of sociodemographic covariates (such as age, sex, education, country of residence, etc.) on both the level of the underlying latent variable that is being estimated as well as on the cut-points (in the ordered probit version of the model) and difficulties (in the partial credit version of the model).^8 In the WHO Multi-Country Study, there are six vignettes for the domain of mobility, each designed to capture a different level of ability on this domain. The vignettes are:
Vignette 1: [Paul] is an active athlete who runs long distance races of 20 kilometers twice a week and engages in soccer with no problems.
Vignette 2: [Mary] has no problems with moving around or using her hands, arms and legs. She jogs 4 kilometers twice a week without any problems.
Vignette 3: [Rob] is able to walk distances of up to 200 meters without any problems but feels breathless after walking one kilometer or climbing up more than one flight of stairs. He has no problems with day-to-day physical activities, such as carrying food from the market.
Vignette 4: [Margaret] feels chest pain and gets breathless after walking distances of up to 200 meters, but is able to do so without assistance. Bending and lifting objects such as groceries produces pain.
Vignette 5: [Louis] is able to move his arms and legs, but requires assistance in standing up from a chair or walking around the house. Any bending is painful and lifting is impossible.
Vignette 6: [David] is paralyzed from the neck down. He is confined to bed and must be fed and bathed by somebody else.

8 An alternative method to set a comparable scale such that response category cut-point differences can be recovered is to use measured tests [26].

Respondents are asked to classify each of these vignettes on the same five-point response category scale as the main question. So, for each individual, we not only have categorical responses to their self-report main question and several auxiliary questions, but we also have their categorical responses to a set of vignettes (ranging in number from six to eight across the different domains for health and responsiveness in the WHO Multi-Country Study). In order to introduce statistical models designed around the use of vignettes, we have extended the simulated data set to include hypothetical ratings of seven mobility vignettes in countries A and B by assigning true mobility scores to the different vignettes and assuming that individuals will use the categorical response scale the same way in assessing vignettes as they do in assessing their own levels of mobility on the main question. This assumption is critical for the estimation of the models, as discussed below. The simulated vignette ratings for the two countries are summarized in Figures 10 and 11. Each graph shows the distribution of categorical responses for the set of vignettes (lighter colors signifying worse responses). The vignettes are ranked from 1 to 7 in decreasing order of ability: i.e., vignette 1 refers to a higher level of mobility than vignette 2, vignette 2 to a higher level than vignette 3, and so on. From these graphs, it is clear that there are important differences in the cut-points between country A and country B. At lower levels of mobility, respondents in country B are more likely to characterize a vignette unfavorably than respondents in country A. In addition, the compression of the middle categories in country B suggests cut-points that are more narrowly spaced than those in country A. The types of variation in vignette ratings that we have generated in the simulated dataset closely parallel the variation observed in actual data from the WHO Multi-Country Survey Study.
In a later section, we show the response distributions for China versus those for India for mobility vignettes. In the following sections we describe how variants of the ordered probit model and partial credit model may be used in conjunction with vignette ratings in order to characterize these systematic cutpoint dierences more precisely. Both models are modied such that: (a) information from responses to vignettes are introduced in the likelihood function, and (b) cut-points and diculties are allowed to be functions of the same covariates as those used in the estimation of the mean value of the latent variable.
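The mechanism behind the simulated vignette ratings can be sketched in a few lines. This is a minimal illustration rather than the simulation used in the paper: the vignette levels, the cut-point values, the unit-variance rating noise, and the coding of category 5 as the most favorable response are all assumptions made here for the example, and `rate` and `simulate` are hypothetical helper names.

```python
import bisect
import random

def rate(latent, cutpoints):
    """Map a latent value to a category 1..5 given four ordered cut-points."""
    # bisect_right counts the cut-points that are <= latent, so the result is
    # k exactly when cutpoints[k-2] <= latent < cutpoints[k-1].
    return bisect.bisect_right(cutpoints, latent) + 1

# Hypothetical values for illustration only (not taken from the paper).
VIGNETTE_LEVELS = [2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0]  # vignette 1 = most mobile
CUTPOINTS = {
    "A": [-3.0, -1.5, 0.0, 1.5],   # widely spaced cut-points
    "B": [-1.0, -0.5, 0.0, 0.5],   # compressed middle categories
}

def simulate(country, n_respondents, seed=0):
    """Return, for each vignette, the counts of responses in categories 1..5."""
    rng = random.Random(seed)
    counts = [[0] * 5 for _ in VIGNETTE_LEVELS]
    for _ in range(n_respondents):
        for j, level in enumerate(VIGNETTE_LEVELS):
            perceived = level + rng.gauss(0.0, 1.0)  # assumed unit-variance noise
            counts[j][rate(perceived, CUTPOINTS[country]) - 1] += 1
    return counts

# Same vignette (number 4), same true level, different response distributions.
for country in ("A", "B"):
    print(country, simulate(country, 1000)[3])
```

Because both countries rate identical vignette levels, any difference between the two printed distributions comes only from the cut-points, which is the property the vignette-based models exploit.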

3.1 Hierarchical Ordered Probit Model (HOPIT)

The hierarchical ordered probit (HOPIT) model is a modification of the standard ordered probit model described earlier. In order to incorporate information on vignette ratings and multiple questions, the expanded model has several components to the likelihood function: the first component refers to estimation of cut-points using responses to vignettes, and the second component utilizes responses on the self-report main question. The remaining components are for auxiliary questions.

Figure 10: Distribution of vignette responses for country A (x-axis: vignettes 1 to 7; y-axis: share of responses, 0% to 100%)

Figure 11: Distribution of vignette responses for country B (x-axis: vignettes 1 to 7; y-axis: share of responses, 0% to 100%)

In formal terms, the first component of the likelihood function assumes there is an unobserved latent variable $Y_{ij}^v$ distributed with mean $\mu_{ij}^v$ and variance 1. Here, $i$ refers to the respondent, $j$ refers to the vignette number, and the $v$ superscript indicates that this refers to the vignette component of the model. In mathematical terms,
$$Y_{ij}^v \sim N(\mu_{ij}^v, 1), \qquad i = 1, \ldots, N; \; j = 1, \ldots, V$$

$$\mu_{ij}^v = J_i'\alpha,$$

where $J_i$ is a vector of indicator variables for each of $V - 1$ vignettes. Letting $y_{ij}^v$ denote the observed categorical response by individual $i$ to vignette $j$, the observation mechanism is defined as follows:

$$y_{ij}^v = k \quad \text{if} \quad \tau_i^{k-1} \le Y_{ij}^v < \tau_i^k;$$

for $\tau_i^0 = -\infty$, $\tau_i^5 = \infty$, $\forall i, j$ and $k = 1, \ldots, 5$.

In addition, the cut-points are allowed to be functions of covariates:

$$\tau_i^k = X_i'\gamma^k.$$

As before, $\tau_i^1 < \tau_i^2 < \tau_i^3 < \tau_i^4$.

The second component of the likelihood function utilizes information from the respondent's main self-report question (the one that is tied to the vignettes) and assumes there is an unobserved latent variable $Y_i^s$ distributed with mean $\mu_i^s$ and variance $\sigma^2$. Here, the $s$ superscript indicates that this component refers to self-report questions. This formulation is slightly different from the standard ordered probit model: since we are allowing the vignettes to drive the cut-point estimation, this second component of the likelihood function has more in common with an interval regression model (i.e., an ordered probit model with known cut-points). Since the cut-point estimation is being driven by vignettes and the scale is set by the first estimation component, we are now able to obtain estimates of the variance of the latent variable (i.e., there is no need to set the variance equal to 1 as before). In mathematical terms, the model is:

$$Y_i^s \sim N(\mu_i^s, \sigma^2), \qquad i = 1, \ldots, N$$

$$\mu_i^s = Z_i'\beta.$$

Let $y_i^s$ be the observed categorical responses on the self-report such that:

$$y_i^s = k \quad \text{if} \quad \tau_i^{k-1} \le Y_i^s < \tau_i^k;$$

for $\tau_i^0 = -\infty$, $\tau_i^5 = \infty$, $\forall i$ and $k = 1, \ldots, 5$.

Similarly for the auxiliary questions, let $a_i^j$ be the observed categorical responses on the $j$-th auxiliary question such that:

$$a_i^j = k \quad \text{if} \quad \tau_{ij}^{k-1} \le Y_i^s < \tau_{ij}^k;$$

for $\tau_{ij}^0 = -\infty$, $\tau_{ij}^5 = \infty$, $\forall i$ and $k = 1, \ldots, 5$, and

$$\tau_{ij}^k = X_i'\gamma_j^k.$$

Figure 12: Predicted versus true mobility: HOPIT (R-squared = .495, RMSE = 15.186; x-axis: predicted mobility; y-axis: true mobility)


It is assumed that $Y_i^s$ and $Y_{i'}^s$ are independent $\forall i \neq i'$, conditional on $X_i$. $Y_{ij}^v$ and $Y_i^s$ are independent $\forall i, j$, conditional on $X_i$, $J_i$ and $Z_i$. The probabilities associated with the observed responses to vignettes, the main question, and the auxiliary questions can be computed as in equation (1) with the adjustment for cut-point shifts being functions of covariates. The likelihood function can be written using these probabilities as three separate components. The three components of the likelihood function are additive in logs and can be jointly maximized to yield the parameter estimates. There is explicit parametric dependence between the different components of the likelihood function. The cut-points to be estimated from the vignettes component are the same as those in the main question component. In addition, $\mu_i^s$ is the same for both the main question and all the auxiliary questions. This ensures that the estimated cut-points for both the main question and the auxiliary questions are on the same scale to enable meaningful comparisons.
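The additive structure of the log-likelihood can be illustrated with a short sketch. It covers only the vignette and main-question components and shows how the same cut-point coefficients enter both; the auxiliary-question component would add an analogous sum with its own cut-point coefficients. The function and argument names (`hopit_loglik`, `alpha`, `beta`, `gammas`, `log_sigma`) and the data layout are assumptions made for this example, not the authors' implementation.

```python
import math

def norm_cdf(x):
    """Standard normal CDF; math.erf handles +/- infinity."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def category_prob(k, mean, sd, cuts):
    """P(response = k), k in 1..5, for a latent N(mean, sd^2) and four cut-points."""
    bounds = [-math.inf] + list(cuts) + [math.inf]
    return norm_cdf((bounds[k] - mean) / sd) - norm_cdf((bounds[k - 1] - mean) / sd)

def dot(x, coef):
    return sum(a * b for a, b in zip(x, coef))

def hopit_loglik(alpha, beta, gammas, log_sigma, vignette_obs, self_obs):
    """Joint log-likelihood of the vignette and main-question components.

    alpha        : coefficients on the vignette indicators J
    beta         : coefficients on the covariates Z (mean of the latent variable)
    gammas       : four coefficient vectors, one per cut-point, applied to X
    vignette_obs : list of (J, X, k) tuples
    self_obs     : list of (Z, X, k) tuples
    Cut-point ordering is not enforced here; an optimizer would need to do so.
    """
    sigma = math.exp(log_sigma)  # keeps the standard deviation positive
    ll = 0.0
    for J, X, k in vignette_obs:            # component 1: variance fixed at 1
        cuts = [dot(X, g) for g in gammas]
        ll += math.log(category_prob(k, dot(J, alpha), 1.0, cuts))
    for Z, X, k in self_obs:                # component 2: same cut-points, free variance
        cuts = [dot(X, g) for g in gammas]
        ll += math.log(category_prob(k, dot(Z, beta), sigma, cuts))
    return ll
```

In practice this function would be passed (negated) to a numerical optimizer; the sketch stops at evaluating the objective.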

Tables 5 to 9 in the Annex report the results of the estimation. Figure 12 plots the estimates of the mean level versus truth. The R-squared for the prediction has improved when compared with the simple ordered probit model as well as with the partial credit models with and without covariates. Figure 13 reports the true versus estimated cut-points for the main question. These differ by sociodemographic group in that they are also functions of the same covariates (age, sex, education, and country of residence) as the mean level of mobility. As can be seen, the model is able to recover the cut-point differences quite well. Figures 14 and 15 report the comparison of estimated cut-points to truth for the two auxiliary questions. The recovery here is not quite as good as that for the main question. This is to be expected since the information in the vignettes is directly driving the main question cut-points, whereas the estimation of the cut-points for the auxiliary questions is more indirect and is not anchored to the cut-points derived from vignette responses. The estimation of the latent variable using Bayes' theorem (Figure 16) improves the R-squared substantially (from .495 to .729), yielding estimates of mobility that are quite close to the true mobility levels in the underlying simulated data.
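The text does not spell out how Bayes' theorem is applied. One simple reading, offered here only as an assumption, is to treat the covariate-based prediction $N(\mu_i^s, \sigma^2)$ as a prior and to condition on the interval between the cut-points that bracket the respondent's observed category. With a single question, the posterior mean is then the mean of a truncated normal. The helper names below are hypothetical.

```python
import math

def phi(z):
    """Standard normal density; zero at +/- infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def posterior_mean(mu, sigma, lower, upper):
    """E[Y | lower <= Y < upper] for Y ~ N(mu, sigma^2).

    mu, sigma    : prior mean and standard deviation from the covariate model
    lower, upper : cut-points bracketing the observed response category
                   (use -inf / +inf for the two extreme categories)
    """
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    mass = Phi(b) - Phi(a)  # prior probability of the observed category
    return mu + sigma * (phi(a) - phi(b)) / mass

# A respondent whose prior mean is 0 but who reports the top category
# (latent value above a cut-point of 1.0) is shifted upward.
print(posterior_mean(0.0, 1.0, 1.0, math.inf))
```

With several questions per respondent, the same idea extends by multiplying the category probabilities across questions before normalizing, which would sharpen the posterior further.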

Figure 13: Predicted versus true cut-points: HOPIT main question (series: first, second, third, and fourth cut-points; x-axis: predicted cut-points; y-axis: true cut-points)

Figure 14: Predicted versus true cut-points: HOPIT auxiliary question 1 (series: first, second, third, and fourth cut-points; x-axis: predicted cut-points; y-axis: true cut-points)

Figure 15: Predicted versus true cut-points: HOPIT auxiliary question 2 (series: first, second, third, and fourth cut-points; x-axis: predicted cut-points; y-axis: true cut-points)

Figure 16: Predicted versus true mobility: HOPIT (Bayesian) (R-squared = .729, RMSE = 11.139; x-axis: predicted mobility; y-axis: true mobility)

3.2 Hierarchical Partial Credit Model

In analogy to the HOPIT model, we implement the use of vignettes in exactly the same way for the Rasch-based partial credit model. We allow for responses to vignettes to set the difficulty levels and estimate differences across sociodemographic groups in the first component of the likelihood function. In the other components of the likelihood, we utilize information from the main and auxiliary questions. The logic is the same as before: we are using information on difficulty parameters from responses on vignettes to allow us to have covariates that affect both the mean level of the estimated latent variable as well as the difficulty parameters. For all the vignette questions, i.e., for $j = 1, \ldots, V$:
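$$\Pr(y_i^v = k) = \begin{cases}
1/A, & k = 1 \\
\exp(J_i'\alpha - \tau_i^1)/A, & k = 2 \\
\exp(2J_i'\alpha - \tau_i^1 - \tau_i^2)/A, & k = 3 \\
\exp(3J_i'\alpha - \tau_i^1 - \tau_i^2 - \tau_i^3)/A, & k = 4 \\
\exp(4J_i'\alpha - \tau_i^1 - \tau_i^2 - \tau_i^3 - \tau_i^4)/A, & k = 5,
\end{cases} \tag{6}$$

where $J_i$ is a vector of indicator variables for each of $V - 1$ vignettes, and $A$ is the expression

$$A \equiv 1 + \exp(J_i'\alpha - \tau_i^1) + \exp(2J_i'\alpha - \tau_i^1 - \tau_i^2) + \exp(3J_i'\alpha - \tau_i^1 - \tau_i^2 - \tau_i^3) + \exp(4J_i'\alpha - \tau_i^1 - \tau_i^2 - \tau_i^3 - \tau_i^4)$$

and,

$$\tau_i^k = X_i'\gamma^k$$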

Similarly, the probabilities for the main question (the one which is tied to the vignettes):
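$$\Pr(y_i^s = k) = \begin{cases}
1/A, & k = 1 \\
\exp(Z_i'\beta - \tau_i^1)/A, & k = 2 \\
\exp(2Z_i'\beta - \tau_i^1 - \tau_i^2)/A, & k = 3 \\
\exp(3Z_i'\beta - \tau_i^1 - \tau_i^2 - \tau_i^3)/A, & k = 4 \\
\exp(4Z_i'\beta - \tau_i^1 - \tau_i^2 - \tau_i^3 - \tau_i^4)/A, & k = 5,
\end{cases} \tag{7}$$

where $Z_i$ is a vector of individual-level covariates, and $A$ is the expression

$$A \equiv 1 + \exp(Z_i'\beta - \tau_i^1) + \exp(2Z_i'\beta - \tau_i^1 - \tau_i^2) + \exp(3Z_i'\beta - \tau_i^1 - \tau_i^2 - \tau_i^3) + \exp(4Z_i'\beta - \tau_i^1 - \tau_i^2 - \tau_i^3 - \tau_i^4)$$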

And for the j-th auxiliary question:


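$$\Pr(y_{ij}^s = k) = \begin{cases}
1/A, & k = 1 \\
\exp(Z_i'\beta - \tau_{ij}^1)/A, & k = 2 \\
\exp(2Z_i'\beta - \tau_{ij}^1 - \tau_{ij}^2)/A, & k = 3 \\
\exp(3Z_i'\beta - \tau_{ij}^1 - \tau_{ij}^2 - \tau_{ij}^3)/A, & k = 4 \\
\exp(4Z_i'\beta - \tau_{ij}^1 - \tau_{ij}^2 - \tau_{ij}^3 - \tau_{ij}^4)/A, & k = 5,
\end{cases} \tag{8}$$

where $Z_i$ is a vector of individual-level covariates, and $A$ is the expression

$$A \equiv 1 + \exp(Z_i'\beta - \tau_{ij}^1) + \exp(2Z_i'\beta - \tau_{ij}^1 - \tau_{ij}^2) + \exp(3Z_i'\beta - \tau_{ij}^1 - \tau_{ij}^2 - \tau_{ij}^3) + \exp(4Z_i'\beta - \tau_{ij}^1 - \tau_{ij}^2 - \tau_{ij}^3 - \tau_{ij}^4)$$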
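The category probabilities in equations (6) to (8) share one computational pattern: each exponent is a running sum of (location minus difficulty), and $A$ is the sum of the exponentiated terms. A minimal sketch follows, in which `theta` stands for the linear predictor ($J_i'\alpha$ or $Z_i'\beta$) and `difficulties` for the four difficulty parameters; both names are assumptions made for this example.

```python
import math

def pcm_probs(theta, difficulties):
    """Partial credit category probabilities [P(k=1), ..., P(k=m+1)].

    theta        : location of the respondent or vignette on the latent scale
    difficulties : the m step difficulties (m = 4 for a five-point scale)
    """
    # Exponents are cumulative sums of (theta - difficulty); category 1 has exponent 0.
    exponents = [0.0]
    for tau in difficulties:
        exponents.append(exponents[-1] + theta - tau)
    # Subtracting the maximum before exponentiating avoids overflow and
    # leaves the normalized probabilities unchanged.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)  # equals A up to the common factor exp(-top)
    return [w / total for w in weights]

# Hypothetical values: a respondent at 0.5 with evenly spaced difficulties.
print(pcm_probs(0.5, [-1.5, -0.5, 0.5, 1.5]))
```

Making the difficulties functions of covariates then amounts to computing `difficulties` from each respondent's covariate vector before calling the function.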

Figure 17: Predicted versus true mobility: partial credit model (R-squared = .487, RMSE = 15.287; x-axis: predicted mobility; y-axis: true mobility)

Figure 18: Predicted versus true mobility: partial credit model (Bayesian) (R-squared = .683, RMSE = 12.019; x-axis: predicted mobility; y-axis: true mobility)

Tables 10 to 14 in the Annex report the results of this estimation. Figures 17 and 18 show the predicted mobility versus the true mobility before and after the Bayesian correction. The pre-Bayesian R-squared for predicted mobility obtained from the hierarchical partial credit model (.487) is similar in magnitude to that obtained using the HOPIT model (.495). The post-Bayesian R-squared is slightly higher for HOPIT (.729) than for the hierarchical partial credit model (.683). This may result from the fact that the hierarchical partial credit model, in the way we have formulated it, does not estimate the variance of the stochastic term. This constraint will inhibit the model from fitting the data as well as it could if the variance were included as a parameter.


Goodness-of-Fit

Assessing goodness-of-fit for categorical data is not straightforward. One can compute a simple count-$R^2$, which is a measure of the proportion of correct responses obtained for a given sample. For ordinal data, the predicted categorical response would be the one associated with the maximum predicted probability. Other options include a pseudo-$R^2$ measure, which in software such as STATA is a likelihood-based comparison of the model with all the parameters to one with only the intercept [12]. Rasch-based models use measures of fit such as outfit and infit: outfit is a chi-square test based on the sum of the standardized deviation of observed versus expected values of a response. Infit is also a chi-square test, which utilizes an information-weighted sum by adjusting for extreme responses using weights [32].

In order to assess model fit, a standard likelihood ratio test can be used. These tests compare the log-likelihood value of the full model with a constrained version of the same model (i.e., a model that is nested within the full model) to assess the contribution of the dropped covariates to the likelihood function. Assume $L_0$ is the log-likelihood value associated with the full model and $L_1$ is the log-likelihood value of the constrained model. Then $-2(L_1 - L_0)$ is distributed $\chi^2$ with $d_0 - d_1$ degrees of freedom, where $d_0$ and $d_1$ are the model degrees of freedom associated with the full and the constrained models, respectively [12].
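Both the count-$R^2$ and the likelihood ratio test are easy to compute once fitted probabilities and log-likelihoods are available. The sketch below uses only the Python standard library; the chi-square tail probability is computed with the standard closed-form expressions for integer degrees of freedom. The function names and the numbers in the example call are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: erfc term plus a finite sum with half-integer powers.
    tail = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df + 1) // 2))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def likelihood_ratio_test(ll_full, ll_constrained, df_full, df_constrained):
    """Return (statistic, degrees of freedom, p-value) for nested models."""
    stat = -2.0 * (ll_constrained - ll_full)
    df = df_full - df_constrained
    return stat, df, chi2_sf(stat, df)

def count_r2(observed, predicted_probs):
    """Share of responses whose category has the highest predicted probability.

    observed        : list of categories coded 1..K
    predicted_probs : list of length-K probability lists, one per respondent
    """
    hits = sum(1 for y, p in zip(observed, predicted_probs) if p.index(max(p)) + 1 == y)
    return hits / len(observed)

# Hypothetical log-likelihoods: full model with 5 parameters, constrained with 3.
print(likelihood_ratio_test(-100.0, -103.0, 5, 3))
```

A small p-value indicates that the dropped covariates contribute to the likelihood and the constrained model should be rejected in favor of the full one.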

Unidimensionality

Both the HOPIT model and the Rasch-based models in IRT assume some form of unidimensionality. In formal terms, unidimensionality can be defined as the assumption that any dependence between different questions tapping into a given domain is solely due to the existence of a single underlying latent trait. Tests of unidimensionality are often based on uncovering this assumed factor that underlies observed responses to multiple questions. Mathematically, the assumption of unidimensionality can be worked out by assuming responses to all questions on a given domain are tapping this latent trait. In the WHO Multi-Country Study, test-retest data are available from a subsample of respondents who were revisited and administered the survey questionnaire for a second time. This availability of test-retest data can be used to design a test of unidimensionality. Suppose we get latent variable estimates from two separate questions on any given domain, $Y_1$ and $Y_2$. Each of these estimates of the latent variable represents some measure of truth with error. That is, if truth were denoted by $Y_{true}$, then:
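$$Y_1^{test} = Y_{true} + \epsilon_1^{test}, \qquad Y_1^{retest} = Y_{true} + \epsilon_1^{retest}$$

and

$$Y_2^{test} = Y_{true} + \epsilon_2^{test}, \qquad Y_2^{retest} = Y_{true} + \epsilon_2^{retest}$$

Here, $\epsilon_1$ and $\epsilon_2$ are the question-specific error terms for both test and retest questions, $\epsilon_1 \sim N(0, \sigma_1^2)$, $\epsilon_2 \sim N(0, \sigma_2^2)$. The correlation coefficient between the measured $Y$s is:

$$\rho = \frac{cov(Y_1, Y_2)}{\sigma_{Y_1}\sigma_{Y_2}} \tag{9}$$

Rewriting (9),

$$\rho = \frac{cov(Y_1, Y_2)}{\sigma_{Y_1}\sigma_{Y_2}} = \frac{cov(Y_1, Y_2)}{\sqrt{\sigma_{Y_{true}}^2 + \sigma_1^2}\,\sqrt{\sigma_{Y_{true}}^2 + \sigma_2^2}} \tag{10}$$

Similarly,

$$\rho_{true} = \frac{cov(Y_{true}, Y_{true})}{\sigma_{Y_{true}}\sigma_{Y_{true}}} \tag{11}$$

Dividing (11) by (10),

$$\frac{\rho_{true}}{\rho} = \frac{\sqrt{\sigma_{Y_{true}}^2 + \sigma_1^2}\,\sqrt{\sigma_{Y_{true}}^2 + \sigma_2^2}}{\sigma_{Y_{true}}\sigma_{Y_{true}}}$$

since $cov(Y_1, Y_2) = cov(Y_{true}, Y_{true})$ if the error terms are assumed to be uncorrelated. Therefore,

$$\rho_{true} = \rho\,\sqrt{\frac{\sigma_{Y_{true}}^2 + \sigma_1^2}{\sigma_{Y_{true}}^2}}\,\sqrt{\frac{\sigma_{Y_{true}}^2 + \sigma_2^2}{\sigma_{Y_{true}}^2}} = 1$$

where $\sigma_i^2 = \frac{var(Y_i^{test} - Y_i^{retest})}{2}$ for $i = 1, 2$. Given that both $\sigma_{Y_{true}}^2 = cov(Y_1, Y_2)$ and $\rho$ are observed, the above expression should equal 1. This can form the basis of a test of unidimensionality using information from test-retest data.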
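The quantity that should equal 1 under unidimensionality can be computed directly from test-retest estimates. The sketch below is an illustration under stated assumptions: the covariance between the two questions is taken from the test-wave estimates, the simulated data use a unit-variance true score with error standard deviations of 0.5 and 0.8, and the function names are hypothetical. It returns only the point estimate; a formal test would also need its sampling distribution, which the text does not specify.

```python
import math
import random

def cov(xs, ys):
    """Sample covariance (n - 1 denominator); cov(xs, xs) is the sample variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def unidimensionality_ratio(y1_test, y1_retest, y2_test, y2_retest):
    """rho * sqrt((s2_true + s2_1) / s2_true) * sqrt((s2_true + s2_2) / s2_true)."""
    # Error variances from test-retest differences: sigma_i^2 = var(test - retest) / 2.
    d1 = [a - b for a, b in zip(y1_test, y1_retest)]
    d2 = [a - b for a, b in zip(y2_test, y2_retest)]
    s2_1 = cov(d1, d1) / 2.0
    s2_2 = cov(d2, d2) / 2.0
    # With uncorrelated errors, cov(Y1, Y2) estimates the variance of the true score.
    s2_true = cov(y1_test, y2_test)  # must be positive for the square roots below
    rho = s2_true / math.sqrt(cov(y1_test, y1_test) * cov(y2_test, y2_test))
    return rho * math.sqrt((s2_true + s2_1) / s2_true) * math.sqrt((s2_true + s2_2) / s2_true)

def simulate(n, sd1, sd2, seed=0):
    """Two questions measuring one latent trait, each observed at test and retest."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noisy = lambda sd: [t + rng.gauss(0.0, sd) for t in truth]
    return noisy(sd1), noisy(sd1), noisy(sd2), noisy(sd2)

# Under a single latent trait the ratio should be close to 1.
print(unidimensionality_ratio(*simulate(20000, 0.5, 0.8)))
```

If the two questions also tapped separate traits, the covariance between them would understate each question's stable variance and the ratio would fall below 1.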
true true

Discussion

One of the key conclusions of this paper is that adjustments are needed to make survey results comparable across populations. In particular, when categorical variables are involved, analyses must account for differences in response category cut-points. There is considerable evidence that suggests that response category cut-points are different across countries. Therefore, until variation in cut-points is addressed, one must start from a presumption that results are not comparable across populations. The problem of cross-population comparability also appears to apply within populations across different socio-economic and demographic groups. This has important implications for the measurement of inequality, which may be greater or smaller than measured before taking into account response category cut-point shifts. It also has critical implications for comparisons over time. Cut-points may systematically shift over time (e.g., due to rising income, education, and health norms), so long-term trends may be difficult to assess without correction.


Annex
Table 5: Estimation results: HOPIT

Variable        Coefficient   (Std. Err.)
Vignettes
  I_vignette 2    -0.035      (0.146)
  I_vignette 3    -4.033      (0.117)
  I_vignette 4    -5.116      (0.122)
  I_vignette 5    -5.341      (0.123)
  I_vignette 6    -7.458      (0.175)
  I_vignette 7    -7.643      (0.195)
Mean
  Age 30-44       -0.488      (0.085)
  Age 45-59       -0.715      (0.1)
  Age 60+         -1.656      (0.113)
  Male             0.174      (0.068)
  1<Educ≤6         0.185      (0.115)
  6<Educ≤11        0.332      (0.122)
  Educ>11          0.521      (0.147)
  Country B        0.996      (0.074)
  Intercept       -2.985      (0.166)
  log(s)           0.061      (0.043)

Table 6: Estimation results: HOPIT, cut-point 1
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.504 (0.046)    -0.384 (0.132)         -0.099 (0.113)
Age 45-59     -0.569 (0.054)    -0.544 (0.156)         -0.159 (0.134)
Age 60+       -1.282 (0.062)    -1.172 (0.170)         -0.768 (0.155)
Male           0.25 (0.037)      0.227 (0.106)          0.464 (0.092)
1<Educ≤6       0.079 (0.061)     0.056 (0.175)          0.249 (0.157)
6<Educ≤11      0.072 (0.065)     0.161 (0.185)          0.242 (0.166)
Educ>11        0.129 (0.079)     0.196 (0.227)          0.395 (0.197)
Country B      1.296 (0.041)     0.928 (0.113)          1.260 (0.099)
Intercept     -4.662 (0.134)    -4.312 (0.223)         -3.777 (0.202)

Table 7: Estimation results: HOPIT, cut-point 2
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.441 (0.048)    -0.356 (0.125)         -0.152 (0.112)
Age 45-59     -0.551 (0.056)    -0.402 (0.145)         -0.224 (0.133)
Age 60+       -1.283 (0.063)    -1.232 (0.164)         -0.845 (0.155)
Male           0.25 (0.038)      0.271 (0.100)          0.460 (0.092)
1<Educ≤6       0.053 (0.062)    -0.117 (0.165)          0.324 (0.157)
6<Educ≤11      0.059 (0.066)    -0.005 (0.175)          0.346 (0.165)
Educ>11        0.072 (0.081)     0.104 (0.212)          0.458 (0.196)
Country B      1.259 (0.043)     0.839 (0.107)          1.258 (0.098)
Intercept     -4.399 (0.134)    -3.922 (0.210)         -3.579 (0.201)

Table 8: Estimation results: HOPIT, cut-point 3
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.395 (0.051)    -0.271 (0.118)         -0.120 (0.113)
Age 45-59     -0.537 (0.059)    -0.388 (0.138)         -0.250 (0.136)
Age 60+       -1.160 (0.065)    -1.262 (0.158)         -0.883 (0.162)
Male           0.227 (0.040)     0.217 (0.095)          0.345 (0.093)
1<Educ≤6       0.089 (0.065)    -0.093 (0.159)          0.206 (0.160)
6<Educ≤11      0.079 (0.070)     0.031 (0.168)          0.259 (0.168)
Educ>11        0.136 (0.087)     0.073 (0.204)          0.367 (0.200)
Country B      1.252 (0.046)     0.836 (0.101)          1.235 (0.100)
Intercept     -4.074 (0.135)    -3.611 (0.203)         -2.945 (0.203)

Table 9: Estimation results: HOPIT, cut-point 4
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.371 (0.059)    -0.285 (0.115)         -0.097 (0.118)
Age 45-59     -0.526 (0.069)    -0.371 (0.135)         -0.221 (0.143)
Age 60+       -1.095 (0.074)    -1.263 (0.155)         -0.842 (0.178)
Male           0.188 (0.046)     0.290 (0.093)          0.371 (0.098)
1<Educ≤6       0.054 (0.076)     0.004 (0.157)          0.161 (0.170)
6<Educ≤11      0.090 (0.081)     0.081 (0.166)          0.118 (0.178)
Educ>11        0.199 (0.101)     0.127 (0.200)          0.339 (0.212)
Country B      1.223 (0.057)     0.822 (0.100)          1.263 (0.106)
Intercept     -3.364 (0.139)    -3.498 (0.201)         -2.461 (0.212)

Table 10: Estimation results: hierarchical partial credit

Variable        Coefficient   (Std. Err.)
Vignettes
  I_vignette 2    -0.181      (0.349)
  I_vignette 3    -4.975      (0.264)
  I_vignette 4    -5.665      (0.265)
  I_vignette 5    -5.809      (0.265)
  I_vignette 6    -8.535      (0.362)
  I_vignette 7    -8.814      (0.394)
Mean
  Age 30-44       -0.29       (0.066)
  Age 45-59       -0.461      (0.074)
  Age 60+         -1.052      (0.079)
  Male             0.096      (0.051)
  1<Educ≤6         0.108      (0.082)
  6<Educ≤11        0.198      (0.088)
  Educ>11          0.333      (0.111)
  Country B        0.563      (0.056)
  Intercept       -4.327      (0.277)

Table 11: Estimation results: hierarchical partial credit, difficulty parameter 1
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.708 (0.126)    -0.306 (0.326)          0.225 (0.215)
Age 45-59     -0.614 (0.151)    -0.910 (0.352)          0.222 (0.258)
Age 60+       -1.196 (0.176)    -0.515 (0.427)          0.022 (0.320)
Male           0.233 (0.100)    -0.034 (0.251)          0.339 (0.179)
1<Educ≤6       0.177 (0.161)     0.558 (0.363)         -0.161 (0.346)
6<Educ≤11      0.150 (0.171)     0.669 (0.395)         -0.322 (0.357)
Educ>11        0.394 (0.223)     0.429 (0.502)         -0.017 (0.419)
Country B      1.383 (0.110)     0.827 (0.266)          0.799 (0.188)
Intercept     -3.664 (0.316)    -3.961 (0.481)         -3.038 (0.446)

Table 12: Estimation results: hierarchical partial credit, difficulty parameter 2
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.042 (0.158)    -0.477 (0.333)         -0.486 (0.235)
Age 45-59     -0.188 (0.187)    -0.037 (0.365)         -0.558 (0.287)
Age 60+       -0.841 (0.204)    -0.998 (0.466)         -1.111 (0.366)
Male           0.175 (0.123)     0.455 (0.260)          0.408 (0.198)
1<Educ≤6      -0.153 (0.199)    -0.451 (0.383)          0.649 (0.370)
6<Educ≤11     -0.082 (0.213)    -0.402 (0.416)          0.754 (0.383)
Educ>11       -0.330 (0.273)     0.258 (0.532)          0.693 (0.452)
Country B      0.396 (0.135)     0.300 (0.275)          0.688 (0.209)
Intercept     -5.569 (0.341)    -4.512 (0.497)         -5.606 (0.464)

Table 13: Estimation results: hierarchical partial credit, difficulty parameter 3
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.195 (0.134)    -0.014 (0.304)         -0.089 (0.205)
Age 45-59     -0.331 (0.156)    -0.449 (0.350)         -0.354 (0.253)
Age 60+       -0.529 (0.159)    -1.168 (0.417)         -0.855 (0.329)
Male           0.143 (0.102)    -0.414 (0.244)         -0.150 (0.175)
1<Educ≤6       0.195 (0.168)    -0.504 (0.444)         -0.044 (0.288)
6<Educ≤11      0.076 (0.180)    -0.289 (0.468)          0.289 (0.306)
Educ>11        0.161 (0.224)    -0.423 (0.572)          0.156 (0.359)
Country B      0.775 (0.114)     0.556 (0.235)          0.512 (0.184)
Intercept     -5.62 (0.322)     -3.342 (0.545)         -4.00 (0.398)

Table 14: Estimation results: hierarchical partial credit, difficulty parameter 4
(Entries are coefficients with standard errors in parentheses.)

Variable      Main question     Auxiliary question 1   Auxiliary question 2
Age 30-44     -0.167 (0.110)    -0.086 (0.243)          0.060 (0.198)
Age 45-59     -0.286 (0.127)    -0.005 (0.274)          0.023 (0.249)
Age 60+       -0.526 (0.133)    -0.588 (0.321)         -0.386 (0.347)
Male           0.044 (0.085)     0.536 (0.196)          0.308 (0.171)
1<Educ≤6      -0.069 (0.139)     0.497 (0.379)          0.028 (0.291)
6<Educ≤11     -0.002 (0.148)     0.396 (0.397)         -0.293 (0.307)
Educ>11        0.101 (0.186)     0.365 (0.466)          0.133 (0.359)
Country B      0.665 (0.096)     0.343 (0.203)          0.856 (0.181)
Intercept     -5.130 (0.298)    -7.336 (0.482)         -4.947 (0.397)

References
[1] Andrich, D. (1988), Rasch Models for Measurement, Newbury Park, California: Sage Publications.
[2] Bjorner, J.B., and J.E. Ware (1998), Using Modern Psychometric Methods to Measure Health Outcomes, Medical Outcomes Trust Monitor 3:2-3.
[3] Chamberlain, G. (1980), Analysis of Covariance with Qualitative Data, Review of Economic Studies 47:225-238.
[4] Clark, A.E. (1997), Job Satisfaction and Gender: Why Are Women So Happy at Work? Labour Economics 4:341-372.
[5] Frechette, G.R. (2001), Random-Effects Ordered Probit, Stata Technical Bulletin 59:23-26.
[6] Groot, W., and H.M. van den Brink (1999), Job Satisfaction and Preference Drift, Economics Letters 63:363-367.
[7] Groot, W. (2000), Adaptation and Scale of Reference Bias in Self-Assessments of Quality of Life, Journal of Health Economics 19:403-420.
[8] Kelderman, H. (1984), Loglinear Rasch Model Tests, Psychometrika 49:223-245.
[9] Keller, S.D., J.E. Ware, B. Gandek et al. (1998), Testing the Equivalence of Translations of Widely Used Response Choice Labels, Journal of Clinical Epidemiology 51(11):933-944.
[10] Kerkhofs, M., and M. Lindeboom (1995), Subjective Health Measures and State Dependent Reporting Errors, Health Economics 4:221-235.
[11] Long, J.S. (1997), Regression Models for Categorical and Limited Dependent Variables, Thousand Oaks, California: Sage Publications.
[12] Long, J.S., and J. Freese (2001), Regression Models for Categorical Dependent Variables using STATA, College Station, Texas: STATA Press.
[13] Ludlow, L.H., and S.M. Haley (1995), Rasch Model Logits: Interpretation, Use, and Transformation, Educational and Psychological Measurement 55(6):967-975.
[14] MacIntosh, R. (1998), Global Attitude Measurement: An Assessment of the World Values Survey Postmaterialism Scale, American Sociological Review 63(1):452-464.
[15] Martin, E.A., P.C. Campanelli, and R.E. Fay (1991), An Application of Rasch Analysis to Questionnaire Design: Using Vignettes to Study the Meaning of Work in the Current Population Survey, The Statistician 40:265-276.
[16] Masters, G.N. (1982), A Rasch Model for Partial Credit Scoring, Psychometrika 47(2):149-174.
[17] Masters, G.N. (1985), A Comparison of Latent Trait and Latent Class Analyses of Likert-Type Data, Psychometrika 50(1):69-82.
[18] Masters, G.N., and B.D. Wright (1997), The Partial Credit Model, in van der Linden, W. and R.K. Hambleton (eds.), Handbook of Modern Item Response Theory, New York: Springer-Verlag.
[19] Murray, C.J.L. (1996), Epidemiology and Morbidity Transitions in India, in DasGupta, M., L.C. Chen, and T.N. Krishnan (eds.), Health, Poverty and Development in India, Delhi: Oxford University Press.
[20] Murray, C.J.L., A. Tandon, J. Salomon, C.D. Mathers, and R. Sadana (2001), Cross-Population Comparability of Evidence for Health Policy, Global Programme on Evidence for Health Policy Discussion Paper, Geneva: World Health Organization.
[21] Powers, D.A., and Y. Xie (2000), Statistical Methods for Categorical Data Analysis, San Diego, California: Academic Press.
[22] Rabin, R., and F. deCharro (2001), EQ-5D: A Measure of Health Status from the EuroQol Group, Annals of Medicine 33:337-343.
[23] Sadana, R., C.D. Mathers, A.D. Lopez, C.J.L. Murray, and K. Iburg (2001), Cross-Population Comparability, in Murray, C.J.L. and J. Frenk (eds.), Health System Performance: Concepts, Measurement, and Determinants, Geneva: World Health Organization.
[24] Salomon, J.A., A. Tandon, and C.J.L. Murray (2001), Using Vignettes to Improve Cross-Population Comparability of Health Surveys: Concepts, Design and Evaluation Techniques, Global Programme on Evidence for Health Policy Discussion Paper No. 41, Geneva: World Health Organization.
[25] Samejima, F. (1997), Graded Response Model, in van der Linden, W. and R.K. Hambleton (eds.), Handbook of Modern Item Response Theory, New York: Springer-Verlag.
[26] Tandon, A., S. Chatterji, B. Ustun, J.A. Salomon, and C.J.L. Murray (2001), Cross-Validation of Cut-Point Estimation Using Measured Tests and Vignettes: The Case of Vision, Global Programme on Evidence for Health Policy Discussion Paper, Geneva: World Health Organization.
[27] Tjur, T. (1982), A Connection Between Rasch's Item Analysis Model and a Multiplicative Poisson Model, Scandinavian Journal of Statistics 9:23-30.
[28] Ustun, T.B., S. Chatterji, M. Villanueva et al. (2001), WHO Multi-Country Household Survey Study on Health and Responsiveness 2000-2001, Global Programme on Evidence for Health Policy Working Paper, Geneva: World Health Organization.
[29] van der Linden, W., and R.K. Hambleton (eds.) (1997), Handbook of Modern Item Response Theory, New York: Springer-Verlag.
[30] von Davier, M. (2001), WINMIRA User Manual, Kiel: IPN.
[31] Ware, J.E., and B. Gandek (1998), Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project, Journal of Clinical Epidemiology 51(11):903-912.
[32] Wright, B.D., and M. Mok (2000), Rasch Models Overview, Journal of Applied Measurement 1(1):83-106.
[33] Wright, B.D., and G.N. Masters (1982), Rating Scale Analysis: Rasch Measurement, Chicago: MESA Press.