
Measurement Validity: A Shared Standard for Qualitative and Quantitative Research

Author(s): Robert Adcock and David Collier


Source: The American Political Science Review, Vol. 95, No. 3 (Sep., 2001), pp. 529-546
Published by: American Political Science Association
Stable URL: http://www.jstor.org/stable/3118231
Accessed: 04/12/2010 18:02

American Political Science Review Vol. 95, No. 3 September 2001

Measurement Validity: A Shared Standard for Qualitative and Quantitative Research
ROBERT ADCOCK and DAVID COLLIER University of California, Berkeley
Scholars routinely make claims that presuppose the validity of the observations and measurements that
operationalize their concepts. Yet, despite recent advances in political science methods, surprisingly
little attention has been devoted to measurement validity. We address this gap by exploring four themes.
First, we seek to establish a shared framework that allows quantitative and qualitative scholars to assess
more effectively, and communicate about, issues of valid measurement. Second, we underscore the need to
draw a clear distinction between measurement issues and disputes about concepts. Third, we discuss the
contextual specificity of measurement claims, exploring a variety of measurement strategies that seek to
combine generality and validity by devoting greater attention to context. Fourth, we address the proliferation
of terms for alternative measurement validation procedures and offer an account of the three main types of
validation most relevant to political scientists.

Robert Adcock (adcockr@uclink4.berkeley.edu) is a Ph.D. candidate, Department of Political Science, and
David Collier (dcollier@socrates.berkeley.edu) is Professor of Political Science, University of California,
Berkeley, CA 94720-1950.
Among the many colleagues who have provided helpful comments on this article, we especially thank
Christopher Achen, Kenneth Bollen, Henry Brady, Edward Carmines, Rubette Cowan, Paul Dosh, Zachary Elkins,
John Gerring, Kenneth Greene, Ernst Haas, Edward Haertel, Peter Houtzager, Diana Kapiszewski, Gary King,
Marcus Kurtz, James Mahoney, Sebastian Mazzuca, Doug McAdam, Gerardo Munck, Charles Ragin, Sally Roever,
Eric Schickler, Jason Seawright, Jeff Sluyter, Richard Snyder, Ruth Stanley, Laura Stoker, and three
anonymous reviewers. The usual caveats apply. Robert Adcock's work on this project was supported by a
National Science Foundation Graduate Fellowship.

Researchers routinely make complex choices about linking concepts to observations, that is, about
connecting ideas with facts. These choices raise the basic question of measurement validity: Do the
observations meaningfully capture the ideas contained in the concepts? We will explore the meaning of
this question as well as procedures for answering it. In the process we seek to formulate a methodological
standard that can be applied in both qualitative and quantitative research.

Measurement validity is specifically concerned with whether operationalization and the scoring of cases
adequately reflect the concept the researcher seeks to measure. This is one aspect of the broader set of
analytic tasks that King, Keohane, and Verba (1994, chap. 2) call "descriptive inference," which also
encompasses, for example, inferences from samples to populations. Measurement validity is distinct from
the validity of "causal inference" (chap. 3), which Cook and Campbell (1979) further differentiate into
internal and external validity.1 Although measurement validity is interconnected with causal inference,
it stands as an important methodological topic in its own right.

1 These involve, respectively, the validity of causal inferences about the cases being studied, and the
generalizability of causal inferences to a broader set of cases (Cook and Campbell 1979, 50-9, 70-80).

New attention to measurement validity is overdue in political science. While there has been an ongoing
concern with applying various tools of measurement validation (Berry et al. 1998; Bollen 1993; Elkins 2000;
Hill, Hanna, and Shafqat 1997; Schrodt and Gerner 1994), no major statement on this topic has appeared
since Zeller and Carmines (1980) and Bollen (1989). Although King, Keohane, and Verba (1994, 25, 152-5)
cover many topics with remarkable thoroughness, they devote only brief attention to measurement validity.
New thinking about measurement, such as the idea of measurement as theory testing (Jacoby 1991, 1999), has
not been framed in terms of validity.

Four important problems in political science research can be addressed through renewed attention to
measurement validity. The first is the challenge of establishing shared standards for quantitative and
qualitative scholars, a topic that has been widely discussed (King, Keohane, and Verba 1994; see also
Brady and Collier 2001; George and Bennett n.d.). We believe the skepticism with which qualitative and
quantitative researchers sometimes view each other's measurement tools does not arise from irreconcilable
methodological differences. Indeed, substantial progress can be made in formulating shared standards for
assessing measurement validity. The literature on this topic has focused almost entirely on quantitative
research, however, rather than on integrating the two traditions. We propose a framework that yields
standards for measurement validation and we illustrate how these apply to both approaches. Many of our
quantitative and qualitative examples are drawn from recent comparative work on democracy, a literature
in which both groups of researchers have addressed similar issues. This literature provides an
opportunity to identify parallel concerns about validity as well as differences in specific practices.

A second problem concerns the relation between measurement validity and disputes about the meaning of
concepts. The clarification and refinement of concepts is a fundamental task in political science, and
carefully developed concepts are, in turn, a major prerequisite for meaningful discussions of measurement
validity. Yet, we argue that disputes about concepts involve different issues from disputes about
measurement validity. Our framework seeks to make this distinction clear, and we illustrate both types
of disputes.

A third problem concerns the contextual specificity
of measurement validity, an issue that arises when a measure that is valid in one context is invalid in
another. We explore several responses to this problem that seek a middle ground between a universalizing
tendency, which is inattentive to contextual differences, and a particularizing approach, which is
skeptical about the feasibility of constructing measures that transcend specific contexts. The responses
we explore seek to incorporate sensitivity to context as a strategy for establishing equivalence across
diverse settings.

A fourth problem concerns the frequently confusing language used to discuss alternative procedures for
measurement validation. These procedures have often been framed in terms of different "types of validity,"
among which content, criterion, convergent, and construct validity are the best known. Numerous other
labels for alternative types have also been coined, and we have found 37 different adjectives that have
been attached to the noun "validity" by scholars wrestling with issues of conceptualization and
measurement.2 The situation sometimes becomes further confused, given contrasting views on the
interrelations among different types of validation. For example, in recent validation studies in political
science, one valuable analysis (Hill, Hanna, and Shafqat 1997) treats "convergent" validation as providing
evidence for "construct" validation, whereas another (Berry et al. 1998) treats these as distinct types.
In the psychometrics tradition (i.e., in the literature on psychological and educational testing) such
problems have spurred a theoretically productive reconceptualization. This literature has emphasized that
the various procedures for assessing measurement validity must be seen, not as establishing multiple
independent types of validity, but rather as providing different types of evidence for validity. In light
of this reconceptualization, we differentiate between "validity" and "validation." We use validity to
refer only to the overall idea of measurement validity, and we discuss alternative procedures for
assessing validity as different "types of validation." In the final part of this article we offer an
overview of three main types of validation, seeking to emphasize how procedures associated with each can
be applied by both quantitative and qualitative researchers.

2 We have found the following adjectives attached to validity in discussions of conceptualization and
measurement: a priori, apparent, assumption, common-sense, conceptual, concurrent, congruent, consensual,
consequential, construct, content, convergent, criterion-related, curricular, definitional, differential,
discriminant, empirical, face, factorial, incremental, instrumental, intrinsic, linguistic, logical,
nomological, postdictive, practical, pragmatic, predictive, rational, response, sampling, status,
substantive, theoretical, and trait. A parallel proliferation of adjectives, in relation to the concept of
democracy, is discussed in Collier and Levitsky 1997.

In the first section of this article we introduce a framework for discussing conceptualization,
measurement, and validity. We then situate questions of validity in relation to broader concerns about the
meaning of concepts. Next, we address contextual specificity and equivalence, followed by a review of the
evolving discussion of types of validation. Finally, we focus on three specific types of validation that
merit central attention in political science: content, convergent/discriminant, and nomological/construct
validation.

OVERVIEW OF MEASUREMENT VALIDITY

Measurement validity should be understood in relation to issues that arise in moving between concepts and
observations.

Levels and Tasks

We depict the relationship between concepts and observations in terms of four levels, as shown in Figure 1.
At the broadest level is the background concept, which encompasses the constellation of potentially diverse
meanings associated with a given concept. Next is the systematized concept, the specific formulation of a
concept adopted by a particular researcher or group of researchers. It is usually formulated in terms of an
explicit definition. At the third level are indicators, which are also routinely called measures. This
level includes any systematic scoring procedure, ranging from simple measures to complex aggregated
indexes. It encompasses not only quantitative indicators but also the classification procedures employed
in qualitative research. At the fourth level are scores for cases, which include both numerical scores and
the results of qualitative classification.

Downward and upward movement in Figure 1 can be understood as a series of research tasks. On the left-hand
side, conceptualization is the movement from the background concept to the systematized concept.
Operationalization moves from the systematized concept to indicators, and the scoring of cases applies
indicators to produce scores. Moving up on the right-hand side, indicators may be refined in light of
scores, and systematized concepts may be fine-tuned in light of knowledge about scores and indicators.
Insights derived from these levels may lead to revisiting the background concept, which may include
assessing alternative formulations of the theory in which a particular systematized concept is embedded.
Finally, to define a key overarching term, "measurement" involves the interaction among levels 2 to 4.

Defining Measurement Validity

Valid measurement is achieved when scores (including the results of qualitative classification)
meaningfully capture the ideas contained in the corresponding concept. This definition parallels that of
Bollen (1989, 184), who treats validity as "concerned with whether a variable measures what it is supposed
to measure." King, Keohane, and Verba (1994, 25) give essentially the same definition.

If the idea of measurement validity is to do serious methodological work, however, its focus must be
further specified, as emphasized by Bollen (1989, 197). Our specification involves both ends of the
connection between concepts and scores shown in Figure 1. At the concept end, our basic point (explored in
detail below) is that measurement validation should focus on the


FIGURE 1. Conceptualization and Measurement: Levels and Tasks

Levels
Level 1. Background Concept. The broad constellation of meanings and understandings associated with a
given concept.
Level 2. Systematized Concept. A specific formulation of a concept used by a given scholar or group of
scholars; commonly involves an explicit definition.
Level 3. Indicators. Also referred to as "measures" and "operationalizations." In qualitative research,
these are the operational definitions employed in classifying cases.
Level 4. Scores for Cases. The scores for cases generated by a particular indicator. These include both
numerical scores and the results of qualitative classification.

Tasks moving downward (left-hand side)
Task: Conceptualization (Level 1 to Level 2). Formulating a systematized concept through reasoning about
the background concept, in light of the goals of research.
Task: Operationalization (Level 2 to Level 3). Developing, on the basis of a systematized concept, one or
more indicators for scoring/classifying cases.
Task: Scoring Cases (Level 3 to Level 4). Applying these indicators to produce scores for the cases being
analyzed.

Tasks moving upward (right-hand side)
Task: Refining Indicators (Level 4 to Level 3). Modifying indicators, or potentially creating new
indicators, in light of observed scores.
Task: Modifying Systematized Concept (Level 3 to Level 2). Fine-tuning the systematized concept, or
possibly extensively revising it, in light of insights about scores and indicators.
Task: Revisiting Background Concept (Level 2 to Level 1). Exploring broader issues concerning the
background concept in light of insights about scores, indicators, and the systematized concept.
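
The levels and downward tasks of Figure 1 can be mirrored in a short Python sketch. This is a minimal
illustration under stated assumptions, not a procedure taken from the article: the class names, the
one-line procedural definition, the classification rule, and the two fictional cases are all hypothetical
and were invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SystematizedConcept:
    """Level 2: one explicit formulation drawn from the background concept (Level 1)."""
    background_concept: str
    definition: str


@dataclass(frozen=True)
class Indicator:
    """Level 3: a scoring or classification rule that operationalizes one systematized concept."""
    concept: SystematizedConcept
    rule: Callable[[Dict[str, bool]], str]


def score_cases(indicator: Indicator, cases: Dict[str, Dict[str, bool]]) -> Dict[str, str]:
    """Task 'scoring cases': apply the indicator to each case, yielding Level 4 scores."""
    return {name: indicator.rule(observations) for name, observations in cases.items()}


# Conceptualization (hypothetical): pick one formulation out of the background concept.
procedural = SystematizedConcept(
    background_concept="democracy",
    definition="contested elections with broad suffrage",
)


# Operationalization (hypothetical): a qualitative classification rule.
def classify(observations: Dict[str, bool]) -> str:
    meets_definition = observations["contested_elections"] and observations["broad_suffrage"]
    return "democratic" if meets_definition else "nondemocratic"


indicator = Indicator(concept=procedural, rule=classify)

# Invented observations for two fictional cases.
cases = {
    "Country A": {"contested_elections": True, "broad_suffrage": True},
    "Country B": {"contested_elections": True, "broad_suffrage": False},
}

scores = score_cases(indicator, cases)
print(scores)
```

In this sketch every score is tied to an indicator, and every indicator to one systematized concept, so a
question about the validity of a score is posed against `procedural.definition` rather than against the
broader background concept.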

relation between observations and the systematized concept; any potential disputes about the background
concept should be set aside as an important but separate issue. With regard to scores, an obvious but
crucial point must be stressed: Scores are never examined in isolation; rather, they are interpreted and
given meaning in relation to the systematized concept.

In sum, measurement is valid when the scores (level 4 in Figure 1), derived from a given indicator
(level 3), can meaningfully be interpreted in terms of the systematized concept (level 2) that the
indicator seeks to operationalize. It would be cumbersome to refer repeatedly to all these elements, but
the appropriate focus of measurement validation is on the conjunction of these components.

Measurement Error, Reliability, and Validity

Validity is often discussed in connection with measurement error and reliability. Measurement error may be
systematic (in which case it is called bias) or random. Random error, which occurs when repeated
applications of a given measurement procedure yield inconsistent results, is conventionally labeled a
problem of reliability. Methodologists offer two accounts of the relation between reliability and
validity. (1) Validity is sometimes understood as exclusively involving bias, that is, error that takes a
consistent direction or form. From this perspective, validity involves systematic error, whereas
reliability involves random error (Carmines and Zeller 1979, 14-5; see also Babbie 2001,
144-5). Therefore, unreliable scores may still be correct "on average" and in this sense valid.
(2) Alternatively, some scholars hesitate to view scores as valid if they contain large amounts of random
error. They believe validity requires the absence of both types of error. Therefore, they view reliability
as a necessary but not sufficient condition of measurement validity (Kirk and Miller 1986, 20; Shively
1998, 45).

Our goal is not to adjudicate between these accounts but to state them clearly and to specify our own
focus, namely, the systematic error that arises when the links among systematized concepts, indicators,
and scores are poorly developed. This involves validity in the first sense stated above. Of course, the
random error that routinely arises in scoring cases is also important, but it is not our primary concern.

A final point should be emphasized. Because error is a pervasive threat to measurement, it is essential to
view the interpretations of scores in relation to systematized concepts as falsifiable claims (Messick
1989, 13-4). Scholars should treat these claims just as they would any causal hypothesis, that is, as
tentative statements that require supporting evidence. Validity assessment is the search for this
evidence.

MEASUREMENT VALIDITY AND CHOICES ABOUT CONCEPTS

A growing body of work considers the systematic analysis of concepts an important component of political
science methodology.3 How should we understand the relation between issues of measurement validity and
broader choices about concepts, which are a central focus of this literature?

3 Examples of earlier work in this tradition are Sartori 1970, 1984 and Sartori, Riggs, and Teune 1975.
More recent studies include Collier and Levitsky 1997; Collier and Mahon 1993; Gerring 1997, 1999, 2001;
Gould 1999; Kurtz 2000; Levitsky 1998; Schaffer 1998. Important work in political theory includes Bevir
1999; Freeden 1996; Gallie 1956; Pitkin 1967, 1987.

Conceptual Choices: Forming the Systematized Concept

We view systematized concepts as the point of departure for assessing measurement validity. How do
scholars form such concepts? Because background concepts routinely include a variety of meanings, the
formation of systematized concepts often involves choosing among them. The number of feasible options
varies greatly. At one extreme are concepts such as triangle, which are routinely understood in terms of a
single conceptual systematization; at the other extreme are "contested concepts" (Gallie 1956), such as
democracy. A careful examination of diverse meanings helps clarify the options, but ultimately choices
must be made.

These choices are deeply intertwined with issues of theory, as emphasized in Kaplan's (1964, 53) paradox
of conceptualization: "Proper concepts are needed to formulate a good theory, but we need a good theory to
arrive at the proper concepts.... The paradox is resolved by a process of approximation: the better our
concepts, the better the theory we can formulate with them, and in turn, the better the concepts available
for the next, improved theory." Various examples of this intertwining are explored in recent analyses of
important concepts, such as Laitin's (2000) treatment of language community and Kurtz's (2000) discussion
of peasant. Fearon and Laitin's (2000) analysis of ethnic conflict, in which they begin with their
hypothesis and ask what operationalization is needed to capture the conceptions of ethnic group and ethnic
conflict entailed in this hypothesis, further illustrates the interaction of theory and concepts.

In dealing with the choices that arise in establishing the systematized concept, researchers must avoid
three common traps. First, they should not misconstrue the flexibility inherent in these choices as
suggesting that everything is up for grabs. This is rarely, if ever, the case. In any field of inquiry,
scholars commonly associate a matrix of potential meanings with the background concept. This matrix limits
the range of plausible options, and the researcher who strays outside it runs the risk of being dismissed
or misunderstood. We do not mean to imply that the background concept is entirely fixed. It evolves over
time, as new understandings are developed and old ones are revised or fall from use. At a given time,
however, the background concept usually provides a relatively stable matrix. It is essential to recognize
that a real choice is being made, but it is no less essential to recognize that this is a limited choice.

Second, scholars should avoid claiming too much in defending their choice of a given systematized concept.
It is not productive to treat other options as self-evidently ruled out by the background concept. For
example, in the controversy over whether democracy versus nondemocracy should be treated as a dichotomy or
in terms of gradations, there is too much reliance on claims that the background concept of democracy
inherently rules out one approach or the other (Collier and Adcock 1999, 546-50). It is more productive to
recognize that scholars routinely emphasize different aspects of a background concept in developing
systematized concepts, each of which is potentially plausible. Rather than make sweeping claims about what
the background concept "really" means, scholars should present specific arguments, linked to the goals and
context of their research, that justify their particular choices.

A third problem occurs when scholars stop short of providing a fleshed-out account of their systematized
concepts. This requires not just a one-sentence definition, but a broader specification of the meaning and
entailments of the systematized concept. Within the psychometrics literature, Shepard (1993, 417)
summarizes what is required: "both an internal model of interrelated dimensions or subdomains" of the
systematized concept, and "an external model depicting its relationship to other [concepts]." An example
is Bollen's (1990, 9-12; see also Bollen 1980) treatment of political democracy, which distinguishes the
two dimensions of "political rights" and "political liberties," clarifies these by contrasting them with
the dimensions
developed by Dahl, and explores the relation between them. Bollen further specifies political democracy
through contrasts with the concepts of stability and social or economic democracy. In the language of
Sartori (1984, 51-4), this involves clarifying the semantic field.

One consequence of this effort to provide a fleshed-out account may be the recognition that the concept
needs to be disaggregated. What begins as a consideration of the internal dimensions or components of a
single concept may become a discussion of multiple concepts. In democratic theory an important example is
the discussion of majority rule and minority rights, which are variously treated as components of a single
overall concept of democracy, as dimensions to be analyzed separately, or as the basis for forming
distinct subtypes of democracy (Dahl 1956; Lijphart 1984; Schmitter and Karl 1992). This kind of
refinement may result from new conceptual and theoretical arguments or from empirical findings of the sort
that are the focus of the convergent/discriminant validation procedures discussed below.

Measurement Validity and the Systematized Versus Background Concept

We stated earlier that the systematized concept, rather than the background concept, should be the focus
in measurement validation. Consider an example. A researcher may ask: "Is it appropriate that Mexico,
prior to the year 2000 (when the previously dominant party handed over power after losing the presidential
election), be assigned a score of 5 out of 10 on an indicator of democracy? Does this score really capture
how 'democratic' Mexico was compared to other countries?" Such a question remains underspecified until we
know whether "democratic" refers to a particular systematized concept of democracy, or whether this
researcher is concerned more broadly with the background concept of democracy. Scholars who question
Mexico's score should distinguish two issues: (1) a concern about measurement, namely whether the
indicator employed produces scores that can be interpreted as adequately capturing the systematized
concept used in a given study, and (2) a conceptual concern, namely whether the systematized concept
employed in creating the indicator is appropriate vis-à-vis the background concept of democracy.

We believe validation should focus on the first issue, whereas the second is outside the realm of
measurement validity. This distinction seems especially appropriate in view of the large number of
contested concepts in political science. The more complex and contested the background concept, the more
important it is to distinguish issues of measurement from fundamental conceptual disputes. To pose the
question of validity we need a specific conceptual referent against which to assess the adequacy of a
given measure. A systematized concept provides that referent. By contrast, if analysts seek to establish
measurement validity in relation to a background concept with multiple competing meanings, they may find a
different answer to the validity question for each meaning.

By restricting the focus of measurement validation to the systematized concept, we do not suggest that
political scientists should ignore basic conceptual issues. Rather, arguments about the background concept
and those about validity can be addressed adequately only when each is engaged on its own terms, rather
than conflated into one overly broad issue. Consider Schumpeter's (1947, chap. 21) procedural definition
of democracy. This definition explicitly rules out elements of the background concept, such as the concern
with substantive policy outcomes, that had been central to what he calls the classical theory of
democracy. Although Schumpeter's conceptualization has been very influential in political science, some
scholars (Harding and Petras 1988; Mouffe 1992) have called for a revised conception that encompasses
other concerns, such as social and economic outcomes. This important debate exemplifies the kind of
conceptual dispute that should be placed outside the realm of measurement validity.

Recognizing that a given conceptual choice does not involve an issue of measurement validity should not
preclude considered arguments about this choice. An example is the argument that minimal definitions can
facilitate causal assessment (Alvarez et al. 1996, 4; Karl 1990, 1-2; Linz 1975, 181-2; Sartori 1975, 34).
For instance, in the debate about a procedural definition of democracy, a pragmatic argument can be made
that if analysts wish to study the causal relationship between democracy and socioeconomic equality, then
the latter must be excluded from the systematization of the former. The point is that such arguments can
effectively justify certain conceptual choices, but they involve issues that are different from the
concerns of measurement validation.

Fine-Tuning the Systematized Concept with Friendly Amendments

We define measurement validity as concerned with the relation among scores, indicators, and the
systematized concept, but we do not rule out the introduction of new conceptual ideas during the
validation process. Key here is the back-and-forth, iterative nature of research emphasized in Figure 1.
Preliminary empirical work may help in the initial formulation of concepts. Later, even after
conceptualization appears complete, the application of a proposed indicator may produce unexpected
observations that lead scholars to modify their systematized concepts. These "friendly amendments" occur
when a scholar, out of a concern with validity, engages in further conceptual work to suggest refinements
or make explicit earlier implicit assumptions. These amendments are friendly because they do not
fundamentally challenge a systematized concept but instead push analysts to capture more adequately the
ideas contained in it.

A friendly amendment is illustrated by the emergence of the "expanded procedural minimum" definition of
democracy (Collier and Levitsky 1997, 442-4). Scholars noted that, despite free or relatively free
elections, some civilian governments in Central and South America to varying degrees lacked effective
power to govern. A basic concern was the persistence of "reserved domains" of military power over which
elected governments had little authority (Valenzuela 1992, 70). Because procedural definitions of
democracy did not explicitly address this issue, measures based upon them could result in a high democracy
score for these countries, but it appeared invalid to view them as democratic. Some scholars therefore
amended their systematized concept of democracy to add the differentiating attribute that the elected
government must to a reasonable degree have the power to rule (Karl 1990, 2; Loveman 1994, 108-13;
Valenzuela 1992, 70). Debate persists over the scoring of specific cases (Rabkin 1992, 165), but this
innovation is widely accepted among scholars in the procedural tradition (Huntington 1991, 10; Mainwaring,
Brinks, and Pérez-Liñán 2001; Markoff 1996, 102-4). As a result of this friendly amendment, analysts did a
better job of capturing, for these new cases, the underlying idea of procedural minimum democracy.

VALIDITY, CONTEXTUAL SPECIFICITY, AND EQUIVALENCE

Contextual specificity is a fundamental concern that arises when differences in context potentially
threaten the validity of measurement. This is a central topic in psychometrics, the field that has
produced the most innovative work on validity theory. This literature emphasizes that the same score on an
indicator may have different meanings in different contexts (Moss 1992, 236-8; see also Messick 1989, 15).
Hence, the validation of an interpretation of scores generated in one context does not imply that the same
interpretation is valid for scores generated in another context. In political science, this concern with
context can arise when scholars are making comparisons across different world regions or distinct
historical periods. It can also arise in comparisons within a national (or other) unit, given that
different subunits, regions, or subgroups may constitute very different political, social, or cultural
contexts.

The potential difficulty that context poses for valid measurement, and the related task of establishing
measurement equivalence across diverse units, deserve more attention in political science. In a period
when the quest for generality is a powerful impulse in the social sciences, scholars such as Elster (1999,
chap. 1)

Contextual Specificity in Political Research

Contextual specificity affects many areas of political science. It has long been a problem in
cross-national survey research (Sicinski 1970; Verba 1971; Verba, Nie, and Kim 1978, 32-40; Verba et al.
1987, Appendix). An example concerning features of national context is Cain and Ferejohn's (1981)
discussion of how the differing structure of party systems in the United States and Great Britain should
be taken into account when comparing party identification. Context is also a concern for survey
researchers working within a single nation, who wrestle with the dilemma of "inter-personally incomparable
responses" (Brady 1985). For example, scholars debate whether a given survey item has the same meaning for
different population subgroups, which could be defined, for example, by region, gender, class, or race.
One specific concern is whether population subgroups differ systematically in their "response style" (also
called "response sets"). Some groups may be more disposed to give extreme answers, and others may tend
toward moderate answers (Greenleaf 1992). Bachman and O'Malley (1984) show that response style varies
consistently with race. They argue that apparently important differences across racial groups may in part
reflect only a different manner of answering questions. Contextual specificity also can be a problem in
survey comparisons over time, as Baumgartner and Walker (1990) point out in discussing group membership in
the United States.

The issue of contextual specificity of course also arises in macro-level research in international and
comparative studies (Bollen, Entwisle, and Anderson 1993, 345). Examples from the field of comparative
politics are discussed below. In international relations, attention to context, and particularly a concern
with "historicizing the concept of structure," is central to "constructivism" (Ruggie 1998, 875).
Constructivists argue that modern international relations rest upon "constitutive rules" that differ
fundamentally from those of both medieval Christendom and the classical Greek world (p. 873). Although
they recognize that sovereignty is an organizing principle applicable across diverse settings, the
constructivists emphasize that the "meaning and behavioral implications of this principle vary from one
historical context to another" (Reus-Smit 1997, 567). On the other side of this debate, neorealists such
as Fischer (1993, 493) offer a general warning: If pushed to an extreme, the "claim to context dependency"
threatens to "make impossible the collective pursuit of empirical knowledge." He also offers specific
historical support for the basic neorealist posi-
have strongly challenged the plausibilityof seeking tion that the behaviorof actorsin internationalpolitics
general,law-likeexplanationsof politicalphenomena. follows consistent patterns. Fischer (1992, 463, 465)
A parallelconstrainton the generalityof findingsmay concludes that "the structurallogic of action under
be imposed by the contextualspecificityof measure- anarchyhas the characterof an objectivelaw,"whichis
ment validity.We are not arguingthat the quest for grounded in "an unchangingessence of human na-
generalitybe abandoned.Rather, we believe greater ture."
sensitivityto context may help scholarsdevelop mea- The recurringtension in social research between
sures that can be validly applied across diverse con- particularizing and universalizingtendenciesreflectsin
texts. This goal requires concerted attention to the part contrastingdegrees of concern with contextual
issue of equivalence. specificity.The approachesto establishingequivalence


discussed below point to the option of a middle ground. These approaches recognize that contextual differences are important, but they seek to combine this insight with the quest for general knowledge.

The lessons for political science are clear. Any empirical assessment of measurement validity is necessarily based on a particular set of cases, and validity claims should be made, at least initially, with reference to this specific set. To the extent that the set is heterogeneous in ways that may affect measurement validity, it is essential to (1) assess the implications for establishing equivalence across these diverse contexts and, if necessary, (2) adopt context-sensitive measures. Extension to additional cases requires similar procedures.

Establishing Equivalence: Context-Specific Domains of Observation

One important means of establishing equivalence across diverse contexts is careful reasoning, in the initial stages of operationalization, about the specific domains to which a systematized concept applies. Well before thinking about particular scoring procedures, scholars may need to make context-sensitive choices regarding the parts of the broader polity, economy, or society to which they will apply their concept. Equivalent observations may require, in different contexts, a focus on what at a concrete level might be seen as distinct types of phenomena.

Some time ago, Verba (1967) called attention to the importance of context-specific domains of observation. In comparative research on political opposition in stable democracies, a standard focus is on political parties and legislative politics, but Verba (pp. 122-3) notes that this may overlook an analytically equivalent form of opposition that crystallizes, in some countries, in the domain of interest group politics. Skocpol (1992, 6) makes a parallel argument in questioning the claim that the United States was a "welfare laggard" because social provision was not launched on a large scale until the New Deal. This claim is based on the absence of standard welfare programs of the kind that emerged earlier in Europe but fails to recognize the distinctive forms of social provision in the United States, such as veterans' benefits and support for mothers and children. Skocpol argues that the welfare laggard characterization resulted from looking in the wrong place, that is, in the wrong domain of policy.

Locke and Thelen (1995, 1998) have extended this approach in their discussion of "contextualized comparison." They argue that scholars who study national responses to external pressure for economic decentralization and "flexibilization" routinely focus on the points at which conflict emerges over this economic transformation. Yet, these "sticking points" may be located in different parts of the economic and political system. With regard to labor politics in different countries, such conflicts may arise over wage equity, hours of employment, workforce reduction, or shop-floor reorganization. These different domains of labor relations must be examined in order to gather analytically equivalent observations that adequately tap the concept of sticking point. Scholars who look only at wage conflicts run the risk of omitting, for some national contexts, domains of conflict that are highly relevant to the concept they seek to measure.

By allowing the empirical domain to which a systematized concept is applied to vary across the units being compared, analysts may take a productive step toward establishing equivalence among diverse contexts. This practice must be carefully justified, but under some circumstances it can make an important contribution to valid measurement.

Establishing Equivalence: Context-Specific Indicators and Adjusted Common Indicators

Two other ways of establishing equivalence involve careful work at the level of indicators. We will discuss context-specific indicators,4 and what we call adjusted common indicators. In this second approach, the same indicator is applied to all cases but is weighted to compensate for contextual differences.

4 This approach was originally used by Przeworski and Teune (1970, chap. 6), who employed the label "system-specific indicator."

An example of context-specific indicators is found in Nie, Powell, and Prewitt's (1969, 377) five-country study of political participation. For all the countries, they analyze four relatively standard attributes of participation. Regarding a fifth attribute, membership in a political party, they observe that in four of the countries party membership has a roughly equivalent meaning, but in the United States it has a different form and meaning. The authors conclude that involvement in U.S. electoral campaigns reflects an equivalent form of political participation. Nie, Powell, and Prewitt thus focus on a context-specific domain of observation (the procedure just discussed above) by shifting their attention, for the U.S. context, from party membership to campaign participation. They then take the further step of incorporating within their overall index of political participation context-specific indicators that for each case generate a score for what they see as the appropriate domain. Specifically, the overall index for the United States includes a measure of campaign participation rather than party membership.

A different example of context-specific indicators is found in comparative-historical research, in the effort to establish a meaningful threshold for the onset of democracy in the nineteenth and early twentieth century, as opposed to the late twentieth century. This effort in turn lays a foundation for the comparative analysis of transitions to democracy. One problem in establishing equivalence across these two eras lies in the fact that the plausible agenda of "full" democratization has changed dramatically over time. "Full" by the standards of an earlier period is incomplete by later standards. For example, by the late twentieth century, universal suffrage and the protection of civil rights for the entire national population had come to be considered essential features of democracy, but in the nineteenth century they were not (Huntington 1991, 7, 16). Yet, if the more recent standard is applied to the earlier period, cases are eliminated that have long been considered classic examples of nascent democratization in Europe. One solution is to compare regimes with respect to a systematized concept of full democratization that is operationalized according to the norms of the respective periods (Collier 1999, chap. 1; Russett 1993, 15; see also Johnson 1999, 118). Thus, a different scoring procedure (a context-specific indicator) is employed in each period in order to produce scores that are comparable with respect to this systematized concept.5

5 A well-known example of applying different standards for democracy in making comparisons across international regions is Lipset (1959, 73-4).

Adjusted common indicators are another way to establish equivalence. An example is found in Moene and Wallerstein's (2000) quantitative study of social policy in advanced industrial societies, which focuses specifically on public social expenditures for individuals outside the labor force. One component of their measure is public spending on health care. The authors argue that in the United States such health care expenditures largely target those who are not members of the labor force. By contrast, in other countries health expenditures are allocated without respect to employment status. Because U.S. policy is distinctive, the authors multiply health care expenditures in the other countries by a coefficient that lowers their scores on this variable. Their scores are thereby made roughly equivalent, as part of a measure of public expenditures on individuals outside the labor force, to the U.S. score. A parallel effort to establish equivalence in the analysis of economic indicators is provided by Zeitsch, Lawrence, and Salernian (1994, 169), who use an adjustment technique in estimating total factor productivity to take account of the different operating environments, and hence the different context, of the industries they compare. Expressing indicators in per-capita terms is also an example of adjusting indicators in light of context. Overall, this practice is used to address both very specific problems of equivalence, as with the Moene and Wallerstein example, as well as more familiar concerns, such as standardizing by population.

Context-specific indicators and adjusted common indicators are not always a step forward, and some scholars have self-consciously avoided them. The use of such indicators should match the analytic goal of the researcher. For example, many who study democratization in the late twentieth century deliberately adopt a minimum definition of democracy in order to concentrate on a limited set of formal procedures. They do this out of a conviction that these formal procedures are important, even though they may have different meanings in particular settings. Even a scholar such as O'Donnell (1993, 1355), who has devoted great attention to contextualizing the meaning of democracy, insists on the importance of also retaining a minimal definition of "political democracy" that focuses on basic formal procedures. Thus, for certain purposes, it can be analytically productive to adopt a standard definition that ignores nuances of context and apply the same indicator to all cases.

In conclusion, we note that although Przeworski and Teune's (1970) and Verba's arguments about equivalence are well known, issues of contextual specificity and equivalence have not received adequate attention in political science. We have identified three tools (context-specific domains of observation, context-specific indicators, and adjusted common indicators) for addressing these issues, and we encourage their wider use. We also advocate greater attention to justifying their use. Claims about the appropriateness of contextual adjustments should not simply be asserted; their validity needs to be carefully defended. Later, we explore three types of validation that may be fruitfully applied in assessing proposals for context-sensitive measurement. In particular, content validation, which focuses on whether operationalization captures the ideas contained in the systematized concept, is central to determining whether and how measurement needs to be adjusted in particular contexts.

ALTERNATIVE PERSPECTIVES ON TYPES OF VALIDATION

Discussions of measurement validity are confounded by the proliferation of different types of validation, and by an even greater number of labels for them. In this section we review the emergence of a unified conception of measurement validity in the field of psychometrics, propose revisions in the terminology for talking about validity, and examine the important treatments of validation in political analysis offered by Carmines and Zeller, and by Bollen.

Evolving Understandings of Validity

In the psychometric tradition, current thinking about measurement validity developed in two phases. In the first phase, scholars wrote about "types of validity" in a way that often led researchers to treat each type as if it independently established a distinct form of validity. In discussing this literature we follow its terminology by referring to types of "validity." As noted above, in the rest of this article we refer instead to types of "validation."

6 The second of these is often called criterion-related validity. Regarding these official standards, see American Psychological Association 1954, 1966; Angoff 1988, 25; Messick 1989, 16-7; Shultz, Riggs, and Kottke 1998, 267-9. The 1954 standards initially presented four types of validity, which became the threefold typology in 1966 when "predictive" and "concurrent" validity were combined as "criterion-related" validity.

The first pivotal development in the emergence of a unified approach occurred in the 1950s and 1960s, when a threefold typology of content, criterion, and construct validity was officially established in reaction to the confusion generated by the earlier proliferation of types.6 Other labels continued to appear in other disciplines, but this typology became an orthodoxy in

psychology. A recurring metaphor in that field characterized the three types as "something of a holy trinity representing three different roads to psychometric salvation" (Guion 1980, 386). These types may be briefly defined as follows.

* Content validity assesses the degree to which an indicator represents the universe of content entailed in the systematized concept being measured.
* Criterion validity assesses whether the scores produced by an indicator are empirically associated with scores for other variables, called criterion variables, which are considered direct measures of the phenomenon of concern.
* Construct validity has had a range of meanings. One central focus has been on assessing whether a given indicator is empirically associated with other indicators in a way that conforms to theoretical expectations about their interrelationship.

These labels remain very influential and are still the centerpiece in some discussions of measurement validity, as in the latest edition of Babbie's (2001, 143-4) widely used methods textbook for undergraduates.

The second phase grew out of increasing dissatisfaction with the "trinity" and led to a "unitarian" approach (Shultz, Riggs, and Kottke 1998, 269-71). A basic problem identified by Guion (1980, 386) and others was that the threefold typology was too often taken to mean that any one type was sufficient to establish validity (Angoff 1988, 25). Scholars increasingly argued that the different types should be subsumed under a single concept. Hence, to continue with the prior metaphor, the earlier trinity came to be seen "in a monotheistic mode as the three aspects of a unitary psychometric divinity" (p. 25).

Much of the second phase involved a reconceptualization of construct validity and its relation to content and criterion validity. A central argument was that the latter two may each be necessary to establish validity, but neither is sufficient. They should be understood as part of a larger process of validation that integrates "multiple sources of evidence" and requires the combination of "logical argument and empirical evidence" (Shepard 1993, 406). Alongside this development, a reconceptualization of construct validity led to "a more comprehensive and theory-based view that subsumed other more limited perspectives" (Shultz, Riggs, and Kottke 1998, 270). This broader understanding of construct validity as the overarching goal of a single, integrated process of measurement validation is widely endorsed by psychometricians. Moss (1995, 6) states "there is a close to universal consensus among validity theorists" that "content- and criterion-related evidence of validity are simply two of many types of evidence that support construct validity."

Thus, in the psychometric literature (e.g., Messick 1980, 1015), the term "construct validity" has become essentially a synonym for what we call measurement validity. We have adopted measurement validity as the name for the overall topic of this article, in part because in political science the label construct validity commonly refers to specific procedures rather than to the general idea of valid measurement. These specific procedures generally do not encompass content validation and have in common the practice of assessing measurement validity by taking as a point of reference established conceptual and/or theoretical relationships.

We find it helpful to group these procedures into two types according to the kind of theoretical or conceptual relationship that serves as the point of reference. Specifically, these types are based on the heuristic distinction between description and explanation.7 First, some procedures rely on "descriptive" expectations concerning whether given attributes are understood as facets of the same phenomenon. This is the focus of what we label "convergent/discriminant validation." Second, other procedures rely on relatively well-established "explanatory" causal relations as a baseline against which measurement validity is assessed. In labeling this second group of procedures we draw on Campbell's (1960, 547) helpful term, "nomological" validation, which evokes the idea of assessment in relation to well-established causal hypotheses or law-like relationships. This second type is often called construct validity in political research (Berry et al. 1998; Elkins 2000).8 Out of deference to this usage, in the headings and summary statements below we will refer to nomological/construct validation.

7 Description and explanation are of course intertwined, but we find this distinction invaluable for exploring contrasts among validation procedures. While these procedures do not always fit in sharply bounded categories, many do indeed focus on either descriptive or explanatory relations and hence are productively differentiated by our typology.

8 See also the main examples of construct validation presented in the major statements by Carmines and Zeller 1979, 23, and Bollen 1989, 189-90.

Types of Validation in Political Analysis

A baseline for the revised discussion of validation presented below is provided in work by Carmines and Zeller, and by Bollen. Carmines and Zeller (1979, 26; Zeller and Carmines 1980, 78-80) argue that content validation and criterion validation are of limited utility in fields such as political science. While recognizing that content validation is important in psychology and education, they argue that evaluating it "has proved to be exceedingly difficult with respect to measures of the more abstract phenomena that tend to characterize the social sciences" (Carmines and Zeller 1979, 22). For criterion validation, these authors emphasize that in many social sciences, few "criterion" variables are available that can serve as "real" measures of the phenomena under investigation, against which scholars can evaluate alternative measures (pp. 19-20). Hence, for many purposes it is simply not a relevant procedure. Although Carmines and Zeller call for the use of multiple sources of evidence, their emphasis on the limitations of the first two types of validation leads them to give a predominant role to nomological/construct validation.

In relation to Carmines and Zeller, Bollen (1989, 185-6, 190-4) adds convergent/discriminant validation

to their three types and emphasizes content validation, which he sees as both viable and fundamental. He also raises general concerns about correlation-based approaches to convergent and nomological/construct validation, and he offers an alternative approach based on structural equation modeling with latent variables (pp. 192-206). Bollen shares the concern of Carmines and Zeller that, for most social research, "true" measures do not exist against which criterion validation can be carried out, so he likewise sees this as a less relevant type (p. 188).

These valuable contributions can be extended in several respects. First, with reference to Carmines and Zeller's critique of content validation, we recognize that this procedure is harder to use if concepts are abstract and complex. Moreover, it often does not lend itself to the kind of systematic, quantitative analysis routinely applied in some other kinds of validation. Yet, like Bollen (1989, 185-6, 194), we are convinced it is possible to lay a secure foundation for content validation that will make it a viable, and indeed essential, procedure. Our discussion of this task below derives from our distinction between the background and the systematized concept.

Second, we share the conviction of Carmines and Zeller that nomological/construct validation is important, yet given our emphasis on content and convergent/discriminant validation, we do not privilege it to the degree they do. Our discussion will seek to clarify some aspects of how this procedure actually works and will address the skeptical reaction of many scholars to it.

Third, we have a twofold response to the critique of criterion validation as irrelevant to most forms of social research. On the one hand, in some domains criterion validation is important, and this must be recognized. For example, the literature on response validity in survey research seeks to evaluate individual responses to questions, such as whether a person voted in a particular election, by comparing them to official voting records (Anderson and Silver 1986; Clausen 1968; Katosh and Traugott 1980). Similarly, in panel studies it is possible to evaluate the adequacy of "recall" (i.e., whether respondents remember their own earlier opinions, dispositions, and behavior) through comparison with responses in earlier studies (Niemi, Katz, and Newman 1980). On the other hand, this is not one of the most generally applicable types of validation, and we favor treating it as one subtype within the broader category of convergent validation. As discussed below, convergent validation compares a given indicator with one or more other indicators of the concept, in which the analyst may or may not have a higher level of confidence. Even if these other indicators are as fallible as the indicator being evaluated, the comparison provides greater leverage than does looking only at one of them in isolation. To the extent that a well-established, direct measure of the phenomenon under study is available, convergent validation is essentially the same as criterion validation.

Finally, in contrast both to Carmines and Zeller and to Bollen, we will discuss the application of the different types of validation in qualitative as well as quantitative research, using examples drawn from both traditions. Furthermore, we will employ crucial distinctions introduced above, including the differentiation of levels presented in Figure 1, as well as the contrast between specific procedures for validation, as opposed to the overall idea of measurement validity.

THREE TYPES OF MEASUREMENT VALIDATION: QUALITATIVE AND QUANTITATIVE EXAMPLES

We now discuss various procedures, both qualitative and quantitative, for assessing measurement validity. We organize our presentation in terms of a threefold typology: content, convergent/discriminant, and nomological/construct validation. The goal is to explicate each of these types by posing a basic question that, in all three cases, can be addressed by both qualitative and quantitative scholars. Two caveats should be introduced. First, while we discuss correlation-based approaches to validity assessment, this article is not intended to provide a detailed or exhaustive account of relevant statistical tests. Second, we recognize that no rigid boundaries exist among alternative procedures, given that one occasionally shades off into another. Our typology is a heuristic device that shows how validation procedures can be grouped in terms of basic questions, and thereby helps bring into focus parallels and contrasts in the approaches to validation adopted by qualitative and quantitative researchers.

Content Validation

Basic Question. In the framework of Figure 1, does a given indicator (level 3) adequately capture the full content of the systematized concept (level 2)? This "adequacy of content" is assessed through two further questions. First, are key elements omitted from the indicator? Second, are inappropriate elements included in the indicator?9 An examination of the scores (level 4) of specific cases may help answer these questions about the fit between levels 2 and 3.

9 Some readers may think of these questions as raising issues of "face validity." We have found so many different definitions of face validity that we prefer not to use this label.

Discussion. In contrast to the other types considered, content validation is distinctive in its focus on conceptual issues, specifically, on what we have just called adequacy of content. Indeed, it developed historically as a corrective to forms of validation that focused solely on the statistical analysis of scores, and in so doing overlooked important threats to measurement validity (Sireci 1998, 83-7).

Because content validation involves conceptual reasoning, it is imperative to maintain the distinction we made between issues of validation and questions concerning the background concept. If content validation is to be useful, then there must be some ground of conceptual agreement about the phenomena being investigated (Bollen 1989, 186; Cronbach and Meehl

1955, 282). Without it, a well-focused validation question may rapidly become entangled in a broader dispute over the concept. Such agreement can be provided if the systematized concept is taken as given, so attention can be focused on whether a particular indicator adequately captures its content.

Examples of Content Validation. Within the psychometric tradition (Angoff 1988, 27-8; Shultz, Riggs, and Kottke 1998, 267-8), content validation is understood as focusing on the relationship between the indicator (level 3) and the systematized concept (level 2), without reference to the scores of specific cases (level 4). We will first present examples from political science that adopt this focus. We will then turn to a somewhat different, "case-oriented" procedure (Ragin 1987, chap. 3), identified with qualitative research, in which the examination of scores for specific cases plays a central role in content validation.

Two examples from political research illustrate, respectively, the problems of omission of key elements from the indicator and inclusion of inappropriate elements. Paxton's (2000) article on democracy focuses on the first problem. Her analysis is particularly salient for scholars in the qualitative tradition, given its focus on choices about the dichotomous classification of cases. Paxton contrasts the systematized concepts of democracy offered by several prominent scholars (Bollen, Gurr, Huntington, Lipset, Muller, and Rueschemeyer, Stephens, and Stephens) with the actual content of the indicators they propose. She takes their systematized concepts as given, which establishes common conceptual ground. She observes that these scholars include universal suffrage in what is in effect their systematized concept of democracy, but the indicators they employ in operationalizing the concept consider only male suffrage. Paxton thus focuses on the problem that an important component of the systematized concept is omitted from the indicator.

The debate on Vanhanen's (1979, 1990) quantitative indicator of democracy illustrates the alternative problem that the indicator incorporates elements that correspond to a concept other than the systematized concept of concern. Vanhanen seeks to capture the idea of political competition that is part of his systematized concept of democracy by including, as a component of his scale, the percentage of votes won by parties other than the largest party. Bollen (1990, 13, 15) and Coppedge (1997, 6) both question this measure of democracy, arguing that it incorporates elements drawn from a distinct concept, the structure of the party system.

Case-Oriented Content Validation. Researchers engaged in the qualitative classification of cases routinely carry out a somewhat different procedure for content validation, based on the relation between conceptual meaning and choices about scoring particular cases. In the vocabulary of Sartori (1970, 1040-6), this concerns the relation between the "intension" (meaning) and "extension" (set of positive cases) of the concept. For Sartori, an essential aspect of concept formation is the procedure of adjusting this relation between cases and concept. In the framework of Figure 1, this procedure involves revising the indicator (i.e., the scoring procedure) in order to sort cases in a way that better fits conceptual expectations, and potentially fine-tuning the systematized concept to better fit the cases. Ragin (1994, 98) terms this process of mutual adjustment "double fitting." This procedure avoids conceptual stretching (Collier and Mahon 1993; Sartori 1970), that is, a mismatch between a systematized concept and the scoring of cases, which is clearly an issue of validity.

An example of case-oriented content validation is found in O'Donnell's (1996) discussion of democratic consolidation. Some scholars suggest that one indicator of consolidation is the capacity of a democratic regime to withstand severe crises. O'Donnell argues that by this standard, some Latin American democracies would be considered more consolidated than those in southern Europe. He finds this an implausible classification because the standard leads to a "reductio ad absurdum" (p. 43). This example shows how attention to specific cases can spur recognition of dilemmas in the adequacy of content and can be a productive tool in content validation.

In sum, for case-oriented content validation, upward movement in Figure 1 is especially important. It can lead to both refining the indicator in light of scores and fine-tuning the systematized concept. In addition, although the systematized concept being measured is usually relatively stable, this form of validation may lead to friendly amendments that modify the systematized concept by drawing ideas from the background concept. To put this another way, in this form of validation both an "inductive" component and conceptual innovation are especially important.

Limitations of Content Validation. Content validation makes an important contribution to the assessment of measurement validity, but alone it is incomplete, for two reasons. First, although a necessary condition, the findings of content validation are not a sufficient condition for establishing validity (Shepard 1993, 414-5; Sireci 1998, 112). The key point is that an indicator with valid content may still produce scores with low overall measurement validity, because further threats to validity can be introduced in the coding of cases. A second reason concerns the trade-off between parsimony and completeness that arises because indicators routinely fail to capture the full content of a systematized concept. Capturing this content may require a complex indicator that is hard to use and adds greatly to the time and cost of completing the research. It is a matter of judgment for scholars to decide when efforts to further improve the adequacy of content may become counterproductive.

It is useful to complement the conceptual criticism of indicators by examining whether particular modifications in an indicator make a difference in the scoring of cases. To the extent that such modifications have little influence on scores, their contribution to improving validity is more modest. An example in which their contribution is shown to be substantial is provided by Paxton (2000). She develops an alternative indicator of

democracy that takes female suffrage into account, compares the scores it produces with those produced by the indicators she originally criticized, and shows that her revised indicator yields substantially different findings. Her content validation argument stands on conceptual grounds alone, but her information about scoring demonstrates the substantive importance of her concerns. This comparison of indicators in a sense introduces us to convergent/discriminant validation, to which we now turn.

Convergent/Discriminant Validation

Basic Question. Are the scores (level 4) produced by alternative indicators (level 3) of a given systematized concept (level 2) empirically associated and thus convergent? Furthermore, do these indicators have a weaker association with indicators of a second, different systematized concept, thus discriminating this second group of indicators from the first? Stronger associations constitute evidence that supports interpreting indicators as measuring the same systematized concept, thus providing convergent validation; whereas weaker associations support the claim that they measure different concepts, thus providing discriminant validation. The special case of convergent validation in which one indicator is taken as a standard of reference, and is used to evaluate one or more alternative indicators, is called criterion validation, as discussed above.

Discussion. Carefully defined systematized concepts, and the availability of two or more alternative indicators of these concepts, are the starting point for convergent/discriminant validation. They lay the groundwork for arguments that particular indicators measure the same or different concepts, which in turn create expectations about how the indicators may be empirically associated. To the extent that empirical findings match these "descriptive" expectations, they provide support for validity.

Empirical associations are crucial to convergent/discriminant validation, but they are often simply the point of departure for an iterative process. What initially appears to be negative evidence can spur refinements that ultimately enhance validity. That is, the failure to find expected convergence may encourage a return to the conceptual and logical analysis of indicators, which may lead to their modification. Alternatively, researchers may conclude that divergence suggests the indicators measure different systematized concepts and may reevaluate the conceptualization that led them to expect convergence. This process illustrates the intertwining of convergent and discriminant validation.

Examples of Convergent/Discriminant Validation. Scholars who develop measures of democracy frequently use convergent validation. Thus, analysts who create a new indicator commonly report its correlation with previously established indicators (Bollen 1980, 380-2; Coppedge and Reinicke 1990, 61; Mainwaring et al. 2001, 52; Przeworski et al. 1996, 52). This is a valuable procedure, but it should not be employed atheoretically. Scholars should have specific conceptual reasons for expecting convergence if it is to constitute evidence for validity. Let us suppose a proposed indicator is meant to capture a facet of democracy overlooked by existing measures; then too high a correlation is in fact negative evidence regarding validity, for it suggests that nothing new is being captured.

An example of discriminant validation is provided by Bollen's (1980, 373-4) analysis of voter turnout. As in the studies just noted, different measures of democracy are compared, but in this instance the goal is to find empirical support for divergence. Bollen claims, based on content validation, that voter turnout is an indicator of a concept distinct from the systematized concept of political democracy. The low correlation of voter turnout with other proposed indicators of democracy provides discriminant evidence for this claim. Bollen concludes that turnout is best understood as an indicator of political participation, which should be conceptualized as distinct from political democracy.

Although qualitative researchers routinely lack the data necessary for the kind of statistical analysis performed by Bollen, convergent/discriminant validation is by no means irrelevant for them. They often assess whether the scores for alternative indicators converge or diverge. Paxton, in the example discussed above, in effect uses discriminant validation when she compares alternative qualitative indicators of democracy in order to show that recommendations derived from her assessment of content validation make a difference empirically. This comparison, based on the assessment of scores, "discriminates" among alternative indicators.

Convergent/discriminant validation is also employed when qualitative researchers use a multimethod approach involving "triangulation" among multiple indicators based on different kinds of data sources (Brewer and Hunter 1989; Campbell and Fiske 1959; Webb et al. 1966). Orum, Feagin, and Sjoberg (1991, 19) specifically argue that one of the great strengths of the case study tradition is its use of triangulation for enhancing validity. In general, the basic ideas of convergent/discriminant validation are at work in qualitative research whenever scholars compare alternative indicators.

Concerns about Convergent/Discriminant Validation. A first concern here is that scholars might think that in convergent/discriminant validation empirical findings always dictate conceptual choices. This frames the issue too narrowly. For example, Bollen (1993, 1208-9, 1217) analyzes four indicators that he takes as components of the concept of political liberties and four indicators that he understands as aspects of democratic rule. An examination of Bollen's covariance matrix reveals that these do not emerge as two separate empirical dimensions. Convergent/discriminant validation, mechanically applied, might lead to a decision to eliminate this conceptual distinction. Bollen does not take that approach. He combines the two clusters of indicators into an overall empirical measure, but he also maintains the conceptual distinction. Given the conceptual congruence between the two sets of indicators and the concepts of political liberties and democratic rule, the standard of content validation is clearly met, and Bollen continues to use these overarching labels.

Another concern arises over the interpretation of low correlations among indicators. Analysts who lack a "true" measure against which to assess validity must base convergent validation on a set of indicators, none of which may be a very good measure of the systematized concept. The result may be low correlations among indicators, even though they have shared variance that measures the concept. One possible solution is to focus on this shared variance, even though the overall correlations are low. Standard statistical techniques may be used to tap this shared variance.

The opposite problem also is a concern: the limitations of inferring validity from a high correlation among indicators. Such a correlation may reflect factors other than valid measurement. For example, two indicators may be strongly correlated because they both measure some other concept; or they may measure different concepts, one of which causes the other. A plausible response is to think through, and seek to rule out, alternative reasons for the high correlation.10

Although framing these concerns in the language of high and low correlations appears to orient the discussion toward quantitative researchers, qualitative researchers face parallel issues. Specifically, these issues arise when qualitative researchers analyze the sorting of cases produced by alternative classification procedures that represent different ways of operationalizing either a given concept (i.e., convergent validation) or two or more concepts that are presumed to be distinct (i.e., discriminant validation). Given that these scholars are probably working with a small N, they may be able to draw on their knowledge of cases to assess alternative explanations for convergences and divergences among the sorting of cases yielded by different classification procedures. In this way, they can make valuable inferences about validity. Quantitative researchers, by contrast, have other tools for making these inferences, to which we now turn.

Convergent Validation and Structural Equation Models with Latent Variables. In quantitative research, an important means of responding to the limitations of simple correlational procedures for convergent/discriminant validation is offered by structural equation models with latent variables (also called LISREL-type models). Some treatments of such models, to the extent that they discuss measurement error, focus their attention on random error, that is, on reliability (Hayduk 1987, e.g., 118-24; 1996).11 However, Bollen has made systematic error, which is the concern of the present article, a central focus in his major methodological statement on this approach (1989, 190-206). He demonstrates, for example, its distinctive contribution for a scholar concerned with convergent/discriminant validation who is dealing with a data set with high correlations among alternative indicators. In this case, structural equations with latent variables can be used to estimate the degree to which these high correlations derive from shared systematic bias, rather than reflect the valid measurement of an underlying concept.12

This approach is illustrated by Bollen (1993) and Bollen and Paxton's (1998, 2000) evaluation of eight indicators of democracy taken from data sets developed by Banks, Gastil, and Sussman.13 For each indicator, Bollen and Paxton estimate the percent of total variance that validly measures democracy, as opposed to reflecting systematic and random error. The sources of systematic error are then explored. Bollen and Paxton conclude, for example, that Gastil's indicators have "conservative" bias, giving higher scores to countries that are Catholic, that have traditional monarchies, and that are not Marxist-Leninist (Bollen 1993, 1221; Bollen and Paxton 2000, 73). This line of research is an outstanding example of the sophisticated use of convergent/discriminant validation to identify potential problems of political bias.

10 On the appropriate size of the correlation, see Bollen and Lennox 1991, 305-7.

11 To take a political science application, Green and Palmquist's (1990) study also reflects this focus on random error. By contrast, Green (1991) goes farther by considering both random and systematic error. Like the work by Bollen discussed below, these articles are an impressive demonstration of how LISREL-type models can incorporate a concern with measurement error into conventional statistical analysis, and how this can in turn lead to a major reevaluation of substantive findings, in this case concerning party identification (Green 1991, 67-71).

12 Two points about structural equation models with latent variables should be underscored. First, as noted below, these models can also be used in nomological/construct validation, and hence should not be associated exclusively with convergent/discriminant validation, which is the application discussed here. Second, we have emphasized that convergent/discriminant validation focuses on "descriptive" relations among concepts and their components. Within this framework, it merits emphasis that the indicators that measure a given latent variable (i.e., concept) in these models are conventionally interpreted as "effects" of this latent variable (Bollen 1989, 65; Bollen and Lennox 1991, 305-6). These effects, however, do not involve causal interactions among distinct phenomena. Such interactions, which in structural equation models involve causal relations among different latent variables, are the centerpiece of the conventional understanding of "explanation." By contrast, the links between one latent variable and its indicators are productively understood as involving a "descriptive" relationship.

13 See, for example, Banks 1979; Gastil 1988; Sussman 1982.

In discussing Bollen's treatment and application of structural equation models we would like to note both similarities, and a key contrast, in relation to the practice of qualitative researchers. Bollen certainly shares the concern with careful attention to concepts, and with knowledge of cases, that we have emphasized above, and that is characteristic of case-oriented content validation as practiced by qualitative researchers. He insists that complex quantitative techniques cannot replace careful conceptual and theoretical reasoning; rather they presuppose it. Furthermore, "structural equation models are not very helpful if you have little idea about the subject matter" (Bollen 1989, vi; see also 194). Qualitative researchers, carrying out a case-by-case assessment of the scores on different indicators, could of course reach some of the same conclusions about validity and political bias reached by Bollen. A structural equation approach, however, does offer a
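As a rough numerical illustration of the convergent/discriminant logic discussed in this section, the following Python sketch simulates scores for two hypothetical concepts and compares the average correlation among indicators of the same concept with the average correlation across concepts. It then shows how two indicators can correlate highly because of shared systematic bias rather than valid measurement. All data, function names (`convergent_discriminant_summary`, `noisy_indicators`), and parameter values are assumptions made for illustration, and the sketch relies on simple correlations rather than on the structural equation models that Bollen employs.

```python
import numpy as np

def mean_offdiag(block):
    """Mean of the off-diagonal entries of a square correlation block."""
    mask = ~np.eye(block.shape[0], dtype=bool)
    return float(block[mask].mean())

def convergent_discriminant_summary(scores_a, scores_b):
    """Compare associations within and across two sets of indicators.

    scores_a, scores_b: arrays of shape (n_cases, n_indicators) holding the
    scores produced by indicators of concept A and of concept B.
    Returns (mean within-A r, mean within-B r, mean cross-concept r).
    """
    k = scores_a.shape[1]
    corr = np.corrcoef(np.hstack([scores_a, scores_b]), rowvar=False)
    return (mean_offdiag(corr[:k, :k]),
            mean_offdiag(corr[k:, k:]),
            float(corr[:k, k:].mean()))

# Hypothetical data: 200 cases and two latent concepts that are weakly related.
rng = np.random.default_rng(0)
n = 200
concept_a = rng.normal(size=n)
concept_b = 0.2 * concept_a + rng.normal(size=n)

def noisy_indicators(concept, k, noise_sd=0.6):
    """Build k indicators as latent concept plus independent random error."""
    return concept[:, None] + rng.normal(scale=noise_sd, size=(n, k))

ind_a = noisy_indicators(concept_a, 3)
ind_b = noisy_indicators(concept_b, 3)
within_a, within_b, across = convergent_discriminant_summary(ind_a, ind_b)

# Shared systematic bias (assumed): two weak indicators of concept A that
# both absorb the same bias term, so they agree with each other far more
# than either agrees with the concept.
shared_bias = rng.normal(size=n)
biased = np.column_stack([
    0.3 * concept_a + shared_bias + rng.normal(scale=0.3, size=n),
    0.3 * concept_a + shared_bias + rng.normal(scale=0.3, size=n),
])
r_biased = float(np.corrcoef(biased, rowvar=False)[0, 1])
r_with_concept = float(np.corrcoef(biased[:, 0], concept_a)[0, 1])

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, across: {across:.2f}")
print(f"biased pair: {r_biased:.2f}, biased indicator vs concept: {r_with_concept:.2f}")
```

In this simulated setup the within-concept correlations should exceed the cross-concept correlation, while the biased pair should correlate strongly with each other even though each relates only weakly to the concept. The second result depends on having access to the simulated latent variable; with real data the concept is unobserved, which is why case knowledge or a latent-variable model is needed to separate bias from valid measurement.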


fundamentally different procedure that allows scholars to assess carefully the magnitude and sources of measurement error for large numbers of cases and to summarize this assessment systematically and concisely.

Nomological/Construct Validation

Basic Question. In a domain of research in which a given causal hypothesis is reasonably well established, we ask: Is this hypothesis again confirmed when the cases are scored (level 4) with the proposed indicator (level 3) for a systematized concept (level 2) that is one of the variables in the hypothesis? Confirmation is treated as evidence for validity.

Discussion. We should first reiterate that because the term "construct validity" has become synonymous in the psychometric literature with the broader notion of measurement validity, to reduce confusion we use Campbell's term "nomological" validation for procedures that address this basic question. Yet, given common usage in political science, in headings and summary statements we call this nomological/construct validation. We also propose an acronym that vividly captures the underlying idea: AHEM validation; that is, "Assume the Hypothesis, Evaluate the Measure."

Nomological validation assesses the performance of indicators in relation to causal hypotheses in order to gain leverage in evaluating measurement validity. Whereas convergent validation focuses on multiple indicators of the same systematized concept, and discriminant validation focuses on indicators of different concepts that stand in a "descriptive" relation to one another, nomological validation focuses on indicators of different concepts that are understood to stand in an explanatory, "causal" relation with one another. Although these contrasts are not sharply presented in most definitions of nomological validation, they are essential in identifying this type as distinct from convergent/discriminant validation. In practice the contrast between description and explanation depends on the researcher's theoretical framework, but the distinction is fundamental to the contemporary practice of political science.

The underlying idea of nomological validation is that scores which can validly be claimed to measure a systematized concept should fit well-established expectations derived from causal hypotheses that involve this concept. The first step is to take as given a reasonably well-established causal hypothesis, one variable in which corresponds to the systematized concept of concern. The scholar then examines the association of the proposed indicator with indicators of the other concepts in the causal hypothesis. If the assessment produces an association that the causal hypothesis leads us to expect, then this is positive evidence for validity.

Nomological validation provides additional leverage in assessing measurement validity. If other types of validation raise concerns about the validity of a given indicator and the scores it produces, then analysts probably do not need to employ nomological validation. When other approaches yield positive evidence, however, then nomological validation is valuable in teasing out potentially important differences that may not be detected by other types of validation. Specifically, alternative indicators of a systematized concept may be strongly correlated and yet perform very differently when employed in causal assessment. Bollen (1980, 383-4) shows this, for example, in his assessment of whether regime stability should be a component of measures of democracy.

Examples of Nomological/Construct Validation. Lijphart's (1996) analysis of democracy and conflict management in India provides a qualitative example of nomological validation, which he uses to justify his classification of India as a consociational democracy. Lijphart first draws on his systematized concept of consociationalism to identify descriptive criteria for classifying any given case as consociational. He then uses nomological validation to further justify his scoring of India (pp. 262-4). Lijphart identifies a series of causal factors that he argues are routinely understood to produce consociational regimes, and he observes that these factors are present in India. Hence, classifying India as consociational is consistent with an established causal relationship, which reinforces the plausibility of his descriptive conclusion that India is a case of consociationalism.

Another qualitative example of nomological validation is found in a classic study in the tradition of comparative-historical analysis, Perry Anderson's Lineages of the Absolutist State.14 Anderson (1974, 413-5) is concerned with whether it is appropriate to classify as "feudalism" the political and economic system that emerged in Japan beginning roughly in the fourteenth century, which would place Japan in the same analytic category as European feudalism. His argument is partly descriptive, in that he asserts that "the fundamental resemblance between the two historical configurations as a whole [is] unmistakable" (p. 414). He validates his classification by observing that Japan's subsequent development, like that of postfeudal Europe, followed an economic trajectory that his theory explains as the historical legacy of a feudal state. "The basic parallelism of the two great experiences of feudalism, at the opposite ends of Eurasia, was ultimately to receive its most arresting confirmation of all, in the posterior destiny of each zone" (p. 414). Thus, he uses evidence concerning an expected explanatory relationship to increase confidence in his descriptive characterization of Japan as feudal. Anderson, like Lijphart, thus follows the two-step procedure of making a descriptive claim about one or two cases, and then offering evidence for the validity of this claim by observing that it is consistent with an explanatory claim in which he has confidence.

14 Sebastian Mazzuca suggested this example.

A quantitative example of nomological validation is found in Elkins's evaluation of the proposal that democracy versus nondemocracy should be treated as a dichotomy, rather than in terms of gradations. One
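The two-step idea behind AHEM validation ("Assume the Hypothesis, Evaluate the Measure") can be sketched numerically. The Python example below is a minimal illustration under stated assumptions: the hypothesis is taken as given, the data are simulated, and a simple correlation stands in for what would in practice be a fuller causal assessment. The function name `nomological_check`, the variable names, and all parameter values are hypothetical.

```python
import numpy as np

def nomological_check(indicator, other, expected_sign=1):
    """AHEM sketch: take the hypothesis as given, evaluate the measure.

    indicator: scores from the proposed indicator of the concept of concern.
    other: scores for the other variable in the assumed hypothesis.
    expected_sign: +1 or -1, the direction the hypothesis leads us to expect.
    Returns (r, fits), where fits is True when the observed association
    runs in the expected direction.
    """
    r = float(np.corrcoef(indicator, other)[0, 1])
    return r, bool(np.sign(r) == expected_sign)

# Hypothetical data: the assumed hypothesis says the concept raises the outcome.
rng = np.random.default_rng(1)
n = 2000
concept = rng.normal(size=n)
outcome = 0.5 * concept + rng.normal(size=n)

# Two candidate indicators of the same concept: a graded score, and a
# dichotomy that codes only the top tenth of graded scores as positive.
graded = concept + rng.normal(scale=0.5, size=n)
dichotomous = (graded > np.quantile(graded, 0.9)).astype(float)

r_graded, fits_graded = nomological_check(graded, outcome)
r_dichot, fits_dichot = nomological_check(dichotomous, outcome)

print(f"graded: r={r_graded:.2f}, fits hypothesis: {fits_graded}")
print(f"dichotomous: r={r_dichot:.2f}, fits hypothesis: {fits_dichot}")
```

Both simulated indicators reproduce the expected direction, but the graded version retains more of the assumed association. That difference is a property of how these data were generated, not a finding about any real measure of democracy; the sketch is meant only to show how two indicators can be compared against the same assumed hypothesis.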


potential defense of a dichotomous measure is based on convergent validation. Thus, Alvarez and colleagues (1996, 21) show that their dichotomous measure is strongly associated with graded measures of democracy. Elkins (2000, 294-6) goes on to apply nomological validation, exploring whether, notwithstanding this association, the choice of a dichotomous measure makes a difference for causal assessment. He compares tests of the democratic peace hypothesis using both dichotomous and graded measures. According to the hypothesis, democracies are in general as conflict prone as nondemocracies but do not fight one another. The key finding from the standpoint of nomological validation is that this claim is strongly supported using a graded measure, whereas there is no statistically significant support using the dichotomous measure. These findings give nomological evidence for the greater validity of the graded measure, because they better fit the overall expectations of the accepted causal hypothesis. Elkins's approach is certainly more complex than the two-step procedure followed by Lijphart and Anderson, but the basic idea is the same.

Skepticism about Nomological Validation. Many scholars are skeptical about nomological validation. One concern is the potential problem of circularity. If one assumes the hypothesis in order to validate the indicator, then the indicator cannot be used to evaluate the same hypothesis. Hence, it is important to specify that any subsequent hypothesis-testing should involve hypotheses different from those used in nomological validation.

A second concern is that, in addition to taking the hypothesis as given, nomological validation also presupposes the valid measurement of the other systematized concept involved in the hypothesis. Bollen (1989, 188-90) notes that problems in the measurement of the second indicator can undermine this approach to assessing validity, especially when scholars rely on simple correlational procedures. Obviously, researchers need evidence about the validity of the second indicator. Structural equation models with latent variables offer a quantitative approach to addressing such difficulties because, in addition to evaluating the hypothesis, these models can be specified so as to provide an estimate of the validity of the second indicator. In small-N, qualitative analysis, the researcher has the resource of detailed case knowledge to help evaluate this second indicator. Thus, both qualitative and quantitative researchers have a means for making inferences about whether this important presupposition of nomological validation is indeed met.

A third problem is that, in many domains in which political scientists work, there may not be a sufficiently well-established hypothesis to make this a viable approach to validation. In such domains, it may be plausible to assume the measure and evaluate the hypothesis, but not the other way around. Nomological validation therefore simply may not be viable. Yet, it is helpful to recognize that nomological validation need not be restricted to a dichotomous understanding in which the hypothesis either is or is not reconfirmed, using the proposed indicator. Rather, nomological validation may focus, as it does in Elkins (2000; see also Hill, Hanna, and Shafqat 1997), on comparing two different indicators of the same systematized concept, and on asking which better fits causal expectations. A tentative hypothesis may not provide an adequate standard for rejecting claims of measurement validity outright, but it may serve as a point of reference for comparing the performance of two indicators and thereby gaining evidence relevant to choosing between them.

Another response to the concern that causal hypotheses may be too tentative a ground for measurement validation is to recognize that neither measurement claims nor causal claims are inherently more epistemologically secure. Both types of claims should be seen as falsifiable hypotheses. To take a causal hypothesis as given for the sake of measurement validation is not to say that the hypothesis is set in stone. It may be subject to critical assessment at a later point. Campbell (1977/1988, 477) expresses this point metaphorically: "We are like sailors who must repair a rotting ship at sea. We trust the great bulk of the timbers while we replace a particularly weak plank. Each of the timbers we now trust we may in turn replace. The proportion of the planks we are replacing to those we treat as sound must always be small."

CONCLUSION

In conclusion, we return to the four underlying issues that frame our discussion. First, we have offered a new account of different types of validation. We have viewed these types in the framework of a unified conception of measurement validity. None of the specific types of validation alone establishes validity; rather, each provides one kind of evidence to be integrated into an overall process of assessment. Content validation makes the indispensable contribution of assessing what we call the adequacy of content of indicators. Convergent/discriminant validation (taking as a baseline descriptive understandings of the relationship among concepts, and of their relation to indicators) focuses on shared and nonshared variance among indicators that the scholar is evaluating. This approach uses empirical evidence to supplement and temper content validation. Nomological/construct validation (taking as a baseline an established causal hypothesis) adds a further tool that can tease out additional facets of measurement validity not addressed by convergent/discriminant validation.

We are convinced that it is useful to carefully differentiate these types. It helps to overcome the confusion deriving from the proliferation of distinct types of validation, and also of terms for these types. Furthermore, in relation to methods such as structural equation models with latent variables (which provide sophisticated tools for simultaneously evaluating both measurement validity and explanatory hypotheses), the delineation of types serves as a useful reminder that validation is a multifaceted process. Even with these


models, this process must also incorporate the careful use of content validation, as Bollen emphasizes.

Second, we have encouraged scholars to distinguish between issues of measurement validity and broader conceptual disputes. Building on the contrast between the background concept and the systematized concept (Figure 1), we have explored how validity issues and conceptual issues can be separated. We believe that this separation is essential if scholars are to give a consistent focus to the idea of measurement validity, and particularly to the practice of content validation.

Third, we examined alternative procedures for adapting operationalization to specific contexts: context-specific domains of observation, context-specific indicators, and adjusted common indicators. These procedures make it easier to take a middle position between universalizing and particularizing tendencies. Yet, we also emphasize that the decision to pursue context-specific approaches should be carefully considered and justified.

Fourth, we have presented an understanding of measurement validation that can plausibly be applied in both quantitative and qualitative research. Although most discussions of validation focus on quantitative research, we have formulated each type in terms of basic questions intended to clarify the relevance to both quantitative and qualitative analysis. We have also given examples of how these questions can be addressed by scholars from within both traditions. These examples also illustrate, however, that while they may be addressing the same questions, quantitative and qualitative scholars often employ different tools in finding answers.

Within this framework, qualitative and quantitative researchers can learn from these differences. Qualitative researchers could benefit from self-consciously applying the validation procedures that to some degree they may already be employing implicitly and, in particular, from developing and comparing alternative indicators of a given systematized concept. They should also recognize that nomological validation can be important in qualitative research, as illustrated by the Lijphart and Anderson examples above. Quantitative researchers, in turn, could benefit from more frequently supplementing other tools for validation by employing a case-oriented approach, using the close examination of specific cases to identify threats to measurement validity.

REFERENCES

Alvarez, Michael, Jose Antonio Cheibub, Fernando Limongi, and Adam Przeworski. 1996. "Classifying Political Regimes." Studies in Comparative International Development 31 (Summer): 3-36.
American Psychological Association. 1954. "Technical Recommendations for Psychological Tests and Diagnostic Techniques." Psychological Bulletin 51 (2, Part 2): 201-38.
American Psychological Association. 1966. Standards for Educational and Psychological Tests and Manuals. Washington, DC: American Psychological Association.
Anderson, Barbara A., and Brian D. Silver. 1986. "Measurement and Mismeasurement of the Validity of the Self-Reported Vote." American Journal of Political Science 30 (November): 771-85.
Anderson, Perry. 1974. Lineages of the Absolutist State. London: Verso.
Angoff, William H. 1988. "Validity: An Evolving Concept." In Test Validity, ed. Howard Wainer and Henry I. Braun. Hillsdale, NJ: Lawrence Erlbaum. Pp. 19-32.
Babbie, Earl R. 2001. The Practice of Social Research, 9th ed. Belmont, CA: Wadsworth.
Bachman, Jerald G., and Patrick M. O'Malley. 1984. "Yea-Saying, Nay-Saying, and Going to Extremes: Black-White Differences in Response Styles." Public Opinion Quarterly 48 (Summer): 491-509.
Banks, Arthur S. 1979. Cross-National Time-Series Data Archive User's Manual. Binghamton: Center for Social Analysis, State University of New York at Binghamton.
Baumgartner, Frank R., and Jack L. Walker. 1990. "Response to Smith's 'Trends in Voluntary Group Membership: Comments on Baumgartner and Walker': Measurement Validity and the Continuity of Results in Survey Research." American Journal of Political Science 34 (August): 662-70.
Berry, William D., Evan J. Ringquist, Richard C. Fording, and Russell L. Hanson. 1998. "Measuring Citizen and Government Ideology in the American States, 1960-93." American Journal of Political Science 42 (January): 327-48.
Bevir, Mark. 1999. The Logic of the History of Ideas. Cambridge: Cambridge University Press.
Bollen, Kenneth A. 1980. "Issues in the Comparative Measurement of Political Democracy." American Sociological Review 45 (June): 370-90.
Bollen, Kenneth A. 1989. Structural Equations with Latent Variables. New York: Wiley.
Bollen, Kenneth A. 1990. "Political Democracy: Conceptual and Measurement Traps." Studies in Comparative International Development 25 (Spring): 7-24.
Bollen, Kenneth A. 1993. "Liberal Democracy: Validity and Method Factors in Cross-National Measures." American Journal of Political Science 37 (November): 1207-30.
Bollen, Kenneth A., Barbara Entwisle, and Arthur S. Alderson. 1993. "Macrocomparative Research Methods." Annual Review of Sociology 19: 321-51.
Bollen, Kenneth A., and Richard Lennox. 1991. "Conventional Wisdom on Measurement: A Structural Equation Perspective." Psychological Bulletin 110 (September): 305-14.
Bollen, Kenneth A., and Pamela Paxton. 1998. "Detection and Determinants of Bias in Subjective Measures." American Sociological Review 63 (June): 465-78.
Bollen, Kenneth A., and Pamela Paxton. 2000. "Subjective Measures of Liberal Democracy." Comparative Political Studies 33 (February): 58-86.
Brady, Henry. 1985. "The Perils of Survey Research: Inter-Personally Incomparable Responses." Political Methodology 11 (3-4): 269-91.
Brady, Henry, and David Collier, eds. 2001. Rethinking Social Inquiry: Diverse Tools, Shared Standards. Berkeley: Berkeley Public Policy Press, University of California, and Boulder, CO: Rowman & Littlefield.
Brewer, John, and Albert Hunter. 1989. Multimethod Research: A Synthesis of Styles. Newbury Park, CA: Sage.
Cain, Bruce E., and John Ferejohn. 1981. "Party Identification in the United States and Great Britain." Comparative Political Studies 14 (April): 31-47.
Campbell, Donald T. 1960. "Recommendations for APA Test Standards Regarding Construct, Trait, or Discriminant Validity." American Psychologist 15 (August): 546-53.
Campbell, Donald T. 1977/1988. "Descriptive Epistemology: Psychological, Sociological, and Evolutionary." Methodology and Epistemology for Social Science: Selected Papers. Chicago: University of Chicago Press.
Campbell, Donald T., and Donald W. Fiske. 1959. "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix." Psychological Bulletin 56 (March): 81-105.
Carmines, Edward G., and Richard A. Zeller. 1979. Reliability and Validity Assessment. Beverly Hills, CA: Sage.
Clausen, Aage. 1968. "Response Validity: Vote Report." Public Opinion Quarterly 32 (Winter): 588-606.
Collier, David. 1995. "Trajectory of a Concept: 'Corporatism' in the Study of Latin American Politics." In Latin America in Comparative Perspective: Issues and Methods, ed. Peter H. Smith. Boulder, CO: Westview. Pp. 135-62.

Collier, David, and Robert Adcock. 1999. "Democracy and Dichotomies: A Pragmatic Approach to Choices about Concepts." Annual Review of Political Science 2: 537-65.
Collier, David, and Steven Levitsky. 1997. "Democracy with Adjectives: Conceptual Innovation in Comparative Research." World Politics 49 (April): 430-51.
Collier, David, and James E. Mahon, Jr. 1993. "Conceptual 'Stretching' Revisited: Adapting Categories in Comparative Analysis." American Political Science Review 87 (December): 845-55.
Collier, Ruth Berins. 1999. Paths Toward Democracy. Cambridge: Cambridge University Press.
Cook, Thomas D., and Donald T. Campbell. 1979. Quasi-Experimentation. Boston: Houghton Mifflin.
Coppedge, Michael. 1997. "How a Large N Could Complement the Small in Democratization Research." Paper presented at the annual meeting of the American Political Science Association, Washington, DC.
Coppedge, Michael, and Wolfgang H. Reinicke. 1990. "Measuring Polyarchy." Studies in Comparative International Development 25 (Spring): 51-72.
Cronbach, Lee J., and Paul E. Meehl. 1955. "Construct Validity in Psychological Tests." Psychological Bulletin 52 (July): 281-302.
Dahl, Robert A. 1956. A Preface to Democratic Theory. Chicago: University of Chicago Press.
Elkins, Zachary. 2000. "Gradations of Democracy: Empirical Tests of Alternative Conceptualizations." American Journal of Political Science 44 (April): 293-300.
Elster, Jon. 1999. Alchemies of the Mind. New York: Cambridge University Press.
Fearon, James, and David D. Laitin. 2000. "Ordinary Language and External Validity: Specifying Concepts in the Study of Ethnicity." Paper presented at the annual meeting of the American Political Science Association, Washington, DC.
Fischer, Markus. 1992. "Feudal Europe, 800-1300: Communal Discourse and Conflictual Politics." International Organization 46 (Spring): 427-66.
Fischer, Markus. 1993. "On Context, Facts, and Norms." International Organization 47 (Summer): 493-500.
Freeden, Michael. 1996. Ideologies and Political Theory: A Conceptual Approach. Oxford: Oxford University Press.
Gallie, W. B. 1956. "Essentially Contested Concepts." Proceedings of the Aristotelian Society 51: 167-98.
Gastil, Raymond D. 1988. Freedom in the World: Political Rights and Civil Liberties, 1987-1988. New York: Freedom House.
George, Alexander L., and Andrew Bennett. N.d. Case Studies and Theory Development. Cambridge, MA: MIT Press. Forthcoming.
Gerring, John. 1997. "Ideology: A Definitional Analysis." Political Research Quarterly 50 (December): 957-94.
Gerring, John. 1999. "What Makes a Concept Good? A Criterial Framework for Understanding Concept Formation in the Social Sciences." Polity 31 (Spring): 357-93.
Gerring, John. 2001. Practical Knowledge: A Criterial Approach to Social Science Methodology. New York: Cambridge University Press.
Gould, Andrew C. 1999. "Conflicting Imperatives and Concept Formation." Review of Politics 61 (Summer): 439-63.
Green, Donald Philip. 1991. "The Effects of Measurement Error on Two-Stage Least-Squares Estimates." In Political Analysis, vol. 2, ed. James A. Stimson. Ann Arbor: University of Michigan Press. Pp. 57-74.
Green, Donald Philip, and Bradley L. Palmquist. 1990. "Of Artifacts and Partisan Instability." American Journal of Political Science 34 (August): 872-902.
Greenleaf, Eric A. 1992. "Measuring Extreme Response Style." Public Opinion Quarterly 56 (Autumn): 328-51.
Guion, Robert M. 1980. "On Trinitarian Doctrines of Validity." Professional Psychology 11 (June): 385-98.
Harding, Timothy, and James Petras. 1988. "Introduction: Democratization and the Class Struggle." Latin American Perspectives 15 (Summer): 3-17.
Hayduk, Leslie A. 1987. Structural Equation Modeling with LISREL. Baltimore, MD: Johns Hopkins University Press.
Hayduk, Leslie A. 1996. LISREL: Issues, Debates, and Strategies. Baltimore, MD: Johns Hopkins University Press.
Hill, Kim Q., Stephen Hanna, and Sahar Shafqat. 1997. "The Liberal-Conservative Ideology of U.S. Senators: A New Measure." American Journal of Political Science 41 (October): 1395-413.
Huntington, Samuel P. 1991. The Third Wave: Democratization in the Twentieth Century. Norman: University of Oklahoma Press.
Jacoby, William G. 1991. Data Theory and Dimensional Analysis. Newbury Park, CA: Sage.
Jacoby, William G. 1999. "Levels of Measurement and Political Research: An Optimistic View." American Journal of Political Science 43 (January): 271-301.
Johnson, Ollie A. 1999. "Pluralist Authoritarianism in Comparative Perspective: White Supremacy, Male Supremacy, and Regime Classification." National Political Science Review 7: 116-36.
Kaplan, Abraham. 1964. The Conduct of Inquiry. San Francisco: Chandler.
Karl, Terry Lynn. 1990. "Dilemmas of Democratization in Latin America." Comparative Politics 22 (October): 1-21.
Katosh, John P., and Michael W. Traugott. 1980. "The Consequences of Validated and Self-Reported Voting Measures." Public Opinion Quarterly 45 (Winter): 519-35.
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton, NJ: Princeton University Press.
Kirk, Jerome, and Marc L. Miller. 1986. Reliability and Validity in Qualitative Research. Beverly Hills, CA: Sage.
Kurtz, Marcus J. 2000. "Understanding Peasant Revolution: From Concept to Theory and Case." Theory and Society 29 (February): 93-124.
Laitin, David D. 2000. "What Is a Language Community?" American Journal of Political Science 44 (January): 142-55.
Levitsky, Steven. 1998. "Institutionalization and Peronism: The Concept, the Case and the Case for Unpacking the Concept." Party Politics 4 (1): 77-92.
Lijphart, Arend. 1984. Democracies: Patterns of Majoritarian and Consensus Government in Twenty-One Countries. New Haven, CT: Yale University Press.
Lijphart, Arend. 1996. "The Puzzle of Indian Democracy: A Consociational Interpretation." American Political Science Review 90 (June): 258-68.
Linz, Juan J. 1975. "Totalitarian and Authoritarian Regimes." In Handbook of Political Science, vol. 3, ed. Fred I. Greenstein and Nelson W. Polsby. Reading, MA: Addison-Wesley. Pp. 175-411.
Lipset, Seymour M. 1959. "Some Social Requisites of Democracy: Economic Development and Political Legitimacy." American Political Science Review 53 (March): 69-105.
Locke, Richard, and Kathleen Thelen. 1995. "Apples and Oranges Revisited: Contextualized Comparisons and the Study of Comparative Labor Politics." Politics and Society 23 (September): 337-67.
Locke, Richard, and Kathleen Thelen. 1998. "Problems of Equivalence in Comparative Politics: Apples and Oranges, Again." Newsletter of the APSA Organized Section in Comparative Politics 9 (Winter): 9-12.
Loveman, Brian. 1994. "'Protected Democracies' and Military Guardianship: Political Transitions in Latin America, 1979-1993." Journal of Interamerican Studies and World Affairs 36 (Summer): 105-89.
Mainwaring, Scott, Daniel Brinks, and Aníbal Pérez-Liñán. 2001. "Classifying Political Regimes in Latin America, 1945-1999." Studies in Comparative International Development 36 (1): 37-64.
Markoff, John. 1996. Waves of Democracy. Thousand Oaks, CA: Pine Forge.
Messick, Samuel. 1980. "Test Validity and the Ethics of Assessment." American Psychologist 35 (November): 1012-27.
Messick, Samuel. 1989. "Validity." In Educational Measurement, ed. Robert L. Linn. New York: Macmillan. Pp. 13-103.
Moene, Karl Ove, and Michael Wallerstein. 2000. "Inequality, Social Insurance and Redistribution." Working Paper No. 144, Juan March Institute, Madrid, Spain.
Moss, Pamela A. 1992. "Shifting Conceptions of Validity in Educational Measurement: Implications for Performance Assessment." Review of Educational Research 62 (Fall): 229-58.
Moss, Pamela A. 1995. "Themes and Variations in Validity Theory." Educational Measurement: Issues and Practice 14 (Summer): 5-13.
Mouffe, Chantal. 1992. Dimensions of Radical Democracy. London: Verso.
Nie, Norman H., G. Bingham Powell, Jr., and Kenneth Prewitt. 1969.
"Social Structure and Political Participation: Developmental Relationships, Part I." American Political Science Review 63 (June): 361-78.
Niemi, Richard, Richard Katz, and David Newman. 1980. "Reconstructing Past Partisanship: The Failure of the Party Identification Recall Questions." American Journal of Political Science 24 (November): 633-51.
O'Donnell, Guillermo. 1993. "On the State, Democratization and Some Conceptual Problems." World Development 27 (8): 1355-69.
O'Donnell, Guillermo. 1996. "Illusions about Consolidation." Journal of Democracy 7 (April): 34-51.
Orum, Anthony M., Joe R. Feagin, and Gideon Sjoberg. 1991. "Introduction: The Nature of the Case Study." In A Case for the Case Study, ed. Joe R. Feagin, Anthony M. Orum, and Gideon Sjoberg. Chapel Hill: University of North Carolina Press. Pp. 1-26.
Paxton, Pamela. 2000. "Women in the Measurement of Democracy: Problems of Operationalization." Studies in Comparative International Development 35 (Fall): 92-111.
Pitkin, Hanna Fenichel. 1967. The Concept of Representation. Berkeley: University of California Press.
Pitkin, Hanna Fenichel. 1987. "Rethinking Reification." Theory and Society 16 (March): 263-93.
Przeworski, Adam, Michael Alvarez, Jose Antonio Cheibub, and Fernando Limongi. 1996. "What Makes Democracies Endure?" Journal of Democracy 7 (January): 39-55.
Przeworski, Adam, and Henry Teune. 1970. Logic of Comparative Social Inquiry. New York: John Wiley.
Rabkin, Rhoda. 1992. "The Aylwin Government and 'Tutelary' Democracy: A Concept in Search of a Case?" Journal of Interamerican Studies and World Affairs 34 (Winter): 119-94.
Ragin, Charles C. 1987. The Comparative Method. Berkeley: University of California Press.
Ragin, Charles C. 1994. Constructing Social Research. Thousand Oaks, CA: Pine Forge.
Reus-Smit, Christian. 1997. "The Constitutional Structure of International Society and the Nature of Fundamental Institutions." International Organization 51 (Autumn): 555-89.
Ruggie, John G. 1998. "What Makes the World Hang Together? Neo-Utilitarianism and the Social Constructivist Challenge." International Organization 52 (Autumn): 855-85.
Russett, Bruce. 1993. Grasping the Democratic Peace. Princeton, NJ: Princeton University Press.
Sartori, Giovanni. 1970. "Concept Misformation in Comparative Research." American Political Science Review 64 (December): 1033-53.
Sartori, Giovanni. 1975. "The Tower of Babel." In Tower of Babel, ed. Giovanni Sartori, Fred W. Riggs, and Henry Teune. International Studies Association, University of Pittsburgh. Pp. 7-37.
Sartori, Giovanni, ed. 1984. Social Science Concepts: A Systematic Analysis. Beverly Hills, CA: Sage.
Sartori, Giovanni, Fred W. Riggs, and Henry Teune. 1975. Tower of Babel: On the Definition and Analysis of Concepts in the Social Sciences. International Studies Association, University of Pittsburgh.
Schaffer, Frederic Charles. 1998. Democracy in Translation: Understanding Politics in an Unfamiliar Culture. Ithaca, NY: Cornell University Press.
Schmitter, Philippe C., and Terry Lynn Karl. 1992. "The Types of Democracy Emerging in Southern and Eastern Europe and South and Central America." In Bound to Change: Consolidating Democracy in East Central Europe, ed. Peter M. E. Volten. New York: Institute for EastWest Studies. Pp. 42-68.
Schrodt, Philip A., and Deborah J. Gerner. 1994. "Validity Assessment of a Machine-Coded Event Data Set for the Middle East, 1982-92." American Journal of Political Science 38 (August): 825-54.
Schumpeter, Joseph. 1947. Capitalism, Socialism and Democracy. New York: Harper.
Shepard, Lorrie. 1993. "Evaluating Test Validity." Review of Research in Education 19: 405-50.
Shively, W. Phillips. 1998. The Craft of Political Research. Upper Saddle River, NJ: Prentice Hall.
Shultz, Kenneth S., Matt L. Riggs, and Janet L. Kottke. 1998. "The Need for an Evolving Concept of Validity in Industrial and Personnel Psychology: Psychometric, Legal, and Emerging Issues." Current Psychology 17 (Winter): 265-86.
Sicinski, Andrzej. 1970. "'Don't Know' Answers in Cross-National Surveys." Public Opinion Quarterly 34 (Spring): 126-9.
Sireci, Stephen G. 1998. "The Construct of Content Validity." Social Indicators Research 45 (November): 83-117.
Skocpol, Theda. 1992. Protecting Soldiers and Mothers. Cambridge, MA: Harvard University Press.
Sussman, Leonard R. 1982. "The Continuing Struggle for Freedom of Information." In Freedom in the World, ed. Raymond D. Gastil. Westport, CT: Greenwood. Pp. 101-19.
Valenzuela, J. Samuel. 1992. "Democratic Consolidation in Post-Transitional Settings: Notion, Process, and Facilitating Conditions." In Issues in Democratic Consolidation, ed. Scott Mainwaring, Guillermo O'Donnell, and J. Samuel Valenzuela. Notre Dame, IN: University of Notre Dame Press. Pp. 57-104.
Vanhanen, Tatu. 1979. Power and the Means to Power. Ann Arbor, MI: University Microfilms International.
Vanhanen, Tatu. 1990. The Process of Democratization. New York: Crane Russak.
Verba, Sidney. 1967. "Some Dilemmas in Comparative Research." World Politics 20 (October): 111-27.
Verba, Sidney. 1971. "Cross-National Survey Research: The Problem of Credibility." In Comparative Methods in Sociology, ed. Ivan Vallier. Berkeley: University of California Press. Pp. 309-56.
Verba, Sidney, Steven Kelman, Gary R. Orren, Ichiro Miyake, Joji Watanuki, Ikuo Kabashima, and G. Donald Ferree, Jr. 1987. Elites and the Idea of Equality. Cambridge, MA: Harvard University Press.
Verba, Sidney, Norman Nie, and Jae-On Kim. 1978. Participation and Political Equality. Cambridge: Cambridge University Press.
Webb, Eugene J., Donald T. Campbell, Richard D. Schwartz, and Lee Sechrest. 1966. Unobtrusive Measures: Nonreactive Research in the Social Sciences. Chicago: Rand McNally.
Zeitsch, John, Denis Lawrence, and John Salernian. 1994. "Comparing Like with Like in Productivity Studies: Apples, Oranges and Electricity." Economic Record 70 (June): 162-70.
Zeller, Richard A., and Edward G. Carmines. 1980. Measurement in the Social Sciences: The Link between Theory and Data. Cambridge: Cambridge University Press.
