Kalton Schuman 1982

The Effect of the Question on Survey Responses: A Review
Author(s): Graham Kalton and Howard Schuman

Source: Journal of the Royal Statistical Society. Series A (General), Vol. 145, No. 1 (1982), pp. 42-73
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2981421
J. R. Statist. Soc. A (1982), 145, Part 1, pp. 42-73

The Effect of the Question on Survey Responses: A Review†

By GRAHAM KALTON and HOWARD SCHUMAN
Survey Research Center, University of Michigan, USA

[Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the SOCIAL STATISTICS SECTION on Wednesday, September 30th, 1981, Professor G. HOINVILLE in the Chair]
SUMMARY
The paper reviews the effects of the precise wording, form and placement of questions in a survey questionnaire on the responses obtained. Topics discussed include: randomized response; respondent instructions; feedback and commitment; question length; the treatment of "don't knows"; open and closed questions; the use of balanced questions; acquiescence; the offer of a middle alternative; the order of alternatives; and question order and context effects.

Keywords: FACTUAL QUESTIONS; OPINION QUESTIONS; MEMORY ERRORS; SOCIAL DESIRABILITY BIAS; QUESTION WORDING; QUESTION FORM; QUESTION CONTEXT EFFECT

1. INTRODUCTION
THE survey literature abounds with examples demonstrating that survey responses may be sensitive to the precise wording, format and placement of the questions asked. A useful start to examining these effects is to classify questions according to the type of information sought.
A widely-used distinction is that between factual and opinion questions. Questions like "What was your regular hourly rate of pay on this job as of September 30?" clearly fall in the former category, while questions like "As you know, many older people share a home with their grown children. Do you think this is generally a good idea or a bad idea?" clearly fall in the latter. However, not all survey questions can be classified as either factual or opinion ones: other types of question include questions testing respondents' knowledge, questions asking for reasons, hypothetical questions and preference questions.
One further type of question, widely used in survey practice, deserves special comment. These questions, which have a factual component overlaid with an evaluation, may be termed judgement or perceptual questions. Examples are: "Do you have street (highway) noise in this neighbourhood?" and "Would you say your health now is excellent, good, fair or poor?" In many cases the intent of such questions is to obtain factual information, but the approach adopted seeks respondents' evaluations of the facts rather than their measurement according to objective criteria. The use of perceptual questions for this purpose probably results from the questionnaire designer's decision that he could not ask sufficient questions or take the measurements necessary to determine the information objectively; hence he has respondents make the assessments for him. The often low levels of correlation found between perceptions and facts make this use of perceptual questions, although widespread, a dubious one. A different use of perceptual questions is indeed to obtain respondents' perceptions of their situations; in this case the questions are similar to opinion questions.
For present purposes, it will be sufficient to divide questions into factual and non-factual ones (including as factual questions those perceptual questions seeking to ascertain factual information). An important difference between these two types of question is that with factual questions there are individual true values for the information sought which can, at least in theory, be determined from some source other than respondents' reports, whereas with other questions this does not apply. While it is true that the responses to some factual questions cannot be validated against external sources (for instance, reports of past unrecorded private behaviour), the difference holds in general. As a consequence, validity studies are often conducted to examine how successful factual questions are in obtaining respondents' individual true values, whereas with non-factual questions such studies are not possible.

† This paper is a slightly revised version of a paper presented at the American Statistical Association meetings, Houston, August 1980 (Kalton and Schuman, 1980, with discussion by Rothwell and Turner).
© 1982 Royal Statistical Society 0035-9238/82/145042$2.00

Although numerous validity studies of responses to factual questions have been carried out in many subject areas, the majority of them have examined only the level of accuracy achieved by a given questioning procedure; they have not compared alternative procedures, as required for making an assessment of how aspects of a question may affect the accuracy of the responses obtained. Many of the comparative studies that have been conducted have avoided the need for data from an external validating source by making an assumption about the general direction of the response errors to be encountered, the assumption adopted being based on evidence from other validity studies. Thus, for instance, it is often assumed from past evidence that certain events such as purchases made or illnesses experienced in a given period will be underreported. Given this assumption, the best question form is then taken to be the one that produces the highest frequencies for the events. On the other hand, a socially desirable activity may be assumed to be generally overstated, in which case the best question form is the one that gives the lowest reported frequency for it. While the difficulties of obtaining validity data make this approach attractive, it does depend critically on the validity of the assumption about the direction of response errors.
With non-factual questions, validation is even more difficult and less certain. The accuracy of responses can often be examined only by means of construct validity, that is by determining whether the relationships of the responses with other variables conform to those predicted by theory. At the current stage of theory development in the social sciences, a failure of data to fit a theory is usually as likely to cast doubt on the theory as on the measuring instruments. Then, even if the observed relationships coincide with their theoretical predictions, this agreement is not a clear confirmation that the responses are valid; it may, for instance, instead be an artifact of the set of measuring instruments employed, a "methods effect".
In view of the difficulties of validating responses to non-factual questions, research on questioning effects with such questions has relied mainly on split-ballot experiments, in which alternative forms of a question are administered to comparable samples of respondents, with the responses to the different question forms being compared for consistency. This concern with consistency rather than validity means that the research usually fails to identify which is the best question form. It serves only to warn the researcher of the sensitivity of responses to question form if the responses differ markedly, or to increase his feelings of security in the results if they do not differ.
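
As a rough illustration of how the two forms in a split-ballot experiment might be compared for consistency, the sketch below applies a standard two-sample test of proportions to the share of respondents giving a particular answer under each form. The counts are invented for illustration and the function name `split_ballot_z` is hypothetical; the test is a textbook one, not a procedure taken from the studies reviewed here.

```python
import math

def split_ballot_z(yes_a, n_a, yes_b, n_b):
    """Two-sample z test for the difference between two proportions.

    yes_a / n_a: respondents choosing the answer under question form A.
    yes_b / n_b: the same under question form B.
    Returns (difference in proportions, z statistic, two-sided p-value).
    """
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    # Pooled proportion under the null hypothesis of no form effect.
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a - p_b, z, p_value

# Hypothetical counts: 300 of 500 agree under form A, 250 of 500 under form B.
diff, z, p = split_ballot_z(300, 500, 250, 500)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
```

A significant difference of this kind shows only that responses are sensitive to question form; it gives no indication of which form is the more valid.
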
A second difference between factual and non-factual questions involves the features studied as possible influences on the responses obtained. Although many of the features potentially apply with both types of question, researchers have been more concerned about some of them with factual questions and others with non-factual ones. Thus research on factual questions has focused on problems of definition, comprehension, memory and social desirability response bias, while that on non-factual questions has concentrated on various question form effects, such as issues of balance, the offer of middle alternatives, and the order of presentation of alternatives. The features primarily studied in relation to factual questions are reviewed in the next section, and those studied in relation to non-factual questions are taken up in the following one. The effects of question order and context, which have received attention in relation to both types of question, are discussed in Section 4.
Before embarking on the discussion of question effects, we should note that we are primarily concerned with surveys in which interviewers ask the questions and record the answers. Responses to self-completion questionnaires may be affected by a number of other features, such as the physical location of a question on the questionnaire, the placement of instructions, the general layout, and the colours of print for questions and instructions. Reports of experiments on the effects of some of these features are given by Rothwell and Rustemeyer (1979) for the US Census of Population and Housing, and by Forsythe and Wilhite (1972) for the US Census of Agriculture.

2. QUESTION EFFECTS WITH FACTUAL QUESTIONS
The starting point in constructing a factual question is a precise definition of the fact to be collected. It has been shown on many occasions that apparently marginal changes in definition can have profound effects on survey results. Definitions of unemployment and labour force raise a host of issues (e.g. Bancroft and Welch, 1946; Jaffe and Stewart, 1955), but even ostensibly simple facts like the number of rooms occupied by a household pose a range of definitional problems (for instance: Is a kitchen to be included if only used for cooking? Are bathrooms, toilets, closets, landings, halls to be included? Is a room partitioned by curtains or portable screens one or two rooms?).
Once the fact has been defined, the request for it has to be communicated to the respondent. A number of difficulties can arise in this process. In the first place, the need for a precise definition can lead to an unwieldy question which the respondent cannot or will not make the effort to absorb. In the quality check on the 1966 Sample Census of England and Wales, Gray and Gee (1966) found that 1 in 6 householders reported an inaccurate number of rooms in the household, which they ascribe mainly to the fact that householders know how many rooms they have according to their own definitions, and they therefore ignored the detailed census definition. To avoid this problem, some looseness is often accepted in survey questions (especially in perceptual questions), but this may well lead to inconsistent interpretations between respondents.
Another aspect of the communication process is to ensure that the respondent fully understands what he is being asked and what is an appropriate answer. At one level he needs to understand the concepts and frames of reference implied by the question (Cannell and Kahn, 1968). At a more basic level he needs to comprehend the question itself. Methodological research by Belson and Speak found that even some simple questions on television viewing were often not perceived as intended by a sizeable proportion of respondents. For instance, the questions "What proportion of your evening viewing time do you spend watching news programmes?" and "When the advertisements come on between two programmes on a weekday evening, do you usually watch them?" were misinterpreted by almost everybody who answered them. With the first question, very few respondents knew what "proportion" meant, and only 1 of the 246 respondents knew how to work it out. With the second, "weekday" was often misinterpreted as either "any day of the week" or "any day except Sunday" (Speak, 1967; Belson, 1968).
To give a correct answer to a factual question, a respondent needs to have the necessary information accessible. Accessibility first requires that he has had the information at some time and has understood it. Then, if the question asks about the past, he needs to be able to retrieve it from his memory. Ease of recall depends mainly on the length of the recall period and the salience to the respondent of the information being recalled (see, for example, Cannell and Kahn, 1968). His success in recalling the information depends on the ease of recall and the effort he is persuaded to make. Many survey questions ask about events occurring in a specified reference period (e.g. seeing a doctor in the last year), in which case the respondent also has to be able to place the events in time. A well-known placement distortion is the telescoping error of remembering an event as having occurred more recently than in fact is the case (see, for example, Sudman and Bradburn, 1973, 1974).
The effects of recall loss and telescoping work in opposite directions, recall loss causing underreporting and telescoping causing overreporting. The extent of these two sources of error depends on the length of the reference period: the longer the period, the greater is the recall loss, but the smaller is the telescoping effect. Thus, for short reference periods, the telescoping effect may outweigh the recall loss, while for long periods the reverse will apply; in between there will be a length of reference period at which the two effects counterbalance each other (Sudman and Bradburn, 1973). The meaning of "short" and "long" reference periods varies with the event under investigation, depending on the event's salience. The choice of an appropriate reference period needs to take into account the telescoping and recall loss effects, as well as the fact that longer periods provide estimates with smaller sampling errors. This choice has been examined in a number of different subject areas (see, for example, National Center for Health Statistics, 1972; Sudman, 1980).
A technique which aims at eliminating telescoping errors by repeated interviewing is known as bounded recall (Neter and Waksberg, 1965). Respondents are interviewed at the beginning and end of the reference period. The first interview serves to identify events which occurred prior to the start of the period so that they can be discounted if they are then reported again at the second interview.
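
The discounting step in bounded recall can be sketched in a few lines. The example below is a minimal illustration under a simplifying assumption, namely that an event reported at both interviews can be matched by a common identifier; the function name `bounded_recall` and the event labels are hypothetical.

```python
def bounded_recall(first_interview, second_interview):
    """Keep only second-interview events not already reported at the first.

    Each interview is a collection of hashable event identifiers. Matching
    events across interviews by identifier is a simplifying assumption.
    """
    already_reported = set(first_interview)
    return [e for e in second_interview if e not in already_reported]

# Hypothetical reports from one respondent.
first = ["washing machine repair", "roof repair"]
second = ["roof repair", "new carpet", "plumbing job"]

# "roof repair" was telescoped forward into the reference period,
# so it is discounted.
print(bounded_recall(first, second))  # ['new carpet', 'plumbing job']
```
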
Three procedures are widely used in survey practice to attempt to minimize or avoid memory errors (the use of records, aided recall techniques and diaries), and each procedure has its own sizeable literature. Where records are available, say from bills or cheque book records, their use can reduce both recall loss and telescoping effects, as well as provide accurate details of the events. Aided recall techniques aim to reduce recall loss by providing the respondent with memory cues; these techniques are widely used in media research, where the respondent would be provided with, say, a list of newspapers or yesterday's television programmes from which he chooses the ones he looked at. In their summary of the effects of aided recall techniques, Sudman and Bradburn (1974) conclude that they do increase reported activity, but point out that this may at least in part represent an increase in telescoping errors.
Where the events to be reported are numerous and relatively insignificant, there may be no way to help respondents remember them with sufficient accuracy. In such cases, as with household expenditures, food consumption and trips outside the house, memory problems may be avoided by having respondents complete diaries of the events as they take place. Diaries, however, have their disadvantages: they are expensive, it is harder to gain respondents' cooperation, the diary keeping may affect behaviour, it may be incomplete, and its quality usually deteriorates over time.
Another well-documented source of invalidity in responses to factual (and other) questions is a social desirability bias: respondents distort their answers towards ones they consider more favourable to them. Thus, for instance, it has been well established that a higher proportion of survey respondents report that they voted in an election than the voting returns indicate (for instance, Parry and Crossley, 1950; Traugott and Katosh, 1979). If an event is seen as embarrassing, sensitive or threatening, the respondent may repress its report, or he may distort his answer to one he considers more socially acceptable. There are a number of well-known techniques for eliciting sensitive information, including making responses more private by using a numbered card (often used for income) or a sealed ballot, and attempting to desensitize a particular response by making it appear to be a common or acceptable one. Barton (1958) has provided an amusing summary of these techniques.
A more recent development for asking sensitive questions is the randomized response technique, in which the respondent chooses which of two (or more) questions he answers by a random device; he answers the chosen question without the interviewer being aware which question is being answered. In this way the respondent's privacy is protected, and in consequence it is hoped that he gives a more truthful response. Since Warner (1965) introduced the technique, many articles have appeared developing it, extending its potential range of application, and examining its statistical properties. The main focus of this work has been, however, on theoretical issues, and comparatively little attention has been given to its practical utility. One common, but not obvious, finding from studies in which it has been applied is that it has generally been well received by respondents. In a small-scale experimental study by Locander et al. (1976), for instance, only 1 in 20 respondents said it was confusing, silly or unnecessary; the interviewers thought that about 7 out of 8 understood the use of the random response box, and that a similar proportion accepted the explanation of the box and believed that their answers really were private.
Several experimental studies have obtained higher rates of reports of sensitive information from randomized response techniques than from traditional questioning: for instance, Abernathy et al. (1970) and Shimizu and Bonham (1978) on abortion rates, Goodstadt and Gruson (1975) on school students' drug use, and Madigan et al. (1976) on death reports in a province in the Philippines. In their validity study comparing the accuracy of reporting of personal interviews, telephone interviews, self-administered questionnaires and a randomized response technique for five issues of varying degrees of threat, Locander et al. (1976) found that the randomized response technique was most effective in reducing underreporting of the socially undesirable acts, being declared bankrupt and being charged with drunken driving. However, the use of the technique still led to a substantial amount of underreporting of drunken driving (35 per cent for the randomized response technique compared with an average of 48 per cent for the other techniques).
Any gain in bias reduction with the randomized response technique has to be set against a sizeable increase in sampling error. The use of the technique also hampers analyses of the relationships between the responses to the threatening question and other variables. For these reasons the technique seems useful only for special, very sensitive, issues for which overall estimates are required. It does not appear to provide a widely applicable approach for dealing with sensitive survey questions.
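
To make the trade-off between bias and sampling error concrete, the following sketch implements the estimator for Warner's (1965) original design as it is usually presented: with a known probability p (not equal to 1/2) the random device directs the respondent to the statement "I belong to the sensitive group", and otherwise to its complement, and only the "yes" or "no" reply is recorded. The function and variable names are illustrative, the figures in the usage lines are invented, and the comparison assumes fully truthful answers under both approaches.

```python
import math

def warner_estimate(yes_count, n, p):
    """Estimate the sensitive proportion pi under Warner's design.

    yes_count: number of "yes" replies among n respondents.
    p: probability that the device selects the sensitive statement
       (must differ from 0.5, otherwise pi is not identifiable).
    Returns (pi_hat, estimated standard error of pi_hat).
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    lam = yes_count / n                     # observed share of "yes" replies
    pi_hat = (lam - (1 - p)) / (2 * p - 1)  # solves lam = p*pi + (1-p)*(1-pi)
    # Binomial variance of lam, inflated by the factor 1 / (2p - 1)^2.
    var = lam * (1 - lam) / (n * (2 * p - 1) ** 2)
    return pi_hat, math.sqrt(var)

def direct_se(pi, n):
    """Standard error of a direct question with fully truthful answers."""
    return math.sqrt(pi * (1 - pi) / n)

# Hypothetical survey: 1000 respondents, the device chooses the sensitive
# statement with probability 0.7, and 380 reply "yes".
pi_hat, se_rr = warner_estimate(380, 1000, 0.7)
print(f"estimate = {pi_hat:.3f}, randomized-response SE = {se_rr:.4f}")
print(f"direct-question SE at the same pi = {direct_se(pi_hat, 1000):.4f}")
```

In this invented example the randomized-response standard error is roughly three times that of a direct question on the same sample size, and it grows further as p approaches 1/2, which is also where the respondent's privacy is best protected. In small samples the estimate can fall outside the range 0 to 1.
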
In a programme of research extending over the last two decades, Cannell and his colleagues at the Survey Research Center have developed a variety of new approaches to deal with both problems of memory errors and problems of sensitive questions. Their research has been directed mainly at improving the quality of reporting of health events, but it has wide potential application. In their early work they identified the need to have respondents understand adequately the task expected of them and have them make the necessary effort to retrieve and organize the information into a suitable reporting form. They then developed techniques aimed at meeting these objectives.
One technique stemmed from research on speech behaviour in employment interviews, where an increase in an interviewer's speech duration has been found to result in an increase in the respondent's speech duration. This finding raised the possibility that, counter to accepted survey dogma, longer questions may in some circumstances yield longer, and hence more valid, answers. To test this hypothesis, experiments were conducted to compare responses to a short question with those to a longer question formed by adding redundancies which did not affect the content. One such experiment compared responses to the following two questions:
Short question: "What health problems have you had in the past year?"
Long question: "The next question asks about health problems during the last year. This is something we ask everyone in the survey. What health problems have you had in the past year?"
The experiments did not find that the longer questions produced significantly longer responses, but they did yield a greater number of relevant health events being reported. Since a questionnaire made up of only long questions would be cumbersome, experiments involving a mixture of long and short questions were also carried out: this mixture was found to yield more reporting of health events to both the long and short questions (Cannell et al., 1977; Cannell et al., 1981).
The researchers postulate three reasons for this effect: that by essentially stating the question twice the respondent's understanding is increased; that the time lag between the first statement of the question at the start and the need to answer it at the end allows the respondent the opportunity to marshal his thoughts; and that the respondent interprets the length of the question as an indication of its importance, thus encouraging him to give it greater attention. It should be observed that the longer questions in these experiments were in no way more complex than the short ones. The usual advice "keep questions short" is probably an inaccurate way of saying "keep questions simple"; in practice the difficulties from long questions probably derive from their complexity rather than their length per se.
Other techniques developed by Cannell and his colleagues to improve survey reporting include the use of respondent instructions, the use of feedback and the securing of respondent commitment.
The purpose of including respondent instructions in the questionnaire is to advise the respondent on how he should perform his task. Cannell et al. (1981) have experimented with providing general instructions at the start of the interview to ask the respondent to think carefully, search his memory, take his time and check records, and to tell him that accurate and complete answers are wanted. In addition, respondents can be given specific instructions on how to answer individual questions; these specific instructions have the added benefit of lengthening the questions, thus securing the advantages associated with longer questions.
The purpose of feedback is to inform the respondent on how well he is performing. The interviewers are provided with a selection of feedback statements from which to choose, their choice being governed by the respondent's performance. Examples of positive and negative feedback statements are "Thanks, we appreciate your frankness" and "Uh-huh. We are interested in details like these" on the one hand and "You answered that quickly" and "Sometimes it's easy to forget all the things you felt or noticed here. Could you think about it again?" on the other.
The theory behind the commitment technique is that if a respondent can be persuaded to enter into an agreement to respond conscientiously he will feel bound by the terms of the agreement. The technique can be applied with personal interviewing by asking respondents to sign an agreement promising to do their best to give accurate and complete answers. In practice Cannell and his colleagues have found that only about 5 per cent of respondents refuse to co-operate. With telephone interviewing, respondents may be asked to make a verbal commitment to respond accurately and completely: a study applying this procedure encountered no problems in securing respondents' co-operation.
The evidence from the various experiments conducted to examine the utility of these techniques suggests that each of them leads to an improvement in reporting, with a combination of all three giving the best results. A concern that high-education respondents might react negatively did not materialize. In a health study, the use of the three techniques together increased the average number of items supplied in answers to open questions by about one-fifth; substantially improved the precision of dates reported for doctor visits, medical events and activity curtailment; increased by about three-fold the checking of data from outside sources; and secured almost a third more reports of symptoms and conditions for the pelvic region (considered to be potentially embarrassing personal information). In a small-scale study of media use, comparing an experimental group interviewed with a combination of all three techniques with a control group interviewed with none of them, the experimental group reported a greater amount for activities likely to be underreported and a lesser amount for those likely to be overreported. Thus 86 per cent of the experimental group reported watching TV on the previous day compared with 66 per cent of the control group; the experimental group listened to the radio yesterday for 2.4 hours on average, compared with an average of 1N hours for the control group; 38 per cent of the experimental group reported reading the editorial page on the previous day compared with 55 per cent of the control group; and the experimental group reported an average of 2.9 books read in the last 3 months compared with 5.3 for the control group.
These experimental results suggest that the techniques hold considerable promise for improving survey reporting. However, at this stage of their development, it seems premature to advocate their general use in routine surveys. They involve significant alterations to questionnaires, interviewers need to be trained in their use, and interviews take longer to complete. Before they are widely adopted, further research is called for, to attempt to replicate the findings across a variety of survey types in different survey environments, to identify restrictions on their range of application, and to seek improvements in them. An important component of this research should be experiments which incorporate an external source of data against which the survey reports can be validated. The availability of the validity data not only avoids the need for assumptions about the directions of reporting errors, but also means that the extent of any improvements can be assessed against the amount of reporting error still remaining.

3. QUESTION EFFECTS WITH NON-FACTUAL QUESTIONS

As with a factual question, the initial stage in the formation of a non-factual question is the conceptualization of the construct to be measured. By its nature a non-factual construct is usually more abstract than a factual one, and hence more difficult to define precisely in theoretical terms; it is also more difficult to operationalize. Since often no single item can represent the essence of a construct, multiple indicators are needed; each indicator overlaps the construct, with the set of indicators being chosen so that the overlaps between them tap the construct. In attitude measurement, the conceptualization and operationalization of an attitude dimension are often closely interwoven: the initial conceptualization determines the choice of items used to operationalize the dimension, but then the items themselves serve to refine the dimension's definition. With the infinity of attitude dimensions that could be measured merging imperceptibly from one to another, the precise definition of the one being measured must depend ultimately on the set of items used to operationalize it.
We have noted that changes made in factual questions to accommodate marginal changes in definition can frequently have substantial effects on the responses obtained. Survey analysts deal with this instability by carefully specifying exactly what has been measured. In general they are not too disturbed by a variation in the results obtained from different questions because, as a result of the relatively precise definitions, they can usually account for the variation as the net effect of the various definitional changes involved.
With non-factual questions, similar effects occur with marginal question changes: two apparently closely similar questions may yield markedly discrepant results. Sometimes a detailed examination of the questions can identify subtle content differences which provide a convincing explanation for the discrepancies. There are, however, also some cases where no obvious content change can be detected, and yet the results still differ substantially. In particular, certain variations in question form have often been shown to have sizeable effects on results.
In view of this sensitivity of responses to non-factual questions to minor changes and the somewhat imprecise definitions of the concepts under study, experienced survey analysts are wary of taking the marginal distributions of answers to such questions too seriously. Instead they concentrate their attention on some form of correlational analysis, for instance contrasting the response distributions of different subclasses of the sample. This form of analysis is justified by the assumption that, even though different question forms may yield markedly different distributions, the question form effect cancels out in the contrast. Schuman and Presser (1977) have termed this assumption that of "form resistant correlation". Evidence is given below to show that, while this assumption is often reasonable, it does not always hold.
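
The form-resistant correlation assumption can be pictured with a small numerical sketch. The percentages below are invented for illustration (they do not come from any study cited here), and the function name `subgroup_contrast` is hypothetical. The marginal level of agreement shifts by about a dozen points between the two forms, while the contrast between subclasses is almost unchanged, which is the pattern the assumption presupposes.

```python
def subgroup_contrast(pct_group1, pct_group2):
    """Difference in percentage giving a response between two subclasses."""
    return pct_group1 - pct_group2

# Hypothetical percentages agreeing, by question form and education level.
results = {
    "form A": {"low education": 60.0, "high education": 45.0},
    "form B": {"low education": 48.0, "high education": 34.0},
}

contrasts = {
    form: subgroup_contrast(r["low education"], r["high education"])
    for form, r in results.items()
}
for form, c in contrasts.items():
    print(f"{form}: contrast = {c:.1f} percentage points")

# If the form effect cancels out of the contrast, this difference between
# the two contrasts should be close to zero.
interaction = contrasts["form A"] - contrasts["form B"]
print(f"form-by-education interaction = {interaction:.1f} points")
```

Where the assumption fails, the interaction term would be large relative to its sampling error, and conclusions drawn from the subclass contrast would depend on which question form happened to be used.
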
In constructing a non-factual question, the questionnaire designer has to make a number of decisions on the form of the question to be asked. We will briefly review a selection of these decisions, mainly with respect to opinion questions, to see how they might influence the results obtained.
(a) Treatment of "Don't knows"
With a factual question a response of "Don't know" (DK) represents a failure to obtain a required item of information; there is an answer to the question, but the respondent cannot provide it. With opinion questions, however, DK has a different interpretation, for respondents may truly have no opinion on the issue under study.
The standard way of allowing for DK's with opinion questions is the same as that used with factual questions; the option to answer DK is not explicitly included in the question, and interviewers are instructed to use the DK response category only when the respondent offers it spontaneously. The danger with this procedure is that some respondents may feel pressured to give a specific answer even though DK is their proper response. This danger exists for both factual and opinion questions, but it is probably greater for the latter.
Two examples given by Schuman and Presser (1980) illustrate that many respondents will indeed choose one of the alternatives offered for an opinion question even though they do not know about the issue involved. In both examples, respondents were asked for their views about proposed legislation which few, if any, would be aware of, and yet 30 per cent expressed opinions. Bishop et al. (1980b) report similar findings about a wholly fictional issue.
As a way of screening out respondents without opinions, some type of filtering may be used. One possibility is to include an explicit "no opinion" option or filter in the response categories offered to respondents (a "quasi-filter"); in the Schuman and Presser experiment, this offer reduced the proportion of respondents expressing opinions on the two laws to 10 per cent or less. A more forceful possibility is a preliminary filter question "Do you have an opinion on . . .?" (a "full filter").
Schuman and Presser (1978) carried out several experiments to examine the effects of filtering. They found that the use of the full filter typically increased the percentage of DK's over those obtained from the standard form by around 20-25 per cent. Bishop et al. (1980a) also found in their experiments that the increases were generally in the range 20-25 per cent, but they report a much smaller increase for a very familiar topic and a much larger one for an unfamiliar topic.
In Schuman and Presser's experiments the effect of the variation in question form on substantive results was somewhat unexpected. In the first place, once the DK's had been eliminated (as would usually be done in analysis), the marginal distributions of responses turned out in most cases not to be significantly affected by question form; also the relations between the opinion responses and standard background variables were little affected. However, the associations between the opinion responses themselves did differ significantly in certain cases between question forms: in one case the association was stronger with the filtered form, in another it was weaker.
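The step of eliminating DK's before comparing marginal distributions can be sketched as follows. This is only an illustration: the function name and the two percentage distributions are hypothetical and are not taken from Schuman and Presser's data. The numbers are chosen so that the filtered form has many more DK's but the same split among those who give an opinion.

```python
def drop_dk(dist, dk_key="DK"):
    """Remove the DK category and rescale the remaining percentages to sum to 100."""
    kept = {k: v for k, v in dist.items() if k != dk_key}
    total = sum(kept.values())
    return {k: 100.0 * v / total for k, v in kept.items()}


# Hypothetical percentage distributions for the same question in two forms.
standard_form = {"favour": 54.0, "oppose": 36.0, "DK": 10.0}
filtered_form = {"favour": 42.0, "oppose": 28.0, "DK": 30.0}

standard_no_dk = drop_dk(standard_form)
filtered_no_dk = drop_dk(filtered_form)

for name, dist in [("standard", standard_no_dk), ("filtered", filtered_no_dk)]:
    print(name, {k: round(v, 1) for k, v in dist.items()})
```

In this constructed case the two forms give identical distributions once DK's are removed, which is the pattern the authors report for most of the experiments; it says nothing about associations between opinion items, which did sometimes differ.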
(b) Open or closed questions
When asked a survey question respondents may either be supplied with a list of alternative responses from which to choose or they may be left to make up their own responses. The major advantages of the former type of question (termed variously a closed, fixed-choice or precoded question) are standardization of response and economy of processing. Its major disadvantages, and hence arguments in favour of open questions, are that the alternatives imposed by the closed form may not be appropriate for these respondents, and that the alternatives offered may influence the responses selected.
The main context in which open questions are used extensively is when the potential responses are both nominal in nature and sizeable in number. These conditions occur often with motivation questions, asking for the principal or all reasons for an occurrence, and with questions asking for the choice of the most, or several most, important factors involved in an issue. In such cases the questionnaire designer faces a real choice between open and closed questions.
As part of their research on question form effects, Schuman and Presser (1979) carried out several experiments on open and closed questions, using items chosen for their utility in one form or the other in a major past survey. In all the experiments important differences occurred between the response distributions to the open and closed forms. The two versions of a question on work values in the first experiment were:
Open: "The next question is on the subject of work. People look for different things in a job. What would you most prefer in a job?"
Closed: "The next question is on the subject of work. Would you please look at this card and tell me which thing on the list you most prefer in a job? 1. High income (12.4 per cent); 2. No danger of being fired (7.2 per cent); 3. Working hours are short, lots of free time (3.0 per cent); 4. Chances for advancement (17.2 per cent); 5. The work is important and gives a feeling of accomplishment (59.1 per cent)." (Figures in brackets are percentages choosing the alternatives.)
While all but 1 per cent of responses to the closed question fell into one of the five precoded categories, nearly 60 per cent of those to the open question fell outside these categories (important additional codes developed were: pleasant or enjoyable work, 15.4 per cent; work conditions, 14.9 per cent; and satisfaction/liking the job, 17.0 per cent). The open question responses gave rise to five coding categories comparable to those listed above for the closed question form. For the first two, the proportions of respondents choosing the code were almost identical for the two question forms, for the third the proportion was somewhat lower with the open form, while for the last two it was much lower with the open form. The equivalent code for the "Chance for advancement" code was "Opportunity for promotion" with the open form: this code was used for only 1.8 per cent of responses as compared with the 17.2 per cent use of the "Advancement" code. The code corresponding to "Accomplishment" was called "Stimulating work"; it was used for only 21.3 per cent of responses as compared with 59.1 per cent for "Accomplishment".
A possible explanation for these substantial differences is that they were not caused by the change in question form per se, but were rather the result of the use of unsuitable response categories. Schuman and Presser therefore conducted two more experiments using a revised set of response categories constructed from the codes developed in the first experiment for the open question. This revised set aimed to represent more adequately the work values that respondents offered spontaneously and also to retain the theoretical goals behind the question. Even with these revised codes, however, the response categories of the closed form covered only 58 per cent of all open responses, and there remained differences between the proportions choosing the five common categories on the two forms of the question.
Besides the differences in marginal distributions, Schuman and Presser also found that the question form sometimes affected relationships of the responses to background variables. In the first experiment, the results from the closed form indicated that men were more likely than women to value "Pay" and "Advancement", but in the open form no such associations appeared. In the second two experiments, there was a substantial downward trend in the proportion choosing the "Security" category with increasing education for the closed question form, but there was no clear relationship between this category and education for the open form. The authors present indirect evidence suggesting that when open and properly constructed closed forms of questions yield different responses, the responses to the closed questions are sometimes more valid in their classification of respondents and in describing relationships of the responses with other variables.
(c) The use of balance
In asking for respondents' opinions on an issue, the questionnaire designer often has a choice of the extent to which he presents the alternative contrary opinion. At one extreme, the question may be expressed in an unbalanced form simply as "Do you favour X?", with the contrary opinion left entirely unmentioned, while at the other extreme a substantive alternative may be explicitly stated, as in the question "Do you favour X or Y?". An intermediate position is to use a token alternative, to draw attention to the existence of the alternative opinion without specifying exactly what it is; questions like "Do you favour or oppose X?" are of this type.
A number of split-ballot experiments have been conducted to compare the results obtained using the unbalanced form of the question and those using the form with the token alternative. Perhaps not surprisingly these experiments have generally found only small differences. On the other hand, large differences have often (but not always) been found between the responses given to questions asked with and without a substantive alternative. In a number of experiments it can be argued that the inclusion of the alternative has introduced new issues, effectively modifying the choice the respondent is being asked to make (Hedges, 1979). Even so, the survey analyst needs to be aware of this effect, since it means that two questions apparently tapping closely comparable issues can yield very divergent results.
(d) Acquiescence
A widely used method of attitude measurement is to present respondents with a set of opinion statements with which they are asked to agree or disagree. An issue that arises with this procedure is that respondents might tend, regardless of content, to give "agree" rather than "disagree" responses. This tendency, which has received a good deal of attention in the psychological literature, is often known as acquiescence or agreeing response set (bias). The dominant view now appears to be that the effect is of little importance in psychological testing, but Campbell et al. (1960) long ago provided evidence to suggest that this conclusion may not hold for the social survey situation.
In one of their experiments on this issue Schuman and Presser (1981) compared the responses to two statements with which respondents were asked either to agree or to disagree (and also a forced choice version of the question). The two statements were constructed to be exact opposites of each other. The detailed analysis to investigate the presence of an agreeing response bias cannot be adequately summarized here, but we will report one simple result to indicate the magnitude of the effect found.
The two agree/disagree statements were:
A. "Individuals are more to blame than social conditions for crime and lawlessness in this country."
B. "Social conditions are more to blame than individuals for crime and lawlessness in this country."
Without a question form effect, the proportion of respondents answering "agree" to A should be the same as the proportion answering "disagree" to B, and vice versa. In the event, however, 59.6 per cent agreed with A and only 43.2 per cent disagreed with B, a highly significant difference. Schuman and Presser also found that the variation affected associations between the responses and education and other important variables.
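The comparison of the two split-ballot percentages can be illustrated with a standard two-sample test for proportions. The sketch below is not the authors' analysis: the review does not give the sample sizes, so the value of 500 respondents per form is a purely hypothetical assumption, and the resulting z statistic depends on it.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z statistic for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


agree_with_a = 0.596      # reported: 59.6 per cent agreed with statement A
disagree_with_b = 0.432   # reported: 43.2 per cent disagreed with statement B
n_per_form = 500          # HYPOTHETICAL sample size per form, not given in the text

gap = agree_with_a - disagree_with_b
z = two_proportion_z(agree_with_a, n_per_form, disagree_with_b, n_per_form)
print(f"difference = {100 * gap:.1f} percentage points, z = {z:.2f}")
```

With no acquiescence effect the two proportions estimate the same quantity, so the difference of 16.4 percentage points is the measure of the effect; the test only indicates whether a gap of that size could plausibly arise from sampling variation under the assumed sample sizes.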
(e) Middle alternatives
When respondents are asked their views on an issue, often some may want to choose a middle or neutral response. The problem facing the questionnaire designer is how to allow for this response. Should a middle alternative be explicitly offered? Should a neutral response be accepted only if offered spontaneously? Or should it be actively discouraged?
As might be expected, the explicit offer of a middle alternative often substantially increases the proportion of respondents stating a neutral view. In a series of experiments conducted by Kalton et al. (1980), the increases were between 15 and 49 per cent; in a series reported by Presser and Schuman (1980) the increases were between 10 and 20 per cent.
Presser and Schuman observe that in their studies and earlier ones involving three point scales (pro, neutral and anti) the increase in support for the neutral view with the offered question form came proportionately from the polar positions, so that the balance between pro's and anti's was not affected by the variation in alternatives offered. This comforting finding failed to hold, however, in two of the three experiments with three-point scales reported by Kalton et al.
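Whether the neutral responses are drawn proportionately from the two polar positions can be checked by comparing the pro share among respondents who take a side under each question form. The following is a minimal sketch; the function name and all percentages are hypothetical and are chosen only to show one case where the balance is preserved and one where it is not.

```python
def pro_share(pro, anti):
    """Share of pro responses among those taking a side (neutrals excluded)."""
    return pro / (pro + anti)


# Hypothetical percentages; none of these figures come from the studies cited.
without_middle = {"pro": 60.0, "anti": 40.0}
with_middle_proportional = {"pro": 48.0, "neutral": 20.0, "anti": 32.0}
with_middle_skewed = {"pro": 52.0, "neutral": 20.0, "anti": 28.0}

base = pro_share(without_middle["pro"], without_middle["anti"])
proportional = pro_share(with_middle_proportional["pro"], with_middle_proportional["anti"])
skewed = pro_share(with_middle_skewed["pro"], with_middle_skewed["anti"])

print(f"no middle offered:        {base:.3f}")
print(f"middle, proportional draw: {proportional:.3f}")
print(f"middle, uneven draw:       {skewed:.3f}")
```

In the proportional case the pro share is unchanged by offering the middle alternative, which corresponds to the pattern Presser and Schuman describe; the uneven case corresponds to the kind of departure reported in two of the Kalton et al. experiments.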
There is little evidence that this question form variation affects associations between opinion responses and other variables. In view of the substantial impact of the question form variation on marginal distributions, however, it seems dangerous to place uncritical reliance on the "form resistant correlation" assumption.
(f) Order of alternatives
The responses to closed questions may be affected by the order in which the alternatives are presented. In discussing this order effect, two modes of presentation may need to be distinguished: the alternatives can be presented in written form, as with self-completion questionnaires or when flashcards are used; or they can be presented orally, with the interviewer reading them to respondents, sometimes as a running prompt. When they are presented in written form, there appears to be a slight tendency for the first alternative to be favoured (e.g. Belson, 1966; Quinn and Belson, 1969). When they are presented orally, Rugg and Cantril (1944) provide examples where the last-mentioned alternative is favoured, but Payne also gives several examples where the order effect is negligible. Kalton et al. (1978) report the results of experiments on varying the order of orally presented alternatives with four simple questions. In all cases, the evidence suggested that, if anything, the first-mentioned alternative was favoured; the effects were, however, very small (around a 2 per cent increase), and only on the border of statistical significance.
4. GENERAL QUESTION EFFECTS
The preceding discussion has been divided into two parts, questioning issues relating to factual questions and those relating to non-factual (opinion) questions. This arbitrary division was made for convenience of exposition to reflect the differences in emphasis of question wording and format research between the two types of question. However, it should not be taken to imply that the effects noted for one type of question do not apply to the other. Thus, for instance, issues of sensitivity can clearly arise with opinion statements, as also would issues of memory if the survey was concerned about changes of opinion (an extremely difficult matter on which to collect accurate information by retrospective questioning). Equally, while many of the question form variations discussed above for non-factual questions are not applicable for factual questions, the latter may also be affected by variation in question form. Locander and Burton (1976), for instance, show how four versions of a question asking for family income, all designed for use with telephone interviewing, yielded markedly different income distributions. All the questions used an unfolding technique for presenting the set of response categories as a sequence of binary choices, but they employed different forms of the technique. Forms 1 and 4, for example, both asked whether the family income was "more than X" for X = $5000, $7500, $10 000, $15 000, $20 000 and $25 000; form 1 started with $5000 and took increasing values of X until a "no" answer was given, while form 4 started with $25 000 and took decreasing values until a "yes" answer was given. With form 1 37.5 per cent of respondents reported family incomes of $15 000 or more; with form 4 the corresponding percentage was 63.7 per cent.
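The mechanics of the two unfolding sequences described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the respondent is assumed to answer every "more than X" question accurately, so the sketch shows how the sequences differ in procedure and cannot reproduce the discrepancy that Locander and Burton observed.

```python
THRESHOLDS = [5000, 7500, 10000, 15000, 20000, 25000]


def unfold_ascending(income, thresholds=THRESHOLDS):
    """Form 1 style: ask 'more than X?' for increasing X until the first 'no'.

    Returns ((lower, upper), questions_asked); None marks an open-ended bound.
    """
    lower = None
    asked = 0
    for x in thresholds:
        asked += 1
        if income > x:          # respondent answers "yes"
            lower = x
        else:                   # first "no" ends the sequence
            return (lower, x), asked
    return (lower, None), asked


def unfold_descending(income, thresholds=THRESHOLDS):
    """Form 4 style: ask 'more than X?' for decreasing X until the first 'yes'."""
    upper = None
    asked = 0
    for x in reversed(thresholds):
        asked += 1
        if income > x:          # first "yes" ends the sequence
            return (x, upper), asked
        upper = x
    return (None, upper), asked


bracket_up, asked_up = unfold_ascending(18000)
bracket_down, asked_down = unfold_descending(18000)
print(bracket_up, asked_up)
print(bracket_down, asked_down)
```

For an accurate respondent both sequences end in the same income bracket and differ only in how many questions are needed to reach it. The large difference in the reported distributions therefore has to come from how respondents actually answer within each sequence, which is the point of the example in the text.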
A final, important, questioning effect to be discussed concerns the presence of other questions in the questionnaire, and the position of those questions in relation to the question under study. Question order and context effects may occur with both factual and non-factual questions, but they appear to operate in different ways.
A sizeable number of studies have been carried out to examine the effect of question order on responses to opinion questions. On many occasions no order effect has been discovered, even for questions closely related in subject matter. However, one type of question order effect has been found in two cases and seems worth further exploration. This effect occurs when one of the questions is a general one on an issue and the other is more specific on the same issue. Schuman et al. (1981) with two opinion questions on abortion, and Kalton et al. (1978) with two questions on driving standards, both found that the distributions of answers to the more specific questions were the same whether the specific question was asked before or after the general question, but that the distributions of answers to the general questions differed according to the questions' position. (However, Kalton et al. also report another such experiment with a contrary finding.) In the Kalton et al. experiment, respondents were asked about driving standards generally and about driving standards among younger drivers. When the general question was asked first, 34 per cent of respondents said that general driving standards were lower than they used to be; when that question followed the more specific question about younger drivers, the corresponding percentage fell by 7 percentage points to 27 per cent. Further analysis showed that the question order affected only respondents aged 45 or older, where the difference in the percentages was 12 percentage points. No definitive reason for this effect has been established, but it may possibly be explained as a subtraction effect: after answering the specific question, some respondents assume that the general question excludes the specific part (e.g. in the driving example, they assume that the general question excludes consideration of the driving standards of younger drivers).
With factual questions, one situation where other questions on a questionnaire may influence the answers to a particular question arises when respondents are asked to respond to a long list of similar items, as for instance in readership surveys where they are taken through a list of newspapers and periodicals to find out which ones they have looked at. Here levels of reporting sometimes tend to be lower when items are placed later in the list. For instance, in studying readership reporting in the UK National Readership Surveys, Belson (1962) conducted an experiment in which he varied the relative position of the different types of periodicals between different parts of the sample. The weekly publications were most affected by the presentation order: when they appeared last their reported level of readership was only three-quarters of what it was when they appeared first.
Another source of evidence on the disturbing influence of other questions comes from an examination by Gibson et al. (1978) of the effects of the inclusion of supplements on the results for core items in the National Crime Survey (NCS), Current Population Survey and Health Interview Survey.
In the NCS Cities Sample a lengthy series of attitude questions about topics such as neighbourhood safety, opinions regarding local police, crime trends and news coverage of crime was asked of a random half of the sample of adults in addition to the core NCS questions on crime victimization. Since it was thought that the responses to the attitude questions might be affected by the victimization questions if they were asked after the core items, the attitude questions were asked first. The effect of the prior inclusion of the attitude questions was, however, to substantially and significantly increase the reported victimization rates: on average the rate for personal crimes was around 20 per cent greater and that for property crimes was around 13 per cent greater for the half sample that answered the attitude supplement than for the half sample that did not. Possible explanations for this effect are that the attitude questions served to stimulate respondents' awareness or memory regarding victimization experiences, that they increased respondents' desire to produce what they perceived to be the desired answers (victimization experiences), or that a combination of both these causes operated.
From a further analysis of the NCS Cities Sample, Cowan et al. (1978) deduce that the effect of administering the attitude supplement was to increase reporting of the less serious victimizations (such as simple assault, those not reported to the police and those involving a loss of under $50) and to increase reporting among population subgroups experiencing high victimization rates (younger persons, males). They also found that the higher rates were spread throughout the 12-month reference period with no discernible pattern, a factor which argues against an increased telescoping effect stimulated by the attitude supplement. They conclude that the effect of the supplement is to produce better reporting in the reference period, but they suggest that it may be an oversimplification to attribute this effect to memory stimulation.

The findings of the substantial effects that the inclusion of supplements can have on the responses to core items raise a major concern for the comparability of results across surveys. Survey analysts are properly cautious in their interpretations of differences in results between surveys when there are even slight changes in the questions being compared. These findings suggest that they also need to be concerned about differences in the rest of the questionnaires. This conclusion has serious consequences for the replication of survey results because, while it is often fairly easy to replicate individual questions or small sets of them for purposes of comparability, it is extremely difficult to replicate an entire questionnaire. In view of the importance of replication and measures of change in the analysis of survey data, further research in this area is surely called for.
5. CONCLUDING REMARKS
The general conclusion from this review must be that survey questioning is not a precision tool. The survey literature contains ample evidence to indicate that serious response errors can, and do, occur with factual questions, and many experiments have shown that the responses to opinion questions can sometimes be substantially affected by apparently insignificant variations in the question asked. This conclusion on the limitations of current survey questioning procedures may be unexceptionable to survey methodologists, but it is not sufficiently recognized by the wide variety of people who carry out surveys and use survey data for a range of different purposes.
In view of this state of affairs, experienced survey practitioners often treat marginal results on absolute levels with considerable caution, concentrating their attention more on comparisons, either between different subgroups of the sample or between two or more surveys. In interpreting these contrasts as estimates of true differences, they are assuming that there is a constant bias across subgroups or surveys, a bias which thus cancels out in the contrast: this is essentially the form resistant correlation assumption discussed earlier. While this assumption is often a reasonable approximation, it should nevertheless not be relied on uncritically: we have noted several reported examples of its failure with opinion questions, and differential biases can also be expected to occur with factual questions.
A special problem in contrasting the results of different surveys is the question order and context effect discussed in the previous section. The demonstrated sensitivity of responses to both factual and opinion questions to the presence of other questions on the questionnaire must be a major concern to survey researchers, who frequently make great use of comparisons between surveys in their analyses. A good illustration of this problem is provided by Turner and Krauss (1978) who examine the difficulties of comparing subjective social indicators across a series of surveys.
The evidence from methodological research points to considerable room for improvement in the questioning phase of the survey operation. Although a substantial amount of research has been conducted in this area, we remain largely ignorant of the nature of question wording and form effects, of when such effects will occur and when they will not, and of how they operate. In the past, most of the research on factual questions has simply assessed the extent of response errors, and most of that on opinion questions has just examined discrepancies between two or more variants of a question. Present research is beginning to study question effects in a systematic way, with an attempt first to codify the types of effects, and then to understand the psychological processes underlying them. This stage of work is still in its infancy, the task is a daunting one, and progress is likely to be slow with little prospect for major breakthroughs. Rather than the isolated experiments of the past, what is now needed is a series of developmental programs aiming to build and test theoretical structures for questioning effects. Until we have a much clearer understanding of the factors involved in the questioning process and their interrelationships, we will lack the basis for constructing good questions.

One possibility is that an existing psychological theory can explain a broad range of effects of the type we have noted and perhaps also lead the way to prevention. For example, Bishop et al. (1981) suggest that recent cognitive theories can provide such a framework and they illustrate this by interpreting a single context effect they obtained by accident. Until the theory can be used in a predictive way, however, or at least applied to a variety of existing examples, it is difficult to know whether much more than a one-time after-the-fact fit has been achieved. A rather different and admittedly slower and more arduous approach can be illustrated by our own recent work, which starts from the context effect with two abortion items mentioned above. The initial effect, again first discovered by accident and then replicated experimentally, involved two adjacent items in NORC's General Social Survey. Our first step was to substitute other questions on abortion that we thought to be conceptually similar to the original items in order to determine whether the effect is limited to the exact wording of the original NORC questions. This was done one item at a time, and we have now found that the context effect does generalize beyond both the original items to other parallel questions dealing with abortion.
A further step was then taken to determine whether the effect requires that the items be contiguous, as in all the experiments thus far, or whether it extends to situations where the two are separated by a number of unrelated items. Initial results support the former requirement, for the effect seems to disappear once contiguity is eliminated, although this is a case where replication is essential and is being pursued. Finally, having obtained a good fix on the degree of generality of the abortion context effect over variants in wording and position, we are attempting to formulate and test hypotheses as to the underlying source of the effect. Thus far an attempt to use open-ended follow-up questions to respondents has not been successful, since the answers did not differ from one context to the other. We are still considering ways of testing rigorously various hypotheses, and as yet are still far from having reached a satisfactory conclusion as to the cause of the original effect.
This whole process of generalization and search for cause has required a series of experiments, each of which attempts to widen and deepen our understanding of the initial accidental finding. If closure is achieved in this one case it will then be necessary to determine whether other cases can be assimilated to it; or if not, to develop separate sets of findings and explanations.
This approach is obviously time-consuming and frustratingly slow. But we suspect that a series of such systematic investigations will be necessary in order to understand the nature of, first, context effects, and then response effects more generally. In other areas of science progress ordinarily involves repeated small steps, and there is no particular reason to believe that our understanding of response effects in surveys can avoid similar efforts. If we are to go beyond merely producing ad hoc instances of effects and ex post facto explanations, this kind of medium level detective work may be essential.
In the meantime, the survey practitioner has one strong defence against the kinds of artifacts that we have described in this paper: use of multiple questions, contexts, and modes of research. Damaging response effects are not rare, but they are also not so pervasive as to occur in the same way with every survey item. By tying an important concept to at least a few items that differ among themselves in form, wording, and context, the investigator is unlikely to be trapped into mistaking a response artifact for a substantive finding.
REFERENCES
ABERNATHY, J. R., GREENBERG, B. G. and HORVITZ, D. G. (1970). Estimates of induced abortion in urban North Carolina. Demography, 7, 19-29.
BANCROFT, G. and WELCH, E. H. (1946). Recent experience with problems of labor force measurement. J. Amer. Statist. Ass., 41, 303-312.
BARTON, A. H. (1958). Asking the embarrassing question. Public Opinion Quart., 22, 67-68.
BELSON, W. A. (1962). Studies in Readership: A Report of an Enquiry. London: Business Publications.
BELSON, W. A. (1966). The effects of reversing the presentation order of verbal rating scales. J. Advert. Res., 6(4), 30-37.

BELSON, W. A. (1968). Respondent understanding of survey questions. (International Review on Public Opinion), 3(4), 1-13.
BISHOP, G. F., OLDENDICK, R. W. and TUCHFARBER, A. J. (1980a). Experiments in filtering political opinions. Political Behavior, 2, 339-369.
BISHOP, G. F., OLDENDICK, R. W. and TUCHFARBER, A. J. (1981). Question order and context effects in measuring political interest. Paper presented at the 36th Annual Conference of the American Association for Public Opinion Research, May 1981.
BISHOP, G. F., OLDENDICK, R. W., TUCHFARBER, A. J. and BENNETT, S. E. (1980b). Pseudo-opinions on public affairs. Public Opinion Quart., 44, 198-209.
BRADBURN, N. M., SUDMAN, S. and ASSOCIATES (1979). Improving Interview Method and Questionnaire Design. San Francisco: Jossey-Bass.
CAMPBELL, A., CONVERSE, P. E., MILLER, W. E. and STOKES, D. E. (1960). The American Voter. New York: Wiley.
CANNELL, C. F. (1977). A Summary of Studies of Interviewing Methodology. Vital and Health Statistics, Series 2, No. 69. Washington, DC: US Government Printing Office.
CANNELL, C. F. and KAHN, R. L. (1968). Interviewing. In The Handbook of Social Psychology. Volume Two: Research Methods (G. Lindzey and E. Aronson, eds), 2nd ed., Chapter 15. Reading, Mass.: Addison-Wesley.
CANNELL, C. F., MILLER, P. V. and OKSENBERG, L. (1981). Research on interviewing techniques. In Sociological Methodology, 1981 (S. Leinhardt, ed.), pp. 389-437. San Francisco: Jossey-Bass.
COWAN, C. D., MURPHY, L. R. and WIENER, J. (1978). Effects of supplemental questions on victimization estimates from the National Crime Survey. Proceedings of the Section on Survey Research Methods, American Statistical Association, 1978, pp. 277-282.
FORSYTHE, J. B. and WILHITE, O. (1972). Testing alternative versions of Agriculture Census questionnaires. Proceedings of the Business and Economic Statistics Section, American Statistical Association, 1972, pp. 206-215.
GIBSON, C. O., SHAPIRO, G. M., MURPHY, L. R. and STANKO, G. J. (1978). Interaction of survey questions as it relates to interviewer-respondent bias. Proceedings of the Section on Survey Research Methods, American Statistical
GOODSTADT, M. S. and GRUSON, V. (1975). The randomized response technique: a test on drug use. J. Amer. Statist. Ass., 70, 814-818.
GRAY, P. and GEE, F. A. (1972). A Quality Check on the 1966 Ten Per Cent Sample Census of England and Wales. London: HMSO.
HEDGES, B. M. (1979). Question wording effects: presenting one or both sides of a case. Statistician, 28, 83-99.
JAFFE, J. A. and STEWART, C. D. (1955). The rationale of the current labor force measurement. In The Language of Social Research (P. F. Lazarsfeld and M. Rosenberg, eds), pp. 28-34. New York: The Free Press.
KALTON, G., COLLINS, M. and BROOK, L. (1978). Experiments in wording opinion questions. Appl. Statist., 27, 149-161.
KALTON, G., ROBERTS, J. and HOLT, D. (1980). The effects of offering a middle response option with opinion questions. Statistician, 29, 65-78.
KALTON, G. and SCHUMAN, H. (1980). The effect of the question on survey responses: a review. (With discussion by N. D. Rothwell and C. F. Turner). Proceedings of the Section on Survey Research Methods, American Statistical
LOCANDER, W. B. and BURTON, J. P. (1976). The effect of question form on gathering income data by telephone. J. Marketing Res., 13, 189-192.
LOCANDER, W., SUDMAN, S. and BRADBURN, N. M. (1976). An investigation of interview method, threat and response distortion. J. Amer. Statist. Ass., 71, 269-275.
MADIGAN, F. C., ABERNATHY, J. R., HERRIN, A. N. and TAN, C. (1976). Purposive concealment of death in household surveys in Misamis Oriental Province. Population Studies, 30, 295-303.
NATIONAL CENTER FOR HEALTH STATISTICS (1972). Optimum Recall Period for Reporting Persons Injured in Motor Vehicle Accidents. Vital and Health Statistics, Series 2, No. 50. Washington, DC: US Government Printing Office.
NETER, J. and WAKSBERG, J. (1965). Response Errors in Collection of Expenditures Data by Household Interviews: An Experimental Study. Bureau of the Census Technical Paper No. 11. Washington, DC: US Government Printing Office.
PARRY, H. J. and CROSSLEY, H. M. (1950). Validity of responses to survey questions. Public Opinion Quart., 14, 61-80.
PAYNE, S. L. B. (1951). The Art of Asking Questions. Princeton: Princeton University Press.
PRESSER, S. and SCHUMAN, H. (1980). The measurement of a middle position in attitude surveys. Public Opinion Quart., 44, 70-85.
QUINN, S. B. and BELSON, W. A. (1969). The Effects of Reversing the Order of Presentation of Verbal Rating Scales in Survey Interviews. Survey Research Centre, London School of Economics.
ROTHWELL, N. D. and RUSTEMEYER, A. M. (1979). Studies of Census mail questionnaires. J. Marketing Res., 16, 401-409.
RUGG, D. and CANTRIL, H. (1944). The wording of questions. In Gauging Public Opinion (H. Cantril, ed.), Chapter 2. Princeton: Princeton University Press.
SCHUMAN, H. and PRESSER, S. (1977). Question wording as an independent variable in survey analysis. Sociol. Methods and Res., 6, 151-170.

SCHUMAN, H. and PRESSER, S. (1978). The assessment of "no opinion" in attitude surveys. In Sociological Methodology, 1979 (K. Schuessler, ed.), Chapter 10. San Francisco: Jossey-Bass.
SCHUMAN, H. and PRESSER, S. (1979). The open and closed question. Amer. Sociol. Rev., 44, 692-712.
SCHUMAN, H. and PRESSER, S. (1980). Public opinion and public ignorance: the fine line between attitudes and nonattitudes. Amer. J. Sociol., 85, 1214-1225.
SCHUMAN, H. and PRESSER, S. (1981). Questions and Answers on Attitude Surveys. Experiments in Question Form, Wording, and Context. New York: Academic Press.
SCHUMAN, H., PRESSER, S. and LUDWIG, J. (1981). Context effects on survey responses to questions about abortion. Public Opinion Quart., 45, 216-223.
SHIMIZU, I. M. and BONHAM, G. S. (1978). Randomized response technique in a national survey. J. Amer. Statist. Ass., 73, 35-39.
SPEAK, M. (1967). Communication failure in questioning: errors, misinterpretations and personal frames of reference. Occupational Psychology, 41, 169-181.
SUDMAN, S. (1980). Reducing response errors in surveys. Statistician, 29, 237-273.
SUDMAN, S. and BRADBURN, N. M. (1973). Effects of time and memory factors on response in surveys. J. Amer. Statist. Ass., 68, 805-815.
SUDMAN, S. and BRADBURN, N. M. (1974). Response Effects in Surveys. Chicago: Aldine.
TRAUGOTT, M. W. and KATOSH, J. P. (1979). Response validity in surveys of voting behavior. Public Opinion Quart., 43, 359-377.
TURNER, C. F. and KRAUSS, E. (1978). Fallible indicators of the subjective state of the nation. American Psychologist, 33, 456-470.
WARNER, S. L. (1965). Randomized response: a survey technique for eliminating evasive answer bias. J. Amer. Statist. Ass., 60, 63-69.
DISCUSSION OF THE PAPER BY DR KALTON AND DR SCHUMAN

Professor D. HOLT (University of Southampton): It gives me great pleasure to propose the vote of thanks this evening and also to welcome back Graham Kalton, who served the Society so well on Section committees and Council before moving to Ann Arbor. May I say, too, how appropriate it is to have, as authors on this important topic, two members of the Survey Research Center, Michigan, where so much has been done to advance our understanding of the effects of question wording and presentation on survey responses.
The paper itself collects together a selection of results and conclusions on various aspects of the question wording problem in surveys. I use the term "question wording" as a catch-all phrase, although the paper we have heard this evening goes beyond the mere wording of individual questions. It is clear that, in the last twenty years, a great deal of work has been done in this area, and yet at the end of the paper one is left, and I think that the authors share this sentiment, with some feeling of disappointment at the size of the task which remains. This is perhaps not surprising since the issues which arise are extremely complex. Nevertheless, the magnitude of the effects which can arise is not trivial, as we have seen tonight. Yet in some situations we seem to be little nearer to the development of an accepted methodology, except in the narrow case of saying "If you want to ask this particular question then do not do this or that." Add to this the conclusion later in the paper that, not only are there question wording effects but there are also contextual effects, and the practising survey researcher may be forgiven for thinking that we are sinking further into a morass of complexity. I offer no criticism, of course, to this evening's authors, since they are right to draw our attention, once again, to these problems, but what is clearly needed, as the authors indicate in their final section, is a planned campaign to reach a position where at least some of these effects are so well understood that standard methods will be adopted by all responsible practitioners. If, as the authors indicate, the road is going to be long and painful, I wonder whether it is because we are sometimes treading the wrong path. It seems to me that there is a tendency to trivialize complex issues to produce simple measures such as the proportion "in favour" or "against" a particular issue. It might be that the gross oversimplification of issues is the source of some of the large effects observed.
If questionnaire design and question wording is an art form, then what we need are clearly enunciated principles, which the survey methodologist can apply in designing a questionnaire for a particular purpose and in analysing the results. Some crude principles are already available, such as the avoidance of obviously biased practices, but crude principles are clearly not enough, since significant differences still arise between alternatives which one would have thought were comparable. One hopes for a time when explanations of particular effects are given in terms which go beyond the specific context of the question and I congratulate the authors on offering these in various places. To be convincing, such explanations will need to stand the test of time and to explain the non-existence of effects in other situations. The topic of form resistant correlations is one in which no clear pattern has yet emerged and no theoretical justification exists. To illustrate my point, several times this evening we have heard about specific methods that "this often occurs but not always".
If this topic is a science rather than an art form, then we clearly need a much stronger framework in which to measure the effects which we observe and there are one or two places where the possibility of such a framework looks like a distant promise. In particular, the work of Cannell and others on the effects of question length and the completeness of responses is interesting. The authors give a number of suggestions as to why questions which have been padded out with redundancies may encourage a more complete response and it is not difficult for amateur social psychologists to leap in with both feet and suggest others. If a framework is to be developed for such topics, then we need a way of coding questions and, indeed, whole sections of questionnaires, probably in psychological terms, to yield a measure for the question. For example, in Cannell's work a measure of redundancy or fluency or time lag between the introduction of a concept and the need of a response, or whatever the relevant characteristics of the question design are thought to be. If such a measure is shown to be related to the completeness of the responses over a variety of questionnaires in form and content, then we would have the beginnings of a framework against which future questionnaire designs could be measured. It is this development to get beyond ex-post explanations of what has happened, to predict what effects will exist in future questionnaires, and to provide guidance to survey methodologists which is the most urgent need.
Finally, may I take issue with the authors on an assertion which they make in several places throughout the paper and particularly Section 5. Here I feel that there is tacit support for the view that, since marginal results are difficult to measure, the concentration on comparisons is acceptable. Of course comparisons are valid in their own right but I have recently returned from New Zealand, where press coverage has been dominated by the recent Springbok tour. A great deal of emphasis was placed on the survey finding that 54 per cent of the population was against the tour taking place. It seems to me that the desire to derive some measure of the attitude of the population - to a Springbok tour in this case - is perfectly valid and can influence the political and social life of a country. If the current techniques for measuring this are inadequate, then there is a need to derive valid methods to measure the proportion supporting a particular issue. The alternative is to demonstrate that a simple proportion is an inadequate measure of opinion in such a complex issue and to provide an alternative entirely new set of measuring instruments; otherwise survey methodologists will be cast into the role of saying that they cannot measure what is needed.
May I close by complimenting the authors on a well-written and a well-presented review. It gives me great pleasure to propose a vote of thanks on behalf of the Society.
Mr B. HEDGES (Social and Community Planning Research): I should begin, perhaps, by declaring a special interest, no doubt shared by many of the audience, in the subject of this paper. I spend my time primarily in conducting sample surveys or on matters connected with them. I therefore welcome this paper as an excellent review of what is known about the effect of question wording on responses, and it is one that I shall find very useful in my work.
Kalton and Schuman remind us that a great deal of work has been done on this topic, and that we have learnt a lot from it. It is not uncommon to meet people who approach the task of designing their first questionnaire with perfect confidence in their ability to perform what after all appears to be an ordinary, everyday business - the asking of questions. Everyone does that all the time; it is simple enough, then, to devise a string of questions that will produce good data ready for sophisticated analysis. However, this paper should make such people think twice. Designing questions is not easy, and is full of pitfalls. Although a lot is known as a result of the work that has been done, what is not known is far greater.
Anyone who tries to design a questionnaire, and in that process refers to the literature for guidance, is likely to be left with a large quantity of unsolved problems. I think that this will continue to be so for a long time. It is true that the rate of experimentation on, and interest in, question experiments is speeding up. In spite of that, and even if the integrated research the authors advocate is adopted, rather than the piecemeal research we have had in the past, I still think that systematic experiment will solve only a small proportion of the questionnaire designer's problems.
Reference to what is known from well-planned experiments is clearly the best basis for question design, but what about all the cases in which such knowledge is not available? My belief is that much could be learnt from much more open discussion and criticism of questions, even in the absence of specific experiments. There tends to be an implicit assumption in the literature that in the absence of hard evidence there is little point in speculating or reasoning about question effects because any opinion is likely to be as good as any other. That is, if you have knowledge, you have knowledge; if you do not, you have only opinion, and opinion is not a particularly valuable commodity.
It seems to me that large numbers of bad questions are asked in surveys. I have been asking bad questions for 26 years, and I know that I am not the only one to do so. How are we to improve our questionnaire design standards? Certainly, as far as we can, by further systematic experiment. But also, I think, by bringing questions under much more critical discussion.
This is a very sensitive issue. It is unsatisfactory to invent questions to discuss, but it is also unsatisfactory to discuss questions that have actually been used, because such discussion can appear to be a personal criticism of the researcher concerned. I know only too well how unfair such criticism can be, because the researcher whose name appears on the report in which a certain question appears is not necessarily the person who invented it. For example, it is often necessary to repeat a question which has been used in the past simply for the sake of comparability.
It is, of course, very easy to be critical after the event - for example, when a question has produced a clearly unsatisfactory result. That does not mean, though, that all criticism is illegitimate. Many questionnaire errors are in all probability unavoidable, and would escape detection by the scrutiny even of a panel of researchers, however eminent. But many could be avoided. I have myself often polished a draft questionnaire to the stage at which I think it is entirely satisfactory for its purpose. I have then submitted it to colleagues for discussion, only to find that they can quickly spot many flaws, because my own perception of what I was asking was incomplete. In some experiments with which I myself have been concerned the problem has been the lack of clarity with which the concept that the question is intended to embody has been formulated. It is all too easy to finish up with a form of words intended to embody an idea which superficially seems precise, but in fact is not. The person devising the question can very easily overlook possible misinterpretations. If it is submitted to other people's criticism, these may well be discovered.
Discussion of questions is undoubtedly a useful way of weeding out those open to misinterpretation. When I read the question literature, even the part of it that contains experiments on questions and question effects, I often feel that that kind of discussion seems not to have taken place.
My plea, then, is not only for more experiments but for much more open discussion of questions, and much more willingness to debate their structure and to expose possible pitfalls.
It gives me great pleasure to second the vote of thanks for an excellent review paper.
The vote of thanks was passed by acclamation.
Dr JOHN BYNNER (Open University): The plea at the end of the paper for a proper theory of and systematic research into question effects is one to endorse. Until we have such a theory of question effects and its accompanying structural model then survey practitioners must by necessity go on with the procedures of piloting and pre-testing in the context of a particular survey rather than look to any general guidelines to help them. This raises the question of whether we model bias in the mean brought about by inaccurate measurement and/or bias in measures of relationships brought about by correlated errors, which may be a more appropriate distinction than the standard one proposed in the paper between "factual" and "opinion" questions.
This raises the question: When is a bias not a bias? As Kalton and Schuman point out, many apparently factual questions can serve either the purpose of elucidating facts or elucidating perceptions of a personal condition. In many cases such questions can serve both purposes, in which case the bias in relation to the first purpose is of substantive interest in its own right in relation to the second. If we ask a child whether he smokes or not and he denies it, or claims to have given up, when from the nicotine stains on his fingers we can see this to be patently not true, then this points to an ambivalence in relation to factual status and consequently possibly a propensity to change behaviour (of central interest in, say, a health education survey). It seems to me important to decide, in line with Cronbach's generalizability theory, whether our focus of interest does lie in such individuals before we dismiss their responses as biased. For in such cases it is precisely the questions that will bias the mean that may well do most to help us illuminate ambivalence, i.e. through the investigation of their relationship with other questions.
This leads to my second question: Who is producing the Bias and Why? With one exception in the printed paper, the National Crime Survey (Cowan et al., 1978), there is no mention of sub-group and individual differences in response to different question forms; nor is there any mention of attempts to elucidate the reasons for them within the context of the experiment itself. If we take the typical experiment comparing two forms of the same question, we find that some people conform with the null hypothesis and some people do not, but very rarely do we see any further detailed investigation of these differences either at the group level or at the individual level to see what else defines them. This is of course asking for more interviewing, this time about the interviewing itself, but without such an approach to the problem of bias, it is difficult to see where any satisfactory theory to account for its origins is going to come from.
Mr F. E. WHITEHEAD (Social Survey Division, Office of Population Censuses and Surveys): I would like to add my congratulations to the authors on a useful and comprehensive review which will be profitably read by all those who, like those in my organization, are concerned with the conduct of interview surveys. Questionnaire design and the wording of questions are certainly key skills of the survey researcher but, as people have already commented this evening, they remain much more of an art than a science. In designing even apparently straightforward factual questions the questionnaire designer has to steer between many uncharted hazards. Short and simple questions are often conceptually vague and hence unreliable, and as soon as we start building in definitions and qualifications the whole mouthful becomes incomprehensible and we look for ways of breaking it down.
We have simple concepts - at least, we think that they are simple. Income is a concept thought to be simple, but in one of our big surveys it takes 20 or 30 questions actually to disentangle it. A lot of survey research is concerned with breaking down concepts into their component parts and asking questions on each one of them.
I want to give an example from the early days of the General Household Survey of question wording effects which seems to me quite striking. It concerns a question asked in 1971 to establish the prevalence of chronic sickness and long-term incapacity. This was an important topic at the time, just after we had published the results of a very detailed survey into the handicapped and impaired; and we were looking for ways of monitoring chronic sickness over time. The question that evolved in 1971 was: "Do you suffer from any long-standing illness, disability or infirmity which limits your activities compared with most people of your own age?"
In 1972 the question was changed, and it was broken up into two separate questions, the first part of which was: "Do you suffer from any long-standing illness, disability or infirmity?" If the answer to that question was "yes", we then asked "Does it limit your activities compared with most people of your own age?"
We obtained quite different prevalence rates from these two questions, for far fewer people answered "yes" to both parts of the question in 1972 than answered "yes" to the single question in 1971, as the following table illustrates.
Limiting, longstanding disability: rates per 1000 population

Age group      1971      1972
0-4              31        17
5-14             55        36
15-44            89        62
45-54           250       190
65-74           412       329
75+             484       390

Later we changed the question further because we suspected that the phrase "most people of your own age" was having an effect on the answers. We now ask the question by using the phrase "in any way", instead. You will note that the drop in prevalence between 1971 and 1972 occurred over all age groups, but happened to be relatively greater for children than for adults. This is a rather detailed example, and has been published in the 1972 report on the General Household Survey. I think that this is quite a good example of what Professor Kalton described as a "judgement" question: is the existence of long-term limiting capacity a fact or an opinion?
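
The size of the drop in each age group can be checked directly from the table. The short Python sketch below is an illustration added for that purpose and is not part of the speaker's contribution; the variable and function names are hypothetical, and the age-group labels are copied as printed.

```python
# Rates per 1000 population, copied from the table in the text.
AGE_GROUPS = ["0-4", "5-14", "15-44", "45-54", "65-74", "75+"]
RATES_1971 = [31, 55, 89, 250, 412, 484]   # single combined question
RATES_1972 = [17, 36, 62, 190, 329, 390]   # question split into two parts


def relative_drop(before, after):
    """Percentage fall from `before` to `after`."""
    return 100.0 * (before - after) / before


def drops_by_age():
    """Map each age group to its percentage drop between 1971 and 1972."""
    return {
        age: relative_drop(r71, r72)
        for age, r71, r72 in zip(AGE_GROUPS, RATES_1971, RATES_1972)
    }


if __name__ == "__main__":
    for age, drop in drops_by_age().items():
        print(f"{age:>6}: {drop:5.1f}% lower in 1972")
```

On these figures the fall is roughly 45 per cent for the youngest group and about 19 per cent for the oldest, which agrees with the remark that the drop was relatively greater for children than for adults.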
I want to finish by saying what I think survey organizations can do - and, in fact, do - to take account of some of the effects that have been described. As an organization we try to detect and eliminate as many faults as we can in questionnaire design and question wording at the piloting stage. Many surveys, government ones included, are not adequately piloted because of insufficient allowance of time and money. Of course, many question design effects cannot be properly answered without controlled experimentation of the kinds described, but in my view it is often possible to weed out questions which are patently not working properly through consultation with other researchers and feedback from interviewers at the pilot stage.
I feel that many problems in interpreting replies to attitude questions arise because they have been included without appropriate prior questions being asked about informants' knowledge of the subject under investigation. Many years ago when I was at the Department of Health and Social Security we commissioned a survey of public attitudes to Social Security arrangements and to certain aspects of the Health Service. The level of knowledge of the services and the way they were financed was appallingly low, but the public willingly gave opinions on these matters and, in fact, knew a lot of things about the Health Service and Social Security which just were not so. It is important to distinguish between the informed and uninformed opinion.
Professor MARTIN COLLINS (SCPR Survey Methods Centre and The City University): In seeking solutions to the various questioning problems raised here, it may help to distinguish between four categories:
1. Imprecision. Imprecision in framing a question will lead individual respondents, or sometimes interviewers, to apply their own interpretations. We need to develop questions that minimize this variability, acknowledging that they may not be the questions that come closest to the truth. To quote from Deming (1944): "Many biased methods . . . show smaller variability than so-called unbiased methods. If the relationship between biased and unbiased results were known . . . the biased techniques would sometimes be preferable." We need, then, a reliable question with a known relationship to the truth - an objective that at least narrows the field.
2. Method effects. The problem with effects such as yea-saying is one of bias among all or part of a sample and of patterning that can easily be mistaken for a substantive finding. The solution here is to document and classify such effects, as in the current paper, so that they become better-known and more predictable (given that they may be unavoidable).
3. Respondent task problems. These problems include those of recall or frankness. We know a good deal about such problems because the necessary documentation and classification has been performed. Hence, we have developed ways of avoiding - or at least coping with - problems that are predictable.
responsesaccordingto questionformis
The factthatwe obtaindifferent
questions.
difJerent
4. Asking
questions,the answersto both of
not always a problem.It can mean thatwe have asked two different
whichare relevantto the researchobjectives.This applies to many of the examplescited,including
fortwo questionsconcerningattitudestowardsabortion.
effect
perhapsthequestion-order
wantto remove.Indeed,we shouldmoreoftencriticizethe
These are effects
thatwe do notnecessarily
"opinion-poll"approach to a problem that does not deliberatelyintroducesuch variation.The
thaneithersetofanswers
questionsmaywellbe moreinformative
comparisonofanswersto twodifferent
in isolation.It would be wrongin our methodologicalworkto over-statethe importanceof any one
arisesin theimmediatecontextofa
question-or even any one survey.Anygivenresponsedistribution
fullsurveyand in the broadercontextof backgroundknowledge,some but not all of it formalizedor
data-based.
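
The Deming quotation under the first category can be made concrete with the usual decomposition of mean squared error into squared bias plus variance. The sketch below is illustrative only: the figures are invented for the example and the function name is hypothetical.

```python
def mean_squared_error(bias, variance):
    """Mean squared error of an estimator: squared bias plus variance."""
    return bias ** 2 + variance


# Invented figures for two question forms estimating the same proportion.
# Bias is in percentage points, variance in squared percentage points.
unbiased_but_variable = mean_squared_error(bias=0.0, variance=16.0)
biased_but_reliable = mean_squared_error(bias=2.0, variance=4.0)

# If the bias of the reliable form were known, it could be subtracted,
# leaving only the smaller variance.
reliable_with_known_bias = mean_squared_error(bias=0.0, variance=4.0)

if __name__ == "__main__":
    print("unbiased, variable form:  ", unbiased_but_variable)
    print("biased, reliable form:    ", biased_but_reliable)
    print("reliable form, bias known:", reliable_with_known_bias)
```

With these assumed numbers the biased but reliable form already has the smaller mean squared error, and a known bias would make it better still, which is the sense in which a reliable question with a known relationship to the truth may be preferable.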
A prerequisite of a solution to each of these four problem types is the need to know of their existence, in general and in the particular case. This points to some desirable developments in survey practice:
more and better pre-testing of questions, with formal analysis;
survey design and analysis that exposes failures;
deliberate redundancy in a questionnaire in readiness for such failures;
more use of good standardized questions.
Professor LOUIS MOSS (Birkbeck College, London University): This paper contributes to a very necessary discussion but does not offer any very direct guidance about design. This may be because, as the authors note, there is not yet any theoretical framework to which their results can be related, but I wonder whether enough attention has been given to what we do know already and can use in designing studies. It is not always clear what is fact or opinion or judgement. We know, for example, that after an election some people change their supposed vote (how they say they voted) from the losing to the winning party. And it has been shown that appreciable proportions who say they voted have not in fact done so. But participation in elections up until recent years in this country has been socially desirable, and so is being on the winning side. Similarly, when people offer opinion and judgements when invited to do so on issues with which they are unfamiliar, maybe they are only doing their best to cooperate in response to the urgent prompting of interviewers to do so. As the authors note, however, many, when given the opportunity to opt out by an offer of "no opinion" or "don't know" status, will in fact opt out. If it seems important to give all sections of the sample equal opportunities of responding out of their own experience or knowledge, the wording of questions would have to be simplified or generalized accordingly. People can only respond out of their own experience of the subject matter under review or their perceptions and beliefs about it. The interviewer is acting on behalf of researchers whose interests and patterns of thinking may be in many ways very different. The results discussed suggest to me that in designing questions we have to be more willing to recognize the nature of this social event - the contact of respondent and interviewer.
The relationship achieved in the interview will affect the response. It therefore seems important to identify in some way the climate of the interview and perhaps to incorporate it as an explicit variable in future experimental work. This would be a logical conclusion to draw also from the experimental work of Charles Cannell. As I understand it he showed that the relationship between interviewer and respondent was one of the most important variables influencing the outcome of interviews.
Clearly respondents do sometimes express apparently contradictory attitudes, e.g. agree with both sides of an issue. But on most matters only small minorities may be expected to voice completely consistent opinions. This means that in any chosen dimension the response to one question alone may mislead. It is therefore safer to construct scales based on a cluster of questions which between them invite responses to different expressions of the same dimension. This makes the "validity" of response to any one question a less urgent matter. Each question used to help construct a scale may also be much simpler if, by itself, it does not have to represent the whole of a dimension. The simpler the individual questions the more likely it is that respondents will interpret them in the way the investigator desires and this will make a further contribution to validity.
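
A scale of this kind is often built by simply adding the coded answers to the questions in the cluster. The sketch below assumes, purely for illustration, four agree/disagree items coded 1 to 5, one of them worded in the opposite direction; the item names and the scoring rule are hypothetical and are not taken from the paper or from the speaker.

```python
# Hypothetical item names. True marks an item worded in the opposite
# direction, which is reverse-scored before summing.
ITEMS = {"q1": False, "q2": True, "q3": False, "q4": False}
MIN_CODE, MAX_CODE = 1, 5


def scale_score(answers):
    """Sum the item codes, reversing the oppositely worded items."""
    total = 0
    for item, is_reversed in ITEMS.items():
        code = answers[item]
        if not MIN_CODE <= code <= MAX_CODE:
            raise ValueError(f"{item}: code {code} is out of range")
        total += (MIN_CODE + MAX_CODE - code) if is_reversed else code
    return total


if __name__ == "__main__":
    respondent = {"q1": 4, "q2": 2, "q3": 5, "q4": 3}
    print(scale_score(respondent))
```

Because the score rests on several simple items, a misreading of any single question moves the total only a little, which is one way of seeing why the validity of each individual question becomes less pressing.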
So in some ways we can already build on what we know. But this does not in any way reduce the need for the kind of critical review the authors have given us.
Dr W. A. BELSON (Survey Research Centre, London): The paper by Kalton and Schuman is, I believe, a very useful contribution to our developing technology of question design for survey work. It is against the background of that general statement that I am putting forward what I consider are some of its weaknesses.
In the first place, the printed paper appears to accept at face value the adequacy of the many findings reviewed. I find it very different in this respect from that devastating appraisal of postal survey methodology research made to this Society by Christopher Scott some years ago. Scott dealt with defects like the use of volunteer subjects, the use of students as a basis for generalizing to the total population, the use of very small samples, the inadequacy of research design, the absence of statistical evaluations, and so on. I want to be assured that the findings utilized in this review have passed the sharply critical appraisal of its authors. And I need to be warned about the weaknesses of the findings they do present.
I am less than happy about the way in which opinion-type questions appear to be graded as non-factual questions. Through opinion questions we are seeking facts about a person's opinions. If a woman rates some public service as fairly satisfactory, that is meant to present a fact as far as her feelings are concerned. Moreover, from a validation point of view, we can go a long way, using intensive techniques, towards finding out if she has correctly stated this fact about her position. I do not think it helps us in testing the validity of opinion type questions, to call them non-factual.
On the other hand, there is a distinction in the paper that I think the authors might have made more of than they did. It is the distinction between open-ended and closed questions. Indeed it seems at one point as if they are equating the functions of the two. That of course would be entirely incorrect. The principal function of the genuine open-ended question is exploratory in character. It tells us the different kinds of ideas that people have on some issue - their different kinds of behaviour - their different kinds of beliefs. To get them listed fairly fully, we go on adding more survey respondents till the number and variety of those different items ceases to grow. We are then in a position to build those items into a head counting system - a closed questioning system - that has been validated. It is a mistake to equate the functions of those two methods - or even to risk equating them. The distinction is fundamental.
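
The stopping rule described here, adding respondents until no new kinds of answer appear, can be sketched as follows. The `patience` threshold and the function name are assumptions made for the illustration; the speaker gives no numerical criterion.

```python
def collect_until_saturated(responses, patience=3):
    """Gather distinct answer categories from open-ended responses.

    `responses` is an iterable of sets, one per respondent, each holding
    the coded ideas that respondent mentioned. Collection stops once
    `patience` consecutive respondents have added nothing new.
    Returns (categories, number_of_respondents_used).
    """
    categories = set()
    without_new = 0
    used = 0
    for mentioned in responses:
        used += 1
        new_items = set(mentioned) - categories
        if new_items:
            categories |= new_items
            without_new = 0
        else:
            without_new += 1
            if without_new >= patience:
                break
    return categories, used


if __name__ == "__main__":
    # Invented pilot data: each set is one respondent's coded answers.
    pilot = [{"cost"}, {"cost", "noise"}, {"safety"}, {"noise"},
             {"cost"}, {"safety", "cost"}, {"smell"}]
    found, n = collect_until_saturated(pilot)
    print(sorted(found), n)
```

A small threshold risks stopping before rarer ideas have surfaced, so in practice the criterion would have to be chosen with the population and the topic in mind.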
I think it is unfortunate that the authors referred to a question that repeated some of its instructions and asked for careful attention as being a "long" question. That is not what we normally mean by "long". A long question is more often one that goes on adding more and more information which the respondent is required to absorb in order to know what it is that the questioner wants to know about. A long question may also be fairly complex in its structure. Of the finding itself, there is nothing surprising really. Anyone who is in the fortunate position of being able to repeat the question and to somehow inject interest into the respondent, is likely to get more considered replies.
But what we must keep in mind when we send interviewers out to deliver these repetitive questions, is that many of them are likely to rip the question back to its essentials. This sort of interviewer deviation from instructions is a major consideration in realistic question design, and I am rather surprised that the authors don't refer to it in presenting an otherwise interesting result.
There are many other issues that could and should be raised in relation to this paper, but I must leave these to others. Having said that, let me conclude by adding that I have particularly appreciated the opportunity to read this review, for it opens up again the very important but much neglected field of question design.
Mr N. WEBB (Gallup): Certainly, I add my own welcome to the research presented. If we look through the literature of the Royal Statistical Society on surveys, we find all kinds of material about multistage samples, concomitant variables, disproportionate stratification, and other mathematical points - then we find that it is all to do with the study of the rice crop in a province of India, which does not help us practitioners at all. To a large extent, the literature ignores the blunt end (I am at Gallup) - that is, the interface between the interviewer and the respondent. Quite incidentally, a friend of mine suggested to me that one very good way of piloting questions was to mumble the questions to a slightly deaf respondent. This would actually reproduce what was happening out there in practice rather more than we might have thought! I commend that as an idea.
Validation occurs, shall we say, fortuitously, quite often. For instance, our material is validated by the actual vote in elections, whatever people may see in the cheaper newspapers. Again, we have found that the degree of interest in a forthcoming general election was a better indicator of turnout than a question about whether they would turn out or not. That is odd, but we have validated that.
We had a complicated study on skin conditions in which it was possible to validate the frequency that people claimed to have gone to the doctor about their skin condition by means of doctors' records, which gave us a great deal of faith in other data such as their grandmother's advice - which was what we were really trying to find out about - when they did not go to their doctor.
I would like to try to relate a near horror story. We were conducting an experiment on television research, which would probably cost about £4 million nowadays - this was nearly 20 years ago. Although I was in co-charge of the research, I piloted some of it. We did our pilot work in Cambridge because I wanted to visit my old College. The first question after ringing at the door was to ask whether they were watching television when they heard the doorbell ring. That seemed to be all right; a perfectly straightforward, simple, factual, easily identifiable situation. A lady said that she was, and we went on with the questionnaire. After we had finished I asked her another question - I told her that she had a picture window through which I could see the kitchen, and as I rang the bell I saw that she was in the kitchen washing up, yet she said that she was watching television when the doorbell rang. She replied that she was watching television when the doorbell rang, but she had just slipped out to wash up a few cups. We suddenly realized that what we thought was a precise question was not a precise question. Since this whole study was carefully timed with a view to measuring the audience to commercial breaks, among other things, the whole project would have been destroyed had we not had the lucky chance in our piloting to visit a home built in such a way that we could observe what was going on.
I would like to mention two points in this area. First, we do a lot of international research: that is, research is done simultaneously with the same questions, say, in France, Germany, Italy and Great Britain. There are problems with translation, correspondence in meaning and so on. I know that there has been some research done on whether verbal scales have the same meanings in different languages, but this is a problem which we find very arduous to solve. The more we do this - we are doing these surveys almost every week - we find that it is not the problem between German and English (that is no problem because we know that we are dealing with a problem), but the problem is between Venezuelan Spanish and Chilean Spanish, Belgian French and French French, or American English and English English, where we do not necessarily recognize the problem. It occurs to me that maybe within countries as various as Britain, with Scotland, Wales, the North, the South-West and so on, we may think that there is a culture that is sufficiently uniform to have a national questionnaire, but that there are problems of cultural variation which mean that this questionnaire need not be valid throughout a nation. The same may be true in America, with the West Coast, the East Coast, the South and Mid-West. Nobody seems to have addressed himself to that question. It may be a very real one.
The title of the paper is "The Effect of the Question on Survey Responses", so there is another point I want to raise. We have a responsibility to develop a questionnaire on a general issue to cover that issue fully, or adequately. This cannot always be done. If you happen to be an anti-abortionist pressure group, you will commission us to do a survey on the problems of doctors and nurses involved in abortion in terms of their professional career and professional conscience. However, if you are pro-abortionist, you will ask us to do a survey of only fertile women - and nobody else - to see what they think. Ethically, we are doing the right thing in both cases, but we know that to some extent, by the simple choice of the topic area, we are being manipulated or utilized in some way which worries me from time to time.
We feel that there is also a question of money involved - perhaps the Right and the rich can finance opinion polls, or social surveys, whereas the Left and the poor and the weirdos cannot. An important ethical problem is raised here, and it is something to which we should in fact address ourselves in a much wider context.
Professor T. M. F. SMITH (University of Southampton): It gives me particular pleasure to see Graham
Kalton at the Society again and to add my congratulations to him and Howard Schuman on their paper.
Graham and I have shared many common features in our careers, not the least of which is our research
interest in the theory and practice of sample surveys.
Whenever I lecture on sample surveys I always start by listing the possible sources of error. Having
gone through a massive list of non-sampling errors I arrive finally at statistical sampling errors. I then
have to admit that despite the fact that the majority of errors are non-sampling errors the majority of my
lectures will be devoted to sampling errors, which are relatively unimportant. I then have to justify this
on the grounds that it is only for sampling errors that we have a satisfactory statistical theory; or should I
say theories.
This paper is welcome because it reminds us about the importance of question wording and concept
formulation, in other words, of measurement, in the practice of survey sampling. It is quite clear that these
errors could easily swamp sampling errors. The disappointing aspect of the paper is that at the end we are
still lacking an agreed theoretical basis for determining the wording of questions. We have to rely on
that uncommon quality called common sense, and it is hard to teach this.
In this paper the authors draw attention to Gray and Gee's excellent evaluation of the errors in the
1966 census. Am I alone in not being satisfied with this year's census form? I found some questions hard to
interpret and I must admit to making an error in my responses to the occupation questions. On further
enquiry I was told that a very high proportion of forms were incorrectly completed first time and had to be
corrected subsequently by the enumerator. This is surely not satisfactory by any standards. Is it worth
asking such error-prone questions?
Another example of interest to me is aircraft noise surveys. A major problem is the measurement of
the respondent's annoyance to aircraft noise. Aubrey McKennell has devised several scales of
measurement which include a simple self-rating scale and some more complex Guttman scales based on
activities disturbed. Aubrey and I used the latter scales in our analyses but I still feel that the simple
self-rating scale based on the question "how annoyed or disturbed are you by aircraft noise?" may have been
just as appropriate as a measure of annoyance although it lacks the interpretation of the activities
disturbed scale.
My questions to the authors are the following:
(i) What criteria should we have adopted for choosing between the alternative scales for measuring
annoyance?
(ii) Would Graham be prepared to give Mr Heseltine and Members of Parliament a lecture on
question wording bias before we have compulsory referenda on local authority spending imposed
on us?
May I express my thanks to both authors for their excellent review paper.
Dr F. H. HANSFORD-MILLER (Inner London Education Authority): I would like to have seen rather
more discussion on the order of questions. There is a short reference to it in Section 3(f), "Order of
alternatives", whilst in Section 2 the authors refer to employment interviews. One would expect that the
candidates in these would be lined up in alphabetical order; that is my experience. What effect does that
alphabetical order have on the result of the interviews?

[Table 1: distribution of surname initial letters for telephone directory pages, Fellows of the Royal
Society elected 1962-81 and Members of Parliament in 1981. The table body is illegible in this copy.]

This question of order is something that I feel needs to be brought in. Of course, there is the work of
Brook and Upton, relating to various aspects of bias in elections due to the candidate's position on the
ballot paper (Brook and Upton (1974), Upton and Brook (1974) and Upton and Brook (1975)). They
show in their paper "The Determination of the Optimum Position on a Ballot Paper" that first position is
almost uniformly less favoured than those positions which follow it, and suggest that good strategy for a
candidate named Smith would be to change his name to B. Smith.
It was in 1977 that I was struck by the fact that of the 40 Fellows elected in that year to the Royal
Society there were three surnames beginning with the letter A, 8 with B and 5 with C. I felt intuitively that
there seemed to be a rather greater emphasis on the higher reaches of the alphabet than on those lower
down, although that year there was in fact a Z. I did nothing about that until this year, 1981, when I found
that once again the Fellows elected to the Royal Society seemed in their surname distribution to show
similar evidence of bias, with no A's, 7 B's and 4 C's, again in a total of 40 names.
I have now checked on FRS elections for the twenty years 1962-81, a total of 679, and these are
shown in Table 1, column 4, distributed according to the initial letters of their surnames. How do we find
the expected frequencies with which to compare them? My method was to take sample Telephone
Directories from various geographical locations in the United Kingdom (British Telecom Telephone
Directory, 1980/81), in particular Birmingham, Canterbury, Exeter, Leeds, London and Manchester from
England, Glasgow from Scotland, Cardiff from Wales and N. Ireland, and from this total of 10 175 pages
of subscribers' names to count the number of pages devoted to each alphabetical letter. These are shown
in Column 2, with the corresponding percentages in Column 3. The expected number of FRS per
surname category can then be easily calculated and these are shown in Column 5 of the Table.
I then carried out a chi-squared test on the data, although some might consider other tests more
appropriate, and to do so had to group together the small frequency letters I, Q, U, V, X, Y, Z. Chi-square
amounts to 33.28, which is greater than the P = 5 per cent value of 30.14 and also greater than the
P = 2.5 per cent value of 32.85. This suggests that FRS elections are indeed subject to initial surname
letter bias and in this connection it is interesting to note that the letter B, as in Brook and Upton's
research, is again the most over-represented category, followed by K, R and W. The least popular letters,
with under-representation, are T, M and A. Although it is perhaps reassuring that the august election
processes of the Royal Society appear to be subject to the same influences to bias as the more humble
citizens voting in an election, I do suggest that if nominations for Fellowship are processed in
alphabetical order I have certainly made a case for some more random process to operate in the future.
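The calculation described in the two preceding paragraphs can be sketched as follows. This is a minimal
illustration under stated assumptions, not the speaker's own working: the function names are hypothetical,
and the counts in the usage lines are invented toy numbers, not the figures of Table 1. Expected counts are
taken as proportional to directory pages per initial letter, sparse letters are pooled into one category, and
the Pearson statistic is then compared with a tabulated critical value.

```python
# Sketch of a chi-squared goodness-of-fit test on surname initials.
# Function names and the toy counts are illustrative assumptions only.

def pool_letters(counts, sparse, label="other"):
    """Merge the low-frequency letters listed in `sparse` into one category."""
    pooled = {k: v for k, v in counts.items() if k not in sparse}
    pooled[label] = sum(counts.get(k, 0) for k in sparse)
    return pooled

def expected_counts(pages, n_total):
    """Expected counts proportional to the directory pages per letter."""
    total_pages = sum(pages.values())
    return {k: n_total * v / total_pages for k, v in pages.items()}

def chi_square(observed, expected):
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)

# Toy data (hypothetical, for illustration only)
pages = {"A": 50, "B": 30, "I": 10, "Q": 10}      # directory pages per letter
observed = {"A": 40, "B": 45, "I": 10, "Q": 5}    # elected persons per letter
sparse = {"I", "Q"}                               # letters to pool

obs = pool_letters(observed, sparse)
exp = expected_counts(pool_letters(pages, sparse), sum(obs.values()))
stat = chi_square(obs, exp)
df = len(obs) - 1   # categories minus one; no parameters are estimated from the data
```

With the 20 categories used in the contribution (19 single letters plus one pooled group) the degrees of
freedom are 19, which is consistent with the quoted 5 per cent value of 30.14.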
Before leaving this matter I felt that I would like some confirmation, or otherwise, of my Telephone
Directory method of allocating surname letters. I therefore did a similar test on the surnames of Members
of Parliament sitting in the United Kingdom Parliament in 1981 (Whitaker, 1981). The figures for this
test are shown in Table 1, columns 8-11. The result is some confirmation of the validity of my method in
that, with chi-square for M.P.'s names of 16.41 and P = 5 per cent of 28.87, there is no suggestion of bias
in M.P.'s selection by names, unlike that with the Fellows of the Royal Society.
Finally, as we seem to be approaching a time when we shall have voting in elections by
proportional representation in this country with a single transferable vote, it is worth recalling the work
of M. J. Mackerras in Australia where they already have this system. He has shown that the "donkey
vote", that is people voting 1, 2, 3, 4 down the list without any knowledge of the candidates, accounts for
about 4 per cent of the voting. With voting compulsory this could be in part a protest vote but even so to
supporters of P.R. like myself it is rather chastening that in a closely fought election this "donkey vote"
could be enough to decide the issue.
Mr BRIAN ALLT (Head of Market Research, Mirror Group Newspapers): As a guest, I would like to
thank the Society and the speakers for a very interesting paper.
We have heard reference to the need for research into all the various factors that may affect response
to questions. In the meantime, though, we have to go on designing questions, those of us who work in
the field. A framework which I have found helpful to the way in which one thinks about the problem of
question design involves an initially small change in the title of the paper. What I shall say is, I believe,
operative for relatively straightforward behavioural areas where people have perfect recall of the
occasion. Whether or not it is true in that extreme case, I think that it is certainly applicable in all the
more complicated things, such as behaviour a long time ago, attitudes, political opinions, etc.
It seems to me that traditionally we have tried to impose the structure of the physical sciences on what
we do in survey design as if there is a clear-cut observable entity there, and that if we could only break
open the rock, we would find this fossilized fact, behaviour or attitude. It just does not correspond with
any of my experience. We make the mistake of talking about questions and responses, whereas we should
talk about stimuli and responses. All our questions, even on the most simple things, are stimuli. What we
get back are responses. The only objectification of what we think that we are measuring lies in the
wording of the question, an imposed structuring.
Mr Hedges has already referred to the gap between the concept that the researcher thinks he is asking
about and how he actually writes the question. We then have the gap between that and the way the
interviewer administers it, followed by another gap between what the interviewer administers and what
the informant understands. If we go from the researcher to the informant there is an enormous distance,
and the sample's answer to any question, I would hypothesize, must represent the net result of a range of
interpretations of the stimulus. If we get differences when we change a question or ask the same question
in a different way, probably what that represents is a difference in the distribution of interpretations, and
we are looking at the net result of that.
This may sound fairly pessimistic, but it suggests what one or two other speakers have referred to,
that people should be asked much more about what they understand by the questions, what they think
we want to know. Certain methods of piloting never really tackle that; they tot up the answers to one
wording or another wording, but never get at why results differ. Piloting can be useful or useless,
depending on how it is done. In a method I use, which I call structured depth interviewing, where I want
to quantify, I ask virtually the same question several times for about 15 or 20 minutes, very slightly
differently worded. I have found that different people start to "give", start to "flow", at different points in
this sequence. That ties in with the point about longer questions and what Dr Belson said: give people a
chance to think about the topic and you get different replies. If we flash a light in somebody's eye, we do
not expect that everyone will blink at exactly the same number of seconds after that flash; there is a
distribution of responses to the flash, and likewise of interpretation of questions.
For really important issues I think we may have to face up to something which is ostensibly very
non-scientific, perhaps asking people how they would like to talk to us about the subject in which we are
interested. Standard questions are not necessarily the best way to obtain equally accurate information
from all people. Is it more important to have identical stimuli which satisfy some shackle-like goal of
"scientific" rectitude or identically-based answers? In that sense, I think that perhaps we should
sometimes reappraise our targets when surveys are being designed: what it is that we are trying to do
and for what purposes, i.e. "just ask questions" or guide problem-solving.
The following contributions were received in writing, after the meeting.
Dr AUBREY MCKENNELL (University of Southampton): The authors are to be congratulated in
providing us with an account of the perplexing variety of question wording effects which is at once
wide-ranging, succinct and insightful. Many people are going to want to refer to this paper for that reason
alone. On its innovative side, two themes in the paper stand out for me. One is the emphasis, where
non-factual or at least opinion questions are concerned, on multiple indicators; the other is the avowed aim to
build and test theoretical structures. These two aspects are not only related but, in my view, need to be
developed together. I am inclined to doubt whether satisfactory theories about the operation of single
opinion items can be established otherwise.
It is now approaching forty years since the psychometrician Quinn McNemar (1946) published his
classical review of experiments on the wording of opinion questions (an abundant literature even then)
and came out firmly with the recommendation that single-item opinion gauging be discarded in favour of
a multi-item scaling approach. Broadly speaking, that advice has been ignored by the fraternity of
professional large-scale survey researchers, but has been followed and been the basis of many
developments by academic researchers, notably psychologists. Attitude scaling methods have become
increasingly sophisticated since McNemar's day. Techniques such as the semantic differential and
smallest space analysis, for example, allow subtle variations in the connotations of verbal items to be
precisely mapped. On the face of it, these and other developments in psychometrics would seem relevant
to the study of issues in the wording of survey questions but it is surprisingly hard to trace any influence
in the way such issues have customarily been investigated.
It could be that the traditional split ballot approach with its focus on one question variant at a time
has obscured the possibility of contributions from the discipline of psychometrics. The psychometric
focus is on series of items and on the item intercorrelations rather than their marginals, it is true, but the
basic notions of true and error scores would seem transferable. It ought to be possible in principle to
compare question wording variants for reliability, in the technical psychometric sense, or even for
validity, in terms of their relative loadings on an underlying factor. Identification of the best question
form by this means would achieve something which, as the authors note, split ballot experiments usually
fail to do.
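As a rough illustration of what comparing wording variants for reliability might involve, the sketch below
uses the test-retest correlation as the reliability estimate. That choice is an assumption made here for
simplicity; the contribution does not name an estimator, and a factor-loading comparison would need a
fuller model. The function name and the response data are hypothetical.

```python
# Sketch: compare two wording variants by test-retest reliability.
# The estimator (Pearson correlation between two administrations) and
# all data below are assumptions chosen for illustration.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between paired score lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical 5-point answers from the same respondents on two occasions
variant_a_first, variant_a_second = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
variant_b_first, variant_b_second = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]

reliability_a = pearson_r(variant_a_first, variant_a_second)
reliability_b = pearson_r(variant_b_first, variant_b_second)

# On this criterion alone, the variant with the higher correlation is preferred.
preferred = "A" if reliability_a > reliability_b else "B"
```

A comparison of this kind ranks the variants themselves, which a difference in marginals between split
ballot halves cannot do.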
The application of psychometric principles in new areas is beginning to look more feasible with recent
developments in structural equation modelling (cf. Alwin, 1979). It may be possible to construct models in
which the overall bias in marginals is reflected in method effect parameters (that is in terms of correlated
error), and to do this even where the assumption of "form resistant correlation" holds. The type of
multi-trait multi-method experimental design required has already been applied with some success to the
analysis of subjective social indicators. It proved possible to order the variants in rating scale format
commonly used in surveys of subjective well-being in terms of their validity, random error and correlated
error (method effects) components (McKennell 1980). The findings help to counter at least some of the
difficulties, mentioned by Kalton and Schuman, of comparing social indicators across a series of surveys.
Work I have in hand with the SCPR Methods Centre is aimed at seeing if the new methodology can be
combined with the split ballot technique and generalized to question wording effects. There are special
difficulties which would take too long to outline here. Format effects are not quite the same as wording
effects, and it remains to be seen if the latter can be investigated with a similar kind of experimental
design.
There is going to be no royal road to the study of question wording effects. I suspect that progress will
depend on breaking out of the mould into which investigations have been set by the first researchers in
the field. The initial pioneering wave of intensive investigation petered out more than a generation ago as
it became evident that understanding was not being furthered by yet more split ballot trials of single
items. The research tradition established was remarkable for being so atheoretical. It is not just that past
experiments have been isolated from each other; they have also, as a whole, been peculiarly insulated
from developments in allied fields. Our present authors show sensitivity to such developments. They refer
to recent work on cognitive theory and the slightly older psychological literature on acquiescent response
sets. Besides these, and the developments I have myself cited, there must be many others which a more
active scanning within such disciplines as psychology, sociology and linguistics would show to have a
bearing. Older developments ignored by the pioneering generation of survey researchers
might be looked at afresh. (One thinks for example of the knowledge accumulated by the early
psychophysicists about judgements in an interval of uncertainty, in their experiments on differential
thresholds, and on the way internalized frames of reference operate to determine a distribution of
responses over verbal categories.) The authors deliberately choose not to extend their research efforts in
such directions. Rather they start with a specific question wording effect and follow a piecemeal
building-block approach toward its explanation in broader terms. It remains to be seen how viable this
"frustratingly slow" approach will prove over the long term. The alternative approach would be to invest
effort initially in the search for relevant explanatory frameworks established elsewhere, and then attempt
to model examples of question wording effects in the light of these. It is perhaps too much to expect
existing theories to be directly testable immediately, and I am not sure that the work of deriving testable
models by this route is any quicker or less arduous than the approach adopted by the authors. But the
dearth of good theories is a principal reason why research in this field languished in the past, and a more
determined search for explanatory principles may be a precondition of introducing a fresh, sustainable
momentum. In any case the two approaches are not inconsistent. The choice between them is perhaps
best represented as one of balance or relative emphasis. I would hope that authors in their search for cause
and testable hypotheses relating to the particular wording effect they have singled out for attention will
study neighbouring fields more intensively than have previous researchers, and be prepared to consider
experimental designs that go beyond the split ballot approach.
Dr CHRIS SCOTT (World Fertility Survey): Without criticizing the authors for what is a very useful
review, I would like to raise a basic issue. It appears to me that in this, as in most, discussions of response
error a key element has been ignored, namely that of the question objectives.
A central assumption in drafting questionnaires is that the question means the same to respondents as
it means to its author. When we learn that the number of respondents agreeing with A may be
substantially larger than the number disagreeing with not-A, this assumption is thrown into doubt, for in
the language of the author of the question these two must be the same. This is not a matter of response
variability or instability, although it is revealed by such instability. The problem is whether the questioner
has succeeded in asking the question he wanted to ask. If he has not, then even if the response variance is
zero and the relationships are perfectly form-resistant he is in trouble (or ought to be).
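The asymmetry between agreeing with A and disagreeing with not-A is the kind of result a split ballot
comparison brings out. A minimal sketch of such a comparison follows, using a standard pooled
two-proportion z statistic; the function name is hypothetical and the counts are invented for illustration,
not taken from any survey discussed here.

```python
# Sketch: compare the proportion agreeing with A (form 1) with the
# proportion disagreeing with not-A (form 2) in a split ballot.
# Counts are hypothetical; the statistic is the usual pooled z.
from math import sqrt

def two_proportion_z(count_1, n_1, count_2, n_2):
    """Pooled two-sample z statistic for a difference in proportions."""
    p_1, p_2 = count_1 / n_1, count_2 / n_2
    pooled = (count_1 + count_2) / (n_1 + n_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p_1 - p_2) / std_err

# Hypothetical half-samples of 100 respondents each
agree_with_a = 60          # form 1: "agree" with statement A
disagree_with_not_a = 45   # form 2: "disagree" with statement not-A

z = two_proportion_z(agree_with_a, 100, disagree_with_not_a, 100)
```

A large value of z indicates that the two forms are not producing equivalent answers. On the argument
made here, that result bears on whether the intended question was asked at all, not only on response
variability.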

There is a tendency to overlook this issue in a systematic way. For a minor example in this paper, see
the discussion on offering a middle alternative. The issue is presented as if we had two alternative ways of
presenting the same question, with different results depending on the way selected, so that the matter is
seen as one of question design. But if question authors were clearheaded enough they would not regard
the two questions as being the same and the issue would be one of their intention: do they want the
expression of opinion with the middle view excluded, or not?
In much the same way, the attempt to overcome question instabilities by the introduction of
multi-question attitude scales ignores the question of content: even if it works it yields an improved measuring
instrument only at the expense of greater vagueness about what is being measured. That these issues are
not generally perceived as crucial is perhaps a reflection of the fact that question authors usually don't
much mind exactly what they ask. This leaves the methodologist free to treat the issue as one of question
design. But if the question author really doesn't mind which of two versions he asks then his needs are so
imprecise that he is unlikely to mind much about the variation between their responses discovered by the
methodologist. Perhaps this explains the generally low level of interest in, and support for,
methodological research from the survey users.
There are, of course, some users (election pollsters and household budget economists, for example)
who do know almost exactly what they want. For them, research on question design is highly relevant.
Professor SEYMOUR SUDMAN (University of Illinois): This is an excellent and thorough review of a very
important topic. Response effects are typically the largest source of survey errors, and it is encouraging to
note that the problem has obtained increased attention in recent years. I have no disagreements with
what is said. My comments are intended to supplement some of the authors' discussion.
Kalton and Schuman describe the use of filter questions to screen out respondents without opinions.
Another alternative is to precede the attitude question with one or more knowledge questions on the
same topic. This procedure is not new. It was developed and has been used extensively by George Gallup.
The use of properly constructed closed questions to obtain measures of attitudes is described as more
valid than the use of open questions. It is more important, however, that this need not be the case for
factual questions, especially if the topic is sensitive. Bradburn, Sudman and Associates (1979) found
substantially higher levels of reported use of beer, wine and liquor and of sexual activity when open
questions were used, rather than closed questions.
The authors note that, as expected, the use of a middle alternative clearly increases the proportion
selecting a neutral view when three point scales are used. If one wishes to allow for a middle alternative,
but to avoid bunching in the center, one solution is simply to increase the number of points on the scale to
5, 7 or 9.
One topic that is not covered since it is slightly tangential, but is of frequent concern to users of survey
data, is the effect of method of administration. The generally encouraging findings are that responses do
not differ significantly by method of administration for either factual or non-factual questions if the topic
is not sensitive. For questions dealing with socially desirable behavior, there is some indication that less
personal methods, mail and telephone, produce more valid information than do face-to-face interviews.
For socially undesirable behavior, however, the threat of the question generally swamps any differences in
method of administration. Even the use of sealed ballots and the other techniques mentioned by Barton
(1958) do not produce satisfactory answers. Two methods that seem worthy of additional
experimentation are the use of group administration of anonymous forms and the use of respondents to
report about the behavior of close friends who remain anonymous.
The AUTHORS replied later, in writing, as follows.
We thank all the discussants for their valuable contributions. We are in broad agreement with many
of their remarks, and will therefore confine our response to some brief comments on the issues raised.
Several discussants (Professor Holt, Mr Hedges, Dr Belson, Professor Smith and Professor Sudman)
comment on the long way we have to go in understanding questioning effects, and we welcome their
support for our view that the subject is a difficult but important one. We are pleased that Professor Holt,
Dr Bynner and Dr McKennell emphasized the need to develop more general theoretical frameworks to
explain the effects, which we consider to be the essential direction for future research. We are convinced
that systematic research programmes are required for this challenging task.
The task ahead, however, should not obscure the fact that we have learned a good deal from past
research, as Mr Hedges and Professor Moss note, even if it is mainly warning of major pitfalls. Yet,
despite the guidance that past research provides, obviously poor questions are still prevalent on survey
questionnaires. These questions ought to be eliminated at the design stage but, as Mr Whitehead points
out, insufficient time and money often lead to inadequate piloting. We feel that often too little attention is
given to questionnaire design, and that many surveys would be much improved if more care were taken in
this crucial stage of the survey process. We recognize Mr Hedges's point that the designer of a question
often has a blinkered interpretation of it, and see value in his suggestion that a questionnaire should be
reviewed by a panel of researchers. We also believe, like Professor Collins, that more thorough piloting is
often needed, perhaps along the lines of the structured depth interviewing described by Mr Allt, where
that is practicable. A telling point in Mr Webb's account of his discovery of the problem with the
television viewing question is that he carried out some of the piloting; a good case can be made that
researchers should routinely take part in piloting, thus obtaining firsthand experience of how their
questionnaire works in practice rather than relying on (or even neglecting) the reports of interviewers. All
this developmental work must take into account that the questionnaire will be used by interviewers in
uncontrolled and sometimes difficult settings; Mr Webb's suggestion of piloting questions by mumbling
them to a slightly deaf respondent has much to commend it! As Dr Belson observes, realistic
questionnaire design must also recognize that interviewers often deviate from instructions. Although
more careful questionnaire design may help to reduce considerably the number of clearly bad questions,
it will not address the kinds of effects discussed in our paper. Understanding of these effects can, we believe,
come only from experimental research.
We are in full agreement with Dr McKennell that those attempting to develop theories for
questioning effects should search the work of related disciplines like psychology and socio-linguistics to
see what they have to offer. However, in our work we have not yet uncovered any ready-made theories
that were directly transferable, and for this reason we do not anticipate speedy progress from such
cross-fertilization. Perhaps a more fruitful line is to interest researchers in these disciplines in long-term
collaborative research on questioning effects, to see if they can profitably apply their general theoretical
structures and research methodologies to our problems. Our approach to theory development for
questioning effects starts from known effects and attempts to identify theories to explain them; when a
possible theoretical explanation arises, the next step is to test it by seeing whether its predictions hold,
sometimes with other questions. As Dr McKennell notes, one could start instead with a theory from a
related discipline, and construct questions to test its applicability in the questioning context. We prefer
our approach partly because of the lack of any clearly relevant theory in other disciplines and partly
because we want to stay close to the survey researcher's problems; however, it may be possible to make
headway with the other approach. Another point Dr McKennell makes is that research on questioning
effects may benefit from the application of psychometric methods in conjunction with structural equation
modelling. There are some attractive possibilities here, and we look forward to seeing the results of the
research he is conducting in collaboration with the Survey Methods Centre at Social and Community
Planning Research.
Professor Collins, Professor Moss and Dr McKennell comment on the need to use a number of
questions on an issue to minimize the effects of response artifacts. We fully concur that this should be
done for the key issues investigated; however, there remains the problem of cumulative effects from items
having the same form and also the problem of incidental issues on which the researcher can include only
one or two questions. On the latter point, Professor Holt gives the specific example concerning the
Springbok tour of New Zealand. It would be valuable to be able to estimate the proportion of the
population supporting the tour, but care is needed in assessing the answers to a single question since both
substantive and nonsubstantive changes in wording (or even placement) can shift univariate results
markedly. Multiple items reduce the risk of misinterpretation and also handle the oversimplification of
complex issues to which Professor Holt refers at an earlier point in his discussion. Given that several items
are included in the questionnaire, there remains a question of how they are used in the analysis. Professor
Moss and Dr McKennell argue in favour of scale construction, but Dr Scott is negative about scaling on
the grounds that it creates greater vagueness about what is being measured. We acknowledge that scale
scores are more abstract than the answers to individual questions, but we believe that, with careful
interpretation, the dimensions charted by scales can be fairly precisely identified, and therefore side with
Professor Moss and Dr McKennell; however, we feel that due attention should also be given to the
responses to the individual items, for these provide a down-to-earth feel for the data. There is no need to
rely exclusively on either scales or separate items.
Professor Smith asks how one should choose between scales. The considerations we would suggest
are reliability and construct validity, the latter being that on theoretical grounds the scale scores should
relate to other variables in clearly defined ways. The chosen scale is then a reliable one that conforms best
to the theoretical expectations. However, in practice, useful tests of construct validity are extremely
difficult to devise.
It is tempting to view Professor Holt's Springbok tour example as a sample referendum, but caution is
needed here. As Mr Whitehead says, and some results in the paper illustrate, many people are willing to
give opinions on issues about which they know little or nothing. It is unclear whether these people would
vote in a referendum and, if they did so, whether they would first acquaint themselves with the issue. One
can, of course, filter out many of those who have no opinions by the procedures suggested in the paper or
by knowledge questions, as Professor Sudman points out, but this is a matter of degree since different
types of filters remove varying proportions of respondents. In addition, there may be a sizeable gap
between an off-the-cuff survey response and a referendum vote. For a similar reason, care also needs to be
taken in attempting to transfer Dr Hansford-Miller's example concerning the possibility of an
alphabetical bias in the election of Fellows to the Royal Society to the survey context; the election of
several Fellows also differs from the usual survey tasks of selecting a single category or choosing an
unrestricted number of all the categories that apply. (In passing, it may be noted that alternative
explanations to the one offered by Dr Hansford-Miller may be hypothesized to account for the
association he reports.)
Mr Whitehead provides a good example of how changes in a question can affect responses. The
change from 1971 to 1972 of splitting the question into two parts appears to be a question form variation,
but the later wording change from "limiting activities compared with most people of your own age" to
"limiting activities in any way" is clearly a substantive change with a different question being asked. We
need to distinguish between these two situations, as Professor Collins observes.
Dr Scott argues that all question form effects in fact arise from substantive changes, since at least some
respondents must be interpreting the questions differently. We agree that this is often the case, but
certainly not always: effects due to the order of responses (such as Payne, 1951, gives) can scarcely be
avoided by clarifying the investigator's goals. More generally, the distinction between question form
effects and substantive effects is useful, even though blurred in places. The researcher can (or believes he
can) identify the reason for a substantive effect, and hence he can choose the question to meet his
objective, but this does not hold for a question form effect. An example where the distinction becomes
blurred is the issue of offering or not offering a middle alternative, to which Dr Scott refers: since the
apparent question form change of including the middle alternative effectively modifies the substance of
the question, the change may reasonably be viewed as a substantive one. As Payne (1951) advised long
ago, it is probably better to offer the middle alternative if the objective is to discover definite convictions,
but not to offer it if the objective is to find out which ways respondents are leaning.
As noted in the paper, the offer of the middle alternative can lead to a sizeable proportion of respondents opting for the neutral view. Professor Sudman points out that this bunching at the middle of a three-point scale can be reduced by increasing the number of scale points to five, seven, or nine. There still remains, however, the question of whether to use an even or an odd number of scale points (see Kalton et al., 1981, on the comparison of four and five point scales).
Dr Bynner observes that biases in responses to factual questions may be of substantive research interest in their own right. It should be noted, however, that the biases themselves have to be measured (that is, both the respondents' reports and the true values have to be obtained) if they are to feature in the survey analysis. Dr Bynner's second point concerns the value of identifying subgroup differences in questioning effects as a way to obtain a greater understanding of the effects. This is an important point, and in our work we routinely look for differences by basic socio-demographic variables and also often by variables included in the experiment specifically for subgroup analysis, for instance, a measure of the salience of the issue. Disappointingly, however, subgroup analyses rarely yield positive findings, and it is for this reason that the paper reports so few examples of differential subgroup effects. But this absence also lends some support to the form-resistant correlation hypothesis that many survey investigators implicitly accept.
Dr Belson argues that response to open questions should be used only to find out the range of answers that people give, with the alternatives identified then being built into closed questions for headcounting. In practice, however, responses to open questions are often classified and counted, and for this reason it seems useful to find out whether the distributions and classifications arising from the closed and open question forms coincide. Indeed, the absence of such comparisons has long made the debate over open vs. closed questions a matter of personal preference rather than scientific judgement. Professor Sudman draws attention to an interesting example where open questions obtained greater reporting than closed questions on the frequencies of certain activities which might be expected to be underreported, possibly because the frequency classification offered with the closed form suggested the norms for the activities. It would be useful to conduct additional experiments to further explore this point.
Dr Belson raises two issues of terminology. In the first place he is unhappy about our classification of opinion questions as non-factual questions. Since responses to opinion questions cannot be validated against an external source, it seems natural to classify them as non-factual questions according to our usage of the term. In this connection we might take the opportunity to emphasize that we used the factual/non-factual distinction as a simple division of questions for purposes of organizing our material; we were not attempting to devise a comprehensive question classification, which would be a substantial task in itself (see, for instance, the discussion of Rothwell to Kalton and Schuman, 1980). Dr Belson also considers it unfortunate that we use the term "long" to describe the questions lengthened by redundancies, as employed in the experiments by Cannell and his co-workers. It seems to us, however, that "long" is the appropriate adjective to describe such questions, whereas the kinds of question Dr Belson would describe as "long" should more accurately be described otherwise, perhaps as "complex" or "long and complex".
Mr Webb draws attention to the possibility of regional variation in the understanding of survey questions. Like him, we are unaware of research on this topic, and we agree that it warrants investigation. Some of the interest in "Black English" in the United States suggests similar problems along ethnic or social lines.
In order to bring out the fact that survey responses are affected by a range of factors in addition to the question asked, Mr Allt recommends that we talk about stimuli and responses rather than questions and responses. We acknowledge the importance of interviewer and other effects on responses, but we still consider it useful to study the question separately, as one component in the data collection process. Mr Allt goes on to pose the question of whether we should have identical stimuli or identically-based answers. While we can see that a case may be made for some limited variation in questions to obtain more valid factual responses from different classes of respondent (a procedure made more feasible by computer assisted telephone interviewing), we consider that the procedure should be used extremely cautiously and only after preparatory research has documented the comparability of responses. In statistical surveys, and particularly with opinions, it seems to us essential to insist on a high degree of standardization of stimuli so that responses can be aggregated in a meaningful way for quantitative analysis. We fail to see how identically-based answers could be obtained by survey interviews using different stimuli, and how to assess the comparability of responses obtained this way.
Professor Sudman usefully draws attention to one consoling and somewhat surprising finding in this difficult area; namely, that for nonsensitive issues, responses do not seem to differ significantly by mode of administration. This is an important point in view of the current interest in mixed modes of data collection.
In conclusion, we would like to provide Dr Belson with the assurance he requested that we have critically appraised the material that we presented in this paper. It was not possible to supply detailed accounts of all the studies cited, and for these the reader must turn to the original publications. We can, however, say that we relied mainly on large, well-conducted studies, and the findings have often been reinforced by one or more replications.
REFERENCES IN THE DISCUSSION

ALWIN, D. F. and JACKSON, D. J. (1979). Measurement models for response errors in surveys. In Sociological Methodology, 1980 (K. F. Schuessler, ed.), pp. 68-119. San Francisco: Jossey-Bass.
BRITISH TELECOM TELEPHONE DIRECTORY (1980/1981). Birmingham Area, Section 211; Canterbury Area, Section 282 (Alpha); Exeter and North Devon, Section 293 (Alpha); Leeds Area, Section 232 (Alpha); London Postal Area, Sections 101-104; Manchester NE and City Centre, Section 264; Glasgow Area, Section 275; Cardiff and SE Wales, Section 301 (Alpha); and N. Ireland, Section 241 (Alpha).
BROOK, D. and UPTON, G. J. G. (1974). Biases in local government elections due to position on the ballot paper. Appl. Statist., 23, 414-419.
DEMING, W. E. (1944). On errors in surveys. Amer. Sociol. Rev., 9, 359-369.
McKENNELL, A., ATKINSON, T. and ANDREWS, F. M. (1980). Structural constancies in surveys of perceived well-being. In The Quality of Life: Comparative Studies (A. Szalai and F. M. Andrews, eds). London: Sage.
McNEMAR, Q. (1946). Opinion-attitude methodology. Psychol. Bull., 43, 289-374.
UPTON, G. J. G. and BROOK, D. (1974). The importance of positional voting bias in British elections. Polit. Studies, 22, 178-190.
UPTON, G. J. G. and BROOK, D. (1975). The determination of the optimum position on a ballot paper. Appl. Statist., 24, 279-287.
WHITAKER, J. (1981). An Almanack for the Year of Our Lord 1981. London: Whitaker.

As a result of the ballot held during the meeting, the following were elected Fellows of the Society.

DAVIES, Peter M.
DHARMALINGHAM, Thiru
FREEMAN, David R.
M.
INGRAM, David M.
RUSKIN, Heather J.
THOMAS, Roger K.
TURNER, Keith
WALLEY, Peter
WHITAKER, John J. M.
As a result of the ballot held during the meeting of October 14th, 1981, the following were elected Fellows of the Society.

ALI, Asghar
ALLOTEY, Charles A.
ANDERSON, Bernard G.
ANGHILERI, Roberto A.
BIRKHEAD, Brian G.
BROWN, Peter J. B.
BUTLER, Ronald W.
CHAPPELL, Roma
DARZENTAS, John
L.
DAVIES, Martin V.
DAVIES, Peter T.
DAVISON, Anthony C.
EASTBROOK, Gillian A.
EL-HELBAWY, Abdalla
ELLERAY, Elaine A.
FOCHTMANN, John A.
FOTOPOULOS, Stergios
FRANE, James W.
GRAY, Christopher T.
GRIFFITHS, Caroline L.
HILL, Barry
HILL, Peter R.
KIDD, Eileen P.
KIRBY, Simon P. J.
T.
KROLL, Mary E.
MCGIVERN, Kevin
MORRIS, Alfred C.
MYLVAGANAM, Arunthathi
NESS, Mitchell R.
NICHOLAS, Timothy R. M.
PEWSEY, Arthur R.
POON, Fun C.
RAFEE, Najib M.
REGAL, Ronald R.
SANDBACH, Jonathan
SMEDLEY, Peter J.
STEPHENS, Helen J.
WATSON, Gordon J.
WHARTON, Ann
WILSON, Ian S.
WORRALL, Leslie

