Validity AND Reliability of Questionnaires: Dr. R. Venkitachalam

VALIDITY AND RELIABILITY OF QUESTIONNAIRES
Dr. R. VENKITACHALAM
CONTENTS
• Introduction
• Steps in questionnaire designing
• Validity
  - Concept of validity
  - Types of validity
  - Steps in questionnaire validation
• Reliability
  - Types and measurement of reliability
• Conclusion
• References
INTRODUCTION
• Questionnaire: an important and extensively used method of data collection
• Advantages of a questionnaire
  - Less expensive
  - Offers greater anonymity
• Disadvantages
  - Application is limited (it can be used only with respondents who can read and write)
  - Response rate is low
  - Opportunities to clarify issues are lacking
• Ideal requisites of a questionnaire:
  - Should be clear and easy to understand
  - Layout should be easy to read and pleasant to the eye
  - Sequence of questions should be easy to follow
  - Should be developed in an interactive style
  - Sensitive questions must be worded exactly
• NOTE: In this seminar, the terms research instrument, measuring instrument, scale and test all refer to the questionnaire, and "item" refers to each question in the questionnaire.
Steps in questionnaire designing
Validity
The concept of validity
• Validity is the ability of an instrument to measure what it is intended to measure.
• "Degree to which the researcher has measured what he has set out to measure" (Smith, 1991)
• "Are we measuring what we think we are measuring?" (Kerlinger, 1973)
• "Extent to which an empirical measure adequately reflects the real meaning of the concept under consideration" (Babbie, 1989)
Why validity?
• Validity is assessed mainly to answer the following questions:
  - Is the research investigation providing answers to the research questions for which it was undertaken?
  - If so, is it providing these answers using appropriate methods and procedures?
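Questions to ponder
• Who decides whether the instrument measures what it is supposed to measure? The investigator, the readers of the report and experts in the field
• How can validity be established? Through logic and through statistical tests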
Logical thinking
• Each question is justified in relation to the objectives of the study
• Easy if the questions relate to tangible matters
• Difficult when measuring attitudes, the effectiveness of a program, satisfaction, etc.
• Drawback: everybody's logic does not match, and there is no statistical backing

Statistical procedures
• By calculating correlation coefficients between the questions and the outcome variables
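The statistical route above can be sketched in Python. This is a minimal illustration, not the author's procedure: it assumes every question is scored numerically and that one numeric outcome variable is available for the same respondents. The data and the helper names `pearson_r` and `item_outcome_correlations` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable has no variance")
    return sxy / math.sqrt(sxx * syy)


def item_outcome_correlations(responses, outcome):
    """Correlate each question (one column of `responses`) with the outcome variable."""
    n_items = len(responses[0])
    return [pearson_r([row[i] for row in responses], outcome) for i in range(n_items)]


# Hypothetical data: 5 respondents answering 3 questions, plus one outcome score each
responses = [[4, 2, 5], [3, 3, 4], [5, 1, 5], [2, 4, 2], [1, 5, 1]]
outcome = [8, 6, 9, 4, 2]
for number, r in enumerate(item_outcome_correlations(responses, outcome), start=1):
    print(f"Question {number}: r = {r:.2f}")
```

Questions whose correlation with the outcome is close to zero would be candidates for review; where to draw that line is left to the researcher's judgement.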
Types of validity
• Content validity
  - Face validity
• Criterion-related validity
  - Concurrent validity
  - Predictive validity
• Construct validity
CONTENT VALIDITY
• Uses logical reasoning and is hence easy to apply
• The extent to which a measuring instrument covers a representative sample of the domain of the aspects being measured
• Whether the items and questions cover the full range of the issues or problems being measured
FACE VALIDITY
• The extent to which a measuring instrument appears valid on its surface
• Each question or item on the research instrument must have a logical link with the objective
Face validity is not content validity. Why?
• Face validity
  - Simply addresses whether a measuring instrument looks valid
  - Is not validity in the technical sense, because it refers not to what is actually being measured but to what the instrument appears to measure
  - Has more to do with rapport and public relations than with actual validity
Other aspects of content validity
• Coverage of the issue should be balanced
• Each aspect should have similar and adequate representation in the questions
Problems associated with content validity
• Based on subjective logic; no definitive conclusion can be drawn or consensus reached
• The extent to which the questions reflect the objectives of the study may differ: if the wording is changed or a question is substituted, the magnitude of the link changes
CRITERION VALIDITY
• The extent to which a measuring instrument accurately predicts behaviour or ability in a given area
• The external standard against which the instrument's scores are compared is called the "criterion"
• It is of two types:
  - Predictive validity
  - Concurrent validity
Predictive validity
• Used when the test is meant to predict future performance
• Eg: Entrance exam: performance on these tests correlates with later performance in professional college
• Eg: Written driving test
• Eg: Measurement of sugar exposure for caries development

Concurrent validity
• Used when the test estimates present performance, or a person's ability at the present time, without attempting to predict future outcomes
• Eg: Professional college exam
• Eg: Driving test, pilot test
• Eg: Measurement of DMFT for caries experience

Problems in criterion validity
• Cannot be used in all circumstances
• Especially in the social sciences, where some conditions do not have a relevant criterion
• Eg: For measuring self-esteem, no criterion can be applied

CONSTRUCT VALIDITY
• Most important type of validity
• Assesses the extent to which a measuring instrument accurately measures the theoretical construct it is designed to measure
• Measured by correlating performance on the test with performance on a test whose construct validity has already been established
  - Eg: A new index for measuring caries can be validated by comparing its values with a standard index (like DMFT)
• Another method is to show that scores on the new test differ across people with different levels of the outcome being measured
  - Eg: Establishing the validity of a new caries index by applying it to different stages of dental caries and calculating its accuracy
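The second method (scores differing across people with different levels of the outcome) is often called known-groups validation. A minimal sketch, assuming hypothetical caries stages and hypothetical scores on a new index: it only checks whether the mean score rises with severity. It does not compute the diagnostic accuracy mentioned in the example, and a real study would normally add a statistical test between groups. The function names are illustrative.

```python
from statistics import mean


def group_means(scores_by_group):
    """Mean score of the new index within each known group."""
    return {group: mean(scores) for group, scores in scores_by_group.items()}


def increases_with_severity(scores_by_group, severity_order):
    """True if the group means rise strictly from the mildest to the most severe group."""
    means = group_means(scores_by_group)
    ordered = [means[group] for group in severity_order]
    return all(lower < higher for lower, higher in zip(ordered, ordered[1:]))


# Hypothetical scores on a new caries index, grouped by clinically assessed stage
scores = {
    "sound": [0, 1, 0, 1],
    "early": [2, 3, 2, 3],
    "moderate": [4, 5, 5, 4],
    "advanced": [7, 8, 6, 7],
}
order = ["sound", "early", "moderate", "advanced"]
print(group_means(scores))
print(increases_with_severity(scores, order))
```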
Summary of Validity
Type | What it measures | How it is accomplished
Content | Whether the test covers a representative sample of the domains to be measured | Ask experts to assess the test to establish that the items are representative of the outcome
Criterion: concurrent | The ability of the test to estimate present performance | Correlate performance on the test with a concurrent behaviour
Criterion: predictive | The ability of the test to predict future performance | Correlate performance on the test with a behaviour in the future
Construct | The extent to which the instrument measures a theoretical construct | Correlate performance on the instrument with performance on an established instrument
Steps in questionnaire validation
FACE VALIDITY
• Evaluate in terms of:
  - Readability
  - Layout and style
  - Feasibility
  - Clarity of wording
CONTENT VALIDITY
Two phases:
• Researcher: conceptualization and domain analysis
  - Specify the full domain of content that is relevant to the issue
  - Sample specific areas from this domain
  - Put the items/questions in a form that is testable
• Experts: enhancement of the content of the questionnaire (seven or more experts)
How do experts evaluate validity?
• Method 1: Average Congruency Percentage (ACP) [Popham, 1978]
  - Each expert computes the percentage of questions he or she deems relevant
  - The average across all experts is taken
  - If the value is 90% or more, the questionnaire is considered valid
  - Eg: 2 experts (Expert 1: 100%, Expert 2: 80%); then ACP = (100 + 80)/2 = 90%
• Method 2: Content Validity Index (CVI) [Martuza, 1977]
  - Content Validity Index for individual items (I-CVI)
  - Content Validity Index for the scale (S-CVI)
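The ACP calculation from Method 1 can be sketched as follows. This assumes each expert's judgement is recorded as relevant or not relevant for every question; the five-question split is a hypothetical choice that reproduces the 100% and 80% figures of the example, and the 90% cut-off follows the slide. The function name is illustrative.

```python
def average_congruency_percentage(judgements):
    """ACP: average, across experts, of the percentage of questions each expert judged relevant.

    judgements[e][q] is True if expert e judged question q relevant.
    """
    per_expert = [100 * sum(expert) / len(expert) for expert in judgements]
    return sum(per_expert) / len(per_expert)


# Expert 1 finds all 5 questions relevant (100%), expert 2 finds 4 of 5 relevant (80%)
judgements = [
    [True, True, True, True, True],
    [True, True, True, True, False],
]
acp = average_congruency_percentage(judgements)
print(f"ACP = {acp:.0f}%", "(valid)" if acp >= 90 else "(not valid)")
```

Run as written, this prints `ACP = 90% (valid)`.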
I-CVI
• A panel of content experts (minimum 3, maximum 10) is asked to rate the relevance of each question on a 4-point Likert scale:
  - 1 = not relevant
  - 2 = somewhat relevant
  - 3 = relevant
  - 4 = very relevant
• For each question, the number of experts giving a score of 3 or 4 is counted (3, 4 = relevant; 1, 2 = not relevant)
• The proportion of experts rating the question relevant is the I-CVI
• Eg: If 4 of 5 experts give a score of 3 or 4, I-CVI = 4/5 = 0.80
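A minimal sketch of the I-CVI calculation for a single question, assuming the ratings of all experts for that question are collected in a list; the function name `i_cvi` is illustrative.

```python
def i_cvi(ratings):
    """Item-level CVI: proportion of experts rating the item 3 or 4 on the 4-point scale."""
    if not ratings or any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 4")
    relevant = sum(1 for r in ratings if r >= 3)  # 3 or 4 counts as relevant
    return relevant / len(ratings)


# The example above: 4 of 5 experts rate the question 3 or 4
print(i_cvi([4, 3, 4, 2, 3]))  # 0.8
```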

Criticisms of I-CVI
• Collapses the experts' multipoint assessments into two categories (relevant and not relevant)
• Does not indicate the comprehensiveness of the whole questionnaire
• Problem of chance agreement. To overcome this, Lynn proposed:
  - Five or fewer experts: all must agree (I-CVI = 1.0)
  - Six or more experts: I-CVI should not be less than 0.78
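Lynn's thresholds can be applied mechanically once the I-CVI is known. A minimal sketch, assuming one list of 1 to 4 ratings per question; `meets_lynn_criterion` is a hypothetical helper name.

```python
def i_cvi(ratings):
    """Proportion of experts rating the item 3 or 4."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def meets_lynn_criterion(ratings):
    """Apply Lynn's cut-offs, which depend on the size of the expert panel."""
    n_experts = len(ratings)
    value = i_cvi(ratings)
    if n_experts <= 5:
        return value == 1.0  # five or fewer experts: all must agree
    return value >= 0.78     # six or more experts: I-CVI of at least 0.78


print(meets_lynn_criterion([4, 3, 4, 2, 3]))     # 5 experts, I-CVI 0.80
print(meets_lynn_criterion([4, 3, 4, 2, 3, 4]))  # 6 experts, I-CVI about 0.83
```

Note how the same 0.80 that would pass with a larger panel fails with five experts, because small panels are more exposed to chance agreement.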
S-CVI
• The proportion of items on an instrument that achieved a rating of 3 or 4 by all the content experts
• Two approaches:
  - S-CVI/UA: universal agreement
  - S-CVI/Ave: average
• Which would be the more effective measure here: S-CVI/UA or S-CVI/Ave?
• Which to follow?
  - Report both the I-CVI and S-CVI values rather than using "CVI" as a single acronym
  - Report the range of I-CVI values
  - S-CVI/UA is the best method for stringent validity, but it is difficult to use when many experts are validating; in such situations S-CVI/Ave is used
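Both scale-level indices can be computed from the same rating matrix. A minimal sketch, assuming a hypothetical matrix with one row per item and one column per expert; S-CVI/Ave is taken here as the average of the item-level CVIs. Function names are illustrative.

```python
def i_cvi(ratings):
    """Proportion of experts rating the item 3 or 4."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def s_cvi_ua(matrix):
    """S-CVI/UA: proportion of items rated 3 or 4 by every expert (universal agreement)."""
    return sum(1 for item in matrix if i_cvi(item) == 1.0) / len(matrix)


def s_cvi_ave(matrix):
    """S-CVI/Ave: average of the item-level CVIs."""
    return sum(i_cvi(item) for item in matrix) / len(matrix)


# Hypothetical ratings: 4 items (rows) rated by 4 experts (columns)
matrix = [
    [4, 4, 3, 4],  # I-CVI = 1.00
    [4, 2, 3, 4],  # I-CVI = 0.75
    [3, 4, 4, 3],  # I-CVI = 1.00
    [1, 2, 3, 4],  # I-CVI = 0.50
]
print("I-CVI range:", min(map(i_cvi, matrix)), "to", max(map(i_cvi, matrix)))
print("S-CVI/UA =", s_cvi_ua(matrix))    # 0.5
print("S-CVI/Ave =", s_cvi_ave(matrix))  # 0.8125
```

Because one dissenting expert is enough to drop an item from the universal-agreement count, S-CVI/UA tends to fall as the panel grows, which is the practical reason for falling back on S-CVI/Ave with many experts.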
CONSTRUCT VALIDITY
• Method: factor analysis
  - Examines empirically the interrelationships among items and identifies clusters of items that share sufficient variation to justify their existence as a factor or construct to be measured by the instrument
  - Various items are gathered into common factors
  - Common factors are synthesized into fewer factors, and then the relation between each item and each factor is measured
  - Unrelated items are eliminated

Reliability
RELIABILITY
• Definition: the ability of an instrument to produce reproducible results
• Each time it is used, similar scores should be obtained
• A questionnaire is said to be reliable if it yields the same or similar answers on repeated administration
• Although reliability cannot be calculated exactly, it can be estimated using correlation coefficients
Reliability is measured in terms of:
• STABILITY
  - Ensures that the same results are obtained when the instrument is used consecutively two or more times
  - The test-retest method is used
• INTERNAL CONSISTENCY (homogeneity)
  - Ensures that all subparts of an instrument measure the same characteristic
  - The split-half method is used
• EQUIVALENCE
  - Used when two observers study a single phenomenon simultaneously
  - Inter-rater reliability is assessed
Test-retest reliability (for stability)
• The test is administered twice to the same participants at different times
• Used for characteristics that are stable over time
• Easy and straightforward approach
• Useful for questionnaires, checklists, rating scales, etc.
• Disadvantages
  - Practice effect (mainly for tests)
  - Too short an interval between administrations (effect of memory)
  - Some traits may change over time
Statistical calculation
• The instrument is administered to a sample on two different occasions
• The two sets of scores are compared by calculating Pearson's correlation coefficient
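A minimal sketch of the test-retest calculation, assuming each participant has one total score per administration and the two lists are in the same participant order. The scores are hypothetical, and `pearson_r` is written out by hand so the formula stays visible.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between scores from the first and second administration."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical total scores of the same 5 participants on two occasions
first = [10, 12, 15, 18, 20]
second = [11, 12, 14, 19, 21]
print(f"Test-retest r = {pearson_r(first, second):.2f}")
```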
Correlation coefficient
• Measures the degree of linear relationship between two sets of scores
• Ranges from -1 to +1
• 0 indicates the absence of a linear relationship
Correlation coefficient (r)    Strength of relationship
±0.70 to ±1.00                 Strong
±0.30 to ±0.69                 Moderate
0.00 to ±0.29                  None to weak
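The bands in the table can be turned into a small lookup. A minimal sketch; the table leaves a gap between 0.69 and 0.70, and this sketch assumes values in that gap count as moderate. The function name is illustrative.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the strength bands in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.70:
        return "Strong"
    if size >= 0.30:  # assumption: 0.69 < |r| < 0.70 is treated as moderate
        return "Moderate"
    return "None to weak"


for r in (0.85, -0.45, 0.10):
    print(r, correlation_strength(r))
```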

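Split-half reliability (homogeneity)
• Split the contents of the questionnaire into two equivalent halves, using either odd/even-numbered items or the first/second half
• Correlate the scores on one half with the scores on the other half using Pearson's formula, where x and y are the two half-scores and x̄ and ȳ are their means:
$r = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}}$
• This r describes only half of the test, so the reliability of the entire test is estimated with the Spearman-Brown formula:
$R' = \dfrac{2r}{1+r}$
(r = split-half coefficient, R' = reliability coefficient of the entire test)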
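A minimal sketch of the odd/even split followed by the Spearman-Brown correction, assuming a hypothetical respondents-by-items score matrix. Function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between the two half-scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to the full-length test: R' = 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Odd/even split: returns (r for the halves, R' for the entire test)."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)


# Hypothetical scores: 5 respondents x 4 items
responses = [[4, 4, 3, 4], [2, 3, 2, 2], [5, 4, 5, 5], [3, 3, 3, 2], [1, 2, 1, 1]]
r_half, r_full = split_half_reliability(responses)
print(f"r (halves) = {r_half:.2f}, R' (entire test) = {r_full:.2f}")
```

The corrected value is always higher than the half-test r when r is positive, reflecting that a longer test is expected to be more reliable than either half alone.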
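• Cronbach's alpha: another method of estimating internal consistency, using the formula:
$\alpha = \dfrac{k}{k-1}\left(1 - \dfrac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_y^2}\right)$
k = total number of items in the questionnaire
$\sigma_i^2$ = variance of the scores on item i
$\sigma_y^2$ = variance of the total test scores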
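The alpha formula maps directly onto Python's `statistics.pvariance`. A minimal sketch, assuming a hypothetical respondents-by-items score matrix; population variances are used for both terms, and since the n/(n-1) correction would scale numerator and denominator equally, sample variances would give the same alpha.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha from a respondents x items score matrix."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 5 respondents x 4 items
responses = [[4, 4, 3, 4], [2, 3, 2, 2], [5, 4, 5, 5], [3, 3, 3, 2], [1, 2, 1, 1]]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```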
Inter-rater reliability (Equivalence)
• Used when a single event is measured simultaneously and independently by two or more trained observers
$R = \dfrac{\text{Number of agreements}}{\text{Number of agreements} + \text{Number of disagreements}}$
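A minimal sketch of the agreement formula for two raters, assuming both rated the same subjects in the same order; the ratings are hypothetical. Extending it to more raters (for example by averaging over all pairs) would be an additional assumption. Simple percentage agreement does not correct for agreement expected by chance; Cohen's kappa is the usual statistic when that matters.

```python
def percent_agreement(rater_a, rater_b):
    """R = agreements / (agreements + disagreements) for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of subjects")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    disagreements = len(rater_a) - agreements
    return agreements / (agreements + disagreements)


# Hypothetical: two examiners classify the same 5 teeth
examiner_1 = ["caries", "sound", "caries", "sound", "caries"]
examiner_2 = ["caries", "sound", "sound", "sound", "caries"]
print(percent_agreement(examiner_1, examiner_2))  # 0.8
```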
Summary of Reliability
Type | What it measures | How it is accomplished
Test-retest | Stability over time | Administer the same test to the same people at two different times
Split-half | Equivalency of items | Correlate performance for a group of people on two equivalent halves of the same test
Inter-rater | Agreement between raters | Have multiple researchers measure the same instrument and determine the percentage of agreement between them
Conclusion
• Validated questionnaire
  - One that has undergone a validation procedure showing that it accurately measures what it aims to, regardless of who responds, when they respond, and to whom they respond (or whether it is self-administered), and whose reliability has also been examined, thereby:
    - Reducing bias and ambiguities
    - Yielding better quality data and credible information
In a nutshell . . . .
A questionnaire can be reliable but invalid . . .

But a valid questionnaire is always reliable . . .
Acknowledgements
• Dr. Joe Joseph
• Dr. Chandrashekar
References
• Del Greco L, Walop W, McCarthy RH. Questionnaire development: 2. Validity and reliability. CMAJ. 1987;136:699–700.
• Sushil S, Verma N. Questionnaire validation made easy. Eur J Sci Res. 2010;46(2):172–8.
• Polit DF, Beck CT. The content validity index: are you sure you know what's being reported? Critique and recommendations. Res Nurs Health. 2006;29:489–97.
• Reliability and Validity. Module 6. Cengage Learning; 2010.
• Radhakrishna RB. Tips for developing and testing questionnaires/instruments. J Ext. 2007;35(1):710–4.
• 06Article04.pdf [Internet]. [cited 2015 Apr 7]. Available from: http://www.uk.sagepub.com/salkind2study/articles/06Article04.pdf
• pta_6871_6791004_64131.pdf [Internet]. [cited 2015 Apr 7]. Available from: http://cfd.ntunhs.edu.tw/ezfiles/6/1006/attach/33/pta_6871_6791004_64131.pdf
• Questionnaire designing and validation [Internet]. [cited 2015 Apr 7]. Available from: http://www.jpma.org.pk/full_article_text.php?article_id=3414
• Sharma SK. Nursing Research and Statistics. 1st ed. New Delhi: Elsevier Saunders;
• Carmines EG, Zeller RA. Reliability and Validity Assessment. New Delhi: SAGE Publications; 1979.
• Kumar R. Research Methodology: A Step-by-Step Guide for Beginners. 3rd ed. New Delhi: SAGE Publications; 2012.
• Articles from Dr. Joe
