
Introduction:

Reliability is defined mathematically as the ratio of the variance of the true score to the
variance of the observed score. Validity can be defined as the accuracy with which the scale
measures what it claims to measure. Validity and purpose are like two sides of a coin: the main
purpose of any tool is to obtain data which is reliable and valid, so that the researcher can read the
prevalent situation accurately, arrive at conclusions, and offer suggestions.
However, no tool is perfectly reliable or valid, so every tool should be accompanied by a statement of its
reliability and validity.
Validity is an evolving, complex concept because it relates to the inferences drawn from assessment
results. Focusing on the consequences of those inferences implies that they should be
appropriate and adequate. On the whole, validity is seen as a unitary concept.
All measurements, especially measurements of behaviors, opinions, and constructs, are subject to
fluctuations that can affect the measurement’s reliability and validity. Reliability refers to
consistency or stability of measurement. Validity refers to the suitability or meaningfulness of
the measurement.
In statistical terms:
• Validity is analogous to unbiasedness (valid = unbiased)
• Reliability is analogous to variance (low reliability = high variance)
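To make the analogy concrete, here is a minimal Python simulation (all numbers hypothetical): an unbiased but noisy instrument illustrates validity with low reliability, while a biased but consistent one illustrates reliability with low validity.

import random

random.seed(42)
TRUE_VALUE = 70.0  # the quantity we intend to measure

# Unbiased but noisy: valid (centered on the truth) yet unreliable.
valid_unreliable = [TRUE_VALUE + random.gauss(0, 5) for _ in range(1000)]
# Consistent but biased: reliable (low variance) yet invalid (+4 offset).
reliable_invalid = [TRUE_VALUE + 4 + random.gauss(0, 0.5) for _ in range(1000)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(mean(valid_unreliable), var(valid_unreliable))  # mean ≈ 70, variance high
print(mean(reliable_invalid), var(reliable_invalid))  # mean ≈ 74 (biased), variance low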
Importance of Reliability and Validity:

Validity and reliability play a vital role in the process of hypothesis testing.

• Without hypothesis testing, one cannot support a theory.

• Without a supported theory, one cannot explain why events occur.

• Without adequate explanation, one cannot develop effective material and non-material
technologies, including programs designed for positive social change.

• No matter how well the objectives are written or how clever the items are, the quality and
usefulness of an examination is predicated on its validity and reliability.
Literature review
The concept of incremental validity is essentially a simple and straightforward one: does a
measure add to the prediction of a criterion above what can be predicted by other sources of
data? In clinical contexts, assessment can be conducted for numerous reasons, including
diagnosing a disorder or problem, developing a case conceptualization, treatment planning,
treatment monitoring, and/or treatment outcome
evaluation. Thus, a measure may have incremental validity in some assessment applications but
not others. Finally, assuming a measure has been demonstrated to have incremental validity in a
specific applied decision-making task, it then becomes important to consider (a) the range of
circumstances in which the measure makes an incremental contribution and (b) the cost issues, as
the financial and human resource costs associated with the measure must be balanced against the
applied value of the validity increment.[1]
An important part of social science research is the quantification of human behaviour — that is,
using measurement instruments to observe human behaviour. The measurement of human
behaviour belongs to the widely accepted positivist view, or empirical analytical approach, to
discern reality. Because most behavioural research takes place within this paradigm,
measurement instruments must be valid and reliable. The objective of this paper is to provide
insight into these two important concepts, and to introduce the major methods to assess validity
and reliability as they relate to behavioural research. The paper has been written for the novice
researcher in the social sciences. It presents a broad overview taken from traditional literature,
not a critical account of the general problem of validity of research information. [3]

Reliability refers to the extent to which the same answers can be obtained using the same
instruments more than one time. In simple terms, if your research is associated with a high level of
reliability, then other researchers should be able to generate the same results using the same
research methods under similar conditions. It is noted that "reliability problems crop up in many
forms. Reliability is a concern every time a single observer is the source of data, because we have no
certain guard against the impact of that observer's subjectivity" (Babbie, 2010, p. 158). According
to Wilson (2010), reliability issues are most often closely associated with subjectivity: once a
researcher adopts a subjective approach towards the study, the reliability of the work is compromised.

Validity of research can be explained as the extent to which the requirements of the scientific research
method have been followed during the process of generating research findings. Oliver (2010)
considers validity to be a compulsory requirement for all types of studies. There are different forms
of research validity, and the main ones are specified by Cohen et al. (2007) as content validity, criterion-
related validity, construct validity, internal validity, external validity, concurrent validity and face
validity.

Measures to ensure the validity of a research study include, but are not limited to, the following points:

a) Appropriate time scale for the study has to be selected;

b) Appropriate methodology has to be chosen, taking into account the characteristics of the study;

c) The most suitable sample method for the study has to be selected;

d) The respondents must not be pressured in any way to select specific choices among the answer
sets.

It is important to understand that although threats to research reliability and validity can never be
totally eliminated, researchers need to strive to minimize these threats as much as possible.

Sources: John Hunsley (University of Ottawa) and Gregory J. Meyer (University of Toledo), "The
Incremental Validity of Psychological Testing and Assessment: Conceptual, Methodological, and
Statistical Issues," Psychological Assessment, 2003, Vol. 15, No. 4, pp. 446-455; Ellen A. Drost
(California State University, Los Angeles), "Validity and Reliability in Social Science Research,"
Education Research and Perspectives, Vol. 38, No. 1; John Dudovskiy, research-methodology.
Reliability

Reliability of a test pertains to reliable measurement, which means that the measurement is
accurate and free from error. Reliability is one of the most essential characteristics of
a test. If a test gives the same result on different occasions, it is said to be reliable. So reliability
means consistency of the test result: internal consistency and consistency of results over a period
of time.

According to Anastasi and Urbina (1982), "Reliability refers to the consistency of scores obtained
by the same persons when they are re-examined with the same test on different occasions, or
with different sets of equivalent items, or under other variable examining conditions."

Reliability is defined mathematically as the ratio of the variance of the true score to the
variance of the observed score, or, equivalently, one minus the ratio of the variance of the error
score to the variance of the observed score:

r_xx' = σT² / σX² = 1 − σE² / σX²

where r_xx' is the reliability of the observed score X, and σX², σT² and σE² are the variances of
the observed, true and error scores respectively. However, there is no direct way to observe or
calculate the true score, so a variety of methods are used to estimate the reliability of a test.
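The definition can be made concrete with a minimal Python sketch on simulated data: if we generate the true scores and errors ourselves, the variance ratio can be verified directly (in practice the true score is unknown, which is exactly why the estimation methods that follow are needed).

import random

random.seed(0)
N = 10_000
true_scores = [random.gauss(50, 10) for _ in range(N)]  # var(T) ≈ 100
errors = [random.gauss(0, 5) for _ in range(N)]         # var(E) ≈ 25
observed = [t + e for t, e in zip(true_scores, errors)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# reliability = var(true) / var(observed) ≈ 100 / 125 = 0.8
print(round(var(true_scores) / var(observed), 3))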

Types of Reliability:

There are four general types of reliability.
• Inter-rater or Inter-observer Reliability: Measures the degree to which different observers
give consistent estimates of the same phenomenon.
• Test-Retest Reliability: Measures the consistency of measurement on two separate
occasions.
• Parallel-Forms Reliability: Measures the consistency of results of two parallel forms of the same
test constructed in the same way.
• Internal Consistency Reliability: Measures the consistency of results across items within a
test.
The following techniques are used to estimate reliability:

A. Split - Half Reliability


• Spearman and Brown formula
• Rulon - Guttman’s formulas
• Flanagan Formula

B. Cronbach’s Alpha
• Methods of Rational Equivalence
• Kuder Richardson - KR20
• Kuder Richardson - KR21

Reliability of the Present Inventory:


In the present study, the reliability of the SRL Inventory was estimated by
• Test-Re-test method
• Split-Half method
• Cronbach’s Alpha (α) (Internal Consistency)

 Test-Retest Method
The correlation coefficient between the two sets of scores describes the degree of reliability.
 For cognitive domain tests, correlations of .90 or higher are good.
 For personality tests, .50 or higher.
 For attitude scales, .70 or higher.

This type of reliability is estimated by the Pearson product-moment coefficient of correlation
between two administrations of the same inventory.
Formula for Pearson Product-Moment Correlation [2]:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / (N·σx·σy)

Where
r = correlation between x and y
xᵢ = ith value of the x variable
x̄ = mean of x
yᵢ = ith value of the y variable
ȳ = mean of y
N = number of pairs of observations of x and y
σx = standard deviation of x (test)
σy = standard deviation of y (retest)
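As an illustration, the following Python sketch applies the formula above to two hypothetical administrations of the same inventory; the scores are invented for the example.

import math

test = [12, 15, 11, 18, 14, 16, 13, 17]    # scores on the first administration
retest = [13, 14, 12, 17, 15, 17, 12, 18]  # scores on the second administration

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / (sx * sy)

print(round(pearson_r(test, retest), 2))  # test-retest reliability coefficient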

Estimation is based on the correlation between scores of two or more administrations of the same
inventory. Synonyms for reliability include dependability, stability, and consistency. Test-retest
correlation provides an indication of stability over time.

Example
1) If we ask the respondents in our sample the four questions once this September and again
in November, we can examine whether the two waves of the same measures yield similar
results.
2) A good test is stable over repeated administration. An IQ test given to the same person three
times in one month should reveal the same IQ score, more or less, each time. Likewise,
any adequate measure of personality, skill, value, or belief should come out the same, time
after time, unless the person has actually changed in either a short-term or
long-term sense.

It is a measure of the consistency of test outcomes when the test is administered to the same
individual twice, with the two administrations separated by a specific timeframe, using the same
testing instrument and conditions. The two scores are then assessed to determine the true score
and the stability of the test. This type is used in the case of traits that are not expected to change
within the given timespan. There are, however, limitations. First, the trait being studied
may have undergone a change between the two instances of testing. Second, the
experience of taking the test the first time could alter the way the examinee performs the second
time. And finally, if the time interval between the two tests is not adequate, the individual may
give answers based on the memory of his past attempt.

Example –
Medical monitoring of "critical" patients works on this principle, since the vital statistics of the
patient are compared and correlated over specific time intervals in order to determine whether
the patient's health is improving or deteriorating. Depending on the result, the medication and
treatment of the patient are altered.

Problems with the test-retest reliability procedure:

A. Differences in performance on the second test may be due to the first test, i.e., responses
actually change as a result of the first test.
B. Many constructs of interest actually change over time, independent of the stability of the
measure.
C. The interval between the two administrations may be too long, and the construct you are
attempting to measure may have changed.
 Split-Half Method:
In this method, the inventory was divided into two equal halves and the correlation between the scores
of these halves was worked out. The measuring instrument can be divided in various ways, but
the best way to divide it into two halves is into odd-numbered and even-numbered
items. This coefficient of correlation denotes the reliability of the half test.

The entire information regarding the items in each half includes the item-wise scores, the
difference 'd', and the SDs of both halves, i.e. the SD of the first half (σ₁), the SD of the second
half (σ₂), and the SD of the entire test (σₜ). The variance of the odd items (σ₁²) was 455.46, the
variance of the even items (σ₂²) was 359.17, and the total variance (σₜ²) was 1115.18.

In the present study, the coefficient of correlation was calculated using the following formulas:
 Spearman and Brown formula
 Rulon formula
 Flanagan formula

Spearman and Brown Formula: The Spearman-Brown formula estimates the reliability of a test
lengthened n times. From the reliability of the half test, the self-correlation coefficient of the whole
test is estimated by the following:

r_tt = 2r_hh / (1 + r_hh)

Where,
r_tt = reliability coefficient of the whole test
r_hh = reliability coefficient of the half test
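A minimal sketch of this correction in Python: given a half-test correlation r_hh, it returns the estimated whole-test reliability r_tt.

def spearman_brown(r_hh):
    # Whole-test reliability from the half-test correlation.
    return 2 * r_hh / (1 + r_hh)

print(round(spearman_brown(0.70), 3))  # a half-test r of .70 gives r_tt ≈ 0.824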

Rulon Formula: In this method, the variance of the differences between each person's scores on
the two half-tests and the variance of the total scores are considered:

r_tt = 1 − SD_d² / SD_x²

Where,
r_tt = reliability of the test
SD_d² = variance of the differences between each person's scores on the two half-tests
SD_x² = variance of the total scores

Flanagan Formula: This formula is very close to Rulon's formula; the variances of the two halves
are added instead of using the variance of the differences between the two halves. The formula is
as under:

r_tt = 2(1 − (σ₁² + σ₂²) / σₜ²)

Where,
r_tt = reliability of the test
σ₁² = variance of scores of the 1st half (odd-numbered items)
σ₂² = variance of scores of the 2nd half (even-numbered items)
σₜ² = variance of total scores
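The following Python sketch (with invented odd-half and even-half scores) computes both the Rulon and Flanagan estimates from the same data; the two formulas are algebraically equivalent, so they print the same value.

# Hypothetical odd-item and even-item totals for eight examinees.
odd_half = [20, 25, 18, 30, 22, 27, 19, 24]
even_half = [22, 24, 17, 31, 21, 28, 20, 23]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

totals = [o + e for o, e in zip(odd_half, even_half)]
diffs = [o - e for o, e in zip(odd_half, even_half)]

r_rulon = 1 - var(diffs) / var(totals)
r_flanagan = 2 * (1 - (var(odd_half) + var(even_half)) / var(totals))
# The two estimates are algebraically identical, so both print the same value.
print(round(r_rulon, 3), round(r_flanagan, 3))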

B) Inter-Rater or Inter-Observer Reliability:

Used to assess the degree to which different raters or observers give consistent estimates of the same
phenomenon. Inter-rater reliability is the level of agreement between raters or judges.
If everyone agrees, inter-rater reliability is 1 (or 100%); if everyone disagrees, inter-rater
reliability is 0 (0%).

There are two major ways to actually estimate inter-rater reliability:

1) If your measurement consists of categories (the raters are checking off which category each
observation falls in), you can calculate the percentage of agreement between the raters.

Example
Suppose we had 100 observations that were being rated by two raters. For each observation, the
rater could check one of three categories. Imagine that on 86 of the 100 observations the raters checked
the same category. In this case, the percentage of agreement would be 86%.
2) The other major way to estimate inter-rater reliability is appropriate when the measure is a
continuous one. There, all you need to do is calculate the correlation between the ratings of the two
observers.

Example
They might be rating the overall level of activity in a classroom on a 1-to-7 scale. You could
have them give their ratings at regular time intervals (e.g., every 30 seconds). The correlation between
these ratings would give you an estimate of the reliability or consistency between the raters.
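Both approaches can be sketched in a few lines of Python (hypothetical ratings; statistics.correlation is available in the standard library from Python 3.10):

import random, statistics

random.seed(1)

# 1) Categorical ratings: percent agreement between two raters over
#    100 observations, each assigned to one of three categories.
rater_a = [random.choice([1, 2, 3]) for _ in range(100)]
rater_b = [a if random.random() < 0.85 else random.choice([1, 2, 3]) for a in rater_a]
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"percent agreement: {agreement:.0%}")

# 2) Continuous ratings: correlate two raters' 1-to-7 classroom
#    activity ratings taken at regular time intervals.
ratings_a = [4, 5, 3, 6, 2, 7, 5, 4, 3, 6]
ratings_b = [4, 6, 3, 5, 2, 7, 4, 4, 2, 6]
print(f"inter-rater correlation: {statistics.correlation(ratings_a, ratings_b):.2f}")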

Inter-Rater Reliability refers to statistical measurements that determine how similar the data
collected by different raters are. A rater is someone who is scoring or measuring a performance,
behavior, or skill in a human or animal.

Examples of raters would be a job interviewer, a psychologist measuring how many times a
subject scratches their head in an experiment, and a scientist observing how many times an ape
picks up a toy.

It is important since not all individuals will perceive and interpret the answers in the same way;
hence the judged accuracy of the answers will vary according to the person evaluating them.
This helps in refining and eliminating any errors that may be introduced by the subjectivity of the
evaluator. If a majority of the evaluators are in agreement with regard to the answers, the
test is accepted as being reliable. But if there is no consensus between the judges, it implies that
the test is not reliable and has failed to actually test the desired quality. However, the judging of
the test should be carried out without the influence of any personal bias. In other words, the
judges should not be agreeable or disagreeable to the other judges based on their personal
perception of them.
Example-
This is often put into practice in the form of a panel of accomplished professionals, and can be
witnessed in various contexts such as the judging of a beauty pageant, conducting a job
interview, a scientific symposium, etc.

c) Parallel-Forms Reliability:
Parallel-forms reliability is also called equivalent-forms reliability. Suppose we use one set of
questions divided into two equivalent sets ('forms'), where both sets contain questions that measure
the same construct, knowledge, or skill. The two sets of questions are given to the same sample of
people within a short period of time, and an estimate of reliability is calculated from the two sets.

Put simply, you're trying to find out if test A measures the same thing as test B. In other words,
you want to know if test scores stay the same when you use different instruments.

Example:
You want to find the reliability for a test of mathematics comprehension, so you create a set of
100 questions that measure that construct. You randomly split the questions into two sets of 50
(set A and set B), and administer those questions to the same group of students a week apart.

Steps:
 Step 1: Give test A to a group of 50 students on a Monday.
 Step 2: Give test B to the same group of students that Friday.
 Step 3: Correlate the scores from test A and test B.
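A minimal Python sketch of this procedure with simulated students and items (all data invented; statistics.correlation requires Python 3.10+):

import random, statistics

random.seed(7)

# Randomly split a 100-item pool into two 50-item forms, A and B.
items = list(range(100))
random.shuffle(items)
form_a, form_b = items[:50], items[50:]

# Simulate 50 students: each student answers every item correctly
# with a probability given by his or her ability.
ability = [random.random() for _ in range(50)]
responses = [[1 if random.random() < p else 0 for _ in range(100)] for p in ability]

# Score both forms for each student and correlate the two score sets.
scores_a = [sum(r[i] for i in form_a) for r in responses]
scores_b = [sum(r[i] for i in form_b) for r in responses]
print(f"parallel-forms reliability ≈ {statistics.correlation(scores_a, scores_b):.2f}")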

It measures reliability by either administering two similar forms of the same test, or conducting
the same test in two similar settings. Despite the variability, both versions must focus on the
same aspect of skill, personality, or intelligence of the individual. The two scores obtained are
compared and correlated to determine if the results show consistency despite the introduction of
alternate versions of environment or test. However, this leads to the question of whether the two
similar but alternate forms are actually equivalent or not.
Example:
If the problem-solving skills of an individual are being tested, one could generate a large set of
suitable questions that can then be separated into two groups with the same level of difficulty,
and then administered as two different tests. The comparison of the scores from both tests would
help in eliminating errors, if any.

Advantages and Disadvantages

Advantages:
 Parallel-forms reliability can avoid some problems inherent in test-retesting.

Disadvantages:
 You have to create a large number of questions that measure the same construct.
 Proving that the two test versions are equivalent (parallel) can be a challenge.

D) Internal Consistency Reliability:


We have three ways to check the internal consistency of an index.

a) Split-half correlation. We could split the index of "exposure to televised news" in half so
that there are two groups of two questions, and see if the two sub-scales are highly correlated.
That is, do people who score high on the first half also score high on the second half?

b) Average inter-item correlation. We also can determine internal consistency for each
question on the index. If the index is homogeneous, each question should be highly correlated
with the other three questions.

c) Average item-total correlation. We could correlate each question with the total score of the
TV news exposure index to examine the internal consistency of items. This gives us an idea of
the contribution of each item to the reliability of the index.

Internal consistency is a measure of how well related, but different, items all measure the same
thing. It is applied to groups of items thought to measure different aspects of the same concept. A
single item taps only one aspect of a concept. If several different items are used to gain information
about a particular behavior or construct, the data set is richer and more reliable.
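The three checks listed above can be sketched in Python for a hypothetical four-question "exposure to televised news" index (invented responses; statistics.correlation requires Python 3.10+):

import itertools, statistics

# Each row: one respondent's answers to the four questions (0-10 scale).
data = [
    [8, 7, 9, 8],
    [2, 3, 1, 2],
    [5, 6, 5, 4],
    [9, 8, 8, 9],
    [3, 2, 4, 3],
    [6, 7, 6, 5],
]
items = list(zip(*data))  # column-wise: one tuple of scores per question

# a) Split-half: correlate the sum of questions 1-2 with the sum of 3-4.
half1 = [row[0] + row[1] for row in data]
half2 = [row[2] + row[3] for row in data]
print("split-half r:", round(statistics.correlation(half1, half2), 2))

# b) Average inter-item correlation over all six question pairs.
pair_rs = [statistics.correlation(items[i], items[j])
           for i, j in itertools.combinations(range(4), 2)]
print("average inter-item r:", round(statistics.mean(pair_rs), 2))

# c) Average item-total correlation.
totals = [sum(row) for row in data]
item_total_rs = [statistics.correlation(q, totals) for q in items]
print("average item-total r:", round(statistics.mean(item_total_rs), 2))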

Example:
1) The Rand 36-Item Health Survey measures 8 dimensions of health. One of these dimensions is
"physical function."
2) Instead of asking just one question, "How limited are you in your day-to-day activities?",
Rand found that asking 10 questions produced more reliable results and conveyed a better
understanding of "physical function."

Rand Example
■ The following questions are about activities you might do during a typical
day. Does your health now limit you in these activities? If so, how much?
(Response options are: limited a lot, limited a little, not limited at all.)
• Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports.
• Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf.
• Lifting or carrying groceries.
• Climbing several flights of stairs.
• Climbing one flight of stairs.
• Bending, kneeling, or stooping.
• Walking more than a mile.
• Walking several blocks.
• Walking one block.
• Bathing or dressing yourself.

Cronbach’s alpha:

This method is commonly used as a measure of the internal consistency or reliability of a test. It
was developed by Lee Cronbach in 1951 as an extension of the Kuder-Richardson formula
(KR20). This method uses the variances of the scores on the odd, even and total items to work out
the reliability. The software NRTVB-99 is based on the following formula:

Cronbach's α = 2[1 − (s²odd + s²even) / s²total]

Cronbach's alpha indicates the degree of internal consistency and is a function of the number of
items in the scale and the degree of their intercorrelation. It ranges from 0 to 1 (one never sees 1).
It measures the proportion of variability that is shared among items (covariance). When the items
all tend to measure the same thing, they are highly related and alpha is high. When the items tend
to measure different things, they have very little correlation with each other and alpha is low.
The major source of measurement error is due to the sampling of content.

Example:

1) The scores of all 2000 students on 80 items were entered into Excel, and we got the value of
Cronbach's α directly as 0.89. This value indicates very good internal consistency in the present
inventory.

2) Conceptual Formula:

α = k·r̄ / (1 + (k − 1)·r̄)

where
k = number of items
r̄ = the average correlation among all items

• Given a 10-item scale where the average correlation between items is .50 (e.g., calculate the
correlation between items 1 and 2, between items 1 and 3, between 1 and 4, etc., for all possible
pairs; 10 items taken 2 at a time gives 45 pairs):

α = 10(.5) / (1 + 9(.5)) = 0.91
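A minimal Python sketch: the first function is the standardized (conceptual) formula and reproduces the 0.91 result above; the second computes alpha from raw item scores using the usual variance-based form (hypothetical data).

import statistics

def alpha_standardized(k, avg_r):
    # Conceptual (standardized) formula: k*r / (1 + (k-1)*r).
    return k * avg_r / (1 + (k - 1) * avg_r)

print(round(alpha_standardized(10, 0.50), 2))  # 0.91, as in the worked example

def cronbach_alpha(rows):
    # Variance form: (k/(k-1)) * (1 - sum of item variances / total variance).
    k = len(rows[0])
    item_vars = [statistics.pvariance(col) for col in zip(*rows)]
    total_var = statistics.pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

rows = [[3, 4, 3, 4], [1, 2, 1, 1], [4, 5, 4, 5], [2, 2, 3, 2], [5, 4, 5, 5]]
print(round(cronbach_alpha(rows), 2))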

Fig.: Types of reliability: test-retest (stability over time), alternative forms (equivalence), and
internal consistency (split-half, inter-rater, Cronbach's alpha).


Validity

Test validity refers to the degree to which the tool actually measures what it claims to measure.
Validity can be defined as the accuracy with which the scale measures what it claims to measure.
Validity and purpose are like two sides of a coin.
Any measuring instrument which fulfils the purpose for which it was developed can be called a
valid measuring instrument. Validity is also the extent to which the inferences and conclusions made
on the basis of scores earned on a measure are appropriate and meaningful.

According to H. E. Garrett (1965) [4]:

"The validity of a test or any measuring instrument depends upon the fidelity with which it
measures what it proposes to measure."

According to Freeman (1960) [5]:

"An index of validity shows the degree to which a test measures what it purports to measure when
compared with accepted criteria."

According to Anastasi (2007) [7]:

"The validity of a test concerns what the test measures and how well it does so."

Very simply, validity is the extent to which a test measures what it is supposed to measure. The
question of validity is raised in the context of the three points made above: the form of the test,
the purpose of the test, and the population for whom it is intended. Therefore, we cannot ask the
general question "Is this a valid test?" The question to ask is "How valid is this test for the
decision that I need to make?"
We can divide the types of validity into logical and empirical.
Types of validity:
All procedures for determining test validity are concerned with the relationship between
performance on a test and other independently observable facts about the behavior characteristics
under consideration.

1. Criterion (Pragmatic) Validity
It measures the correlation between the outcomes of a test for a construct and the outcomes of
pre-established tests that examine the individual criteria that form the overall construct. In other
words, if a given construct has three criteria, the outcomes of the test are correlated with the
outcomes of tests for each individual criterion that are already established as being valid.

Example: If a company conducts an IQ test of a job applicant and matches it with his/her past
academic record, any correlation that is observed will be an example of criterion-related validity.
Based on the different time frames used, two kinds of criterion-related validity can be differentiated.

a) Concurrent validity
The measures should distinguish individuals: whether one would be good for a job, or whether
someone wouldn't. It refers to the degree to which the results of a test correlate well with the
results obtained from a related test that has already been validated. The two tests are taken at the
same time, and they provide a correlation between events that are on the same temporal plane
(the present).

Example 1:
Say a political candidate needs more campaign workers; she could use a test to determine who
would be effective campaign workers. She develops a test and administers it to people who are
working for her right now. She then checks to see whether people who score high on her test are
the same people who have been shown to be the best campaign workers now. If this is the case,
she has established the concurrent validity of the test.

Example 2:
If a batch of students is given an evaluative test, and on the same day, their teachers are asked to
rate each one of those students and the results of both sets are compared, any correlation that is
observed between the two sets of data will be concurrently valid.

b) Predictive Validity: In this case, our political candidate could use the
index to predict who would become good campaign workers in the future.
Example:
Say, she runs an ad in the paper for part-time campaign workers. She asks them all to come in for
an interview and to take the test. She hires them all, and later checks to see if those who are the
best campaign workers are also the ones who did best on the test. If this is true, she has
established the predictive validity of the test and only needs to hire those who score high on her
test.

2. Construct Validity:
Construct validity refers to the ability of the test to measure the construct or quality that it
claims to measure; i.e., if a test claims to test intelligence, it is valid if it truly tests the
intelligence of the individual. It involves conducting a statistical analysis of the internal structure
of the test and its examination by a panel of experts to determine the suitability of each question.
It also studies the relationship between the responses to the test questions and the ability of
the individual to comprehend the questions and provide apt answers. Three types of evidence can
be obtained for the purpose of construct validation, depending on the research problem.

Example:
Suppose a test is prepared with the intention of testing students' subject knowledge of science, but
the language used to present the problems is highly sophisticated and difficult to comprehend. In
such a case, the test, instead of gauging the knowledge, ends up testing the language proficiency,
and hence is not a valid construct for measuring the subject knowledge of the students.
a) Convergent Validity: Where different measures of the same concept yield similar
results. Here we used self-report versus observation (different measures). Yet these two
measures should yield similar results, since they were both to measure verbal (or physical)
aggression. The results of verbal aggression from the two measures should be highly
correlated. Convergent validity is evidence that the same concept measured in different ways
yields similar results. In this case, you could include two different tests.

Example:
1. You could place meters on respondents' television sets to record the time that
people spend with news programs. Then this record can be compared with the survey results of
"exposure to televised news"; or
2. You could send someone to observe respondents' television use at their homes, and compare
the observation results with your survey results.

b) Discriminant Validity: Evidence that the concept as measured can be differentiated from
other concepts. Our theory says that physical aggression and verbal aggression are
different behaviors. In this case, the correlations should be low between questions asked
that dealt with verbal aggression and questions asked that dealt with physical aggression
in the self-report measure. Discriminant validity is evidence that one concept is different
from other closely related concepts.

Example:
1) For TV news exposure, you could include measures of exposure to TV entertainment
programs and determine if they differ from TV news exposure measures. In this case, the
measures of exposure to TV news should not relate highly to measures of exposure to
TV entertainment programs.
c) Hypothesis-testing: Evidence that a research hypothesis about the relationship between
the measured concept (variable) and another concept (variable), derived from a theory, is
supported. In the case of physical aggression and television viewing, for example, there is
a social learning theory stating how violent behavior can be learned from observing and
modeling televised physical violence.

3) Face validity:
Basically, face validity refers to the degree to which a test appears to measure what it purports to
measure. The researchers look at the items and agree that the test is a valid measure of the
concept being measured just on the face of it. That is, we evaluate whether each of the measuring
items matches any given conceptual domain of the concept. It is an estimate of whether a particular
test appears to measure a construct. It in no way implies that the test actually measures the
construct; it merely projects that it does.

Example: If a test appears to be measuring what it is supposed to, it has high face validity; if
it doesn't, then it has low face validity. It is the least sophisticated form of validity and is also
known as surface validity. Hence, if an intelligence test appears to be testing the intelligence of
individuals, as observed by an evaluator, the test possesses face validity.

4) Content Validity
Content validity regards the representativeness or sampling adequacy of the content of a
measuring instrument. Content validity is always guided by a judgment: is the content of the
measure representative of the universe of content of the concept being measured? Although both
face validation and content validation of a measurement are judgmental, the criterion for judgment
is different. While the belonging of each item to the concept being measured is determined
in the evaluation of face validity, content validation determines whether any left-out item should
be included in the measurement for its representativeness of the concept.

Example:
1) If a test is designed to assess learning in the biology department, then that test must
cover all aspects of the subject, including its various branches like zoology, botany, microbiology,
biotechnology, genetics, ecology, etc., or at least appear to do so.

2) An example may clarify the distinction. The task here is to determine the content validity of a
survey measure of "political participation." First, we may specify all the aspects or
dimensions of this concept. Then we may take the measurement apart to see if all of
these dimensions are represented on the test (e.g., the questionnaire), as in the table below.

POLITICAL PARTICIPATION

Dimensions:
Behavior: expressing own viewpoint | Behavior: learning others' viewpoints | Cognitions

Indicators:
Expressing own viewpoint: political activity; voting registration; voted in the past; membership
in organizations
Learning others' viewpoints: viewing broadcasts; discussing with family/friends; reading
campaign materials
Cognitions: interest in politics; party affiliation; political knowledge

Bases of validity in Quantitative Research:


 Controllability
 Replicability
 Predictability
 Randomization of sample
 Neutrality
 Objectivity
 Observations
 Inference
 Manipulation of variables

Fig.: Types of construct validity: translation validity (face validity and content validity) and
criterion-related validity (predictive, concurrent, convergent, and discriminant validity).
Difference between Validity and Reliability:

Reliability concerns the extent to which any measuring procedure yields the same results on
repeated trials. The more consistent the results achieved by the same participants in the same
repeated measurements, the higher the reliability of the measuring procedure; conversely, the
less consistent the results, the lower the reliability. An assessment instrument, for example, is
quite reliable if an individual obtains approximately the same score or outcome on repeated
examinations. Reliability is an important indicator of an instrument's readability,
understandability, and general usefulness.

Validity, on the other hand, means in a general sense that a measuring device is valid if it does what
it is intended to do. More specifically, validity concerns the crucial relationship between
concept and indicator. Unlike reliability, which focuses on the performance of empirical
measures, validity is usually more of a theoretically oriented issue because it raises the question,
"valid for what purpose?" Validity is crucial to an instrument's credibility; it is an indication that
the instrument is indeed measuring what it was designed to measure and that it is measuring it
accurately.
Validity, like reliability, is a matter of degree. Attaining a perfectly valid indicator—one that
represents the intended, and only the intended, concept—is unachievable. However, the higher
an instrument’s validity the higher the likelihood that it is measuring the theoretical constructs
for which it is expressly designed.

An example of weak validity:

Many recreational activities of high school students involve driving cars. A researcher, wanting
to measure whether recreational activities have a
involve driving cars. A researcher, wanting to measure whether recreational activities have a
negative effect on grade point average in high school students, might conduct a survey asking
how many students drive to school and then attempt to find a correlation between these two
factors. Because many students might use their cars for purposes other than or in addition to
recreation (e.g., driving to work after school, driving to school rather than walking or taking a
bus), this research study might prove invalid. Even if a strong correlation was found between
driving and grade point average, driving to school in and of itself would seem to be an invalid
measure of recreational activity.

Relationship between reliability and validity

When using reliability and validity, the following four results can occur:

1) High reliability, but low validity — the indicators measure something consistently but
not the intended concept;
2) High validity, but low reliability — the indicator represents the concept well but does
not produce consistent measurements;
3) Low validity and low reliability — the worst case; the indicators neither measure the
concept nor produce consistent results of whatever they measure; or
4) High validity and high reliability — what we hope for; the indicators consistently
measure what they were intended to measure.

When poor validity or reliability results occur, a researcher has to seek the source of error, revise
the measuring instrument as appropriate, and test again for validity and reliability.
References
1. R. C. Kothari, Research Methodology: Methods and Techniques, New Delhi: New Age
International (P) Ltd., 2009, pp. 139-140.
2. A. Anastasi & S. Urbina, op. cit., p. 110.
3. P. J. Rulon, "A simplified procedure for determining the reliability of a test by split-halves,"
Harvard Educational Review, 9, 1939, pp. 99-103.
4. H. E. Garrett, Statistics in Psychology and Education, Bombay: Vakils, Feffer and Simons
Pvt. Ltd., 1985, p. 354.
5. F. S. Freeman, Theory and Practice of Psychological Testing, New York: Holt, Rinehart and
Winston, Inc., 1968, p. 26.
6. A. Anastasi & S. Urbina, op. cit., p. 127.
7. A. Anastasi, Psychological Testing, London: The Macmillan Co., Collier-Macmillan, 1970,
p. 135.
