Assessing reliability and validity in research tools

Reliability is the degree to which an assessment tool produces stable and consistent results.
Reliability is used to describe the overall consistency of a measure. A measure is said to have a high reliability
if it produces similar results under consistent conditions.
The goal of reliability theory is to estimate errors in measurement and to suggest ways of improving tests so that
errors are minimized. (Wikipedia)
Reliability is concerned with questions of stability and consistency: does the same measurement tool yield stable and consistent
results when repeated over time? Think about measurement processes in other contexts: in construction or woodworking, a tape
measure is a highly reliable measuring instrument.
Validity refers to how well a test measures what it is purported to measure.

Validity is important because it can help determine what types of tests to use, and help to make sure researchers
are using methods that are not only ethical and cost-effective, but that also truly measure the idea or
construct in question. (Wikipedia)
Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring). To continue
with the example of measuring the piece of wood, a tape measure that has been created with accurate spacing for inches, feet,
etc. should yield valid results as well. Measuring this piece of wood with a "good" tape measure should produce a correct
measurement of the wood's length.
To apply these concepts to social research, we want to use measurement tools that are both reliable and valid. We want
questions that yield consistent responses when asked multiple times - this is reliability. Similarly, we want questions that get
accurate responses from respondents - this is validity.
Joppe (2000) defines reliability as: "The extent to which results are consistent over time and an accurate
representation of the total population under study is referred to as reliability and if the results of a study can be
reproduced under a similar methodology, then the research instrument is considered to be reliable." (p. 1)
Embodied in this citation is the idea of replicability or repeatability of results or observations.
Our study has limited reliability and validity because its results cannot be compared with others and do not represent the total population.
To increase reliability and validity, the research instruments should be used multiple times and in different studies, in order to
probe the consistency of results over time.
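One common way to probe consistency over time is a test-retest check: give the same instrument to the same respondents twice and correlate the two sets of scores. The sketch below is only an illustration of that idea. The function name pearson_r and the two score lists are hypothetical, not data from our study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five respondents on two administrations
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

A correlation close to 1 suggests the instrument ranks respondents consistently across occasions. It says nothing about whether the instrument measures the intended construct.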
What are some ways to improve validity?

1. Make sure your goals and objectives are clearly defined and operationalized. Expectations of students
should be written down.

2. Match your assessment measure to your goals and objectives. Additionally, have the test reviewed by
faculty at other schools to obtain feedback from an outside party who is less invested in the instrument.
3. Get students involved; have the students look over the assessment for troublesome wording, or other
difficulties.
4. If possible, compare your measure with other measures, or data that may be available.
http://www.uni.edu/chfasoa/reliabilityandvalidity.htm
Difference of reliability from validity

Reliability does not imply validity. That is, a reliable measure that is measuring something consistently is not
necessarily measuring what you want to be measuring. For example, while there are many reliable tests of
specific abilities, not all of them would be valid for predicting, say, job performance. In terms of accuracy and
precision, reliability corresponds to precision (how closely repeated measurements agree with one another), while
validity corresponds to accuracy (how close the measurements are to the true value).
While reliability does not imply validity, a lack of reliability does place a limit on the overall validity of a test.
A test that is not perfectly reliable cannot be perfectly valid, either as a means of measuring attributes of a
person or as a means of predicting scores on a criterion. While a reliable test may provide useful valid
information, a test that is not reliable cannot possibly be valid.[4]
An example often used to illustrate the difference between reliability and validity in the experimental sciences
involves a common bathroom scale. If someone who is 200 pounds steps on a scale 5 times and gets readings of
"15", "250", "95", "140", and "500", then the scale is not reliable. If the scale consistently reads "150", then it is
reliable, but not valid. If it reads "200" each time, then the measurement is both reliable and valid.
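The scale example can be put into numbers. The sketch below is one possible illustration, not a standard procedure: it summarizes repeated readings of a known weight by their bias (average reading minus true weight) and their spread (standard deviation of the readings). The function name describe_scale is hypothetical.

```python
from statistics import mean, pstdev

def describe_scale(readings, true_weight):
    """Return (bias, spread) for repeated readings of one known weight."""
    bias = mean(readings) - true_weight  # systematic offset from the true value
    spread = pstdev(readings)            # scatter between repeated readings
    return bias, spread

TRUE_WEIGHT = 200  # pounds, as in the example above

scales = {
    "erratic": [15, 250, 95, 140, 500],
    "consistently 150": [150] * 5,
    "consistently 200": [200] * 5,
}

for name, readings in scales.items():
    bias, spread = describe_scale(readings, TRUE_WEIGHT)
    print(f"{name}: bias = {bias:+.1f}, spread = {spread:.1f}")
```

A large spread signals a reliability problem, while a bias far from zero signals a validity problem. The erratic readings happen to average exactly 200, so their bias is zero, yet no single reading can be trusted: without reliability, the scale is of no practical use even though it is right on average.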
General model
In practice, testing measures are never perfectly consistent. Theories of test reliability have been developed to
estimate the effects of inconsistency on the accuracy of measurement. The basic starting point for almost all
theories of test reliability is the idea that test scores reflect the influence of two sorts of factors:[4]
1. Factors that contribute to consistency: stable characteristics of the individual or the attribute that one is
trying to measure
2. Factors that contribute to inconsistency: features of the individual or the situation that can affect test
scores but have nothing to do with the attribute being measured.
These factors include:[4]
- Temporary but general characteristics of the individual: health, fatigue, motivation, emotional strain
- Temporary and specific characteristics of the individual: comprehension of the specific test task, specific
  tricks or techniques of dealing with the particular test materials, fluctuations of memory, attention or
  accuracy
- Aspects of the testing situation: freedom from distractions, clarity of instructions, interaction of
  personality, sex, or race of examiner
- Chance factors: luck in selection of answers by sheer guessing, momentary distractions
The goal of estimating reliability is to determine how much of the variability in test scores is due to errors in
measurement and how much is due to variability in true scores.[4]
A true score is the replicable feature of the concept being measured. It is the part of the observed score that
would recur across different measurement occasions in the absence of error.
Errors of measurement are composed of both random error and systematic error. They represent the
discrepancies between scores obtained on tests and the corresponding true scores.
This conceptual breakdown is typically represented by the simple equation:
Observed test score = true score + errors of measurement
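A small simulation can make this breakdown concrete. The sketch below rests on assumptions that are not part of the notes above: true scores and errors are drawn from normal distributions, only random error is modeled (no systematic error), and reliability is taken in the classical sense as the share of observed-score variance that comes from true scores. All names and numbers are illustrative.

```python
import random
from statistics import pvariance

def simulate_scores(n, true_sd, error_sd, seed=0):
    """Simulate n people: observed score = true score + random error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_scores = [rng.gauss(50, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return true_scores, observed

def reliability(true_scores, observed):
    """Share of observed-score variance that is due to true scores."""
    return pvariance(true_scores) / pvariance(observed)

true_scores, observed = simulate_scores(n=10_000, true_sd=10, error_sd=5)
print(f"estimated reliability: {reliability(true_scores, observed):.2f}")
```

With a true-score standard deviation of 10 and an error standard deviation of 5, the expected ratio is 100 / (100 + 25) = 0.8, and the estimate should land close to that. In real testing the true scores are not observable, so reliability has to be estimated indirectly, for example from repeated or parallel measurements.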
Test validity
Reliability (consistency) and validity (accuracy)
Validity & Reliability

Validity of an assessment is the degree to which it measures what it is supposed to measure. This is not the same
as reliability, which is the extent to which a measurement gives results that are consistent. For validity, repeated
measurements do not always have to be similar, as they do for reliability. When a measure is both valid and
reliable, the results will appear as in the image to the right. However, just because a measure is reliable, it is not
necessarily valid (and vice versa). Validity also depends on the measurement measuring what it was
designed to measure, and not something else instead.[3] Validity (like reliability) is a matter of
degree; it is not an all-or-nothing idea. There are many different types of validity.
An early definition of test validity identified it with the degree of correlation between the test and a criterion.
Under this definition, one can show that reliability of the test and the criterion places an upper limit on the
possible correlation between them (the so-called validity coefficient). Intuitively, this reflects the fact that
reliability involves freedom from random error and random errors do not correlate with one another. Thus, the
less random error in the variables, the higher the possible correlation between them. Under these definitions, a
test cannot have high validity unless it also has high reliability. However, the concept of validity has expanded
substantially beyond this early definition and the classical relationship between reliability and validity need not
hold for alternative conceptions of reliability and validity.
Within classical test theory, predictive or concurrent validity (correlation between the predictor and the
predicted) cannot exceed the square root of the correlation between two versions of the same measure; that is,
reliability limits validity.
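This ceiling can be worked out directly. The sketch below assumes the classical test theory bound in which the validity coefficient cannot exceed the square root of the product of the test's reliability and the criterion's reliability. With a perfectly reliable criterion this reduces to the square root of the test's reliability, as stated above. The function name max_validity is hypothetical.

```python
import math

def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper limit on the test-criterion correlation under classical test theory."""
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)

# A test with reliability 0.81 and a perfectly reliable criterion
print(round(max_validity(0.81), 2))

# The same test against a criterion with reliability 0.64
print(round(max_validity(0.81, 0.64), 2))
```

In the first case the ceiling is 0.9, and in the second it drops to 0.72, so unreliability in either the test or the criterion lowers the highest validity coefficient that could be observed.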
