Reliability and Validity

In order for assessments to be sound, they must be free of bias and distortion. Reliability and validity are two
concepts that are important for defining and measuring bias and distortion.
Reliability refers to the extent to which assessments are consistent. Just as we enjoy having reliable cars (cars that
start every time we need them), we strive to have reliable, consistent instruments to measure student achievement.
Another way to think of reliability is to imagine a kitchen scale. If you weigh five pounds of potatoes in the morning,
and the scale is reliable, the same scale should register five pounds for the potatoes an hour later (unless, of course,
you peeled and cooked them). Likewise, instruments such as classroom tests and national standardized exams
should be reliable: it should not make any difference whether a student takes the assessment in the morning or
afternoon, or on one day or the next.
Another measure of reliability is the internal consistency of the items. For example, if you create a quiz to measure
students' ability to solve quadratic equations, you should be able to assume that if a student gets an item correct, he
or she will also get other, similar items correct. The following table outlines three common reliability measures.

Type of Reliability | How to Measure
Stability or Test-Retest | Give the same assessment twice, separated by days, weeks, or months. Reliability is stated as the correlation between scores at Time 1 and Time 2.
Alternate Form | Create two forms of the same test (vary the items slightly). Reliability is stated as the correlation between scores on Test 1 and Test 2.
Internal Consistency (Alpha, α) | Compare one half of the test to the other half, or use methods such as Kuder-Richardson Formula 20 (KR20) or Cronbach's Alpha.
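For readers who want to see the arithmetic behind these coefficients, here is a minimal Python sketch. It assumes scores are stored as plain lists (one row per student, one 0/1 score per item for the quiz), and all of the data are hypothetical. The odd/even split is only one common way to halve a test, and the Spearman-Brown step-up (which estimates full-length reliability from the half-test correlation) is a standard addition that the table does not mention. Population variances are used throughout, which makes KR-20 and Cronbach's alpha agree exactly on 0/1 items.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation, as used for test-retest and alternate-form reliability."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half(items):
    """Odd/even split-half reliability, stepped up with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in items]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in items]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    # Spearman-Brown: estimate the reliability of the full-length test
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items[0])
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(items):
    """KR-20: the special case of alpha for right/wrong (0/1) items."""
    k, n = len(items[0]), len(items)
    p = [sum(row[j] for row in items) / n for j in range(k)]  # proportion correct per item
    pq = sum(pj * (1 - pj) for pj in p)
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - pq / total_var)


# Hypothetical data for illustration only
time1 = [78, 85, 62, 90, 71]  # scores at Time 1
time2 = [80, 83, 65, 92, 70]  # same students at Time 2
quiz = [                      # 5 students x 4 quadratic-equation items
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(round(pearson(time1, time2), 2))  # 0.98
print(round(split_half(quiz), 2))       # 0.88
print(round(cronbach_alpha(quiz), 2))   # 0.8
print(round(kr20(quiz), 2))             # 0.8
```

If every student earns the same total score, the total-score variance is zero and alpha is undefined; this sketch does not guard against that case.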
The values for reliability coefficients range from 0 to 1.0. A coefficient of 0 means no reliability and 1.0 means
perfect reliability. Since all tests have some error, reliability coefficients never reach 1.0. Generally, if the reliability of
a standardized test is above .80, it is said to have very good reliability; if it is below .50, it would not be considered a
very reliable test.
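The rough benchmarks in this paragraph can be written as a small lookup. This is only a sketch: the thresholds are the guidelines stated here, and because the text gives no label for coefficients between .50 and .80, the function simply reports that they fall between the benchmarks.

```python
def describe_reliability(r):
    """Label a reliability coefficient using the rough benchmarks in the text."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficients here are expected to lie in [0, 1]")
    if r > 0.80:
        return "very good"
    if r < 0.50:
        return "not very reliable"
    # The text gives no label for this middle range
    return "between the benchmarks"


for r in (0.89, 0.65, 0.45):
    print(r, describe_reliability(r))
```

A coefficient of .89, the value in the parent scenario at the end of this section, falls in the "very good" range.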
Validity refers to the accuracy of an assessment: whether or not it measures what it is supposed to measure. Even
if a test is reliable, it may not provide a valid measure. Let's imagine a bathroom scale that consistently tells you that
you weigh 130 pounds. The reliability (consistency) of this scale is very good, but it is not accurate (valid) because
you actually weigh 145 pounds (perhaps you re-set the scale in a weak moment)! Since teachers, parents, and school
districts make decisions about students based on assessments (such as grades, promotions, and graduation), the
validity of the inferences drawn from those assessments is essential, even more crucial than reliability. Also, if a test
is valid, it is almost always reliable.
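The bathroom-scale example separates two different questions, and a few lines of Python make the split concrete. In this sketch, which simplifies the analogy and uses hypothetical weigh-ins, the spread of repeated readings stands in for reliability and the average distance from your true weight stands in for validity.

```python
from statistics import mean, pstdev


def scale_report(readings, true_weight):
    """Return (spread, bias) for repeated readings of the same person."""
    spread = pstdev(readings)            # small spread: consistent (reliable)
    bias = mean(readings) - true_weight  # far from zero: inaccurate (not valid)
    return spread, bias


# Hypothetical weigh-ins from a scale that was re-set by 15 pounds
readings = [130.2, 129.8, 130.0, 130.1, 129.9]
spread, bias = scale_report(readings, true_weight=145)
print(f"spread = {spread:.2f} lb, bias = {bias:.1f} lb")  # spread = 0.14 lb, bias = -15.0 lb
```

The small spread shows the scale is consistent, while the 15-pound bias shows that consistency alone does not make it accurate.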
There are three ways in which validity can be measured. In order to have confidence that a test is valid (and
therefore the inferences we make based on the test scores are valid), all three kinds of validity evidence should be
considered.

Type of Validity | Definition | Example/Non-Example
Content | The extent to which the content of the test matches the instructional objectives. | A semester or quarter exam that includes only content covered during the last six weeks is not a valid measure of the course's overall objectives; it has very low content validity.
Criterion | The extent to which scores on the test agree with (concurrent validity) or predict (predictive validity) an external criterion. | If the end-of-year math tests in 4th grade correlate highly with the statewide math tests, they have high concurrent validity.
Construct | The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory. | If you can correctly hypothesize, on the basis of theory, that ESOL students will perform differently on a reading test than English-speaking students, the assessment may have construct validity.
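Concurrent criterion validity is typically summarized with the same correlation coefficient used for test-retest reliability, except that the two score lists now come from different instruments. The sketch below uses hypothetical classroom and statewide scores for six students; it is not based on any real test, and because the text does not specify a cutoff for "correlate highly," the code only reports r.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical scores for the same six 4th graders
classroom_final = [72, 88, 65, 94, 80, 59]  # end-of-year math test (percent)
statewide = [318, 340, 290, 355, 312, 300]  # statewide math test (scale score)

r = pearson(classroom_final, statewide)
print(f"concurrent validity coefficient r = {r:.2f}")
```

A predictive-validity check would look the same, except that the criterion scores (for example, next year's results) are collected later.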
So, does all this talk about validity and reliability mean you need to conduct statistical analyses on your classroom
quizzes? No, it doesn't. (Although you may, on occasion, want to ask one of your peers to verify the content validity
of your major assessments.) However, you should be aware of the basic tenets of validity and reliability as you
construct your classroom assessments, and you should be able to help parents interpret scores for the standardized
exams.
Try This
Reflect on the following scenarios.
A parent called you to ask about the reliability coefficient on a recent standardized test. The coefficient was
reported as .89, and the parent thinks that must be a very low number. How would you explain to the parent that
.89 is an acceptable coefficient?
Your school district is looking for an assessment instrument to measure reading ability. They have narrowed the
selection to two possibilities -- Test A provides data indicating that it has high validity, but there is no information
about its reliability. Test B provides data indicating that it has high reliability, but there is no information about its
validity. Which test would you recommend? Why?
