RELIABILITY OF MEASUREMENT

Overview
Reliability is, in essence, the extent to which a measurement method is consistent with
itself. There are two important aspects of reliability. The first is the extent to which a
measurement method produces the same score for the same case under the same conditions.
This is called test-retest reliability. The second is the extent to which responses to the
individual items in a multiple-item measure are consistent with each other. This is
called internal consistency.
Test-Retest Reliability
If you take an IQ test today, and then you take it again next week, you would expect your
scores to be quite similar. This is because we think of intelligence as a relatively stable
aspect of our personalities. If you scored very differently when you took the test the second
time, you would rightly wonder about the accuracy of the test.
In principle, it is very easy to assess test-retest reliability. Just use the measurement method
in question to test several people today and then use it to test the same people again in a day
or a week or a year. Then compute Pearson's r, the correlation between the two sets of
scores. A high correlation (say .80 or above) indicates good test-retest
reliability. (A scatterplot for such a correlation would have most of the points falling pretty
close to a single straight line.) For example, the scatterplot below shows the relationship
between 20 people's scores on a self-esteem test taken first on Monday and then again on
Friday. The correlation between the two sets of scores is +.87, which indicates good test-
retest reliability.
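
The procedure above can be sketched in a few lines of Python. This is only an illustration: the scores are made up (8 people rather than the 20 in the example), and the function name pearson_r and the variable names are assumptions, not part of any particular statistics package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each score from its own mean
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sum_products = sum(a * b for a, b in zip(dx, dy))
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical self-esteem scores for the same 8 people on two occasions
monday = [22, 25, 17, 30, 19, 27, 24, 15]
friday = [24, 23, 18, 29, 21, 26, 25, 14]

r = pearson_r(monday, friday)
print(f"test-retest r = {r:.2f}")
print("good test-retest reliability" if r >= 0.80 else "questionable reliability")
```

The .80 cutoff in the last line is the rule of thumb from the text, not a strict standard.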

In practice, however, this approach does not always work. Consider that people taking the
same test a second time might remember their original answers and give them again just to be
consistent. This would make the test appear to be more reliable than it is. Also, we
might expect some variables to change between the first and second measurements. If we
measure the mood of everyone in our class today and then we do the same thing next week,
the correlation might be low because people's moods have changed, not because there is
anything wrong with our measurement method. There are ways to deal with these problems,
but you will have to learn about them in a more advanced measurement course.
Internal Consistency
If you take a 10-item self-esteem test, and you have high self-esteem, then you should tend
to give high self-esteem responses to all 10 items. If you have low self-esteem, then you
should tend to give low self-esteem responses to all 10 items. In general, people's
responses to the different items on a multiple-item measure should be positively
correlated with each other. If they are not, then this indicates a problem with internal
consistency. For example, if people's responses to Item 3 are completely unrelated to their
responses to Item 6, then it does not make sense to think that these two items are both
measuring self-esteem, so they probably should not be on the same test.
One simple way to check for internal consistency is to look at the item-total
correlations. These are the correlations between the individual items and the total score.
That is, you can compute the correlation between Item 1 and the total score, between Item 2
and the total score, and so on. If the measure is internally consistent, then these correlations
should all be positive. Because each item on a self-esteem test is there because it supposedly
measures self-esteem, each item should be positively correlated with the total score. If many
items have low or negative correlations with the total score, then this indicates poor internal
consistency. Such items are usually dropped from the test. Item-total correlations are
interesting in part because your instructors (not just in psychology) will sometimes use them
to identify, and maybe even throw out, poor exam questions.
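
As a rough sketch of the item-total check, the Python below correlates each item with the total score. The response table is invented for illustration (6 people, 4 items, with the last item deliberately made inconsistent), and the helper pearson_r is a hypothetical name defined here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sum_products = sum(a * b for a, b in zip(dx, dy))
    return sum_products / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical data: each row is one person, each column is one item
responses = [
    [4, 4, 3, 2],
    [3, 4, 4, 1],
    [2, 1, 2, 3],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
    [1, 1, 2, 3],
]

# Total score for each person (sum across that person's items)
totals = [sum(row) for row in responses]

# Correlate each item's column of scores with the total scores
item_total = []
for j in range(len(responses[0])):
    item_scores = [row[j] for row in responses]
    item_total.append(pearson_r(item_scores, totals))

for j, r in enumerate(item_total, start=1):
    print(f"Item {j}: item-total r = {r:.2f}")
```

Here the total includes the item being checked, as in the description above. A common variant leaves the item out of the total before correlating, so that the item is not correlated with itself.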
A second way to check for internal consistency is to compute what is called the split-half
correlation. This is the correlation between two scores, one based on one half of the items
and the other based on the other half of the items. Imagine that 100 people have taken
the Rosenberg Self-Esteem Scale. You could compute two self-esteem scores for each
person: one based on Items 1, 3, 5, 7, and 9, and the other based on Items 2, 4, 6, 8, and
10. Then you could compute the correlation between these two sets of scores. Again, it
should be fairly strong and positive.
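
The odd-even split described above might look like this in Python. The data are invented (6 people instead of 100, answering 10 items on a 1 to 4 scale), and the names used are assumptions for the sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sum_products = sum(a * b for a, b in zip(dx, dy))
    return sum_products / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical responses: each row is one person, columns are Items 1 to 10
responses = [
    [4, 4, 3, 4, 4, 3, 4, 4, 3, 4],
    [3, 3, 3, 2, 3, 3, 2, 3, 3, 3],
    [2, 1, 2, 2, 1, 2, 2, 1, 2, 2],
    [1, 2, 1, 1, 2, 1, 1, 2, 1, 1],
    [3, 4, 4, 3, 3, 4, 3, 3, 4, 3],
    [2, 2, 3, 2, 2, 2, 3, 2, 2, 3],
]

# Items 1, 3, 5, 7, 9 sit at list positions 0, 2, 4, 6, 8
odd_scores = [sum(row[0::2]) for row in responses]
# Items 2, 4, 6, 8, 10 sit at list positions 1, 3, 5, 7, 9
even_scores = [sum(row[1::2]) for row in responses]

split_half_r = pearson_r(odd_scores, even_scores)
print(f"split-half r = {split_half_r:.2f}")
```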
A final way to check for internal consistency is to compute Cronbach's alpha, which is the
statistic that is most often presented in research reports. You would do this using a computer,
of course, but conceptually Cronbach's alpha is the mean split-half correlation for all possible
ways of splitting the items in half. Note that you could split the items on a 10-item measure
into the even items and the odd items, the first half (Items 1 to 5) and the second half (Items 6
to 10), or even Items 1, 3, 4, 9, and 10 vs. Items 2, 5, 6, 7, and 8, and so on. If you were to
split the items in each of these ways, compute the split-half correlation for each one, and take
the mean of these split-half correlations, you would have Cronbach's alpha. (By the way, Lee
J. Cronbach was an undergraduate at Fresno State and we have an undergraduate Cronbach
Scholarship that goes in alternate years to a psychology or math student interested in
measurement.)
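
In practice, software does not average over every possible split. It usually computes alpha from the item variances and the variance of the total scores: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. That formula is not given in this handout, so treat the sketch below as an illustration under that assumption. The data and the function name cronbach_alpha are made up.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per person, one column per item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item's column of scores
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the people's total scores
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 6 people, 4 items; the last item behaves inconsistently
responses = [
    [4, 4, 3, 2],
    [3, 4, 4, 1],
    [2, 1, 2, 3],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
    [1, 1, 2, 3],
]

alpha_all = cronbach_alpha(responses)
# Recompute after dropping the inconsistent fourth item
alpha_first_three = cronbach_alpha([row[:3] for row in responses])

print(f"alpha, all 4 items:   {alpha_all:.2f}")
print(f"alpha, first 3 items: {alpha_first_three:.2f}")
```

With these invented numbers, alpha goes up when the inconsistent item is removed, which is the same reasoning used when weak items are dropped from a test.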
Some psychological tests have what are called sub-scales. These are, in essence, separate
tests that are combined together into one, where each test measures a different construct or a
different aspect of the same construct. For example, researchers have identified two
components of test anxiety. The first is autonomic nervous system arousal or nervous
feelings (fast heartbeat, muscle tension), and the second is negative thoughts (e.g., "I'm
gonna fail, I'm gonna fail"). Furthermore, they have discovered that these two components
seem to be independent of each other. It is possible to have nervous feelings without the
negative thoughts, and it is possible to have negative thoughts without the nervous
feelings. So a good measure of test anxiety contains two sets of items: one to measure
nervous feelings and the other to measure negative thoughts. Note that what is important
here is the internal consistency of each sub-scale. You would not want to measure internal
consistency across all items because you would not expect them all to be related to each
other.
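
To illustrate why each sub-scale is checked separately, here is a small Python sketch with an invented four-item test-anxiety measure: two "nervous feelings" items followed by two "negative thoughts" items. The data, the column layout, and the function name cronbach_alpha are all assumptions, and alpha is computed with the usual variance-based formula rather than anything specified in this handout.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per person, one column per item."""
    k = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: columns 0-1 are nervous-feelings items,
# columns 2-3 are negative-thoughts items (1 to 5 scale)
responses = [
    [5, 4, 5, 5],
    [4, 5, 1, 2],
    [1, 2, 4, 5],
    [2, 1, 2, 1],
    [5, 5, 2, 1],
    [1, 1, 5, 4],
]

# Alpha for each sub-scale on its own
alpha_nervous = cronbach_alpha([row[:2] for row in responses])
alpha_thoughts = cronbach_alpha([row[2:] for row in responses])
# Alpha across all four items, ignoring the sub-scale structure
alpha_all = cronbach_alpha(responses)

print(f"nervous feelings:  {alpha_nervous:.2f}")
print(f"negative thoughts: {alpha_thoughts:.2f}")
print(f"all items pooled:  {alpha_all:.2f}")
```

In this made-up example each sub-scale is internally consistent, while the pooled value is low because the two components do not go together across people.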
Why Does Reliability Matter?
The main reason that reliability matters is that a measure that is not reliable cannot be
valid. You can think of reliability as being a prerequisite for validity. For example, if a self-
esteem test gives very different scores for the same person under essentially the same
conditions, then we are not very well justified in taking either of those scores as a measure of
the person's self-esteem. Similarly, if the items on a self-esteem test are not correlated with
each other, then they cannot all be measuring self-esteem, and an aggregate of them cannot
be a very good measure of self-esteem. There are other reasons that reliability matters, but
they can wait until you take an advanced measurement course.
