
Reliability

Reliability refers to how consistently a method measures something. If the same result
can be consistently achieved by using the same methods under the same circumstances, the
measurement is considered reliable.

Validity

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world. High reliability is one indicator that a measurement is valid. If a method is not reliable, it probably isn't valid.
Test-retest reliability

Test-retest reliability measures the consistency of results when you repeat the same test
on the same sample at a different point in time.

For Example

A test of color blindness for trainee pilot applicants should have high test-retest reliability,
because color blindness is a trait that does not change over time.

A group of participants complete a questionnaire designed to measure personality traits. If they repeat the questionnaire days, weeks, or months apart and give the same answers, this indicates high test-retest reliability.
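Test-retest reliability for a questionnaire like this is often quantified as the Pearson correlation between the two administrations. A minimal sketch, using invented scores for illustration:

```python
# Sketch: test-retest reliability as the Pearson correlation between
# two administrations of the same questionnaire. Scores are hypothetical.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [42, 37, 50, 29, 45, 33]  # scores at the first administration
time2 = [40, 38, 49, 31, 44, 35]  # same participants, weeks later

# A coefficient close to 1 indicates high test-retest reliability.
r = pearson(time1, time2)
print(round(r, 3))
```

The closer the coefficient is to 1, the more stable the measurement is over time; values near 0 would suggest the test captures something transient rather than a stable trait.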

Interrater reliability

Interrater reliability (also called interobserver reliability) measures the degree of agreement between different people observing or assessing the same thing.

For Example

A team of researchers observe the progress of wound healing in patients. To record the stages of
healing, rating scales are used, with a set of criteria to assess various aspects of wounds. The
results of different researchers assessing the same set of patients are compared, and there is a
strong correlation between all sets of results, so the test has high interrater reliability.

Two people may be asked to categorize pictures of animals as either dogs or cats. A perfectly reliable result would be that they both classify the same pictures in the same way.
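Agreement on a categorization task like this is commonly summarized with Cohen's kappa, which corrects the raw percent agreement for agreement expected by chance. A minimal sketch with invented labels:

```python
# Sketch: Cohen's kappa for two raters labelling the same ten pictures
# as "dog" or "cat". The labels below are hypothetical.
def cohens_kappa(r1, r2):
    n = len(r1)
    labels = set(r1) | set(r2)
    # Observed proportion of pictures both raters labelled identically
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Agreement expected by chance, given each rater's label frequencies
    expected = sum((r1.count(lab) / n) * (r2.count(lab) / n) for lab in labels)
    return (observed - expected) / (1 - expected)

rater1 = ["dog", "dog", "cat", "dog", "cat", "cat", "dog", "cat", "dog", "dog"]
rater2 = ["dog", "dog", "cat", "cat", "cat", "cat", "dog", "cat", "dog", "dog"]

print(round(cohens_kappa(rater1, rater2), 3))  # → 0.8
```

Here the raters agree on 9 of 10 pictures (90%), but since half that agreement could occur by chance, kappa is 0.8; a kappa near 1 indicates high interrater reliability.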

Parallel forms reliability

Parallel forms reliability measures the correlation between two equivalent versions of a test.

For Example

A set of questions is formulated to measure financial risk aversion in a group of respondents. The
questions are randomly divided into two sets, and the respondents are randomly divided into two
groups. Both groups take both tests: group A takes test A first, and group B takes test B first. The
results of the two tests are compared, and the results are almost identical, indicating high parallel
forms reliability.

An experimenter develops a large set of questions. They split these into two equivalent forms and administer each form to a randomly selected half of a target sample.

Internal consistency

Internal consistency assesses the correlation between multiple items in a test that are intended to
measure the same construct.

For Example

Suppose we design a questionnaire to measure self-esteem. If we randomly split the results into two halves, there should be a strong correlation between the two sets of results. If the two results are very different, this indicates low internal consistency.

A group of respondents are presented with a set of statements designed to measure optimistic and
pessimistic mindsets. They must rate their agreement with each statement on a scale from 1 to 5.
If the test is internally consistent, an optimistic respondent should generally give high ratings to
optimism indicators and low ratings to pessimism indicators. The correlation is calculated
between all the responses to the “optimistic” statements, but the correlation is very weak. This
suggests that the test has low internal consistency.
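The split-half idea generalizes to Cronbach's alpha, a standard summary of internal consistency computed from the variances of the individual items and of respondents' total scores. A sketch with invented ratings for a five-item optimism scale:

```python
# Sketch: Cronbach's alpha for a five-item scale rated 1-5 by six
# respondents. All scores are hypothetical.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def cronbach_alpha(items):
    # items: one list of scores per questionnaire item,
    # with respondents in the same order in every list
    k = len(items)
    item_vars = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

item_scores = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 4],
    [5, 4, 2, 4, 3, 5],
    [3, 5, 3, 4, 2, 4],
    [4, 4, 2, 5, 3, 5],
]

# Values above roughly 0.8 are conventionally read as good consistency.
alpha = cronbach_alpha(item_scores)
print(round(alpha, 2))
```

Because the invented items move together across respondents, alpha comes out high; if the items measured unrelated things, the item variances would dominate the total-score variance and alpha would drop toward zero.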

Types of Validity

Construct validity

Construct validity evaluates whether a measurement tool really represents the thing we are
interested in measuring. It’s central to establishing the overall validity of a method.

For Example

A self-esteem questionnaire could be assessed by measuring other traits known or assumed to be related to the concept of self-esteem (such as social skills and optimism). Strong correlation between the scores for self-esteem and associated traits would indicate high construct validity.

There is no objective, observable entity called “depression” that we can measure directly. But
based on existing psychological research and theory, we can measure depression based on a
collection of symptoms and indicators, such as low self-confidence and low energy levels.

Content validity

Content validity assesses whether a test is representative of all aspects of the construct.

For Example

A mathematics teacher develops an end-of-semester algebra test for her class. The test should
cover every form of algebra that was taught in the class. If some types of algebra are left out,
then the results may not be an accurate indication of students’ understanding of the subject.
Similarly, if she includes questions that are not related to algebra, the results are no longer a
valid measure of algebra knowledge.

Let's suppose we are looking at the effects of stress on worker productivity. We have our participants answer questionnaires about how much they think they are affected by stress in the workplace and how much it affects their productivity. We could argue that this measure is low in content validity because it hasn't actually tested the effects of stress on worker productivity, only participants' perceptions of them.

Face validity

Face validity considers how suitable the content of a test seems to be on the surface. It’s similar
to content validity, but face validity is a more informal and subjective assessment.

For Example

Suppose we are trying to measure the effects of watching a scary movie on participant stress
levels. We show our participants a scary movie and measure their cortisol levels before and after.
Cortisol is produced in response to stress, which means that on the surface, this study appears as
though it is measuring the effects of a scary movie on participant stress level. Therefore it has
high face validity.

You create a survey to measure the regularity of people’s dietary habits. You review the survey
items, which ask questions about every meal of the day and snacks eaten in between for every
day of the week. On its surface, the survey seems like a good representation of what you want to
test, so you consider it to have high face validity.

Criterion validity

Criterion validity evaluates how closely the results of your test correspond to the results of a different, established test or outcome measure (the criterion).

For Example

A survey is conducted to measure the political opinions of voters in a region. If the results
accurately predict the later outcome of an election in that region, this indicates that the survey
has high criterion validity.

A university professor creates a new test to measure applicants' English writing ability. To assess how well the test really does measure students' writing ability, she finds an existing test that is considered a valid measurement of English writing ability, and compares the results when the same group of students take both tests. If the outcomes are very similar, the new test has high criterion validity.
