
Validity and Reliability

What Are the Goals of Questionnaire Design?

Warwick and Linninger (1975) point out that there are two basic goals in questionnaire
design.

1. To obtain information relevant to the purposes of the survey.

2. To collect this information with maximal reliability and validity.

How can a researcher be sure that the data gathering instrument being used will measure
what it is supposed to measure and will do this in a consistent manner? This is a question
that can only be answered by examining the definitions for and methods of establishing the
validity and reliability of a research instrument. These two very important aspects of
research design will be discussed in this module.

What Is Validity?

Validity can be defined as the degree to which a test measures what it is supposed to
measure. There are three basic approaches to the validity of tests and measures as shown
by Mason and Bramble (1989). These are content validity, construct validity, and criterion-
related validity.

Content Validity

This approach measures the degree to which the test items represent the domain or
universe of the trait or property being measured. In order to establish the content validity
of a measuring instrument, the researcher must identify the overall content to be
represented. Items must then be randomly chosen from this content that will accurately
represent the information in all areas. By using this method the researcher should obtain a
group of items which is representative of the content of the trait or property to be
measured.

Identifying the universe of content is not an easy task. It is, therefore, usually suggested
that a panel of experts in the field to be studied be used to identify a content area. For
example, in the case of researching the knowledge of teachers about a new curriculum, a
group of curriculum and teacher education experts might be asked to identify the content of
the test to be developed.

Construct Validity

Cronbach and Meehl (1955) indicated that, "Construct validity must be investigated
whenever no criterion or universe of content is accepted as entirely adequate to define the
quality to be measured" as quoted by Carmines and Zeller (1979). The term construct in
this instance is defined as a property that is offered to explain some aspect of human
behavior, such as mechanical ability, intelligence, or introversion (Van Dalen, 1979). The
construct validity approach concerns the degree to which the test measures the construct it
was designed to measure.

There are two parts to the evaluation of the construct validity of a test. First and most
important, the theory underlying the construct to be measured must be considered. Second
the adequacy of the test in measuring the construct is evaluated (Mason and Bramble,
1989). For example, suppose that a researcher is interested in measuring the introverted
nature of first year teachers. The researcher defines introverted as the overall lack of social
skills such as conversing, meeting and greeting people, and attending faculty social
functions. This definition is based upon the researcher's own observations. A panel of
experts is then asked to evaluate this construct of introversion. The panel cannot agree that
the qualities pointed out by the researcher adequately define the construct of introversion.
Furthermore, the researcher cannot find evidence in the research literature supporting the
introversion construct as defined here. Using this information, the validity of the construct
itself can be questioned. In this case the researcher must reformulate the previous
definition of the construct.

Once the researcher has developed a meaningful, useable construct, the adequacy of the
test used to measure it must be evaluated. First, data concerning the trait being measured
should be gathered and compared with data from the test being assessed. The data from
other sources should be similar or convergent. If convergence exists, construct validity is
supported.

After establishing convergence, the discriminant validity of the test must be determined.
This involves demonstrating that the construct can be differentiated from other constructs
that may be somewhat similar. In other words, the researcher must show that the construct
being measured is not the same as one that was measured under a different name.
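
To make the convergent and discriminant checks concrete, here is a minimal Python sketch (the scores and measure names are hypothetical, not from the module): the new test should correlate strongly with an accepted measure of the same construct and only weakly with a measure of a different construct.

# Hypothetical sketch of convergent and discriminant evidence (Python 3.10+).
from statistics import correlation

new_test = [12, 15, 9, 20, 14, 18, 11, 16]                # new introversion instrument
accepted_introversion = [10, 14, 8, 19, 13, 17, 10, 15]   # established measure of the same construct
mechanical_ability = [50, 47, 52, 49, 55, 48, 46, 53]     # measure of a different construct

convergent = correlation(new_test, accepted_introversion)  # expected to be high
discriminant = correlation(new_test, mechanical_ability)   # expected to be near zero
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")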

Criterion-Related Validity

This approach is concerned with detecting the presence or absence of one or more criteria
considered to represent traits or constructs of interest. One of the easiest ways to test for
criterion-related validity is to administer the instrument to a group that is known to exhibit
the trait to be measured. This group may be identified by a panel of experts. A wide range
of items should be developed for the test with invalid questions culled after the control
group has taken the test. Items should be omitted that are drastically inconsistent with
respect to the responses made among individual members of the group. If the researcher
has developed quality items for the instrument, the culling process should leave only those
items that will consistently measure the trait or construct being studied. For example,
suppose one wanted to develop an instrument that would identify teachers who are good at
dealing with abused children. First, a panel of unbiased experts identifies 100 teachers out
of a larger group that they judge to be best at handling abused children. The researcher
develops 400 yes/no items that will be administered to the whole group of teachers,
including those identified by the experts. The responses are analyzed, and the items to
which the expert-identified teachers and the other teachers respond differently are taken to
be the questions that will identify teachers who are good at dealing with abused children.
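
A rough Python sketch of this item-culling step is given below; it assumes the yes/no responses are coded as 1/0, and the 0.30 cutoff is an arbitrary value chosen only for illustration.

# Hypothetical item analysis for criterion-related validity.
# expert_group / other_group: lists of response vectors, one 0/1 answer per item.
def keep_discriminating_items(expert_group, other_group, min_gap=0.3):
    n_items = len(expert_group[0])
    kept = []
    for i in range(n_items):
        p_expert = sum(resp[i] for resp in expert_group) / len(expert_group)
        p_other = sum(resp[i] for resp in other_group) / len(other_group)
        if abs(p_expert - p_other) >= min_gap:  # the two groups answer this item differently
            kept.append(i)
    return kept

# Example: item 0 separates the groups, item 1 does not.
experts = [[1, 0], [1, 1], [1, 0]]
others = [[0, 0], [0, 1], [1, 0]]
print(keep_discriminating_items(experts, others))  # -> [0]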

What Is Reliability?

The reliability of a research instrument concerns the extent to which the instrument yields
the same results on repeated trials. Although unreliability is always present to a certain
extent, there will generally be a good deal of consistency in the results of a quality
instrument gathered at different times. The tendency toward consistency found in repeated
measurements is referred to as reliability (Carmines & Zeller, 1979).

In scientific research, accuracy in measurement is of great importance. Scientific research
normally measures physical attributes which can easily be assigned a precise value. Many
times numerical assessments of the mental attributes of human beings are accepted as
readily as numerical assessments of their physical attributes. Although we may understand
that the values assigned to mental attributes can never be completely precise, the
imprecision is often looked upon as being too small to be of any practical concern. However,
the magnitude of the imprecision is much greater in the measurement of mental attributes
than in that of physical attributes. This fact makes it very important that the researcher in
the social sciences and humanities determine the reliability of the data gathering
instrument to be used (Willmott & Nuttall, 1975).

Test-Retest Method

One of the easiest ways to determine the reliability of empirical measurements is by the
retest method in which the same test is given to the same people after a period of time. The
reliability of the test (instrument) can be estimated by examining the consistency of the
responses between the two tests.

If the researcher obtains the same results on the two administrations of the instrument,
then the reliability coefficient will be 1.00. Normally, the correlation of measurements
across time will be less than perfect due to different experiences and attitudes that
respondents have encountered from the time of the first test.
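
As a simple illustration (the scores below are hypothetical, not from the module), the test-retest estimate is just the correlation between the two administrations:

# Hypothetical test-retest reliability: correlate the two administrations (Python 3.10+).
from statistics import correlation

time1 = [23, 31, 28, 35, 19, 27, 30, 22]  # scores from the first administration
time2 = [25, 30, 27, 36, 20, 25, 31, 24]  # same respondents, second administration

print(f"test-retest reliability = {correlation(time1, time2):.2f}")  # equals 1.00 only when the two sets of results agree perfectly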

The test-retest method is a simple, clear-cut way to determine reliability, but it can be
costly and impractical. Researchers are often only able to obtain measurements at a single
point in time or do not have the resources for multiple administrations.

Alternative Form Method

Like the retest method, this method also requires two testings with the same people.
However, the same test is not given each time. Each of the two tests must be designed to
measure the same thing and should not differ in any systematic way. One way to help
ensure this is to use random procedures to select items for the different tests.
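
One simple way to carry out that random selection is to shuffle the item pool and split it into two forms; a minimal sketch with a hypothetical 20-item pool:

# Hypothetical sketch: build two alternative forms by randomly splitting an item pool.
import random

item_pool = [f"item_{i:02d}" for i in range(1, 21)]  # 20 items written for the same trait
random.shuffle(item_pool)

form_a = sorted(item_pool[:10])  # first alternative form
form_b = sorted(item_pool[10:])  # second alternative form
print(form_a)
print(form_b)

# Reliability is then estimated by correlating respondents' scores on form A and form B.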

The alternative form method is viewed as superior to the retest method because a
respondent's memory of test items is not as likely to play a role in the data received. One
drawback of this method is the practical difficulty in developing test items that are
consistent in the measurement of a specific phenomenon.

Split-Halves Method

This method is more practical in that it does not require two administrations of the same or
an alternative form test. In the split-halves method, the total number of items is divided
into halves, and a correlation taken between the two halves. This correlation only estimates
the reliability of each half of the test. It is necessary then to use a statistical correction to
estimate the reliability of the whole test. This correction is known as the Spearman-Brown
prophecy formula (Carmines & Zeller, 1979):

Pxx" = 2Pxx'/(1 + Pxx')

where Pxx" is the reliability coefficient for the whole test and Pxx' is the split-half
correlation.

Example

If the correlation between the halves is .75, the reliability for the total test is:

Pxx" = [(2) (.75)]/(1 + .75) = 1.5/1.75 = .857

There are many ways to divide the items in an instrument into halves. The most typical
way is to assign the odd numbered items to one half and the even numbered items to the
other half of the test. One drawback of the split-halves method is that the correlation
between the two halves is dependent upon the method used to divide the items.
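
A short Python sketch of the odd/even split with the Spearman-Brown correction follows; the score matrix is hypothetical, and only the .75 check mirrors the worked example above.

# Hypothetical split-halves reliability with the Spearman-Brown correction (Python 3.10+).
from statistics import correlation

def split_half_reliability(scores):
    """scores: one list of item scores per respondent."""
    odd_totals = [sum(resp[0::2]) for resp in scores]   # odd-numbered items
    even_totals = [sum(resp[1::2]) for resp in scores]  # even-numbered items
    r_half = correlation(odd_totals, even_totals)       # reliability of each half
    return (2 * r_half) / (1 + r_half)                   # Spearman-Brown prophecy formula

scores = [[4, 3, 4, 2], [2, 2, 3, 1], [5, 4, 5, 4], [3, 3, 2, 2], [1, 2, 1, 1]]
print(f"whole-test reliability = {split_half_reliability(scores):.3f}")

# Check against the worked example above: a half-test correlation of .75 gives
print(f"{(2 * 0.75) / (1 + 0.75):.3f}")  # 0.857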

Internal Consistency Method
This method requires neither the splitting of items into halves nor the multiple
administration of instruments. The internal consistency method provides a unique estimate
of reliability for the given test administration. The most popular internal consistency
reliability estimate is given by Cronbach's alpha. It is expressed as follows:

alpha = [N/(N-1)] [1 - (ΣSi²/Sx²)]

where N equals the number of items, ΣSi² equals the sum of the item variances, and Sx²
equals the variance of the total composite.
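
A minimal Python sketch of this variance form of alpha, computed from a small hypothetical item-score matrix (sample variances are used throughout):

# Hypothetical sketch: Cronbach's alpha from item variances and the variance of the total score.
from statistics import variance  # sample variance

def cronbach_alpha(scores):
    """scores: one list of item scores per respondent."""
    n_items = len(scores[0])
    item_vars = [variance([resp[i] for resp in scores]) for i in range(n_items)]
    total_var = variance([sum(resp) for resp in scores])  # variance of the composite score
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

scores = [[4, 5, 4, 3], [2, 3, 2, 2], [5, 5, 4, 5], [3, 2, 3, 3], [1, 2, 1, 2]]
print(f"alpha = {cronbach_alpha(scores):.3f}")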

If one is using the correlation matrix rather than the variance-covariance matrix then
alpha reduces to the following:

alpha = Np/[1+p(N-1)]

where N equals the number of items and p equals the mean interitem correlation.

Example

If the average intercorrelation of a six-item scale is .5, then the alpha for the scale would be:

alpha = 6(.5)/[1+.5(6-1)]

= 3/3.5 = .857
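
The correlation-based form can be sketched the same way; the first call below reproduces the six-item example just shown, and the second uses the figures from the ten-item example that follows.

# Hypothetical sketch: alpha from the mean interitem correlation.
def alpha_from_mean_r(n_items, mean_r):
    return (n_items * mean_r) / (1 + mean_r * (n_items - 1))

print(f"{alpha_from_mean_r(6, 0.5):.3f}")    # 6(.5)/[1 + .5(5)] = .857
print(f"{alpha_from_mean_r(10, 0.32):.3f}")  # the ten-item self-esteem example below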

An example of how alpha can be calculated can be given by using the 10-item self-esteem
scale developed by Rosenberg (1965) (see the table below). The 45 interitem correlations are first
summed: .185+.451+.048+ . . . + .233= 14.487. Then the mean interitem correlation is found
by dividing this sum by 45: 14.487/45= .32. Now use this number to calculate alpha:

alpha = 10(.32)/[1+.32(10-1)]

= 3.20/3.88

= .825

The coefficient alpha is an internal consistency index designed for use with tests containing
items that have no right answer. This is a very useful tool in educational and social science
research because instruments in these areas often ask respondents to rate the degree to
which they agree or disagree with a statement on a particular scale.

Table: Example Responses to the 10-Item Self-Esteem Scale (Rosenberg, 1965)

Respondent   Item 1  Item 2  Item 3  Item 4  Item 5  Item 6  Item 7  Item 8  Item 9  Item 10
1              2       2       2       3       4       5       2       1       2        4
2              1       1       2       4       5       5       1       2       2        2
3              1       2       2       5       5       4       1       2       2        1
4              3       2       2       2       1       3       2       2       2        2
5              5       5       5       4       4       3       3       2       3        4
6              1       1       1       1       5       1       1       1       1        1
7              2       2       2       2       2       2       2       2       2        2
8              2       1       2       2       4       1       3       3       1        1
9              5       5       1       1       1       2       1       2       5        4
10             4       3       3       3       1       2       1       1       3        4

N             10      10      10      10      10      10      10      10      10       10
ΣX            26      24      22      27      32      28      17      18      23       25
Mean           2.6     2.4     2.2     2.7     3.2     2.8     1.7     1.8     2.3      2.5
ΣX²           90      78      60      89     130      98      35      36      65       79
Σx²           22.4    20.4    11.6    16.1    27.6    19.6     6.1     3.6    12.1     16.5
S²             2.5     2.3     1.3     1.8     3.1     2.2     .68     .4      1.3      1.8

(Σx² is the sum of squared deviations, Σx² = ΣX² - (ΣX)²/N, and S² = Σx²/(N - 1).)

Interitem correlations (r): .917, .467, .337, .455, .014, -.146, .512, -.06, .74

p = mean interitem correlation = .36

alpha = Np / [1+p(N-1)]

= (10)(.36)/[1+.36(10-1)]

= 3.6/4.24

=.849
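
As a quick check (not part of the original module), the same figure can be reproduced in Python from the interitem correlations listed in the table:

# Check of the calculation above, using the interitem correlations given in the table.
from statistics import mean

interitem_r = [.917, .467, .337, .455, .014, -.146, .512, -.06, .74]
p = mean(interitem_r)                  # about .36
alpha = (10 * p) / (1 + p * (10 - 1))  # ten items
print(f"p = {p:.2f}, alpha = {alpha:.3f}")  # p = 0.36, alpha = 0.849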

Review Questions


1. Name the three types of validity.

2. Name four ways to establish the reliability of an instrument.
