2. A reliability coefficient is
a. an index.
b. a proportion of the total variance attributed to true variance.
c. unaffected by a systematic source of error.
*d. all of the above
Topic: The concept of reliability
5. Why might ability test scores among testtakers most typically vary?
a. because of the true ability of the testtaker
b. because of irrelevant, unwanted influences
c. because an error in the test norms adds five points to everyone’s score
*d. a and b
Topic: The concept of reliability
8. Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or people)
on two different administrations of the same test?
a. parallel-forms
b. split-half
*c. test-retest
d. none of the above
Topic: Test-retest reliability estimates
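Note on item 8: a test-retest estimate is simply the correlation between scores from two administrations of the same test to the same people. The sketch below illustrates the idea; the scores and the helper name `pearson_r` are invented for illustration and do not come from any real data set.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for five testtakers on two administrations of one test.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 12, 16, 17, 21]

# The test-retest estimate is the correlation between the paired scores.
test_retest_r = pearson_r(first_administration, second_administration)
print(round(test_retest_r, 3))
```

Because the rank order of the testtakers barely changes between administrations, the coefficient comes out close to 1.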
9. Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that
measures a trait that is relatively stable over time?
a. parallel-forms
b. alternate-forms
*c. test-retest
d. split-half
Topic: Test-retest reliability estimates
10. An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval
between the test and retest is more than
a. 30 days.
b. 60 days.
c. 3 months.
*d. 6 months.
Topic: Test-retest reliability estimates
12. Which of the following is true for estimates of alternate- and parallel-forms reliability?
a. Two test administrations with the same group are required.
b. Test scores may be affected by factors such as motivation, fatigue, or intervening events like practice,
learning, or therapy.
c. Item sampling is a source of error variance.
*d. all of the above
Topic: Parallel-forms and alternate-forms reliability estimates
14. Which source of error variance affects parallel- or alternate-form reliability estimates but does not affect test-
retest estimates?
a. fatigue
b. learning
c. practice
*d. item sampling
Topic: Parallel-forms and alternate-forms reliability estimates
15. Which of the following types of reliability estimates is the most expensive to obtain?
a. test-retest
*b. parallel-form
c. internal-consistency
d. Spearman’s rho
Topic: Parallel-forms and alternate-forms reliability estimates
16. What term refers to the degree of correlation between all the items on a scale?
a. inter-item homogeneity
*b. inter-item consistency
c. inter-item heterogeneity
d. parallel-form reliability
Topic: Split-half reliability estimates
17. Test-retest estimates of reliability are referred to as measures of ________, and split-half reliability
estimates are referred to as measures of ________.
a. true scores; error scores
b. internal consistency; stability
c. inter-scorer reliability; consistency
*d. stability; internal consistency
18. Which of the following is usually minimized when using split-half estimates of reliability as compared with
test-retest or parallel/alternate-form estimates of reliability?
*a. time and expense
b. reliability and validity
c. reliability only
d. none of the above
Topic: Split-half reliability estimates
19. Which of the following factors may influence a split-half reliability estimate?
a. fatigue
b. anxiety
c. item difficulty
*d. all of the above
Topic: Split-half reliability estimates
22. For a heterogeneous test, measures of internal-consistency reliability will tend to be ________ compared
with other methods of estimating reliability.
a. higher
*b. lower
c. very similar or higher
d. It is impossible to use measures of internal consistency with heterogeneous tests.
Topic: Split-half reliability estimates
23. Typically, adding items to a test will have what effect on the test’s reliability?
a. Reliability will decrease.
*b. Reliability will increase.
c. Reliability will stay the same.
d. b or c
24. Using estimates of internal consistency, which of the following tests would likely yield the highest reliability
coefficients?
a. a test of general intelligence
b. a test of achievement in the basic skill areas of reading, writing, and mathematics
*c. a test of reading comprehension
d. a test of vocational interest
Topic: Split-half reliability estimates
26. If items from a test are measuring the same trait, estimates of reliability yielded from split-half methods will
typically be ________ compared with KR-20.
a. higher
*b. lower
c. similar
d. exactly the same
Topic: Split-half reliability estimates
27. Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method?
a. Randomly assign items to each half of the test.
b. Assign odd-numbered items to one half and even-numbered items to the other half of the test.
c. Assign the first half of the items to one half of the test and the second half of the items to the other half
of the test.
*d. Assign easy items to one half of the test and difficult items to the other half.
Topic: Split-half reliability estimates
28. If items on a test are measuring very different traits, estimates of reliability yielded from split-half methods will
typically be ________ compared with KR-20.
*a. higher
b. lower
c. similar
d. exactly the same
Topic: Split-half reliability estimates
29. KR-20 is the statistic of choice for tests with which types of items?
a. multiple-choice
b. true-false
*c. all of the above
d. none of the above
Topic: Other methods of estimating internal consistency: The Kuder-Richardson formulas
31. Which is NOT an assumption that should be met in order to use KR-21?
a. Items should be dichotomous.
b. Items should be of equal difficulty.
c. Items should be homogeneous.
*d. Items should be scored by the same scorer.
Topic: Other methods of estimating internal consistency: The Kuder-Richardson formulas
32. Which of the following is generally the preferred statistic for obtaining a measure of internal-consistency
reliability?
a. KR-20
b. KR-21
c. split-half
*d. coefficient alpha
Topic: Other methods of estimating internal consistency: Coefficient alpha
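Note on items 29 to 32: the following sketch computes coefficient alpha and KR-20 on the same right/wrong data to show that, for dichotomous items, the two agree when the same (population) variance convention is used. The response matrix and the function names are hypothetical, chosen only for illustration.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Coefficient alpha; rows are testtakers, columns are items."""
    k = len(item_scores[0])
    sum_item_var = sum(pvariance(col) for col in zip(*item_scores))
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items, using p * q as each item's variance."""
    n = len(item_scores)
    k = len(item_scores[0])
    sum_pq = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n          # proportion passing the item
        sum_pq += p * (1 - p)     # q = 1 - p
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Hypothetical right/wrong responses: 5 testtakers by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = coefficient_alpha(responses)
kr = kr20(responses)
print(round(alpha, 3), round(kr, 3))  # 0.8 0.8
```

Unlike `kr20`, `coefficient_alpha` also accepts items scored along a continuum (for example, partial credit), because it uses each item's variance rather than p * q.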
33. Coefficient alpha is appropriate to use with all of the following test formats EXCEPT
a. multiple-choice
b. true-false
c. short-answer for which partial credit is awarded
*d. essay exam with no partial credit awarded
Topic: Other methods of estimating internal consistency: Coefficient alpha
39. Which best conveys the meaning of an inter-scorer reliability estimate of .90?
a. Ninety percent of the scores obtained are reliable.
*b. Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences
and 10% to error.
c. Ten percent of the variance in the scores assigned by the scorers was attributed to true differences and
90% to error.
d. The test is stable.
Topic: Measures of inter-scorer reliability
40. When more than two scorers are used to determine inter-scorer reliability, the statistic of choice is
a. Pearson r.
b. Spearman’s rho.
c. KR-20.
*d. coefficient alpha.
Topic: Measures of inter-scorer reliability
41. For determining the reliability of tests scored using nominal scales of measurement, the statistic of choice is
a. Kendall’s Tau.
*b. the Kappa statistic.
c. KR-20.
d. coefficient alpha.
Topic: Measures of inter-scorer reliability
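Note on item 41: the kappa statistic corrects the observed agreement between two raters for the agreement expected by chance. The sketch below implements Cohen's kappa for two raters; the ratings and the function name are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories."""
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance from each rater's category proportions.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no classifications of ten cases by two raters.
rater_a = ["yes", "yes", "yes", "no", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "yes", "no", "no", "no", "no", "yes", "no", "yes", "yes"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))  # 0.6
```

Here the raters agree on 80% of cases, but chance alone would produce 50% agreement, so kappa is (.80 - .50) / (1 - .50) = .60.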
43. Which type(s) of reliability estimates would be most appropriate for a measure of heart rate?
*a. test-retest
b. alternate-form
c. inter-judge
d. all of the above
Topic: The nature of the test: Dynamic versus static characteristics
44. If a time limit is long enough to allow testtakers to attempt all items, and if some items are so difficult that no
testtaker is able to obtain a perfect score, then the test is referred to as a ________ test.
a. speed
*b. power
c. reliable
d. valid
Topic: The nature of the test: Speed versus power tests
47. Which of the following would result in the LEAST appropriate estimate of reliability for a speed test?
a. test-retest
b. alternate-form
*c. split-half from a single administration of the test
d. split-half from two independent testing sessions
Topic: The nature of the test: Speed versus power tests
48. A KR or split-half estimate of reliability for a speed test would provide an estimate that is
a. spuriously low.
*b. spuriously high.
c. insignificant.
d. equal to a test-retest method.
Topic: The nature of the test: Speed versus power tests
49. A test manual for a test of clerical speed (alphabetizing cards) cites a split-half reliability coefficient (for a
single administration of the test) of .95. What might you conclude?
a. The test is highly reliable.
b. The published reliability estimate is spuriously high.
c. The split-half estimate should not have been used in this instance.
*d. b and c
Topic: The nature of the test: Speed versus power tests
50. The Spearman-Brown formula can be used for which types of tests?
a. speed and multiple-choice
b. true-false and multiple-choice
*c. speed, true-false, and multiple-choice
d. none of the above
Topic: The nature of the test: Speed versus power tests
54. Traditional measures of reliability are inappropriate for criterion-referenced tests because
a. variability is maximized with criterion-referenced tests.
*b. variability is minimized with criterion-referenced tests.
c. variability cannot be determined by criterion-referenced tests.
d. criterion-referenced tests are generally speed tests.
Topic: The nature of the test: Criterion-referenced tests
55. If traditional measures of reliability are applied to criterion-referenced tests, the reliability estimates will likely
be
a. spuriously low.
*b. spuriously high.
c. reliable.
d. valid.
Topic: The nature of the test: Criterion-referenced tests
56. The fact that the length of a test influences the size of the reliability coefficient is based on which theory of
measurement?
*a. classical measurement theory
b. generalizability theory
c. domain sampling theory
d. Darwin’s theory of survival of the fittest
Topic: Alternatives to the true score model
57. Which estimate of reliability is most consistent with the domain sampling theory?
a. test-retest
b. alternate-form
*c. internal-consistency
d. inter-scorer
Topic: Alternatives to the true score model
58. Classical reliability theory estimates the portion of a test score that is attributed to ________, and domain
sampling theory estimates ________.
a. specific sources of variation; error
*b. error; specific sources of variation
c. the skills being measured; variation
d. the skills being measured; content knowledge
Topic: Alternatives to the true score model
59. Item Response Theory (IRT) or Latent Trait Theory focuses on which of the following?
a. circumstances under which the test was developed
b. how the test was administered
*c. the individual test items
d. all of the above
Topic: Alternatives to the true score model
61. The standard deviation of a theoretically normal distribution of test scores obtained by one person on
equivalent tests is
a. the standard error of the difference between means.
*b. the standard error of measurement.
c. the standard deviation of the reliability coefficient.
d. the variance.
Topic: The standard error of measurement
62. Which of the following is NOT a part of the formula for the standard error of measurement for a particular
test?
*a. the validity of the test
b. the reliability of the test
c. the standard deviation of the group of test scores
d. b and c
Topic: The standard error of measurement
63. “Sixty-eight percent of the scores for a particular test fall between 58 and 61” is a statement regarding
*a. a confidence interval.
b. the reliability of a test.
c. the validity of a test.
d. all of the above
Topic: The standard error of measurement
64. The standard error of measurement of a particular test of anxiety is 8. A student earns a score of 60. What is
the confidence interval for this test score at the 95% level?
a. 52–68
b. 40–68
*c. 44–76
d. 36–84
Topic: The standard error of measurement
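Note on item 64: the keyed interval follows from adding and subtracting two standard errors of measurement (SEM) around the obtained score, the usual rounding of 1.96 for the 95% level. The sketch below reproduces that arithmetic and also shows the SEM formula, SD times the square root of (1 minus reliability). The function names and the second example (SD of 15, reliability of .91) are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

def confidence_interval(score, sem, n_sems):
    """Band of n_sems standard errors of measurement around an obtained score."""
    return (score - n_sems * sem, score + n_sems * sem)

# Item 64: SEM = 8, obtained score = 60, roughly 95% level -> plus or minus 2 SEM.
low, high = confidence_interval(60, 8, 2)
print(low, high)  # 44 76

# Hypothetical test: SD = 15, reliability = .91, so SEM is about 4.5.
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))  # 4.5
```

The same function with `n_sems=1` gives the roughly 68% band used in other items in this set.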
65. As the confidence level increases, the range of scores within which a single test score is likely to fall
a. decreases.
*b. increases.
c. remains the same.
d. first increases, then decreases.
Topic: The standard error of measurement
67. If the standard deviations of two tests are identical but the reliability is lower for Test A as compared to Test
B, the standard error of measurement will be ________ for Test A as compared with Test B.
*a. higher
b. lower
c. the same
d. b or c
Topic: The standard error of measurement
68. Which statistic can help the test user determine how large a difference must exist for scores yielded from two
different tests to be considered statistically different?
a. standard error of measurement between two scores
*b. standard error of the difference between two scores
c. observed variance minus error variance
d. standard error of the difference between two means
Topic: The standard error of the difference between two scores
69. The standard error of the difference between two scores is larger than the standard error of measurement for
either score because the standard error of the difference between the two scores is affected by
a. the true score variance of each score.
b. the standard deviation of each score.
*c. the measurement error inherent in both scores.
d. all of the above
Topic: The standard error of the difference between two scores
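Note on item 69: the standard error of the difference combines the measurement error of both scores, which is why it exceeds either standard error of measurement. The sketch below assumes the two tests are reported on the same scale with a shared standard deviation; the numbers (SD of 10, reliabilities of .90 and .85) and function names are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

def standard_error_of_difference(sd, reliability_1, reliability_2):
    """SE of the difference between two scores sharing the same SD.

    Equivalent to sqrt(SEM_1 ** 2 + SEM_2 ** 2).
    """
    return sd * sqrt(2 - reliability_1 - reliability_2)

# Hypothetical: both tests have SD = 10; reliabilities are .90 and .85.
sem_1 = standard_error_of_measurement(10, 0.90)
sem_2 = standard_error_of_measurement(10, 0.85)
se_diff = standard_error_of_difference(10, 0.90, 0.85)

print(round(sem_1, 2), round(sem_2, 2), round(se_diff, 2))  # 3.16 3.87 5.0
```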
70. If you were a school psychologist and you wanted to determine if the student you were evaluating scored
higher on a mathematics test than on a reading test, what statistic(s) would you be interested in computing?
a. the standard error of measurement for each test score
*b. the standard error of the difference between two scores
c. the raw score on each test as well as the mean of each distribution
d. all of the above
Topic: The standard error of the difference between two scores
71. The ________ in generalizability theory is analogous to the reliability coefficient in classical test theory.
a. universe coefficient
*b. coefficient of generalizability
c. universe score
d. Roulin coefficient
Topic: Alternatives to the true score model
74. In classical test theory, there exists only one true score. In Cronbach et al.’s generalizability theory, how
many of these true scores would exist?
a. one
b. as many as the number of times the test is administered to the individual
*c. many, depending on the number of different universes
d. b and c
Topic: Alternatives to the true score model
76. The Bayley Scales of Infant Development, Second Edition (BSID-II), contains Mental, Motor, and Behavior
Rating Scales. Because these three scales are designed to measure different characteristics (that is, they
are not homogeneous), it is inappropriate to combine the three scales in computing estimates of
a. alternate-forms reliability.
*b. internal-consistency reliability.
c. test-retest reliability.
d. inter-rater reliability.
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
77. Which of the following is NOT true of the Bayley Scales of Infant Development, Second Edition (BSID-II)?
a. It takes between 30 and 60 minutes to administer.
*b. It is designed for children between 1 and 18 months of age.
c. It contains Mental, Motor, and Behavior Rating Scales.
d. It is used to identify children who are behind developmentally.
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
78. The fact that young children develop rapidly and in “growth spurts” is a problem primarily in estimating which
type of reliability for the Bayley Scales?
a. internal-consistency reliability
b. alternate-forms reliability
*c. test-retest reliability
d. inter-rater reliability
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
79. Which of the following cannot be assessed for the Bayley Scales of Infant Development, Second Edition
(BSID-II)?
a. internal-consistency reliability
*b. alternate-forms reliability
c. test-retest reliability
d. inter-rater reliability
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
81. If a blood pressure machine consistently overestimated everyone’s blood pressure by 10 units, which of the
following would be true of the reliability of the blood pressure machine?
a. It would increase.
b. It would decrease.
*c. It would not be affected.
d. It cannot be determined, since it would depend on the actual blood pressure of the individuals.
Topic: The concept of reliability
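Note on item 81: a constant (systematic) error shifts every score by the same amount, so it leaves the correlation between two sets of readings unchanged. The sketch below checks this with made-up blood pressure readings; the data and the helper name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical readings for five people, taken twice with an accurate machine.
first_reading = [110, 120, 125, 130, 145]
second_reading = [112, 118, 127, 131, 143]

# The same readings from a machine that overestimates everyone by 10 units.
biased_first = [value + 10 for value in first_reading]
biased_second = [value + 10 for value in second_reading]

r_accurate = pearson_r(first_reading, second_reading)
r_biased = pearson_r(biased_first, biased_second)
print(round(r_accurate, 4) == round(r_biased, 4))  # True
```

The biased machine is just as consistent as the accurate one; what the constant error damages is the accuracy of the readings, not their reliability.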
82. In general, which of the following is true of the relationship between the magnitude of the test-retest reliability
estimate and the length of the interval between test administrations?
*a. The longer the interval, the lower the reliability coefficient.
b. The longer the interval, the higher the reliability coefficient.
c. Whether the interval is long or short, the magnitude of the reliability coefficient is not affected.
d. none of the above
Topic: Types of reliability estimates: Test-retest reliability estimates
83. What is the difference between alternate forms and parallel forms of a test?
*a. Alternate forms do not necessarily yield test scores with equal means and variances.
b. Alternate forms are designed to be equivalent in level of difficulty.
c. Alternate forms are different only in respect to how they are administered.
d. There are no differences between alternate and parallel forms of a test.
Topic: Types of reliability estimates: Parallel-forms and alternate-forms reliability estimates
84. Which of the following represents the major difference between estimates of alternate- or parallel-forms
reliability and test-retest reliability?
a. Two test administrations with the same group are required.
b. Test scores may be affected by factors such as the testtaker’s motivation or fatigue.
c. Test scores may be affected by practice or learning between administrations.
*d. Error variance from item sampling may result.
Topic: Types of reliability estimates: Parallel-forms and alternate-forms reliability estimates: Test-retest reliability
estimates
85. For which type of test is coefficient alpha the reliability estimate of choice?
a. tests with dichotomous items and binary scoring
b. tests with homogeneous items
*c. tests that can be scored along a continuum of values
d. tests with heterogeneous items and binary scoring
Topic: Other methods of estimating internal consistency: Coefficient alpha
86. In which type(s) of reliability estimates would test construction not be a significant source of error variance?
*a. test-retest
b. alternate-form
c. split-half
d. It would be a significant source in all of the above types.
Topic: The purpose of the reliability coefficient
87. What impact will it have on the magnitude of the coefficient of reliability if the variance of either variable is
restricted by the sampling procedures used?
*a. It is lowered.
b. It is raised.
c. It is unaffected.
d. It is different depending on the type of test (multiple-choice, true-false, etc.).
Topic: The nature of the test: Restriction or inflation of range
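Note on item 87: restricting the range of one variable tends to lower the correlation coefficient. The sketch below illustrates this with a small artificial data set in which the same amount of noise is present throughout, first using all cases and then only the upper half of the range. All values and names are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Artificial scores: y tracks x, plus an alternating error of +2 or -2.
x = list(range(1, 11))
noise = [2, -2] * 5
y = [a + e for a, e in zip(x, noise)]

full_r = pearson_r(x, y)

# Restrict the sample to cases with x of 6 or more (the upper half of the range).
kept = [(a, b) for a, b in zip(x, y) if a >= 6]
restricted_r = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(restricted_r < full_r)  # True
```

The error is the same size in both samples, but the restricted sample has less true spread for that error to be compared against, so the coefficient drops.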
88. For criterion-referenced tests, which of the following reliability estimates is (are) recommended?
a. test-retest
b. alternate-form
c. split-half
*d. none of the above
Topic: The nature of the test: Criterion-referenced tests
90. If a student received a score of 50 on a math test with a standard error of measurement of 3, which of the
following statements would be true of his or her performance?
a. In 68% of the cases, the “true score” would be expected to be between 44 and 56.
*b. In 68% of the cases, the “true score” would be expected to be between 47 and 53.
c. In 95% of the cases, the “true score” would be expected to be between 47 and 53.
d. In 95% of the cases, the “true score” would be expected to be between 44 and 56.
Topic: The standard error of measurement
91. A psychologist administers a test and the testtaker scores a 52. If the cut-off score for eligibility for a
particular program is 50, what index will best help the psychologist determine how much confidence to place
in the testtaker’s obtained score of 52?
a. the standard error of difference
*b. the standard error of measurement
c. measures of central tendency: mean, median, or mode
d. measures of variability such as the standard deviation
Topic: The standard error of measurement
92. Which of the following is true of both the standard error of measurement and the standard error of
difference?
a. Both provide confidence levels.
b. Both can be used to compute confidence intervals for short answer tests.
*c. Both can be used to compare performance between two different tests.
d. none of the above
Topic: The standard error of measurement; The standard error of the difference between two scores
93. Because of the unique problems in assessing infants, which of the following is recommended when
estimating the reliability of the Bayley Scales of Infant Development, Second Edition (BSID-II)?
*a. Use relatively short test-retest intervals.
b. Use relatively long test-retest intervals.
c. Do not use the test-retest method of estimating reliability.
d. The length of the test-retest interval is not critical when calculating the test-retest reliability of the
Bayley.
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
94. Which of the following is true for the internal-consistency reliability estimates on the Bayley Scales of Infant
Development, Second Edition (BSID-II)?
*a. They are calculated separately for the Mental, Motor, and Behavior Scales because each scale is
assumed to measure a homogeneous set of abilities.
b. One reliability estimate is calculated across the Mental, Motor, and Behavior Scales because each
scale does not measure distinct abilities.
c. They are not calculated, because infants’ abilities are likely to change during the testing session.
d. Estimates are provided across the Mental, Motor, and Behavior Scales and for each separate scale.
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
95. According to the chapter Close-Up, directions to examiners on the Bayley Scales of Infant Development,
Second Edition (BSID-II), such as “Give credit if the child holds his hands open most of the time” could result
in problems when calculating which reliability estimates?
a. test-retest
b. alternate-form
*c. inter-rater
d. all of the above
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
97. A police officer mistakenly records the blood alcohol level of a suspected drunk driver after administering a
breathalyzer test. This mistake is most related to which type of reliability?
a. test-retest
*b. inter-scorer
c. internal-consistency
d. situational
Topic: Everyday Psychometrics: The Reliability Defense and the Breathalyzer Test
98. Which of the following represents the difference between a power test and a speed test?
a. Power tests involve physical strength; speed tests do not.
*b. In a power test, the testtaker has time to complete all items; in a speed test, a specific time limit is
imposed.
c. In a power test, a broad range of knowledge is assessed; in a speed test, a narrower range of
knowledge is assessed.
d. b and c
Topic: The nature of the test: Speed versus power tests
99. The index that allows a test user to compare two people’s scores on a specific test to determine if the true
scores are likely to be different is
a. the standard error of the mean.
*b. the standard error of the difference.
c. the standard deviation.
d. the correlation coefficient.
Topic: The standard error of the difference between two scores
102. McAfee’s test of attention span has a reliability coefficient of .84. The average score on the test is 10, with a
standard deviation of 5. Lawrence received a score of 64 on the test. We can be 95% sure that Lawrence’s
true attention span score falls between
a. 63 and 65.
b. 62 and 66.
*c. 60 and 68.
d. 54 and 74.
Topic: The standard error of measurement
104. The Spearman-Brown adjustment to the Pearson r is used in all of the following situations except
a. when split-half reliability estimates are computed.
b. in estimates of the effect of reducing the size of a test.
c. in estimates of the effect of increasing the size of a test.
*d. when computing reliability coefficients for rank-ordered data.
Topic: Types of reliability estimates: The Spearman-Brown formula
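Note on item 104: the Spearman-Brown formula estimates reliability after a test is lengthened or shortened by a factor n: r_new = n * r / (1 + (n - 1) * r). The sketch below applies it to a split-half correlation and to a lengthened and a shortened test. The starting values (.70 and .90) and the function name are hypothetical.

```python
def spearman_brown(r, length_factor):
    """Estimated reliability after changing test length.

    r is the reliability of the existing test (or the correlation between
    halves); length_factor is new length divided by old length.
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Split-half use: step a half-test correlation of .70 up to full length.
full_length = spearman_brown(0.70, 2)

# Hypothetical 100-item test with reliability .90, cut to 80 items
# or extended to 120 items of comparable quality.
shortened = spearman_brown(0.90, 80 / 100)
lengthened = spearman_brown(0.90, 120 / 100)

print(round(full_length, 3), round(shortened, 3), round(lengthened, 3))
```

Under the formula's assumption that added or removed items are comparable to the rest, shortening lowers the estimate and lengthening raises it.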
105. What type of reliability estimate is involved if you compare children’s performance on Form A and Form B of
the Peabody Picture Vocabulary Test—Revised?
a. test-retest
*b. alternate-forms
c. inter-rater
d. internal-consistency
Topic: Types of reliability estimates: Parallel-forms and alternate-forms reliability estimates
106. What index of reliability would you use to compare two evaluators’ assessments of a group of job applicants?
a. KR-20
b. coefficient alpha
*c. the Kappa statistic
d. the Spearman-Brown correction
Topic: Types of reliability estimates
109. A test containing 100 items is revised by deleting 20 items. What might be expected to happen to the
magnitude of the reliability estimate for that test?
a. It will increase.
*b. It will decrease.
c. It will stay the same.
d. It cannot be determined based on the information provided.
Topic: The concept of reliability: Types of reliability estimates
111. The greater the proportion of the total variance attributed to true variance, the more ____________ the test.
a. scientific
b. variable
*c. reliable
d. all of the above
Topic: The concept of reliability
113. A score earned by a testtaker on a psychological test is equal to which of the following?
a. the observed score
b. error
c. the true score
*d. the true score plus error
Topic: The concept of reliability
116. Which of the following is TRUE about systematic and unsystematic error in the assessment of physical and
psychological abuse in a relationship?
a. Few sources of unsystematic error exist, due to the nature of what is being assessed.
b. Few sources of systematic error exist.
*c. Gender represents a source of systematic error.
d. Agreement exists on current methods for estimating true versus error variance.
Topic: Other sources of error
117. Approximately what percentage of scores would be expected to fall within two standard errors of
measurement above or below the true score?
a. 85%
b. 90%
*c. 95%
d. 99%
Topic: The standard error of measurement
118. Manuel earns a 90 on a standardized math test. The standard error of measurement for this test is 5. We
would be confident that 95% of the scores fall between _____________________.
a. 85 and 95
b. 80 and 100
*c. 80 and 100
d. Cannot determine based on the information provided.
Topic: The standard error of measurement