5 - Reliability

Reliability
1. Reliability, in a broad statistical sense, is synonymous with

a. consistently good.
b. consistently bad.
*c. consistency.
d. validity.
Topic: The concept of reliability
2. A reliability coefficient is
a. an index.
b. a proportion of the total variance attributed to true variance.
c. unaffected by a systematic source of error.
*d. all of the above
3. Which of the following is true of systematic error?

a. It significantly lowers the reliability of a measure.
b. It insignificantly lowers the reliability of a measure.
c. It increases the reliability of a measure.
*d. It has no effect on the reliability of a measure.
4. As the degree of reliability increases, the proportion of

a. total variance attributed to true variance decreases.
*b. total variance attributed to true variance increases.
c. total variance attributed to error variance increases.
d. none of the above
5. Why might ability test scores among testtakers most typically vary?
a. because of the true ability of the testtaker
b. because of irrelevant, unwanted influences
c. because an error in the test norms adds five points to everyone’s score
*d. a and b
6. A source of error variance may take the form of

a. item sampling.
b. testtakers’ reactions to environment-related variables such as room temperature and lighting.
c. testtaker variables such as amount of sleep the night before a test, amount of anxiety, or drug effects.
*d. all of the above
Topic: Sources of error variance
7. Computer-scorable items have virtually eliminated error variance due to

a. item sampling.
*b. scorer differences.
c. content sampling.
d. testtakers’ reactions to environmental variables.
8. Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or people)
on two different administrations of the same test?
a. parallel-forms
b. split-half
*c. test-retest
Topic: Test-retest reliability estimates
9. Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that
measures a trait that is relatively stable over time?
a. parallel-forms
b. alternate-forms
*c. test-retest
d. split-half
10. An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval
between the test and retest is more than
a. 30 days.
b. 60 days.
c. 3 months.
*d. 6 months.
11. Which of the following might lead to a decrease in test-retest reliability?

a. the passage of time between the two administrations of the test
b. coaching designed to increase test scores
c. practice with similar test materials
*d. all of the above
Topic: Test-retest and alternate-forms reliability estimates
12. Which of the following is true for estimates of alternate- and parallel-forms reliability?
a. Two test administrations with the same group are required.
b. Test scores may be affected by factors such as motivation, fatigue, or intervening events like practice,
learning, or therapy.
c. Item sampling is a source of error variance.
*d. all of the above
Topic: Parallel-forms and alternate-forms reliability estimates
13. Which of the following is true for parallel forms of a test?

a. The means of the observed scores are equal for the two forms.
b. The variances of the observed scores are equal for the two forms.
*c. The means and variances of the observed scores are equal for the two forms.
14. Which source of error variance affects parallel- or alternate-form reliability estimates but does not affect
test-retest estimates?
a. fatigue
b. learning
c. practice
*d. item sampling
15. Which of the following types of reliability estimates is the most expensive to obtain?
a. test-retest
*b. parallel-form
c. internal-consistency
d. Spearman’s rho
16. What term refers to the degree of correlation between all the items on a scale?
a. inter-item homogeneity
*b. inter-item consistency
c. inter-item heterogeneity
d. parallel-form reliability
Topic: Split-half reliability estimates
17. Test-retest estimates of reliability are referred to as measures of ________, and split-half reliability
estimates are referred to as measures of ________.
a. true scores; error scores
b. internal consistency; stability
c. inter-scorer reliability; consistency
*d. stability; internal consistency
Topic: Test-retest reliability estimates: Split-half reliability estimates
18. Which of the following is usually minimized when using split-half estimates of reliability as compared with
test-retest or parallel/alternate-form estimates of reliability?
*a. time and expense
b. reliability and validity
c. reliability only
19. Which of the following factors may influence a split-half reliability estimate?
a. fatigue
b. anxiety
c. item difficulty
*d. all of the above
20. Internal-consistency estimates of reliability are inappropriate for

a. reading achievement tests.
b. scholastic aptitude/intelligence tests.
*c. typing tests based on speed.
d. tests purporting to measure a single personality trait.
21. The Spearman-Brown formula is used for

a. correcting for one half of the test by estimating the reliability of the whole test.
b. determining how many additional items are needed to increase reliability up to a certain level.
c. determining how many items can be eliminated without reducing reliability below a predetermined level.
*d. all of the above
22. For a heterogeneous test, measures of internal-consistency reliability will tend to be ________ compared
with other methods of estimating reliability.
a. higher
*b. lower
c. very similar or higher
d. It is impossible to use measures of internal consistency with heterogeneous tests.
23. Typically, adding items to a test will have what effect on the test’s reliability?
a. Reliability will decrease.
*b. Reliability will increase.
c. Reliability will stay the same.
d. b or c
24. Using estimates of internal consistency, which of the following tests would likely yield the highest reliability
coefficients?
a. a test of general intelligence
b. a test of achievement in the basic skill areas of reading, writing, and mathematics
*c. a test of reading comprehension
d. a test of vocational interest
25. Error variance for measures of inter-item consistency comes from

a. fatigue.
b. motivation.
c. a testtaker practice effect.
*d. heterogeneity of the content.
26. If items from a test are measuring the same trait, estimates of reliability yielded from split-half methods will
typically be ________ compared with KR-20.
a. higher
*b. lower
c. similar
d. exactly the same
27. Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method?
a. Randomly assign items to each half of the test.
b. Assign odd-numbered items to one half and even-numbered items to the other half of the test.
c. Assign the first-half of the items to one half of the test and the second half of the items to the other half
of the test.
*d. Assign easy items to one half of the test and difficult items to the other half.
28. If items on a test are measuring very different traits, estimates of reliability yielded from split-half methods will
typically be ________ compared with KR-20.
*a. higher
b. lower
c. similar
d. exactly the same
29. KR-20 is the statistic of choice for tests with which types of items?
a. multiple-choice
b. true-false
*c. all of the above
Topic: Other methods of estimating internal consistency: The Kuder-Richardson formulas
30. The KR-21 reliability estimate was developed

a. to yield higher reliability coefficients.
*b. to facilitate computation by hand.
c. for use with less homogeneous items.
d. a and b
Topic: Other methods of estimating internal consistency: The Kuder-Richardson formulas
31. Which is NOT an assumption that should be met in order to use KR-21?
a. Items should be dichotomous.
b. Items should be of equal difficulty.
c. Items should be homogeneous.
*d. Items should be scored by the same scorer.
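
The Kuder-Richardson formulas referred to in the items above can be sketched in Python. This is a minimal illustration, not a definitive implementation. It assumes the usual textbook forms, KR-20 = (k/(k-1))(1 - sum(pq)/variance) and KR-21 = (k/(k-1))(1 - M(k-M)/(k * variance)), computed with the population variance of total scores. The function names and the response matrix are hypothetical.

```python
from statistics import mean, pvariance

def kr20(responses):
    """KR-20 for dichotomous (0/1) items; rows are testtakers, columns are items."""
    k = len(responses[0])
    total_var = pvariance([sum(row) for row in responses])
    sum_pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in responses)  # proportion passing item j
        sum_pq += p * (1 - p)                  # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / total_var)

def kr21(responses):
    """KR-21 shortcut: needs only the mean and variance of total scores."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    m = mean(totals)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * pvariance(totals)))

# Hypothetical data: 5 testtakers, 4 items of unequal difficulty.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
print(round(kr21(responses), 2))  # 0.67
```

Because the items in this made-up data set differ in difficulty, the equal-difficulty assumption behind KR-21 is violated and its estimate comes out lower than KR-20.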
32. Which of the following is generally the preferred statistic for obtaining a measure of internal-consistency
reliability?
a. KR-20
b. KR-21
c. split-half
*d. coefficient alpha
Topic: Other methods of estimating internal consistency: Coefficient alpha
33. Coefficient alpha is appropriate to use with all of the following test formats EXCEPT
a. multiple-choice
b. true-false
c. short-answer for which partial credit is awarded
*d. essay exam with no partial credit awarded
34. The “20” and “21” in KR-20 and KR-21 represent

a. numbers held constant in the denominator.
b. numbers held constant in the numerator.
*c. the order in which the formulas were created.
35. Coefficient alpha is an expression of

a. the mean of split-half correlations between odd- and even-numbered items.
b. the mean of split-half correlations between first- and second-half items.
*c. the mean of all possible split-half correlations.
d. the mean of the best possible split-half correlations.
36. A coefficient alpha over .9 may indicate that

a. the items in the test are too dissimilar.
b. the test is not reliable.
*c. the items in the test are redundant.
d. the test is biased against low-ability individuals.
37. Which of the following is true about the coefficient alpha statistic?
a. It is always the best measure of reliability.
b. It measures bias against minority groups.
c. It measures test-retest reliability
*d. It is a characteristic of a particular set of scores, not of the test itself.
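
A short Python sketch may help make coefficient alpha concrete. It assumes the common computational form alpha = (k/(k-1))(1 - sum of item variances / variance of total scores); the function name and the partial-credit ratings are hypothetical.

```python
from statistics import variance

def coefficient_alpha(scores):
    """Coefficient alpha; rows are testtakers, columns are items (any numeric scoring)."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 4 testtakers, 3 items scored 1-5 (not dichotomous).
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(coefficient_alpha(ratings), 3))  # 0.944
```

The function takes a matrix of obtained scores, not a test, which mirrors the point that alpha describes a particular set of scores.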
38. Which is a synonym for inter-scorer reliability?

a. inter-judge reliability
b. observer reliability
c. inter-rater reliability
*d. all of the above
Topic: Measures of inter-scorer reliability
39. Which best conveys the meaning of an inter-scorer reliability estimate of .90?
a. Ninety percent of the scores obtained are reliable.
*b. Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences
and 10% to error.
c. Ten percent of the variance in the scores assigned by the scorers was attributed to true differences and
90% to error.
d. The test is stable.
40. When more than two scorers are used to determine inter-scorer reliability, the statistic of choice is
a. Pearson r.
b. Spearman’s rho.
c. KR-20.
*d. coefficient alpha.
41. For determining the reliability of tests scored using nominal scales of measurement, the statistic of choice is
a. Kendall’s Tau.
*b. the Kappa statistic.
c. KR-20.
d. coefficient alpha.
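
For nominal ratings from two scorers, a kappa statistic can be sketched as follows. The example assumes Cohen's kappa for exactly two raters, kappa = (observed agreement - chance agreement) / (1 - chance agreement); the function name and the pass/fail ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "fail"]
rater_b = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail", "fail", "fail"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

Here the raters agree on 8 of 10 cases (.80), but half of that agreement would be expected by chance, so kappa is lower than the raw agreement rate.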
42. If a test is homogeneous

a. it is functionally uniform throughout.
b. it will likely yield a high internal-consistency reliability estimate compared with test-retest.
c. it would be reasonable to expect a high degree of internal consistency.
*d. all of the above
Topic: The nature of the test: Homogeneity versus heterogeneity of test items
43. Which type(s) of reliability estimates would be most appropriate for a measure of heart rate?
*a. test-retest
b. alternate-form
c. inter-judge
d. all of the above
Topic: The nature of the test: Dynamic versus static characteristics
44. If a time limit is long enough to allow testtakers to attempt all items, and if some items are so difficult that no
testtaker is able to obtain a perfect score, then the test is referred to as a ________ test.
a. speed
*b. power
c. reliable
d. valid
Topic: The nature of the test: Speed versus power tests
45. Typically, speed tests

*a. contain items of a uniform difficulty level.
b. are completed by fewer than 1% of all testtakers.
c. have low validity coefficients.
d. yield high rates of false positives.
46. Which type(s) of reliability estimates would be appropriate for a speed test?
a. test-retest
b. alternate-form
c. split-half from two independent testing sessions
*d. all of the above
47. Which of the following would result in the LEAST appropriate estimate of reliability for a speed test?
a. test-retest
b. alternate-form
*c. split-half from a single administration of the test
d. split-half from two independent testing sessions
48. A KR or split-half estimate of reliability for a speed test would provide an estimate that is
a. spuriously low.
*b. spuriously high.
c. insignificant.
d. equal to a test-retest method.
49. A test manual for a test of clerical speed (alphabetizing cards) cites a split-half reliability coefficient (for a
single administration of the test) of .95. What might you conclude?
a. The test is highly reliable.
b. The published reliability estimate is spuriously high.
c. The split-half estimate should not have been used in this instance.
*d. b and c
50. The Spearman-Brown formula can be used for which types of tests?
a. speed and multiple-choice
b. true-false and multiple-choice
*c. speed, true-false, and multiple-choice
51. An estimate of the reliability of a speed test is a measure of

a. the stability of the test.
*b. the consistency of the response speed.
c. the homogeneity of the test items.
d. all of the above
52. Use of the Spearman-Brown formula would be inappropriate

a. to estimate the effect on reliability of shortening a test.
b. to determine the number of items needed in a test to obtain the desired level of reliability.
*c. to estimate the internal consistency of a speed test.
d. All of the above represent appropriate uses of the Spearman-Brown formula.
Topic: Split-half reliability estimates: The Spearman-Brown formula
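
The uses of the Spearman-Brown formula named in the items above (stepping a half-test correlation up to full length, and estimating the effect of lengthening or shortening a test) can be sketched in Python. The sketch assumes the general form r_new = n * r / (1 + (n - 1) * r), where n is the ratio of new length to original length; the function names and numbers are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Estimated reliability after changing test length by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

def length_factor_needed(reliability, target):
    """How many times longer the test must be to reach the target reliability."""
    return (target * (1 - reliability)) / (reliability * (1 - target))

# Half-test correlation of .60 stepped up to the whole test (n = 2).
print(round(spearman_brown(0.60, 2), 2))           # 0.75
# Shortening a test to 80% of its length lowers the estimate.
print(round(spearman_brown(0.90, 0.8), 3))         # 0.878
# A test with reliability .75 must be about 3 times longer to reach .90.
print(round(length_factor_needed(0.75, 0.90), 2))  # 3.0
```

The formula assumes the added or removed items are equivalent to the existing ones, which is why it does not rescue a split-half estimate taken from a single administration of a speed test.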
53. Interpretations of criterion-referenced tests are typically made with respect to

a. the total number of items the examinee responded to.
*b. the material the examinee evidenced mastery of.
c. a comparison of the examinee’s performance with that of others who took the test.
d. the total number of items the examinee passed or failed.
Topic: The nature of the test: Criterion-referenced tests
54. Traditional measures of reliability are inappropriate for criterion-referenced tests because
a. variability is maximized with criterion-referenced tests.
*b. variability is minimized with criterion-referenced tests.
c. variability cannot be determined by criterion-referenced tests.
d. criterion-referenced tests are generally speed tests.
55. If traditional measures of reliability are applied to criterion-referenced tests, the reliability estimates will likely
be
a. spuriously low.
*b. spuriously high.
c. reliable.
d. valid.
56. The fact that the length of a test influences the size of the reliability coefficient is based on which theory of
measurement?
*a. classical measurement theory
b. generalizability theory
c. domain sampling theory
d. Darwin’s theory of survival of the fittest
Topic: Alternatives to the true score model
57. Which estimate of reliability is most consistent with the domain sampling theory?
a. test-retest
b. alternate-form
*c. internal-consistency
d. inter-scorer
58. Classical reliability theory estimates the portion of a test score that is attributed to ________, and domain
sampling theory estimates ________.
a. specific sources of variation; error
*b. error; specific sources of variation
c. the skills being measured; variation
d. the skills being measured; content knowledge
59. Item Response Theory (IRT) or Latent Trait Theory focuses on which of the following?
a. circumstances under which the test was developed
b. how the test was administered
*c. the individual test items
d. all of the above
60. Generalizability theory focuses on which of the following?

a. circumstances under which the test was developed
b. circumstances under which the test was administered
c. circumstances under which the test was interpreted
*d. all of the above
61. The standard deviation of a theoretically normal distribution of test scores obtained by one person on
equivalent tests is
a. the standard error of the difference between means.
*b. the standard error of measurement.
c. the standard deviation of the reliability coefficient.
d. the variance.
Topic: The standard error of measurement
62. Which of the following is NOT a part of the formula for the standard error of measurement for a particular
test?
*a. the validity of the test
b. the reliability of the test
c. the standard deviation of the group of test scores
d. b and c
63. “Sixty-eight percent of the scores for a particular test fall between 58 and 61” is a statement regarding
*a. a confidence interval.
b. the reliability of a test.
c. the validity of a test.
d. all of the above
64. The standard error of measurement of a particular test of anxiety is 8. A student earns a score of 60. What is
the confidence interval for this test score at the 95% level?
a. 52–68
b. 40–68
*c. 44–76
d. 36–84
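
The arithmetic behind a confidence interval built from the standard error of measurement can be sketched as below. The sketch assumes the common approximation that plus or minus 1 SEM covers about 68% and plus or minus 2 SEM about 95% (1.96 is the more exact multiplier); the function name is hypothetical.

```python
def confidence_interval(observed_score, sem, n_sems=2):
    """Band of n_sems standard errors of measurement around an observed score."""
    return (observed_score - n_sems * sem, observed_score + n_sems * sem)

# Score of 60 with SEM = 8, at roughly the 95% level (2 SEMs).
print(confidence_interval(60, 8))     # (44, 76)
# Score of 50 with SEM = 3, at roughly the 68% level (1 SEM).
print(confidence_interval(50, 3, 1))  # (47, 53)
```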
65. As the confidence interval increases, the range of scores a single test score is likely to fall into
a. decreases.
*b. increases.
c. remains the same.
d. first increases, then decreases.
66. As the reliability of a test increases, the standard error of measurement

a. increases.
*b. decreases.
c. remains the same.
d. first increases, then decreases.
67. If the standard deviations of two tests are identical but the reliability is lower for Test A as compared to Test
B, the standard error of measurement will be ________ for Test A as compared with Test B.
*a. higher
b. lower
c. the same
d. b or c
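
The inverse relation between reliability and the standard error of measurement follows from the formula SEM = SD * sqrt(1 - r). A minimal Python sketch, with a hypothetical function name and made-up values for two tests sharing the same standard deviation:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM from the standard deviation of scores and the reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

sem_a = standard_error_of_measurement(15, 0.75)  # Test A: lower reliability
sem_b = standard_error_of_measurement(15, 0.91)  # Test B: higher reliability
print(round(sem_a, 2), round(sem_b, 2))  # 7.5 4.5
```

With the standard deviation held constant, the less reliable test has the larger SEM.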
68. Which statistic can help the test user determine how large a difference must exist for scores yielded from two
different tests to be considered statistically different?
a. standard error of measurement between two scores
*b. standard error of the difference between two scores
c. observed variance minus error variance
d. standard error of the difference between two means
Topic: The standard error of the difference between two scores
69. The standard error of the difference between two scores is larger than the standard error of measurement for
either score because the standard error of the difference between the two scores is affected by
a. the true score variance of each score.
b. the standard deviation of each score.
*c. the measurement error inherent in both scores.
d. all of the above
70. If you were a school psychologist and you wanted to determine if the student you were evaluating scored
higher on a mathematics test than on a reading test, what statistic(s) would you be interested in computing?
a. the standard error of measurement for each test score
*b. the standard error of the difference between two scores
c. the raw score on each test as well as the mean of each distribution
d. all of the above
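
The standard error of the difference combines the measurement error in both scores, which is why it exceeds either SEM alone. The sketch below assumes the form SED = sqrt(SEM1^2 + SEM2^2), with both scores expressed on the same scale, and uses a 2-SED threshold as a rough 95% criterion; the function names and scores are hypothetical.

```python
import math

def standard_error_of_difference(sem_1, sem_2):
    """SED for two scores on the same scale, from their SEMs."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)

def differs_reliably(score_1, score_2, sem_1, sem_2, n_seds=2):
    """True if the gap between two scores exceeds n_seds standard errors of the difference."""
    return abs(score_1 - score_2) > n_seds * standard_error_of_difference(sem_1, sem_2)

print(standard_error_of_difference(3, 4))  # 5.0, larger than either SEM
print(differs_reliably(112, 100, 3, 4))    # True  (12 > 10)
print(differs_reliably(108, 100, 3, 4))    # False (8 < 10)
```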
71. The ________ in generalizability theory is analogous to the reliability coefficient in classical test theory.
a. universe coefficient
*b. coefficient of generalizability
c. universe score
d. Roulin coefficient
72. According to Cronbach et al.’s generalizability theory, “facets” would include

a. the number of test items.
b. the amount of training the examiners received.
c. the purpose of administering the test.
*d. all of the above
73. The universe score in Cronbach et al.’s generalizability theory is analogous to the ________ in classical test
theory.
a. KR-20
*b. true score
c. standard deviation
d. internal-consistency estimate
74. In classical test theory, there exists only one true score. In Cronbach et al.’s generalizability theory, how
many of these true scores would exist?
a. one
b. as many as the number of times the test is administered to the individual
*c. many, depending on the number of different universes
d. b and c
75. The term coefficient of generalizability refers to

*a. how generalizable scores obtained in one situation are to other situations.
b. test-retest reliability estimates with respect to different “universes.”
c. split-half reliability estimates with respect to different “universes.”
76. The Bayley Scales of Infant Development, Second Edition (BSID-II), contains Mental, Motor, and Behavior
Rating Scales. Because these three scales are designed to measure different characteristics (that is, they
are not homogeneous), it is inappropriate to combine the three scales in computing estimates of
a. alternate-forms reliability.
*b. internal-consistency reliability.
c. test-retest reliability.
d. inter-rater reliability.
Topic: Close-Up: The reliability of the Bayley Scales of Infant Development
77. Which of the following is NOT true of the Bayley Scales of Infant Development, Second Edition (BSID-II)?
a. It takes between 30 and 60 minutes to administer.
*b. It is designed for children between 1 and 18 months of age.
c. It contains Mental, Motor, and Behavior Rating Scales.
d. It is used to identify children who are behind developmentally.
78. The fact that young children develop rapidly and in “growth spurts” is a problem primarily in estimating which
type of reliability for the Bayley Scales?
a. internal-consistency reliability
b. alternate-forms reliability
*c. test-retest reliability
d. inter-rater reliability
79. Which of the following cannot be assessed for the Bayley Scales of Infant Development, Second Edition
(BSID-II)?
a. internal-consistency reliability
*b. alternate-forms reliability
c. test-retest reliability
d. inter-rater reliability
80. The concept of reliability refers to

a. how well a test measures what it is intended to measure under specified conditions.
b. the lack of systematic errors.
*c. the proportion of total variance that can be attributed to true variance.
d. all of the above
81. If a blood pressure machine consistently overestimated everyone’s blood pressure by 10 units, which of the
following would be true of the reliability of the blood pressure machine?
a. It would increase.
b. It would decrease.
*c. It would not be affected.
d. It cannot be determined, since it would depend on the actual blood pressure of the individuals.
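
Why a constant (systematic) error leaves a reliability coefficient untouched can be shown with a small Python sketch. It treats reliability as a test-retest Pearson correlation and adds the same 10 units to every reading; the readings and the helper name pearson_r are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

first_reading = [118, 125, 131, 140, 122, 135]
second_reading = [120, 123, 133, 138, 121, 137]

# The same machine, now overestimating every reading by 10 units.
biased_first = [v + 10 for v in first_reading]
biased_second = [v + 10 for v in second_reading]

r_unbiased = pearson_r(first_reading, second_reading)
r_biased = pearson_r(biased_first, biased_second)
print(math.isclose(r_unbiased, r_biased))  # True
```

Adding a constant shifts every score equally, so the relative standing of the individuals, and therefore the correlation, does not change.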
82. In general, which of the following is true of the relationship between the magnitude of the test-retest reliability
estimate and the length of the interval between test administrations?
*a. The longer the interval, the lower the reliability coefficient.
b. The longer the interval, the higher the reliability coefficient.
c. Whether the interval is long or short, the magnitude of the reliability coefficient is not affected.
Topic: Types of reliability estimates: Test-retest reliability estimates
83. What is the difference between alternate forms and parallel forms of a test?
*a. Alternate forms do not necessarily yield test scores with equal means and variances.
b. Alternate forms are designed to be equivalent in level of difficulty.
c. Alternate forms are different only in respect to how they are administered.
d. There are no differences between alternate and parallel forms of a test.
Topic: Types of reliability estimates: Parallel-forms and alternate-forms reliability estimates
84. Which of the following represents the major difference between estimates of alternate- or parallel-forms
reliability and test-retest reliability?
a. Two test administrations with the same group are required.
b. Test scores may be affected by factors such as the testtaker’s motivation or fatigue.
c. Test scores may be affected by practice or learning between administrations.
*d. Error variance from item sampling may result.
Topic: Types of reliability estimates: Parallel-forms and alternate-forms reliability estimates: Test-retest reliability
estimates
85. For which type of test is coefficient alpha the reliability estimate of choice?
a. tests with dichotomous items and binary scoring
b. tests with homogeneous items
*c. tests that can be scored along a continuum of values
d. tests with heterogeneous items and binary scoring
86. In which type(s) of reliability estimates would test construction not be a significant source of error variance?
*a. test-retest
b. alternate-form
c. split-half
d. It would be a significant source in all of the above types.
Topic: The purpose of the reliability coefficient
87. What impact will it have on the magnitude of the coefficient of reliability if the variance of either variable is
restricted by the sampling procedures used?
*a. It is lowered.
b. It is raised.
c. It is unaffected.
d. It is different depending on the type of test (multiple-choice, true-false, etc.).
Topic: The nature of the test: Restriction or inflation of range
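
The effect of restricting range on a correlation-based coefficient can be illustrated with made-up data. In this Python sketch, the second set of scores equals the first plus a fixed alternating error of 3 points; the correlation is computed once over the full range and once over a narrow middle band. All names and values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Full sample: scores 1..20, with +3 error on even scores and -3 on odd ones.
x_full = list(range(1, 21))
y_full = [x + 3 if x % 2 == 0 else x - 3 for x in x_full]

# Restricted sample: only testtakers scoring 9 through 12 on the first measure.
pairs = [(x, y) for x, y in zip(x_full, y_full) if 9 <= x <= 12]
x_restricted = [p[0] for p in pairs]
y_restricted = [p[1] for p in pairs]

r_full = pearson_r(x_full, y_full)
r_restricted = pearson_r(x_restricted, y_restricted)
print(round(r_full, 2), round(r_restricted, 2))  # about 0.9 and 0.68
```

The error is the same size in both samples; only the spread of scores differs, and the coefficient drops when the spread is restricted.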
88. For criterion-referenced tests, which of the following reliability estimates is (are) recommended?
a. test-retest
b. alternate-form
c. split-half
*d. none of the above
89. Which of the following is true of domain sampling theory?

a. It supports the existence of a “true score” when measuring psychological constructs.
*b. It can be used to argue against the existence of a “true score” when measuring psychological
constructs.
c. It relies on the concept of the validity of traits.
d. It seeks to estimate the portion of a test score attributable to error.
90. If a student received a score of 50 on a math test with a standard error of measurement of 3, which of the
following statements would be true of his or her performance?
a. In 68% of the cases, the “true score” would be expected to be between 44 and 56.
*b. In 68% of the cases, the “true score” would be expected to be between 47 and 53.
c. In 95% of the cases, the “true score” would be expected to be between 47 and 53.
d. In 95% of the cases, the “true score” would be expected to be between 44 and 56.
91. A psychologist administers a test and the testtaker scores a 52. If the cut-off score for eligibility for a
particular program is 50, what index will best help the psychologist determine how much confidence to place
in the testtaker’s obtained score of 52?
a. the standard error of difference
*b. the standard error of measurement
c. measures of central tendency: mean, median, or mode
d. measures of variability such as the standard deviation
92. Which of the following is true of both the standard error of measurement and the standard error of
difference?
a. Both provide confidence levels.
b. Both can be used to compute confidence intervals for short answer tests.
*c. Both can be used to compare performance between two different tests.
d. None of the above
Topic: The standard error of measurement; The standard error of the difference between two scores
93. Because of the unique problems in assessing infants, which of the following is recommended when
estimating the reliability of the Bayley Scales of Infant Development, Second Edition (BSID-II)?
*a. Use relatively short test-retest intervals.
b. Use relatively long test-retest intervals.
c. Do not use the test-retest method of estimating reliability.
d. The length of the test-retest interval is not critical when calculating the test-retest reliability of the
Bayley.
94. Which of the following is true for the internal-consistency reliability estimates on the Bayley Scales of Infant
Development, Second Edition (BSID-II)?
*a. They are calculated separately for the Mental, Motor, and Behavior Scales because each scale is
assumed to measure a homogeneous set of abilities.
b. One reliability estimate is calculated across the Mental, Motor, and Behavior Scales because each
scale does not measure distinct abilities.
c. They are not calculated, because infants’ abilities are likely to change during the testing session.
d. Estimates are provided across the Mental, Motor, and Behavior Scales and for each separate scale.
95. According to the chapter Close-Up, directions to examiners on the Bayley Scales of Infant Development,
Second Edition (BSID-II), such as “Give credit if the child holds his hands open most of the time” could result
in problems when calculating which reliability estimates?
a. test-retest
b. alternate-form
*c. inter-rater
d. all of the above
96. Test-retest reliability estimates of breathalyzers lead to

*a. a margin of error of approximately one-hundredth of a percentage point.
b. a margin of error of one percentage point.
c. a margin of error so high that it renders them too unreliable to be presented in court.
d. the conclusion that psychometrically sound test-retest studies have not yet been conducted.
Topic: Everyday Psychometrics: The Reliability Defense and the Breathalyzer Test
97. A police officer mistakenly records the blood alcohol level of a suspected drunk driver after administering a
breathalyzer test. This mistake is most related to which type of reliability?
a. test-retest
*b. inter-scorer
c. internal-consistency
d. situational
Topic: Everyday Psychometrics: The Reliability Defense and the Breathalyzer Test
98. Which of the following represents the difference between a power test and a speed test?
a. Power tests involve physical strength; speed tests do not.
*b. In a power test, the testtaker has time to complete all items; in a speed test, a specific time limit is
imposed.
c. In a power test, a broad range of knowledge is assessed; in a speed test, a narrower range of
knowledge is assessed.
d. b and c
99. The index that allows a test user to compare two people’s scores on a specific test to determine if the true
scores are likely to be different is
a. the standard error of the mean.
*b. the standard error of the difference.
c. the standard deviation.
d. the correlation coefficient.
100. Which type of reliability is directly affected by the heterogeneity of a test?

a. test-retest
b. inter-rater
*c. internal-consistency
d. alternate-forms or parallel-forms
Topic: Types of reliability estimates
101. Generalizability theory is most closely related to

a. developing norms.
b. item analysis.
*c. test reliability.
d. the way things are “in general.”
102. McAfee’s test of attention span has a reliability coefficient of .84. The average score on the test is 10, with a
standard deviation of 5. Lawrence received a score of 64 on the test. We can be 95% sure that Lawrence’s
true attention span score falls between
a. 63 and 65.
b. 62 and 66.
*c. 60 and 68.
d. 54 and 74.
103. By definition, estimates of reliability can range from _______ to _______.

a. –3.00; +3.00
b. 1; 10
*c. 0; 1
d. 0; 100
104. The Spearman-Brown adjustment to the Pearson r is used in all of the following situations except
a. when split-half reliability estimates are computed.
b. in estimates of the effect of reducing the size of a test.
c. in estimates of the effect of increasing the size of a test.
*d. when computing reliability coefficients for rank-ordered data.
Topic: Types of reliability estimates: The Spearman-Brown formula
105. What type of reliability estimate is involved if you compare children’s performance on Form A and Form B of
the Peabody Picture Vocabulary Test—Revised?
a. test-retest
*b. alternate-forms
c. inter-rater
d. internal-consistency
Topic: Types of reliability estimates: Parallel-forms and alternate-forms reliability estimates
106. What index of reliability would you use to compare two evaluators’ assessments of a group of job applicants?
a. KR-20
b. coefficient alpha
*c. the Kappa statistic
d. the Spearman-Brown correction
107. Which of the following is true of the standard error of measurement?

a. A large standard error of measurement is desirable.
b. The standard error of measurement is inversely related to the standard deviation (when one goes up,
the other goes down).
*c. The standard error of measurement is inversely related to reliability.
d. A low standard error of measurement is indicative of low validity.
108. What type of reliability estimate is obtained by correlating pairs of scores from the same persons on two
different administrations of the same test?
a. parallel-forms
b. split-half
c. inter-rater
*d. test-retest
109. A test containing 100 items is revised by deleting 20 items. What might be expected to happen to the
magnitude of the reliability estimate for that test?
a. It will increase.
*b. It will decrease.
c. It will stay the same.
d. It cannot be determined based on the information provided.
Topic: The concept of reliability: Types of reliability estimates
110. In the formula X = T + E, T refers to

*a. the true score.
b. the time factor.
c. the average test score.
d. test-retest reliability.
111. The greater the proportion of the total variance attributed to true variance, the more ____________ the test.
a. scientific
b. variable
*c. reliable
d. all of the above
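
The model X = T + E and the idea of reliability as the proportion of total variance that is true variance can be explored with a small simulation. This is a sketch under stated assumptions: true scores and errors are drawn independently from normal distributions with made-up parameters (SD 15 and SD 5), and the seed is fixed only so the run is repeatable.

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical true scores (T) and random errors (E) for 5000 testtakers.
true_scores = [random.gauss(100, 15) for _ in range(5000)]
errors = [random.gauss(0, 5) for _ in range(5000)]

# Observed score = true score + error.
observed = [t + e for t, e in zip(true_scores, errors)]

# Reliability as true variance divided by total (observed) variance.
reliability = pvariance(true_scores) / pvariance(observed)
print(round(reliability, 2))  # expected to land near 225 / (225 + 25) = 0.90
```

Raising the error SD in this sketch lowers the ratio, and shrinking it pushes the ratio toward 1.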
112. How well a test measures a single concept is known as

a. uniformity.
b. consistency.
*c. homogeneity.
d. inter-judge reliability.
Topic: The nature of the test: Homogeneity versus heterogeneity of test items
113. A score earned by a testtaker on a psychological test is equal to which of the following?
a. the observed score
b. error
c. the true score
*d. the true score plus error
114. Which is NOT a possible source of error variance?

a. test administration
b. test scoring
c. test interpretation
*d. All are possible sources of error variance.
115. Which of the following represents a goal of a test developer?

a. to maximize error variance
b. to minimize true variance
*c. to maximize true variance
Topic: Sources of error variance: Test construction
116. Which of the following is TRUE about systematic and unsystematic error in the assessment of physical and
psychological abuse in a relationship?
a. Few sources of unsystematic error exist, due to the nature of what is being assessed.
b. Few sources of systematic error exist.
*c. Gender represents a source of systematic error.
d. Agreement exists on current methods for estimating true versus error variance.
Topic: Other sources of error
117. Approximately what percentage of scores would be expected to fall within two standard errors of measurement
above or below the true score?
a. 85%
b. 90%
*c. 95%
d. 99%
118. Manuel earns a 90 on a standardized math test. The standard error of measurement for this test is 5. We
would be 95% confident that his true score falls between _____________________.
a. 85 and 95
b. 80 and 100
*c. 80 and 100
d. Cannot determine based on the information provided.
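
The arithmetic behind question 118 can be sketched as a band of roughly two standard errors of measurement on either side of the observed score. The function name is hypothetical, and the multiplier of 2 is the rule of thumb used in the question rather than the more exact 1.96.

```python
def score_band(observed_score, sem, n_sems=2.0):
    """Band of `n_sems` standard errors of measurement around an observed score."""
    half_width = n_sems * sem
    return (observed_score - half_width, observed_score + half_width)

# Manuel's observed score of 90 with a standard error of measurement of 5.
low, high = score_band(90, 5)  # 90 - 10 and 90 + 10
```

A band of one standard error of measurement (85 to 95) would correspond to roughly 68% confidence instead.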
