Hambleton & Pitoniak, 2006). First, it is common to set a policy about the composition of the panel that
will set the performance standards. Here, decisions about the demographic make-up of the panel,
such as gender, ethnicity, years of experience, geographical distribution, and role (e.g., teachers,
administrators, curriculum specialists, parents), are usually considered, along with other factors. Then a
plan is put in place to draw a representative panel to meet the specifications.
Another important decision concerns the choice of standard-setting method. There are perhaps 10 to 20
major methods, with many variations of each. The methods include Angoff, Ebel, Nedelsky,
contrasting groups, borderline groups, direct consensus, item cluster, booklet selection, extended
Angoff, bookmark, and more.
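To make one of these methods concrete, the widely used Angoff method asks each panelist to estimate, for every item, the probability that a borderline (minimally competent) candidate would answer it correctly; a panelist's recommended cut score is the sum of those ratings, and the panel's recommendation is typically the average across panelists. The following minimal Python sketch uses invented ratings for a five-item test:

```python
# Sketch of the basic Angoff computation. Each panelist rates every item
# with the probability (0-1) that a borderline candidate would answer it
# correctly. All ratings here are invented for illustration.
ratings = {
    "panelist_1": [0.60, 0.75, 0.40, 0.90, 0.55],
    "panelist_2": [0.65, 0.70, 0.50, 0.85, 0.50],
    "panelist_3": [0.55, 0.80, 0.45, 0.95, 0.60],
}

# A panelist's recommended cut score is the sum of his or her item ratings,
# i.e., the expected raw score of a borderline candidate.
panelist_cuts = {name: sum(r) for name, r in ratings.items()}

# The panel's recommended cut score is the average of the panelists' sums.
cut_score = sum(panelist_cuts.values()) / len(panelist_cuts)

print(panelist_cuts)
print(round(cut_score, 2))
```

In practice the procedure is iterated: panelists discuss their ratings and examine impact data (e.g., projected failure rates) between rounds before the averages are final.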
Prior to the meeting of the panel to set the performance standards, it is common for a different panel to
prepare performance category descriptions. These descriptions lay out for the standard-setting panel
what it means to be a failing student, a basic student, and so on. The descriptions provide a basis for
the standard-setting panel to carry out its work of determining just how well candidates must perform
on the test to demonstrate basic, proficient, and advanced level performance. The descriptions are
also helpful in communicating what the expectations are for students in the performance categories,
and at the time of score reporting.
Next, the panel is brought together and the chosen method is applied to produce performance
standards. A typical panel meeting often begins with discussion of the purpose of the test and
exposure to the performance category descriptions. Having the panelists take a portion, or even all, of
the test is another common training activity. Then the method is introduced, and panelists practice it
before beginning the task of setting the standards.
The meeting continues, and often two to three days are needed for the panelists to work through the
method and related discussions until a final recommended set of performance standards is produced.
Validity evidence is compiled about the process and the panelists' impressions of it, a technical
manual is often written, and then all of the information is forwarded to a board for setting the final
performance standards for the criterion-referenced test. If multiple tests are involved (e.g.,
mathematics, reading, and science tests at several grade levels), the task of making the complete set
of performance standards across subjects and grades consistent or coherent is especially
challenging.
USES
Criterion-referenced tests are used in many ways. Classroom teachers use them to monitor student
performance in their day-to-day activities. States find them useful for evaluating student performance
and generating educational accountability information at the classroom, school, district, and state levels.
The tests are based on the curricula, and the results provide a basis for determining how much is
being learned by students and how well the educational system is producing desired results.
Criterion-referenced tests are also used in training programs to assess learning. Typically, pretest-posttest designs with parallel forms of criterion-referenced tests are used. Finally, criterion-referenced
tests are used in the credentialing field to determine which persons are qualified to receive a license or
certificate. Hundreds of credentialing agencies in the United States use criterion-referenced tests to make pass-fail credentialing decisions.
See also: Classroom Assessment
BIBLIOGRAPHY
Cizek, G. (Ed.). (2001). Setting performance standards: Concepts, methods, and perspectives.
Mahwah, NJ: Erlbaum.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519-521.
Hambleton, R. K., & Pitoniak, M. (2006). Setting performance standards. In R. L. Brennan (Ed.),
Educational measurement (pp. 433-470). Westport, CT: American Council on Education.
Hambleton, R. K., & Zenisky, A. (2003). Issues and practices of performance assessment. In C.
Reynolds & R. Kamphaus (Eds.), Handbook of psychological and educational assessment of children
(pp. 377-404). New York: Guilford.
Livingston, S., & Lewis, C. (1995). Estimating the consistency and accuracy of classifications based on
test scores. Journal of Educational Measurement, 32, 179-180.
Popham, W. J., & Husek, T. R. (1969). Implications of criterion-referenced measurement. Journal of
Educational Measurement, 6, 1-9.
Copyright 2003-2009 The Gale Group, Inc. All rights reserved.