
A norm-referenced test (NRT) is a type of test, assessment, or evaluation which yields an
estimate of the position of the tested individual in a predefined population, with respect to the
trait being measured. The estimate is derived from the analysis of test scores and possibly other
relevant data from a sample drawn from the population.[1] That is, this type of test identifies
whether the test taker performed better or worse than other test takers, not whether the test taker
knows more or less material than is necessary for a given purpose.

The term normative assessment refers to the process of comparing one test-taker to his or her
peers.[1]

Norm-referenced assessment can be contrasted with criterion-referenced assessment and ipsative
assessment. In a criterion-referenced assessment, the score shows whether test takers
performed well or poorly on a given task, not how their performance compares to that of other
test takers; in an ipsative system, test takers are compared to their own previous performance.

Norm-Referenced Test
LAST UPDATED: 07.22.15

Norm-referenced refers to standardized tests that are designed to compare and rank test takers in
relation to one another. Norm-referenced tests report whether test takers performed better or
worse than a hypothetical average student, which is determined by comparing scores against the
performance results of a statistically selected group of test takers, typically of the same age or
grade level, who have already taken the exam.

Calculating norm-referenced scores is called the “norming process,” and the comparison group is
known as the “norming group.” Norming groups typically comprise only a small subset of
previous test takers, not all or even most previous test takers. Test developers use a variety of
statistical methods to select norming groups, interpret raw scores, and determine performance
levels.

Norm-referenced scores are generally reported as a percentage or percentile ranking. For
example, a student who scores in the seventieth percentile performed as well as or better than
seventy percent of other test takers of the same age or grade level, and thirty percent of students
performed better (as determined by norming-group scores).
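The percentile calculation described above can be sketched in a few lines of Python. This is
purely an illustration: the scores and the norming group below are invented, and real test
publishers use more elaborate statistical procedures.

```python
def percentile_rank(score, norm_scores):
    """Percent of norming-group scores at or below the given score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

# Hypothetical raw scores from a norming group of ten test takers.
norm_group = [52, 55, 60, 61, 63, 67, 70, 74, 78, 85]

# A student scoring 70 did as well as or better than 70% of the norming group.
print(percentile_rank(70, norm_group))  # → 70.0
```

Note that the rank says nothing about how many items the student answered correctly, only
where the score falls among the norming group.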

Norm-referenced tests often use a multiple-choice format, though some include open-ended,
short-answer questions. They are usually based on some form of national standards, not locally
determined standards or curricula. IQ tests are among the most well-known norm-referenced
tests, as are developmental-screening tests, which are used to identify learning disabilities in
young children or determine eligibility for special-education services. A few major norm-
referenced tests include the California Achievement Test, Iowa Test of Basic Skills, Stanford
Achievement Test, and TerraNova.
The following are a few representative examples of how norm-referenced tests and scores may
be used:

 To determine a young child’s readiness for preschool or kindergarten. These tests may be
designed to measure oral-language ability, visual-motor skills, and cognitive and social
development.
 To evaluate basic reading, writing, and math skills. Test results may be used for a wide
variety of purposes, such as measuring academic progress, making course assignments,
determining readiness for grade promotion, or identifying the need for additional
academic support.
 To identify specific learning disabilities, such as autism, dyslexia, or nonverbal learning
disability, or to determine eligibility for special-education services.
 To make program-eligibility or college-admissions decisions (in these cases, norm-
referenced scores are generally evaluated alongside other information about a student).
Scores on SAT or ACT exams are a common example.

Norm-Referenced vs. Criterion-Referenced Tests

Norm-referenced tests are specifically designed to rank test takers on a “bell curve,” or a
distribution of scores that resembles, when graphed, the outline of a bell—i.e., a small
percentage of students performing well, most performing average, and a small percentage
performing poorly. To produce a bell curve each time, test questions are carefully designed to
accentuate performance differences among test takers, not to determine if students have achieved
specified learning standards, learned certain material, or acquired specific skills and knowledge.
Tests that measure performance against a fixed set of standards or criteria are called criterion-
referenced tests.

Criterion-referenced test results are often based on the number of correct answers provided by
students, and scores might be expressed as a percentage of the total possible number of correct
answers. On a norm-referenced exam, however, the score would reflect how many more or fewer
correct answers a student gave in comparison to other students. Hypothetically, if all the students
who took a norm-referenced test performed poorly, the least-poor results would rank students in
the highest percentile. Similarly, if all students performed extraordinarily well, the least-strong
performance would rank students in the lowest percentile.
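This contrast can be made concrete with a small hypothetical example in Python (the student
names, scores, and test length are invented): against a fixed criterion every student clears the
bar, while the norm-referenced view still forces someone into last place.

```python
# Hypothetical results: every student answers most of the 40 items correctly.
raw_scores = {"Ana": 38, "Ben": 36, "Cal": 39, "Dee": 37}
TOTAL_ITEMS = 40

# Criterion-referenced view: percent of items answered correctly.
criterion = {name: 100 * s / TOTAL_ITEMS for name, s in raw_scores.items()}

# Norm-referenced view: standing relative to the other test takers only.
ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
norm = {name: rank for rank, name in enumerate(ranked, start=1)}

print(criterion)  # every score is 90% or better against the fixed criterion
print(norm)       # yet Ben still ranks last among the four
```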

It should be noted that norm-referenced tests cannot measure the learning achievement or
progress of an entire group of students, but only the relative performance of individuals within a
group. For this reason, criterion-referenced tests are used to measure whole-group performance.

Reform

Norm-referenced tests have historically been used to make distinctions among students, often for
the purposes of course placement, program eligibility, or school admissions. Yet because norm-
referenced tests are designed to rank student performance on a relative scale—i.e., in relation to
the performance of other students—norm-referenced testing has been abandoned by many
schools and states in favor of criterion-referenced tests, which measure student performance in
relation to a common set of fixed criteria or standards.

It should be noted that norm-referenced tests are typically not the form of standardized test
widely used to comply with state or federal policies—such as the No Child Left Behind Act—
that are intended to measure school performance, close “achievement gaps,” or hold schools
accountable for improving student learning results. In most cases, criterion-referenced tests are
used for these purposes because the goal is to determine whether schools are successfully
teaching students what they are expected to learn.

Similarly, the assessments being developed to measure student achievement of the Common
Core State Standards are also criterion-referenced exams. However, some test developers
promote their norm-referenced exams—for example, the TerraNova Common Core—as a way
for teachers to “benchmark” learning progress and determine if students are on track to perform
well on Common Core–based assessments.

Debate

While norm-referenced tests are not the focus of ongoing national debates about “high-stakes
testing,” they are nonetheless the object of much debate. The essential disagreement is between
those who view norm-referenced tests as objective, valid, and fair measures of student
performance, and those who believe that relying on relative performance results is inaccurate,
unhelpful, and unfair, especially when making important educational decisions for students.
While part of the debate centers on whether or not it is ethically appropriate, or even
educationally useful, to evaluate individual student learning in relation to other students (rather
than evaluating individual performance in relation to fixed and known criteria), much of
the debate is also focused on whether there is a general overreliance on standardized-test scores
in the United States, and whether a single test, no matter what its design, should be used—in
exclusion of other measures—to evaluate school or student performance.

It should be noted that perceived performance on a standardized test can potentially be
manipulated, regardless of whether a test is norm-referenced or criterion-referenced. For
example, if a large number of students are performing poorly on a test, the performance criteria
—i.e., the bar for what is considered “passing” or “proficient”—could be lowered to “improve”
perceived performance, even if students are not learning more or performing better than past test
takers. For example, if a standardized test administered in eleventh grade uses proficiency
standards that are considered to be equivalent to eighth-grade learning expectations, it will
appear that students are performing well, when in fact the test has not measured learning
achievement at a level appropriate to their age or grade. For this reason, it is important to
investigate the criteria used to determine “proficiency” on any given test—and particularly when
a test is considered “high stakes,” since there is greater motivation to manipulate perceived test
performance when results are tied to sanctions, funding reductions, public embarrassment, or
other negative consequences.

The following are representative of the kinds of arguments typically made by proponents of
norm-referenced testing:
 Norm-referenced tests are relatively inexpensive to develop, simple to administer, and
easy to score. As long as the results are used alongside other measures of performance,
they can provide valuable information about student learning.
 The quality of norm-referenced tests is usually high because they are developed by
testing experts, piloted, and revised before they are used with students, and they are
dependable and stable for what they are designed to measure.
 Norm-referenced tests can help differentiate students and identify those who may have
specific educational needs or deficits that require specialized assistance or learning
environments.
 The tests are an objective evaluation method that can decrease bias or favoritism when
making educational decisions. If there are limited places in a gifted and talented program,
for example, one transparent way to make the decision is to give every student the same
test and allow the highest-scoring students to gain entry.

The following are representative of the kinds of arguments typically made by critics of norm-
referenced testing:

 Although testing experts and test developers warn that major educational decisions
should not be made on the basis of a single test score, norm-referenced scores are often
misused in schools when making critical educational decisions, such as grade promotion
or retention, which can have potentially harmful consequences for some students and
student groups.
 Norm-referenced tests encourage teachers to view students in terms of a bell curve, which
can lead them to lower academic expectations for certain groups of students, particularly
special-needs students, English-language learners, or minority groups. And when
academic expectations are consistently lowered year after year, students in these groups
may never catch up to their peers, creating a self-fulfilling prophecy. For a related
discussion, see high expectations.
 Multiple-choice tests—the dominant norm-referenced format—are better suited to
measuring remembered facts than more complex forms of thinking. Consequently, norm-
referenced tests promote rote learning and memorization in schools over more
sophisticated cognitive skills, such as writing, critical reading, analytical thinking,
problem solving, or creativity.
 Overreliance on norm-referenced test results can lead to inadvertent discrimination
against minority groups and low-income student populations, both of which tend to face
more educational obstacles than non-minority students from higher-income households.
For example, many educators have argued that the overuse of norm-referenced testing has
resulted in a significant overrepresentation of minority students in special-education
programs. On the other hand, using norm-referenced scores to determine placement in
gifted and talented programs, or other “enriched” learning opportunities, leads to the
underrepresentation of minority and lower-income students in these programs. Similarly,
students from higher-income households may have an unfair advantage in the college-
admissions process because they can afford expensive test-preparation services.
 An overreliance on norm-referenced test scores undervalues important achievements, skills, and
abilities in favor of the narrower set of skills measured by the tests.

Norm- and Criterion-Referenced Testing
Linda A. Bond 
North Central Regional Educational Laboratory 

Tests can be categorized into two major groups: norm-referenced tests and
criterion-referenced tests. These two types differ in their intended purposes, the
way in which content is selected, and the scoring process, which defines how the
test results must be interpreted. This brief paper will describe the differences
between these two types of assessments and explain the most appropriate uses of
each.

INTENDED PURPOSES 

The major reason for using a norm-referenced test (NRT) is to classify students.
NRTs are designed to highlight achievement differences between and among
students to produce a dependable rank order of students across a continuum of
achievement, from high achievers to low achievers (Stiggins, 1994). School
systems might want to classify students in this way so that they can be properly
placed in remedial or gifted programs. These types of tests are also used to help
teachers select students for reading or mathematics instructional groups of
different ability levels.

With norm-referenced tests, a representative group of students is given the test
prior to its availability to the public. The scores of the students who take the test
after publication are then compared to those of the norm group. Tests such as the
California Achievement Test (CTB/McGraw-Hill), the Iowa Test of Basic Skills
(Riverside), and the Metropolitan Achievement Test (Psychological Corporation)
are normed using a national sample of students. Because norming a test is such
an elaborate and expensive process, the norms are typically used by test
publishers for seven years. All students who take the test during that seven-year
period have their scores compared to the original norm group.

While norm-referenced tests ascertain the rank of students, criterion-referenced
tests (CRTs) determine "...what test takers can do and what they know, not how
they compare to others" (Anastasi, 1988, p. 102). CRTs report how well students
are doing relative to a predetermined performance level on a specified set of
educational goals or outcomes included in the school, district, or state
curriculum.

Educators or policy makers may choose to use a CRT when they wish to see how 
well students have learned the knowledge and skills which they are expected to 
have mastered. This information may be used as one piece of information to 
determine how well the student is learning the desired curriculum and how well 
the school is teaching that curriculum. 

Both NRTs and CRTs can be standardized. The U.S. Congress, Office of
Technology Assessment (1992) defines a standardized test as one that uses
uniform procedures for administration and scoring in order to assure that the
results from different people are comparable. Any kind of test, from multiple
choice to essays to oral examinations, can be standardized if uniform scoring and
administration are used (p. 165). This means that the comparison of student
scores is possible. Thus, it can be assumed that two students who receive
identical scores on the same standardized test demonstrate corresponding levels
of performance. Most national, state, and district tests are standardized so that
every score can be interpreted in a uniform manner for all students and schools.

SELECTION OF TEST CONTENT 

Test content is an important factor in choosing between an NRT and a CRT.
The content of an NRT is selected according to how well it ranks
students from high achievers to low. The content of a CRT is determined by
how well it matches the learning outcomes deemed most important. Although no
test can measure everything of importance, the content of a CRT is
selected on the basis of its significance in the curriculum, while that of an NRT is
chosen by how well it discriminates among students.

Any national, state, or district test communicates to the public the skills that
students should have acquired as well as the levels of student performance that
are considered satisfactory. Therefore, education officials at any level should
carefully consider the content of the test which is selected or developed. Because
of the importance placed upon high scores, the content of a standardized test can
be very influential in the development of a school's curriculum and standards of
excellence.

NRTs have come under attack recently because they purportedly have focused on
low-level, basic skills. This emphasis is in direct contrast to the recommendations
made by recent research on teaching and learning, which calls for educators to
stress the acquisition of conceptual understanding as well as the application of
skills. The National Council of Teachers of Mathematics (NCTM) has been
particularly vocal about this concern. In an NCTM publication (1991), Romberg
reported that "a recent study of the six most commonly used commercial
achievement tests found that at grade 8, on average, only 1 percent of the items
were problem solving while 77 percent were computation or estimation" (p. 8).

In order to best prepare their students for standardized achievement tests,
teachers usually devote much time to teaching the information found on those
tests. This is particularly true if the standardized tests are also used to measure
an educator's teaching ability. This pressure on teachers for their students to
perform well has produced an emphasis on low-level skills in the classroom
(Corbett & Wilson, 1991). With curriculum specialists and educational policy
makers alike calling for more attention to higher-level skills, these tests may be
driving classroom practice in the opposite direction of educational reform.

TEST INTERPRETATION 

As mentioned earlier, a student's performance on an NRT is interpreted in
relation to the performance of a large group of similar students who took the test
when it was first normed. For example, if a student receives a percentile rank
score on the total test of 34, this means that he or she performed as well as or
better than 34% of the students in the norm group. This type of information can
be useful for deciding whether a student needs remedial assistance or is a
candidate for a gifted program. However, the score gives little information about
what the student actually knows or can do. The validity of the score in these
decision processes depends on whether or not the content of the NRT matches
the knowledge and skills expected of the students in that particular school system.

It is easier to ensure the match to expected skills with a CRT. CRTs give detailed
information about how well a student has performed on each of the educational 
goals or outcomes included on that test. For instance, "... a CRT score might 
describe which arithmetic operations a student can perform or the level of 
reading difficulty he or she can comprehend" (U.S. Congress, OTA, 1992, p. 170). 
As long as the content of the test matches the content that is considered 
important to learn, the CRT gives the student, the teacher, and the parent more 
information about how much of the valued content has been learned than an 
NRT.

SUMMARY

Public demands for accountability, and consequently for high standardized test
scores, are not going to disappear. In 1994, thirty-one states administered NRTs,
while thirty-three states administered CRTs. Among these states, twenty-two
administered both. Only two states relied on NRTs exclusively, while one state
relied exclusively on a CRT. Acknowledging the recommendations for educational
reform and the popularity of standardized tests, some states are designing tests
that "reflect, insofar as possible, what we believe to be appropriate educational
practice" (NCTM, 1991, p. 9). In addition, most states also administer
other forms of assessment, such as a writing sample, some form of open-ended
performance assessment, or a portfolio (CCSSO/NCREL, 1994).

Before a state can choose what type of standardized test to use, state education
officials will have to consider whether that test meets three criteria: whether the
assessment strategy of a particular test matches the state's educational goals,
addresses the content the state wishes to assess, and allows the kinds of
interpretations state education officials wish to make about student performance.
Once they have determined these three things, the task of choosing between the
NRT and the CRT becomes easier.

REFERENCES 

Anastasi, A. (1988). Psychological Testing. New York, New York: Macmillan
Publishing Company.

Corbett, H.D. & Wilson, B.L. (1991). Testing, Reform and Rebellion. Norwood, 
New Jersey: Ablex Publishing Company. 

Romberg, T.A., Wilson, L., & Khaketla, M. (1991). "The Alignment of Six
Standardized Tests with NCTM Standards." Unpublished paper, University of
Wisconsin-Madison. In J.K. Stenmark (Ed.), Mathematics Assessment: Myths,
Models, Good Questions, and Practical Suggestions. Reston, Virginia: National
Council of Teachers of Mathematics (NCTM).

Stenmark, J.K. (Ed.). (1991). Mathematics Assessment: Myths, Models, Good
Questions, and Practical Suggestions. Reston, Virginia: National Council of
Teachers of Mathematics (NCTM).

Stiggins, R.J. (1994). Student-Centered Classroom Assessment. New York:
Merrill.

U.S. Congress, Office of Technology Assessment (1992). Testing in America's
Schools: Asking the Right Questions. OTA-SET-519. Washington, D.C.: U.S.
Government Printing Office.

Advantages and limitations

The primary advantage of norm-referenced tests is that they can provide information on how an
individual's performance on the test compares to others in the reference group.

A serious limitation of norm-referenced tests is that the reference group may not represent the
current population of interest. As noted by the Oregon Research Institute's International
Personality Item Pool website, "One should be very wary of using canned 'norms' because it
isn't obvious that one could ever find a population of which one's present sample is a
representative subset. Most 'norms' are misleading, and therefore they should not be used. Far
more defensible are local norms, which one develops oneself. For example, if one wants to give
feedback to members of a class of students, one should relate the score of each individual to the
means and standard deviations derived from the class itself. To maximize informativeness, one
can provide the students with the frequency distribution for each scale, based on these local
norms, and the individuals can then find (and circle) their own scores on these relevant
distributions." [8]
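The local-norms approach the quote recommends can be sketched with Python's standard
library: relate each score to the mean and standard deviation of the class itself rather than to a
canned norming group. The class scores below are hypothetical.

```python
from statistics import mean, stdev

def local_z_scores(scores):
    """Relate each score to the class's own mean and standard deviation."""
    m, sd = mean(scores), stdev(scores)
    return {s: round((s - m) / sd, 2) for s in scores}

# Hypothetical scale scores for one class; the norms come from the class itself.
class_scores = [12, 15, 18, 20, 22, 25, 28]
print(local_z_scores(class_scores))  # a score of 20 sits exactly at the class mean
```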

Norm-referencing does not ensure that a test is valid (i.e. that it measures the construct it is
intended to measure).

Another disadvantage of norm-referenced tests is that they cannot measure progress of the
population as a whole, only where individuals fall within it. To measure the success of an
educational reform program that seeks to raise the achievement of all students, for instance,
one must instead measure against a fixed goal.

With a norm-referenced test, grade level was traditionally defined by the middle 50
percent of scores.[9] By contrast, the National Children's Reading Foundation believes that it is
essential to assure that virtually all children read at or above grade level by third grade, a goal
which cannot be achieved with a norm-referenced definition of grade level.[10]

Norms do not automatically imply a standard. A norm-referenced test does not seek to enforce
any expectation of what test takers should know or be able to do. It measures the test takers'
current level by comparing the test takers to their peers. A rank-based system produces only data
that tell which students perform at an average level, which students do better, and which students
do worse. It does not identify which test takers are able to correctly perform the tasks at a level
that would be acceptable for employment or further education.
