

Objectives


A. Formative vs. Summative Assessments
B. Setting Targets and Writing Objectives
C. Reliability and Validity

A. Formative vs. Summative Assessments

Classroom assessments can include a wide range of options -- from recording anecdotal notes while
observing a student to administering standardized tests. The options can be roughly divided into two
categories -- formative assessments and summative assessments.

Formative assessments are on-going assessments, reviews, and observations in a classroom. Teachers use
formative assessment to improve instructional methods and student feedback throughout the teaching and
learning process. For example, if a teacher observes that some students do not grasp a concept, she or he
can design a review activity or use a different instructional strategy. Likewise, students can monitor their
progress with periodic quizzes and performance tasks. The results of formative assessments are used to
modify and validate instruction.

Summative assessments are typically used to evaluate the effectiveness of instructional programs and
services at the end of an academic year or at a pre-determined time. The goal of summative assessments is
to make a judgment of student competency after an instructional phase is complete. For example, in
Florida, the FCAT is administered once a year -- it is a summative assessment to determine each student's
ability at pre-determined points in time. Summative evaluations are used to determine if students have
mastered specific competencies and to identify instructional areas that need additional attention.

The following table highlights some formative and summative assessments that are common in K12
schools.

Formative Assessments             Summative Assessments
Anecdotal records                 Final exams
Quizzes and essays                Statewide tests (FCAT)
Diagnostic tests                  National tests
Lab reports                       Entrance exams (SAT and ACT)

Activities

1. Reflect on these issues:

• Which of the formative assessment strategies do you implement in your classroom?



• Black and Wiliam (1998) recommend: "Frequent short tests are better than infrequent long ones."
Do you agree? Why or why not?

Objectives


A. Bloom's Taxonomy
B. Writing Selected Response Assessment Items
C. Item Analysis

Assignments

1. Correlate objectives and assessment items to Bloom's Taxonomy.
2. Write selected response assessment items that adhere to established guidelines.
3. Conduct an item analysis, indicating the difficulty index and discrimination index of items.

Overview

Selected response assessment items (also referred to as objective assessments) include options such as
multiple choice, matching, and true/false questions. These question types can be very effective and
efficient methods for measuring students' knowledge and reasoning. Because many of the standardized
tests are based heavily on multiple choice questions, teachers should be skilled at developing effective
objective assessment items. In addition, teachers should be able to construct quizzes that target higher
level thinking skills (consistent with the application, analysis, and synthesis levels of Bloom's taxonomy),
and they should evaluate their instruments by conducting item analyses.

A. Bloom's Taxonomy

Questions (items) on quizzes and exams can demand different levels of thinking skills. For example,
some questions might be simple memorization of facts, and others might require the ability to synthesize
information from several sources to select or construct a response. Benjamin Bloom created a hierarchy of
cognitive skills (called Bloom's taxonomy) that is often used to categorize the levels of cognitive
involvement (thinking skills) in educational settings. The taxonomy provides a good structure to assist
teachers in writing objectives and assessments. It can be divided into two levels -- Level I (the lower level)
contains knowledge and comprehension; Level II (the higher level) includes application,
analysis, synthesis, and evaluation (see the diagram below).

Figure 1. Bloom's Taxonomy.

Bloom's taxonomy is also used to guide the development of standardized assessments. For example, in
Florida, about 65% of the questions on the statewide reading test (FCAT) are designed to measure Level II
thinking skills (application, analysis, synthesis, and evaluation). To prepare students for these standardized
tests, classroom assessments must also demand both Level I and II thinking skills. Integrating higher level
skills into instruction and assessment increases the likelihood that students will succeed on tests and
become better problem solvers.

Sometimes objective tests (such as multiple choice) are criticized because the questions emphasize only
lower-level thinking skills (such as knowledge and comprehension). However, it is possible to address
higher level thinking skills via objective assessments by including items that focus on genuine
understanding -- "how" and "why" questions. Multiple choice items that involve scenarios, case studies,
and analogies are also effective for requiring students to apply, analyze, synthesize, and evaluate
information.

Activities

1. Read Assessment Activities and Bloom's Taxonomy -- note that the assessment types are
correlated to Bloom's Taxonomy.
2. Using your content area and grade level, write an assessment item at each level of Bloom's
taxonomy.

B. Writing Selected Response Assessment Items

Selected response (objective) assessment items are very efficient -- once the items are created, you can
assess and score a great deal of content rather quickly. Note that the term objective refers to the fact that
each question has a right and wrong answer and that they can be impartially scored. In fact, the scoring
can be automated if you have access to an optical scanner for scoring paper tests or a computer for
computerized tests. However, the construction of these "objective" items might well include subjective
input by the teacher/creator.

Before you write the assessment items, you should create a blueprint that outlines the content areas and
the cognitive skills you are targeting. One way to do this is to list your instructional objectives, along with
the corresponding cognitive level. For example, the following table has four different objectives and the
corresponding levels of assessment (relative to Bloom's taxonomy). For each objective, five assessment
items will be written, some at Level I and some at Level II. This approach helps to ensure that all
objectives are covered and that several higher level thinking skills are included in the assessment.

Objective        Level I Items        Level II Items
1                2                    3
2                3                    2
3                1                    4
4                4                    1
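
If you keep the blueprint in an electronic form, the tallies are easy to check. The following Python sketch is only an illustration of that idea (the field names are arbitrary, not part of any standard template); it uses the counts from the table above and reports how much of the test targets Level II skills:

    # Test blueprint from the table above: number of items per objective at each level.
    blueprint = [
        {"objective": 1, "level_1": 2, "level_2": 3},
        {"objective": 2, "level_1": 3, "level_2": 2},
        {"objective": 3, "level_1": 1, "level_2": 4},
        {"objective": 4, "level_1": 4, "level_2": 1},
    ]

    for row in blueprint:
        total = row["level_1"] + row["level_2"]
        print(f"Objective {row['objective']}: {total} items, {row['level_2']} at Level II")

    level_2_items = sum(row["level_2"] for row in blueprint)
    all_items = sum(row["level_1"] + row["level_2"] for row in blueprint)
    print(f"Level II items overall: {level_2_items}/{all_items}")   # 10/20 for this blueprint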

After you have determined how many items you need for each level, you can begin writing the
assessments. There are several forms of selected response assessments, including multiple choice,
matching, and true/false. Regardless of the form you select, be sure the items are clearly worded at the
appropriate reading level and do not include unintentional clues. The validity of your test will suffer
tremendously if the students can't comprehend or read the questions! This section includes a few
guidelines for constructing objective assessment items.


Multiple Choice

Multiple choice questions consist of a stem (question or statement) with several answer choices; the
incorrect choices are called distractors. Keep the following guidelines in mind when writing multiple
choice items.

• All answer choices should be plausible and homogeneous.
• Answer choices should be similar in length and grammatical form.
• List answer choices in logical (alphabetical or numerical) order.
• Avoid using "All of the Above" options.

Matching

Matching items consist of two lists of words, phrases, or images (often referred to as stems and
responses). Students review the list of stems and match each with a word, phrase, or image from the list of
responses. Keep the following guidelines in mind when writing matching items.

• Answer choices should be short, homogeneous, and arranged in logical order.
• Responses should be plausible and similar in length and grammatical form.
• Include more response options than stems.
• As a general rule, the stems should be longer and the responses should be shorter.


True/False

True/false questions can appear to be easier to write; however, it is difficult to write effective true/false
questions. Also, the reliability of T/F questions is generally not very high because of the high possibility
of guessing -- a student answering at random still has a 50% chance of getting an item correct. In most
cases, T/F questions are not recommended.

• Statements should be completely true or completely false.
• Use simple, easy-to-follow statements.
• Avoid using negatives -- especially double negatives.
• Avoid absolutes such as "always" and "never."

Activities

Review the information at the following site:

• Improving Your Test Questions

For your subject area and grade level, write two multiple choice questions, two matching questions, and
two true/false questions. Try to address multiple levels of Bloom's Taxonomy.

C. Item Analysis

After you create your objective assessment items and give your test, how can you be sure that the items
are appropriate -- not too difficult and not too easy? How will you know if the test effectively
differentiates between students who do well on the overall test and those who do not? An item analysis is
a valuable, yet relatively easy, procedure that teachers can use to answer both of these questions.

To determine the difficulty level of test items, a measure called the difficulty index is used. This measure
asks teachers to calculate the proportion of students who answered the test item accurately. By looking at
each alternative (for multiple choice), we can also find out if there are answer choices that should be
replaced. For example, let's say you gave a multiple choice quiz and there were four answer choices (A, B,
C, and D). The following table illustrates how many students selected each answer choice for Question #1
and #2.

Question        A        B        C        D
#1              0        3        24*      3
#2              12*      13       3        2

* Denotes correct answer.

For Question #1, we can see that A was not a very good distractor -- no one selected that answer. We can
also compute the difficulty of the item by dividing the number of students who chose the correct answer
(24) by the total number of students (30). Using this formula, the difficulty of Question #1 (referred to as
p) is equal to 24/30 or .80. A rough "rule-of-thumb" is that if the item difficulty is more than .75, it is an
easy item; if the difficulty is below .25, it is a difficult item. Given these parameters, this item could be
regarded as moderately easy -- lots (80%) of students got it correct. In contrast, Question #2 is much more
difficult (12/30 = .40). In fact, on Question #2, more students selected an incorrect answer (B) than
selected the correct answer (A). This item should be carefully analyzed to ensure that B is an appropriate
distractor.
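
The same arithmetic is easy to automate when there are more questions or students. The Python sketch below is a minimal illustration (it is not tied to any particular testing package); it computes p for each question from the answer-choice counts in the table above and flags choices that no one selected:

    # Answer-choice counts from the table above; "key" marks the correct choice.
    questions = {
        "#1": {"counts": {"A": 0, "B": 3, "C": 24, "D": 3}, "key": "C"},
        "#2": {"counts": {"A": 12, "B": 13, "C": 3, "D": 2}, "key": "A"},
    }

    for name, item in questions.items():
        total = sum(item["counts"].values())        # number of students who answered
        p = item["counts"][item["key"]] / total     # difficulty index
        label = "easy" if p > .75 else "difficult" if p < .25 else "moderate"
        unused = [choice for choice, n in item["counts"].items() if n == 0]
        print(f"Question {name}: p = {p:.2f} ({label}); unselected choices: {unused or 'none'}")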

Another measure, the discrimination index, refers to how well an assessment item differentiates between high
and low scorers. In other words, you should be able to expect that the high-performing students would
select the correct answer for each question more often than the low-performing students. If this is true,
then the item is said to have a positive discrimination index (between 0 and 1) -- indicating that
students who received a high total score chose the correct answer for a specific item more often than the
students who had a lower overall score. If, however, you find that more of the low-performing students
got a specific item correct, then the item has a negative discrimination index (between -1 and 0). Let's look
at an example.

Table 2 displays the results for three questions from a ten-question quiz. Note that the students are
arranged with the top overall scorers at the top of the table.

Table 2. Student results for Questions 1-3

Student        Total Score        Q1        Q2        Q3
Asif           90                 1         0         1
Sam            90                 1         0         1
Jill           80                 0         0         1
Charlie        80                 1         0         1
Sonya          70                 1         0         1
Ruben          60                 1         0         0
Clay           60                 1         0         1
Kelley         50                 1         1         0
Justin         50                 1         1         0
Tonya          40                 0         1         0

"1" indicates the answer was correct; "0" indicates it was incorrect.

Follow these steps to determine the Difficulty Index and the Discrimination Index.

1. After the students are arranged with the highest overall scores at the top, count the number of
students in the upper and lower group who got each item correct. For Question #1, there were 4
students in the top half who got it correct, and 4 students in the bottom half.
2. Determine the Difficulty Index by dividing the number who got it correct by the total number of
students. For Question #1, this would be 8/10 or p = .80.
3. Determine the Discrimination Index by subtracting the number of students in the lower group who
got the item correct from the number of students in the upper group who got the item correct.
Then, divide by the number of students in each group (in this case, there are five in each group).
For Question #1, that means you would subtract 4 from 4, and divide by 5, which results in a
Discrimination Index of 0.
4. The answers for Questions 1-3 are provided in Table 3 below.

Table 3. Item analysis results for Questions 1-3

Question        Correct (Upper)    Correct (Lower)    Difficulty (p)    Discrimination (D)
Question 1      4                  4                  .80               0
Question 2      0                  3                  .30               -0.6
Question 3      5                  1                  .60               0.8

Now that we have the table filled in, what does it mean? We can see that Question #2 had a difficulty
index of .30 (meaning it was quite difficult), and it also had a negative discrimination index of -0.6
(meaning that the low-performing students were more likely to get this item correct). This question
should be carefully analyzed, and probably deleted or changed. Our "best" overall question is Question 3,
which had a moderate difficulty level (.60), and discriminated extremely well (0.8).
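
Both indices can also be computed directly from a table of student results. The Python sketch below is a minimal illustration using the data from Table 2; it assumes the class splits evenly into upper and lower halves by overall score, as in the steps above:

    # Each row: (student, total score, [1/0 results for Questions 1-3]) from Table 2.
    results = [
        ("Asif", 90, [1, 0, 1]), ("Sam", 90, [1, 0, 1]), ("Jill", 80, [0, 0, 1]),
        ("Charlie", 80, [1, 0, 1]), ("Sonya", 70, [1, 0, 1]), ("Ruben", 60, [1, 0, 0]),
        ("Clay", 60, [1, 0, 1]), ("Kelley", 50, [1, 1, 0]), ("Justin", 50, [1, 1, 0]),
        ("Tonya", 40, [0, 1, 0]),
    ]

    results.sort(key=lambda row: row[1], reverse=True)   # highest overall scores first
    half = len(results) // 2
    upper, lower = results[:half], results[half:]

    for q in range(3):
        upper_correct = sum(row[2][q] for row in upper)
        lower_correct = sum(row[2][q] for row in lower)
        difficulty = (upper_correct + lower_correct) / len(results)
        discrimination = (upper_correct - lower_correct) / half
        print(f"Question {q + 1}: p = {difficulty:.2f}, D = {discrimination:+.1f}")

Running this reproduces the values in Table 3 (p = .80/.30/.60 and D = 0/-0.6/0.8).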

Another consideration for an item analysis is the cognitive level that is being assessed. For example, you
might categorize the questions based on Bloom's taxonomy (perhaps grouping questions that address
Level I and those that address Level II). In this manner, you would be able to determine if the difficulty
index and discrimination index of those groups of questions are appropriate. For example, you might note
that the majority of the questions that demand higher levels of thinking skills are too difficult or do not
discriminate well. You could then concentrate on improving those questions and focus your instructional
strategies on higher-level skills.

Activities

Now it's your turn. Try the same procedures using the data in the Item Analysis Worksheet. When you are
finished, you can check your answers by looking at the Answer Sheet.

• Item Analysis Worksheet
• Item Analysis Answer Sheet

Objectives


A. Fill-in-the-Blank Items
B. Essay Questions
C. Scoring Options

Assignments



1. Write effective short answer (fill-in-the-blank) assessment items.
2. Write appropriate essay questions -- both short and extended response.
3. Outline scoring techniques for constructed response assessments.
4. Create a quiz using an online program.

Overview

With selected response assessment items, the answer is visible, and the student needs only to recognize it.
Although selected response items can address the higher levels of Bloom's taxonomy, many of them
demand only lower levels of cognition. With constructed response assessments (also referred to as
subjective assessments), the answer is not visible -- the student must recall or construct it. Constructed
response assessments are conducive to higher level thinking skills.

In the broadest sense, constructed response assessments could refer to almost anything other than
objective quizzes, including essays, art projects, and personal communication. For the purposes of this
lesson, constructed response assessments will focus on written assessments -- short answer items and
essays. Performance assessment and classroom interactions are discussed in other lessons.

A. Fill-in-the-Blank Items

The simplest forms of constructed response questions are fill-in-the-blank or short answer questions. For
example, the question may take one of the following forms:

1. Who was the 16th president of the United States?

2. The 16th president of the United States was ___________________.

These assessments are relatively easy to construct, yet they have the potential to test recall, rather than
simply recognition. They also control for guessing, which can be a major factor, especially for T/F or
multiple choice questions.

When creating short answer items, make sure the question is clear and there is a single, correct answer.
Here are a few guidelines:

• Ask a direct question that has a definitive answer.
• If using fill-in-the-blank, use only one blank per item.
• If using fill-in-the-blank, place the blank near the end of the sentence.

Although constructed response assessments can more easily demand higher levels of thinking, they are
more difficult to score. For example, scantrons (optical grade scanners) cannot score this type of
assessment, and computer-based scoring is difficult because you must include all synonyms and
acceptable answers. For example, all of the following might be acceptable answers to the sample
question: "Who was the 16th president of the United States?" Abraham Lincoln; Abe Lincoln; Lincoln;
President Lincoln; Honest Abe; the Railsplitter. You might also want to accept common misspellings such
as Abrahem or Lencoln (depending on the objective).
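
If the scoring is automated, one common approach is to normalize the student's response and compare it against the list of acceptable answers, optionally tolerating near-miss spellings. The Python sketch below is only an illustration of that approach (the helper name and the similarity threshold are arbitrary choices, not features of any particular quiz tool):

    import difflib

    # Acceptable answers for "Who was the 16th president of the United States?"
    ACCEPTED = {"abraham lincoln", "abe lincoln", "lincoln",
                "president lincoln", "honest abe", "the railsplitter"}

    def is_correct(response, accepted=ACCEPTED, tolerance=0.85):
        """Return True if the response matches an accepted answer closely enough."""
        cleaned = " ".join(response.lower().split())    # normalize case and spacing
        if cleaned in accepted:
            return True
        # Tolerate close misspellings (e.g., "Abrahem Lincoln") via fuzzy matching.
        return any(difflib.SequenceMatcher(None, cleaned, answer).ratio() >= tolerance
                   for answer in accepted)

    print(is_correct("Abe Lincoln"))        # True
    print(is_correct("Abrahem Lincoln"))    # True (close misspelling)
    print(is_correct("George Washington"))  # False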

Activities

Creating an interactive quiz on the web used to involve sophisticated programming. Now, it is easy to
create an online quiz because there are websites that provide the tools. For example, at most of the
websites listed below, all you have to do is fill in the question, fill in the possible answers, mark the
correct answer, and provide the feedback. Then, the test is automatically generated for you and your
students. In some cases, the quiz is stored on the remote site, and your students can access it from school,
home, or anywhere they have an Internet connection. Some of these online test generators also allow the
tests to be printed and used in the traditional hard copy format.

Try out one of the sites below. Find or create a quiz for your class that includes short answer or fill-in-
the-blank items.

• EasyTestMaker -- Easy Test Maker is a FREE online test generator to help you create your tests.
With Easy Test Maker you can create multiple-choice, fill-in-the-blank, matching, short answer
and true and false questions and print out the test.
• Test Maker -- Test Maker is one of the resources at the Big Bus site. It's free and it allows you to
create free-standing, self-marking tests or quizzes which can be saved to disc and used offline as
required.
• Quiz Center -- Quiz Center is a powerful tool you can use to create, administer, and grade quizzes
online. It is free and does not require any Web development experience!

B. Essay Questions

Essay questions are a more complex version of constructed response assessments. With essay questions,
there is one general question or proposition, and the student is asked to respond in writing. This type of
assessment is very powerful -- it allows the students to express themselves and demonstrate their
reasoning related to a topic. Essay questions often demand the use of higher level thinking skills, such as
analysis, synthesis, and evaluation.

Essay questions may appear to be easier to write than multiple choice and other question types, but writing
effective essay questions requires a great deal of thought and planning. If an essay question is vague, it
will be much more difficult for the students to answer and much more difficult for the instructor to score.
Well-written essay questions have the following features:

• They specify how the students should respond.
• They provide information about the value/weight of the question and how it will be scored.
• They emphasize higher-level thinking skills.

Essay questions are used both as formative assessments (in classrooms) and summative assessments (on
standardized tests). There are two major categories of essay questions -- short response and extended
response.

Short Response

Short response questions are more focused and constrained than extended response questions. For
example, a short response might ask a student to "write an example," "list three reasons," or "compare and
contrast two techniques." The short response items on the Florida assessment (FCAT) are designed to take
about 5 minutes to complete and the student is allowed up to 8 lines for each answer. The short responses
are scored using a 2-point scoring rubric. A complete and correct answer is worth 2 points. A partial
answer is worth 1 point.

Sample Short Response Item

How are the scrub jay and the mockingbird different? Support your answer with details and information
from the article.

Extended Response

Extended responses can be much longer and more complex than short responses, but students should be
encouraged to remain focused and organized. On the FCAT, students have 14 lines for each answer to an
extended response item, and they are advised to allow approximately 10-15 minutes to complete each
item. The FCAT extended responses are scored using a 4-point scoring rubric. A complete and correct
answer is worth 4 points. A partial answer is worth 1, 2, or 3 points.

Sample Extended Response Item

Robert is designing a demonstration to display at his school's science fair. He will show how changing the
position of a fulcrum on a lever changes the amount of force needed to lift an object. To do this, Robert
will use a piece of wood for a lever and a block of wood to act as a fulcrum. He plans to move the fulcrum
to different places on the lever to see how its placement affects the force needed to lift an object.

Identify at least two other actions that would make Robert's demonstration better.

Explain why each action would improve the demonstration.

Activities

1. Review the recommendations in Constructed Response: Connecting Performance and Assessment.
2. Select a topic that is relevant to your classroom and write two constructed response items. For each
item, note the alignment to standards, student prompt, scoring guide, and example of a correct
response.

C. Scoring Options

Although essay questions are powerful assessment tools, they can be difficult to score. With essays, there
isn't a single, correct answer and it is almost impossible to use an automatic scantron or computer-based
system. In order to minimize the subjectivity and bias that may occur in the assessment, teachers should
prepare a list of criteria prior to scoring the essays. Consider, for example, the following question and
scoring criteria:

Sample Essay Question

Consider the time period during the Vietnam War and the reasons there were riots in cities and at
university campuses. Write an essay explaining three of those reasons. Include information on the impact
(if any) of the riots. The essay should be approximately one page in length. Your score will depend on the
accuracy of your reasons, the organization of your essay, and brevity. Although spelling, punctuation, and
grammar will not be considered in grading, please do your best to consider them in your writing. (10
points possible)
Scoring Criteria:

• Content Accuracy -- up to 2 points for each accurate reason the riots ensued (6 points total)
• Organization -- up to 3 points for essay organization (e.g., introduction, well expressed points,
conclusion)
• Brevity -- up to 1 point for appropriate brevity (i.e., no extraneous or "filler" information)

No penalty for spelling, punctuation, or grammatical errors.
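
When the same criteria are applied across a whole class, it can also help to record the point caps in a fixed structure so no essay is scored above the stated maximums. The Python sketch below is purely illustrative (the function and field names are invented for this example) and uses the caps from the criteria above:

    # Point caps from the scoring criteria above (10 points possible in total).
    RUBRIC = {"content_accuracy": 6, "organization": 3, "brevity": 1}

    def score_essay(awarded):
        """Sum the points awarded for each criterion, never exceeding its cap."""
        return sum(min(awarded.get(criterion, 0), cap)
                   for criterion, cap in RUBRIC.items())

    # Example: two accurate reasons (4), solid organization (3), concise (1) -> 8 of 10.
    print(score_essay({"content_accuracy": 4, "organization": 3, "brevity": 1}))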

By outlining the criteria for assessment, the students know precisely how they will be assessed and where
they should concentrate their efforts. In addition, the instructor can provide feedback that is less biased
and more consistent. Additional techniques for scoring constructed response items include:

• Do not look at the student's name when you grade the essay.
• Outline an exemplary response before reviewing student responses.
• Scan through the responses and look for major discrepancies in the answers -- this might indicate
that the question was not clear.
• If there are multiple questions, score Question #1 for all students, then Question #2, etc.
• Use a scoring rubric that provides specific areas of feedback for the students.

Detailed information about constructing checklists and rubrics for scoring is provided in the Performance
Assessments lesson.

Activities

One way for teachers and students to become more proficient at writing and scoring essay questions is to
practice scoring the responses of other students. Select one of the three programs below (the one closest to
your grade level) and practice applying the rubric to score the constructed responses. Feedback is
provided, as well as links to the corresponding rubric.

• FCAT Reading 4: A Staff Development Tool -- Select "Rubric Scoring" on the Main Menu
• FCAT Reading 8: A Staff Development Tool -- Select "Practice Activities" on the Main Menu
• FCAT Reading 10: A Staff Development Tool -- Select "Practice Activities" on the Main Menu
