
ITEM WRITING AND ITEM FORMATS
•Objectives
•1. To outline the steps taken in developing test items.
•2. To discuss the different item formats, i.e. dichotomous, polychotomous, Likert, checklists, Q-sorts and the category scale, and their advantages and disadvantages.
Introduction
• Items are the specific questions or problems that make up a test (Kaplan & Saccuzzo, 2009).
• An item is a specific stimulus to which a person responds overtly (i.e. the response can be observed) and which can be scored or evaluated, for example on a scale or as a grade. A score of 75% on a 100-item test means the individual answered 75 items correctly.
• A test is a measurement device or technique used to quantify behaviour or to help in understanding and predicting behaviour. It can also be described as a collection of items.
Steps of Item Writing
1. Define clearly what you want to measure.
• Most often, it will fall into one of these areas:
• A type of cognitive achievement: either knowledge or a skill. An example of knowledge is ‘knowledge of Ugandan history’; an example of a skill is ‘demonstration of an ability to multiply decimals’.
• A type of affective trait: for example, interest in psychology.
• The items should be made as specific as possible.
2. Generate an item pool.
• The item developer should take care in selecting and developing items and should avoid redundant items. To end up with the required number of items, one may need to write 3-4 items for each item wanted in the final test. For example, to produce a 20-item test, one may generate a pool of 60-80 items.
3. Avoid exceptionally long items.
• Exceptionally long items tend to be misleading or confusing, so they should be avoided.
4. Keep the level of reading difficulty appropriate for those who will complete the scale.
• It is important to be mindful of the reading level of the targeted test takers. If, for example, the item developer is writing for nursery school children, the items should be in line with their capability. Otherwise they will not understand the items and will fail the test for that reason alone.
5. Avoid double-barrelled items, which convey two or more ideas at the same time.
• Double-barrelled items may confuse test takers, who cannot decide whether to agree or disagree with the statement. This eventually affects the results of the test. An example: Indicate whether you agree or disagree with the statement “I vote NRM because I support Universal Secondary Education”. This contains two different statements, “I vote NRM” and “I support Universal Secondary Education”. Someone can agree with one but not the other, or vice versa.
6. Consider mixing positively and negatively worded items.
 At times, the test takers may develop the ‘acquiescence response set’ where
they tend to respond positively to all items. To avoid this bias, you may
include items that are worded in the opposite direction.
For example;
I. “I feel tired”.
II. “I feel energised”.
7. When writing test items, you need to be sensitive to the cultural and
ethnic differences.
 For example, if you are writing items for a religious population, it may not be
appropriate to write items reflecting mannerisms that may be offensive to
them like – alcohol drinking, eating certain foods that may be taboo to them,
etc.
8. It is important to realise that items become outdated. When they become outdated, they lose reliability.
• When items are used over a long period, they tend to lose reliability. Hence the need to check that they are still reliable at the time they are used.
Other general guides for item writing
1. “All of the Above” should not be an answer option
2. “None of the Above” should not be an answer option
3. All answer options should be credible
4. Order of answer options should be logical or vary
5. Items should cover important concepts and objectives
6. Negative wording should not be used
7. Answer options should include only one correct answer
8. Specific determiners (e.g. always, never) should not be used
9. Answer options should be homogeneous
10. Items should be independent of each other
11. Test copies should be clear, readable and not hand-written
But I don’t know how to write items!!
• Writing items gets easier with practice.
• Don’t be frustrated if you find it challenging; most people do.
• It is much easier once the objectives have been written.
Knowledge-Based Items:
So how do I write an item to measure an objective?
• Example:
1. Aliya is going to a job fair on campus where she plans to meet with several social work agencies that are hiring. Which of the following is an appropriate question for her to ask her interviewer?
A. What is the salary for a first year employee?
B. What are the job expectations you have for a new employee?
C. What kind of benefits do you offer to new employees?
D. What is your political party affiliation?
Guidelines for
Knowledge-Based Items
 The guidelines can be separated into 4 specific categories:

1. Language/Wording
2. Complexity
3. Item Patterns
4. Directions
Language/Wording
1. Avoid bias (e.g., age, ethnicity, gender or disabilities).
2. Avoid “loading” the questions by unintentionally incorporating your own opinions into the items. This can bias the results and is particularly important with attitudinal items.
• Example of “loading”:
Is there any reason to keep this program?
Yes No
• Revised:
Does this program offer material not obtained in other courses?
Yes No
Language/Wording
3. Avoid unclear or ambiguous wording.
• Example of unclear wording:
Students that are not taking a psychology course but are taking a psychology internship but have not yet fulfilled the requirements of their practical experience should come to the seminar but not attend the training workshop.
True False
4. Avoid the use of always, never, constantly.
Language/Wording
5. Avoid double-barreled items (i.e., items that express two or more ideas).
• Example of a double-barreled item:
I like to work out at the sports complex at least 3-5 times a week.
Yes No
• If one disagrees with this statement, with which part does he or she disagree? Does the person dislike working out at the sports complex? Does the person not like to work out at all? Does he or she work out only 2 days a week instead of 3-5?
• Revised:
I like to work out. Yes No
I like the sports complex. Yes No
I like working out at the sports complex. Yes No
Language/Wording
6. Avoid giving clues to the correct response through wording.
• Example of giving clues:
When the two main characters in the book went outside they tossed around an
A. baseball
B. tomato
C. apple
D. football
• Revised:
When the two main characters in the book went outside they tossed around a(n)
Complexity
1. Be brief. Long, complex items can lead to confusion.
2. Use the appropriate reading level for the population that you are testing.
3. Be clear. Items should challenge the test-taker’s knowledge, not the ability to uncover hidden meanings.
Item Patterns
1. Make sure that items are unrelated and do not give hints about the correct
answer to other items.

2. Vary the position of the correct answer among the response options
(i.e., don’t have the correct answer most often in the “b” location).

To aid in this, make a frequency table of the number of times each response position holds the correct answer, and try to use each option an equal number of times.

Frequency Table
Correct Answer    # of times used
A                 10
B                 9
C                 11
D                 10
Total             40 (40 items)
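The frequency table above can be produced directly from an answer key. The following is a minimal sketch, assuming the key is stored as a list of option letters; the function name `answer_position_counts` and the 40-item key are hypothetical and only mirror the counts in the table.

```python
from collections import Counter


def answer_position_counts(answer_key, options="ABCD"):
    """Count how often each option letter is the keyed (correct) answer."""
    counts = Counter(answer_key)
    # Report every option, including any that is never the correct answer.
    return {option: counts.get(option, 0) for option in options}


# Hypothetical 40-item answer key with the same counts as the table.
answer_key = ["A"] * 10 + ["B"] * 9 + ["C"] * 11 + ["D"] * 10
table = answer_position_counts(answer_key)

for option, count in table.items():
    print(f"{option}: {count}")
print("Total:", sum(table.values()))
```

An option whose count is far above or below the others is a signal to move some correct answers to a different position.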
Directions
1. All directions should be brief and easy to understand.
2. The use of bold type or a different font may help set off the directions from
the items by getting the test-taker’s attention.
3. Test directions:
 Refer to all items.
 May be read out-loud by the person administering the test.
 Should be placed at the top of the test.
 Include general information about the testing session and approximately
how long the test will last.
4. Item directions:
 Give specific directions about how to complete the items.
 Should be placed directly above the related section if the type of item
changes.
Close-ended Format
 Examinee selects the correct answer from
choices provided.

 We will review:
1. Multiple-Choice
2. True-False
3. Matching
Multiple-Choice Items
• For the bread, cereal, rice and pasta group, the food pyramid recommends (stem)
a. 2-3 daily servings (distractor)
b. 3-5 daily servings (distractor)
c. 4-6 daily servings (distractor)
d. 6-11 daily servings (correct answer)
Multiple-Choice Items
1. Make answer choices brief, not repetitive.
2. Offer 3-5 answer choices. The aim is to provide variability in responses; however, it should be noted that the “optimal” number of answer choices has been debated.
3. Write stems and answers in the third person, rather than the second person.
• Example in the second person:
“What might happen if you get caught drinking on campus?”
• Revised in the third person:
“What might happen if a person gets caught drinking on campus?”
Multiple-Choice Items
4. Use italics and boldface appropriately.
• Italics should be used for names of books, journals, plays, poems, and films.
• Boldface should be used for emphasizing negatives.
• Examples:
• In the novel The Catcher in the Rye, what was the main character’s name?
• Which of the following is not an option when you receive a letter from Judicial Affairs?
Multiple-Choice Distractors
 Distractors are alternative, incorrect answer choices for multiple-
choice items.
Distractors should:
1. be related to the question and similar to the correct answer choice.
2. not be easily identified as wrong choices.
3. be written in a similar style and manner (i.e. not noticeably longer or
shorter) as the correct choice.
4. attract examinees who have some knowledge of the material, but
who have not yet fully comprehended all of the subject matter.
Multiple-Choice Items
5. Be consistent when using “which” and “what” in the question stem.
• “What” typically indicates that there is only one right answer for the question.
• “Which” usually refers to “which of the following”.
• Examples:
• What does SC stand for?
There is only one correct answer to this question.
• Which is a group fitness program offered by SC?
There are several programs offered, but only one should be listed as an answer choice.
Distractors Should:
5. avoid the overuse of “all of the above”, “none of the above” or combinations such as “A and B” as options. There are several reasons for avoiding their use:
• It is tempting to overuse them because they are easy to write.
• Students with partial knowledge of the question may be able to answer correctly by process of elimination. For example, a student who knows that two out of three response options are correct can infer that “all of the above” is the answer.
• These items may make it harder to discriminate between those who fully know the subject matter and those who do not.
True-False Items
• There are two types of true-false items:
1. right-wrong
2. yes-no.
Guidelines for Writing True-False Items:
1. Give clear directions for answering questions. Follow the rules outlined under General Guidelines.
2. Write items such that they do not alert students as to the correct answer. Vary the length of both true and false items to avoid this mistake.
3. Avoid using negatively worded items, as they tend to lead to confusion. If the statement has to be negatively worded, bold the negative word.
True-False Item Examples
Example:
Instructions
Please indicate whether you think the following statements for items
1-10 are True or False by circling the right answer.
Right-Wrong Question
Students are allowed to choose which dorm they want to live in on
campus.
True False

Yes-No Question
If a student is placed in a temporary triple room, are they
guaranteed to be reassigned to a double room?
Yes No
Matching Items
• Matching items involve a set or list of related ideas and responses. This format is often used when assessing a great deal of specific facts.
Guidelines for writing matching items:
1. Provide specific directions regarding how and where to mark answers.
2. Limit the use of matching items. They often involve lower levels of processing.
3. To decrease the likelihood of guessing, provide more response options than items in the list (e.g., offering eight response options for a list of five items).
Matching Items
Guidelines Continued:
5. Make the list of responses or options uniform in type. In the example given below, all of the responses are types of group fitness classes; no other option types are listed.
6. Keep the number of choices reasonable (e.g., between 5 and 12 options). When lists become long, examinees have to spend too much time searching for answers.
7. List response options in a set order. Words should be listed in alphabetical order; dates and numbers can be arranged in either ascending or descending order. This makes it easier for an examinee to search for the correct answer.
Matching Example
Instructions:
Match the statements in items 1-5 with the appropriate group fitness class (A-F) they describe from the list provided to the right.

1. Workout which involves dancing                        A. Athletic Box
2. Class includes aerobic moves at a moderate level      B. Cardio Blast
3. Workout includes drills and strength training         C. Cycle Fit
4. Workout involves boxing and athletic drills           D. Energy Circuit
5. Class involves flexibility and relaxation             E. Funk
                                                         F. Yoga Fitness

Note: the response options are listed in alphabetical order, and there are more response options than items.
Guidelines
Guidelines for writing short-answer and sentence completion items:
1. The item should be written so that the examinee infers that there is only one answer that is reasonable.
2. Phrase the item so that the examinee knows that the answer should be concise.
3. Avoid items with several blanks in the sentence, which may be confusing or unclear.
Example:
At Warren Hall and __________ on campus and students can go to ___________.
Revised:
List three places where students can go to find out what movies are playing on campus this semester.
Guidelines Continued:
4. Place blanks near the end of the sentence to minimize confusion.
5. Make blanks the same length and long enough for the longest answer to fit. If the blanks vary in length, examinees may be able to determine the answer based on the length of the line.
• Example:
If problems with a roommate arise, students should contact their _________________.
Example

 Example of an objective:
 Students should be able to identify key employability traits desired by employers.

 How can we write items using the various formats for this objective?

 Break into groups and write items for the objective using the various formats:
 Multiple choice
 True/False
 Matching
Example
• Multiple-Choice:
• Which of the following is the trait deemed MOST important by MOST employers?
A. Interpersonal Skills
B. Leadership
C. Time Management
D. Flexibility
• True-False:
• Employers want new employees that focus on the solo completion of tasks.
True False
Example
Matching:

1. Working with others on a project                                   A. Flexibility
2. Working overtime to complete a project                             B. Interpersonal Skills
3. Creating and documenting procedures for writing up a report        C. Leadership
4. Completing project on time                                         D. Organization
5. Supervising the progress of a group of coworkers                   E. Reliability
                                                                      F. Teamwork
                                                                      G. Time Management
Example
• Now what if the objective were:
• Students should be able to illustrate how their life and educational experiences have helped them develop key employability traits.
• This objective will require a higher-level question because it is a more complex objective.
• Example of a short-answer question:
• How have this course and your prior work experiences prepared you for a career in your desired field?
Example
• Now what if the objective were:
• Students should be able to demonstrate key employability traits.
• This type of objective requires a performance assessment to evaluate
Questions

 Any questions about knowledge-based items?

 Questions about different formats and guidelines?


Attitudinal/Developmental Items
• Measure self-reported feelings or interests
• No right or wrong answers
• This description will only include close-ended formats.
• Example Attitudinal Item:
I learned a great deal about my personal preferences from the Career and Life Planning Course.
Strongly Disagree    Disagree    Neither Agree Nor Disagree    Agree    Strongly Agree
• Example Developmental Item:
I have carefully thought out opinions regarding my place in the world.
Strongly Disagree    Slightly Disagree    Slightly Agree    Strongly Agree
Attitudinal/Developmental Items
Guidelines for writing attitudinal/developmental items:
1. As much as possible, statements and questions should be written in the present tense. Items written in past or future tense may result in confusion and mistaken responses.
2. Statements should be clearly written and have only one meaning or interpretation.
3. Instructions should mention that there are no right or wrong answers.
4. Statements and questions should be short (no more than 20 words) and simple.
Guidelines Continued:
5. Try to avoid the use of the words “if” or “because”, which may complicate the sentence.
6. Avoid use of the following words: not, none, never, all or always.
7. Response categories should not overlap.
8. Individual items should focus on one idea.
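The mechanical parts of these guidelines (the 20-word limit and the lists of words to avoid) can be screened automatically before a human review. The sketch below is only an illustration: the function name `check_item`, the word-splitting rule and the constants are assumptions, and a clean result does not mean an item is well written.

```python
import re

# Words the guidelines suggest avoiding, and the suggested length limit.
DISCOURAGED = {"if", "because", "not", "none", "never", "all", "always"}
MAX_WORDS = 20


def check_item(statement):
    """Return a list of guideline warnings for one attitudinal statement."""
    # Assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[a-z']+", statement.lower())
    warnings = []
    if len(words) > MAX_WORDS:
        warnings.append(f"{len(words)} words (limit {MAX_WORDS})")
    flagged = sorted(set(words) & DISCOURAGED)
    if flagged:
        warnings.append("discouraged words: " + ", ".join(flagged))
    return warnings


print(check_item("I vote NRM because I support Universal Secondary Education"))
print(check_item("I like to go on hikes."))
```

The first call flags “because”, which is also the marker of the double-barrelled item discussed earlier; the second returns an empty list.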


Types of Attitudinal/Developmental Items
1. Checklist/Multiple-Response
2. Ranking Scales
3. Ordered Scales
Checklist
Example Checklist:
From the following list of events being considered by the University Program Board, please check off all of the events that you would consider attending.
• Comedy events                 • Multicultural events
• Formal dances                 • Musical events
• Halloween costume contest     • Plays
• International film festival   • Poetry readings
• Movies                        • Talent show
Multiple Response
Example Multiple Response:
Which of the following on-campus movie showings do you attend? Check all that apply.
• Evening Shows
• Afternoon Shows
• I do not attend on-campus movies
Ranking Scales

 Ranking scales are used to order or rate things as they relate to one another.

 A respondent rank-orders a list based upon his or her attitude regarding the
topics listed.

 Limit rankings to no more than five items. When there are too many items to
be ranked, it may get confusing and respondents may misnumber the items.
Ranking Scales
Example of a Ranking Scale:
Please rank the following statements in order from 1 (highest) to 4 (lowest) according to how well your advisor succeeds in accomplishing the task.
• Assists student in developing realistic goals
• Refers student to available resources
• Assists student in planning an appropriate program
• Monitors student’s academic progress
Ordered Scales
• Ordered scales are composed of items that are combined to yield a score that provides a measurement of one’s attitude concerning a particular construct.
Types of ordered scales:
1. Likert-type
2. Frequency
3. Satisfaction
4. Rating
5. Intensity
6. Comparison
7. Influence scales
Ordered Scales
1. Likert-type item:
• I like to go on hikes.
Strongly Disagree    Disagree    Neither Agree Nor Disagree    Agree    Strongly Agree
2. Frequency item:
• I enjoy going to the gym.
Never    Rarely    Sometimes    Frequently    Always
Ordered Scales
3. Satisfaction
• Are you pleased with the courses being offered?
Very Satisfied    Satisfied    Neither Satisfied Nor Dissatisfied    Dissatisfied    Very Dissatisfied
4. Rating
• I thought the concert was…
Excellent    Good    Fair    Poor
Ordered Scales
5. Intensity
• The day after I drink, the side effects I experience are:
None    Mild    Moderate    Severe    Very Severe
6. Comparison
• When I go out, I drink…
Much more than others    Somewhat more than others    About the same as others    Somewhat less than others    Much less than others
Ordered Scales
7. Influence
• I think that my drinking behavior is…
Very big problem    Big problem    Moderate problem    Small problem    Very small problem    No problem
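Mistakes
• Pay careful attention to what you are asking the students to respond to with ordered scales.
• For example, DON’T pair an item of one type with the response options of another, such as an intensity item with satisfaction response options.
• Example of a mismatch (a statement about course design paired with intensity response options):
The course assignments were tailored to the course material.
None    Mild    Moderate    Severe    Very Severe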
Item formats
• Different item formats are used for different purposes. The format used for evaluating attitudes may not be the same as the one used for assessing personality. Each format is chosen based on its particular pros and cons.
a. Dichotomous Format
• This format offers two alternatives for each item. A point is awarded when the test taker selects the keyed (correct) alternative. A common dichotomous test is the True-False examination, in which the test taker must choose either true or false, but not both, for each item. Other response pairs in this format include “Yes” or “No”.
An example of dichotomous items:
Advantages of the Dichotomous Format
1. It is easy to construct and administer.
2. It is easily scored. The tester only needs to count the number of correct items to get the score.
3. True-false items require absolute judgement. The test taker cannot choose anything in between.
Disadvantages of the Dichotomous Format
1. They encourage students to memorise material, so that students may pass the test even when they have not really understood the concepts.
2. Dichotomous items tend to be less reliable than other item formats, because a test taker has a 50% chance of answering any item correctly by guessing alone, without understanding the content of the item.
b. The Polychotomous (Polytomous) Format
This resembles the dichotomous format except that each item has more than two alternatives.
A point is given for selecting the correct alternative but not for selecting any other choice.
In a polychotomous examination, the test taker has to determine which alternative is correct. Incorrect alternatives are called distractors.
According to psychometric theory, adding more distractors increases the reliability of the item. It is usually best to have 3-4 distractors for this purpose. However, poorly written distractors may lower the quality of the test.
Unlike the dichotomous format, where guessing gives a 50% chance of success, in the polychotomous format the chance of success depends on the number of choices per item. With four choices, the chance of a correct guess is one in four (25%); with three choices, it is one in three (33.3%).
Some test takers can therefore get items correct simply by guessing, even if they have not studied the subject matter.
Because of this, a correction for guessing is applied.
The formula to correct for guessing on a test is:
$$\text{Corrected score} = R - \frac{W}{n - 1}$$
where
R = the number of right responses
W = the number of wrong responses
n = the number of choices for each item
Take an example of a test with 100 items of 4 choices each, on which the test taker guesses on every item. The expected score from guessing alone is a quarter of the 100 items, so R is expected to be 25, the number of wrong responses is W = 100 - 25 = 75, and n = 4.
Using the formula above:
$$\text{Corrected score} = 25 - \frac{75}{4 - 1} = 25 - \frac{75}{3} = 25 - 25 = 0$$
• So when the correction for guessing is applied, the corrected score is 0.
An example:
• Ali took a psychological test with 100 items, each with four answer choices. He answered 88 items correctly and was pronounced to have passed the test. What is Ali’s score after correction for guessing?
• From the formula, $\text{Corrected score} = R - \frac{W}{n - 1}$
• R is observed to be 88 of the 100 items, W = 12 and n = 4
• $\text{Corrected score} = 88 - \frac{12}{4 - 1} = 88 - \frac{12}{3} = 88 - 4 = 84$
So Ali’s corrected score is 84.
• Omitted items are not included. They earn neither credit nor penalty.
• The expression $\frac{W}{n - 1}$ is an estimate of the number of responses the test taker is expected to get right by chance.
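The correction for guessing is simple enough to check in a few lines of code. This is a minimal sketch of the formula as stated, with R, W and n passed in as `right`, `wrong` and `n_choices`; the function name `corrected_score` is hypothetical, and omitted items are simply left out of both counts.

```python
def corrected_score(right, wrong, n_choices):
    """Correction for guessing: R - W / (n - 1).

    Omitted items are counted in neither `right` nor `wrong`.
    """
    if n_choices < 2:
        raise ValueError("each item needs at least two choices")
    return right - wrong / (n_choices - 1)


# Pure guessing on 100 four-choice items: 25 right, 75 wrong.
print(corrected_score(25, 75, 4))  # 0.0

# Ali's example: 88 right, 12 wrong, four choices per item.
print(corrected_score(88, 12, 4))  # 84.0
```

For a dichotomous test (n = 2) the same function reduces to R - W, so every wrong answer cancels one right answer.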
• Advantages of the polychotomous format
• It takes little time for test takers to respond, since they do not write out the answers. Hence one can respond to a large number of items in a short time.
• The tests are easy to score. The tester only counts the correct items to get the score.
• Disadvantages of the polychotomous format
• A correct answer may be selected by guessing alone.
c. The Likert Format
• This format requires a respondent to indicate the degree of agreement with a particular attitudinal statement.
• It is very popular in personality and attitude scales.
• The scale is non-comparative and measures only a single trait. The respondent indicates their level of agreement with a given statement on an ordinal scale.
• It is usually expressed as a four-, five- or six-point scale, for example: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree. Scales with an even number of points (four or six) leave out the neutral midpoint, so the respondent cannot choose to be neutral.
Advantages
1. It is easy to construct.
2. It produces a highly reliable scale.
3. It is easy for test takers to read and complete.
Weaknesses of this scale include:
1. Central tendency bias: participants may avoid extreme response categories.
2. Acquiescence (acceptance) bias: participants may agree with statements as presented in order to please the tester.
3. Social desirability bias: respondents may wish to portray themselves in a more favourable light rather than being honest.
4. Validity may be difficult to demonstrate: the scale may not represent what the tester intended to measure.
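When a Likert scale mixes positively and negatively worded items, as recommended in the item-writing steps, the negatively worded items are usually reverse-scored before the responses are summed. The sketch below illustrates that common practice and is not something the slides prescribe: it assumes responses are coded 1 (Strongly Disagree) to 5 (Strongly Agree), and the function name `score_likert` and the item “I feel alert” are hypothetical.

```python
def score_likert(responses, reverse_keyed, points=5):
    """Sum Likert responses coded 1..points, flipping reverse-keyed items."""
    total = 0
    for item, value in responses.items():
        if not 1 <= value <= points:
            raise ValueError(f"{item}: response {value} outside 1..{points}")
        # A reverse-keyed response r becomes (points + 1 - r), e.g. 2 -> 4.
        total += (points + 1 - value) if item in reverse_keyed else value
    return total


# Assumed coding: 1 = Strongly Disagree ... 5 = Strongly Agree.
responses = {"I feel energised": 4, "I feel tired": 2, "I feel alert": 5}
print(score_likert(responses, reverse_keyed={"I feel tired"}))  # 4 + 4 + 5 = 13
```

Without the reversal, disagreeing with “I feel tired” would lower the total even though it points in the same direction as agreeing with “I feel energised”.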
d. The Category Format
• It is similar to the Likert format but uses an even greater number of choices.
• Although it may seem similar to the Likert format, the category scale uses a defined point rating system.
• Test takers are required to rate a given item or scenario on a numbered scale, for example 1 to 5 or 1 to 10, where 1 is the lowest score and 5 or 10 is the highest.
• The numbers assigned on a rating scale are sometimes influenced by the context in which the items are rated.
• The number of categories used depends on the fineness of the discrimination that the test takers are willing to make. The finer the discrimination required, the more categories are used.
• An example:
• 1. On a scale of 1 to 5, rate Ali’s attitude towards class assignments (where 1 is very negative and 5 is very positive).
• 2. On a scale of 1 to 10, rate the level of academic excellence of Makerere University (where 1 is very ordinary and 10 is very competitive).
• Advantages
• It is very easy to administer.
Disadvantages
• Ratings depend on the context in which the subject is rated. For example, in a class of averagely performing students, a student may be rated 9, which represents a very good performer. Yet if the same student is placed in a class of only highly performing students, he or she may be rated 3, which represents a relatively poor performance.
• Also, test takers have a tendency to spread their responses evenly across the entire scale of 1 to 10, which may not fairly represent the actual score.
• To overcome these problems, the end points of the scale have to be clearly defined by outlining the expected characteristics of each point (Kaplan & Ernst, 1983).
• For example, when rating the performance of students in a given class, a student who scores 10 must:
• attend all classes; contribute to every question asked in class; solve problems fast; assist others to complete their class work; regularly pass class tests with over 80%.
• Conversely, the opposite characteristics describe a student scoring 1.
Checklists
• These are used in personality measurement.
• The test taker is given a list of adjectives or statements and asked to indicate whether each is characteristic of him/herself or of someone else.
• For example:
• Castro…
• Is a dependable person.
• Is a talkative individual.
• Behaves in a sympathetic or considerate manner.
• Appears to have a high degree of intellectual capacity.
• Is protective of those close to him.
• Tends to be self-defensive.
• Is thin-skinned; sensitive to anything that can be construed as criticism.
Q-Sorts
• A test taker is given a set of statements about personal characteristics and asked to sort them into a given number of piles, e.g. 5 or 9 piles.
• The statements are sorted into piles that indicate the degree to which they appear to describe a given person accurately.
• Piles numbered 1 to 9 are provided, and the test taker places each statement card on the pile number that best reflects how well it describes the person being studied.
• Here, a rating of 9 means that the statement on the card is the best description of the person being studied, while 1 means it is the least descriptive.
• For example, 100 statements about a person’s characteristics are written on cards, one statement per card, making 100 cards.
• The cards are distributed across the 9 piles according to the test taker’s judgement of the person being studied.
• The number of cards placed on each pile is recorded, and the statements that best describe the person under study are noted.
• The pile frequencies tend to follow a normal distribution. However, the items placed at the extreme ends of the continuum say the most about the personal characteristics of the subject.
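The bookkeeping in a nine-pile sort (counting the cards on each pile and picking out the statements at the top end) can be sketched in a few lines. This is an illustration only: the data layout (a mapping from statement to pile number), the function names and the four sample placements are assumptions, with pile 9 taken as the most descriptive end as described above.

```python
from collections import Counter


def pile_frequencies(placements, n_piles=9):
    """Count cards per pile; placements maps statement -> pile number."""
    counts = Counter(placements.values())
    return [counts.get(pile, 0) for pile in range(1, n_piles + 1)]


def most_descriptive(placements, n_piles=9):
    """Statements sorted onto the highest pile (most characteristic)."""
    return sorted(s for s, pile in placements.items() if pile == n_piles)


# Hypothetical placements for a few statements from the checklist example.
placements = {
    "Is a dependable person": 9,
    "Is a talkative individual": 5,
    "Tends to be self-defensive": 1,
    "Is protective of those close to him": 9,
}
print(pile_frequencies(placements))  # [1, 0, 0, 0, 1, 0, 0, 0, 2]
print(most_descriptive(placements))
```

With a full deck of 100 cards, the list returned by `pile_frequencies` is what would be inspected for the roughly normal shape mentioned above.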
