ITEM FORMATS
Objectives
1. To outline the steps taken in developing test items.
2. To discuss the different item formats (dichotomous, polychotomous, Likert, checklists, Q-sorts and the category scale) and their advantages and disadvantages.
Introduction
Items are the specific questions or problems that make up a test (Kaplan & Saccuzzo, 2009).
An item is a specific stimulus to which a person responds overtly (i.e. the response can be observed) and which can be scored or evaluated, for example on a scale or as a grade. A score of 75% on a 100-item test, for instance, means that the individual answered 75 items correctly.
A test is a measurement device or technique used to quantify behaviour or to help in understanding and predicting behaviour. It can also be described as a collection of items.
Steps of item writing
1. Define clearly what you want to measure.
Most often, it will fall into one of these areas:
A type of cognitive achievement: this can be either knowledge or a skill. An example of knowledge is ‘knowledge of Ugandan history’; an example of a skill is ‘demonstration of an ability to multiply decimals’.
A type of affective trait: for example, interest in psychology.
The items should be made as specific as possible.
2. Generate an item pool.
The item developer should take care in selecting and developing items, and should avoid redundant items. To end up with the required number of items, one may need to write 3-4 items for each item that will appear in the final test. For example, if you wish to have 20 items in your test, you may generate a pool of 60-80 items.
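The pool-size rule of thumb above is simple arithmetic, and can be sketched as follows. The function name and its parameters are hypothetical, chosen only for illustration; the 3-4 ratio comes from the step above.

```python
def item_pool_range(final_items, low_ratio=3, high_ratio=4):
    """Return the (smallest, largest) suggested pool size for a test.

    Assumes the rule of thumb of writing 3-4 draft items for every
    item that should survive into the final test.
    """
    return final_items * low_ratio, final_items * high_ratio


smallest, largest = item_pool_range(20)
print(f"Write between {smallest} and {largest} draft items.")
```

The ratios are left as parameters because a developer may want a larger or smaller margin for discarded items.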
3. Avoid exceptionally long items.
Exceptionally long items tend to be misleading or confusing, so they should be avoided.
4. Keep the level of reading difficulty appropriate for those who will complete the scale.
It is important to be mindful of the level of reading difficulty of the targeted test takers. If for example the item
developer is writing for nursery school children, the items should be in line with the capability of the targeted test
takers. If this is not done, they will not understand the test and will therefore fail the test.
5. Avoid double-barrelled items that convey two or more ideas at the same time.
Double-barrelled items may end up confusing the test takers, since they may be unable to decide whether to agree or disagree with the statement. This will eventually affect the results of the test. An example is: Indicate whether you agree or disagree with the statement “I vote NRM because I support Universal Secondary Education”. These are two different statements: “I vote NRM” and “I support Universal Secondary Education”. Someone can agree with one but not the other, or vice versa.
6. Consider mixing positively and negatively worded items.
At times, the test takers may develop the ‘acquiescence response set’ where
they tend to respond positively to all items. To avoid this bias, you may
include items that are worded in the opposite direction.
For example;
I. “I feel tired”.
II. “I feel energised”.
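When positively and negatively worded items are mixed, the negatively worded ones are commonly reverse-scored before totals are computed, so that a high number means the same thing on every item. The sketch below is a minimal illustration, not a procedure taken from these notes. It assumes a 1-5 agreement scale and made-up responses; `reverse_score` and the variable names are hypothetical.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response so that high values point the same way on every item."""
    return scale_max + scale_min - response


# Made-up responses on an assumed 1-5 agreement scale.
responses = {"I feel tired": 4, "I feel energised": 2}

# Assumed scoring direction: a high total means more energy,
# so the "tired" item is the one worded in the opposite direction.
negatively_worded = {"I feel tired"}

scored = {
    item: reverse_score(value) if item in negatively_worded else value
    for item, value in responses.items()
}
total_energy = sum(scored.values())
```

A respondent who simply agrees with everything would score high on one item and low on the other after reversal, which makes an acquiescence response set easier to spot.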
7. When writing test items, you need to be sensitive to the cultural and
ethnic differences.
For example, if you are writing items for a religious population, it may not be
appropriate to write items reflecting mannerisms that may be offensive to
them like – alcohol drinking, eating certain foods that may be taboo to them,
etc.
8. It is important to realise that items become outdated. When they become outdated, they lose
reliability.
When items are used over a long period, they tend to lose reliability. Hence the need to check that they are still reliable at the point when they are to be used.
Other general guides for item writing:
1. “All of the Above” should not be an answer option.
2. “None of the Above” should not be an answer option.
3. All answer options should be credible.
4. The order of answer options should be logical or should vary.
5. Items should cover important concepts and objectives.
6. Negative wording should not be used.
7. Answer options should include only one correct answer.
8. Specific determiners (e.g. always, never) should not be used.
9. Answer options should be homogeneous.
10. Items should be independent of each other.
11. Test copies should be clear, readable and not hand-written.
But I don’t know how to write items!!
Example:
1. Language/Wording
2. Complexity
3. Item Patterns
4. Directions
Language/Wording
Example of “loading”:
Is there any reason to keep this program?
Yes No
Revised:
Does this program offer material not obtained in other courses?
Yes No
Language/Wording
3. Avoid unclear or ambiguous wording.
5. Avoid double-barreled items (i.e., items that express two or more ideas).
2. Use the appropriate reading level for the population that you are
testing.
2. Vary the position of the correct answer among the response options
(i.e., don’t have the correct answer most often in the “b” location).
To aid in this, make a frequency table of the number of times each response option is the correct answer, and try to use each option an equal number of times.
Frequency Table
Correct Answer    # of times used
A                 10
B                 9
C                 11
D                 10
Total             40 (40 items)
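A frequency table like the one above can be tallied directly from the answer key. The following is a minimal sketch with a made-up eight-item key. The tolerance of one item either side of an even split is an arbitrary assumption, not a rule from these notes.

```python
from collections import Counter

# Made-up answer key in which "B" is clearly overused.
answer_key = ["B", "B", "A", "B", "C", "B", "D", "B"]
options = "ABCD"

# Count how many times each option is the correct answer.
position_counts = Counter(answer_key)
for option in options:
    print(option, position_counts[option])

# Flag options whose count differs from an even split by more than one item
# (the tolerance of 1 is an assumption made for this example).
expected = len(answer_key) / len(options)
unbalanced = [opt for opt in options if abs(position_counts[opt] - expected) > 1]
print("Options to rebalance:", unbalanced)
```

With this key only "B" is flagged, which is the pattern the guideline warns against.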
Directions
1. All directions should be brief and easy to understand.
2. The use of bold type or a different font may help set off the directions from
the items by getting the test-taker’s attention.
3. Test directions:
Refer to all items.
May be read aloud by the person administering the test.
Should be placed at the top of the test.
Include general information about the testing session and approximately
how long the test will last.
4. Item directions:
Give specific directions about how to complete the items.
Should be placed directly above the related section if the type of item
changes.
Close-ended Format
Examinee selects the correct answer from
choices provided.
We will review:
1. Multiple-Choice
2. True-False
3. Matching
Multiple-Choice Items
For the bread, cereal, rice and pasta group, the food
pyramid recommends (stem)
a. 2-3 daily servings (distractor)
b. 3-5 daily servings (distractor)
c. 4-6 daily servings (distractor)
d. 6-11 daily servings (correct answer)
Multiple-Choice Items
Examples:
In the novel The Catcher in the Rye, what was the main character’s name?
Which of the following is not an option when you receive a letter from Judicial Affairs?
Multiple-Choice Distractors
Distractors are alternative, incorrect answer choices for multiple-
choice items.
Distractors should:
1. be related to the question and similar to the correct answer choice.
2. not be easily identified as wrong choices.
3. be written in a similar style and manner (i.e. not noticeably longer or
shorter) as the correct choice.
4. attract examinees who have some knowledge of the material, but
who have not yet fully comprehended all of the subject matter.
Multiple-Choice Items
Examples:
What does SC stand for?
Obviously there’s only one correct answer for this question.
Which is a group fitness program offered by SC?
There are several programs offered, but only one should be listed as an answer
choice.
Distractors Should:
5. avoid overusing “all of the above”, “none of the above” or combinations such as “A and B” as options. There are several reasons for avoiding their use.
Yes-No Question
If a student is placed in a temporary triple room, are they
guaranteed to be reassigned to a double room?
Yes No
Matching Items
Matching items involve a set or list of related ideas and responses.
This format is often used when assessing a large number of specific facts.
2. Limit the use of matching items. They often involve lower levels of
processing.
5. Make the list of responses or options uniform in type. In the example given
below, all of the responses are types of group fitness classes; no other option
types are listed.
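[Example not reproduced: a matching item whose response options, ending with “F. Yoga Fitness”, are listed in alphabetical order, with more response options than items.]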
Guidelines
Guidelines for writing short-answer and sentence completion items :
1. The item should be written so that the examinee infers that there is only one
answer that is reasonable.
2. Phrase the item so that the examinee knows that the answer should be concise.
3. Avoid items with several blanks in the sentence, which may be confusing or
unclear.
Example:
List three places where students can go to find out what movies are playing on campus this semester.
Guidelines Continued:
4. Place blanks near the end of the sentence to minimize confusion.
5. Make blanks the same length and long enough for the longest
answer to fit. If the blanks vary in length, examinees may be
able to determine the answer based on the length of the line.
Example:
If problems with a roommate arise, students should contact their
_________________.
Example
Example of an objective:
Students should be able to identify key employability traits desired by employers.
How can we write items using the various formats for this objective?
Break into groups and write items for the objective using the various formats:
Multiple choice
True/False
Matching
Example
Multiple-Choice:
True-False:
Employers want new employees that focus on the solo completion of tasks.
True False
Example
Matching:
This objective will require a higher-level question because it is a more complex objective.
Guidelines Continued:
5. Try to avoid the use of the words “if” or “because”, which may complicate
the sentence.
6. Avoid use of the following words: not, none, never, all or always.
1. Checklist/Multiple-Response
2. Ranking Scales
3. Ordered Scales
Checklist
Example Checklist:
From the following list of events being considered by the University Program Board, please check off (✓) all of the events that you would consider attending.
Comedy events                  Multicultural events
Formal dances                  Musical events
Halloween costume contest      Plays
International film festival    Poetry readings
Movies                         Talent show
Multiple Response
Afternoon Shows
Ranking scales are used to order or rate things as they relate to one another.
A respondent rank-orders a list based upon his or her attitude regarding the
topics listed.
Limit rankings to no more than five items. When there are too many items to
be ranked, it may get confusing and respondents may misnumber the items.
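Misnumbered rankings can be caught with a simple check when responses are entered. The sketch below assumes each respondent's ranking is stored as a mapping from topic to the rank number written; `ranking_errors` and the sample response are hypothetical.

```python
def ranking_errors(ranks):
    """Return the problems found in one respondent's ranking.

    `ranks` maps each listed topic to the rank number the respondent wrote.
    A clean ranking uses every number from 1 to len(ranks) exactly once.
    """
    expected = set(range(1, len(ranks) + 1))
    given = list(ranks.values())
    return {
        "missing": sorted(expected - set(given)),
        "repeated": sorted({r for r in given if given.count(r) > 1}),
        "out_of_range": sorted({r for r in given if r not in expected}),
    }


# Made-up response: the respondent used 2 twice and skipped 4.
response = {
    "Movies": 1,
    "Plays": 2,
    "Talent show": 2,
    "Poetry readings": 3,
    "Comedy events": 5,
}
problems = ranking_errors(response)
print(problems)
```

An empty result in all three lists means the ranking can be used as entered.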
Ranking Scales
Example of a Ranking Scale:
1. Likert-type item:
I like to go on hikes.
Strongly Disagree    Disagree    Neither Agree Nor Disagree    Agree    Strongly Agree
2. Frequency Item:
I enjoy going to the gym.
Never Rarely Sometimes Frequently Always
Ordered Scales
3. Satisfaction
Are you pleased with the courses being offered?
4. Rating
I thought the concert was…
Excellent Good Fair Poor
Ordered Scales
5. Intensity
The day after I drink, the side effects I experience are:
None Mild Moderate Severe Very Severe
6. Comparison
When I go out, I drink…
Much more than others    Somewhat more than others    About the same as others    Somewhat less than others    Much less than others
Ordered Scales
7. Influence
I think that my drinking behavior is…
Very big problem    Big problem    Moderate problem    Small problem    Very small problem    No problem
Mistakes
Pay careful attention to what you are asking the students to respond
to with ordered scales.
Example:
It may be easy to guess a correct answer and by chance a correct answer may be selected
c. The Likert Format
This format requires the respondent to indicate the degree of agreement with a particular attitudinal statement.
It is very popular in personality and attitude scales.
The scale is non-comparative and measures only a single trait. The respondent is asked to indicate their level of agreement with a given statement on an ordinal scale.
It is usually expressed as a four-, five- or even six-point scale, for example: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree. Scales with an even number of points (four or six) leave out the neutral midpoint, so the respondent has to lean one way or the other.
Advantages
1. It is easy to construct
2. It produces a highly reliable scale
3. It is easy to read and complete by the test takers.
Weaknesses of this scale include:
1. Central tendency bias: participants may avoid extreme response categories.
2. Acquiescence (acceptance) bias: participants may agree with statements as presented in order to please the tester.
3. Social desirability bias: respondents may wish to portray themselves in a more favourable light rather than being honest.
4. Validity may be difficult to demonstrate: the scale may not represent what the tester intended to measure.
d. The Category Format
It is similar to the Likert format but uses an even greater number of choices and a defined numerical rating system.
Test takers are required to rate a given item or scenario on a scale within a category range. For example, one may use a scale of 1 to 5 or 1 to 10, where 1 is the lowest score and 5 or 10 respectively is the highest.
The numbers that are assigned when using the rating scale are sometimes influenced by the context in which the items are rated.
The number of categories used depends on the fineness of the discrimination that the test takers are willing to make. If finer discrimination is wanted, more categories are used.
An example:
1. On a scale of 1 to 5, rate Ali’s attitude towards class assignments (where 1 is very negative and 5 is very positive).
2. On a scale of 1 to 10, rate the level of academic excellence of Makerere University (where 1 is very ordinary and 10 is very competitive).
Advantages
It is very easy to administer
Disadvantages
It does not take into account the context in which the test subject is being rated. For example, in a class of averagely performing students, a student may be rated 9, which represents a very good performer. Yet if the same student is placed in a class of only highly performing students, the same student may be rated 3, which represents a relatively poor performance.
Also, on this scale test takers tend to spread their responses evenly across the entire scale of 1 to 10, which may not fairly represent the actual score.
To overcome these problems, the end points of the scale have to be clearly defined by outlining the expected characteristics of each point (Kaplan & Ernst, 1983).
For example, if one is rating the performance of students in a given class, a student who scores 10 must:
- attend all classes
- contribute to every question asked in class
- solve problems fast
- assist others to complete their class work
- regularly pass class tests with over 80%.
On the other hand, the opposite characteristics describe a student scoring 1.
Checklists
These are used in personality measurement.
The test taker is given a list of adjectives or descriptive statements and asked to indicate whether each is characteristic of him/herself or of someone else.
For example:
Castro…
Is a dependable person.
Is a talkative individual.
Behaves in a sympathetic or considerate manner.
Appears to have a high degree of intellectual capacity.
Is protective of those close to him.
Tends to be self-defensive.
Is thin-skinned; sensitive to anything that can be construed as criticism.
Q-sorts
A test taker is given a list of statements about a person’s characteristics and asked to sort them into a given number of piles, e.g. 5 or 9 piles.
The statements are sorted into piles that indicate the degree to which they appear to describe a given person accurately.
For example, 100 statements about a person’s characteristics are written on cards, one statement per card, making 100 cards.
Piles numbered 1 to 9 are provided, and the test taker places each card on the pile whose number reflects how well the statement describes the person being studied. Here, a rating of 9 means that the statement on the card is the best description of the person’s characteristics, while 1 means that it describes the person least well.
The cards can be distributed across the 9 piles according to the test taker’s interpretation of the subject being studied.
The number of cards placed on each pile is noted, and the statements that best describe the person under study are identified.
The pile frequencies tend to follow a normal distribution. However, the items that lie at the extreme ends of the continuum speak volumes about the true personal characteristics of the subject.
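The tallying step of a Q-sort can be sketched as follows. This is a minimal illustration with a made-up sort of seven statements into piles 1 to 9, where 9 is taken as the most descriptive pile; the data and variable names are hypothetical.

```python
from collections import Counter

# Made-up sort: statement -> pile number (1 = least descriptive, 9 = most).
sort_result = {
    "Is a dependable person": 9,
    "Is a talkative individual": 5,
    "Behaves in a sympathetic or considerate manner": 6,
    "Appears to have a high degree of intellectual capacity": 5,
    "Is protective of those close to him": 7,
    "Tends to be self-defensive": 4,
    "Is thin-skinned": 1,
}

# Frequency of cards on each of the nine piles, in pile order.
pile_counts = Counter(sort_result.values())
frequencies = [pile_counts.get(pile, 0) for pile in range(1, 10)]

# Statements at the two extreme ends of the sort.
highest = max(sort_result.values())
lowest = min(sort_result.values())
most_descriptive = [s for s, pile in sort_result.items() if pile == highest]
least_descriptive = [s for s, pile in sort_result.items() if pile == lowest]

print(frequencies)
print("Most descriptive:", most_descriptive)
print("Least descriptive:", least_descriptive)
```

With a full deck of 100 cards, the same frequency list is what would be inspected for the roughly bell-shaped spread across piles.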