
NAME: ZAROON ALI

ROLL NO: BW601478
ASSIGNMENT NO: 01
SEMESTER: AUTUMN 2019
COURSE CODE: 6507

Q1: Define measurement and evaluation. Also explain the relationship between measurement and evaluation with examples.

Answer:

The term educational measurement refers to any device for the general study and practice of testing, scaling, and appraising the outcomes of the educational process. It includes the administration and scoring of tests; scale construction, validation, and standardization; and the application of statistical techniques in the interpretation of obtained measures or test results.

Definition of Evaluation

Educational evaluation is broader in scope and more objective than measurement. It is the process of carefully appraising the individual using a variety of information-giving devices. Besides testing and other tools of measurement, evaluation seeks additional evidence from various sources of information that supplement each other, such as interviews, questionnaires, anecdotal records, cumulative records, case conferences, mechanical or electronic recorders, case studies, or projective techniques. Through careful analysis, it selects the data most pertinent to a wise, just, and comprehensive interpretation, in order to make a value judgment about the individual or group under study.

Measurement is the process of assigning numbers to individuals or their characteristics according to specified rules. Measurement requires the use of numbers but does not require that value judgments be made about the numbers obtained from the process.

Examples

• A teacher measures Aslam's height to be 180 cm. He evaluates his height when he says that Aslam is tall.
• A teacher measures Ali's achievement in Economics to be 50%. He evaluates his achievement when he says that Ali's achievement in Economics is satisfactory.

To be important, an outcome of education must make an observable difference. That is, at some time, under some circumstances, a person who has more of it must behave differently from a person who has less of it. If different degrees or amounts of an educational achievement never make any observable difference, what evidence can be found to show that it is in fact important? But if such differences can be observed, then the achievement is measurable, for all that measurement requires is verifiable observation of a more-or-less relationship.

Relationship between measurement and evaluation with examples

DuBois, Alverson, and Staley (1979) explain the distinction between evaluation and measurement in these words: "As with any assessment process, the evaluation of entering behavior involves the collection and evaluation of data. Psychologists working in the field of tests and measurements use the term measurement to refer to the collection portion of the process."

Evaluation is based on two philosophies. The first, traditional philosophy holds that the ability to learn is randomly distributed in the general population. This means that if some learning task is assigned to a class and a test is then administered to study the students' performance, the results show that some students score very high, some score low, and the majority fall between these two extremes. It was the opinion of old educators that not all students are endowed with the same intellectual abilities to benefit from schooling. Generally, teachers weeded out students who tended to learn less well than their peers. This was the old philosophy, based on the superiority of heredity.

William Wiersma and Stephen G. Jurs (1990) remark that evaluation is a process that includes measurement and possibly testing, but it also contains the notion of a value judgment. If a teacher administers a test to a class and computes the percentage of correct responses, it is said that measurement and testing have taken place. The scores must then be interpreted, which may mean converting them to grades such as As, Bs, and Cs, or judging them to be excellent, good, fair, or poor. This process is called evaluation.

Need for evaluation and its types

As long as there is a need for the educator to make instructional decisions, curricular decisions, and selection, placement, or classification decisions based on the present or anticipated educational status of the child, so long will there be a need for evaluation in the educational enterprise.

To the modern educator, the ultimate goal of evaluation is to facilitate learning. This can be done in a number of ways, and each way requires a separate type of decision. The evaluation decisions also determine which of the tests is to be used for evaluation.

Q2: Explain the concept of reliability of a test. Also explain methods to assure the reliability of a test.

Answer:

Concept of reliability

Since reliability is essential to validity, and the opposite is not true, there is something to be said for placing reliability at the head of the list. A test may be reliable without being valid, whereas the validity of a test depends in part on its reliability. Therefore, a test is only as valid as it is reliable. Reliability refers to the consistency with which a test measures. We shall clarify the meaning of consistency in a test with an illustration. It is observed that when an individual measured the diameter of a very accurately turned steel ball several times with an exceedingly accurate pair of calipers, he did not get exactly the same result every time. Even with the most accurate instruments available and the best possible control of conditions, the successive measurements of the diameter of the steel ball always varied somewhat. The extent of such variation is a measure of the consistency, or the lack of it, in this measuring situation. Some degree of inconsistency is present in all measurement procedures.

Example

To illustrate with a simple example, let us suppose we had given a group of seven students a test in social studies and ranked them according to their scores. A day or two later we repeated the test on the same group of students and ranked them again. The results might be as follows:
Student    First testing        Second testing
           Score    Rank        Score    Rank
A          52       4           55       4
B          60       2           65       2
C          45       5           48       5
D          68       1           69       1
E          57       3           60       3
F          29       7           40       6
G          31       6           35       7

The degree of consistency of measurement can be judged here by the extent to which the pupils tend to hold the same relative position in their group. We see that this tendency is high in this case, since all pupils except F and G hold the same rank in both applications of the test, and those two pupils shift only slightly.
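This consistency of relative position can be expressed numerically as a rank correlation coefficient. A minimal sketch (assuming Python with the scipy library, neither of which the text names), applied to the scores in the table above:

# Quantify the consistency of the two testings as a rank correlation.
# scipy's spearmanr computes the ranks internally, so raw scores can be passed.
from scipy.stats import spearmanr

first  = [52, 60, 45, 68, 57, 29, 31]   # pupils A..G, first testing
second = [55, 65, 48, 69, 60, 40, 35]   # pupils A..G, second testing

rho, _ = spearmanr(first, second)
print(f"Spearman rank correlation: {rho:.2f}")   # ~0.96: only F and G swap ranks

A coefficient near 1.0 confirms the observation that the pupils hold nearly the same relative positions on both testings.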

Methods to estimate the reliability of a test

• Test-retest method (measure of stability): give the same test twice to the same group, with any time interval between tests, from several minutes to several years.
• Equivalent-forms method (measure of equivalence): give two forms of the test to the same group in close succession.
• Split-half method (measure of internal consistency): give the test once, then score two equivalent halves of the test (e.g., odd items and even items).
Test-retest method

To estimate reliability by means of the test-retest method, the same test is administered twice to the same group of pupils with a given time interval between the two administrations. The resulting scores are correlated, and the correlation coefficient provides a measure of stability; that is, it indicates how stable the test results are over the given period of time. If the results are stable, pupils who score high on one administration will tend to score high on the other, and the remaining pupils will tend to stay in their relative positions on both administrations. Such stability is indicated by a large correlation coefficient.

Equivalent Form Method

The equivalent forms method of estimating reliability makes it possible to avoid the
disadvantages of too short or too long a time interval between successive administrations of the
evaluating device. Two equivalent forms of the test must be constructed so they are similar as
possible in the kind of content, mental process required, number of items, difficulty and all
other respects. The pupils take one form of the test and ten as soon as possible, the other
form.

Split-half method

It is often impossible to employ either of the methods described above to determine the reliability of a test: it may not be feasible to test twice, or there may not be equivalent forms available. In such cases we generally use what is known as the split-half technique. In this procedure, the test whose reliability we wish to measure is given in the ordinary manner and the papers are scored as usual; then two scores for each individual are obtained by scoring the two halves of the test (for example, the odd items and the even items) separately.
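The text stops at obtaining the two half-scores. In practice the correlation between the halves is usually stepped up to full-test length with the Spearman-Brown formula, which the text does not name. A sketch with a hypothetical item matrix:

# Split-half reliability: correlate odd-item and even-item half scores,
# then apply the Spearman-Brown correction to estimate the reliability
# of the full-length test. The item matrix below is hypothetical.
import numpy as np

# rows = pupils, columns = items (1 = correct, 0 = incorrect)
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

odd_half  = items[:, 0::2].sum(axis=1)   # score on items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)   # score on items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]   # half-test correlation
r_full = 2 * r_half / (1 + r_half)                # Spearman-Brown step-up
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")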

Q3: Describe the importance of a table of specifications. Also develop a two-way table of specifications for a 50-mark paper by selecting any unit from the 9th class General Science learning outcomes. List the difficulties involved in developing a table of specifications.

Answer: The purpose of a Table of Specifications is to identify the achievement domains being measured and to ensure that a fair and representative sample of questions appears on the test. Teachers cannot measure every topic or objective and cannot ask every question they might wish to ask. A Table of Specifications allows the teacher to construct a test which focuses on the key areas and weights those different areas based on their importance. A Table of Specifications provides the teacher with evidence that a test has content validity, that is, that it covers what should be covered.

Designing a Table of Specifications

Tables of Specification typically are designed based on the list of course objectives, the topics covered in class, the amount of time spent on those topics, textbook chapter topics, and the emphasis and space provided in the text. In some cases a great weight will be assigned to a concept that is extremely important, even if relatively little class time was spent on the topic. Three steps are involved in creating a Table of Specifications: 1) choosing the measurement goals and domain to be covered, 2) breaking the domain into key or fairly independent parts (concepts, terms, procedures, applications), and 3) constructing the table. Teachers have already made decisions (or the district has decided for them) about the broad areas that should be taught, so the choice of what broad domains a test should cover has usually already been made. A bit trickier is to outline the subject matter into smaller components, but most teachers have already had to design teaching plans, strategies, and schedules based on an outline of content. Lists of classroom objectives, district curriculum guidelines, textbook sections, and keywords are other commonly used sources for identifying categories for Tables of Specification. When actually constructing the table, teachers may wish to use only a simple structure, or they may be interested in greater detail about the types of items, the cognitive levels for items, the best mix of objectively scored items, open-ended and constructed-response items, and so on.

How can the use of a Table of Specifications benefit your students, including those with special
needs?

A Table of Specifications benefits students in two ways. First, it improves the validity of
teacher-made tests. Second, it can improve student learning as well.
A Table of Specifications helps to ensure that there is a match between what is taught and what is tested. Classroom assessment should be driven by classroom teaching, which itself is driven by course goals and objectives. In this chain, Tables of Specifications provide the link between teaching and testing. An example for a weather unit follows:
                                        Objectives
Content          Knows     Knows      Understands      Interprets   Skill in use    Skill in         Total
                 symbols   specific   influence of     weather      of measuring    constructing     no. of
                           facts      each factor on   maps         devices         weather maps     items
                                      weather
                                      formation
Air pressure       2         3             3               3        Observe         Evaluate           11
Wind               4         2             8               2        pupils using    weather maps       16
Temperature        2         2             2               2        the measuring   constructed         8
Humidity and       2         1             2               5        devices         by pupils          10
precipitation
Clouds             2         2             1               0                                            5
Total no. of      12        10            16              12                                           50
items
Percent of        12%       10%           16%             12%           25%             25%           100%
evaluation
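As a sketch of how such a table is filled in, item counts can be allocated in proportion to the weight given to each content area. The helper below is hypothetical, not from the text; the weights mirror the row totals of the table above:

# Allocate a 50-item test across content areas in proportion to their
# weight (e.g., instructional emphasis). Weights mirror the table above.
total_items = 50
content_weights = {
    "Air pressure": 11,
    "Wind": 16,
    "Temperature": 8,
    "Humidity and precipitation": 10,
    "Clouds": 5,
}

weight_sum = sum(content_weights.values())
for area, weight in content_weights.items():
    n_items = round(total_items * weight / weight_sum)
    print(f"{area}: {n_items} items ({100 * weight / weight_sum:.0f}%)")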

Q4: (a) Explain the cognitive domain of Bloom's Taxonomy of educational objectives.

Answer: One of the most widely used ways of organizing levels of expertise is according to Bloom's Taxonomy of Educational Objectives (Bloom et al., 1994; Gronlund, 1991; Krathwohl et al., 1956).
Bloom's Taxonomy (Tables 1-3) uses a multi-tiered scale to express the level of expertise required to
achieve each measurable student outcome. Organizing measurable student outcomes in this way will
allow us to select appropriate classroom assessment techniques for the course.

There are three taxonomies. Which of the three to use for a given measurable student
outcome depends upon the original goal to which the measurable student outcome is
connected. There are knowledge-based goals, skills-based goals, and affective goals
(affective: values, attitudes, and interests); accordingly, there is a taxonomy for each. Within
each taxonomy, the levels of expertise are listed in order of increasing complexity. Measurable
student outcomes that require the higher levels of expertise will require more sophisticated
classroom assessment techniques.
The course goal in Figure 2--"student understands proper dental hygiene"--is an example of a
knowledge-based goal. It is knowledge-based because it requires that the student learn certain
facts and concepts. An example of a skills-based goal for this course might be "student
flosses teeth properly." This is a skills-based goal because it requires that the student learn
how to do something. Finally, an affective goal for this course might be "student cares about
proper oral hygiene." This is an affective goal because it requires that the student's values,
attitudes, or interests be affected by the course.
Bloom's Taxonomy of Educational Objectives for Knowledge-Based Goals

1. Knowledge: recall or recognition of terms, ideas, procedures, theories, etc. Example of a measurable student outcome: "When is the first day of spring?"
2. Comprehension: translate, interpret, or extrapolate, but without seeing the full implications or transferring to other situations; closer to literal translation. Example: "What does the summer solstice represent?"
3. Application: apply abstractions, general principles, or methods to specific concrete situations. Example: "What would Earth's seasons be like in specific regions with a different axis tilt?"
4. Analysis: separation of a complex idea into its constituent parts and an understanding of the organization and relationships between the parts, including recognizing the distinction between hypothesis and fact as well as between relevant and extraneous variables. Example: "Why are seasons reversed in the southern hemisphere?"
5. Synthesis: creative mental construction of ideas and concepts from multiple sources to form complex ideas into a new, integrated, and meaningful pattern, subject to given constraints. Example: "If the longest day of the year is in June, why is the northern hemisphere hottest in August?"
6. Evaluation: making a judgment of ideas or methods using external evidence or self-selected criteria substantiated by observation or informed rationalization. Example: "What would be the important variables for predicting seasons on a newly discovered planet?"

Q4: (b) Compare Bloom's Taxonomy with the SOLO Taxonomy of educational objectives.

Answer: Assessment is one of education's new four-letter words, but it shouldn't be, because it's not
assessment's fault that some adults misuse it. Assessment is supposed to guide learning. It creates a
dynamic where teachers and students can work together to progress their own understanding of a subject
or topic. Assessment should be about authentic growth.

Testing in the U.S. is very different from assessment. I know that sounds absurd, but tests have more finality here. When it comes to testing, we have a love affair with multiple choice and true/false. We test whether students know the right answer...or not. Lots of tests are made of hard questions and easy ones. How deeply they know the answer doesn't matter, just as long as they know it. State tests focus less on what students know, and more on what teachers supposedly taught.

When it comes to assessing student learning, most educators know about Bloom's Taxonomy. They use it in their practices and feel as though they have a good handle on how to use it in their instructional practices and assessment of student learning. In our educational conversations we bring up Bloom's Taxonomy and debate whether students have knowledge of a topic, and whether they can apply it to their daily life.

Interestingly enough, Bloom himself has been quoted as saying that his handbook is "one of the most widely cited yet least read books in American education". We are guilty of doing that from time to time. It's human nature to tout a philosophy that we may only have surface-level knowledge of, which is kind of ironic when we're talking about Bloom's Taxonomy.

For a more in-depth understanding of Bloom's, the Center for Teaching at Vanderbilt University website says, "Here are the authors' brief explanations of these main categories from the appendix of Taxonomy of Educational Objectives (Handbook One, pp. 201-207):

 Knowledge - "involves the recall of specifics and universals, the recall


of methods and processes, or the recall of a pattern, structure, or

setting."

 Comprehension - "refers to a type of understanding or apprehension


such that the individual knows what is being communicated and can

make use of the material or idea being communicated without

necessarily relating it to other material or seeing its fullest

implications."

 Application - refers to the "use of abstractions in particular and concrete


situations."

 Analysis - represents the "breakdown of a communication into its


constituent elements or parts such that the relative hierarchy of ideas is

made clear and/or the relations between ideas expressed are made

explicit."

 Synthesis - involves the "putting together of elements and parts so


as to form a whole."

 Evaluation - engenders "judgments about the value of material and


methods for given purposes."

According to the @LeadingLearner Blog, "it (Bloom's) was revised in 2000. In Bloom's original work the knowledge dimensions consisted of factual, conceptual and procedural knowledge. Later the metacognitive knowledge dimension was added and the nouns changed to verbs with the last two cognitive processes switched in the order.

• Remember
• Understand
• Apply
• Analyse
• Evaluate
• Create"

The criticism of Bloom's is that it seems to focus on regurgitating information, and that anything goes. A student can provide a surface-level answer to a difficult question, or a deep answer to a surface-level question. It may show that a student has an answer, but does it allow teachers and students to go deeper with their learning, or do they just move on? According to Pam Hook, "There is no necessary progression in the manner of teaching or learning in the Bloom's taxonomy." If we want students to take control over their own learning, can they use Bloom's Taxonomy, or is there a better method to help them understand where to go next?

Going SOLO

A much less known taxonomy for assessing student learning is SOLO, which was created by John Biggs and Kevin Collis in 1982. According to Biggs, "SOLO, which stands for the Structure of the Observed Learning Outcome, is a means of classifying learning outcomes in terms of their complexity, enabling us to assess students' work in terms of its quality, not of how many bits of this and of that they got right."

According to Biggs and Collis (1982), there are five stages of "ascending structural
complexity." Those five stages are:

• Prestructural - incompetence (they miss the point)
• Unistructural - one relevant aspect
• Multistructural - several relevant and independent aspects
• Relational - integrated into a structure
• Extended Abstract - generalized to a new domain

If we are going to spend so much time in the learning process, we need to do more than
accept that students "get" something at "their level" and move on. Using SOLO taxonomy
really presents teachers and students with the opportunity to go deeper into learning
whatever topic or subject they are involved in, and assess learning as they travel through that
learning experience.

Q5: Briefly describe the procedure for the development of different kinds of objective-type test items and assembling the test. Also highlight the characteristics of a good achievement test.
Answer: An objective test item is defined as one for which the scoring rules are so exhaustive and specific that they do not allow scorers to make subjective inferences or judgments; thus, any scorer who marks an item following the rules will assign the same test score. Objective tests began to be used early in the twentieth century as a means of evaluating learning outcomes and predicting future achievement, and their high reliability and predictive validity led to the gradual replacement of the essay test.

One common misconception about the objective test item is that it is limited to testing specific, often trivial, factual details, which would sometimes lead to the use of an essay or performance test to assess students' comprehension of broader principles or their ability to apply them. However, as Robert Ebel pointed out, well-written objective tests (especially multiple-choice tests) can actually assess such higher-order abilities to some extent. While it is true that some types of knowledge or abilities cannot be assessed by objective tests, educators should also keep in mind that what test items can assess depends largely on the skills and effort of the test constructor, rather than on the test format per se.

OBJECTIVE TEST FORMATS

A variety of different types of objective test formats can be classified into two categories: a
selected response format, in which examinees select the response from a given number of
alternatives, including true/false, multiple choice, and matching test items; and a
constructed response format, in which examinees are required to produce an entire
response, including short answer test items. This distinction is sometimes captured in terms of
recognition and recall. These two general categories are further divided into basic types of
objective tests.

The true/false test is the simplest form of selected response formats. True/false tests are those
that ask examinees to select one of the two choices given as possible responses to a test question.
The choice is between true and false, yes and no, right and wrong, and so on. A major
advantage of the true/false test is its efficiency as it yields many independent responses
per unit of testing time. Therefore, teachers can cover course material comprehensively in a
single test. However, one apparent limitation of the true/false test is its susceptibility to guessing.
It should be noted, however, that test givers can attenuate the effects of guessing by increasing
the number of items in a test. In addition, some guessing might reflect partial knowledge, which
would provide a valid indication of achievement.
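The effect of guessing mentioned above can also be handled at scoring time. The classic correction-for-guessing formula, which the text does not give, subtracts the score a blind guesser would be expected to gain; a sketch:

# Correction for guessing: corrected score = R - W / (k - 1), where R is
# the number right, W the number wrong, and k the number of choices per item.
def corrected_score(num_right: int, num_wrong: int, num_choices: int) -> float:
    """Remove the expected gain from blind guessing from a raw score."""
    return num_right - num_wrong / (num_choices - 1)

# A 50-item true/false test (k = 2): 35 right, 15 wrong, 0 omitted.
print(corrected_score(35, 15, 2))   # -> 20.0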

Another selected response format type is the multiple-choice test, which has long been the most
widely used among the objective test formats. Multiple-choice test items require the examinee
to select one or more responses from a set of options (in most cases, 3–7). The correct
alternative in each item is called the answer (or the key), and the remaining alternatives are
called distracters. Examinees have less chance of guessing the correct answer to a multiple-
choice test question compared to a true/false test question. In addition, the distracter an
examinee selects may provide useful diagnostic information. Related to the multiple-choice test
is the matching test, which consists of a list of premises, a list of responses, and directions for
matching the two. Examinees must match each premise with one of the responses on the basis
of the criteria described in the directions. A major strength of the matching test is that it is space-
saving and, therefore, can be used to assess several important learning targets at once.

HOW TO CONSTRUCT OBJECTIVE TEST ITEMS

Basically, scoring objective test items is easy: It only requires one to follow the scoring rules.
However, constructing good objective test items requires much more skill and effort. The first
step is to develop a set of test specifications that can serve to guide the selection of test
items. A table of specifications (or test blueprint) is a useful tool for this purpose. This tool is
usually a two-way grid that describes content areas to be covered by the test as the row headings
and skills and abilities to be developed (i.e., instructional objectives) as the column headings
(Figure 2). After specifying the content and ability covered by the test using the table of
specifications, the appropriate test item format is selected for each item. At this point, not only
objective test items but also other types of test items—essay test or performance assessment—
should be considered, depending on the learning outcomes to be measured.
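As the opening sentence of this section notes, scoring an objective test only requires following the rules: with an exhaustive answer key, every scorer arrives at the same score. A minimal sketch with a hypothetical key and response set:

# Objective scoring: the answer key is the complete scoring rule, so any
# scorer (human or machine) applying it assigns the same score.
answer_key = ["B", "D", "A", "A", "C"]    # hypothetical 5-item key
responses  = ["B", "D", "C", "A", "C"]    # one examinee's responses

score = sum(given == key for given, key in zip(responses, answer_key))
print(f"{score} / {len(answer_key)} correct")   # -> 4 / 5 correct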

The next step is to create the specific test items. It is particularly important for objective test items to be written in clear and unambiguous language so that examinees can demonstrate their attainment of the learning objectives. If complex wording is used, the item simply reflects reading comprehension ability. It is also important for each objective test item to focus on an important aspect of the content area rather than on trivial details. Asking trivial details not only makes the test items unnecessarily difficult, it also obscures what the test constructor really wants to measure. Similarly, relatively novel material should be used when creating items that measure understanding or the ability to apply principles. Items created by copying sentences verbatim from a textbook only reflect rote memory, rather than higher-order cognitive skills.
Many other specific rules exist for constructing objective test items. Test constructors must be
very careful that examinees with little or no content knowledge cannot arrive at the correct
answer by utilizing the characteristics of the test format that are independent of specific content
knowledge. Jason Millman and his colleagues called this skill of the examinees “test-
wiseness.” For example, in multiple-choice test items, all options should be grammatically
correct with respect to the stem (questions or incomplete statements preceding options), and
key words from a stem, or their synonyms, should not be repeated in the correct option. Any
violation of these rules would obviously provide an advantage for testwise examinees. Test
composers should also equalize the length of the options of an item and avoid using specific
determiners such as all, always, and never because some testwise examinees know that the
correct option is frequently long and without such specific determiners. Robert Thorndike
and Anthony Nitko have provided more comprehensive guidelines, with detailed explanations
for constructing objective test items.

Characteristics of a good achievement test:

Characteristic # 1. Validity:

The first important characteristic of a good test is validity. The test must really measure what it has been designed to measure. Validity is often assessed by exploring how the test scores correspond to some criterion, that is, some behavior, personal accomplishment, or characteristic that reflects the attribute the test is designed to gauge.

Assessing the validity of any test requires careful selection of an appropriate criterion measure, and reasonable people may disagree as to which criterion measure is best. This is equally true of intelligence tests: reasonable people may disagree as to whether the best criterion measure of intelligence is school grades, teacher ratings, or some other measure.

If we are to check on the validity of a test, we must settle on one or more criterion measures of the attribute that the test is designed to test. Once the criterion measures have been identified, people's scores on those measures can be compared to their scores on the test, and the degree of correspondence can be examined for what it tells us about the validity of the test.

Only valid tests can give useful information about people, but the correlation coefficients for validity are never as high as those for reliability. Though we strive for reliabilities of .90 or higher, validities, which are correlations between test scores and criterion measures, are seldom higher than .60. Several tests with low but significant validity can sometimes be useful if they are given together as a battery and their results are considered together. One reason that validity coefficients are lower than reliability coefficients is that the reliability of a test sets limits on how valid the test can be.

Characteristic # 2. Reliability:

A good test should be highly reliable. This means that the test should give similar results even though different testers administer it, different people score it, different forms of the test are given, and the same person takes the test at two or more different times. Reliability is usually checked by comparing different sets of scores.

In actual practice, psychological tests are never perfectly reliable. One reason is that changes do occur in individuals over time; for example, a person who scores low in her group at an initial testing may develop new skills that raise her to a higher position in the group at the time of the second testing.

Despite such real changes, the best intelligence tests usually yield reliability correlation coefficients of .90 or higher (where 1.00 indicates perfect correspondence and 0.00 indicates no correspondence whatever).

If tests with low reliability are used, their scores should be interpreted with caution. To
improve reliability we should ensure that the test is administered and scored by a truly
standard procedure. Making the test procedure uniform might make the test more reliable.

Characteristic # 3. Objectivity:

By the objectivity of a measuring instrument is meant the degree to which equally competent users get the same results. This presupposes the elimination of subjective factors. A test is objective when it makes for the elimination of the scorer's personal opinion and biased judgment. The recognition of the quality of objectivity in a test has been largely responsible for the development of standardized and objective-type tests.

Objective-based tests measure or evaluate the entire human development in three domains: cognitive, affective, and psychomotor. As the name itself indicates, they are based on particular objectives of teaching and evaluating. They provide proper direction and thus streamline the whole process of evaluation. These tests are comprehensive.
Characteristic # 4. Norms:

In addition to reliability and validity, a good test needs norms. Norms are sets of scores obtained by the groups for whom the test is intended. The scores obtained by these groups provide a basis for interpreting any individual score.
