Synthesis On Testing and Evaluation

Polytechnic University of the Philippines
COLLEGE OF EDUCATION
Department of Elementary and Secondary Education
Sta. Mesa, Manila
MEASUREMENT AND EVALUATION
A Synthesis on:
Historical Development of Testing and Evaluation
Submitted by:
Amoyo, Shekinah F.
Boyo, Jy Allyra S.
Dolot, Dyea C.
Flores, Jacqueline S.
Libron, Maricris P.
Monderin, Camille P.
Patrolla, Danilo B.
BSEDEN 4-1D
Submitted to:
Prof. Jay-R A. Manamtam
July 5, 2015
I.
RESEARCH
MAIN CONTENT
A Little History
Early Period
The Boom Period
The First Period of Criticism
The Battery Period
The Second Period of Criticism
The Age of Accountability
2200 B.C.
The Chinese used competitive exams for civil service positions.
Applicants were tested on civil law, military affairs, agriculture, revenue,
and geography.
Testing was extremely rigorous.
Confucian classics were emphasized.
Only 3% of the group became eligible for public office.
The Chinese exams served as models for developing civil service exams in
Europe and America in the 1800s.
Weaknesses:
The Chinese failed to validate the selection procedures.
Penmanship was at that time treated as a relevant predictor of suitability
for office.
Wundt, Galton, and Cattell laid the foundation for 20th-century testing.
Wundt:
Studied conscious human experience using his psychological laboratory.
Acknowledged individual differences, but his inclination was toward the study
of the human mind.
His legacy was the rigorous experimental control of procedures, which is
very important in test administration under standardized conditions.
Galton:
Studied individual differences, the most basic concept underlying
psychological testing.
Concentrated on individual differences in sensory and motor functioning. Over
10 years, he tested 17,000 individuals.
Pioneered the study of individual differences in mental ability.
Related intellectual ability to skills such as reaction time, sensitivity to
physical stimuli, and body proportions.
Demonstrated that objective tests could be devised through standardized
procedures.
VOID IN TESTING
Evolution of Intelligence and Standardized Achievement Tests:
Alfred Binet was on the verge of a major breakthrough in intelligence testing.

Binet developed his test to help identify children in the Paris school system who
could not profit from ordinary instruction.
The Binet-Simon Scale was established, a major breakthrough in the creation
of the modern test.
Boom Period (America's involvement in World War I)
15-year boom period

The new science of psychology was called on to play a part in military situations.
Yerkes used the Army Alpha (verbal) and Army Beta (nonverbal) tests for the
selection of individuals for military service.
Robert Yerkes, a Harvard psychology professor, convinced the Department of
War that it should give intelligence tests to all of its 1.75 million recruits so
they could be classified and given appropriate assignments (Goddard and Terman
also served on this committee).
Consequences of The Kallikak Family
The height of Goddard's success came at a time when America was
experiencing a large influx of immigrants from Europe. The
Immigration Restriction Act, passed in 1924 (which remained in effect
until 1965), was influenced by American eugenics efforts. In 1913
Goddard was invited to Ellis Island to help detect "morons" in the
immigrant population. In his Intelligence Classification of Immigrants
of Different Nationalities (1917) he asserted that most of the Ellis
Island immigrants were mentally deficient. For example, he indicated
that 83% of all Jews tested were feebleminded, as were 80% of the
Hungarians, 79% of the Italians, and 87% of the Russians. The result
was that many immigrants were turned away and sent back to Europe.
Measurement expanded in the 12 years after the war; vocational and personality
tests were developed.
Personality Tests: 1920-1940 (WWII)
Structured personality tests: paper-and-pencil tests, e.g., the Woodworth
Personal Data Sheet
Tests like the MMPI were published
Criticism and Consolidation (1930s)
Test developers and users placed too much reliance on the correctness of test
results regarding people's abilities and characteristics.
Early Abuses of Tests in America
Goddard (1906) began testing 378 residents and categorized them as idiot (mental
age below 2), imbecile (3-7), feebleminded (8-12), moron (foolish).
Goddard's desire was to separate people out.
Believed feebleminded people were the cause of most social problems (thievery,
laziness, alcoholism, prostitution, immorality).
Called for the colonization of "morons" to restrict their breeding. Further, he
believed that many immigrants were feebleminded.

Produced evidence that supported segregation. Sounded dire warnings that racial
intermixture would inevitably cause a deterioration of American intelligence.
Later recanted: the conclusions were without foundation and were probably the
result of cultural and language differences.
Age of Discrimination: testing revealed large score differences between White
Americans and minorities, who were labeled feebleminded; people started to
question the tests and the conclusions drawn from them.
First Period of Criticism
The 1930s saw a crash in the expectations of mental measurement.
Criticisms led a young psychologist to initiate the Mental Measurements Yearbook
(MMY) to critically review tests.
Battery Period (1940s)
Psychological measurement was used again for military service, where batteries of
tests were developed that measured several abilities.
This reduced failure rates and led to an emphasis on test batteries.
In the 1950s, educational and psychological testing grew and expanded not only in
the field of education but also in other fields like business, industry, and clinics.
The APA set guidelines for good testing practice.
2nd Period of Criticism
In 1965, the civil rights movement was in full swing; people reacted to tests'
invasion of privacy.
Tests were seen as biased tools that discriminated against women and minorities
in education and employment.
Age of Accountability
Despite criticisms, governments and specifically educational institutions were
putting greater faith in testing to determine whether government and educational
programs were achieving their objectives.
Despite failures, schools are accountable for maximizing the learning of their
students.
MORAL LESSONS
Failures
Segregation between/among minorities.
Created an intellectual hierarchy between/among races.
Labelling: White Americans as superior to African Americans and other minorities.
Discrimination between men and women in employment.
Invasion of privacy
II.
SYNTHESIS
Our class on July 2, 2015 started late, at 11 AM instead of our usual 9 AM,
because our professor had something to attend to. Before the reporting started,
Group 5 gave the class some energizers. In the first one, each of us took a piece
of paper and wrote information about ourselves, with a twist: we had to write
three truths and a lie. We then passed the papers around the classroom, and
whoever picked up a classmate's paper had to guess which statements were true
and which was the lie. In the last energizer, we sent one of our classmates
outside the classroom while those who remained inside agreed on what occupation
he or she would have in the future; then, through actions or gestures from the
class, the person sent out had to guess what it was. The first person chosen was
Mozart, who needed to guess "checker," but because we were so loud while
discussing, I think he overheard us and guessed it right. The second person,
Andrea, did not get hers right, though she came close: the answer was astronaut,
but she answered astronomer.
The energizers succeeded in livening up the mood and had a positive effect on
our class because we had a good time. They took about 20-25 minutes.
After the energizers, the reporter of Group 2, Rodrigo Espina, discussed their
topic, the Historical Development of Testing and Evaluation, and started his
lesson by giving the class a timeline of the development of testing and evaluation.
The reporter explained that the development of testing, measurement, and
evaluation was slow and is difficult to trace because people have used testing
for such a long time. Then he proceeded to give us a summary of the development:
2200 B.C.
The Chinese used competitive exams for civil service positions.
These consisted of both oral and written examinations, which were informal
until 1115 B.C., when the test procedures became formal.
Applicants were tested on their knowledge of civil law, military
affairs, agriculture, revenue, and geography.
Only 3% of the applicants passed the exam.
The reporter explained that the Chinese exams served as models for developing
civil service exams in Europe and America during the 1800s and also discussed
their weaknesses: 1) the Chinese failed to validate the selection procedures, and
2) penmanship was at that time treated as a relevant predictor of suitability for
office. Next, he discussed the introduction of formal measurement procedures in
Western education systems in the 19th century:
Wilhelm Wundt
Studied conscious human experience using his psychological laboratory.
Acknowledged individual differences, but his inclination was toward the study
of the human mind.
His legacy was the rigorous experimental control of procedures,
which is very important in test administration under standardized
conditions.
Sir Francis Galton
In 1863, Sir Francis Galton, a half cousin of Charles Darwin, worked on
individual differences.
Concentrated on individual differences in sensory and motor functioning.
In the span of 10 years, he tested 17,000 individuals.
Pioneered the study of individual differences in mental ability.
Related intellectual ability to skills such as reaction time, sensitivity to
physical stimuli, and body proportions.
Demonstrated that objective tests could be devised through
standardized procedures.
James McKeen Cattell (1860-1944)
Transported brass instruments to the U.S., conducted elaborate reaction
time studies, and coined the term "mental test."
Some of his famous students are:
Thorndike (1898)
Woodworth (1899) and E.K. Strong (1911), whose Vocational Interest
Blank, after many revisions, is still widely used.
The reporter explained that during the 1650s-1800s, people struggled to fit
into society. He also added further information about Francis Galton, who was
a half cousin of Charles Darwin. Galton worked on individual
differences. In 1883 he published a book titled Inquiries into Human Faculty
and Its Development. His work was regarded as the beginning of mental tests. The
reporter then proceeded to explain in detail:
Time Period 1: The Age of Reform (1792-1900s)
The first documented formal use of evaluation took place in 1792 when
William Farish utilized the quantitative mark to assess students'
performance.
The quantitative mark permitted objective ranking of examinees and
the averaging and aggregating of scores.
Time Period 2: The Age of Efficiency and Testing (1900-1930)
Formulaic
Frederick W. Taylor's work on scientific management became influential
to administrators in education.
Taylor's scientific management was based on observation,
measurement, analysis, and most importantly, efficiency.

Objective-based tests were critical in determining quality of instruction.
Tests were developed by departments set up to improve the efficiency
of the educational district.
Time Period 3: The Tylerian Age (1930-1945)
Ralph Tyler, considered the father of educational evaluation, made
considerable contributions to evaluation.

Tyler directed an Eight-Year Study (1932-1940) which assessed the
outcomes of programs in 15 progressive high schools and 15 traditional
high schools.
Time Period 4: The Age of Innocence (1946-1957)
Starting in the mid-1940s, Americans moved mentally beyond the war
(World War II) and the Great Depression.

According to Madaus & Stufflebeam (1984), society experienced a
period of great growth; there was an upgrading and expansion of
educational offerings, personnel, and facilities.

Bloom, Engelhart, Furst, Hill, and Krathwohl (1956) advanced
objective-based testing when they published the Taxonomy of
Educational Objectives.
Expanded the facilities; avenues to expand knowledge
Branching from one discipline to another
Time Period 5: Age of Development (1958-1972)
In 1957, the Russians' successful launch of Sputnik I sparked a national
crisis.
As a result, legislation was passed to improve instruction in areas that
were considered crucial to the national defense and security.

In the early 1960s, another important factor in the development of
evaluation was the emergence of criterion-referenced testing.

Much more advancement
Raw data for advancement
The emergence of norm-referenced and criterion-referenced tests
Time Period 6: The Age of Professionalization (1973-1983)

Before: Teacher as a job
Now: Teacher as a profession
During the 1970s, evaluation emerged as a profession.

A number of journals including Educational Evaluation and Policy
Analysis, Studies in Educational Evaluation, CEDR Quarterly,
Evaluation Review, New Directions for Program Evaluation,
Evaluation and Program Planning, and Evaluation News were
published.
Further, universities began to recognize the importance of evaluation
by offering courses in evaluation methodology.
Time Period 7: The Age of Expansion and Integration (1983-Present)
In the early 1980s, evaluation struggled under the Reagan
administration. Cutbacks in funding for evaluation took place and
an emphasis on cost cutting arose.
According to Weiss (1998), funding for new social initiatives was
drastically cut. By the early 1990s, evaluation had rebounded with the
economy.
The field expanded and became more integrated.
Professional associations were developed along with evaluation
standards.
Before ending his report, Rodrigo left the class with words to ponder:
"Sometimes it's the very people who no one imagines anything of who do the things
that no one can imagine."
- Alan Turing
After Rodrigo's discussion, our professor discussed a few things. One is that
memorizing is not bad; it is actually good. Students first need to retain
information in their minds, and from there they can go on to understand
and think about what they have memorized. Our professor also gave us tasks
to finish, reminders for the next meeting, and notice of a quiz the week after.
That was the end of the class.
CONCLUSION
Testing can be very helpful if its use increases the learning and performance of
children. As we have seen, the history of testing started very early, and testing
has grown from the measurement of individual differences to almost all aspects
of education and human life. Hence there is no aspect of life that can be
mentioned where there is no form of measurement of one kind or another. This is
because tests form the best means of detecting characteristics in a reasonably
objective fashion. They help us gain the kinds of information about learners and
learning that we need to help students learn.
III.
DISTRIBUTION OF TASKS
1. Researcher - Dyea C. Dolot and Maricris P. Libron
2. Keeper - Shekinah F. Amoyo
3. Manager - Jacqueline S. Flores
4. Synthesizer - Danilo B. Patrolla
5. Encoder - Camille P. Monderin
6. Presenter - Jy Allyra S. Boyo
