
MEASUREMENT AND EVALUATION

KEY CONCEPTS
Measurement: the process by which information about the attributes or characteristics of things is determined and differentiated.

It is also a systematic procedure of determining the quantity or extent of all the measurable dimensions in the educative process.

Measurable Dimensions: intelligence, interest, aptitudes, values, health, personality traits, and scholastic achievements.

Evaluation: the process of summing up the results of measurement and giving them meaning based on value judgments.

Test: a type of measuring instrument designed to measure any quality, ability, skill, knowledge, or attitude of students.

Quiz: a relatively short test given periodically to measure achievement in material recently taught or in any small, newly completed unit of work.

Ex: 5 to 10 minute test or a 10-item test.

Item: a part of a test that elicits a specific response, e.g., a multiple-choice question, a true-false question, and the like.

Assessment: A process of gathering and organizing data into an interpretable form to have a basis for decision making.

PRINCIPLES OF EVALUATION

Principle / Application in Classroom Testing and Measurement

1. Significance: Evaluation is an essential component of the teaching-learning process.

The evaluation of learning outcomes necessitates careful planning and the use of appropriate measuring instruments.

2. Continuity: Evaluation is a continuous process. It takes place before, during, and after instruction.

Placement, formative, diagnostic, and summative evaluation should be conducted.

3. Scope: Evaluation should be comprehensive and as varied as the scope of objectives.

The areas to be evaluated should include the cognitive (thinking skills, knowledge, and abilities), psychomotor (physical and motor skills), and affective (social skills, attitudes, and values) domains.

4. Congruency: Evaluation must be compatible with the stated objectives.

The lesson objectives should be clearly stated. Appropriate evaluation measures should match these objectives.

5. Validity: There must be a close relationship between what an evaluation instrument actually measures and what it is supposed to measure.

Using a combination of evaluation procedures is likely to yield results that will provide a reliable picture of the learners' performance.

6. Objectivity: Although effective evaluation should use all the available information, it is generally believed that this information is more worthwhile if it is objectively obtained.

The data and information needed for evaluation should be obtained in an unbiased manner.

7. Reliability: Evaluation instruments should be consistent in measuring what they are intended to measure.

The classroom teacher should construct and use tests that will enable him/her to achieve specific lesson objectives consistently.

8. Diagnostic value: Effective evaluation should distinguish not only between levels of learners' performance but also between the processes which result in acceptable performance.

Provisions should be made for diagnostic evaluation to determine the strengths as well as the weaknesses and learning problems of the students.

9. Participation: Evaluation should be a cooperative effort of school administrators, teachers, students, and parents.

School administrators, teachers, students, and parents should be involved in the evaluation program. Specifically, students as well as their parents should be oriented on the evaluation policies of the school.

10. Variety: Evaluation procedures are of different types, namely: standardized tests and teacher-made tests; systematic observation and recording; rating scales, inventories, checklists, questionnaires; sentence completion and sociometry.

Using a combination of evaluation procedures is likely to yield results that will provide a reliable picture of the learners' performance.

TYPES OF EVALUATION

Prior to Instruction
Type: Placement evaluation or pre-test assessment (not graded)
Purpose: Determine the entry knowledge and skills of learners. Place learners in appropriate learning groups. Serve as a basis in planning for relevant instruction.
Sample Measures: Pre-test, aptitude test, readiness test

During Instruction
Type: Formative evaluation (usually not graded)
Purpose: Reinforces successful learning. Provides continuous feedback to both students and teachers concerning learning successes and failures. Identifies learning errors that are in need of correction.
Type: Diagnostic test (usually not graded)
Purpose: Determines recurring or persistent difficulties. Searches for the underlying causes of problems that do not respond to first-aid treatment. Serves as a basis for planning detailed remedial instruction.
Sample Measures: Teacher-made tests, homework, classroom performance, observation

After Instruction
Type: Summative evaluation
Purpose: Determine the extent to which instructional objectives have been attained.
Sample Measures: Achievement tests

MODES OF ASSESSMENT

Traditional
Description: Paper-and-pencil test which usually assesses low-level thinking skills.
Examples: Standardized tests and teacher-made tests
Advantages: Scoring is objective. Administration is easy because students can take the test at the same time.
Disadvantages: Preparation of the instrument is time consuming. Prone to cheating.

Performance
Description: A mode of assessment that requires actual demonstration of skills or creation of products of learning.
Examples: Practical test, oral test, projects
Advantages: Preparation of the instrument is relatively easy. Measures behavior that cannot be deceived.
Disadvantages: Scoring tends to be subjective without rubrics. Administration is time consuming.

Portfolio
Description: A process of gathering multiple indicators of student progress to support course goals in a dynamic, ongoing, and collaborative process.
Examples: Working portfolios, show portfolios, documentary portfolios
Advantages: Measures students' growth and development. Intelligence-fair.
Disadvantages: Development is time consuming. Rating tends to be subjective without rubrics.

EVALUATION MEASURES

Purposes of Test

Tests provide useful data for making the following decisions: Instructional: identifying areas of specific weakness of learners.

Grading: identifying learners who pass or fail in a given subject.

Selection: Accepting or rejecting applicants for admission into a group, program, or institution.

Counseling and Guidance: Identifying learners who need assistance in personal and academic concerns.

Curriculum: assessing the strengths and weaknesses of a curriculum program.

Administrative Policy: determining the budget allocation for a particular school program.

Types of Test (by main point of comparison)

Purpose

Psychological: Aims to measure a student's intelligence or mental ability largely without reference to what the student has learned. Administered before the instructional process.

Educational: Aims to measure the results of instruction and learning. Administered after the instructional process.

Scope of Content

Survey: Covers a broad range of objectives. Measures general achievement in certain subjects. Is constructed by trained professionals.

Mastery: Covers a specific objective. Measures fundamental skills and abilities. Is typically constructed by teachers.

Interpretation

Norm-referenced: Results are interpreted by comparing one student with other students. Some will really pass. There is competition for a limited percentage of high scores. Describes a pupil's performance compared to the others.

Criterion-referenced: Results are interpreted by comparing a student's performance against a predefined standard. All or none may pass. There is no competition for a limited percentage of high scores. Describes a pupil's mastery of course objectives.
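The two ways of interpreting a score can be sketched in a few lines of Python. This is an illustrative sketch, not from the source: the class scores, the 40-item test length, and the 75% mastery cutoff are all invented.

```python
# Illustrative sketch: interpreting the same raw score two ways.
# The class scores, test length, and 75% cutoff are assumptions.

scores = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]  # class raw scores
items = 40                                          # total test items
student_score = 27

# Norm-referenced: compare the student with the other students.
below = sum(1 for s in scores if s < student_score)
percentile_rank = 100 * below / len(scores)

# Criterion-referenced: compare the student with a predefined standard.
mastery_cutoff = 0.75                # assumed 75% mastery standard
mastered = (student_score / items) >= mastery_cutoff

print(f"Percentile rank: {percentile_rank:.0f}")      # standing in the group
print(f"Mastery (75% standard): {mastered}")          # met the standard or not
```

Note that the same raw score can look good under one interpretation (above most classmates) and poor under the other (below the mastery standard).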

Language Mode

Verbal: Words are used by students in attaching meaning to or responding to test items.

Non-verbal: Students do not use words in attaching meaning to or in responding to test items (e.g., graphs, numbers, and three-dimensional objects).

Construction

Standardized: Constructed by a professional item writer. Covers a broad range of content in a subject area. Uses mainly multiple choice. Items written are screened, and the best items are chosen for the final instrument. Can be scored by machine. Interpretation of results is usually norm-referenced.

Informal (teacher-made): Constructed by a classroom teacher. Covers a narrow range of content. Various types of items are used. The teacher picks or writes items as needed for the test. Scored by the teacher. Interpretation of results is usually criterion-referenced.

Manner of Administration

Individual: Mostly given orally or requires actual demonstration of skill. The one-on-one situation offers many opportunities for clinical observation. There is a chance to follow up an examinee's response in order to clarify or comprehend it more clearly.

Group: A paper-and-pencil test. Loses the rapport, insight, and knowledge about each examinee. Information can be gathered from many students in the same amount of time needed for one.

Effect of Biases

Objective: The scorer's personal judgment does not affect the scoring. Only one answer satisfies the requirement of the item. There is little or no disagreement on what is the correct answer.

Subjective: Affected by the scorer's personal opinion, bias, or judgment. Several answers are possible. There is possible disagreement on what is the correct answer.

Time Limit and Level of Difficulty

Power: Consists of a series of items arranged in ascending order of difficulty. Measures a student's ability to answer more and more difficult items.

Speed: Consists of items approximately equal in difficulty. Measures a student's speed or rate and accuracy in responding.

Format

Selective: There are choices for the answer (multiple choice, true-false, matching type). Can be answered quickly. Prone to guessing. Time consuming to construct.

Supply: There are no choices for the answer (short answer, completion, restricted and extended-response essay). Preparation of items is relatively easy since only a few questions are needed. Lessens the chance of guessing the correct answer. Time consuming to score. Bluffing is a problem.

Classification of Teacher-made Tests

Objective Test

Recall Types (Supply Test)


-Simple Recall -Completion/Fill in the blanks -Identification -Labeling -Enumeration

Recognition Types (Selective Test) -Alternative response


-Multiple choice -Matching type

Rearrangement of Elements

ESSAY EXAMINATION
Unrestricted or Uncontrolled type Restricted or Controlled type

Objective Test: generally calls for single words, phrases, numbers, letters, and other symbols as responses to items.

Objective Tests are classified as follows:


Simple Recall: is defined as one in which each item appears as a direct question, a stimulus word or phrase, or a specified direction. The response is recalled rather than selected from a list given by the teacher. The question should ask only for an important aspect of a fact.

Example: Answer the following questions. Write your answer in the space provided at the left.
_____________ 1. Who was known as the Tagalog Joan of Arc because of her exploits during the revolution?
_____________ 2. The Katipunan in Cavite was divided into two factions, the Magdiwang and the Magdalo. While the Magdiwang was led by Mariano Alvarez, who led the Magdalo faction?
_____________ 3. What is the primary source of the objectives of education in the Philippines at present?

Completion Test (Fill-in the Blanks): is defined as a series of sentences in which certain important words or phrases have been omitted for the pupils to fill. A sentence may contain one or more blanks and the sentences may be disconnected or organized into paragraph.

Example: Supply the missing word or words to complete the meaning of the statement. Write your answer at the left.
______ 1. A thermometer measures ____.
______ 2. _____ is the process that occurs when dry ice (CO2 solid) is changed to CO2 (gas).
______ 3. Radium was discovered by ____.

Identification Test: a form of completion test in which a term is defined, described, explained, or indicated by a picture, diagram, or concrete object, and the term referred to is supplied by the pupil or student.

Example: Identify the following. Write the answers at the left.
________ 1. The conqueror of Magellan.
________ 2. The first Filipino cannon maker.
________ 3. The Rajah who led the fight against Legaspi in 1571.

Labeling Test: is a type of test in which the names of parts of a diagram, map, drawing, or picture are to be indicated.

Example: Give the name of each island indicated on the accompanying map.

1. _______ 2. _______ 3. _______ 4. _______ 5. _______

(Map with numbered islands omitted.)

Enumeration: An enumeration test is a type of completion test in which there are two or more responses to an item.

Alternative Response: made up of items, each of which admits only one of two possible responses. Varieties of this test are true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, and the like.

Example: Before each statement, write True if the statement is true and False if the statement is false.
______ 1. Andy gained knowledge and skills by talking and sharing his experiences with his parents, teachers, older siblings, and older cousins. This situation illustrates Psychosocial Theory.
______ 2. In Piaget's theory of cognitive development, a child between birth and two years, that is, during the sensorimotor period, does not see things in abstract forms.
______ 3. The Law of Readiness by Edward Lee Thorndike explains that any connection is strengthened in proportion to the number of times it occurs and in proportion to the average vigor and duration of the connection.

Multiple Choice Test: made up of items each of which presents two or more responses, only one of which is correct or definitely better than the others. Each item must be in the form of a complete sentence, question, incomplete statement, or a stimulus word or phrase.

Example: Select the letter of the correct answer. Write at the left of the statement the letter that corresponds to the correct option.
_______ 1. A student collapsed in her social studies class. It was found out that she did not eat her lunch. What principle is shown in this situation?
a. Physiological need    b. Security need    c. Safety need    d. Psychological need
_______ 2. Which of the following refers to the repetition of facts and skills which the teacher wishes to reinforce for mastery?
a. Drill    b. Review    c. Recitation    d. Mastery

The main advantage of the multiple choice test is its superior capacity to measure higher levels of knowledge, judgment, reasoning and understanding compared to other types of objective tests.

Matching Type of Test: A matching type test is composed of two columns; one is called stimulus and the other, the response column. Each item in one column is to be matched with another item to which it corresponds in the other column.

Example: Match the items in Column B with the items in Column A. Write the letter of your answer on the space provided before each number.

Column A                                        Column B
____ 1. Patron saint of teachers                a. Paolo Freire
____ 2. Wrote the Pedagogy of the Oppressed     b. Edward Lee Thorndike
____ 3. Authored the laws of learning           c. Friedrich August Froebel
____ 4. Father of kindergarten                  d. St. John Baptiste de la Salle
                                                e. Herbert Spencer

Rearrangement of Elements: consists of ordering items on some basis. Ordering measures memory of relationship and concepts of organization in many subjects.

Example: Arrange the following events in chronological order. Write the corresponding letters at the left.
_____ 1.        a. Execution of Dr. Jose Rizal
_____ 2.        b. Declaration of the first Philippine Independence
_____ 3.        c. The EDSA Revolution
_____ 4.        d. World War II

ADVANTAGES OF OBJECTIVE EXAMINATIONS

1. The sampling of the objective examination is more representative, and so measurement is more extensive. This is so because more items are included in the test.

2. Handicaps such as poor vocabulary, poor handwriting, poor spelling, poor grammar and the like do not adversely affect the ability to make a reply.

3. Scoring is not subjective because the responses are single words, phrases, numbers, letters, and other symbols with definite value points and hence, the personal element of the scorer is removed.
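Advantage 3 can be illustrated with a short sketch: with a fixed answer key, any scorer (or a machine) produces the same result. The key and the student responses below are invented for illustration.

```python
# Minimal sketch (invented example data): scoring an objective test
# against a fixed answer key removes the scorer's personal judgment.

key     = ["a", "c", "b", "d", "a"]   # definite correct responses
answers = ["a", "c", "d", "d", "a"]   # one student's responses

# Count items where the response matches the key exactly.
score = sum(1 for k, a in zip(key, answers) if k == a)
print(f"Score: {score}/{len(key)}")   # → Score: 4/5
```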

DISADVANTAGES OF OBJECTIVE EXAMINATIONS

1. Generally, it measures factual knowledge only. It hardly measures higher levels of knowledge or complex ideas.

2. It does not help in nor encourage the development of the ability of the students to organize and express their ideas.

3. It encourages memory work even without understanding.

4. It is easy to cheat in an objective examination.

5. It is hard to prepare.

Essay Examinations: The essay type test is a type of examination in which the subject or examinee is made to discuss, enumerate, compare, state, explain, analyze or criticize.

Classification of Essay Examinations or Questions

Unrestricted or Uncontrolled Type: In this type, the students have very wide latitude of expression. They have a wide freedom of organizing their ideas in the way they want. Example: Discuss the economic condition of the country.

Restricted or Controlled Type: in this type, the student is limited in organizing his response. There are guides in making a response. Example: Give and discuss the causes of the Philippine Revolution starting with the remote causes and followed by the immediate ones.

ADVANTAGES OF ESSAY EXAMINATION

1. The essay examination measures the higher levels of knowledge. It measures the ability to interpret, evaluate, apply principles, create, organize thoughts and ideas, contrast, etc.

2. The essay test helps students organize their thoughts and ideas logically.

3. The essay test is easy to prepare. It can be prepared in a short time.

4. It is harder to cheat in an essay test.

DISADVANTAGES OF ESSAY EXAMINATION

1. Essay tests are usually not well prepared. Some questions are vague.

2. There is difficulty in giving the right weight to each question.

3. Its usability is low. Scoring takes a long time, because no one except the teacher handling the subject on which the test is based can check the test papers, and it cannot be mechanically scored.

4. Sampling is limited. Few questions can be included and hence, the coverage is limited.

5. Scoring is subjective. The causes of subjectivity are: Different standards of excellence of different teachers scoring the papers or different standards of the same teacher checking the papers at different times. The physical and mental conditions of the person checking a paper may also affect the scoring.

THE CLASSROOM TESTING PROCESS


1. Specifying the objectives
2. Preparing the table of specifications
3. Determining the item format, the number of test items, and the difficulty level of the test
4. Writing test items that match the objectives
5. Editing, revising, and finalizing test items
6. Administering the test
7. Scoring
8. Tabulating and analyzing the results
9. Assigning grades

GENERAL SUGGESTIONS IN CONSTRUCTING WRITTEN EXAMINATIONS

1. Prepare a table of specifications or a test blue print and use it as a guide for writing test items.

2. Match the test item with the instructional objectives.

3. Keep the vocabulary level of the test items as simple as possible. Ensure that the test directions are clear and direct.

4. State each test item so that only one answer is correct.

5. See to it that one test item does not provide help or give clues in answering other test items.

GUIDELINES IN WRITING SPECIFIC TEST ITEMS

Completion Test (Fill-in the Blanks)

1. Omit only words that are essential to the meaning of the statement or sentence. Example: The founder of Katipunan was ____________________.

2. Do not omit so many words in a statement. The statement may lose its meaning. Example: Wrong : _______ was _____________ the first ________ of the __________.

3. Make the blanks equal in length to avoid clues. Long blanks suggest long answer, short ones suggest short answer. Ex: 1. The brain of Katipunan was __________. 2. The brain of revolution was __________. 3. The founder of La Liga Filipina was __________.

4. Answer should be written before the number for easy checking.

5. Avoid equivocal questions. Equivocal questions admit two or more interpretations. Example: Wrong: Rizal was born in ___________. The answer to this may refer to place or time. Better: Rizal was born in the year _____________.

Identification: 1. The definition, description, or explanation of the term may be given by means of a phrase or incomplete statement if it is not indicated by a picture, diagram, or complete object. Example: Identify the following: ______ 1. The hero of the Battle of Mactan ______ 2. The longest Filipino revolt ______ 3. A triangle with a right angle

2. The statement should be so phrased that there is only one response. Example: Identify the following:
Wrong: ____________ 1. Manuel L. Quezon
Better: ____________ 2. The first President of the Commonwealth.

Labeling Test:

1. Make a diagram map, drawing or picture to be labeled very clear and recognizable especially the parts to be labeled.

2. The parts to be labeled should be indicated by arrows so that the labels can be written in a vertical column in a definite place and not on the face of the diagram, map, drawing or picture.

Alternative Response Test (True or False)

1. Avoid the use of absolute modifiers such as all, none, no, always, never, nothing, only, alone, and the like, unless they are part of a fact or truth. These terms tend to make the statement false.

Example: (1) All Filipinos are hardworking. (This is of course false) (2) All players in athletics are strong. (Again, this is false) (3) All first place winners in the Olympics received gold medals. (This is true because this is a fact)

2. Vague qualifiers such as usually, seldom, much, little, few, small, large, and the like should be used only when they are a part of a fact or truth.

Example: (1) Some Filipinos are thrifty. (Of course this is true) (2) Many stars are already very old. (True. This is a fact)

3. The number of true statements and false statements should be approximately equal.

4. The correct responses should not follow a pattern; otherwise the students may be able to give the right symbols although they do not know the real answers.

5. Start with a false statement, since it is a common observation that the first statement in this type of test is usually made true.

Multiple Choice Test

CONSTRUCTING /IMPROVING THE MAIN STEM

1. The main stem of the test item may be constructed in question form, completion form, or direction form.
Example:
Question form: Which is the same as four hundred seventy? a. b. c. d.
Completion form: Four hundred seventy is the same as ___________. a. b. c. d.
Direction form: Add: 22 + 43. a. b. c. d.

* Three alternatives (for grades I-III) or four alternatives (for grades IV-VI) should be provided in each case.

2. The questions should not be trivial, and there should be a consensus on the answer.
Example of a trivial question: What time does the sun rise in the morning?
a. 4 o'clock    b. 5 o'clock    c. 6 o'clock    d. 7 o'clock

3. Each question should have only one answer, not several possible answers.

4. Highlight negative words in the stem for emphasis.
Example: One of the strengths of the Filipino character is pakikipagkapwa-tao. This is manifested in all of the following EXCEPT:
a. Malasakit    b. Pakikiramay    c. Lakas ng loob    d. Pakikiramdam

CONSTRUCTING /IMPROVING ALTERNATIVES

1. Alternatives should be as closely related to each other as possible. Example:

Poor alternatives: 74 + 23 = ______. a. 87 b. 97 c. 100

-Pupils mistakes should be anticipated. Such possible mistakes should be given among the alternatives.

2. Alternatives should be arranged in natural order.
Example: Poor: Pedro is ten years old. How many trips has the earth made around the sun since he was born?
a. 365    b. 12    c. 10    d. 30

Improved alternatives (ascending): a. 10    b. 12    c. 30    d. 365
or (descending): a. 365    b. 30    c. 12    d. 10

3. Alternatives should have grammatical parallelism. Example: Poor: Clay can be used for: a. making hollow blocks b. making pots c. garden soil

Improved: Clay can be used for: a. making hollow blocks b. growing vegetables c. making pots

4. Arrangement of correct answers should not follow any pattern.
Examples of poor (patterned) answer keys:
b, c, b, c, b, c
a, a, b, b, c, c
a, b, a, b, a, b
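One way to keep the key free of patterns is to shuffle each item's alternatives and derive the key afterward. This sketch is illustrative only: the items and options are invented, and a fixed seed is used just to make the run reproducible.

```python
# Illustrative sketch: shuffle alternatives so correct answers fall in
# no predictable pattern. The items below are invented examples.
import random

items = [
    ("74 + 23 = ____", ["97", "87", "100", "94"]),  # correct option listed first
    ("22 + 43 = ____", ["65", "55", "75", "60"]),
]

random.seed(7)   # fixed seed only so the example is reproducible
answer_key = []
for stem, options in items:
    correct = options[0]
    shuffled = options[:]          # copy so the source list is untouched
    random.shuffle(shuffled)
    letter = "abcd"[shuffled.index(correct)]   # where the answer landed
    answer_key.append(letter)

print(answer_key)
```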

Matching Type:

1. Use only homogeneous material in a single matching exercise.

Example: Poor (Column A mixes provinces and regions):
Column A                    Column B
___ 1. Quezon               a. MIMAROPA
___ 2. Pampanga             b. Cagayan
___ 3. Camarines Norte      c. Daet
___ 4. Region IV-A          d. CALABARZON
___ 5. Region IV-B          e. Lucena
                            f. San Fernando

2. There should be two columns, written side by side, the stimulus column or question column should be written to the left side and the response column to the right. There should be a short blank before each stimulus question where to write the symbol of the response.

Example:
Column A (Stimulus)                                         Column B (Response)
___ 1. Proponent of Psychosocial Theory                     a. Jean Piaget
___ 2. Major contributor to the Theory of Cognitive         b. Erik Erikson
       Development

3. Directions should clearly state which items in the response column are to be matched with items in the stimulus column, and vice versa. In addition, there should be an unequal number of responses and premises (stimuli).

Direction: Match the capital towns in Column B with the provinces in Column A, and write the letter of each town in the space provided before each number in Column A.

Column A                  Column B
_____ 1. Cagayan          a. Malolos
_____ 2. Pampanga         b. Pasig
_____ 3. Rizal            c. Vigan
_____ 4. Bulacan          d. Lucena
_____ 5. Ilocos Sur       e. Tuguegarao
_____ 6. Isabela          f. Lingayen
_____ 7. Zambales         g. San Fernando
_____ 8. Pangasinan       h. Daet
_____ 9. Batangas         i. Laoag
_____ 10. Ilocos Norte    j. Bangued
                          k. Ilagan
                          l. Iba
                          m. Batangas

Rearrangement of Elements: When this type of test is used, the basis of arrangement should be stated clearly. The bases are:

Chronological Order: arranging items in the order in which they occurred.
Example: Arrange the following Presidents in chronological order. Write your answers at the right.
Garcia       1. ___________
Marcos       2. ___________
Magsaysay    3. ___________
Macapagal    4. ___________
Osmeña       5. ___________
Quezon       6. ___________
Quirino      7. ___________
Roxas        8. ___________

Geographical Order: arranging things according to their geographical location.
Example: Arrange the following provinces from north to south. Write your answers at the right.
Bulacan          1. ___________
Cagayan          2. ___________
Nueva Ecija      3. ___________
Sorsogon         4. ___________
Nueva Vizcaya    5. ___________
Batangas         6. ___________
Isabela          7. ___________
Albay            8. ___________
Camarines Sur    9. ___________
Romblon          10. __________

Arrangement According to Magnitude: the basis of this arrangement is magnitude, which may be size, height, width, distance, or degree of generality.

Example: List the following biological classifications from the most general to the most specific.
Family       1. __________
Genus        2. __________
Phylum       3. __________
Order        4. __________
Class        5. __________
Species      6. __________
Subphylum    7. __________

Alphabetical Order: arranging words according to the alphabet, as they would appear in the dictionary.

Example: Arrange the following words alphabetically.
loud      1. __________
tone      2. __________
music     3. __________
song      4. __________
duet      5. __________
alto      6. __________
chorus    7. __________
melody    8. __________
opera     9. __________

Arrangement According to Importance, Quality, etc.
Example: Arrange the following cities according to their contribution to the country's foreign trade.
Davao        1. __________
Zamboanga    2. __________
Manila       3. __________
Cebu         4. __________
Iloilo       5. __________

Essay Type Tests:

1. State questions that elicit the desired cognitive skills specified in the learning outcomes.
2. Write the questions in such a way that the specific task is clearly understood by the examinee.
3. Ask all students to answer the same questions. Avoid using optional questions.
4. Indicate the number of points or the amount of time to be spent on each question.
5. Ask a colleague to critique the questions.
6. Prepare a model answer to each question.

Non-test Measures

Performance-Based Evaluation Measures

Restricted-type tasks
- Measure a narrowly defined skill
- Require a relatively brief response
- The task is structured and specific
Examples: constructing a histogram from data provided; writing a term paper on the significance of the EDSA Revolution

Extended-type tasks
- More complex, elaborate, and time consuming
- Involve collaborative work with small groups of learners
Examples: composing a poem; making a commercial advertisement

Affective Evaluation Measures

Teacher Observation
Unstructured: Open-ended; does not require a checklist or rating scale for recording purposes.
Structured: Uses a checklist or rating scale for recording purposes.

Learner Self-Report
Autobiography: The learner describes his/her own life as he/she has experienced and viewed it.
Self-expression: The learner responds to a particular question, issue, or concern in essay form.
Self-description: The learner paints a picture of himself/herself in his/her own words.

Peer Ratings
Sociometric technique: Shows the interpersonal relationships among the members of a group.
Social-distance scale: Measures the degree of acceptance or rejection of a learner in relation to the other group members.

RUBRICS

A rubric is a scoring tool for subjective assessments. It is a set of criteria and standards linked to learning objectives that is used to assess a student's performance on papers, projects, essays, and other assignments. Rubrics allow for standardized evaluation according to specified criteria, making grading simpler and more transparent.

Rubrics for Class Debate

CATEGORY

The team The team clearly clearly Understanding understood the understood of topic in-depth the topic inTopic and presented depth and their presented information their forcefully and information convincingly. with ease.

The team seemed to understand the main points of the topic and presented those with ease.

The team did not show an adequate understanding of the topic.

O R G A N I Z A T I O N

All arguments were clearly tied to an idea (premise) and organized in a tight, logical fashion.

Most All arguments arguments were were clearly clearly tied tied to an to an idea idea (premise) (premise) and but the organized organization in a tight, was logical sometimes fashion. not clear or logical.

Arguments were not clearly tied to an idea (premise).

P R E S E N T A T I O N S T Y L E

Team Team usually Team One or more consistently used sometimes members of the used gestures, eye used team had a gestures, eye contact, tone gestures, eye presentation contact, tone of voice and contact, tone style that did of voice and a level of of voice and not keep the a level of enthusiasm a level of attention of the enthusiasm in a way that enthusiasm audience. in a way that kept the in a way that kept the attention of kept the attention of the audience. attention of the audience. the audience.

Every major Every major Every major Every point point was point was point was was not well adequately supported supported. Use of Facts / supported supported with facts, Statistics with several with statistics relevant facts, relevant and/or statistics facts, examples, but and/or statistics the relevance examples. and/or of some was examples. questionable.

I N F O R M A T I O N

All information presented in the debate was clear, accurate and thorough.

Most Most informati information on presented in presented the debate in the was clear debate and was clear, accurate, accurate but was not and usually thorough. thorough.

Information had several inaccuracie s OR was usually not clear.

All statements, body language, Respect and for Other responses Team were respectful and were in appropriate language.

Statements and responses were respectful and used appropriate language, but once or twice body language was not.

Most statements and responses were respectful and in appropriate language, but there was one sarcastic remark.

Statements, responses and/or body language were consistently not respectful.

R E B U T T A L

All counter-arguments were accurate, relevant and strong.

Most counter-arguments were accurate, relevant, and strong.

Most counter-arguments were accurate and relevant, but several were weak.

Counter-arguments were not accurate and/or relevant.

GUIDELINES IN DEVELOPING RUBRICS

1. Identify the important and observable features or criteria of an excellent performance or quality product.

2. Clarify the meaning of each trait or criterion and the performance levels.

3. Describe the gradations of quality product or excellent performance.

4. Keep the number of criteria reasonable enough to be observed or judged.

5. Arrange the criteria in the order in which they are likely to be observed.

6. Determine the weight of each criterion and the whole work or performance in the final grade.

ESTABLISHING TEST VALIDITY - The degree of validity is the single most important aspect of a test.

Validity: can be defined as the degree to which a test is capable of achieving certain aims. Validity is sometimes defined as truthfulness: Does the test measure what it intends to measure?

KINDS OF VALIDITY

1. Face Validity: is done by examining the physical appearance of the test.

2. Content Validity: is related to how adequate the content of the test samples the domain about which inferences are to be made. It has to do with the appropriateness of the test to the curricular objectives.

3. Criterion-related Validity: This kind of validity pertains to the empirical technique of studying the relationship between predictor, or test scores and some independent external measure, or criterion.

Kinds of Criterionrelated Validity

3.1 Concurrent Validity: describes the present status of the individual by correlating the sets of scores obtained from two measures given concurrently.

3.2 Predictive Validity: describes the future performance of an individual by correlating the sets of scores obtained from two measures given at a longer time interval.

4. Construct Validity: is established statistically by comparing psychological traits or factors that theoretically influence scores in a test.

Kinds of Construct Validity

4.1 Convergent Validity: is established if the instrument correlates with another measure of a similar trait. e.g. a Critical Thinking Test may be correlated with a Creative Thinking Test.

4.2 Divergent Validity: is established if the instrument describes only the intended trait and does not correlate with measures of other traits. e.g. a Critical Thinking Test may not be correlated with a Reading Comprehension Test.

FACTORS INFLUENCING VALIDITY

1. Appropriateness of Test: it should measure the abilities, skills and information it is supposed to measure.

2. Directions: it should indicate how the learners should answer and record their answers.

3. Reading Vocabulary and Sentence Structures: it should be based on the intellectual level of maturity and background experience of the learners.

4. Difficulty of Items: it should have items that are neither too difficult nor too easy, so that it can discriminate the bright from the slow pupils.

5. Construction of Test Items: it should not provide clues so it will not be a test on clues nor ambiguous so it will not be a test on interpretation.

6. Length of the Test: it should be of sufficient length to measure what it is supposed to measure; a test that is too short cannot adequately sample the performance we want to measure.

7. Arrangement of Items: items should be arranged in ascending level of difficulty, starting with the easy ones so that pupils are encouraged to continue taking the test.

8. Patterns of Answers: it should not allow the creation of patterns in answering the test.

ESTABLISHING TEST RELIABILITY

Reliability: refers to consistency; it is the degree to which measurements of the content knowledge or of cognitive development ability are consistent each time the test is given.

CONSIDERATIONS IN ESTABLISHING TEST RELIABILITY

1. Length of the Test: Generally speaking, the longer the test, the more reliable it will be.

2. Quality of the Individual Test Items: The test maker must see to it that each item is as precise and as understandable as possible.

3. Interpretability (Scorability): The interpretability of an evaluation device refers to how readily scores may be derived and understood.

4. Usability (Practicability, Economy): Usability is the degree to which the evaluation instrument can be successfully employed by classroom teachers and school administrators without an undue expenditure of time and energy. Factors upon which usability depends: a. ease of administration; b. ease of scoring; c. ease of interpretation and application; d. low cost.

5. Objectivity: can be obtained by eliminating the bias, opinions or judgments of the person who checks the test.
6. Authenticity: the test should simulate real-life situations.

DESCRIBING EDUCATIONAL DATA

ITEM ANALYSIS: the process of testing the effectiveness of the items in an examination.

Item analysis gives information concerning each of the following points: 1. The difficulty of the item 2. The discriminating power of the item 3. The effectiveness of each item

SEVERAL BENEFITS OF ITEM ANALYSIS 1. It gives useful information for class discussion of the test. 2. It gives data for helping the students to improve their learning method. 3. It gives insights and skills which lead to the construction of better test items for future use.

SIMPLIFIED ITEM-ANALYSIS PROCEDURE (UL Method) Only the upper group (U) and lower group (L) scores are considered. The middle or average group is held in abeyance.

Results for item #5 of a Mathematics test taken by 30 students, which is the subject of the item analysis below.
Student No.   Score   Answer        Student No.   Score   Answer
1             86      D             16            60      D
2             81      A             17            80      A
3             73      E             18            50      C
4             82      E             19            80      B
5             85      D             20            89      C
6             74      C             21            90      E
7             94      A             22            77      E
8             74      B             23            63      A
9             75      C             24            57      B
10            76      D             25            70      E
11            75      E             26            95      A
12            79      A             27            72      E
13            65      D             28            79      E
14            87      E             29            83      B
15            98      E             30            97      E

E = correct answer

STEPS:

1. Arrange the test scores from the highest to the lowest.

98-E  97-E  95-A  94-A  90-E  89-C  87-E  86-D  85-D  83-B
82-E  81-A  80-A  80-B  79-E  79-A  77-E  76-D  75-E  75-C
74-C  74-B  73-E  72-E  70-E  65-D  63-A  60-D  57-B  50-C

2. Separate the top 27% and the bottom 27% of the papers. The former is called the upper group (U) and the latter, lower group (L). Set aside the middle group.

30 students x 27% = 8
8 papers from the upper group and 8 papers from the lower group should be analyzed.

Upper 27%: 98-E, 97-E, 95-A, 94-A, 90-E, 89-C, 87-E, 86-D

Lower 27%: 73-E, 72-E, 70-E, 65-D, 63-A, 60-D, 57-B, 50-C

3. Record the frequency of each option for the upper and lower groups from step 2:

Options   Upper (27%)   Lower (27%)
A         2             1
B         0             1
C         1             1
D         1             2
E*        4             3

4. Compute the percentage of the upper group that got the item right and call it U.

U = 4/8 (100) = 50%

Where: 4 = number of students who got the item right; 8 = number of cases in the upper 27%

5. Compute the percentage of the lower group that got the item right and call it L.

L = 3/8 (100) = 37.5%

Where: 3 = number of students who got the item right; 8 = number of cases in the lower 27%

6. Average the U and L percentages; the result is the difficulty index of the item.

(U + L) / 2 = difficulty index

(50% + 37.5%) / 2 = 87.5% / 2 = 43.75%

7. Use the table of equivalents below in interpreting the difficulty index:

.00 - .20    Very difficult
.21 - .80    Moderately difficult
.81 - 1.00   Very easy

Since the difficulty index (.4375) falls within .21 - .80, the item is moderately difficult and is retained.

8. Estimate the discrimination index. In the foregoing sample, four students in the upper group and three students in the lower group chose the correct answer. This shows positive discrimination, since the upper group got the item right more frequently than the students of the lower group.

Negative discriminating power is obtained when more students in the lower group got the right answers than the upper group.

Index of discrimination = (RU - RL) / NG

Where: RU= right responses of the upper group RL= right responses of the lower group NG= number of students in each group

To illustrate: Index of discrimination = (4 - 3) / 8 = 1/8 = .125
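The computations in steps 4-6 and 8 can be sketched in Python; the counts come from the illustrative item, and the variable names are only for illustration:

```python
# UL item-analysis sketch for the illustrative item.
# RU / RL = correct responses in the upper / lower 27% groups,
# NG = number of students in each group.
RU, RL, NG = 4, 3, 8

U = RU / NG * 100                 # % of upper group correct: 50.0
L = RL / NG * 100                 # % of lower group correct: 37.5
difficulty = (U + L) / 2          # difficulty index: 43.75 (moderately difficult)
discrimination = (RU - RL) / NG   # discrimination index: 0.125

print(difficulty, discrimination)
```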

9. Refer to the table of equivalents below in interpreting the discrimination index.


.00 - .19    Poor item                          Rejected
.20 - .29    Moderate (reasonably good item)    Rejected or revised
.30 - up     Very good item                     Retained

The illustrative item's discrimination index (.125) falls within .00 - .19, so the item is poor.

10. Determine the effectiveness of the distracters. A good distracter attracts students in the lower group more than in the upper group.

Hence, for our illustrative item analysis data in step 3:


Options
Upper (27%) Lower (27%)

A
2 1

B
0 1

C
1 1

D
1 2

E*
4 3

Options:
B D A C Good: because more students from the lower groups are attracted Poor: since it attracted more students in the upper group Fair: because both the upper and the lower groups have the same frequency E (the correct answer) Good: because more students from the upper group choose the correct answer.

SKEWNESS AND KURTOSIS

Skewness: is the degree of asymmetry, or departure from symmetry of a distribution.

Skewed to the Right (positive skewness): If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left. Most scores are below the mean.

Illustration: Mode < Median < Mean (the tail extends to the right)

Positive Skewness: Low performance; the mean is greater than the mode.

Skewed to the Left (negative skewness): if the frequency curve of a distribution has a longer tail to the left of the central maximum than to the right. Most scores are above the mean and there are extremely low scores.

Illustration: Mean < Median < Mode (the tail extends to the left)

Negative Skewness: High performance; the mean is lower than the mode.
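The two skewness rules can be checked numerically: in a positively skewed set of scores the mean exceeds the median, which exceeds the mode. A small sketch with made-up illustrative data:

```python
import statistics

# A right-skewed (positively skewed) sample: long tail toward high values.
data = [1, 1, 2, 2, 2, 3, 3, 4, 5, 9]

mode = statistics.mode(data)      # 2
median = statistics.median(data)  # 2.5
mean = statistics.mean(data)      # 3.2

# Positive skew: mean > median > mode; a negatively skewed set reverses this.
print(mean, median, mode)
```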

Kurtosis: is the degree of peakedness of a distribution, usually taken relative to a normal distribution.

Leptokurtic: A distribution having a relatively high peak

Platykurtic: a relatively flat-topped distribution.

Mesokurtic: a distribution which is moderately peaked.

MEASURES OF CENTRAL TENDENCY and VARIABILITY

STATISTICAL TOOLS

Measures of Central Tendency: describe the representative value of a set of data.
Measures of Variability: describe the degree of spread or dispersion of a set of data.

When the frequency distribution is regular/symmetrical/normal:
Mean: the computational average; affected by extreme scores; the most reliable among the measures of central tendency.
Standard Deviation: the root mean square of the deviations from the mean; the most reliable measure of variability.

When the frequency distribution is irregular/skewed:
Median: the positional average; the middle score; the 50th percentile; a measure of location; the most stable measure of central tendency because it is not affected by the magnitude of the scores.
Quartile Deviation: the average deviation of Q1 and Q3; the most stable measure of variability; commonly used as a measure of dispersion or variability.

When the distribution of scores is normal and a quick answer is needed:
Mode: the nominal average; the score with the highest frequency.
Range: the difference between the highest and lowest values in a set of observations.

Computation of the Measures of Central Tendency

Mean

Ungrouped Data; used for few cases (N<30)

Formula: x̄ = Σx / N

Where: x̄ = mean; Σx = sum of the scores; N = number of scores

Examples: Score are 6,8,3,9, and 12

Solution: x̄ = Σx / N = (6 + 8 + 3 + 9 + 12) / 5 = 38 / 5

x̄ = 7.6
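The ungrouped mean translates directly into code:

```python
# Ungrouped mean: x̄ = Σx / N, using the example scores.
scores = [6, 8, 3, 9, 12]
mean = sum(scores) / len(scores)  # 38 / 5 = 7.6
print(mean)
```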

Grouped Data: used for large cases (N ≥ 30)

Formula: x̄ = Σfx / N

Where: f = class frequency; x = class midpoint; N = sum of the frequencies

Example: Data given: The following are scores of 50 students in College Algebra. Unarranged data
41 23 37 27 36 18 43 40 32 11 47 26 46 20 29 27 29 25 29 41 29 46 28 44 26 29 34 28 40 31 28 38 23 26 43 34 14 32 13 24 25 28 28 37 21 42 27 14 24 27

Procedure: 1. Arrange the given data into an array. Array: This is the arrangement of data from highest to lowest or from lowest to highest.

Arranged Data
47 46 46 44 43 43 42 41 41 40 40 38 37 37 36 34 34 32 32 31 29 29 29 29 29 28 28 28 28 28 27 27 27 26 26 26 26 25 25 24 24 23 23 21 20 19 18 14 13 11

2. Construct a frequency distribution: a. find the range of the score in the above data. R = H-L Where: R= range; H= highest score; L= lowest score The range is 36. (47-11) =36

b. Find the number of classes:


Formula: Number of classes = (Range ÷ desired class width) + 1

The number of classes is 13. (36 ÷ 3 + 1 = 13)

Class intervals (13 classes):
45-47, 42-44, 39-41, 36-38, 33-35, 30-32, 27-29, 24-26, 21-23, 18-20, 15-17, 12-14, 9-11

3. Get the frequency and midpoint of each class interval.

Frequency Distribution of College Algebra Scores for 50 Students

Class Interval   f (frequency)   mp (midpoint)
45-47            3               46
42-44            4               43
39-41            4               40
36-38            4               37
33-35            2               34
30-32            3               31
27-29            13              28
24-26            7               25
21-23            3               22
18-20            3               19
15-17            0               16
12-14            2               13
9-11             1               10

4. Multiply each midpoint by its corresponding frequency.

Class Interval   f    x (midpoint)   fx
45-47            3    46             138
42-44            4    43             172
39-41            4    40             160
36-38            4    37             148
33-35            2    34             68
30-32            3    31             93
27-29            13   28             364
24-26            7    25             175
21-23            3    22             66
18-20            3    19             57
15-17            0    16             0
12-14            2    13             26
9-11             1    10             10

N = 50           Σfx = 1,477

5. Solve using the formula: x̄ = Σfx / N = 1,477 / 50 = 29.54
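Steps 3-5 can be reproduced in a short sketch. The (midpoint, frequency) pairs are taken from the tabulated distribution, and N is taken as the stated 50 students:

```python
# Grouped mean: x̄ = Σfx / N, with midpoints and frequencies from the table.
table = [(46, 3), (43, 4), (40, 4), (37, 4), (34, 2), (31, 3), (28, 13),
         (25, 7), (22, 3), (19, 3), (16, 0), (13, 2), (10, 1)]
N = 50  # number of students as stated in the example

sum_fx = sum(mp * f for mp, f in table)  # Σfx = 1477
mean = sum_fx / N                        # 29.54
print(mean)
```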

MEDIAN

Ungrouped Data

Case 1: The total number of cases is an odd number. (N=11)
Procedure:
1. Arrange the scores from highest to lowest or vice versa.
2. Get the middle score. That is the median.

Example: 99, 98, 96, 95, 93, 92, 91, 85, 84, 83, 80
The middle (6th) score, 92, is the median.

Case 2: The total number of cases is an even number. (N=10)
Procedure:
1. Arrange the scores from highest to lowest or vice versa.
2. Get the two middlemost scores.
3. Compute the average of the two middlemost scores. The average is the median score.

Example: 98, 96, 95, 91, 90, 85, 84, 83, 82, 80
The middlemost scores are 90 and 85.

Median = (90 + 85) / 2 = 87.5

Case 3: When the middlemost score occurs two or more times. Procedure: 1. Get the middlemost score/s, its/their identical score/s and its/their counterparts either above or below the middlemost score/s. 2. Compute their average and the average score is the median.

Example: a. N is odd (N=7): 86, 84, 75, 75, 73, 69, 67
The middlemost scores are 75, 75, and 73.

Median = (75 + 75 + 73) / 3 = 223 / 3 = 74.33

b. N is even (N=8): 84, 81, 75, 75, 73, 71, 67, 60
The middlemost scores are 75, 75, 73, and 71.

Median = (75 + 75 + 73 + 71) / 4 = 294 / 4 = 73.5
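Cases 1 and 2 match the ordinary statistical median, which Python's standard library computes directly (the special tie-handling rule of Case 3 is a reviewer convention that `statistics.median` does not apply):

```python
import statistics

odd = [99, 98, 96, 95, 93, 92, 91, 85, 84, 83, 80]  # N = 11: the middle score
even = [98, 96, 95, 91, 90, 85, 84, 83, 82, 80]     # N = 10: mean of middle two

median_odd = statistics.median(odd)    # 92
median_even = statistics.median(even)  # (90 + 85) / 2 = 87.5
print(median_odd, median_even)
```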

Grouped Data
1. Add up or accumulate the frequencies starting from the lowest to the highest class limit. Call this the cumulative frequency. (CF)
2. Find one half of the number of cases in the distribution. (N/2)
3. Find the cumulative frequency which is equal to, or closest to but higher than, half the number of cases. The class containing this frequency is the median class.
4. Find the lower limit (LL) of the median class.
5. Get the cumulative frequency of the class below the median class.
6. Subtract this from half the number of cases in the distribution. (N/2 - CFb)
7. Get the frequency of the median class. (FMdn)
8. Find the class interval (i), then use the formula below.

Formula:

Median = LL + ((N/2 - CFb) / FMdn) × i

Where: LL = lower limit (boundary) of the median class; i = class interval; N/2 = half of the number of cases; CFb = cumulative frequency below the median class; FMdn = frequency of the median class

Example:

Class Limits   f    CF
45-49          2    50
40-44          0    48
35-39          12   48
30-34          13   36   (median class: LL = 29.5, FMdn = 13)
25-29          10   23   (CFb = 23)
20-24          5    13
15-19          4    8
10-14          4    4

i = 5, N = 50

Solution:

Median = LL + ((N/2 - CFb) / FMdn) × i
= 29.5 + ((25 - 23) / 13) × 5
= 29.5 + (2/13) × 5
= 29.5 + 0.77
= 30.27
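The grouped-median formula is easy to sketch with the example's values plugged in:

```python
# Grouped median: Mdn = LL + ((N/2 - CFb) / FMdn) * i, values from the example.
LL, i, N = 29.5, 5, 50
CFb, FMdn = 23, 13  # cumulative frequency below, and frequency of, the median class

median = LL + ((N / 2 - CFb) / FMdn) * i
print(round(median, 2))  # 30.27
```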

Mode

Ungrouped Data Get the most frequent score. Example 1: one mode or unimodal: 28, 21, 25, 25, 22 and 20 Mode is 25

Example 2: two modes or bimodal: 23, 23, 21, 20, 19, and 19 Modes are 23 and 19

Example 3: three modes or trimodal: 15, 16, 14, 12, 11, 11, 12, and 16 Modes are 11, 12, and 16

When there are more than three modes, they are called polymodal or multimodal.

Grouped Data Crude Mode- refers to the midpoint of the class limit with the highest frequency. Procedure: 1. Find the class limit with the highest frequency. 2. Get the midpoint of that class limit. 3. The midpoint of the class limit with the highest frequency is the crude mode.

Example:

Class Limits   f (frequency)
45-47          3
42-44          4
39-41          4
36-38          4
33-35          2
30-32          3
27-29          13   (the highest frequency)
24-26          7
21-23          3
18-20          3
15-17          0

The modal class is 27-29; its midpoint, 28, is the crude mode.

Refined Mode: refers to the mode obtained from an ordered arrangement or a class frequency distribution. Procedure: 1. Get the mean and the median of the grouped data. 2. Multiply the median by three. (3Mdn) 3. Multiply the mean by two. (2Mn) 4. Subtract 2Mn from 3Mdn to get the mode. (Md)

Formula: Md = 3Mdn - 2Mn

Example: Md = 3(30.27) - 2(29.54) = 90.81 - 59.08 = 31.73
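A sketch of the refined mode, assuming the grouped median (30.27) and grouped mean (29.54) obtained earlier; note that the result is sensitive to how much the mean is rounded before substituting:

```python
# Refined (empirical) mode: Md = 3*Mdn - 2*Mn.
median, mean = 30.27, 29.54

refined_mode = 3 * median - 2 * mean
print(round(refined_mode, 2))  # 31.73
```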

GRADING AND REPORTING

Grading: is the process of assigning value to a performance.

Purposes of Grading
1. Certifying learners' mastery of specific content or level of achievement
2. Identifying, selecting and grouping learners for particular academic programs
3. Providing information for diagnosis and planning
4. Helping learners improve their school performance

Approaches to Grading

Letter Grades: (A, B, C, D) Generally used in marking learners' performance on products other than objective tests. Correspond to verbal descriptions such as excellent or outstanding, good, average, or acceptable. Provide an overall indication of performance.

Percentage: (70%, 75%, 80%) Indicates the percentage of items answered correctly. Gives a finer discrimination of learners' performance than letter grades, but communicates only a general indication of performance.

Pass/Fail: Shows mastery or non-mastery of learning objectives. Does not clearly reflect the learner's actual level of performance.

Two Methods of Interpreting Scores

Absolute or Criterion-Referenced Grading: Grading is based on fixed or absolute standards, where a grade is assigned according to how well a student has met the criteria or the well-defined objectives of a course that were spelled out in advance.

Advantages: Matches learner performance with clearly defined objectives. Discourages competition.

Disadvantages: Difficulty in establishing clearly defined learning outcomes and setting standards that indicate mastery. Subject to leniency error. Scores depend on the difficulty of the test.

Relative or Norm-Referenced Grading Also known as grading on the curve Based on comparing learners performance to each other

Advantages: May result in higher-level or more complex assessments that can be challenging to learners. May ensure the distribution of grades on the basis of scores in relation to one another, regardless of the difficulty of the test.

Disadvantages: Encourages competition among learners. May affect learners' social relations.

GUIDELINES IN GRADING STUDENTS 1. Explain your grading system to the students early in the course and remind them of the grading policies regularly. 2. Base grades on a predetermined and reasonable set of standards.

3. Base your grades on as much objective evidence as possible. 4. Base grades on the students' attitude as well as achievement, especially at the elementary and high school levels.

5. Base grades on the students' relative standing compared to classmates. 6. Base grades on a variety of sources.

7. As a rule, do not change grades. 8. Become familiar with the grading policy of your school and with your colleagues' standards.

9. When failing a student, closely follow school procedures.

The end

65. Study this group of tests which was administered with the following results, and then answer the question that follows.

Subject    Mean   SD   Rommy's Score   z score
Math       56     10   43              -1.3
Physics    41     9    31              -1.1
English    80     16   109             1.8

Compute the z score. Formula: z = (score - mean) / SD
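The z-score formula can be applied to the whole table at once; a sketch reproducing the tabulated values (rounded to one decimal place):

```python
# z = (score - mean) / SD for each subject in the table.
rows = [("Math", 56, 10, 43), ("Physics", 41, 9, 31), ("English", 80, 16, 109)]

z = {subject: round((score - mean) / sd, 1)
     for subject, mean, sd, score in rows}
print(z)  # {'Math': -1.3, 'Physics': -1.1, 'English': 1.8}
```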

100. Study this group of tests which was administered with the following results, and then answer the questions. Compute the z score: z = (score - mean) / SD

Subject    Mean   SD   John's Score   z score
Math       56     10   43             -1.3
Physics    41     9    31             -1.1
English    80     16   109            1.8

115. Study this group of tests which was administered with the following results, and then answer the question.

Subject    Mean   SD   Jamil's Score
Math       40     3    58
Physics    38     4    45
English    75     5    90

In which subject(s) were the scores most homogeneous?


The LOWER the STANDARD DEVIATION (SD), the more HOMOGENEOUS the SCORES will be. Here, Math (SD = 3) has the most homogeneous scores.
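The homogeneity rule amounts to picking the subject with the smallest standard deviation; a minimal sketch with the SDs from item 115:

```python
# Lower SD = more homogeneous scores; find the subject with the smallest SD.
sds = {"Math": 3, "Physics": 4, "English": 5}

most_homogeneous = min(sds, key=sds.get)
print(most_homogeneous)  # Math
```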
