
CHARACTERISTICS OF A GOOD TEST


VALIDITY -- refers to the extent to which a test measures what it is purported to measure. If a test item is congruent with the behavior to be tested, then it is valid.

Types of Evidence

CONTENT VALIDITY -- refers to the ADEQUACY and REPRESENTATIVENESS of the learning outcomes to be measured. It can be assured with the use of a table of specifications (T.O.S.).

CRITERION-RELATED VALIDITY
1. PREDICTIVE VALIDITY -- involves the use of a criterion and a predictor. Example: correlating the results of a college entrance test (CET) with the students' GWA at some future time (predictor = CET; criterion = GWA).
2. CONCURRENT VALIDITY -- the criterion is already available, so the CET is correlated with a criterion that exists at the time of testing (predictor = CET; criterion = 4th year high school grade).

CONSTRUCT-RELATED VALIDITY -- refers to how well performance on a particular set of tasks can be explained by some PSYCHOLOGICAL CONSTRUCT or TRAIT.
THEORETICAL CONSTRUCT -- described by determining the components of the psychological task.
CRITICAL CONSTRUCT -- predictors, conclusions, assumptions, inferences, interpretations and relevance of evidence.



RELIABILITY -- refers to the CONSISTENCY of the test scores.
-- ERRORS of measurement are factors or conditions that can lower the reliability of a test. If the test has low reliability, we can be assured that errors of measurement have affected the test scores to the point that the test is UNRELIABLE.

SOME ERRORS OF MEASUREMENT

Test Takers -- what is happening within the individual (fatigue, hunger, headache, emotional upset, anxiety, growth, and learning acquired before the test) tends to reduce the consistency of the SCORE OVER TIME.

Test Itself (INTRA-TEST ERROR) -- the test content (poorly constructed items, items with clues, very easy items, very difficult items, very high vocabulary or reading level) tends to lead to guessing, particularly when the test is long.

Test Administration -- lighting of the room, room temperature (too hot or too cold), noise, seating arrangement, instructions, time allotment, and the attitude of the examinee (these MAKE THE TEST UNRELIABLE and LOWER THE TEST SCORE).

Test Scoring -- miskeying (providing a wrong answer key), mistakes in correcting a wrong answer, mistakes in the use of the required pencil, and subjective scoring.

TO DETERMINE THE CONSTRUCT VALIDITY OF CRITICAL THINKING

1. Each subtest is correlated with the whole test.
2. The correlation of each subtest, which measures a particular component, shows how much that component contributes to the measurement of the psychological trait, which is critical thinking.

Defined by: X (subtest), Y (correlation with the total score), and the proportion of common variance.

DEGREES OF RELATIONSHIP BETWEEN TWO SETS OF SCORES
+1.00 -- PERFECT POSITIVE RELATIONSHIP (the better); more from the upper group got the item correct.
 0.00 -- NO RELATIONSHIP
-1.00 -- PERFECT NEGATIVE RELATIONSHIP; more from the lower group got the item correct.
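The coefficient on this -1.00 to +1.00 scale is usually the Pearson product-moment correlation. The sketch below is one way to compute it by hand; the function name `pearson_r` and the CET and GWA scores are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 pairs)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]          # deviations from the mean of x
    dy = [b - mean_y for b in y]          # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores has no variation")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: entrance-test scores (predictor) and later GWA (criterion).
cet = [70, 75, 80, 85, 90]
gwa = [78, 80, 85, 88, 92]
r = pearson_r(cet, gwa)
print(round(r, 2))
```

A value close to +1.00 here would be read as evidence of predictive validity; a value near 0.00 would mean the entrance test tells us little about later grades.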

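DISCRIMINANT VALIDITY -- DIFFERENT TRAITS OR CONSTRUCTS. Scores on a test of one construct should show a low correlation with scores on a test of a different trait. Example: scores on the critical thinking test are correlated with scores on attitudes towards movies.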

METHODS OF ESTIMATING TEST RELIABILITY


TEST-RETEST METHOD
-- determines how consistent the scores are over a given period of time. The same test is administered twice to the same group, with a sufficient time interval of 2 to 15 days between administrations. With an interval of only 2-3 days the students can recall their answers; a longer time interval lowers the reliability. (Obtained score = true score + error of measurement.)

PARALLEL/ALTERNATE FORMS METHOD
-- uses two different versions of the same test, administered to the same group close together in time. Form A and Form B can be given on the same day or on the next day. The two forms differ only in how the items are worded or written; they should measure the same skills, so errors are significantly controlled.

TEST-RETEST WITH ALTERNATE FORMS METHOD
-- administers the two versions of the same test on two different occasions. The time interval may be short (2 weeks) or long (6 months). It takes into account all possible sources of error. It is the most useful method because it indicates the variation of a test score over a period of time.

INTERNAL CONSISTENCY METHOD
-- employs only one administration of the same test, given to the same group or individual.

DIFFERENT METHODS
1. SPLIT-HALF / ODD-EVEN METHOD -- the odd items and the even items are scored separately, and the two sets of scores (odd and even) are correlated using the PRODUCT MOMENT CORRELATION COEFFICIENT FORMULA.
2. KUDER-RICHARDSON FORMULA 20
3. SPEARMAN-BROWN PROPHECY FORMULA -- used to estimate the reliability of the WHOLE test from the half-test correlation.
4. PEARSON r -- used to compute the internal consistency of a test in the SPLIT-HALF METHOD.
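The split-half procedure can be sketched as follows: score the odd and even items separately, correlate the two half-scores with Pearson r, then step the result up with the Spearman-Brown prophecy formula, r_whole = 2r / (1 + r). The function names and the 0/1 item matrix are hypothetical, and the sketch assumes dichotomously scored items.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """item_scores: one list per examinee (1 = correct, 0 = wrong), items in test order.

    Returns (half-test correlation, Spearman-Brown estimate for the whole test).
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    r_whole = 2 * r_half / (1 + r_half)                    # Spearman-Brown prophecy formula
    return r_half, r_whole

# Hypothetical responses of five examinees to a six-item test.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, r_whole = split_half_reliability(responses)
print(round(r_half, 2), round(r_whole, 2))
```

The stepped-up value is larger than the half-test correlation because each half is only half as long as the full test, and shorter tests are less reliable.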

If the reliability coefficient is high, the test is said to be homogeneous: the test scores are consistent over different parts of the entire test.

RELIABILITY ESTIMATE : WHAT TO MEASURE
TEST-RETEST : TEST ADMIN, TEST TAKERS
ALTERNATE FORMS : TEST ADMIN, TEST ITSELF
TEST-RETEST WITH ALTERNATE FORMS : TEST ADMIN, TEST ITSELF, TEST TAKERS
INTERNAL CONSISTENCY : TEST ADMIN, TEST ITSELF

NOTE: a reliability coefficient of +.86 means that 86 percent of the variation in the obtained scores reflects true scores and 14 percent can be attributed to errors of measurement.

IMPROVING THE TEST ITEMS


Item Analysis

Index of difficulty -- the proportion of examinees who answer the item correctly.

Index of discrimination -- the extent to which a test item differentiates good performers from poor performers.

METHOD TO EMPLOY IN ITEM ANALYSIS -- USING THE UPPER AND LOWER 27% INDEX METHOD
1. After scoring the test, arrange the papers from lowest to highest.
2. Segregate the top and bottom 27% of the papers.
3. Tally the correct answers to each item by each student in the upper 27% group.
4. Repeat step 3, considering the lower 27% group.
5. Get the proportion of the upper group that obtained the correct answer (U/NU).
6. Repeat step 5, considering the lower group (L/NL).
7. Get the average of the upper and lower proportions. This is the index of difficulty.
8. Get the difference between the upper and lower proportions. This is the index of discrimination.

U, L = number of pupils in the upper (U) or lower (L) group who got the item correct
NU, NL = number of pupils in the upper (NU) or lower (NL) group

TABLE INTERPRETING DIFFICULTY INDEX

Range          Description
0.00 - 0.20    Very difficult
0.21 - 0.40    Difficult
0.41 - 0.60    Moderately difficult
0.61 - 0.80    Easy
0.81 - 1.00    Very easy

The higher the difficulty index, the easier the item is.

TABLE INTERPRETING INDEX OF DISCRIMINATION

A good test item separates the bright performers from the poor performers.

Range            Description
-1.00 - -0.61    Questionable item
-0.60 - -0.20    Not discriminating
-0.19 -  0.20    Moderately discriminating
 0.21 -  0.60    Discriminating
 0.61 -  1.00    Very discriminating

The higher the index of discrimination, the better the item discriminates.

Formula: Ds = (U/NU) - (L/NL)
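As a worked illustration, the two indices can be computed from the tallies of the upper and lower groups. This is a minimal sketch: the function name `item_indices` and the counts are hypothetical, and the difficulty index is taken as the average of the two group proportions, as in step 7 of the upper-lower method.

```python
def item_indices(upper_correct, lower_correct, n_upper, n_lower):
    """Return (difficulty index, discrimination index) for one item.

    upper_correct / lower_correct: pupils in each group who got the item right (U, L).
    n_upper / n_lower: number of pupils in each group (NU, NL).
    """
    if n_upper <= 0 or n_lower <= 0:
        raise ValueError("each group must contain at least one pupil")
    p_upper = upper_correct / n_upper       # proportion correct in the upper group
    p_lower = lower_correct / n_lower       # proportion correct in the lower group
    difficulty = (p_upper + p_lower) / 2    # average of the two proportions
    discrimination = p_upper - p_lower      # Ds = (U/NU) - (L/NL)
    return difficulty, discrimination

# Hypothetical item: 100 examinees, so 27 papers in each group.
difficulty, ds = item_indices(upper_correct=20, lower_correct=8, n_upper=27, n_lower=27)
print(round(difficulty, 2), round(ds, 2))
```

A negative Ds would mean that more pupils from the lower group than from the upper group answered the item correctly.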

WHEN WOULD YOU SAY AN ITEM IS GOOD OR RETAINED?

- it must have an ACCEPTABLE INDEX OF DIFFICULTY AND DISCRIMINATION
- THE ACCEPTABLE INDEX OF DIFFICULTY RANGES FROM 0.41 TO 0.60
- THE ACCEPTABLE INDEX OF DISCRIMINATION RANGES FROM +0.20 TO +1.00

FAIR OR REVISED
- EITHER THE DIFFICULTY INDEX OR THE DISCRIMINATION INDEX IS UNACCEPTABLE

POOR OR DISCARDED
- BOTH THE DIFFICULTY AND DISCRIMINATION INDEXES ARE UNACCEPTABLE; THE ITEM NEEDS TO BE DISCARDED RIGHT AWAY
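The retain, revise, or discard rule can be written as a small decision function. This is only a sketch of the rule as stated here, using the acceptable ranges 0.41 to 0.60 for difficulty and +0.20 to +1.00 for discrimination; the function name `item_decision` is hypothetical, and values that fall between table boundaries (for example 0.405) are treated as unacceptable, which is an assumption.

```python
def item_decision(difficulty, discrimination):
    """Classify an item as 'retain', 'revise' or 'discard' from its two indices."""
    difficulty_ok = 0.41 <= difficulty <= 0.60          # acceptable index of difficulty
    discrimination_ok = 0.20 <= discrimination <= 1.00  # acceptable index of discrimination
    if difficulty_ok and discrimination_ok:
        return "retain"    # both indices acceptable: good item
    if difficulty_ok or discrimination_ok:
        return "revise"    # only one index acceptable: fair item
    return "discard"       # both indices unacceptable: poor item

print(item_decision(0.52, 0.44))
print(item_decision(0.85, 0.30))
print(item_decision(0.90, 0.05))
```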

TABLE OF ACTION TO BE TAKEN

Columns: DIFFICULTY LEVEL, DISCRIMINATING LEVEL, ACTION

DIFFICULTY LEVEL: VERY DIFFICULT, DIFFICULT, MODERATELY DIFFICULT, EASY, VERY EASY
DISCRIMINATING LEVEL: QUESTIONABLE ITEM, NOT DISCRIMINATING, MODERATELY DISCRIMINATING, DISCRIMINATING, VERY DISCRIMINATING
ACTION: DISCARD, REVISE, MAY NEED REVISION, RETAIN, ACCEPT, N.R., SEE EXAMPLE

TRADITIONAL ASSESSMENT

Discrete Point (Single Attribute Assessment) -- for example, language assessment in the form of multiple choice, matching type, true or false, or short answer items.

Charles Spearman (1904), Two-Factor Theory -- postulates a general factor (g-factor) and specific factors (s-factors). Examples of tests with a g-factor are Raven's Progressive Matrices and Cattell's Culture Fair Intelligence Test.

Integrative or Global Assessment (Multiple Trait Assessment) -- measures more than one point or objective at a time, and is often pragmatic. An example is writing a composition.

Cloze Test -- an innovative method of testing wherein words are deleted from a passage. The most common practice is to delete every 5th word. The acceptable range for the readability of a reading material is a score between 30 and 50 percent.

C-Test -- the second half of every second word is deleted, leaving the first and last sentences intact. It commonly contains 100 words.

Dictation Test -- primarily a test of listening and spelling. It is used to measure the ability to use capital letters and punctuation marks, spell words correctly, and write legibly and neatly.

ADMINISTERING THE DICTATION TEST
Read each word once or twice as the students listen, then ask the students to write the word. Read the word again for confirmation. Read each sentence slowly once or twice, then at normal speed once, before the students are asked to write. Do not read while the students are writing.

Oral Interview -- a kind of integrative assessment. It collects information through face-to-face interaction between the interviewee and the interviewer. The interviewer is not at liberty to modify the questions or ask follow-up questions. The questions should be prepared beforehand, and the objectives should be taken into consideration.
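The every-5th-word deletion used in a cloze test is mechanical enough to sketch in code. The function name `make_cloze`, the blank marker, and the sample sentence are hypothetical; the sketch splits on whitespace only, so punctuation attached to a deleted word disappears with it.

```python
def make_cloze(passage, every=5, blank="_____"):
    """Replace every n-th word of a passage with a blank.

    Returns the cloze passage and the list of deleted words (the answer key).
    """
    words = passage.split()
    cloze_words = []
    answer_key = []
    for position, word in enumerate(words, start=1):
        if position % every == 0:      # 5th, 10th, 15th word, ...
            answer_key.append(word)
            cloze_words.append(blank)
        else:
            cloze_words.append(word)
    return " ".join(cloze_words), answer_key

text, key = make_cloze("one two three four five six seven eight nine ten")
print(text)   # one two three four _____ six seven eight nine _____
print(key)    # ['five', 'ten']
```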

MEASURE OF CENTRAL TENDENCY


Raw scores -- the scores obtained.

Tabulating raw scores -- the steps in constructing a grouped frequency distribution are as follows:
1. Determine the range of scores. The range is equal to the highest score minus the lowest score.
2. Determine the appropriate number of class intervals (ideally 10-15). Be sure that the lowest limit is divisible by the interval width. The number of class intervals is given by k = 1 + 3.3 log n, where n is the sample size and n = N / (1 + Ne^2).
3. Compute the class size: i = range / k.
4. Determine the lowest limit (LL) of the first interval: divide the lowest score (LS) by the interval width i, take the whole-number quotient Q, then LL = Q x i.
5. Construct the frequency column (f) by tallying the number of scores opposite each interval.

Ranking -- another way to organize test scores. It is the process of arranging a group of scores from highest to lowest. The highest score is designated as ranked first, and so on.

Steps in ranking the scores
1. Arrange the scores from highest to lowest. A particular score is written as many times as it occurs.
2. Put a serial number opposite each score: 1, 2, 3, 4, ...
3. Average the ranks of each score that appears more than once. Example: 45, 45, 45 appears three times at ranks 7, 8 and 9; add the ranks (7 + 8 + 9 = 24) and divide by 3 (24 / 3 = 8), so each 45 is ranked 8.
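Both procedures above can be sketched in a few lines. The class-interval helper applies k = 1 + 3.3 log n and i = range / k; rounding k and i to whole numbers, and taking the lowest limit as the largest multiple of i that does not exceed the lowest score, are assumptions made for the sketch. The ranking helper averages the serial numbers of tied scores. The function names and the scores are hypothetical.

```python
from math import log10

def class_interval_plan(scores):
    """Return (number of classes k, class width i, lowest limit LL) for a score list."""
    n = len(scores)
    score_range = max(scores) - min(scores)      # highest score minus lowest score
    k = round(1 + 3.3 * log10(n))                # k = 1 + 3.3 log n, rounded (assumption)
    width = max(1, round(score_range / k))       # i = range / k, rounded (assumption)
    lowest_limit = (min(scores) // width) * width  # largest multiple of i <= lowest score
    return k, width, lowest_limit

def average_ranks(scores):
    """Rank scores from highest to lowest, giving tied scores their average rank."""
    ordered = sorted(scores, reverse=True)
    serials = {}
    for serial, score in enumerate(ordered, start=1):
        serials.setdefault(score, []).append(serial)   # serial numbers held by each score
    return {score: sum(s) / len(s) for score, s in serials.items()}

# Hypothetical scores in which 45 occupies serial numbers 7, 8 and 9.
scores = [60, 58, 55, 52, 50, 48, 45, 45, 45, 40]
plan = class_interval_plan(scores)
ranks = average_ranks(scores)
print(plan)        # (4, 5, 40)
print(ranks[45])   # 8.0
```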

GRAPHING OF DATA
1. Histogram
2. Frequency polygon
3. Bar graph

MEASURES OF CENTRAL TENDENCY MEAN, MEDIAN, MODE


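The MEAN, denoted by $\bar{X}$ -- simply the average of the group, and the most widely accepted measure of central tendency.

For ungrouped data

$$\bar{X} = \frac{\sum X}{N}$$

where $\bar{X}$ is the mean, $\sum X$ is the summation of the scores, and $N$ is the total number of scores in the distribution.

For grouped data, using deviations

$$\bar{X} = AM + \frac{\sum fd}{N}$$

where $AM$ is the assumed mean, $d$ is the deviation of each class midpoint from the assumed mean, and $\sum fd$ is the summation of frequency times deviation.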

The MEDIAN is defined as the middlemost score in the distribution. It divides the distribution in half: 50% of the scores are found above the median and the other 50% lie below it.

For ungrouped data
1. Arrange the scores from highest to lowest or vice versa.
2. If the number of scores is odd, the median is the middlemost score in the distribution.
3. If the number of scores is even, average the two middle scores.

For grouped data

$$\text{Median} = LL + \left(\frac{N/2 - cf}{f}\right) i$$

where $LL$ is the lowest limit of the class containing the $N/2$-th case, $N$ is the number of cases, $cf$ is the cumulative frequency below that class, $f$ is the frequency of the class where the median lies, and $i$ is the class interval.

The MODE is defined as the most frequent, or most often repeated, score. It is not affected by extreme scores: changing one score to a much lower or much higher value leaves the mode unchanged.

For ungrouped data
1. The mode for ungrouped data is the score that occurs most often.

For grouped data

Mode = 3(median) - 2(mean)
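The three measures can be checked with Python's standard `statistics` module. The sketch below also codes the grouped-median calculation from the symbols listed above (LL, N, cf, f, i), assuming the usual formula LL + ((N/2 - cf) / f) x i, and the empirical relation Mode = 3(median) - 2(mean). The helper names and all numbers are hypothetical.

```python
from statistics import mean, median, multimode

def grouped_median(lower_limit, n, cum_freq_below, freq, interval):
    """Median for grouped data: LL + ((N/2 - cf) / f) * i."""
    return lower_limit + ((n / 2 - cum_freq_below) / freq) * interval

def empirical_mode(median_value, mean_value):
    """Approximate mode from the relation Mode = 3(median) - 2(mean)."""
    return 3 * median_value - 2 * mean_value

# Ungrouped data (hypothetical scores).
scores = [45, 50, 50, 55, 60]
print(mean(scores))        # 52
print(median(scores))      # 50
print(multimode(scores))   # [50]

# Grouped data (hypothetical): N = 40, median class starts at 50,
# 15 cases fall below it, the class holds 10 cases, and the class width is 5.
print(grouped_median(lower_limit=50, n=40, cum_freq_below=15, freq=10, interval=5))  # 52.5
print(empirical_mode(median_value=50, mean_value=52))                                # 46
```

`multimode` returns a list because a distribution can have more than one most frequent score.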

The measures of central tendency in different distributions

1. NORMAL DISTRIBUTION
2. POSITIVELY SKEWED DISTRIBUTION
3. NEGATIVELY SKEWED DISTRIBUTION

Normal distribution
1. THE SCORES FORM A SYMMETRICAL DISTRIBUTION. MEAN = MEDIAN = MODE

Positively skewed distribution
1. THERE ARE MORE LOW SCORES THAN HIGH SCORES.
2. IT SHOWS THAT THE TEST IS VERY DIFFICULT. THE SCORES FORM AN ASYMMETRICAL DISTRIBUTION. MEAN > MEDIAN > MODE

The graph shows that the number of students who got good grades is relatively lower than the number who got lower grades.

Negatively skewed distribution
1. THERE ARE MORE HIGH SCORES THAN LOW SCORES.
2. IT SHOWS THAT THE TEST IS VERY EASY, SO EVEN THE LOW-PERFORMING STUDENTS GOT GOOD GRADES. THE SCORES FORM AN ASYMMETRICAL DISTRIBUTION. MODE > MEDIAN > MEAN
3. IT IS THE INVERSE OF THE POSITIVELY SKEWED DISTRIBUTION.

The graph shows that the number of students who got high grades is relatively higher than the number who got lower grades.
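The ordering of the mean and the median gives a quick, rough check of the direction of skew. The sketch below is a heuristic rather than a formal skewness statistic; the function name `skew_direction`, the `tolerance` parameter, and the scores are hypothetical.

```python
from statistics import mean, median

def skew_direction(scores, tolerance=0.0):
    """Guess the direction of skew by comparing the mean with the median."""
    gap = mean(scores) - median(scores)
    if gap > tolerance:
        return "positively skewed"     # mean pulled up by a few high scores
    if gap < -tolerance:
        return "negatively skewed"     # mean pulled down by a few low scores
    return "roughly symmetrical"

print(skew_direction([10, 12, 13, 15, 40]))   # positively skewed
print(skew_direction([10, 35, 37, 38, 40]))   # negatively skewed
print(skew_direction([1, 2, 3, 4, 5]))        # roughly symmetrical
```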

Forms of Assessment
1. TRADITIONAL ASSESSMENT
- EXAMPLES: MULTIPLE CHOICE, MATCHING TYPE, TRUE OR FALSE, COMPLETION TEST
2. PERFORMANCE ASSESSMENT
- ENGAGES STUDENTS IN A COMPLEX TASK OR THE CREATION OF A PRODUCT. EXAMPLES: DANCE STEP, DEMONSTRATION
3. PORTFOLIO ASSESSMENT
- ONGOING EVALUATION; INVOLVES GATHERING OR COLLECTING MANY DIFFERENT INDICATORS OF STUDENT PROGRESS
4. AUTHENTIC ASSESSMENT
- REAL-LIFE CRITERIA; USE OF JUDGMENT

THANK YOU!
