

Reliability and Validity (DR SEE KIN HAI)


Data gathered must be checked to answer the questions below:
1. Was the instrument/technique used reliable and valid?
2. Did the conditions under which the data were collected affect or reflect the sample's ability?

Reliability
1. Are your results/measurements stable, accurate, dependable and relatively predictable?
2. What are the errors in your measurements?
(a) Random error: daily fluctuations in individual performance.
(b) Systematic or constant error: variables (e.g. practice effect, fatigue, mood, motivation, fluctuations of memory) that push scores up or down.

Observed score = true score + error score
Total variance = true variance + error variance: $\sigma^2_{obs} = \sigma^2_{true} + \sigma^2_{error}$

How to determine reliability
Compute the reliability coefficient, i.e. how two sets of scores are correlated and how much of the variance in the scores is contributed by true differences among individuals. There are 4 methods:
1. Test-retest method: gives the coefficient of stability (interval: minimum 1 day, maximum 1 year; a 2-3 month interval is recommended). Used to test the same level of cognitive, intellectual and motivational variables. Source of error: changes within the person. A coefficient > 0.5 is preferred.
2. Alternate (parallel) forms method: two forms of the same length, content, difficulty level and variance. Advantages: avoids memory and practice effects; more accurate with a larger number of items. Disadvantages: time and effort in constructing the test items; it is more difficult to construct 2 equivalent tests; fatigue and boredom of subjects taking both tests.
3. Split-half method: split a test into halves and correlate their scores. Problems: the halves may contain items of different difficulty levels, some pupils cannot complete the two halves, and boredom sets in on the second half. Divide the test into odd and even items with equivalent variance, then use the Spearman-Brown formula for the reliability of a test n times as long as the original (a quick numeric check follows this list):

$r_{tt(n)} = \dfrac{n\, r_{tt}}{1 + (n-1)\, r_{tt}}$

4. Internal-consistency method: Kuder-Richardson and Cronbach's alpha, with

reliability $= r_{tt} = 1 - \dfrac{\sigma^2_{error}}{\sigma^2_{obs}}$
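As a quick numeric sketch of the Spearman-Brown step-up (the numbers are hypothetical; plain Python, no libraries needed):

```python
def spearman_brown(r_half, n=2):
    """Reliability of a test n times as long as the one that produced r_half."""
    return n * r_half / (1 + (n - 1) * r_half)

# e.g. a half-test correlation of 0.60 stepped up to full length (n = 2):
print(spearman_brown(0.60))  # 0.75
```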

Factors affecting reliability
1. The test instrument: a longer test reduces error; objective tests give higher reliability, subjective tests lower; the method used to estimate reliability also matters.
2. The students being tested: a smaller spread of scores (smaller SD) lowers the reliability coefficient.
3. The administration of the test.

Split-half Reliability
Correlate the scores on the first half of the items with the scores on the second half. Cronbach's alpha is the average of all possible split-half reliabilities for the questionnaire.

Inter-Rater or Inter-Observer Reliability
1. To determine whether 2 observers are consistent in their observations. Do this in the pilot study, because if reliability is only checked in the main study and turns out to be low, we will be in trouble.
2. There are two ways of doing it, depending on whether the measurement is (i) categorical, e.g. H, M, L, or (ii) continuous, e.g. marks. (Best when your measure is an observation, or when 1 observer is repeated on 2 different occasions. You can video-tape the sessions and ask 2 raters to code them independently; a small sketch of both cases follows below.)

Test-Retest Reliability
The same test is administered on two different occasions. The time allowed between the 2 measures is important: the shorter the time gap, the higher the correlation.

Parallel-Forms Reliability
Two tests (e.g. Form A and Form B) are constructed to be similar in all aspects, and then the 2 tests are correlated. (Used with control and experimental groups taking pre- and post-tests.)

Internal Consistency Reliability
One test is given to a group of students on 1 occasion to estimate the reliability of the test. We look at how consistent the results are for different items that reflect the same construct. Examples of these measures are:

Average Inter-Item Correlation
We compute the inter-item correlations as in the figure and take their average, e.g. an average inter-item correlation of 0.90 from a range of 0.84 to 0.95.
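A minimal sketch of the two inter-rater checks and of the average inter-item correlation. All scores below are hypothetical, and numpy and scipy are assumed to be available:

```python
import numpy as np
from scipy.stats import pearsonr

# --- Inter-rater reliability ---------------------------------------------
# Categorical codes (H/M/L) from two observers: simple percent agreement.
# (Cohen's kappa, which corrects for chance agreement, is a common refinement.)
rater1 = ["H", "M", "L", "M", "H", "L"]
rater2 = ["H", "M", "M", "M", "H", "L"]
agreement = np.mean([a == b for a, b in zip(rater1, rater2)])

# Continuous marks from the same two raters: correlate the two score sets.
marks1 = [78, 65, 90, 55, 82]
marks2 = [75, 68, 88, 58, 80]
r_raters, _ = pearsonr(marks1, marks2)

# --- Average inter-item correlation --------------------------------------
# Rows = students, columns = items meant to tap the same construct.
items = np.array([[4, 5, 4, 5],
                  [2, 2, 3, 2],
                  [5, 5, 4, 4],
                  [3, 2, 3, 3],
                  [4, 4, 5, 5]])
corr = np.corrcoef(items, rowvar=False)            # k x k item correlation matrix
avg_inter_item = corr[~np.eye(corr.shape[0], dtype=bool)].mean()  # drop diagonal

print(agreement, r_raters, avg_inter_item)
```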

Average Item-Total Correlation
Here we correlate each item with the total score and take the average, e.g. item-total correlations ranging from 0.82 to 0.88 give an average of 0.85.
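A minimal sketch of the same computation on a hypothetical score matrix (numpy assumed):

```python
import numpy as np

def average_item_total_correlation(items):
    """Correlate each item with the total score, then average the results."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    rs = [np.corrcoef(items[:, j], total)[0, 1] for j in range(items.shape[1])]
    return float(np.mean(rs))

# Hypothetical 5-student x 4-item score matrix
scores = [[4, 5, 4, 5], [2, 2, 3, 2], [5, 5, 4, 4], [3, 2, 3, 3], [4, 4, 5, 5]]
print(average_item_total_correlation(scores))
```

Note that some texts prefer the corrected item-total correlation, where each item is correlated with the total of the remaining items so that the item does not inflate its own total.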

Split-Half Reliability
The test is given to a sample of students; we divide the items randomly into 2 halves, score each half for every student, and correlate the two half-scores to get the split-half reliability.
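A sketch of one random split with the Spearman-Brown correction applied, assuming the data arrive as an (n_students x k_items) numpy array:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the random split is repeatable

def split_half_reliability(items):
    """Randomly split the items into two halves, correlate the half totals,
    and step the result up to full length with Spearman-Brown (n = 2)."""
    items = np.asarray(items, dtype=float)
    order = rng.permutation(items.shape[1])
    half1 = items[:, order[::2]].sum(axis=1)   # every other item -> half 1
    half2 = items[:, order[1::2]].sum(axis=1)  # the rest -> half 2
    r = np.corrcoef(half1, half2)[0, 1]
    return 2 * r / (1 + r)
```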

Cronbach's Alpha (α)
For, say, a 6-item test we can split the items into halves in many ways: items 1,2,3 against 4,5,6; items 1,2,5 against 3,4,6; and so on. We compute the split-half reliability for each possible split and take the average of all of them; that average is Cronbach's alpha. A computational sketch follows.
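The sketch below shows both routes: the closed-form alpha is the standard definition, while the brute-force average of Spearman-Brown-corrected split-halves approximates it (the two coincide exactly only when the half variances are equal). The score matrix is hypothetical and numpy is assumed:

```python
from itertools import combinations
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def average_all_split_halves(items):
    """Average the Spearman-Brown-corrected split-half reliability over
    every way of dividing the k items into two equal halves."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    rs = []
    for half in combinations(range(k), k // 2):
        rest = [j for j in range(k) if j not in half]
        s1 = items[:, list(half)].sum(axis=1)
        s2 = items[:, rest].sum(axis=1)
        r = np.corrcoef(s1, s2)[0, 1]
        rs.append(2 * r / (1 + r))   # Spearman-Brown step-up for each split
    return float(np.mean(rs))

scores = np.array([[4, 5, 4, 5], [2, 2, 3, 2], [5, 5, 4, 4],
                   [3, 2, 3, 3], [4, 4, 5, 5]])
print(cronbach_alpha(scores), average_all_split_halves(scores))
```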

Reliability and Validity

Example 1
[Figures 1-4: four target diagrams illustrating the combinations of reliable/unreliable and valid/invalid measurement]

Fig 1 - Your test instrument is consistent in measuring, e.g., geometry, but it measures the wrong thing: you want to estimate students' knowledge of algebra, not geometry (reliable but not valid).
Fig 2 - You are testing students' knowledge of algebra, but you do not get consistent scores from this instrument on another occasion, e.g. in a test-retest check (valid but not reliable).
Fig 3 - You are testing geometry and the scores are not consistent across occasions (neither reliable nor valid).
Fig 4 - You are testing algebra and you get consistent results on different occasions (both reliable and valid).

Example 2

Comparing the same measure on two occasions, e.g. Algebra test scores, gives a test-retest correlation that measures reliability.

Comparing the Algebra test score with the Geometry test score (2 different concepts measured by the same paper-and-pencil method) shows how well we discriminate between the 2 concepts; this is called Discriminant Validity.

Comparing a test score with a teacher rating of the same concept, e.g. Algebra, is called Convergent Validity.

Comparing the Algebra test score with a teacher rating of Geometry (2 different concepts measured by 2 different methods, i.e. a test and an observation rating) is called "very discriminant": its correlation should be lower than all of the scores above.
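A small sketch of this expected ordering, in the spirit of a multitrait-multimethod check; the correlation values below are purely hypothetical:

```python
# Hypothetical correlations from a multitrait-multimethod style comparison.
r_test_retest  = 0.85  # Algebra test vs Algebra test (same trait, same method)
r_convergent   = 0.70  # Algebra test vs teacher rating of Algebra
r_discriminant = 0.30  # Algebra test vs Geometry test (same method)
r_very_disc    = 0.15  # Algebra test vs teacher rating of Geometry

# Validity is supported when the correlations fall in this order:
assert r_test_retest > r_convergent > r_discriminant > r_very_disc
```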

Construct Validity
Construct validity defines how well a test or experiment measures up to its claims. It refers to whether the operational definition of a variable actually reflects the true theoretical meaning of the concept.

What is the best way of measuring construct validity? We could use content analysis, correlation coefficients, factor analysis, ANOVA studies demonstrating differences between differential groups, pretest-posttest intervention studies, multi-trait/multi-method studies, etc.

What are the most common threats to construct validity? The environment of the test administration, administration procedures, examinees, scoring procedures, and test construction (the quality of the test items).
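As one hedged illustration of the factor-analysis route mentioned above (scikit-learn's FactorAnalysis is assumed, and the data are simulated, not real test results): items written for one construct should load mainly on one factor, and items for a second construct on another.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Simulated data: 200 students, 3 algebra items driven by one latent
# ability plus noise, and 3 geometry items driven by a second ability.
ability_a = rng.normal(size=(200, 1))
ability_g = rng.normal(size=(200, 1))
algebra  = ability_a + 0.5 * rng.normal(size=(200, 3))
geometry = ability_g + 0.5 * rng.normal(size=(200, 3))
X = np.hstack([algebra, geometry])

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(X)
# After rotation, the loading pattern should separate the two item sets,
# supporting the claim that they measure distinct constructs.
print(np.round(fa.components_, 2))
```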
