Factors Affecting Reliability
1. The test instrument: a longer test reduces measurement error, and objective tests give higher reliability than subjective tests.
2. The students being tested: a smaller spread of scores (smaller SD) lowers reliability.
3. The administration of the test.
Split-Half Reliability
Correlate the scores on the first half of the items with the scores on the second half. Cronbach's alpha is the average of all possible split-half reliabilities for the questionnaire.
Inter-Rater or Inter-Observer Reliability
1. Determines whether two observers are consistent in their observations. We check this in the pilot study, because if reliability is only checked in the main study and turns out low, we will be in trouble.
2. There are two ways of computing it, depending on whether the measurement is (i) categorical, e.g. High/Medium/Low, or (ii) continuous, e.g. marks. (Best when your measure is an observation; alternatively, use one observer on two different occasions. You can also videotape the sessions and ask two raters to code them independently.)
Test-Retest Reliability
The same test is administered on two different occasions. The time allowed between the two measures is important: the shorter the time gap, the higher the correlation.
Parallel-Forms Reliability
Two tests (e.g. Form A and Form B) are constructed to be as similar as possible in all aspects, and then the two sets of scores are correlated. (Useful for control and experimental groups using pre- and posttests.)
Internal Consistency Reliability
One test is given to one group of students on one occasion, and we estimate reliability by looking at how consistent the results are across different items that reflect the same construct. Examples of these measures are:
Average Inter-Item Correlation
We compute the correlation between every pair of items, as in the figure, and average them; here the inter-item correlations range from 0.84 to 0.95, with an average of 0.90.
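As a minimal sketch of the average inter-item correlation (with hypothetical item scores, since the figure's data are not reproduced here), we correlate every pair of items and average the results:

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def average_inter_item_correlation(items):
    """Mean Pearson correlation over every distinct pair of items.

    `items` is a list of item-score lists, one per item,
    each holding one score per student."""
    pairs = list(combinations(items, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical scores: 3 items answered by 5 students.
items = [
    [4, 5, 3, 2, 5],
    [4, 4, 3, 2, 5],
    [5, 5, 3, 2, 4],
]
print(round(average_inter_item_correlation(items), 2))
```

With consistent items like these, all pairwise correlations are high, so the average lands close to 1.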
Average Item-Total Correlation
Here each item is correlated with the total score; the item-total correlations range from 0.82 to 0.88, with an average of 0.85.
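The item-total version follows the same pattern, again sketched with hypothetical scores: each item is correlated with the students' total scores, and the correlations are averaged.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def average_item_total_correlation(items):
    """Correlate each item with the total score, then average.

    `items` holds one score list per item, one score per student."""
    n_students = len(items[0])
    totals = [sum(item[s] for item in items) for s in range(n_students)]
    rs = [pearson(item, totals) for item in items]
    return sum(rs) / len(rs)

# Hypothetical scores: 3 items answered by 5 students.
items = [
    [4, 5, 3, 2, 5],
    [4, 4, 3, 2, 5],
    [5, 5, 3, 2, 4],
]
print(round(average_item_total_correlation(items), 2))
```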
Split-Half Reliability
The test is given to a sample of students and a total score is obtained for each student; we then divide the items randomly into two halves and correlate the two half-scores to get the split-half reliability.
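A sketch of this procedure with hypothetical scores: split the items randomly, total each half per student, and correlate the half-scores. Because each half is only half the test's length, the raw correlation is usually stepped up with the standard Spearman-Brown correction, 2r / (1 + r), to estimate full-test reliability.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(items, seed=0):
    """Randomly split the items into two halves, total each half per
    student, and correlate the two sets of half-scores."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    first, second = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    n_students = len(items[0])
    totals = [
        [sum(items[i][s] for i in part) for s in range(n_students)]
        for part in (first, second)
    ]
    r = pearson(totals[0], totals[1])
    # Step up to full-test length with the Spearman-Brown correction.
    return 2 * r / (1 + r)

# Hypothetical scores: 4 items answered by 5 students.
items = [
    [4, 5, 3, 2, 5],
    [4, 4, 3, 2, 5],
    [5, 5, 3, 2, 4],
    [3, 5, 4, 2, 5],
]
print(round(split_half_reliability(items), 2))
```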
Cronbach's Alpha (α)
We split the items into two halves in every possible way (e.g. items 1, 2, 3 vs. 4, 5, 6; items 1, 2, 4 vs. 3, 5, 6; and so on), compute the split-half reliability for each split, and take the average of all possible split-half reliabilities.
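In practice Cronbach's alpha is usually computed directly from variances rather than by enumerating splits, using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). A sketch with hypothetical scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha from the standard variance formula:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).

    `items` holds one score list per item, one score per student."""
    k = len(items)
    n_students = len(items[0])
    totals = [sum(item[s] for item in items) for s in range(n_students)]

    def var(x):
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / len(x)

    return k / (k - 1) * (1 - sum(var(i) for i in items) / var(totals))

# Hypothetical scores: 3 items answered by 5 students.
items = [
    [4, 5, 3, 2, 5],
    [4, 4, 3, 2, 5],
    [5, 5, 3, 2, 4],
]
print(round(cronbach_alpha(items), 2))
```

With these highly consistent items, alpha comes out close to 1.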
[Figs 3 and 4: target diagrams illustrating reliability and validity]
Fig 1: Your test instrument measures consistently, e.g. geometry, but it measures the wrong thing: you want to estimate students' knowledge of algebra, not geometry. (Reliable, but not valid.)
Fig 2: You are testing students' knowledge of algebra, but the instrument does not give consistent scores on another occasion, e.g. under test-retest reliability. (Valid, but not reliable.)
Fig 3: You are testing geometry and the scores are not consistent across occasions. (Neither reliable nor valid.)
Fig 4: You are testing algebra and you get consistent results on different occasions. (Both reliable and valid.)
Example 2
Comparing scores on the same test, e.g. an Algebra test given twice, gives the test-retest correlation used to measure reliability.
Comparing the Algebra test score with the Geometry test score (two different concepts measured by the same method, a paper-and-pencil test) shows that we can discriminate between the two concepts; this is called Discriminant Validity.
Comparing a test score with a teacher's rating of the same concept, e.g. Algebra, is called Convergent Validity.
Comparing the Algebra test score with the teacher's rating of Geometry (two different concepts measured by two different methods, i.e. a test and an observation rating) is the most discriminant comparison, and its correlation should be lower than the scores above.
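These four comparisons can be sketched with hypothetical scores for five students (all numbers below are illustrative, not from the text): the same-trait correlations should come out highest, and the different-trait, different-method correlation lowest.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for five students (illustrative only).
algebra_test    = [1, 2, 3, 4, 5]
algebra_retest  = [1, 2, 3, 5, 4]  # same test, second occasion
geometry_test   = [2, 1, 3, 5, 4]  # different concept, same method
rating_algebra  = [1, 3, 2, 4, 5]  # same concept, different method
rating_geometry = [3, 1, 2, 5, 4]  # different concept AND method

reliability  = pearson(algebra_test, algebra_retest)   # test-retest
discriminant = pearson(algebra_test, geometry_test)    # discriminant validity
convergent   = pearson(algebra_test, rating_algebra)   # convergent validity
hetero       = pearson(algebra_test, rating_geometry)  # expected lowest

for name, r in [("reliability", reliability), ("discriminant", discriminant),
                ("convergent", convergent), ("hetero", hetero)]:
    print(name, round(r, 2))
```

With these numbers the different-concept, different-method correlation is indeed the smallest of the four, matching the expected multitrait-multimethod pattern.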
Construct Validity
Construct validity is how well a test or experiment measures what it claims to measure: whether the operational definition of a variable actually reflects the true theoretical meaning of the concept.
What is the best way of measuring construct validity? We could use content analysis, correlation coefficients, factor analysis, ANOVA studies demonstrating differences between differential groups, pretest-posttest intervention studies, multi-trait/multi-method studies, etc.
What are the most common threats to construct validity? The environment of the test administration, administration procedures, examinees, scoring procedures, and test construction (i.e. the quality of the test items).