Performance tests are assessments of what learning has already occurred in a particular subject area.
Aptitude tests are assessments of abilities or skills considered important to future success in school.
Application in classrooms and similar settings
In theory, test scores tracked over time will reveal how much progress schools have made in their efforts to maintain
or raise academic standards. They are also used to assess program success or failure in connection with students' learning.
Establishing test validity
According to Calmorin and Calmorin, the degree of validity is the most important attribute of a test. Validity refers to the
degree to which a test is capable of achieving certain aims. Item analysis is done after the first try-out of a test.
After the item analysis, the tester uses the following table of equivalents in interpreting the difficulty index:
.00-.20 - very difficult
.21-.80 - moderately difficult
.81-1.00 - very easy
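The difficulty index of an item is the proportion of examinees who answered it correctly, which can then be classified with the table of equivalents above. A minimal sketch in Python (the response vector is made-up illustration data):

```python
def difficulty_index(responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)

def interpret(p):
    """Classify a difficulty index using the table of equivalents above."""
    if p <= 0.20:
        return "very difficult"
    if p <= 0.80:
        return "moderately difficult"
    return "very easy"

# One item, ten examinees: seven answered correctly
item_responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
p = difficulty_index(item_responses)   # 0.7
label = interpret(p)                   # "moderately difficult"
```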
Item Revision. On the basis of item-analysis data, test items are revised for improvement. After revising the
test items that need revision, the tester conducts another try-out; the revised test must be administered to the same set
of samples.
How is reliability established?
Test reliability is an element in test construction and test standardization and is the degree to which a measure
consistently returns the same result when repeated under similar conditions. Reliability does not imply validity: a
reliable measure is measuring something consistently, but not necessarily what it is supposed to measure.
Reliability may be estimated through a variety of methods that fall into two types: single-administration and
multiple-administration. Multiple-administration methods require that two assessments be administered.
1. Test-retest reliability is estimated as the Pearson product-moment correlation coefficient
between two administrations of the same measure. This is sometimes known as the
coefficient of stability.
2. Alternate-forms reliability is estimated by the Pearson product-moment correlation
coefficient of two different forms of a measure, usually administered together. This is
sometimes known as the coefficient of equivalence.
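Both multiple-administration coefficients come down to a Pearson product-moment correlation between two sets of scores. A minimal sketch in plain Python (the score lists are invented for illustration):

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Test-retest: the same test given twice to the same five examinees
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
stability = pearson(first_administration, second_administration)
```

The coefficient of equivalence is computed the same way, with the two score lists coming from the two alternate forms rather than from two administrations of one form.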
Single-administration methods include split-half and internal consistency.
1. Split-half reliability treats the two halves of a measure as alternate forms. This "half-test" reliability estimate is then
stepped up to the full test length using the Spearman-Brown prediction formula. This is sometimes referred to as the
coefficient of internal consistency.
2. The most common internal consistency measure is Cronbach’s alpha, which is usually interpreted as the mean of all
possible split-half coefficients.
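Cronbach's alpha can be computed directly from a persons-by-items score table using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A hedged sketch, with invented data:

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores):
    """scores: rows = persons, columns = items."""
    k = len(scores[0])
    item_vars = sum(variance(col) for col in zip(*scores))   # one variance per item
    total_var = variance([sum(row) for row in scores])       # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Three persons, two perfectly consistent items: alpha comes out at 1.0
scores = [[1, 1], [2, 2], [3, 3]]
alpha = cronbach_alpha(scores)
```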
These measures of reliability differ in their sensitivity to different sources of error, so they need not be equal. Also,
reliability is a property of the scores on a measure rather than of the measure itself, and reliability estimates are thus
said to be sample-dependent.
Reliability Estimation Using a Split-half Methodology
The split-half design in effect creates two comparable test administrations. The items in a test are split into
two tests that are equivalent in content and difficulty. Often this is done by splitting among odd and even numbered
items. This assumes that the assessment is homogeneous in content.
Once the test is split, reliability is estimated as the correlation between the two half-tests, with an adjustment for
the reduced test length. Other things being equal, the longer the test, the more reliable it will be when reliability
concerns internal consistency, because the sample of behavior is larger.
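The odd/even split described above, followed by the Spearman-Brown step-up, can be sketched as follows (the item data are invented for illustration):

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) *
                  sqrt(sum((b - my) ** 2 for b in y)))

def spearman_brown(r_half):
    """Step a half-test correlation up to the full test length."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(scores):
    """scores: rows = persons, columns = items; split on odd/even item positions."""
    odd_halves = [sum(row[0::2]) for row in scores]
    even_halves = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(odd_halves, even_halves))

# Three persons, four items: the two halves agree exactly,
# so the full-length estimate is (approximately) 1.0
scores = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
r_full = split_half_reliability(scores)
```

Note how the step-up reflects the point above: a half-test correlation of .60 corresponds to a full-length reliability of .75, because the longer test samples more behavior.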