- students are able to perform the task

2. Representativity
- represents a real situation

3. Authenticity
- the situation and the interaction are meaningful and
representative in the world of the individual user
4. Balance
- each relevant topic/ability receives an equal amount of attention

5. Validity
- the test effectively measures what it is intended to measure
Sub Classifications of validity:
A. Concurrent validity
- If the scores it gives correlate highly with a recognized external
criterion which measures the same area of knowledge or ability

B. Construct validity
- if scores can be shown to reflect a theory about the nature of a
construct or its relation to other constructs

C. Content Validity
- if the items or tasks of which it is made up constitute a
representative sample of items or tasks for the area of knowledge
or ability to be tested (often related to a syllabus or course)

D. Convergent Validity
- there is a high correlation between scores achieved in it and
those achieved in a different test measuring the same
construct

E. Criterion-related validity
- if a relationship can be demonstrated between test scores
and some external criterion which is believed to be a
measure of the same ability
- Information on it is also used in determining how well a test
predicts future behavior
F. Discriminant validity
- if the correlation it has with tests of a different trait is lower
than correlation with tests of the same trait, irrespective of
testing method

G. Face Validity
- the extent to which a test appears to candidates, or to those
choosing it on their behalf, to be an acceptable measure of the
ability they wish to measure

H. Predictive Validity
- indication of how well a test predicts future performance in a
relevant skill
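
Several of the validity types above (concurrent, convergent, discriminant) come down to comparing correlations between sets of scores. The following is a minimal sketch of the discriminant check, assuming Pearson's correlation is used. All scores and the names `speaking_a`, `speaking_b`, and `grammar` are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Invented scores for five candidates (assumption, not real data).
speaking_a = [3, 5, 6, 8, 9]   # the test under review
speaking_b = [4, 5, 7, 8, 10]  # a different test of the same trait
grammar = [7, 3, 8, 4, 6]      # a test of a different trait

same_trait_r = pearson_r(speaking_a, speaking_b)
different_trait_r = pearson_r(speaking_a, grammar)

# Discriminant validity expects the same-trait correlation to be the higher one.
print(f"same trait: {same_trait_r:.2f}, different trait: {different_trait_r:.2f}")
print("supports discriminant validity:", same_trait_r > different_trait_r)
```

With real data, the same comparison would be repeated across testing methods, since the definition holds irrespective of method.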
1. Appropriateness of test items
2. Directions
3. Reading vocabulary and sentence structure
4. Difficulty of items
5. Construction of test items
6. Length of the test
7. Arrangement of items
8. Patterns of answers
6. Reliability
- consistency and stability with which a test measures

1. Specificity
- questions should not be open to different interpretations

2. Differentiation
- the test discriminates between good and poor students

3. Difficulty
- the test has an adequate level of difficulty
4. Length
- the test contains enough items. In multiple-choice tests, at least
40 test items are required

5. Time
- students should have sufficient time to perform a test/ task

6. Item construction
- a well-constructed question is better than a poorly constructed one
Possible reasons for the inconsistency of an individual’s score in a test

- Scorer’s inconsistency
- Limited sampling of behaviour
- Changes in the individual himself
Factors which influence the reliability of a test

- Objectivity
- Difficulty of the test
- Length of the test
- Adequacy
- Testing condition
- Test administration procedures
- Q. A is meant to give some idea about the reliability of the test
- statistical data can make problematic items more visible

2. Getting started
- during the development phase of the test development
process, the test is administered to a sample group of at least
20 representative end-users, and the results are analysed using a
statistical program
- Usually the help of a statistician is necessary
- Once this has been done, the descriptive statistics, the
correlations, and the item reliability analyses can be checked
1. Descriptive statistics
- intended to offer a general idea about the test scores


a. N indicates the number of tests reviewed

Minimum signals the lowest score in the population
Maximum signals the highest score in the population

b. Mean refers to the average score

c. Std. Deviation (SD) indicates how far the values typically deviate
from their arithmetic mean (formally, the square root of the mean
squared deviation). A small SD implies that in general the
scores do not deviate much from the mean
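
The descriptive statistics listed above can be computed with a few lines of code. This is a minimal sketch using Python's standard `statistics` module; the `scores` list and the `describe` helper are invented for illustration, and the sample SD (n - 1 denominator) is assumed, which is what most statistical programs report.

```python
import statistics

# Hypothetical scores from a pilot group of 20 test takers (invented data).
scores = [12, 15, 9, 18, 14, 16, 11, 13, 17, 10,
          14, 15, 12, 16, 13, 19, 8, 14, 15, 13]

def describe(values):
    """Return N, minimum, maximum, mean, and sample standard deviation."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # Sample SD: square root of the summed squared deviations / (n - 1).
        "Std. Deviation": statistics.stdev(values),
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

A small SD relative to the score range would suggest that the pilot group performed at a similar level.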
2. Correlations
- the relationship between two variables

- Illustrated by scatter plots, which are similar to line graphs in that
they use horizontal and vertical axes to plot data points; they show how
closely one variable is related to another

- Indicates the strength and direction of a linear relationship
between two random variables

- Always situated on the -1 to 1 spectrum

- The closer a correlation is to either end of the spectrum, the
stronger the relationship

- A relationship is statistically significant if Sig. ≤ 0.05

The closer the data points come when plotted to making
a straight line, the higher the correlation between the two
variables, or the stronger the relationship

A perfect positive correlation is given the value of 1. A
perfect negative correlation is given the value of -1. If
there is absolutely no correlation present, the value given
is 0. The closer the number is to 1 or -1, the stronger the
correlation, or the stronger the relationship between the
variables. The closer the number is to 0, the weaker the
correlation.

Positive correlation: the data points form a line rising from
low x- and y-values to high x- and y-values

Negative correlation: the data points form a line falling from a
high value on the y-axis to a high value on the x-axis

In language tests, correlations can merely serve as an
indicator of reliability, but very low correlations mostly
mean that something is wrong
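
The -1 to 1 scale described above can be illustrated with a short calculation. This is a minimal sketch of Pearson's correlation coefficient, which is assumed here to be the coefficient the statistical program reports. The score lists are invented so that the points fall exactly on a straight line.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of the products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Invented scores for five candidates.
reading = [10, 12, 14, 16, 18]
listening_up = [20, 24, 28, 32, 36]    # rises in step with reading
listening_down = [36, 32, 28, 24, 20]  # falls as reading rises

print(pearson_r(reading, listening_up))    # perfect positive correlation: 1.0
print(pearson_r(reading, listening_down))  # perfect negative correlation: -1.0
```

Real test data will rarely fall on a perfect line, so values strictly between -1 and 1 are the norm.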
3. Item reliability

- indicates the discriminatory potential of a test item

- As in standard correlations, a very reliable item (with a high
discriminatory capacity) would score close to -1 or 1. Items are
considered unreliable if they score between -.3 and .3.
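
The notes do not say how the item reliability figure is calculated. The sketch below assumes one common approach: correlating each item's scores with the total score on the remaining items (a corrected item-total correlation), then flagging items that fall between -.3 and .3. The response matrix and the function names `item_rest_correlations` and `flag_unreliable` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def item_rest_correlations(responses):
    """Correlate each item with the total score on all other items."""
    totals = [sum(row) for row in responses]
    result = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [total - score for total, score in zip(totals, item)]
        result.append(pearson_r(item, rest))
    return result

def flag_unreliable(correlations, threshold=0.3):
    """Return the indices of items scoring between -threshold and threshold."""
    return [i for i, r in enumerate(correlations) if abs(r) < threshold]

# Rows are test takers, columns are items; 1 = correct, 0 = incorrect (invented).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

correlations = item_rest_correlations(responses)
for index, r in enumerate(correlations, start=1):
    print(f"item {index}: {r:.2f}")
print("flagged item indices:", flag_unreliable(correlations))
```

Six test takers is far too few for a real analysis; the notes call for a sample group of at least 20.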