Psychological tests are formalized measures of mental functioning. Most are objective
and quantifiable; however, certain projective tests may involve some level of subjective
interpretation. Also known as inventories, measurements, questionnaires, and scales,
psychological tests are administered in a variety of settings, including preschools,
primary and secondary schools, colleges and universities, hospitals, outpatient healthcare
settings, social agencies, prisons, and employment or human resource offices. They come
in a variety of formats, including written, verbal, and computer administered.
A primitive form of proficiency testing existed in China as early as 2200 B.C., where
some form of examination of public officials by the Chinese emperor was conducted
every third year. Civil service examinations began in China during the Chan dynasty in
1115 B.C. and ended in 1905 when a reform measure abolished the system. For 3000
years, the open and competitive exam in China provided for evaluation of proficiency in
areas such as archery, music, and writing.
In 1859, Charles Darwin published On the Origin of Species. In this book he argued
that chance variation in species would be selected or rejected by nature according to
adaptivity and survival value. His writing on individual differences kindled an interest in
research on heredity in his half cousin, Francis Galton, who went on to become an
extremely influential contributor to the field of measurement. Galton aspired to classify
people according to their ‘natural gifts’ and to ascertain their deviation from the average.
He is also credited with the development of many contemporary tools of psychological
assessment, including questionnaires, rating scales and self-report inventories. Although
Karl Pearson developed the product moment correlation technique, the roots can be
traced directly to the work of Galton.
Assessment was also an important activity at the first experimental psychology laboratory
in Leipzig, Germany, the founder of which was Wilhelm Wundt, a medical doctor. In
contrast to Galton, Wundt focused on questions relating to how people were similar, not
different. Wundt viewed individual differences as a frustrating source of error in
experimentation.
The early 1900s witnessed the birth of the first formal tests of intelligence. Much of the
nineteenth century testing that could be described as psychological in nature involved the
measurement of sensory abilities, reaction time and the like. The primary impetus for the
development of the major tests used today was the need for practical guidelines for
solving social problems. The person who had the vision of broadening testing to include
the measurement of cognitive abilities was Alfred Binet.
The first useful intelligence test was prepared in 1905 by the French psychologists Alfred
Binet and Théodore Simon. The two developed a 30-item scale to ensure that no child
could be denied instruction in the Paris school system without formal examination. In
1916 the American psychologist Lewis Terman produced the first Stanford Revision of
the Binet-Simon scale to provide comparison standards for Americans from age three to
adulthood. The test was further revised in 1937 and 1960, and today the Stanford-Binet
remains one of the most widely used intelligence tests. The Wechsler tests now extend
from the preschool through the adult age range and are at least as prominent as the
Stanford-Binet.
The need to classify soldiers during World War I resulted in the development of two
group intelligence tests—Army Alpha and Army Beta. To help detect soldiers who might
break down in combat, the American psychologist Robert Woodworth designed the
Personal Data Sheet, a forerunner of the modern personality inventory. The test was
introduced to develop a measure of adjustment and emotional stability that could be
administered quickly and efficiently to groups of recruits.
During World War II the need for improved methods of personnel selection led to the
expansion of large-scale programs involving multiple methods of personality assessment.
Following the war, training programs in clinical psychology were systematically
supported by U.S. government funding, to ensure availability of mental-health services to
returning war veterans. As part of these services, psychological testing flourished,
reaching an estimated several million Americans each year. Since the late 1960s,
increased awareness has led to extensive use of psychological tests, but also to some
misuse of them. Hence there have been greater efforts in recent years to establish legal
controls and more explicit safeguards against misuse of testing materials.
Tests are also used in industrial and organizational settings, primarily for selection and
classification. Selection procedures provide guidelines for accepting or rejecting
candidates for jobs. Classification procedures, which are more complex, aim to specify
the types of positions for which an individual seems best suited. Intelligence testing is
usually supplemented by methods devised expressly to meet the needs of the
organization.
Tests are used in a wide variety of other settings as well. For example, the courts rely on
psychological test data and related expert testimony as one source of information to help
answer important questions. Tests are also used in health psychology and in program
evaluations, whether large scale or small scale.
Currently, a wide range of testing procedures is used. Each type of procedure is designed
to carry out specific functions. In general, psychological tests fall into two broad
categories. There are those designed to assess personal qualities, such as personality,
beliefs, values, and interests, and to measure motivation or ‘drive’. These are known as
measures of typical performance. These are usually administered without a time limit and
the questions have no ‘right’ and ‘wrong’ answers. The answers reflect how the person
taking the test would usually or typically feel, what they believe, or what they think about
things.
Second, there are those designed to measure performance. These are called tests of
ability, aptitude or attainment and are known as measures of maximum performance.
These tests are usually administered with a fixed time limit, and the questions in them do
have right and wrong answers. Some of these tests have very strict time limits, to ensure
that people cannot complete all the questions in the test in the time available. These tests
are designed to see how fast you can work. Usually their questions are not very difficult,
but you have to work fast to do well. Other types of maximum performance test have
more relaxed time limits, or may have no time limit at all. For these the questions may be
quite difficult, or sometimes start off easy and get progressively more difficult as you go
through the test. In these tests the emphasis is on how many questions you can get right,
rather than on how quickly you work. In most cases, tests fall somewhere in between
these two extremes. There will be a time limit, but this will be set to allow most people
sufficient time to get to the end of the test.
The important thing to remember is that when there is a time limit, it will be the same for
everybody. It is also important to remember that psychological tests of ability often seem
to be a lot ‘harder’ than the tests of knowledge people are used to from school. Typically,
if you had done your homework, you would expect to get 80 or 90 per cent of the
questions right in a school knowledge test. Psychological tests are designed so that on
average, people in the group they are intended for would get about 50 per cent right.
Scores on intelligence tests are generally known as intelligence quotients, or IQs, although
the various tests are constructed quite differently. The Stanford-Binet is heavily weighted
with items involving verbal abilities; the Wechsler scales consist of separate verbal and
performance subscales, each with its own IQ. There are also specialized infant
intelligence tests, tests that do not require the use of language, and tests that are designed
for group administration.
Aptitude Tests
These tests predict future performance in an area in which the individual is not currently
trained. Schools, businesses, and government agencies often use aptitude tests when
assigning individuals to specific positions. Vocational guidance counseling may involve
aptitude testing to help clarify individual career goals. If a person's score is similar to
scores of others already working in a given occupation, likelihood of success in that field
is predicted. Some aptitude tests cover a broad range of skills pertinent to many different
occupations. The General Aptitude Test Battery, for example, not only measures general
reasoning ability but also includes form perception, clerical perception, motor
coordination, and finger and manual dexterity. Other tests may focus on a single area,
such as art, engineering, or modern languages. The Differential Aptitude Tests (DAT) are
another example of aptitude tests.
Interest Inventories
One frequently used interest inventory, the Kuder Preference Record,
includes ten clusters of occupational interests: outdoors, mechanical, computational,
scientific, persuasive, artistic, literary, musical, social service, and clerical. For each item,
the subject indicates which of three activities is best or least liked. The total score
indicates the occupational clusters that include preferred activities.
Personality Tests
Projective tests allow for a freer type of response. An example of this would be
the Rorschach test, in which a person states what each of ten ink blots might be. The
terms "objective test" and "projective test" have recently come under criticism in
the Journal of Personality Assessment. The more descriptive "rating scale or self-report
measures" and "free response measures" are suggested, rather than the terms "objective
tests" and "projective tests," respectively.
What is a ‘good’ test? Logically, it would include clear instructions for administration,
scoring and interpretation. It would also be important if a test offered economy in the
time it takes to administer, score and interpret it. Most of all, a good test would be one
that measures what it purports to measure. Along with all these, test users also mention a
few more characteristics important to have a ‘good’ test. They are:
1. Reliability
In everyday conversation, reliability is a synonym for dependability, trustworthiness and
consistency. In testing, it means consistency with reference to measurement. Reliability is
concerned with how accurate or precise a test score is. When a test is administered, the
outcome is an observed score on the quality measured by the test. However, all
measurement procedures, physical as well as psychological, are subject to some degree of
error. In order to know how much weight to place on the observed score, you need to
know how accurate the test is as a measuring device. Measures of test reliability allow us
to estimate that accuracy. This is a key characteristic of psychometric testing and what
makes it so much more valuable than other forms of measurement. For a psychometric
test, we can quantify the degree of accuracy of the scores we obtain. Being able to
quantify measurement error has important consequences for how we use tests. For
example, if you are carrying out an in-depth individual assessment of a person, on the
basis of which you will be making some important decision, then you need a high degree
of accuracy in your measurement. On the other hand if you are using a test to sort people
into one of two groups, and you are not concerned too much about making a few errors in
this process, then the reliability of the test can be less. In general, reliability can be
increased by making tests longer, and is decreased by shortening them. However, for a
given test length, reliability will depend a lot on how well the test has been designed and
developed. Reliability is one of the most important topics in training in test use.
Reliability is the extent to which a test is repeatable and yields consistent scores. A test
may be reliable in one context and unreliable in another.
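The two points above, that measurement error can be quantified and that longer tests tend to be more reliable, can be illustrated with two formulas from classical test theory. The text does not name them, so treat this as a sketch under assumptions: the Spearman-Brown formula is used for the effect of test length and the standard error of measurement for the size of the error. The function names and all the numbers are made up for illustration.

```python
import math


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length.

    length_factor is new length / old length (2.0 = doubled, 0.5 = halved).
    Assumes the added or removed items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Typical size of the error in an observed score, in score units."""
    return score_sd * math.sqrt(1 - reliability)


# Hypothetical test with reliability 0.70
doubled = spearman_brown(0.70, 2.0)   # about 0.82: the longer test is more reliable
halved = spearman_brown(0.70, 0.5)    # about 0.54: the shorter test is less reliable

# Hypothetical scale with a standard deviation of 15 and reliability 0.91
sem = standard_error_of_measurement(15.0, 0.91)  # about 4.5 score points

print(round(doubled, 2), round(halved, 2), round(sem, 1))
```

A smaller standard error of measurement means the observed score can be given more weight, which is why in-depth individual assessment calls for a more reliable test than rough sorting into two groups.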
Types of reliability
There are a number of ways to ensure that a test is reliable. Some of them are:
• Test-retest reliability
The test-retest method of estimating a test's reliability involves administering the test to
the same group of people at least twice. Then the first set of scores is correlated with the
second set of scores. The resulting reliability coefficient typically ranges between 0 (low
reliability) and 1 (high reliability). Test-retest reliability is an estimate of reliability
obtained by correlating
pairs of scores from the same person on two different administrations of the same test. It
is appropriate to use when evaluating the reliability of a test that purports to measure
something that is relatively stable over time, such as a personality trait. If the
characteristic to be measured is likely to fluctuate over time, there would be little sense in
assessing the reliability of the test using the test-retest method. A difference between two
measurements may simply reflect measurement error: if you use a tape measure to measure a
room on two different days, any difference in the results is likely due to measurement error
rather than a change in the room size. However, if you measure children’s reading ability in
February and then again in June, the change is likely due to real changes in the children’s
reading ability.
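The test-retest procedure described above amounts to correlating two lists of scores from the same people. A minimal sketch follows, assuming the Pearson product-moment correlation is the coefficient used; the function name and the scores are invented for illustration.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 people)")
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each administration
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical scores for the same six people on two administrations of one test
first_administration = [12, 15, 9, 20, 17, 11]
second_administration = [13, 14, 10, 19, 18, 10]

test_retest_reliability = pearson_r(first_administration, second_administration)
print(round(test_retest_reliability, 2))  # about 0.96
```

In principle a Pearson correlation can also be negative, but a useful test is expected to give a value well above zero. The same calculation applies to alternate forms, with the second list holding scores on the other form.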
• Alternate Forms
Alternate forms are simply different versions of a test that have been constructed so as to
be parallel. Although they don’t meet the requirements for the legitimate designation
‘parallel’, alternate forms of a test are typically designed to be equivalent with respect to
variables such as content and level of difficulty. The procedure is to administer Test A to a
group and then administer Test B to the same group. The correlation between the two sets of
scores is the estimate of the test's reliability. Developing alternate forms can be time
consuming and expensive, but it is also advantageous: once the forms are designed, they can be
used in several ways. This method also largely avoids the disadvantages of the test-retest
method.
2. Validity
As with reliability, understanding the concept of validity is critical to competent test use.
A test is not simply either valid or not. Test manuals will contain reports of research
relating to various aspects of what the test is designed to measure. These studies will
never prove the test's validity once and for all, because validity is contextual. A test can
be valid for one application but completely irrelevant for another. The studies reported in
the test manual should support the claims that are made about the test and its use, and
provide the basis on which the test user can make inferences about people’s behaviour
and predictions about their future performance.
In order to be valid, a test must be reliable; but reliability does not guarantee validity, i.e.
it is possible to have a highly reliable test which is meaningless (invalid).
Types of Validity
• Face validity
Face validity relates more to what a test appears to measure to the person being tested
than to what the test actually measures. It has something to do with the mere appearance.
It is a judgment concerning how relevant the test items appear to be. A test on
Mathematics should have numerical questions; a test on history should have questions on
wars, kings and so on. If a test definitely appears to measure what it claims to measure, then
the test is high on ‘face validity’. Does the measure, on the face of it, seem to measure what is
intended? Ultimately, face validity may be more a matter of public relations than
psychometric soundness.
• Content validity
• Construct validity
Construct Validity is the most important kind of validity. Construct validity is generally
determined by investigating what psychological traits or qualities a test measures; that is,
by demonstrating that certain patterns of human behavior account to some degree for
performance on the test. A test measuring the trait “need for achievement,” for instance,
might be shown to predict that high scorers work more independently, persist longer on
problem-solving tasks, and do better in competitive situations than low scorers. If a
measure has construct validity it measures what it purports to measure.
• Criterion validity
Criterion-related validity is a judgment of how adequately a test score can be used to infer
an individual’s most probable standing on some measure of interest, the measure of
interest being the criterion. There are no rules for what can constitute a criterion.
Whatever the criterion, ideally it is relevant, valid and uncontaminated. The criterion can
be a test score, a behaviour, an amount of time, a rating and so on. Criterion validity
consists of concurrent and predictive validity. Concurrent validity is an index of the degree
to which a test score is related to some criterion measure obtained at the same time, i.e.
concurrently. Predictive validity is an index of the degree to which a test score predicts
some criterion measure obtained at a later time.
For many people, questions concerning the validity of a test are intimately tied to questions
concerning the fair use of tests and the issues of bias and fairness.
Test bias
In testing, bias is a factor inherent in a test that systematically prevents accurate, impartial
measurement. Systematic is a key word in the definition. Bias implies systematic
variation. Some tests have been found to be biased because of the design of the research
study rather than the design of the test. Prevention during test development is the best
cure for test bias.
Test fairness
3. Norms
Norm referenced testing and assessment is a method of evaluation and a way of deriving
meaning from test scores by evaluating an individual test taker’s score and comparing it
to the scores of a group of test takers. In this approach, the meaning of an individual test
score is understood relative to other test scores on the same test. Norms for many tests are
expressed as percentile norms. Percentile norms are the raw data from a test’s
standardization sample converted to percentile form. A percentile is an expression of the
percentage of people whose score on a test or measure falls below a particular raw score.
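The conversion of a raw score to a percentile can be sketched in a few lines. This follows the definition given above (the percentage of the standardization sample scoring below the raw score). Some published tests use slightly different conventions, for example counting half of the tied scores, so treat this as one possible reading. The function name and the sample are hypothetical.

```python
def percentile_rank(raw_score, standardization_scores):
    """Percentage of the standardization sample scoring below raw_score."""
    if not standardization_scores:
        raise ValueError("standardization sample is empty")
    below = sum(1 for score in standardization_scores if score < raw_score)
    return 100.0 * below / len(standardization_scores)


# Hypothetical raw scores from a ten-person standardization sample
standardization_sample = [8, 10, 11, 12, 12, 13, 14, 15, 17, 18]

# Six of the ten scores fall below 14, so a raw score of 14 is at the 60th percentile
print(percentile_rank(14, standardization_sample))  # 60.0
```

A real standardization sample would be far larger and chosen to represent the group the test is intended for.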
• Age norms
They indicate the average performance of different samples of test takers who were at
various ages at the time the test was administered.
• Grade norms
Designed to indicate the average test performance of test takers in a given school grade,
grade norms are developed by administering the test to representative samples of children
over a range of consecutive grade levels. Then the mean or median score for children at
each grade level is computed. Grade norms don’t provide information as to the content or
type of items that a student could or could not answer correctly. They are used basically to
understand and gauge how one student’s performance compares to that of fellow students
in the same grade.
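The construction of grade norms described above (group the scores by grade, then take the mean or median in each grade) can be sketched as follows. The function name, the record layout and the scores are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, median


def grade_norms(records):
    """Compute the mean and median score per grade.

    records is an iterable of (grade, score) pairs from the norming sample.
    """
    scores_by_grade = defaultdict(list)
    for grade, score in records:
        scores_by_grade[grade].append(score)
    return {
        grade: {"mean": mean(scores), "median": median(scores)}
        for grade, scores in sorted(scores_by_grade.items())
    }


# Hypothetical norming sample: three children in each of three consecutive grades
sample_records = [
    (3, 20), (3, 24), (3, 22),
    (4, 27), (4, 31), (4, 29),
    (5, 35), (5, 33), (5, 37),
]

norms = grade_norms(sample_records)
for grade, stats in norms.items():
    print(grade, stats["mean"], stats["median"])
```

A student's score can then be compared with the mean or median for his or her own grade. As the text notes, this says nothing about which items the student answered correctly.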
Ethics are an essential part of the administration of psychological tests and it is necessary
that all test users follow the ethical guidelines when using any type of psychological test.
Psychological tests are an important tool for many professions in an array of settings, such
as clinical psychology, education, and even business. However, misuse of psychological
tests by administrators is a constant and troubling issue that has the
potential to harm the individuals involved and even society as a whole.
For test takers, the misuse of a psychological test could result in improper diagnoses or
inappropriate decision making. The misuse of tests reflects poorly on professional
organizations and on properly trained test users alike, and it leads to poor decisions that
may harm society both economically and psychologically.
Usually test administrators do not intentionally misuse tests; rather, they are not properly
informed about the technical knowledge and overall testing procedure involved. In an
effort to prevent the misuse of psychological tests, psychologists developed a set of
professional and technical standards for the creation, evaluation, administration, scoring,
and interpretation of all psychological tests. Professionals can overcome the misuse of
tests simply by understanding the professional and technical standards that are involved
in using psychological tests.
There are many issues of concern when it comes to ethics, one such issue being the right
to privacy and confidentiality. The concepts of individual rights and privacy are a
fundamental part of our society. The Ethical Principles affirm individual rights to privacy
and confidentiality as well as self-determination. The term confidentiality indicates that
individuals are guaranteed privacy in terms of all personal information that is disclosed
and that no information will then be disclosed without the individual's direct permission
which is usually required in writing. The results should be given directly to respondents
and are strictly confidential, including maintaining the privacy from employers.
Psychologists should protect data kept on file so that only those who have a right of
access can obtain them.
There are times, however, when confidentiality is breached because managers, for example,
seek out psychological information about their employees. Another example of
confidentiality being breached in a professional setting is when teachers seek prior
test scores for students, albeit with the good intention of understanding issues of
performance.
Psychologists have a responsibility to ensure that the examinee as well as their parent or
guardian understand all implications and requirements that will be involved in a
psychological test before it is even administered. In addition to the issue of informed
consent, participants are also entitled to an explanation of the test results, in as
non-technical language as possible. However, because some test results
may influence the participant's self-esteem as well as behavior, it is crucial that a trained
professional explain the results to the participant in a sensitive and understanding
manner.
The right to be informed of test findings is another one of the important ethical issues.
Respondents should also be allowed to clarify their results. They should then be
provided a written description of their preferences. With sensitivity to the situation, the
test user will inform the test taker of the purpose of the test and will be available to
answer further questions test takers or their guardians have about the test scores. Ideally,
counseling resources will be available for those who react adversely to information that
has been presented.
Respondents should be informed of the nature of the test before taking it, and must
choose to take it voluntarily and should not be compelled to take it. Another issue that
involves ethics in terms of psychological tests is the right to protection from stigma or
the right to the least stigmatizing label. In conjunction with the participant's right to
know and understand their results, researchers need to be careful not to use any
stigmatizing labels when describing the results in terms of the participant. Researchers
need to refrain from using terms such as "feebleminded" and "addictive personality".
The results that the test taker receives, along with their parent or guardian in
cases involving minors, should therefore bring about positive growth and development for the
test taker.
It is the duty of the psychologist to protect the integrity of the test by not coaching
individuals on actual test materials or other practice materials that might unfairly
influence their test performance. Ensuring that test techniques are not described publicly
in such a way that their usefulness is impaired is also important.