Examples of Intelligence Tests

1. Stanford-Binet IQ test

The modern field of intelligence testing began with the Stanford-Binet
IQ test. The Stanford-Binet itself started with the French
psychologist Alfred Binet as a standard way for psychologists to quickly and
easily compare the psychological functioning of different people. As Binet
indicated, case studies may be more detailed and at times more helpful, but
the time required to test large numbers of people would be huge.
Unfortunately, the tests he and his assistant Victor Henri developed in 1896
were largely disappointing (Fancher, 1985).

Later on, Binet worked with physician Theodore Simon on the
problem of retardation in French school children. Between 1905 and 1908,
their research at a school for boys in Grange-aux-Belles, France led to the
development of the Binet-Simon tests. Employing questions of increasing
difficulty, this test measured such things as attention, memory, and verbal
skills. Binet cautioned people that these scores should not be taken too
literally because of the plasticity of intelligence and the inherent margin of
error in the test (Fancher, 1985).

In 1916, Stanford psychologist Lewis Terman released the "Stanford
Revision of the Binet-Simon Scale" or the "Stanford-Binet" for short. With
the help of several graduate students and validation experiments, he
removed several of the Binet-Simon test items and added completely new
ones. The test soon became so popular that Robert Yerkes, the president of
the American Psychological Association, decided to use the test to develop
the Army Alpha and Army Beta tests, which helped classify recruits. Thus, a
high-scoring individual would get a grade of A (high officer material),
whereas a low-scoring individual would get a grade of E and be rejected
(Fancher, 1985).

Since the Stanford-Binet got its name, it has been revised several
times to give us the current Stanford-Binet 5. According to the publisher's
website, "The SB5 was normed on a stratified random sample of 4,800
individuals that matches the 2000 U.S. Census. Bias reviews were conducted
on all items for gender, ethnic, cultural/religious, regional, and
socioeconomic status issues.

Validity data was obtained using such instruments as the Stanford-Binet
Intelligence Scale, Fourth Edition, the Stanford-Binet Form L-M, the
Woodcock-Johnson® III, the Universal Nonverbal Intelligence Test™, the
Bender®-Gestalt, the WAIS®-III, the WIAT®-II, the WISC-III®, and the
WPPSI-R®." Low variation on individuals tested multiple times indicates the
test has high reliability. It features Fluid Reasoning, Knowledge, Quantitative
Reasoning, Visual-Spatial Processing, and Working Memory as the 5 factors
tested. Each of these factors is tested in two separate domains, verbal and
nonverbal, in order to accurately assess individuals with deafness, limited
English, or communication disorders. Examples of test items include verbal
analogies to test Verbal Fluid Reasoning and picture absurdities to test
Nonverbal Knowledge. In conclusion, the test makers assure people the
Stanford-Binet 5 will accurately assess low-end functioning, normal
intelligence, and the highest levels of giftedness (Riverside Publishing, 2004).
Despite this recent revision, some controversy remains as to the accuracy
and bias of this test; however, many psychologists believe the evidence
available shows that the Stanford-Binet test is valid, and it remains a popular
assessment of intelligence.

Students with exceptional scores on this test may be deemed bright, moderately
gifted, highly gifted, extremely gifted, or profoundly gifted.

The Stanford-Binet IQ Test is designed to test intelligence in four areas:
verbal reasoning, quantitative reasoning, abstract and visual reasoning, and
short-term memory skills. The Stanford-Binet also scores 15 subtests, including:

• vocabulary
• comprehension
• verbal absurdities
• pattern analysis
• matrices
• paper folding and cutting
• copying
• quantitative
• number series
• equation building
• memory for sentences
• memory for digits
• memory for objects
• bead memory

Those planning on taking the Stanford-Binet IQ Test will take an
additional vocabulary test, which, along with the subject's age, determines
the number and level of subtests to be administered. Total testing time is
45-90 minutes, depending on the subject's age and the number of
subtests given. Raw scores are based on the number of items answered,
and are converted into a standard age score corresponding to age group,
similar to an IQ Score.
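
The conversion from a raw score to a standard age score can be pictured
with a minimal sketch. Everything specific here is an assumption made for
illustration: the norm table, its age groups and values, the function name,
and the scale mean of 100 and standard deviation of 15. Published tests
use their own norm tables rather than a simple formula like this one.

```python
# Hypothetical norm table: age group -> (mean raw score, SD of raw scores).
# These values are invented for illustration and are NOT published norms.
HYPOTHETICAL_NORMS = {
    "8-9": (20.0, 4.0),
    "10-11": (26.0, 5.0),
}


def standard_age_score(raw, age_group, scale_mean=100.0, scale_sd=15.0):
    """Rescale a raw score against the subject's own age group.

    The raw score is first expressed in standard deviations from the
    age group's mean, then mapped onto the reporting scale.
    """
    norm_mean, norm_sd = HYPOTHETICAL_NORMS[age_group]
    z = (raw - norm_mean) / norm_sd
    return round(scale_mean + scale_sd * z)


# An average raw score for the age group maps to the scale mean.
print(standard_age_score(26, "10-11"))  # 100
# A raw score one SD above the age-group mean maps one scale SD higher.
print(standard_age_score(24, "8-9"))    # 115
```

The point of the sketch is that the same raw score can yield different
standard scores depending on the age group it is compared against.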

The Stanford-Binet IQ Test combines features of earlier editions of the
Stanford-Binet Intelligence Scale with recent improvements in
psychometric design. Point-scale format subtests, designed to measure
behavior at every age and used in the 1986 edition, are combined with
the age-scale or functional-level design of the earlier editions. Two routing
subtests identify the developmental starting points of the examinee, and
the items can be tailored to cognitive level, resulting in greater precision
in measurement. The Stanford-Binet IQ Test now has five factors
(Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial
Processing, and Working Memory), as opposed to the four of the previous
edition.

This edition of the Stanford Binet IQ Test allows for evaluation of the
abilities of elderly examinees. The test is for children ages 2 through

The Wechsler Intelligence Scales are a series of standardized tests
used to evaluate cognitive and intellectual abilities in children and
adults.


The Wechsler Intelligence Scales for Children (regular, revised,
and third edition) and Wechsler Preschool and Primary Scale of
Intelligence are used as tools in school placement, in determining the
presence of a learning disability or a developmental delay, in identifying
giftedness, and in tracking intellectual development.

The Wechsler Adult Intelligence Scales (regular and revised) are used to
determine vocational ability, to assess adult intellectual ability in the
classroom, and to determine organic deficits. Both adult and children's
Wechsler scales are often included in neuropsychological testing to assess
the brain function of individuals with neurological impairments.


Intelligence testing requires a clinically trained examiner. The Wechsler
scales should be administered, scored, and interpreted by a trained
professional, preferably a psychologist or psychiatrist.


All of the Wechsler scales are divided into six verbal and five performance
subtests. The complete test takes 60-90 minutes to administer. Verbal and
Performance IQs are scored based on the results of the testing, and then a
composite Full Scale IQ score is computed. Although earlier editions of some
of the Wechsler Scales are still available, the latest revisions are described
below.

Wechsler Adult Intelligence Scale-Revised (WAIS-R)

The WAIS-R, the 1981 revision of the original Wechsler Adult Intelligence
Scale, is designed for adults, age 16-74. The 11 subtests of the WAIS-R
include information, digit span, vocabulary, arithmetic, comprehension,
similarities, picture completion, picture arrangement, block design, object
assembly, and digit symbol. An example of a question on the similarities
subtest might be: "Describe how the following pair of words are alike or the
same: hamburger and pizza." A correct response would be "Both are things to eat."

Wechsler Intelligence Scale for Children, Third Edition (WISC-III)

The WISC-III includes many of the same categories of subtests as
the WAIS-R. In addition, there are two optional performance subtests:
symbol search and mazes.

Wechsler Preschool and Primary Scale of Intelligence (WPPSI)

The WPPSI is designed for children age 4-6½ years. The test is divided
into six verbal and five performance subtests. The eleven subtests are
presented in the following order: information, animal house and animal
house retest, vocabulary, picture completion, arithmetic, mazes, geometric
design, similarities, block design, comprehension, and sentences.

The 1997 Medicare reimbursement rate for psychological and
neuropsychological testing, including intelligence testing, was $58.35 an hour.
Billing time typically includes test administration, scoring and interpretation,
and reporting. Many insurance plans cover all or a portion of diagnostic
psychological testing.

Normal results

The Wechsler Intelligence Scales are standardized tests, meaning that as
part of the test design, they were administered to a large representative
sample of the target population, and norms were determined from the
results. The scales have a mean, or average, standard score of 100 and a
standard deviation of 15. The standard deviation is the unit used to express how
far above or below the norm the subject's score is. For example, a ten-year-old is
assessed with the WISC-III scale and achieves a full-scale IQ score of 85.
The mean score of 100 is the average level at which all 10-year-olds in the
representative sample performed. This child's score would be one standard
deviation below that norm.
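
The arithmetic behind this example can be written out as a short sketch.
The function name is hypothetical, and the percentile step rests on an
added assumption that scores in the norm group are normally distributed.

```python
from statistics import NormalDist


def standard_deviations_from_mean(score, mean=100.0, sd=15.0):
    """Return how many standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# The ten-year-old in the example, with a full-scale IQ score of 85.
z = standard_deviations_from_mean(85)
print(z)  # -1.0

# Assumption: scores are normally distributed, so z converts to a percentile.
percentile = NormalDist().cdf(z) * 100
print(round(percentile, 1))  # 15.9
```

Under that assumption, a score one standard deviation below the mean is
higher than roughly 16 percent of the scores in the representative sample.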

While the full-scale IQ scores provide a reference point for evaluation, they
are only an average of a variety of skill areas. A trained psychologist will
evaluate and interpret an individual's performance on the scale's subtests to
discover their strengths and weaknesses and offer recommendations based
upon these findings.

Key Terms

Norms
Normative or mean score for a particular age group.
Representative sample
A random sample of people that adequately represents the test-taking
population in age, gender, race, and socioeconomic status.
Standard deviation
A measure of the distribution of scores around the average
(mean). In a normal distribution, two standard deviations above
and below the mean include about 95% of all samples.
Standardization
The process of determining established norms and procedures
for a test to act as a standard reference point for future test results.
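
The 95% figure in the standard deviation entry can be checked with a
minimal sketch, assuming a normal distribution with the mean of 100 and
standard deviation of 15 given for the Wechsler scales. The helper name
is hypothetical.

```python
from statistics import NormalDist

# Assumed score distribution: normal, mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def share_within(k):
    """Proportion of scores lying within k standard deviations of the mean."""
    return iq.cdf(100 + k * 15) - iq.cdf(100 - k * 15)


print(f"{share_within(1):.1%}")  # 68.3%  (scores from 85 to 115)
print(f"{share_within(2):.1%}")  # 95.4%  (scores from 70 to 130)
```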

2. Non-verbal test - Raven's Progressive Matrices

The Raven Progressive Matrices test is a widely used intelligence test in
many research and applied settings. In each test item, one is asked to find
the missing pattern in a series. Each set of items gets progressively harder,
requiring greater cognitive capacity to encode and analyze.

Sample item from the Raven Progressive Matrices tests

Raven's Progressive Matrices was designed primarily as a measure of
Spearman's g. The test has no time limit and uses simple oral instructions.
There are 3 different tests for different abilities:

• Coloured Progressive Matrices (younger children and special groups)
• Standard Progressive Matrices (average 6 to 80 year olds)
• Advanced Progressive Matrices (above average adolescents & adults)

In terms of its psychometrics, Raven's Progressive Matrices:

• has good test-retest reliability, between .70 and .90 (however, for low score
ranges, the test-retest reliability is lower)
• has good internal consistency coefficients, mostly in the .80s and .90s
• has correlations with verbal and performance tests that range
between .40 and .75
• fair concurrent validity in studies with mentally retarded groups
• lower predictive validity than verbal intelligence tests for academic
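
A test-retest reliability coefficient like those quoted above is commonly
computed as the correlation between scores from two administrations of
the same test. The following is a minimal sketch of that idea using the
Pearson correlation; the scores are invented for illustration and do not
come from any study of Raven's Progressive Matrices.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Hypothetical raw scores for the same six people tested on two occasions.
first_session = [32, 41, 45, 50, 38, 47]
second_session = [35, 40, 47, 49, 36, 50]

reliability = pearson_r(first_session, second_session)
print(round(reliability, 2))
```

A coefficient near 1 means people kept nearly the same rank order across
the two sessions; values closer to 0 mean the scores were unstable.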

3. Non-verbal test - Gesell Developmental Schedules for very young children


Prior to the preschool years, the assessment tools for infants measure
somewhat different components of intellectual ability. An example of an oft-
used test is the Gesell Developmental Schedules. This test was first
introduced in 1925 and has been revised periodically. The schedules are
designed to measure developmental progress of babies and children from 4
weeks to 5 years. These schedules provide a standardized procedure for
observing and evaluating the developmental attainment of children in five
areas:

• Gross motor skills: cruises a rail using 2 hands
• Fine motor skills: uses “scissors” grasp on string
• Language development: uses “da-da” with meaning
• Adaptive behaviour: pulls a string to obtain a ring
• Personal-social behaviours: pushes arm through dress if started.

Gesell identified naturally occurring situations in the home or clinic and used
objects or tasks with high appeal for infants and preschoolers. Well-trained
observers can attain interrater reliabilities in the mid .90s (Knobloch &
Pasamanick, 1974).

Gesell did not intend his schedules to be intelligence tests; rather, they are
used to identify neurological impairment and mental retardation.

Gesell determined that normal development is a time-bracketed
phenomenon: that is, the age variability for attaining developmental
milestones in infancy is very small, on the order of a few weeks for many
tasks. Many studies indicate that the Gesell Schedules function well in the
screening of intellectually at-risk infants. Virtually all infant tests have
borrowed from or adapted the original schedules devised by Arnold Gesell.

Tests for special populations

• Tests may be individual or group
• Typically designated as performance, non-language or nonverbal tests
• Tests designed for groups such as infants, preschoolers, mentally
retarded people, physically disabled (hearing, visual, motor), and
multicultural populations (language & cultural issues)

The major non-verbal test in use is Raven's Progressive Matrices.