
INTERNATIONAL

general,
critical &
graduate
TEST BATTERY
technical manual
CONTENTS
1 THEORETICAL OVERVIEW
2 THE GRADUATE & GENERAL REASONING TESTS (GRT1 & GRT2)
3 THE CRITICAL REASONING TEST BATTERY (CRTB)
4 THE PSYCHOMETRIC PROPERTIES OF THE REASONING TESTS
5 ADMINISTRATION INSTRUCTIONS
REFERENCES
LIST OF TABLES
1 GRT2 STANDARDISATION SAMPLES
2 GENDER DIFFERENCES ON GRT2
3 CORRELATIONS BETWEEN GRT2 AND AGE
4 GENDER DIFFERENCES ON GRT1
5 RELATIONSHIP BETWEEN AGE AND GRT1
6 GENDER DIFFERENCES ON CRTB
7 RELATIONSHIP BETWEEN AGE AND CRTB
8 COEFFICIENT ALPHA FOR GRT2 SUB-SCALES
9 COEFFICIENT ALPHA FOR GRT1 SUB-SCALES
10 TEST-RETEST RELIABILITY ESTIMATES FOR GRT1
11 COEFFICIENT ALPHA FOR CRTB SUB-SCALES
12 DIFFERENCES BETWEEN COMPUTER VS CONVENTIONAL METHODS OF TEST
ADMINISTRATION
13 CORRELATIONS BETWEEN THE GRT1 SUB-SCALES
14 CORRELATIONS BETWEEN THE GRT2 SUB-SCALES
15 CORRELATIONS BETWEEN CRTB VERBAL & NUMERICAL
16 CORRELATIONS BETWEEN THE GRT1 AND AH5 SUB-SCALES
17 CORRELATIONS BETWEEN THE GRT2 AND AH3 SUB-SCALES
18 CORRELATIONS OF GRT2 WITH TECHNICAL TEST BATTERY
19 CORRELATIONS OF GRT2 WITH CLERICAL TEST BATTERY
20 CORRELATIONS BETWEEN THE EXPERIMENTAL VERSIONS OF THE VCR1 & NCR1
AND THE VERBAL NUMERICAL SUB-SCALE OF THE AH5
21 CORRELATIONS BETWEEN VCR1 & NCR1 AND AH5 VERBAL/NUMERICAL
22 CORRELATIONS BETWEEN VCR1 & NCR1 AND WGCTA
23 CORRELATIONS BETWEEN VCR1 & NCR1 AND MAB (N=154)
24 CORRELATIONS WITH GRT2 AND JOB PERFORMANCE IN AN INSURANCE SETTING
25 CORRELATIONS WITH GRT2 AND PERFORMANCE CRITERIA IN BANKING
26 CORRELATIONS BETWEEN GRT2 & SERVICE ENGINEER PERFORMANCE
27 CORRELATIONS BETWEEN GRT2 & PRINTER PERFORMANCE CRITERIA (N=70)
28 CORRELATIONS BETWEEN GRT2 & PROFICIENCY CRITERIA
29 CORRELATIONS BETWEEN GRT2 & SUCCESSFUL APPLICANT FOR COMPONENT
COURSE
1
THEORETICAL
OVERVIEW
1 THE ROLE OF PSYCHOMETRIC TESTS IN PERSONNEL SELECTION AND ASSESSMENT
2 THE ORIGINS OF REASONING TESTS

A major reason for using psychometric tests to aid selection decisions is that they provide information that cannot be obtained easily in other ways. If such tests are not used then what we know about the applicant is limited to the information that can be gleaned from an application form or CV, an interview and references. If we wish to gain information about a person's specific aptitudes and abilities, and about their personality, attitudes and values, then we have little option but to use psychometric tests.
In fact, psychometric tests can do more than simply provide additional information about the applicant. They can add a degree of reliability and validity to the selection procedure that it is impossible to achieve in any other way. How they do this is best addressed by examining the limitations of the information obtained through interviews, application forms and references, and exploring how some of these limitations can be overcome by using psychometric tests.
THE ROLE OF PSYCHOMETRIC TESTS IN PERSONNEL SELECTION AND ASSESSMENT
While much useful information can be gained from the interview, which clearly has an important role in any selection procedure, it does nonetheless suffer from a variety of weaknesses. Perhaps the most important of these is that the interview has been shown to be a very unreliable way to judge a person's character. This is because it is an unstandardised assessment procedure. That is to say, each interview will be different from the last. This is true even if the interviewer is attempting to ask the same questions and act in the same way with each applicant. It is precisely this aspect of the interview that is both its main strength and its main weakness. The interview enables us to probe each applicant in depth and discover individual strengths and weaknesses. Unfortunately, the interview's unstandardised, idiosyncratic nature makes it difficult to compare applicants, as it provides no baseline against which to contrast interviewees' differing performances. In addition, it is likely that different interviewers may come to radically different conclusions about the same applicant. Applicants will respond differently to different interviewers, quite often saying very different things to them. In addition, what any one applicant might say will be interpreted quite differently by each interviewer. In such cases we have to ask which interviewer has formed the correct impression of the candidate. This is a question to which there is no simple answer.
A further limitation of the interview is that it only assesses the candidate's behaviour in one setting, and with regard to a small number of people. How the candidate might act in different situations and with different people (e.g. when dealing with people on the shop floor) is not assessed, and cannot be predicted from an applicant's interview performance. Moreover, the interview provides no reliable information about the candidate's aptitudes and abilities. The most we can do is ask the candidate about their strengths and weaknesses, a procedure that has obvious limitations. Thus the range and reliability of the information that can be gained through an interview are limited.
There are similar limitations on the range and usefulness of the information that can be gained from application forms or CVs. While work experience and qualifications may be prerequisites for certain occupations, in and of themselves they do not determine whether a person is likely to perform well or badly. Experience and academic achievement are not always good predictors of ability or future success. While such information is important, it may not be sufficient on its own to enable us to confidently choose between applicants. Thus aptitude and ability tests are likely to play a significant role in the selection process, as they provide information on a person's potential and not just their achievements to date.
Moreover, application forms tell us little about a person's character. It is often a candidate's personality that will make the difference between an average and an outstanding performance. This is particularly true when candidates have relatively similar records of achievement and past performance. Therefore, personality tests can play a major role in assisting selection decisions.
References do provide some useful information, but mainly for verification purposes. While past performance is undoubtedly a good predictor of future performance, references are often not good predictors of past performance. If the name of the referee is supplied by the applicant, then it is likely that they have chosen someone they expect to speak highly of them. They will probably have avoided supplying the names of those who may have a less positive view of their abilities. Aptitude and ability tests, on the other hand, give us an indication of the applicant's probable performance under exam conditions. This is likely to be a true reflection of the person's ability.
What advantages do psychometric tests have over other forms of assessment? The first advantage they have is that they add a degree of reliability to the selection procedure that cannot be achieved without their use. Test results can be represented numerically, making it easy both to compare applicants with each other and with pre-defined groups (e.g. successful vs. unsuccessful job incumbents). In the case of personality tests the test addresses the issue of how the person characteristically behaves in a wide range of different situations and with different people. Thus psychometric tests of personality, aptitude and ability provide a range of information that is not easily and reliably assessed in other ways. Such information can fill important gaps which have not been assessed by application forms, interviews and references. It can also raise questions that can later be directly addressed in the interview. It is for this reason that psychometric tests are being used increasingly in personnel selection. Their use adds a degree of breadth to assessment decisions which cannot be achieved in any other way.
THE ORIGINS OF REASONING TESTS
The assessment of intelligence or reasoning ability is perhaps one of the oldest areas of research interest in psychology. Gould (1981) has traced attempts to scientifically measure psychological aptitudes and abilities to the work of Galton at the end of the last century. Prior to Galton's pioneering work, however, interest in this area was aroused by phrenologists' attempts to assess mental ability by measuring the size of people's heads. Reasoning tests, in their present form, were first developed by Binet, a French educationalist who published the first test of mental ability in 1905.
Binet was concerned with assessing the intellectual development of children and to this end invented the concept of mental age. Questions assessing academic ability were graded in order of difficulty according to the average age at which children could successfully complete each item. From the child's performance on such a test it was possible to derive its mental age. This involved comparing the performance of the child with the performance of the 'average child' from different age groups. If the child performed at the level of the average 10 year old, then the child was said to have a mental age of 10, regardless of its chronological age. From this idea the concept of the Intelligence Quotient (IQ) was developed by William Stern (1912), who defined it as mental age divided by chronological age multiplied by 100. Prior to Stern's paper, chronological age had been subtracted from mental age to provide a measure of mental alertness. Stern showed that it was more appropriate to take the ratio of these two constructs, which would provide a measure of the child's intellectual development relative to other children. He further proposed that this ratio should be multiplied by 100 for ease of interpretation, thus avoiding cumbersome decimals.
Binet's early tests were subsequently revised by Terman et al. (1917) to produce the now famous Stanford-Binet IQ test. These early IQ tests were first used for selection by the Americans during the First World War, when Yerkes (1921) tested 1.75 million soldiers with the Army Alpha and Beta tests. Thus by the end of the war, the assessment of reasoning ability had firmly established its place within psychology.
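Stern's ratio definition described above can be written as

$$\mathrm{IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100$$

so, for example, a child with a mental age of 12 and a chronological age of 10 obtains an IQ of 120, while a child performing exactly at the level typical of its own age group obtains an IQ of 100.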
2
THE GRADUATE
& GENERAL
REASONING TESTS
(GRT1 & GRT2)
Research in the area of intelligence testing has consistently identified three aptitude domains: Verbal, Numerical and Abstract Reasoning ability (Heim, 1970). Consequently, the GRT1 and GRT2 have been designed to measure just these three areas of ability. Verbal and Numerical ability assess, as their respective names would suggest, the ability to use words and numbers in a rational way, correctly identifying logical relationships between these entities and drawing conclusions and inferences from them. Abstract reasoning assesses the ability to identify logical relationships between abstract spatial relationships and geometric patterns. Many psychologists argue that Abstract Reasoning tests assess the purest form of 'intelligence'. That is to say, these tests are the least affected by educational experience and assess what some might term 'innate' reasoning ability: namely, the ability to solve abstract, logical problems which require no prior knowledge or educational experience.
Research has clearly demonstrated that in order to accurately assess reasoning ability it is necessary to develop tests which have been specifically designed to measure that ability in the population under consideration. That is to say, we need to be sure that the test has been developed for use on the particular group being tested, and thus that it is appropriate for that particular group. There are two ways in which this is important. Firstly, it is important that the test has been developed in the country in which it is intended to be used. This ensures that the items in the test are drawn from a common, shared cultural experience, giving each candidate an equal opportunity to understand the logic which underlies each item. Secondly, it is important that the test is designed for the particular ability range on which it is to be used. A test designed for those of average ability will not accurately distinguish between people of high ability, as all their scores will cluster towards the top end of the scale. Similarly, a test designed for people of high ability will be of little practical use if given to people of average ability. Not only will the test not discriminate between applicants, as all the scores will cluster towards the bottom of the scale, but also, as the questions will be too difficult for most of the applicants, they are likely to lose motivation, producing artificially low scores. For this reason two versions of this reasoning test were developed: one for the general population (the average ability range) and one for the graduate population.
In constructing the items in the
GRT1 and GRT2 which measure
these reasoning abilities a number of
guidelines were borne in mind.
Firstly, and perhaps most impor-
tantly, each item was constructed so
that only a minimal educational level
was needed in order to be able to
correctly solve each item. Thus we
tried to ensure that each item was a
measure of ‘reasoning ability’, rather
than being a measure of specific
knowledge or experience. For
example, in the case of the numerical
items the calculations involved in
solving each item are relatively
simple, with the difficulty of the item
being due to the logic which under-
lies that question rather than being
due to the need to use complex
mathematical operations to solve
that item. (It should be noted
however that in the case of the GRT1
a higher level of education was
assumed, as the test was designed for
a graduate population). Secondly, a
number of different item types (e.g.
odd one out, word meanings etc.)
were used to measure each aspect of
reasoning ability. This was done in
order to ensure that each sub-scale
measures a broad aspect of reasoning
ability (e.g. Verbal Reasoning
Ability), rather than measuring a
very specific aptitude (e.g. vocabu-
lary). In addition, the use of different
item types ensures that the test is
measuring different components of
reasoning ability. For example the
ability to understand analogies,
inclusion/exclusion criteria for class
membership etc.
3
THE CRITICAL
REASONING TEST
BATTERY (CRTB)
Research has clearly demonstrated that in order to accurately assess reasoning ability it is necessary to develop tests which have been specifically designed to measure that ability in the population under consideration. That is to say, we need to be sure that the test has been developed for use on the particular group being tested, and thus is appropriate for that particular group. There are two ways in which this is important. Firstly, it is important that the test has been developed in the country in which it is intended to be used. This ensures that the items in the test are drawn from a common, shared cultural experience, giving each candidate an equal opportunity to understand the logic which underlies each item. Secondly, it is important that the test is designed for the particular ability range on which it is to be used. A test designed for those of average ability will not accurately distinguish between people of high ability, as all the scores will cluster towards the top end of the scale. Similarly, a test designed for people of high ability will be of little use if given to people of average ability. Not only will it not discriminate between applicants, as all the scores will cluster towards the bottom of the scale, but also, as the questions will be too difficult for most of the applicants, they are likely to be de-motivated, producing artificially low scores. Consequently, the VCR1 and NCR1 have been developed on data from undergraduates; that is to say, people of above average intelligence, who are likely to find themselves in senior management positions as their careers develop.
In constructing the items in the VCR1 and NCR1 a number of guidelines were borne in mind. Firstly, and perhaps most importantly, special care was taken when writing the items to ensure that in order to correctly solve each item it was necessary to draw logical conclusions and inferences from the stem passage/table. This was done to ensure that the test was assessing critical (logical/deductive) reasoning rather than simple verbal/numerical checking ability. That is to say, the items assess a person's ability to think in a rational, critical way and make logical inferences from verbal and numerical information, rather than simply check for factual errors and inconsistencies.
In order to achieve this goal for the Verbal Critical Reasoning (VCR1) test, two further points were borne in mind when constructing the stem passages for the VCR1. Firstly, the passages were kept fairly short and cumbersome grammatical constructions were avoided, so that a person's scores on the test would not be too affected by reading speed, thus providing a purer measure of critical reasoning ability. Secondly,
care was taken to make sure that the
passages did not contain any infor-
mation which was counter-intuitive,
and was thus likely to create confu-
sion. To increase the acceptability of
the test to applicants the themes of
the stem passages were chosen to be
relevant to a wide range of business
situations. As a consequence of these
constraints the final stem passages
were similar in many ways to the
short articles found in the financial
pages of a daily newspaper.
Finally an extended response
format was chosen for the VCR1.
While many critical reasoning tests
only ask the applicant to make the
distinction between whether the
target statement is true, false or
cannot be inferred from the stem
passage the response format for the
VCR1 was extended to include the
judgement of whether the target
statement was probably or definitely
true/false given the information in
the passage. This was done in order
to decrease the chance of guessing a
correct answer from 33% to 25%.
With guessing having substantially
less impact on a candidate’s final
score, it was thus possible to decrease
the number of items in the test that
were needed for it to be reliable.
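Two standard results lie behind this reasoning; the manual does not state them explicitly, and they are given here purely for illustration. With k equally likely response options, the chance of guessing a single item correctly is

$$P(\text{correct by guessing}) = \frac{1}{k},$$

so that $1/3 \approx 33\%$ and $1/4 = 25\%$, matching the figures quoted above. The Spearman-Brown formula then indicates how reliability changes when a test is lengthened or shortened by a factor $n$:

$$\rho_{n} = \frac{n\,\rho_{1}}{1 + (n-1)\,\rho_{1}},$$

where $\rho_{1}$ is the reliability of the original test; reducing the error introduced by guessing allows a given target reliability to be reached with fewer items.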
4 THE PSYCHOMETRIC
PROPERTIES OF THE
REASONING TESTS
This chapter will present details con-
cerning the psychometric properties
of the reasoning tests. The aim will
be to show that these measures fulfil
various technical requirements, in
the areas of standardisation, reliabil-
ity and validity, which ensure the
psychometric soundness of the test.
1 INTRODUCTION
2 STANDARDISATION
3 RELIABILITY OF THE REASONING TESTS
4 METHOD EFFECTS
5 VALIDITY
6 THE STRUCTURE OF THE REASONING TESTS
7 THE CONSTRUCT VALIDITY OF THE REASONING TESTS
8 CRITERION-RELATED VALIDITY
INTRODUCTION

Standardisation : Normative
Normative data allows us to compare an individual's score on a standardised scale against the typical score obtained from a clearly identifiable, homogeneous group of people.

RELIABILITY
The property of a measurement which assesses the extent to which variation in measurement is due to true differences between people on the trait being measured or to measurement error.

In order to provide meaningful interpretations, the reasoning tests were standardised against a number of relevant groups. The constituent samples are fully described in the next section. Standardisation ensures that the measurements obtained from a test can be meaningfully interpreted in the context of a relevant distribution of scores. Another important technical requirement for a psychometrically sound test is that the measurements obtained from that test should be reliable. Reliability is generally assessed using two specific measures, one related to the stability of scale scores over time, the other concerned with the internal consistency, or homogeneity, of the constituent items that form a scale score.

Reliability : Stability
Also known as test-retest reliability, an assessment is made of the similarity of scores on a particular scale over two or more test occasions. The occasions may be a few hours, days, months or years apart. Normally Pearson correlation coefficients are used to quantify the similarity between the scale scores over the two or more occasions.

Stability coefficients provide an important indicator of a test's likely usefulness of measurement. If these coefficients are low (< approx. 0.6) then this suggests either that the abilities/behaviours/attitudes being measured are volatile or situationally specific, or that over the duration of the retest interval situational events have made the content of the scale irrelevant or obsolete. Of course, the duration of the retest interval provides some clue as to which effect may be causing the unreliability of measurement. However, the second measure of a scale's reliability also provides valuable information as to why a scale may have a low stability coefficient. The most common measure of internal consistency is Cronbach's Alpha. If the items on a scale have high inter-correlations with each other, and with the total scale score, then coefficient alpha will be high. Thus a high coefficient alpha indicates that the items on the scale are measuring very much the same thing, while a low alpha would suggest either that the scale items are measuring different attributes or the presence of error.

Reliability : Internal Consistency
Also known as scale homogeneity, an assessment is made of the ability of the items in a scale to measure the same construct or trait. That is, a parameter can be computed that
indexes how well the items in a scale contribute to the overall measurement denoted by the scale score. A scale is said to be internally consistent if all the constituent item responses are shown to be positively associated with their scale score.
The fact that a test has high internal consistency and stability coefficients only guarantees that it is measuring something consistently. It provides no guarantee that the test is actually measuring what it purports to measure, nor that the test will prove useful in a particular situation. Questions concerning what a test actually measures, and its relevance in a particular situation, are dealt with by looking at the test's validity. Reliability is generally investigated before validity, as the reliability of a test places an upper limit on the test's validity. It can be mathematically demonstrated that the validity coefficient for a particular test cannot exceed that test's reliability coefficient.

VALIDITY
The ability of a scale score to reflect what that scale is intended to measure. Kline's (1993) definition is "A test is said to be valid if it measures what it claims to measure". Validation studies of a test investigate the soundness and relevance of a proposed interpretation of that test. Two key areas of validation are known as criterion validity and construct validity.

Validity : Criterion Validity
Criterion validity involves translating a score on a particular test into a prediction concerning what could be expected if another variable was observed.
The criterion validity of a test is provided by demonstrating that scores on the test relate in some meaningful way with an external criterion. Criterion validity comes in two forms – predictive and concurrent. Predictive validity assesses whether a test is capable of predicting an agreed criterion which will be available at some future time – e.g. can a test predict the likelihood of someone successfully completing a training course? Concurrent validity assesses whether the scores on a test can be used to predict a criterion measure which is available at the time of the test – e.g. can a test predict current job performance?

Validity : Construct Validity
Construct validity assesses whether the characteristic which a test is actually measuring is psychologically meaningful and consistent with the test's definition.
The construct validity of a test is assessed by demonstrating that the scores from the test are consistent with those from other major tests which measure similar constructs, and are dissimilar to scores on tests which measure different constructs.
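As an illustration of the internal-consistency index described above, the following is a minimal sketch of coefficient alpha computed from a respondents-by-items score matrix. The function name and the data are hypothetical and are not taken from the manual's own analyses.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Coefficient alpha for a (respondents x items) matrix of item scores."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                               # number of items in the scale
    item_variances = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)   # variance of the scale total
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 0/1 responses from five candidates to a four-item sub-scale
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(responses), 2))
```

High inter-item correlations push the total-score variance well above the sum of the item variances, which is what drives alpha towards 1.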
STANDARDISATION
For each of the three reasoning batteries, information is provided on the constituent norm samples and on age and gender differences where they apply. All normative data is available from within the GeneSys system, which computes, for any given raw score, the appropriate standardised scores for the selected reference group. In addition, the GeneSys™ software allows users to establish their own in-house norms to allow more focused comparison with profiles of specific groups.

GRT2 NORMATIVE DATA
The total norm base of the GRT2 is based on a general population norm as well as a number of more specialised norm groups. These include undergraduates, technical staff, personnel managers, customer service staff, management applicants etc., detailed in Table 1.

GRT2 GENDER AND AGE DIFFERENCES
Gender differences on GRT2 were examined by comparing results of almost equal numbers of male and female respondents matched as far as possible for educational and socio-economic status. Table 2 provides mean scores for males and females on each of the GRT2 scales, as well as the t-value for mean score differences.
The results below demonstrate gender differences on NR2, with the male mean score just over two raw score points higher than that of the females. This is in line with other numerical measures. More surprising, perhaps, is that no differences were observed on either the Verbal or the Abstract measures.
The effect of age on GRT2 scores was examined using a sample of 1441 respondents on whom age data was available (see Table 3). Whereas there is a negative relationship between GRT2 scores and age for all three measures, this tendency is highly significant on the Abstract (AR2), suggesting, in line with expectations, that fluid ability may be more likely to decline with age than crystallised ability.
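By way of illustration of the standardised scores referred to at the start of this section, a raw score can be referenced against a norm group's mean and standard deviation in the usual way. The sketch below uses common z, T and sten conventions with hypothetical figures; it does not reproduce GeneSys's own scoring rules.

```python
def standardise(raw, norm_mean, norm_sd):
    """Convert a raw score to z, T and sten scores against a norm group."""
    z = (raw - norm_mean) / norm_sd
    t_score = 50 + 10 * z                        # T-score: mean 50, SD 10
    sten = min(10, max(1, round(5.5 + 2 * z)))   # sten: mean 5.5, SD 2, bounded 1-10
    return z, t_score, sten

# Hypothetical example: raw score of 24 against a norm group mean of 20, SD of 5
z, t, sten = standardise(24, 20, 5)
print(f"z = {z:.2f}, T = {t:.1f}, sten = {sten}")
```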
Norm Group   Males N   Females N   Mean Age   Age Range   Age SD

General Population 3177 1236 35.10 16-63 8.87

Telesales Applicants 175 391 27.16 16-55 8.29

HE College Students 5 158 16.25 15-41 3.14

Customer Service Clerks 28 87 26.46 19-50 7.19

Technical Staff 122 24 29.38 16-58 9.92

Financial Consultants 58 20 32.94 20-53 9.20

HR Professionals 30 38 36.31 22-55 7.94

Service Engineers 86 8 26.56 18-58 6.87

Table 1: GRT2 Standardisation Samples

GRT2   Mean Females   Mean Males   t-value   df   p   Females N   Males N

VR2 23.36 23.22 .3404 1438 .734 852 588

NR2 15.77 17.91 -5.424 1438 .000 852 588

AR2 16.95 17.40 -1.259 1438 .208 852 588

Table 2: Gender differences on GRT2

GRT2 AGE

VR2 .11

NR2 .10

AR2 -.37

Table 3: Pearson correlations between GRT2 and age
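The age relationships in Table 3 (and in Tables 5 and 7 below) are ordinary Pearson product-moment correlations between age and raw scale score. A minimal sketch with hypothetical data is shown below; the arrays are invented for illustration and are not the standardisation data.

```python
import numpy as np
from scipy import stats

# Hypothetical ages and raw AR2 scores for a small sample of respondents
ages = np.array([21, 27, 33, 38, 44, 50, 56, 61])
ar2_scores = np.array([22, 21, 19, 18, 16, 15, 13, 12])

# Pearson product-moment correlation between age and score
r, p = stats.pearsonr(ages, ar2_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```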


GRT1 GENDER AND AGE DIFFERENCES
Gender differences on GRT1 were examined by comparing results of almost equal numbers of male and female respondents matched as far as possible for educational and socio-economic status. Table 4 provides mean scores for males and females on each of the sub-scales of the GRT1, as well as the t-value for mean score differences.
Consistent with findings on the GRT2, the only significant score difference is registered on the Numerical (NR1), with males registering a higher mean score. No significant differences were observed for either the Verbal or the Abstract components of the GRT1.
The effect of age on GRT1 scores was examined using a sample of 499 respondents on whom age data was available (see Table 5). Whereas the Verbal and Numerical were found to be unrelated to age, the Abstract showed a significant negative relationship, consistent with expectations. The correlations are lower than those obtained with the GRT2, although this may be explained by the sample, which in this case is more restricted in the range of observed scores.

CRTB GENDER AND AGE DIFFERENCES
Gender differences on CRTB were examined by comparing results of male and female respondents matched as far as possible for educational and socio-economic status. Table 6 provides mean scores for males and females on both the verbal and numerical sub-scales of the CRTB, as well as the t-value for mean score differences.
While female respondents register marginally higher verbal reasoning scores (VCR1) than males, this is not statistically significant. A significant gender difference was observed on the Numerical (NCR1), with males registering a mean score over four raw score points higher. This is consistent with other measures of numerical ability.
The effect of age on CRTB scores was examined using a sample of 359 respondents on whom age data was available (see Table 7). Unlike the General and Graduate Reasoning tests, the correlations with age are positive for the Critical Reasoning, although only marginal and non-significant for the Verbal. This suggests that the Numerical, at least, may be measuring more of an acquired ability, which is more positively influenced by experience than other more classic measures of numerical ability.
GRT1   Mean Females   Mean Males   t-value   df   p   Females N   Males N

VR1 16.73 16.61 .260 497 .795 251 248

NR1 12.98 14.91 -4.255 497 .000 251 248

AR1 14.11 14.27 -.383 497 .702 251 248

Table 4: Gender Differences on GRT1

GRT1 AGE

VR1 -.00

NR1 -.05

AR1 -.27

Table 5: Relationship between Age and GRT1

CRTB   Mean Females   Mean Males   t-value   df   p   Females N   Males N

VCR1 18.34 17.78 .501 134 .6178 47 89

NCR1 10.19 14.40 -3.578 134 .0004 47 89

Table 6: Gender differences on CRTB

CRTB AGE

VCR1 .03

NCR1 .18

Table 7: Relationship between Age and CRTB


RELIABILITY OF THE REASONING TESTS
If a reasoning test is to be used for selection and assessment purposes, the test needs to measure each of the aptitude or ability dimensions it is attempting to measure reliably, for the given population (e.g. graduate entrants, senior managers etc.). That is to say, the test needs to be consistently measuring each ability, so that if the test were to be used repeatedly on the same candidate it would produce similar results.
It is generally recognised that reasoning tests are more reliable than personality tests, and for this reason high standards of reliability are usually expected from such tests. While many personality tests are considered to have acceptable levels of reliability if they have reliability coefficients in excess of .7, reasoning tests should have reliability coefficients in excess of .8.

GRT2 INTERNAL CONSISTENCY
Table 8 presents alpha coefficients for the three sub-scales of the GRT2 (n=135). Each of these reliability coefficients is greater than .8, clearly demonstrating that the general population version of the GRT is highly reliable.

GRT1 INTERNAL CONSISTENCY
Table 9 presents alpha coefficients for the three sub-scales of the GRT1 (n=109). Each of these reliability coefficients is greater than .8, clearly demonstrating that the graduate version of the GRT meets acceptable levels of internal consistency.

GRT1 TEST-RETEST
A sample of 70 undergraduate students completed the GRT1 on two separate occasions with a four week interval. Table 10 provides uncorrected correlations for each measure. Although the Abstract falls somewhat below the ideal, the test-retest correlations are generally of a high and acceptable level which, in conjunction with the internal consistency data, demonstrates that the GRT1 is a reliable measure of general reasoning ability.
Table 11 presents alpha coefficients for the two sub-scales of the Critical Reasoning Test Battery. Each of these reliability coefficients is .8 or greater, clearly demonstrating that the Critical Reasoning Tests reach acceptable levels of reliability.
GRT2 r

Verbal (VR2) .83

Numerical (NR2) .84

Abstract (AR2) .83

Table 8: Coefficient Alpha for GRT2 Sub-scales (n = 135)

GRT1 r

Verbal (VR1) .82

Numerical (NR1) .85

Abstract (AR1) .84

Table 9: Coefficient Alpha for GRT1 Sub-scales (n = 109)

GRT1 VR1 NR1 AR1

VR1 .79 .31 .09

NR1 .35 .78 .39

AR1 .22 .38 .74

Table 10: Test-retest reliability estimates for GRT1 (N=70)

CRTB r

Verbal Critical Reasoning (VCR1) .80

Numerical Critical Reasoning (NCR1) .82

Table 11: Coefficient Alpha for CRTB Sub-scales (N=134)


METHOD EFFECTS
One important aspect of the change from paper and pencil to computer-based administration concerns the effects the change in test format might have on the available normative data for a test. In other words, does the translation of a test to computer format alter the nature of the test itself? It is possible that the score range of a test administered in question booklet form may be different from the range when the test is administered via a computer. Given that most of the normative data available for mainstream psychometric tests was collected from paper and pencil test administration, the question is far from academic. For instance, many organisations will administer computer-based tests at their head office location but will use paper and pencil format when their testers visit remote locations. If comparison is to be made across a group in which some individuals received paper and pencil testing and some computer testing, then the importance of the question of normative comparability cannot be overstated.
Very little research attention has been paid to this topic, which is surprising given the possible impact that format differences would have. Roid (1986) has indicated that what little evidence is available concerning paper and pencil v. computer formats suggests that computer administration of most tests does not change the score range enough to affect the normative basis of the test. While this is somewhat reassuring, the number of studies looked at by Roid was small and consisted entirely of American investigations.
As publishers of a wide range of tests which can be administered both by paper and pencil and by computer, Psytech International recently conducted a study looking at the effect of administration format on reasoning test performance. The test chosen was the Graduate Reasoning Test, a graduate level test comprising three sub-tests – verbal, numerical and abstract. This test was chosen as being representative of many mainstream reasoning tests. The Graduate Reasoning Test also gives an opportunity to test whether the representational format of a test is important. It could be the case that computer presentation of graphical material, such as is found in mechanical and spatial reasoning tests, might lead to performance differences while alpha-numeric material does not. With the GRT1 the abstract sub-test uses a graphical format while the verbal and numerical sub-scales use alpha-numeric formats.
THE STUDY
A group of 80 university undergraduates took part in this investigation. The students were divided into four groups. Each group completed the Graduate Reasoning Test twice, with an interval of two weeks between successive administrations. Two of the groups completed the same format of the test on each occasion – i.e. paper/paper or computer/computer. The other two groups experienced both formats of the test, each group in a different order – i.e. paper/computer or computer/paper.

RESULTS
An independent t-test was used to test whether any differences existed between the mean scores on first administration for those students who completed the paper version of the GRT1 and those receiving the GeneSys administered version. As can be seen from Table 12, a significant difference was found for both the Numerical and Abstract sub-scales, in that students who had received the computer administered version of the test tended to score significantly higher than those who had received the paper version first. Table 12 provides similar data for the two test formats on second administration. It can be seen from this table that no significant differences existed between the different formats for second administration.

CONCLUSION
The results of this study show that, as far as the Graduate Reasoning Test at least is concerned, the format in which the test is administered does affect, to some extent, the scores obtained. No differences between the group means were detected for any of the three sub-tests of the Graduate Reasoning Test. It was also the case that no difference was found between the graphical representation of the abstract sub-test and the alpha-numerical representation of the verbal and numerical sub-tests. Furthermore, no interaction effects were observed, which indicates that the well established phenomenon of 'practice effects' does not differ with the nature of the test medium.
These results provide some confidence that the changeover from paper and pencil to computer-based testing will not require the re-standardisation of tests. It would seem from this study that administration of ability tests by either computer or paper and pencil will produce similar performance levels. Thus it is perfectly acceptable to compare individuals who were tested using different formats.

GRT1   Mean PC   Mean Paper   t-value   df   p   N PC   N Paper   SD PC   SD Paper   F-ratio (variance)   p (variance)
VR1   19.09   18.48   .876   158   .382   80   80   4.26   4.58   1.15   .532
NR1   14.53   13.86   .927   158   .355   80   80   3.97   5.01   1.59   .0419
AR1   15.58   14.79   1.117   158   .266   80   80   3.81   5.03   1.74   .0146

Table 12: Differences between Computer vs. Conventional Methods of Test Administration
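The final two columns of Table 12 report a variance-ratio test alongside the t-test on the group means. A minimal, hypothetical sketch of this pair of checks is given below; the score arrays are invented for illustration and are not the study data.

```python
import numpy as np
from scipy import stats

# Hypothetical first-administration scores for the two format groups
pc_scores = np.array([15, 18, 14, 17, 16, 19, 15, 18], dtype=float)
paper_scores = np.array([12, 20, 10, 19, 13, 21, 11, 18], dtype=float)

# Independent-samples t-test on the group means (equal-variance form)
t_val, p_means = stats.ttest_ind(pc_scores, paper_scores)

# F-ratio comparing the two group variances, with a two-tailed p-value
f_ratio = paper_scores.var(ddof=1) / pc_scores.var(ddof=1)
df1, df2 = len(paper_scores) - 1, len(pc_scores) - 1
p_var = 2 * min(stats.f.sf(f_ratio, df1, df2), stats.f.cdf(f_ratio, df1, df2))

print(f"t = {t_val:.3f} (p = {p_means:.3f}); F = {f_ratio:.2f} (p = {p_var:.3f})")
```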
VALIDITY
Whereas reliability assesses the degree of measurement error of a reasoning test, that is to say the extent to which the test is consistently measuring one underlying ability or aptitude, validity addresses the question of whether or not the scale is measuring the characteristic it was developed to measure. This is clearly of key importance when using a reasoning test for assessment and selection purposes. In order for the test to be a useful aid to selection we need to know that the results are reliable and that the test is measuring the aptitude it is supposed to be measuring. Thus after we have examined a test's reliability we need to address the issue of validity. We traditionally examine the reliability of a test before we explore its validity, as reliability sets the upper bound of a scale's validity. That is to say, a test cannot be more valid than it is reliable.

GRT1 sub-scale VR1 NR1 AR1

Verbal (VR1) _ .42 .30

Numerical (NR1) _ .55

Abstract (AR1) _

Table 13: Product-moment Correlations between the GRT1 Sub-scales (n=499)

GRT2 sub-scale VR2 NR2 AR2

Verbal (VR2) _ .60 .56

Numerical (NR2) _ .65

Abstract (AR2) _

Table 14: Product-moment Correlations between the GRT2 Sub-scales (n = 1441)
THE STRUCTURE OF THE REASONING TESTS
Specifically, we are concerned that the test's sub-scales are correlated with each other in a meaningful way. For example, we would expect the different sub-scales of a reasoning test to be moderately correlated, as each will be measuring a different facet of general reasoning ability. Thus if such sub-scales are not correlated with each other we might wonder whether each is a good measure of reasoning ability. Moreover, we would expect different facets of verbal reasoning ability (e.g. vocabulary, similarities etc.) to be more highly correlated with each other than they are with a measure of numerical reasoning ability. Consequently, the first way in which we might assess the validity of a reasoning test is by exploring the relationship between the test's sub-scales.

THE GRADUATE REASONING TESTS (GRT1)
Table 13, which presents Pearson Product-moment correlations between the three sub-scales of the GRT1, demonstrates two things. Firstly, the relatively strong correlations between each of the sub-scales indicate that each is measuring one facet of an underlying ability. This is clearly consistent with the design of this test, where each sub-scale was intended to assess a different facet of reasoning ability or mental alertness. Secondly, the fact that each sub-scale accounts for less than 30% (r < .55) of the variance in the other sub-scales indicates that the Verbal, Numerical and Abstract Reasoning sub-scales of the GRT1 are measuring different facets of reasoning ability, as they were designed to. Moreover, this is what we would in fact predict from research in the area of intelligence testing (Heim, 1970).

THE GENERAL REASONING TESTS (GRT2)
Table 14, which presents Pearson Product-moment correlations between the three sub-scales of the GRT2, demonstrates two things. Firstly, the relatively strong correlations between each of the sub-scales indicate that each is measuring one underlying characteristic, which in this case we might assume to be reasoning ability or mental alertness. Thus these relatively strong correlations between the sub-scales are consistent with our intention to construct a test which measures general reasoning ability. Secondly, the fact that each sub-scale accounts for less than 45% (r < .65) of the variance in the other sub-scales indicates that the Verbal, Numerical and Abstract Reasoning sub-scales of the GRT2 are still measuring distinct aspects of reasoning ability.

THE CRITICAL REASONING TESTS (CRTB)
Table 15, which presents the Pearson Product-moment correlation between the two sub-tests of the CRTB, demonstrates that while the Verbal and Numerical sub-tests are marginally correlated, they are nevertheless measuring quite distinct abilities, sharing only around 12% of common variance.

CRTB   Verbal   Numerical
Verbal   1   .352
Numerical   .352   1

Table 15: Product-moment Correlations between CRTB Verbal & Numerical (n=352)
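The shared-variance figures quoted in this section follow from the standard rule that the proportion of variance two measures have in common is the square of their correlation:

$$\text{shared variance} = r^{2}, \qquad r=.55 \Rightarrow r^{2}\approx .30, \qquad r=.65 \Rightarrow r^{2}\approx .42, \qquad r=.352 \Rightarrow r^{2}\approx .12 .$$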
THE CONSTRUCT VALIDITY OF THE REASONING TESTS
As an evaluation of construct validity, the Psytech Reasoning Tests were administered with other widely used measures of similar constructs.
In the case of the General and Graduate reasoning tests, the AH series was considered to be a suitable external measure. The AH Series is one of the most widely respected ranges of reasoning tests which have been developed on a U.K. population. Within this series there are tests which have been specifically designed for use on both the general population (AH2/AH3/AH4) and the graduate population (AH5/AH6). Developed by Alice Heim (1968, 1974) of Cambridge University, the AH series has become something of a benchmark against which to compare the performance of other reasoning tests.
As the original Critical Thinking Appraisal, the Watson-Glaser (W-GCTA; Watson & Glaser, 1991) has set the standard in the assessment of abilities that are of relevance in management decision-making.
Thus we chose to explore the construct validity of the GRT1 and the GRT2 by comparing their performance against that of the AH5 and AH3 respectively, while the construct validity of the CRTB is examined by comparing its performance against both the AH5 and the Watson-Glaser Critical Thinking Appraisal.

THE RELATIONSHIP BETWEEN THE GRT1 AND AH5
Table 16 presents product-moment correlations between the sub-scales and the total scale scores of the AH5 and the GRT1. The AH5, which has been developed for use on a graduate population, has a Verbal/Numerical sub-scale which combines verbal and numerical items, and an Abstract, or Diagrammatic, reasoning sub-scale. These, along with the total scale score on the AH5 (the sum of the sub-scales), were correlated with the GRT1 sub-scale scores and the total scale score. The correlations with the total scale scores were included within this table, even though the total scale scores are simple composites of the sub-scale scores, as they provide a measure of general (g), rather than specific (e.g. verbal, numerical etc.), mental aptitudes.
Table 16 provides clear support for the concurrent validity of the GRT1 against the AH5. The correlations between each of the GRT1 sub-scales and their comparable AH5 sub-scales are high, indicating that they are measuring similar constructs. In addition, this table provides some evidence in support of the discriminant validity of these sub-scales. That is to say, each of the GRT1 sub-scales is more highly correlated with its comparable sub-scale on the AH5 than it is with the AH5 sub-scale measuring a different specific mental ability. For example, while the VR1 has a correlation of .69 with the Verbal/Numerical sub-scale of the AH5, its correlation with the Abstract sub-scale is only .35. In addition, the extremely high correlation (r=.84) between the total scale scores of the AH5 and the GRT1 indicates that, as a whole, this battery of tests is a good measure of general reasoning ability (g).

SUB-SCALE   VR1   NR1   AR1   TOTAL
AH5 Verbal/Numerical   .69   .70   .35   .70
AH5 Abstract   .51   .67   .72   .74
AH5 Total   .69   .79   .65   .84

Table 16: Product-moment correlations between the GRT1 and AH5 Sub-scales

THE RELATIONSHIP BETWEEN THE GRT2 AND AH3
Table 17 presents Pearson Product-moment correlations (n=81) between each of the GRT2 sub-scales and each of the sub-scale scores of the AH3, together with the total scale scores. In addition to the high correlations between these sub-scales, it is also worth noting the extremely high correlation (r=.82) between the total scale scores on these two tests. These results clearly demonstrate that the GRT2 is measuring the trait of general reasoning ability which is assessed by the AH3. We should however note that the correlations between the sub-scales provide no clear support for the discriminant validity of the GRT2. That is to say, the correlations between each of the GRT2 sub-scales and their respective AH3 sub-scales (e.g. the VR2 with the AH3 Verbal) are not significantly higher than the correlations across sub-scales (e.g. the VR2 with the AH3 Numerical).

SUB-SCALE   VR2   NR2   AR2   GRT2 Total
AH3 Verbal   .63   .63   .61   .73
AH3 Numerical   .58   .76   .76   .70
AH3 Perceptual   .54   .55   .56   .76
AH3 Total   .70   .78   .78   .82

Table 17: Product-moment Correlations between the GRT2 and AH3 Sub-scales
RELATIONSHIP WITH GRT2 AND OTHER MEASURES
As part of a number of data-collection exercises, the GRT2 was administered alongside a number of alternative measures, namely the Technical Test Battery (TTB2) and the Clerical Test Battery (CTB2).

Technical Test Battery (TTB2)
A sample of 94 trainee mechanical apprentices completed both the GRT2 and the Technical Test Battery as part of a validation exercise. The GRT2 sub-scales register modest correlations with the component measures of the Technical Test Battery, although this is no more than would be expected from different aptitude measures, with none exceeding .50 (see Table 18).

Clerical Test Battery (CTB2)
A sample of 54 clerical staff working for a major bank completed the Verbal Reasoning Test (VR2) as part of an assessment of clerical aptitudes which included components of the Clerical Test Battery (CTB2). The strongest observed correlation was not with the Spelling measure (SP2), as expected, but with Office Arithmetic (NA2). Examination of NA2 does reveal a fairly high verbal problem-solving element, which may explain this. The Clerical Checking Test (CC2) only registered a modest correlation, which is as expected of a measure which relies only to a limited extent on general ability (see Table 19).

THE RELATIONSHIP BETWEEN THE CRTB AND AH5
In Tables 20 and 21 we present two sets of data supporting the concurrent validity of the VCR1 and NCR1. The first data set was collected using trial versions of these two tests, which contained each of the stem passages/tables which appear in the final tests along with approximately 80% of the final items. This data was originally collected as part of the test construction process in order to check that the trial items we had constructed were measuring reasoning ability, and not some other construct (e.g. reading ability, numerical checking etc.). This is particularly important when constructing critical reasoning tests, as it is easy to construct items which are better measures of checking ability than they are of reasoning ability; that is to say, items which simply require carefully scanning and memorising the text in order to successfully complete them, rather than having to correctly draw logical inferences from the text. As can be seen from the table, our trial items clearly appear to be measuring reasoning ability, as scores on both the VCR1 and NCR1 are strongly correlated with the AH5.
Table 21 presents the correlations between the Verbal/Numerical sub-scale of the AH5 and the final versions of the two critical reasoning tests. This data therefore provides evidence in support of the fact that the final versions of these two tests
measure reasoning ability rather than some other construct (i.e. verbal or numerical checking ability). As was noted above, when developing critical reasoning tests it is particularly important to demonstrate that the tests are measuring reasoning ability, and not checking ability.
The above data clearly demonstrates, as does the previous data set, that both the VCR1 and NCR1 are measuring reasoning ability. The size of these correlations indicates that the Verbal/Numerical sub-scale of the AH5 and the two critical reasoning tests share no more than 35% of common variance, clearly demonstrating that these tests are measuring different, but related, constructs. This is what we would predict, given that the VCR1 and NCR1 were developed to measure critical reasoning, rather than be 'pure' measures of mental ability, or intelligence. Given the nature of critical reasoning, we would expect a candidate's scores on these tests not only to reflect general reasoning ability or intelligence (g), but also to reflect verbal and numerical comprehension, reading ability, reading speed, and numerical ability and precision.

Technical Test Battery   VR2   NR2   AR2
MRT2   .45   .45   .38
SRT2   .35   .47   .46
VAC2   .34   .40   .40

Table 18: Pearson Correlations of GRT2 with Technical Test Battery

CTB2 Sub-scales   VR2
Office Arithmetic (NA2)   .51
Clerical Checking (CC2)   .37
Spelling (SP2)   .34

Table 19: Pearson Correlations of GRT2 with Clerical Test Battery

CRTB   AH5
VCR1   .51
NCR1   .52

Table 20: Product-moment Correlations between the Experimental Versions of the VCR1 & NCR1 and the Verbal/Numerical Sub-scale of the AH5

CRTB   AH5
VCR1   .60
NCR1   .51

Table 21: Product-moment Correlations between VCR1 & NCR1 and AH5 Verbal/Numerical
RELATIONSHIP BETWEEN THE CRTB AND WATSON-GLASER CRITICAL THINKING APPRAISAL
Table 22 details the relationship between the CRTB and the Watson-Glaser Critical Thinking Appraisal. The relationship between the total score on the CRTB and the W-GCTA is moderate, although this may be due to the W-GCTA's absence of numerical content. However, unexpectedly, the CRTB Verbal sub-scale does not appear to have a higher correlation with the W-GCTA than the Numerical. In summary, while the CRTB does not appear to be measuring exactly the same construct as the W-GCTA, the domains do overlap to such an extent that this provides some evidence that the CRTB is a measure of critical thinking.

THE RELATIONSHIP BETWEEN THE CRTB AND THE MULTI-DIMENSIONAL APTITUDE BATTERY
The CRTB was correlated with Jackson's MAB to assess the relative positioning of the CRTB within a broad range of abilities. Table 23 details the relationship between each MAB sub-scale and the Verbal and Numerical components of the CRTB, as well as the CRTB total score. The CRTB total correlates .60 with the MAB total and is roughly equally related to both the MAB Verbal and Performance scales (.57 and .52). More specifically, at the sub-scale level the CRTB total relates significantly to three of each of the Verbal and Performance sub-scales. It is most strongly related to the Information, Arithmetic and Spatial sub-scales (.60, .58, .58). Thus it would appear that the CRTB total measures a composite of both Verbal and Performance abilities.
When the CRTB is divided into its sub-scales, the Verbal does not appear to be strongly related to the MAB Total, but is related to the MAB verbal scales, in particular Information and Vocabulary. As expected, the Numerical sub-scale relates more closely to the MAB Performance scale (.48) and also correlates significantly with MAB Arithmetic and Spatial, as it does with the Verbal; this can be expected given the higher verbal content in VCR1 than in more classic measures of verbal ability.
WGCTA

Verbal .38

Numerical .38

CRTB Total .57

Table 22: Product-moment Correlations between VCR1 & NCR1 and WGCTA

Verbal Numerical Total

MAB Total .24 .48 .60

MAB Performance .04 .48 .52

MAB Verbal .43 .44 .57

Information .29 .32 .60

Comprehension .25 .44 .52

Arithmetic .24 .45 .58

Similarities .22 .33 .39

Vocabulary .32 .27 .40

Digit Symbol .09 .37 .39

Picture Completion .14 .38 .44

Spatial .19 .50 .58

Picture Arrangement .15 .34 .42

Object Assembly .09 .23 .44

Table 23: Product-moment Correlations between VCR1 & NCR1 and MAB (n=154)
CRITERION-RELATED VALIDITY
In this section we provide details of a number of studies in which the reasoning tests have been used as part of a pilot study on a sample of job incumbents on whom performance data was available.

INSURANCE SALES
A sample of 86 telephone sales personnel with a leading motor insurance group completed the GRT2 as part of a validation study. The results of the GRT2 were correlated with a number of performance criteria, as detailed in Table 24.
The pattern of correlations suggests that while there is a relationship between GRT2 and performance measures, this is not always in the anticipated direction. In fact some notable negative correlations were observed, indicating that for some performance criteria higher reasoning ability may be less desirable. However, the strongest correlations were found between the GRT2 sub-scales and 'Ins_Nce', and between the VR2 and overall sales. The consistent negative correlations with 'Sales PH' would require further examination.

BANKING
A sample of 118 retail bankers completed the GRT2 and a personality measure, the OPP, as part of a concurrent validation exercise. Participants were rated on a range of competencies which focused primarily on personal qualities as opposed to abilities. As expected, GRT2 failed to relate strongly to the overall competency rating, which was based on a composite of ratings covering such diverse areas as Orderliness, Planning, Organising, Teamwork etc. GRT2 did correlate with those ratings which were associated with skill areas, namely numerical and software related work (see Table 25).

SERVICE ENGINEERS
A leading international crane and heavy lifting equipment servicing company tested a sample of 46 service engineers on the GRT2 battery and the OPP. Their overall performance was rated by supervisors on a behaviourally anchored five point scale. From Table 26, the Verbal (VR2) is found to be strongly related to rated performance (r=.46), with the Abstract (AR2) relating moderately to the same.

PRINTERS
A major local newspaper group with the largest number of local titles in the United Kingdom sought to examine whether tests could predict the job performance of experienced printers. A sample of 70 completed the GRT2 battery as well as a number of other measures, including the OPP (Occupational Personality Profile) and TTB (Technical Test Battery). Each of the group was assessed on a number of performance criteria by supervisors. In addition, test data were correlated with the results of a job sample print test which was administered at selection stage. Table 27 details the results of this study.
Some noteworthy correlations were registered between GRT2 and the performance measures. Firstly, overall performance is most highly correlated with the Abstract (AR2), but also moderately with the Verbal and Numerical sub-scales. The job
sample criterion measure generally Verbal Numerical Abstract
registers higher correlations with each
of the GRT2 sub-scales, reaching .41 Sales PH -.26 -.27 -.27
with the Abstract (AR2). Perhaps more
surprising, the Abstract correlates .56 Conversions% -.20 -.21 -.29
with a supervisor’s rating of Initiative,
consistent with the behavioural Ins_Nce .32 .25 .33
descriptions used for this performance
rating, which point to not simply Sales .27 .14 .13
taking initiative but being able to
successfully resolve problem situations. Table 24: Pearson Product Moment Correlations with
The single totally objectively GRT2 and job performance in an Insurance Setting
derived performance measure was
time-keeping, which was based on an Sales PH Sales per hour
electronic time and attendance record- Conversions % Percent of Conversions from other policies
ing system. Only one significant corre- Ins_NCE Composite Training Outcome Measure
lation was observed with Abstract Sales Total Value of Policies sold
(AR2) although this was negative,
suggesting that those with higher
ability tended to have a poorer time GRT2 PERF_ARI PERF_SOF Competency
and attendance record.
VR2 .14 .03 -.02

NR2 .29 .32 .12

AR2 .31 .28 .01

Table 25: Pearson Product Moment Correlations with


GRT2 and Performance criteria in Banking (N=118)

GRT2 Overall Criterion Verbal Numerical Abstract

Measure Performance Overall Performance .26 .28 .36

Verbal .46 Performance Job Sample .33 .30 .41

Numerical -.24 Initiative .40 .44 .56

Abstract .28 Time Keeping _ _ .32

Table 26: Pearson Product Table 27: Correlations Between GRT2 & Printer Performance
Moment correlations between Criteria (N=70)
GRT2 & Service Engineer
Performance
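Each of the studies reported above rests on Pearson product-moment correlations between test scores and criterion measures. For readers who wish to carry out the same calculation on their own local validation data, the short sketch below illustrates one way of doing so in Python; the variable names and data are hypothetical and the use of the scipy library is simply a convenient option, not part of the published studies.

    # Illustrative only: hypothetical scores and ratings, not data from the studies above.
    from scipy.stats import pearsonr

    vr2_scores = [18, 22, 25, 15, 30, 27, 20, 24]   # hypothetical GRT2 Verbal (VR2) raw scores
    ratings    = [2, 3, 4, 2, 5, 4, 3, 4]           # hypothetical supervisor ratings (5-point scale)

    r, p = pearsonr(vr2_scores, ratings)            # Pearson product-moment correlation
    print(f"r = {r:.2f}  (two-tailed p = {p:.3f})")

The same calculation applies to every sub-scale and criterion pairing reported in Tables 24 to 27; only the input columns change.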
FINANCIAL SALES CONSULTANTS
A sample of 100 trainee Financial Consultants from a major financial services group completed the GRT2 as part of a validation study. Table 28 details the correlations between the GRT2 and end-of-year examinations. The results indicate that the Numerical (NR2) appears to be the best predictor of examination performance, with correlations of up to .46, although the Abstract (AR2) also registers some notable correlations (up to .44). Only the Verbal (VR2) fails to relate to any of the examination results, which is perhaps somewhat unexpected.

Criterion                               VR2    NR2    AR2
Protection Clusters Result              .10    .31    .35
Pension Exam Results                    .04    .40    .32
Seller Induction Exam Results           .18    .26    .32
Aggregate Result                        .11    .46    .44
Financial Planning Certificate Result   .13    .44    .42

Table 28: Correlations between GRT2 & Proficiency Criteria

APPLICANTS FOR CAR COMPONENT TRAINING COURSE
A large training company used the General Reasoning Test to investigate the ability profiles of successful and non-successful applicants for training on a car components assembly task. The criterion of success used was only in part determined by training outcomes; other factors, such as perceived attitude and temperament, also played a part. A sample of 150 applicants was used for the study.
Both the Verbal (VR2) and the Abstract (AR2) register moderate correlations with success on the programme. The Numerical (NR2) fails to relate to this criterion (see Table 29).

GRT2 Sub-scale   Success
Verbal              .27
Numerical           .16
Abstract            .30

Table 29: Correlations between GRT2 & Successful Applicants for the Component Course
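Whether a correlation of a given size should be regarded as statistically significant depends on the sample size. A conventional check uses the t-transformation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below applies this to an illustrative value of r = .30 with n = 150 (the sample size of the component-course study); it is offered as a worked example rather than a reanalysis of the manual's data.

    # Illustrative significance check for a Pearson r using the t-transformation.
    from math import sqrt
    from scipy.stats import t as t_dist

    r, n = 0.30, 150                                  # example values; n matches the study sample size
    t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)       # t = r * sqrt(n - 2) / sqrt(1 - r^2)
    p_two_tailed = 2 * t_dist.sf(abs(t_stat), df=n - 2)
    print(f"t({n - 2}) = {t_stat:.2f}, two-tailed p = {p_two_tailed:.4f}")

Smaller samples require larger coefficients to reach the same significance level, which is worth bearing in mind when comparing studies of different sizes (for example, N=46 for the service engineers versus N=150 here).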
5
ADMINISTRATION
INSTRUCTIONS
● CRTB Administration Instructions

● GRT1 Administration Instructions

● GRT2 Administration Instructions


CRTB ADMINISTRATION
INSTRUCTIONS
BEFORE STARTING THE QUESTIONNAIRE

Put candidates at their ease by giving information about yourself, the


purpose of the questionnaire, the timetable for the day, if this is part of a
wider assessment programme, and how the results will be used and who will
have access to them. Ensure that you and other administrators have switched
off mobile phones etc.

The instructions below should be read out verbatim and the same script
should be followed each time the CRTB is administered to one or more
candidates. Instructions for the administrator are printed in ordinary type.
Instructions designed to be read aloud to candidates are in a larger typeface
with speech marks.

If this is the first or only questionnaire being administered give an introduc-


tion as per or similar to the following example (prepare an amendment if not
administering all tests in this battery):

‘From now on, please do not talk among yourselves, but


ask me if anything is not clear. Please ensure that any
mobile telephones, pagers or other potential distractions are
switched off completely. We shall be doing two tests: the
Verbal Critical Reasoning Test, which takes 20 minutes and
the Numerical Critical Reasoning Test, which takes 30
minutes. During the test I shall be checking to make sure
that you are not making any accidental mistakes when
filling in the answer sheet. I will not be checking your
responses to see if you are answering correctly or not.’

WARNING: It is most important that answer sheets do not go astray. They


should be counted out at the beginning of the test and counted in again at
the end.

Continue by using the instructions EXACTLY as given. Say:

DISTRIBUTE THE ANSWER SHEETS

Then ask:

‘Has everyone got two sharp pencils, an eraser, some rough


paper and an answer sheet? Please note that the answer
boxes are in columns (indicate) and remember, do not write
on the booklets.’

Rectify any omissions, then say:


‘Print your surname, first name and title clearly on the line
provided, followed by your age and sex. Please insert
today’s date which is [ ] on the “Comments” line’

Walk around the room to check that the instructions are being followed.

WARNING: It is vitally important that test booklets do not go astray. They


should be counted out at the beginning of the session and counted in again at
the end.

DISTRIBUTE THE BOOKLETS WITH THE INSTRUCTION:

‘Please do not open the booklet until instructed.’

Remembering to read slowly and clearly, go to the front of the group and say:

‘Please open the booklet at Page 2 and follow the instruc-


tions for this test as I read them aloud.’

Pause to allow booklets to be opened.

‘In this test you have to draw inferences from short


passages of text. You will be presented with a passage
(there are 6 pieces of text in all) followed by a number of
statements. Your task is to decide whether or not each
statement can be inferred from the passage.

For each statement you will be asked to decide which of the


five response categories on Page 3 correctly describes
whether the statement can be inferred from the passage.

Mark your answer by filling in the appropriate box on your


answer sheet that corresponds to your choice. On Page 3 of
this question booklet there are some example questions.
Once you have fully read the instructions you will have a
chance to complete the example questions in order to make
sure that you understand the test.

Please read the instructions and categories.’

Pause while the candidates read the instructions, then say:

‘Please attempt the example questions now.’

While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
no-one is looking at the actual test items during the example session. When
all have finished (allow a maximum of two and a half minutes) give the
answers as follows:
‘The correct response to Example 1 is Probably False.
It is not explicitly stated within the text that voters are
concerned about the economy.

The correct response to Example 2 is Definitely True


It is explicitly stated that economic policies are often deter-
mined by short term electoral demands.

The correct response to Example 3 is Definitely False


It explicitly states within the text that election results can
be predicted from economic indicators.’

Check for understanding, then say:

‘Time is short so when you begin the timed test work as


quickly and as accurately as you can.

If you want to change an answer, fully erase your first


choice, and fill in your new choice of answer.

There are 6 passages of text and a total of 36 questions.


You have 20 minutes in which to answer the questions.

If you reach the “End of Test” before time is called you


may review your answers if you wish.

If you have any questions please ask now, as I will not be


able to answer questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions appropriately, then, starting stop-watch or setting a


count-down timer on the word BEGIN say:

‘Please turn over the page and begin.’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 20 minutes, say:

‘Please now turn to Page 10 which is a blank page.’

NB: If this is the final test to be used in this battery, instead of the above
line, please turn to the instructions in the Ending Test Session box on page
45 of this document
You should intervene if candidates continue after this point.

Then say:

‘We are now ready to start the next test. Has everybody still
got two sharpened pencils, an eraser and some unused
rough paper?’

If not, rectify, then say:

‘The next test follows on the same answer sheet, please


locate the section now.’

Check for understanding, then, remembering to read slowly and clearly, go to


the front of the group and say:

‘Please turn to Page 12 of the booklet and follow the


instructions for this test as I read them aloud.’ (Pause to
allow page to be found).

‘In this test you have to draw inferences from numerical


information which is presented in tabular form.

You will be presented with a numerical table and asked a


number of questions about this information. You will then
have to select the correct answer to each question from one
of six possible choices. One and only one answer is correct
in each case.

Mark your answer by filling in the appropriate box on your


answer sheet that corresponds to your choice.

You now have a chance to complete the example questions


on Page 13 in order to make sure that you understand the
test.

Please attempt the example questions now, using the rough


paper if necessary.’
While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
no-one is looking at the actual test items during the example session. When
all have finished (allow a maximum of three minutes) give the answers as
follows:

‘Please follow the answers and explanations carefully.

The correct answer to Example 1 is Design (answer no. 5).


Amongst women, design was consistently chosen by the
lowest percentage as the most important feature of a car.

The correct answer to Example 2 is Economy (answer no.


2).
Of all the features of a car, economy is the only feature for
which the ratings increase according to age category.

The correct answer to Example 3 is 10.4 (answer no.5).


Of males below 30, 5% identified safety and 52% identified
performance as the most important feature of a car. 52 divided
by 5 is 10.4.

Please do not turn over the page yet’

Then say:

‘Time is short so when you begin the timed test work as


quickly and as accurately as you can.

If you want to change an answer, fully erase your first


choice, and fill in your new choice of answer.

There are 5 tables of information and a total of 25 ques-


tions. You have 30 minutes in which to answer the ques-
tions.

If you reach the “End of Test” before time is called you


may review your answers if you wish, but do not go back to
the Verbal Reasoning Test.

If you have any questions please ask now, as you will not be
able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’


Deal with any questions appropriately, then, starting stop-watch or setting a
count-down timer on the word BEGIN say:

‘Please turn over the page and begin.’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 30 minutes:

ENDING THE TEST SESSION

Say:

‘Stop now please and close your booklet’

You should intervene if candidates continue after this point.

COLLECT THE ANSWER SHEETS AND THE TEST BOOKLETS,


ENSURING THAT ALL MATERIALS ARE RETURNED (COUNT
BOOKLETS AND ANSWER SHEETS)

Then say:

‘Thank you for completing the Critical Reasoning Test


Battery.’
GRT1 ADMINISTRATION
INSTRUCTIONS
BEFORE STARTING THE QUESTIONNAIRE

Put candidates at their ease by giving information about yourself, the


purpose of the questionnaire, the timetable for the day, if this is part of a
wider assessment programme, and how the results will be used and who will
have access to them. Ensure that you and other administrators have switched
off mobile phones etc.

The instructions below should be read out verbatim and the same script
should be followed each time the GRT1 is administered to one or more
candidates. Instructions for the administrator are printed in ordinary type.
Instructions designed to be read aloud to candidates are in a larger typeface
with speech marks.

If this is the first or only questionnaire being administered, give an introduc-


tion as per or similar to the following example (prepare an amendment if not
administering all tests in this battery):

‘From now on, please do not talk among yourselves, but


ask me if anything is not clear. Please ensure that any
mobile telephones, pagers or other potential distractions are
switched off completely. We shall be doing three tests:
verbal, numerical and abstract reasoning. The tests take 8,
10 and 10 minutes respectively to complete. During the test
I shall be checking to make sure that you are not making
any accidental mistakes when filling in the answer sheet. I
will not be checking your responses to see if you are
answering correctly or not.’

WARNING: It is most important that answer sheets do not go astray. They


should be counted out at the beginning of the test and counted in again at
the end.

Continue by using the instructions EXACTLY as given. Say:

DISTRIBUTE THE ANSWER SHEETS

Then ask:

‘Has everyone got two sharp pencils, an eraser, some rough


paper and an answer sheet?’

Rectify any omissions, then say:

‘Print your surname, first name and title clearly on the line
provided and indicate your title, sex and age by ticking the
appropriate boxes. Please insert today’s date which is [ ]’
Walk around the room to check that the instructions are being followed.

WARNING: It is vitally important that test booklets do not go astray. They


should be counted out at the beginning of the session and counted in again at
the end.

DISTRIBUTE THE BOOKLETS WITH THE INSTRUCTION:

‘Please do not open the booklet until instructed.’

Remembering to read slowly and clearly, go to the front of the group and say:

‘Please open the booklet at Page 2 and follow the instruc-


tions for this test as I read them aloud.’

Pause to allow booklets to be opened.

‘This test is designed to assess your understanding of words


and relationships between words. Each question has six
possible answers. One and only one is correct in each case.
Mark your answer by filling in the number box on your
answer sheet that corresponds to your choice. You now
have a chance to complete the four example questions on
Page 3 in order to make sure that you understand the test.

Please attempt the example questions now, marking your


answers in boxes E1 to E4.’

Indicate section

While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
nobody is looking at the actual test items during the example session. When
all have finished (allow a maximum of two minutes) give the answers as
follows:

‘The answer to Example 1 is number 2, sick means the


same as ill.

The answer to Example 2 is number 3, you drive a car and


fly an aeroplane.

The answer to Example 3 is number 5, wood is the odd one


out.

The answer to Example 4 is number 4, as both heavy and


light have a relationship to weight.

Is everyone clear about the examples?’


Then say:

‘REMEMBER:

Time is short, so when you begin the timed test, work as


quickly and as accurately as you can.

If you want to change an answer, simply erase your first


choice and fill in your new answer.

There are a total of 30 questions and you have 8 minutes in


which to answer them.

If you reach the end before time is called you may review
your answers if you wish.

If you have any questions please ask now, as you will not be
able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions appropriately, then, starting stop-watch or setting a


count-down timer on the word BEGIN say:

‘Please turn over the page and begin.’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 8 minutes, say:

‘Stop now please and turn to Page 12.’

NB: If this is the final test to be used in this battery, instead of the above line,
please turn to the instructions in the Ending Test Session box on page 52 of
this document. If you are skipping a test, please find the appropriate test
bookmark for your next test in the margin of the page and replace the above
line as necessary.

You should intervene if candidates continue after this point.

Then say:

‘We are now ready to start the next test. Has everyone still
got two sharpened pencils, an eraser, some unused rough
paper?’
If not, rectify, then say:
‘The next test follows on the same answer sheet, please
locate the section now.’

Indicate section

Check for understanding, then remembering to read slowly and clearly, go to


the front of the group and say:

‘Please ensure that you are on Page 12 of the booklet and


follow the instructions for this test as I read them aloud.’

Pause to allow page to be found.

‘This test is designed to assess your ability to work with


numbers. Each question has six possible answers. One and
only one is correct in each case. Mark your answer by
filling in the appropriate box that corresponds to your
choice on the answer sheet.

You now have a chance to complete the four example questions on Page 13
in order to make sure that you understand the test. Please attempt the
example questions now, marking your answers in the example boxes.’

Indicate section

While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
nobody is looking at the actual test items during the example session. When
all have finished (allow a maximum of two minutes) give the answers as
follows:

‘The answer to Example 1 is number 5, the sequence goes


up in twos.

The answer to Example 2 is number 4, as all other frac-


tions can be reduced further.

The answer to Example 3 is number 2, 100 is 10 times 10.

The answer to Example 4 is number 5, the journey will


take 1 hour and 30 minutes.

Is everyone clear about the examples?’


Then say:

‘Time is short, so when you begin the timed test work as


quickly and as accurately as you can.

If you want to change an answer, fully erase your first


choice and fill in your new choice of answer.

There are a total of 25 questions and you have 10 minutes


in which to attempt them.

If you reach the end before time is called you may review
your answers to the numerical test if you wish, but do not
go back to the verbal test.

If you have any questions please ask now, as you will not be
able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions, appropriately, then, starting stop-watch or setting a


count-down timer on the word BEGIN say:

‘Please turn over the page and begin’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 10 minutes, say:

‘Stop now please and turn to Page 20’

NB: If this is the final test to be used in this battery, instead of the above line,
please turn to the instructions in the Ending Test Session box on page 52 of
this document. If you are skipping a test, please find the appropriate test
bookmark for your next test in the margin of the page and replace the above
line as necessary.

You should intervene if candidates continue after this point.


Then say:

‘We are now ready to start the next test. Has everyone still
got two sharpened pencils, an eraser, some unused rough
paper?’
If not, rectify, then say:
‘The next test follows on the same answer sheet, please
locate the section now.’

Indicate section

Check for understanding, then remembering to read slowly and clearly, go to


the front of the group and say:

‘Please ensure that you are on Page 20 of the booklet and


follow the instructions for this test as I read them aloud.’

Pause to allow page to be found.

‘In this test you will have to work out the relationship
between abstract shapes and patterns.

Each question has six possible answers. One and only one is
correct in each case. Mark your answer by filling in the
appropriate box that corresponds to your chosen answer on
your answer sheet. You now have a chance to complete the
three example questions on Page 21 in order to make sure
that you understand the test.

Please attempt the example questions now, marking your


answers in the example boxes.’

Indicate section

While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
nobody is looking at the actual test items during the example session. When
all have finished, (allow a maximum of two minutes) give the answers as
follows:

‘The answer to Example 1 is number 5, as the series alter-


nates between 2 and 4 squares as does the direction of the
two squares which return to their original position.

The answer to Example 2 is number 4, as all of the other


options have an open side to one of the boxes.

The answer to Example 3 is number 6, as this is a mirror


image of the pattern.

Is everyone clear about the examples?’


Then say:

‘Time is short, so when you begin the timed test, work as


quickly and as accurately as you can.

If you want to change an answer, fully erase your first


choice, and fill in your new choice of answer.

There are a total of 25 questions and you have 10 minutes


in which to attempt them.

If you reach the end before time is called, you may review
your answers to the abstract test, but do not go back to the
previous tests.

If you have any questions please ask now, as you will not be
able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions appropriately, then, starting stop-watch or setting a


count-down timer on the word BEGIN say:

‘Please turn over the page and begin.’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 10 minutes:

ENDING THE TEST SESSION

Say:

‘Stop now please and close your booklet’

You should intervene if candidates continue after this point.

COLLECT THE ANSWER SHEETS AND THE TEST BOOKLETS,


ENSURING THAT ALL MATERIALS ARE RETURNED (COUNT
BOOKLETS AND ANSWER SHEETS)

Then say:

‘Thank you for completing the Graduate Reasoning Test.’


GRT2 ADMINISTRATION
INSTRUCTIONS
BEFORE STARTING THE QUESTIONNAIRE

Put candidates at their ease by giving information about yourself, the


purpose of the questionnaire, the timetable for the day, if this is part of a
wider assessment programme, and how the results will be used and who will
have access to them. Ensure that you and other administrators have switched
off mobile phones etc.

The instructions below should be read out verbatim and the same script
should be followed each time the GRT2 is administered to one or more
candidates. Instructions for the administrator are printed in ordinary type.
Instructions designed to be read aloud to candidates are in a larger typeface
with speech marks.

If this is the first or only questionnaire being administered, give an introduc-


tion as per or similar to the following example (prepare an amendment if not
administering all tests in this battery):

‘From now on, please do not talk among yourselves, but


ask me if anything is not clear. Please ensure that any
mobile telephones, pagers or other potential distractions are
switched off completely. We shall be doing three tests:
verbal, numerical and abstract reasoning. The tests take 8,
10 and 10 minutes respectively to complete. During the test
I shall be checking to make sure that you are not making
any accidental mistakes when filling in the answer sheet. I
will not be checking your responses to see if you are
answering correctly or not.’

WARNING: It is most important that answer sheets do not go astray. They


should be counted out at the beginning of the test and counted in again at
the end.

Continue by using the instructions EXACTLY as given. Say:

DISTRIBUTE THE ANSWER SHEETS

Then ask:

‘Has everyone got two sharp pencils, an eraser, some rough


paper and an answer sheet?’

Rectify any omissions, then say:

‘Print your surname, first name and title clearly on the line
provided and indicate your title, sex and age by ticking the
appropriate boxes. Please insert today’s date which is [ ].’
Walk around the room to check that the instructions are being followed.

WARNING: It is vitally important that test booklets do not go astray. They


should be counted out at the beginning of the session and counted in again at
the end.

DISTRIBUTE THE BOOKLETS WITH THE INSTRUCTION:

‘Please do not open the booklet until instructed.’

Remembering to read slowly and clearly, go to the front of the group and say:

‘Please open the booklet at Page 2 and follow the instruc-


tions for this test as I read them aloud.’

Pause to allow booklets to be opened.

‘This test is designed to assess your understanding of words


and relationships between words. Each question has six
possible answers. One and only one is correct in each case.
Mark your answer by filling in the number box on your
answer sheet that corresponds to your choice. You now
have a chance to complete the four example questions on
Page 3 in order to make sure that you understand the test.

Please attempt the example questions now, marking your


answers in boxes E1 to E4.’

Indicate section

While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
nobody is looking at the actual test items during the example session. When
all have finished (allow a maximum of two minutes) give the answers as
follows:

‘The answer to Example 1 is number 2, sick means the


same as ill.

The answer to Example 2 is number 3, you drive a car and


fly an aeroplane.

The answer to Example 3 is number 5, wood is the odd one


out.

The answer to Example 4 is number 5, as dark means the


opposite of light.

Is everyone clear about the examples?’


Then say:
‘REMEMBER:

Time is short, so when you begin the timed test, work as


quickly and as accurately as you can.

If you want to change an answer, simply erase your first


choice and fill in your new answer.

There are a total of 35 questions and you have 8 minutes in


which to answer them.

If you reach the end before time is called you may review
your answers if you wish.

If you have any questions please ask now, as you will not be
able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions appropriately, then, starting stop-watch or setting a


count-down timer on the word BEGIN say:

‘Please turn over the page and begin.’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 8 minutes, say:

‘Stop now please and turn to Page 12.’

NB: If this is the final test to be used in this battery, instead of the above line,
please turn to the instructions in the Ending Test Session box on page 59 of
this document. If you are skipping a test, please find the appropriate test
bookmark for your next test in the margin of the page and replace the above
line as necessary.

You should intervene if candidates continue after this point.

Then say:

‘We are now ready to start the next test. Has everyone still
got two sharpened pencils, an eraser, some unused rough
paper?’
If not, rectify, then say:

‘The next test follows on the same answer sheet, please


locate the section now.’

Indicate section

Check for understanding, then remembering to read slowly and clearly, go to


the front of the group and say:

‘Please ensure that you are on Page 12 of the booklet and


follow the instructions for this test as I read them aloud.’

Pause to allow page to be found.

‘This test is designed to assess your ability to understand


numbers and the relationship between numbers. Each
question has six possible answers. One and only one is
correct in each case. Mark your answer by filling in the
appropriate box that corresponds to your chosen answer on
the answer sheet.

You now have a chance to complete the four example questions on Page 13
in order to make sure that you understand the test. Please attempt the
example questions now, marking your answers in the example boxes.’

Indicate section

While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
nobody is looking at the actual test items during the example session. When
all have finished (allow a maximum of two minutes) give the answers as
follows:

‘The answer to Example 1 is number 5, the sequence goes


up in twos.

The answer to Example 2 is number 4, as all other frac-


tions can be reduced further.

The answer to Example 3 is number 2, 100 is 10 times 10.

The answer to Example 4 is 5, the journey will take 1 hour


30 minutes.

Is everyone clear about the examples?’


Then say:
‘Time is short, so when you begin the timed test work as
quickly and as accurately as you can.

If you want to change an answer, fully erase your first


choice and fill in your new choice of answer.

There are a total of 25 questions and you have 10 minutes


in which to attempt them.

If you reach the end before time is called you may review
your answers to the numerical test if you wish, but do not
go back to the verbal test.

If you have any questions please ask now, as you will not be
able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions, appropriately, then, starting stop-watch or setting a


count-down timer on the word BEGIN say:

‘Please turn over the page and begin’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 10 minutes, say:

‘Stop now please and turn to Page 20’

NB: If this is the final test to be used in this battery, instead of the above line,
please turn to the instructions in the Ending Test Session box on page 59 of
this document. If you are skipping a test, please find the appropriate test
bookmark for your next test in the margin of the page and replace the above
line as necessary.

You should intervene if candidates continue after this point.

Then say:

‘We are now ready to start the next test. Has everyone still
got two sharpened pencils, an eraser, some unused rough
paper?’
If not, rectify, then say:

‘The next test follows on the same answer sheet, please


locate the section now.’

Indicate section

Check for understanding, then remembering to read slowly and clearly, go to


the front of the group and say:

‘Please ensure that you are on Page 20 of the booklet and


follow the instructions for this test as I read them aloud.’

Pause to allow page to be found.

‘In this test you will have to work out the relationship
between shapes and figures.

Each question has six possible answers. One and only one is
correct in each case. Mark your answer by filling in the
appropriate box that corresponds to your chosen answer on
your answer sheet. You now have a chance to complete the
three example questions on Page 21 in order to make sure
that you understand the test.

Please attempt the example questions now, marking your


answers in the example boxes.’

Indicate section

While the candidates are doing the examples, walk around the room to check
that everyone is clear about how to fill in the answer sheet. Make sure that
nobody is looking at the actual test items during the example session. When
all have finished, (allow a maximum of two minutes) give the answers as
follows:

‘The answer to Example 1 is number 5, as the series alter-


nates between 2 and 4 squares as does the direction of the
two squares which return to their original position.

The answer to Example 2 is number 4, as all of the other


options have an open side to one of the boxes.

The answer to Example 3 is number 6, as this is a mirror


image of the pattern.

Is everyone clear about the examples?’


Then say:
‘Time is short, so when you begin the timed test, work as
quickly and as accurately as you can.

If you want to change an answer, fully erase your first


choice, and fill in your new choice of answer.

There are a total of 25 questions and you have 10 minutes


in which to attempt them.

If you reach the end before time is called, you may review
your answers to the abstract test, but do not go back to the
previous tests.

If you have any questions please ask now, as you will not be
able to ask questions once the test has started.’

Then say very clearly:

‘Is everyone clear about how to do this test?’

Deal with any questions appropriately, then, starting stop-watch or setting a


count-down timer on the word BEGIN say:

‘Please turn over the page and begin.’

Answer only questions relating to procedure at this stage, but enter in the
Administrator’s Test Record any other problems which occur. Walk around
the room at appropriate intervals to check for potential problems.

At the end of the 10 minutes:

ENDING THE TEST SESSION

Say:

‘Stop now please and close your booklet’

You should intervene if candidates continue after this point.

COLLECT THE ANSWER SHEETS AND THE TEST BOOKLETS,


ENSURING THAT ALL MATERIALS ARE RETURNED (COUNT
BOOKLETS AND ANSWER SHEETS)

Then say:

‘Thank you for completing the General Reasoning Test.’


6 REFERENCES
Binet, A. (1910). Les idées modernes sur les enfants. Paris: E. Flammarion.

Budd, R.J. (1991). Manual for the Clerical Test Battery. Letchworth, Herts, UK: Psytech International Limited.

Budd, R.J. (1993). Manual for the Technical Test Battery. Letchworth, Herts, UK: Psytech International Limited.

Cronbach, L.J. (1960). Essentials of Psychological Testing (2nd Edition). New York: Harper.

Galton, F. (1869). Hereditary Genius. London: Macmillan.

Gould, S.J. (1981). The Mismeasure of Man. Harmondsworth, Middlesex: Pelican.

Heim, A.H. (1970). Intelligence and Personality. Harmondsworth, Middlesex: Penguin.

Heim, A.H., Watt, K.P. and Simmonds, V. (1974). AH2/AH3 Group Tests of General Reasoning: Manual. Windsor: NFER-Nelson.

Jackson, D.N. (1987). User's Manual for the Multidimensional Aptitude Battery. London, Ontario: Research Psychologists Press.

Johnson, C., Blinkhorn, S., Wood, R. and Hall, J. (1989). Modern Occupational Skills Tests: User's Guide. Windsor: NFER-Nelson.

Stern, W. (1912). Psychologische Methoden der Intelligenz-Prüfung. Leipzig: Barth.

Terman, L.M. et al. (1917). The Stanford Revision of the Binet-Simon Scale for Measuring Intelligence. Baltimore: Warwick and York.

Watson, G. and Glaser, E.M. (1980). Manual for the Watson-Glaser Critical Thinking Appraisal. New York: Harcourt Brace Jovanovich.

Yerkes, R.M. (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, 15.