Philosophy and Language Testing

Glenn Fulcher
University of Leicester, England


Philosophy is concerned with rational thinking about ... the general nature of
the world (metaphysics or theory of existence), the justification of belief (episte-
mology or theory of knowledge) and the conduct of life (ethics or theory of value)
(Honderich, 1995, p. 666). In education and language testing we are concerned
with questions of ontology (what we believe to be true), epistemology (how we
discover what is true), and the consequences of testing (the nature of ethical prac-
tice). This chapter will focus primarily on questions of ontology and epistemology,
as ethics is dealt with separately in Chapter 95. Furthermore, while general agree-
ment among language testers exists on key ethical principles to guide our practice,
there are radical differences of views regarding ontological and epistemological
As far as epistemology is concerned, the question usually boils down to: "Should
the human sciences emulate the methods of the natural sciences or should they
develop their own?" (Polkinghorne, 1983, p. 15). Realists (heirs to Hobbes, Mill,
and Comte, who believe in the existence of what we observe and test independently
of the observer or tester) give special place to the scientific method. Antirealists,
on the other hand, usually hold that the constructs we claim to test are not
independent of the language tester or the act of testing. The so-called objects of
our observation exist only in relation to our interpretations of them as they are
locally constructed. They would argue with Dilthey (1883/2008) that the richness
of human experience and culture cannot be captured by methods developed for
the natural sciences. Of particular importance in language testing is the "social
turn," which brings critical analysis to test use and impact. There is much room
for disagreement here. Paradigm clashes are not unusual in the social sciences, but
in language testing the fault lines are more pronounced because, for most of its

The Companion to Language Assessment, First Edition. Edited by Antony John Kunnan.
2014 John Wiley & Sons, Inc. Published 2014 by John Wiley & Sons, Inc.
DOI: 10.1002/9781118411360.wbcla032
2 Interdisciplinary Themes

history, it has been firmly grounded in the scientific realism of early quantitative
approaches: "One of the most important objects of measurement ... is to obtain a
general knowledge of the capacities of man by sinking shafts, as it were, at a few
critical points" (Cattell & Galton, 1890, p. 380). In this chapter I set out the realist
and antirealist positions, realizing that there are many gradations between the two.
I argue that extreme positions on the cline are untenable. I make a case for realism
in the pragmatist tradition, which is not to be associated with the naive realism
that is the target of constructivism. I also recognize the role for critical research,
especially where language testing is misused or abused. I conclude by proposing
an optimistic view of the future within an Enlightenment-inspired framework.
I begin by describing the realist position, and then move on to antirealist
stances. With Bachman (2006, pp. 196–7), I distinguish two kinds of antirealist
stance, the constructivist and the operationalist, although I prefer to call the
latter instrumentalist for reasons that will become clear, and because Kane (2006b,
p. 442) explicitly distances his approach to validation from the operationalist
position. I then discuss two key issues upon which language testers are in fundamental
disagreement because of their philosophical positions. I then briefly indicate the
research each position generates, and outline the challenges they face. Finally, I
suggest a way forward based on classical pragmatism.


Realists hold to the Enlightenment view that the scientific method is the most
productive in empirical research (whether quantitative or qualitative), as expressed
by Popper (1959, p. 3):

A scientist, whether theorist or experimenter, puts forward statements, or systems of
statements, and tests them step by step. In the field of the empirical sciences, more
particularly, he constructs hypotheses, or systems of theories, and tests them against
experience by observation and experiment.

The applicability of realism to social sciences has also been championed by edu-
cationalists such as Dewey, for whom

the scientific method is simply the method of experimental enquiry combined with
free and full discussion – which means, in the case of social problems, the maximum
use of the capacities of citizens for proposing courses of action, for testing them, and
for evaluating the results. (Putnam, 1990, p. 190)

Theories and evidence that provide the basis for decision making need to be
assessed using generally accepted criteria. In language testing, four have been
suggested (Fulcher & Davidson, 2007, p. 20):

1. Testability: Theory generates predictions that can be tested, specifically to see
whether scores support inferences from test taker responses to skills, abilities,
or knowledge, and to investigate if inferences are generalizable, and capable
of extrapolation to the real world.
2. Simplicity (Ockham's Razor): The requirement that the theory does not use
more abstract terms or constructs than are necessary to explain the evidence.
3. Coherence: The need to construct theories that are in keeping with what is
already known, as well as for the theory itself to be internally coherent.
4. Comprehensiveness: The requirement that our theories account for as much of
the available data and facts as possible.

It is argued that these criteria are paradigm free and can be used in theory
and model evaluation of any kind. However, the logic of the key criterion of
testability assumes an evidential approach to validation, which in turn presupposes
that the evidence exists. It seems reasonable that a researcher in any
evidence-based discipline must subscribe to this notion, encapsulated in this
summary of Hume's position: "He holds that objects that have real existence
must have duration and must be independent of what we individually think
about them" (Meyers, 2006, p. 63). In order to test theories we must have experiences
of enduring objects, events, or states that co-occur to a degree that would
minimally allow us to make statements about the likelihood of, and possible
reasons for, co-occurrence.
In language testing this leads to two claims. First, that individuals have a
stable language competence and capacity for use that endures for some time
even though it is subject to change (through learning or attrition), and that
responses to test items or tasks can be translated into numbers that are indexical
of that competence. This is not to deny that communication is a social act, but
recognizes that, unless an individual has an enduring performable competence,
they cannot engage in anything like the co-construction of discourse (Fulcher,
2003, pp. 19–20). Second, that score meaning can be generalized and extrapolated
to relevant domains for a reasonable period of time, and with a known
degree of probability: our theory makes predictions about the likelihood of
future events.
Language testing has, for the most part, relied on realist assumptions
throughout its history, partly because it has been largely dependent upon the
normative practices in measurement that Quetelet imported into social science
research from astronomy in the creation of his social physics (1842/1962, p.
9); and, as Hamp-Lyons (2000, p. 582) has argued, The early history of lan-
guage testing on the American side of the Atlantic is part of the larger story of
intelligence testing, which was firmly grounded in positivism. This observa-
tion is largely correct, even if the geographical claim and the reference to posi-
tivism are not. First, there had always been an interest in measurement in the
United Kingdom (Edgeworth, 1888, 1890), and in 1923 Ballard (1923, p. 29)
could write

The British Press refers to mental tests as though they were new things invented by
Americans. In point of fact they are neither new nor American. They have been the
common property of the race since the dawn of history.

Ballard cites research by Cyril Burt, as well as the adaptation of the Binet tests.
Second, the label positivism is now typically used pejoratively, and with less
specificity than it deserves. Most researchers who hold a realist position do not
hold positivist views or espouse the verifiability principle (Jordan, 2004, p. 32).
Such a position is nominalist, and therefore profoundly antirealist. In arguing that
only verifiable statements are meaningful, and that only words which refer to
observables are capable of verification, all theoretical words are rendered unintelligible
(Devitt & Sterelny, 1987, pp. 189–90). Without theoretical language, scientific
research programs are unattainable; this is why positivism is referred to as
the linguistic turn in philosophy.

Constructionism (or social constructionism) is a postmodern approach that does
not ask about truth, but wishes to uncover the historical and cultural reasons that
led to the currently dominant version of truth. This may take the form of decon-
structing text where no form (particularly scientific) has any special status
(Derrida), or uncovering the power structures that are claimed to marginalize
people while legitimizing the power of the elite (Foucault). Constructivists hold
that our tests and what they measure are contingent upon the social context in
which they are designed and used.
All shades of constructionism are therefore critical, and the basic assumptions
are laid out by Hacking (1999):

0. X is taken for granted. X appears to be inevitable.
1. X need not have existed. X could have been different.
2. X is bad.
3. We would be better off if X were changed, or if X did not exist.

To be a constructivist, it is necessary to subscribe to at least (0) and (1), and it is
(1) that gives constructionism its edge: Our current beliefs and practices, including
our theories and constructs, are contingent. If a constructivist also holds (2),
she is usually committed to unmasking the evils of X in order to undermine the
power or authority that is associated with it, or wishes to reform aspects of X.
When a constructivist also holds (3) the attacks on X are usually strident, fore-
grounding injustices, marginalization, or subjugation of peoples. In applied lin-
guistics the language becomes one of struggle and conflict, with charges of
cultural imperialism and a determination by the powerful centre (Western
cultures and Anglo-American norms) to keep the periphery in a state of dependence
(Phillipson, 1988, p. 348). All groups who can be cast as minority or
downtrodden are drawn into the argument, and labels such as "patriarchal,"
"oppressive," and "positivist" are attached to alternative views (Pennycook,
Social constructivist schools of thought bring the same critical approach to
knowledge, which for them is also contingent. The concept becomes a battle-
ground in education because constructivists claim that it is the powerful who
decide what knowledge counts and is therefore learned and tested. Testing is
seen as the mechanism through which the elite exercise power and maintain their
position (Foucault, 1975, pp. 184–94). Questions of inductive inference are irrelevant,
because all knowledges are equal in value; facts do not help to build,
support, or undermine theories, for the facts emerge only in the context of some
point of view (Fish, 1995, p. 253). The ultimate statement of this extreme position
was provided by Nietzsche (1888, §604):

Interpretation, the introduction of meaning – not explanation (in most cases a new
interpretation over an old interpretation that has become incomprehensible, that is
now itself only a sign). There are no facts, everything is in flux, incomprehensible,
elusive; what is relatively most enduring is – our opinions.

This carries a number of implications. First, no utterance (consisting of conventional
signs, or words) can be evaluated in terms of whether it succeeds or fails
to correspond to some external reality. Rather, use of language is a moment-by-moment
attempt to deal with experience, whether of other people or of our environment.
Attempts to decide if conventional signs fit the facts or describe the
way the world is are futile (Rorty, 1989, p. 121); we are simply negotiating our
way through existence. Reference from conventional signs to the real world as
described by Frege (1892) is no longer of concern. Second, dualism is abolished.
What is language? Nothing but new forms of life constantly killing off old
forms – not to accomplish a higher purpose, but blindly (Rorty, 1989, p. 120). This
nominalism (which constructivism shares with positivism!) makes it equally
meaningless to ask questions about psychological states, as they are transitory and
ephemeral. They simply cannot be known, explained, or predicted. What we are
left with is the transient social construction of meaning on an interaction-by-
interaction basis.

Although I have classed instrumentalism as antirealist, it may be more appropri-
ate to call it nonrealist, because instrumentalists hold that, if a test assists in useful
decision making, that is really all that matters. For instrumentalists the issue of
whether the terms of theories refer to any real entity is simply irrelevant. They
accept Hume's fork, and hold that nondeductive (subjective) inference is always
subject to question and error. One argument for instrumentalism is provided by
Laudan (1981a) in his critique of realism, in which he uses historical evidence to
undermine the premise that successful theories have terms that refer. For example,
atomic theory failed to be empirically successful for hundreds of years, while the
miasmatic theory of disease transmission was: it led to policies of moving people
away from ports and introducing quarantine. Thus, theories are evaluated prima-
rily on the grounds of the degree to which they enable us to predict phenomena
and manipulate our environment in useful ways, as we can never be certain that
our terms refer.
Each of the three positions described in the introduction has impacted upon
language testing, leading to incommensurable stances that are explored in the next

Current Positions on Key Issues

I have selected two themes for discussion. My rationale is that these best illustrate
fault lines that are directly related to philosophical beliefs.

Constructs/Theoretical Terms
Bachman (2006, pp. 182–3) writes: When a researcher observes some phenomenon
in the real world, he generally does this because he wants to describe, induce
or explain something on the basis of this observation. That something is what can
be called a construct. These are nonobservable abstract nouns that are operationalized
in such a way that we may make inferences about them from our
observations (Fulcher & Davidson, 2007, p. 7). Realists minimally subscribe to the
reality of these nonobservables.
This is very close to a correspondence theory of truth, the natural home of the
realist. Models of communicative competence/language ability, from Oller's use
of Spearman's g to modern componential approaches, rest on an assumption
that the terms of the theory refer to real competences that are not merely useful
Some researchers explicitly work within this paradigm rather than just assume
it to be the case:

We argue that the validity of any given teaching, learning, and assessment task –
whether it is representative, authentic, and generalizable – is just a more complex
version of the problem of determining whether a representation of a given state of
affairs is true or not. We provide two logical arguments. Both of them show the
construal (production and interpretation) of surface forms of discourse in order to
represent faithfully (and truthfully) certain changing states of affairs in the real world
is the necessary and sufficient basis for any validity to be found in any teaching,
learning, and assessment tasks whatever. (Badon, Oller, Yan, & Oller, 2005, p. 2)

Badon et al. argue that the validity of a test of aviation English can be evaluated
on the grounds of whether or not language used by pilots, air traffic controllers,
and test takers represents a true state of affairs in the real world. The facts of real
world events must be encoded into recognized conventional signs (linguistic
realizations). Based on Oller's theory of pragmatic mapping, the validity question
becomes whether the construct to be measured exists, and whether variation in
scores is causally linked to variations in the construct. It is therefore necessary to
develop tasks which require test takers to refer to objects and events in the real
world, and use language to control and change events.
The data-based approach to scale development, with its careful analysis of
language use in context, but relating observable variables to constructs such as
discourse management and pragmatics, would sit comfortably within this
kind of interpretation (Fulcher, Davidson, & Kemp, 2011). For this reason we add
the further observation that realist approaches do not abandon context. Rather,

The authenticity, representativeness, and consequent generalizability of teaching,
learning, and assessment tasks depends on their incorporation of the sign systems,
social actions, and realia found in actual contexts of discourse. While codes, contexts,
and interactions must be distinguished in theory, in practice they interact holistically.
(Badon et al., 2005, p. 1)

For realists, context is real, not constructed, and so, while it is important to maintain
a connection between the world and conventional signs, realists must also take
seriously implicature and illocutionary intent.
Some would go further and argue that the term "construct" needs to be distinguished
from "trait," as the former implies that the theoretical term is a construction
of the researcher: It may be part of a nomological net, but does not refer. That
is, construct theorists are said to really be constructivists with a scientific air about
them. For example, they may admit that a number of models could fit their data,
and the theoretical terms could vary by model. In contrast, Blackburn (2005, p.
118) describes a "real realist, an industrial strength, meat-eating realist" as someone
who holds that (a) there are no such things as constructs, only traits, which refer
to properties that exist in the real world, are discovered not created, and exist
independently of the researcher or theories, and (b) the terms define the properties
in ways that are not contingent. This position is best represented by Borsboom
and colleagues, who argue:

Realism, in the context of measurement, simply says that a measurement instrument
for an attribute has the property that it is sensitive to differences in the attribute; that
is, when the attribute differs over objects then the measurement procedure gives a
different outcome. (Borsboom, Cramer, Kievit, Scholten, & Franic, 2009, p. 148)

Validity in this formulation is equivalent to the existence of what the test meas-
ures, and goes back to the strongest scientific claims for testing made in the 19th
and early 20th centuries. The argument is that only if this ontological claim holds,
then the measurement procedure can be used to find out about the attributes to
which it refers (Borsboom, 2005, p. 152).
Constructivism is incommensurable with all shades of realism. Constructivists
challenge the primary claim that there are facts or traits in the real world that exist
independently of the mind of the researcher or test taker. The world itself is con-
structed. The trail of the human serpent is everywhere.
Do language testers deal with facts or things that exist? McNamara argues
that they do not. He represents a trend in language-testing research that focuses
upon the social nature of language testing, and the dependency of all concepts
and communication on locally situated interaction:

Recent work has drawn attention to the potential of poststructuralist thought in
understanding how apparently neutral language proficiency constructs are inevitably
socially constructed and thus embody values and ideologies (McNamara, 2001,
2006). It is worth noting here that the deconstruction of such test constructs applies
no less to constructs in other fields of applied linguistics, notably second language
There is also a growing realization that many language test constructs are explicitly
political in character and hence not amenable to influences which are not political.
(McNamara, 2006, pp. 37–8)

The constructs have no existence in the external world, and their conventional
names are signs constructed for social (primarily political) purposes. More specifically,
tests play a critical role in the power struggles that constitute identity-forming
social life, and may be deconstructed using Foucaultian insights (Shohamy,
2001, pp. 20–4, 54–8). The proper focus of attention is the social construction of
tests, their social impact, and role in policy. Construct labels no longer refer, reducing
them to the embodiment of the values and ideologies at play in the power
struggles of the day.
As a direct consequence, the role of cognition is downplayed in critiques of
validity theories, and the link between performance (observation) and compe-
tence (construct) abolished. Using the notion of performativity from feminist
poststructuralism, McNamara also suggests:

We assume in language testing the existence of prior constructs such as language
proficiency or language ability. It is the task of the language tester to allow them to
be expressed, to be displayed, in the test performance. But what if the direction of
the action is the reverse, so that the act of testing itself constructs the notion of lan-
guage proficiency? (McNamara, 2001, p. 339)

Presumably, in the process of testing, we see just another transitory interaction,
or what Davidson (1980) refers to as a "passing theory," in which identity and
meaning are temporarily constructed and deconstructed:

In linguistic communication nothing corresponds to a linguistic competence as often
described ... I conclude that there is no such thing as a language, not if a language
is anything like what many philosophers and linguists have supposed. There is
therefore no such thing to be learned, mastered, or born with. We must give up the
idea of a clearly defined shared structure which language-users acquire and then
apply to cases. (Davidson, 1980, p. 265)

The instrumentalist position makes no assumption about construct reality. Nor
does it admit the necessity of constructs for language testing to be a successful
enterprise. Validity is an issue of whether the testing processes lead to useful
outcomes. This is the primary reason for the move from talk of validity (Messick,
1989) to talk of validation (Kane, 2006a). Although Kane uses the language of
constructs and traits, he argues that "The use of trait language does not necessarily
buy us much, and it can be misleading. It can suggest that we have found an
explanation for an observed regularity, when we have merely labelled it" (Kane,
2006a, p. 30). Such an error is defined as reification (Kane, 2006a, p. 59). Kane
(2009, pp. 54–7) has also argued that it is possible to avoid construct language
completely, scoring only relevant observable variables displayed in tasks sampled
from the universe of generalization. Chapelle, Enright, and Jamieson (2010)
embrace this position, arguing that the construct of academic language proficiency
has proved too difficult to define and articulate as a basis for test development
and validation: "Kane's organizing concept of an interpretive argument, which
does not rely on a construct, proved to be useful" (Chapelle et al., 2010, pp. 3–4).
Bypassing construct labels and definitions, they move straight from observables
to claims using the Toulmin model as the basis for an interpretive argument (see
Figure 85.1).

CLAIM: The student's English speaking abilities are inadequate for study in
an English-medium university.
GROUNDS: A student's presentation to the class on an assigned topic was
characterized by hesitations and

Figure 85.1 Interpretive argument. Adapted from Chapelle et al. (2010, p. 5)

The evidence leads to a score generated by scoring rules (the application of a
scoring rubric), and an inference is made from the score to the claim. It is important
to note that this is done without the need for a construct inference such as
the student's "fluency."
The procedures for constructing and evaluating interpretative arguments are
generic, but adapted to the specific claims of each assessment context (Kane, 2010,
p. 79). Constructing and challenging arguments has an analogy in the courtroom
where, "If the procedures have not been followed correctly or if the procedures
themselves are clearly inadequate, the interpretive argument would be effectively
overturned" (Kane, 2006a, p. 29). The role of the prosecution is to undermine the
defence's argument with alternative explanations of the data. The argument of
utility for an intended purpose is all that we are able to evaluate.
Neither the real realists nor the constructivists are keen on instrumentalism.
For the former it does away with the all-important traits (Borsboom, 2006a,
p. 431). For the latter it is too concerned with individual cognition (McNamara
& Roever, 2006). But this does not matter to instrumentalists, because they
accept both critiques: we need pluralism so that we have a range of approaches
to solve different problems (Kane, 2006b). If it seems useful, instrumentalists go
with it.

Society, Impact, and Consequences

It would appear that the realists have a problem with the impact of tests on
society and individuals. Although consequences have been the focus of legal
disputes for a long time (Fulcher & Bamford, 1996), the traditional position has
been that there is a cause for concern only if the adverse social consequences
are empirically traceable to sources of test invalidity (Messick, 1989, p. 88). The
only exception was Cronbach (1988), who argued that any socially negative effect
should be a concern for the test developer. On the other hand, the most strident
realists wish to abolish social impact and consequences from validity discussions

Validity is not complex, faceted, or dependent on nomological networks and social
consequences of testing. It is a very basic concept and was correctly formulated, for
instance, by Kelley (1927, p. 14) when he stated that a test is valid if it measures what
it purports to measure. (Borsboom, Mellenbergh, & van Heerden, 2004, p. 1061)

However, other realists do not agree. Badon et al. (2005, pp. 9–10) argue that, if a
test can be shown to measure a trait that is critical to aviation communication,
and if teaching this trait reduces miscommunication and hence aviation accidents,
this would (a) constitute evidence of validity, and (b) have a positive social
Clearly, this is not likely to be enough for constructivists. McNamara and
Roever (2006, pp. 2050251), for example, describe Borsboom's version of realism
as an attempt to strip validity theory of its concern for values and consequences
and to take the field back 80 years to the view that a test is valid if it measures
what it purports to measure. They quote Shohamy with approval:

The ease with which tests have become so accepted and admired by all those who
are affected by them is remarkable. How can tests persist in being so powerful, so
influential, so domineering and play such enormous roles in our society? One answer
to this question is that tests have become symbols of power for both individuals and
society. Based on Bourdieu's . . . notion of symbolic power, [we] will examine the
symbolic power and ideology of tests and the specific mechanisms that society
invited to enhance such symbolic power. (Shohamy, 2001, p. 117)

When constructivists turn to instrumentalism, they find that there is nothing in
Kane's model of an interpretative argument, or in its adoption within language
testing, even when it focuses on test use, that would invite such reflection (McNamara
& Roever, 2006, p. 39). For constructivists the focus is the test taker as a
political subject in a political context, and so research that ignores the social and
ideological is suspect. Of particular concern is the topic of identity. This comes in
two forms. The first is the use of tests for purposes of identifying/classifying, in
contexts such as war, immigration, asylum, or citizenship, where there are pos-
sibilities of oppression or mistreatment. The second is related to the kind of iden-
tity the test taker must assume in order to pass this test, which includes using
discourse that reflects the power relations of dominant institutions. In this sense
all tests are claimed to be tests of identity (McNamara & Roever, 2006, pp. 196–9)
and thus an exercise of power in their own right.
The instrumentalists take a middle position on social impact and consequences.
They acknowledge that there are real policy and political issues, and questions of
fairness for the individual. They are also happy to embed these within validity
theory where Messick placed them. However, dealing with consequences is very
much a technical matter: evaluating consequences that stakeholders feel are
important using program evaluation as a model (Kane, 2006a, p. 56), rather than
adopting a critical stance.

Current Research

Much of the research in designing assessments for specific purposes is generally
realist. We have seen that this is the case with aviation English, arguably one of
the highest stakes uses of tests. It seems unlikely that stakeholders would wish to
use a test that the designers claimed did not measure constructs/traits of interest
because they did not exist. Similarly, the growth of interest in diagnostic testing
(Jang, 2009) and the assessment of language disorders (Oller, 2012) has a strongly
realist flavor. Approaches that employ factor-analytic techniques, particularly
structural equation modeling, make strong realist assumptions about traits (e.g.,
Song, 2008). Work into the design of scoring models also assumes that perform-
ance in domains of interest can be described in terms of relevant generalizable
traits. For example, Fulcher et al. (2011) arrange observable variables from the
analysis of service encounters into clusters under the trait headings of discourse
competence and pragmatic competence. It is assumed that these competen-
cies exist, and that they are manifested through their associated observable vari-
ables. Most current test development activity also takes place within a realist
framework (Mislevy & Yin, 2012).

Constructivist research takes a number of forms. One trend is the description
of language use, particularly investigating locally co-constructed interaction
between participants in speaking tests (e.g., Brooks, 2009). Another area of interest
is the description and assessment of second language pragmatics (Roever, 2011).
There is always a strong fairness agenda in constructivist writing, with advocacy
for those who are marginalized. This can be combined with test analysis tech-
niques such as differential item functioning to discover if tests discriminate against
subgroups (McNamara & Roever, 2006). Where constructivists excel is in carrying
out case studies of the social use of tests, unmasking policy agendas behind test
use, and investigating the construction of identities through competing discourses
(Shohamy, 2001). Constructivist research in this vein helps maintain the conscience
of the field by asking difficult questions about contingent constructed ideas.
As constructivists are inherently distrustful of tests and the motivations of their
developers, there is little research into constructivist test development. The one
exception is dynamic assessment (DA). Set within a sociocultural theoretical
framework, DA uses assessment to scaffold language acquisition, and so is concerned
with change (Fulcher, 2010, pp. 72–7). As each use of DA is considered a
unique encounter, the preferred method of research is the individual case study,
which cannot be generalized to any other case (Lantolf & Poehner, 2011).

Research within this tradition is concerned with establishing and following appro-
priate procedures, because reports of what was done count as validity evidence
(Chapelle, 2008, p. 320). While there will be variation of content according to
purpose, procedures are generic. These are a useful addition to our validation
tools. The second area of expansion is in the development and application of argu-
ment models to language-testing projects (Chapelle, Enright, & Jamieson, 2008;
Bachman & Palmer, 2010) that expand and put into practice the work of Kane
(2006a), which in turn depends upon Toulmin. The quality of argument is critical
because claims are evaluated in terms of the warrants and backing brought to bear
(Toulmin, 2003, pp. 15–16). Proper procedure and good argumentation are central
to validation in the absence of ontological claims.


Realism needs strong testable theories, which, as even "real realists" generally
acknowledge, do not exist in psychology or language testing (Borsboom, 2006b,
pp. 464–5). Closely related to this problem is the fact that traits in language
testing are not separate from the individuals in whom we posit their existence;
even if we can claim that traits like "discourse competence" or "fluency" really
exist, separating out their effect on measures is simply not as easy as in the natural
sciences. Perhaps the most intransigent problem in all social science research is
that the researcher interacts with and changes the subjects of the research, both as
a result of the research methods, and by naming traits ("value labels" in Messick's
terms). In short, there is a genuine problem not only with reference but also with
defining and operationalizing traits (Fulcher, 2010, pp. 32–4), and this may be the
most significant reason why social science theories have not led to research programs
that are as successful as those in the natural sciences.

The first problem is that constructivist research is ideologically driven. Those
committed to a Foucaultian reading of the use of tests will see evidence of struggle
and marginalization in any data they collect. In principle, there is no data that
could falsify a priori beliefs. The second problem is concerned with what is con-
structed. Hacking (1999) argues that constructivism is useful as a tool to investi-
gate ideas that are abstractions of observables and reified within a matrix of
facts and relations. In language testing, such an idea would be the native
speaker (Davies, 2003). Individual native speakers exist, and are not problematic.
We manage to classify them accurately despite dialects and idiolects. But once we
extract the idea of the native speaker it becomes a political, social, and prob-
lematic thing; and we know that it is used for political purposes, including in
some cases weaving it into a matrix that relates it to territory and citizenship.
However, critical social tools are not appropriate for the analysis of objects in the
real world, theoretical terms, or elevator words like knowledge or reality.
We do not construct people, trees, quarks, or (in the case of elevator words) eve-
rything. That would be to reduce the world to mere mental states (without indi-
viduals in which to reside).
Perhaps the most disturbing aspect of the strain of constructivism that
has most influenced language testing is the deep pessimism about the world
and its institutions. Everything is seen as evidence of conflict and there is
no way out. Fulcher and Davidson (2008) constructed an imaginary dialogue
between Mill and Foucault to tease out these problems. Mill was an optimist,
so when he wrote about testing he saw it as helping to create personal
development which would support the introduction of universal suffrage.
For Mill we make progress through personal and social development. For
Foucault there is no escape from despair, and tests will forever be instru-
ments of oppression.
Despite the problems associated with constructivism, it has served a useful
purpose in drawing our attention to the very real misuse of tests. It is a legitimate
enterprise to describe and critique the political contexts of test use (Fulcher, 2009),
and to build explicit intended effects of tests into test development. However, the
overarching ambitions of constructionism have also had a negative impact that
needs to be critiqued, preferably before constructionism itself is taken for

The only test of success in instrumentalism is the utility of a belief, practice, or
test to improving life and furthering our projects. While engagement with data
is important, it is accepted that all our theories are underdetermined, and hence
no single explanation is true. This does not matter, however, as long as we
have an assessment process that proves to be useful for making decisions with
reasonable accuracy. Perhaps the major criticism to be directed at instrumental-
ism is its lack of ambition. It has given up on the larger questions of truth (just
what is the nature and structure of language knowledge and ability for use in a
specified domain?) in return for a purely epistemological solution to a practical
This is not a new problem for instrumentalism, and neither is the standard
response. Dewey (1912) argues that truth is wrapped up with the notion of social
credit, or what works to improve the human condition:

I should say that as method for philosophy it indicated a more severe intellectual
conscience; less free and easy use of the concept of Truth in general and more careful
use of truths in particular to designate such conceptions and propositions as have
emerged successfully from the test conditions that are practically appropriate.
(Dewey, 1912, p. 80)

If this is accepted as a defence, then consequences become paramount. They are
not optional to the development of the technical processes and argumentation,
and cannot be relegated to an afterthought. However, recollecting Laudan's argument
for instrumentalism over realism, we must remember that, despite the practical
success of miasmatic theories of disease, they were wrong. Without the
noncontingent (true) explanation, we would not have been able to develop modern

Future Directions

Bachman (2006, p. 200) correctly suggests that many studies do not succeed in
clearly combining philosophical approaches. We should add that frequently they
do not articulate their own philosophical assumptions, and some are internally
incoherent. Even when they do articulate assumptions there can be less clarity
than is sometimes required. This is the case, for example, in Fulcher and Davidson
(2007), where there is some sliding between classical and modern pragmatism,
which has led some readers to (mistakenly) assume that the text has a postmodern
agenda. Researchers also need to be aware that while some combining is possible
there are areas where assumptions are incommensurable. It is a disservice to the
field to paper over the fault lines, for it is only in disagreement and healthy debate
that progress is made (Mill, 1859/1998, p. 25).
The first important question for the future relates to the nature of our con-
structs/traits. Unless there is some general consensus, it appears that the field will
follow three separate agendas. I will start by making explicit what is implicit in
the preceding discussion: that the constructivist position is both confused and
untenable in this respect. If everything is constructed and contingent, from proc-
esses to traits, our project is lost from the start.
The rest of the problem may be tackled by recourse to classical pragmatism.
Pragmatism was defined by Peirce in Baldwin's dictionary (1902/1998, p. 300) as:

The opinion that metaphysics is to be largely cleared up by the application of the
following maxim for attaining clearness of apprehension: Consider what effects, that
might conceivably have practical bearings, we conceive the object of our conception
to have. Then, our conception of these effects is the whole of our conception of the

This could easily be misinterpreted as an instrumentalist position, and was construed
as such by later pragmatists such as William James. However, Peirce
applied the maxim primarily to the notion of objects and constructs. The example
he provided in the original 1878 formulation of the pragmatic maxim was the
construct of hardness, which manifested itself in the effect of the application of
the construct, such as observing (and predicting) that a diamond will cut other
materials, but not vice versa. This, he said, was to insist upon the reality of the
objects of general ideas in their generality (1902/1998, p. 302). The construct of
hardness is therefore real because of the practical consequences that flow from
its definition and meaning.
In classical pragmatism, therefore, an abstraction is defined as a generalization
of experience, labeled with an abstract noun. An example from the language-
testing literature might be fluency, a term given to a range of linguistic and
processing features that we may experience and describe (Fulcher, 1996). Peirce
(1903, p. 134) would ask under what circumstances such an abstraction can be
real, and answers: according to the pragmatic maxim this must depend on
whether all the practical consequences of it are true. Next, he asks what kind of
thing such an abstraction is:

What kind of being has it? What does its reality consist in? Why it consists in some-
thing being true of something else that has a more primary mode of substantiality.
Here we have, I believe, the materials for a good definition of abstraction. (1903,
p. 134)

In the case of fluency, the abstraction consists of a set of primary substances (in
Peirce's terms), which may include features such as speed of delivery, pausing
(for content planning at syntactically appropriate slots), hesitating (causing syn-
tactic disjunct), and so on. Peirce continues to a definition: An abstraction is a
substance whose being consists in the truth of some proposition concerning
a more primary substance (1903, p. 135). If the categories of fluency described
in Fulcher (1996) can be observed, and if they vary in ways predicted (North, 2007,
p. 657, found independently that the fluency descriptors were the only consistent
set capable of acting as anchors in the construction of the CEFR), the abstraction
is true, even though its name is conventional. Finally, Peirce (1903, p. 134) insists
reality can mean nothing except the truth of statements in which the real thing is
asserted. According to this treatment it is arguably the case that fluency is a
trait that has the property of being real (although it is questionable how real it
remains if reductionist strategies are employed for the sake of automated scoring
or research, as in the case of Bernstein, Van Moere, & Cheng, 2010, p. 362), just as
hardness and weight are real because of their practical consequences.
The pragmatist strategy therefore avoids the need for a strong correspondence
theory of truth that is required by the real realists on the one hand, while incor-
porating the instrumentalist arguments supported by relevant empirical data on
the other. It steers a course between extremes, incorporating the advantages of
each, while mitigating the challenges.
Research agendas within such a framework could lead to substantive validation
programs. This would have practical consequences; as Laudan (1981b, p. 145)
says: "the aim of science is to secure theories with a high problem-solving
effectiveness", and language testing is a problem-solving activity.
The second way forward is to re-engage with a progressive Enlightenment
agenda that incorporates consideration of consequences, but without ideological
baggage. All fields evolve, and for the most part advances are made through incre-
mental theory building, empirical research, and conceptual development. Theory
in natural sciences evolves as well, and each stage has allowed humans to manipu-
late their environment in predictable and successful ways in order to achieve more
than had previously been possible. This is also true of language testing and the
validation process. Karl Popper referred to this as verisimilitude, or the approxima-
tion of a theory to truth. Peirce (1877/1998, p. 155) held a similar view:

This great law is embodied in the conception of truth and reality. The opinion that
is fated to be ultimately agreed to by all who investigate, is what we mean by the
truth, and the object represented in this opinion is the real. That is the way I would
explain reality.

Advancement requires a critical, collaborative profession, prepared to argue cases
and abandon them when necessary. Peirce and Mill both knew that the cycle of
progress would be endless. Scientific inquiry does not lead to the discovery of
Truth with a capital T, but makes genuine progress by not being wrong. A better
language-testing future cannot be built on a static or ideological view of society,
individuals, or trait definitions. It needs an optimistic agenda of expanding our
knowledge, and learning how to build better tests in the service of meritocratic
and just decision making.

SEE ALSO: Chapter 31, Assessing Test Takers with Communication Disorders;
Chapter 46, Defining Constructs and Assessment Design; Chapter 86, Cognition
and Language Assessment; Chapter 93, The Influence of Ethics in Language


Bachman, L. F. (2006). Generalizability: A journey into the nature of empirical research
in applied linguistics. In M. Chalhoub-Deville (Ed.), Inference and generalizability in
applied linguistics: Multiple perspectives (pp. 165–207). Amsterdam, Netherlands: John
Bachman, L. F., & Palmer, A. (2010). Language assessment in practice. Oxford, England: Oxford
University Press.
Badon, L. C., Oller, S. D., Yan, R., & Oller, J. W. (2005). Gating walls and bridging gaps: Validity
in language teaching, learning, and assessment. Retrieved October 25, 2012 from http://
Ballard, P. B. (1923). Mental tests. London, England: Hodder & Stoughton.
Bernstein, J., Van Moere, A., & Cheng, J. (2010). Validating automated speaking tests.
Language Testing, 27(3), 355–77.
Blackburn, S. (2005). Truth. London, England: Penguin Books.
Borsboom, D. (2005). Measuring the mind. Cambridge, England: Cambridge University Press.
Borsboom, D. (2006a). The attack of the psychometricians. Psychometrika, 71(3), 425–40.
Borsboom, D. (2006b). Can we bring about a Velvet Revolution in psychological measurement?
A rejoinder to commentaries. Psychometrika, 71(3), 463–7.
Borsboom, D., Cramer, A. O. J., Kievit, R. A., Scholten, A. Z., & Franic, S. (2009). The end
of construct validity. In R. W. Lissitz (Ed.), The concept of validity (pp. 135–70). Charlotte,
NC: Information Age Publishing.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity.
Psychological Review, 111(4), 1061–71.
Brooks, L. (2009). Interacting in pairs in a test of oral proficiency: Co-constructing a better
performance. Language Testing, 26(3), 341–66.
Cattell, J. M., & Galton, F. (1890). Mental tests and measurements. Mind, 15, 373–81.
Chapelle, C. A. (2008). The TOEFL validity argument. In C. A. Chapelle, M. K. Enright, &
J. Jamieson (Eds.), Building a validity argument for the Test of English as a Foreign Language
(pp. 319–52). New York, NY: Routledge.
Chapelle, C. A., Enright, M. K., & Jamieson, J. (Eds.). (2008). Building a validity argument for
the Test of English as a Foreign Language. New York, NY: Routledge.
Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to
validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3–13.
Cronbach, L. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun
(Eds.), Test validity (pp. 3–17). Hillsdale, NJ: Erlbaum.
Davidson, D. (1980). Essays on actions and events. Oxford, England: Clarendon.
Davies, A. (2003). The native speaker: Myth and reality. Clevedon, England: Multilingual
Devitt, M., & Sterelny, K. (1987). Language and reality: An introduction to the philosophy of
language. Oxford, England: Blackwell.
Dewey, J. (1912). A reply to Professor Royce's critique of instrumentalism. The Philosophical
Review, 21(1), 69–81.
Dilthey, W. (1883/2008). An introduction to the human sciences: An attempt to lay a foundation
for the study of science and history (R. J. Betanzos, Trans.). Detroit, MI: Wayne State
University Press.
Edgeworth, F. Y. (1888). The statistics of examinations. Journal of the Royal Statistical Society,
51, 599–635.
Edgeworth, F. Y. (1890). The element of chance in competitive examinations. Journal of the
Royal Statistical Society, 53, 644–63.
Fish, S. (1995). What makes an interpretation acceptable? In R. B. Goodman (Ed.), Pragmatism
(pp. 253–65). New York, NY: Routledge.
Foucault, M. (1975). Discipline and punish: The birth of the prison. New York, NY: Vintage.
Frege, G. (1892). On sense and reference. Retrieved October 25, 2012 from http://en.wikisource.
Fulcher, G. (1996). Does thick description lead to smart tests? A data-based approach to
rating scale construction. Language Testing, 13(2), 208–38.
Fulcher, G. (2003). Testing second language speaking. Harlow, England: Longman.
Fulcher, G. (2009). Test use and political philosophy. Annual Review of Applied Linguistics,
29, 3–20.
Fulcher, G. (2010). Practical language testing. London, England: Hodder.
Fulcher, G., & Bamford, R. (1996). I didn't get the grade I need: Where's my solicitor?
System, 24(4), 437–48.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource book.
London, NY: Routledge.
Fulcher, G., & Davidson, F. (2008). Tests in life and learning: A deathly dialogue. Educational
Philosophy and Theory, 40(3), 407–17.
Fulcher, G., Davidson, F., & Kemp, J. (2011). Effective rating scale development for speaking
tests: Performance decision trees. Language Testing, 28(1), 5–29.
Hacking, I. (1999). The social construction of what? Cambridge, MA: Harvard University Press.
Hamp-Lyons, L. (2000). Social, professional and individual responsibility in language
testing. System, 28(4), 579–91.
Honderich, T. (Ed.). (1995). The Oxford companion to philosophy. Oxford, England: Oxford
University Press.
Jang, E. E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability:
Validity arguments for Fusion Model application to LanguEdge assessment. Language
Testing, 26(1), 31–74.
Jordan, G. (2004). Theory construction in second language acquisition. Amsterdam,
Netherlands: John Benjamins.
Kane, M. (2006a). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp.
17–64). Westport, CT: Praeger.
Kane, M. (2006b). In praise of pluralism: A comment on Borsboom. Psychometrika, 71(3),
Kane, M. (2009). Validating the interpretations and uses of test scores. In R. W. Lissitz (Ed.),
The concept of validity (pp. 39–64). Charlotte, NC: Information Age Publishing.
Kane, M. (2010). Terminology, emphasis, and utility in validation. Educational Researcher,
37(2), 76–82.
Lantolf, J. P., & Poehner, M. E. (2011). Dynamic assessment in the classroom: Vygotskian
praxis for second language development. Language Teaching Research, 15(1), 11–33.
Laudan, L. (1981a). A confutation of convergent realism. Philosophy of Science, 48(1),
Laudan, L. (1981b). A problem-solving approach to scientific progress. In I. Hacking (Ed.),
Scientific revolutions (pp. 144–55). Oxford, England: Oxford University Press.
McNamara, T. (2001). Language assessment as social practice: Challenges for research.
Language Testing, 18(4), 333–49.
McNamara, T. (2006). Validity and values: Inferences and generalizability in language
testing. In M. Chalhoub-Deville (Ed.), Inference and generalizability in applied linguistics:
Multiple perspectives (pp. 27–45). Amsterdam, Netherlands: John Benjamins.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. London:
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103). New
York, NY: Macmillan/American Council on Education.
Meyers, R. G. (2006). Understanding empiricism. Chesham, England: Acumen.
Mill, J. S. (1859/1998). On liberty. In J. Gray (Ed.), John Stuart Mill's On liberty and other
essays (pp. 5–128). Oxford, England: Oxford University Press.
Mislevy, R., & Yin, C. (2012). Evidence-centered design in language testing. In G. Fulcher
& F. Davidson (Eds.), The Routledge handbook of language testing (pp. 208–22). London,
England: Routledge.
Nietzsche, F. (1888). The will to power. Book 3: Principles of a new evaluation. Retrieved October
25, 2012 from
North, B. (2007). The CEFR illustrative descriptive scales. Modern Language Journal, 91,
Oller, J. W. (2012). Language assessment for communication disorders. In G. Fulcher & F.
Davidson (Eds.), The Routledge handbook of language testing (pp. 150–61). London,
England: Routledge.
Peirce, C. S. (1877/1998). The fixation of belief. In E. C. Moore (Ed.), The essential writings
of Charles S. Peirce (pp. 120–36). New York, NY: Prometheus Books.
Peirce, C. S. (1902/1998). Some contributions to Baldwin's dictionary. In E. C. Moore (Ed.),
The essential writings of Charles S. Peirce (pp. 300–13). New York, NY: Prometheus Books.
Peirce, C. S. (1903). Pragmatism as a principle and method of right thinking: The 1903 Harvard
Lectures on Pragmatism (P. A. Turrisi, Ed.). New York, NY: State University of New York
Pennycook, A. (2001). Critical applied linguistics: An introduction. Mahwah, NJ: Erlbaum.
Phillipson, R. (1988). Linguicism: Structures and ideologies in linguistic imperialism. In
J. Cummins & T. Skutnabb-Kangas (Eds.), Minority education: From shame to struggle (pp.
339–58). Clevedon, England: Multilingual Matters.
Polkinghorne, D. (1983). Methodology for the human sciences: Systems of inquiry. Albany, NY:
State University of New York Press.
Popper, K. (1959). The logic of scientific discovery. London, England: Routledge.
Putnam, H. (1990). A reconsideration of Deweyan democracy. Southern California Law
Review, 63, 1671–97. (Reprinted in Goodman, R. B. (Ed.). (1995). Pragmatism: A contemporary
reader [pp. 183–204]. London, England: Routledge).
Quetelet, A. (1842/1962). A treatise on man and the development of his faculties. New York, NY:
Burt Franklin.
Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing,
28(4), 463–81.
Rorty, R. (1989). The contingency of language. In R. B. Goodman (Ed.), Pragmatism (pp.
107–23). New York, NY: Routledge.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of language tests.
London, England: Longman.
Song, M.-Y. (2008). Do divisible subskills exist in second language (L2) comprehension?
A structural equation modeling approach. Language Testing, 25(3), 435–64.
Toulmin, S. E. (2003). The uses of argument (2nd ed.). Cambridge, England: Cambridge
University Press.

Suggested Readings

Baggini, J., & Fosl, P. S. (2003). The philosopher's toolkit: A compendium of philosophical concepts
and methods. Malden, MA: Blackwell.
Blackburn, S., & Simmons, K. (Eds.). (1999). Truth. Oxford Readings in Philosophy. Oxford,
England: Oxford University Press.
Kenny, A. (2006). An illustrated history of Western philosophy (2nd ed.). London, England:
Philosophy Bites (n.d.). Home page. Retrieved October 25, 2012 from http://www.philoso