How Chess Players Think-Patrick Turner

How chess players think:
evidence for the role of search at Expert level and below
Patrick Turner
First degree: BSc. (Hons) Mathematics
Open University personal identifier: U6094525
Dissertation submitted for:

MSc. in Psychological Research Methods
March 2005
Abstract
There are two competing views of the dominant mechanism underpinning chess
thinking: pattern recognition or search-and-evaluation. Whilst the recent
development of template theory has gone some way to unifying the two existing
theories, there still remain a great many unanswered questions concerning the nature
of the chess thinking process, in particular the relative contribution of recognition
and search-and-evaluation to chess skill. Although recognition-based theories of
chess thinking do not deny that search is part of the thought process, they emphasise
that recognition of the position provides for highly selective search. Thus an Expert
need not search any faster, or deeper, to arrive at a good move: he narrows down his
search by pattern recognition to focus his analysis on the good moves. Conversely,
search-and-evaluation theories emphasise the ability to search deeper, wider, faster
and more thoroughly, coupled with the ability to evaluate leaf nodes more accurately,
as the basis for the selection of good moves. They do not claim that recognition is not
involved in directing search, merely that it is not the dominant mechanism.
The aim of the research discussed here was to investigate support for both recognition
and search theories of chess skill through experimentation involving chess players at
two levels (Expert and Class A/B) completing a choice of next move task for three
chess positions. Two major conclusions are drawn from the results. Firstly, there is
strong evidence for differences in search capabilities across skill levels in chess
players, supporting the results of Gobet (1998a) and others. Such evidence argues
against the basis of de Groot's main conclusion (1965) that recognition is the
dominant mechanism underpinning chess skill. Proponents of template theory (e.g.
Gobet & Simon, 1998a) argue that such continued results for search differences
across skill levels do not undermine the recognition-based theory of chess skill itself.
The second major conclusion to be drawn, however, suggests that there is less support
for the role of recognition than in previous studies, such as Gobet's (1998a). It may
be that the results hold only between Class A/B players and Experts. This would
provide evidence that the better players at club level are superior primarily
because of their search capabilities and not recognition. A different model of chess
skill may be required for players below the level of Master.
Table of contents
Introduction
Literature review
Methodology
Analysis
Project Review
Conclusions
Appendix I: de Groot positions
Appendix II: Protocol analysis
Bibliography
Introduction
The game of chess provides an ideal environment for the study of human
decision-making in complex domains. As such, it has provided the basis for a
number of studies into human cognition, including perception, memory and
decision-making. Over the decades following the publication, in 1965, of
Adriaan de Groot's original research into chess thinking, there have emerged two
schools of thought concerning how chess players think: the family of
recognition-based theories typified by chunking theory, due to de Groot (1965),
Chase & Simon (Gobet and Simon, 1998a, 1998b; Gobet, 2004) among others;
and the search-and-evaluation theory of Holding (Holding, 1985; Gobet 2004).
Whilst the recent development of template theory has gone some way to unifying
the two theories, there still remain a great many unanswered questions
concerning the nature of the chess thinking process, in particular the relative
contribution of recognition and search-and-evaluation (often simply referred to
as search) to chess skill.
The structure of chess thinking

The two theories agree on the basic structure of the chess thought process. De
Groot (1965) showed that this process can be represented as a sequence of
mental operations on not only the perceived position that the player is confronted
with but also imagined positions as might occur if certain sequences of moves
are played, a development of Selz's Framework of Productive Thinking (de
Groot, 1965). Briefly, the chess thinking process comprises three main phases:
a phase of orientation, noting possible threats, plans and candidate moves; a
phase of elaboration, within which specific sequences of moves are considered
("I move here, then he moves here" etc.), each of which terminates in an
evaluation of the desirability of an imagined position (a leaf node); and a final
phase within which the best move so far considered may be checked before the
player commits to it (de Groot 1965, pp100-116). It is within the middle phase
that search activity is carried out. Although recognition-based theories of chess
thinking do not deny that search is part of the thought process, they emphasise
that recognition of the position (and good moves or general plans to undertake in
such a position) serves to make search activity highly selective. Thus an expert
player need not search any faster, or deeper, to arrive at a good move: he
narrows down his search by recognition to focus his analysis on the good moves.
Conversely, search-and-evaluation theories emphasise the ability to search
deeper, wider, faster and more thoroughly, coupled with the ability to evaluate
leaf nodes more accurately, as the basis for the selection of good moves. They
do not claim that recognition is not involved in directing search, merely that it is
not the dominant mechanism.
Newell & Simon (1972) formalised de Groot's framework in the Problem
Behaviour Graph (PBG) model. A PBG characterises the phase of elaboration in
chess thinking, where search is undertaken. It consists of sequences
of moves, each beginning with a candidate move (or base move) and alternating
between moves for each side, with possible branching in each sequence. Each branch
ends in a leaf node and each leaf node is evaluated, usually only as "good" or
"bad" for the player on move. As such, PBGs allow for the extraction of search
variables such as number of nodes searched and maximum depth of search.
It is more difficult to extract variables characterising recognition, although
the number of base moves considered serves to characterise option generation
before any search is conducted.
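The extraction of such variables from a coded protocol can be illustrated with a minimal Python sketch. It is not the coding scheme of Newell & Simon: the Node class, the function names and the example moves are hypothetical, introduced only for illustration, and depth is assumed to be counted in single moves (plies), with the base move at depth 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One imagined position, reached by playing `move` from its parent."""
    move: str
    evaluation: Optional[str] = None  # e.g. "good" or "bad"; set on leaf nodes only
    children: List["Node"] = field(default_factory=list)


def count_nodes(node: Node) -> int:
    """Number of positions in the sequence(s) rooted at this node."""
    return 1 + sum(count_nodes(child) for child in node.children)


def max_depth(node: Node) -> int:
    """Longest line below this node, in plies (assumption: base move = depth 1)."""
    return 1 + max((max_depth(child) for child in node.children), default=0)


def search_variables(episodes: List[Node]) -> dict:
    """Summarise a list of episodes, each rooted at its base move."""
    return {
        "nodes_searched": sum(count_nodes(e) for e in episodes),
        "max_depth": max((max_depth(e) for e in episodes), default=0),
        "base_moves_considered": len({e.move for e in episodes}),
    }


# Hypothetical protocol: two episodes, the first of which branches once.
episodes = [
    Node("Bxd5", children=[
        Node("exd5", children=[Node("Qf3", evaluation="good")]),
        Node("Bxd5", evaluation="bad"),
    ]),
    Node("Nxc6", children=[Node("bxc6", evaluation="bad")]),
]
variables = search_variables(episodes)
```

For this invented protocol the sketch counts six nodes, a maximum depth of three plies and two distinct base moves. Other definitions of depth or of a node would give different figures.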
Aims
The aim of the research discussed here was to investigate support for both
recognition and search theories of chess skill through experimentation involving
chess players of different calibres completing a choice of next move task for a
small number of chess positions with varying character.
The experimental aims were to establish significant differences in choice of next
move and search behaviour across two groups of chess players of differing
calibres, for three different chess positions. This was to be achieved through the
application of de Groot's experimental procedure and using the analysis methods
of de Groot (1965) and Newell & Simon (1972). Data from the most recent
study of this kind, that of Gobet (1998a), were also to be used for comparisons of
results.
The specific research questions included:
Do club-level chess players of differing calibres differ in terms of quality
of move selection?
Do club-level chess players of differing calibres differ in terms of
capacity of search, mean and maximal search depth, and thoroughness of
search?
To what degree do the levels of search activity in club-level players fit
with existing models of chess thinking?
Novelty
The experimentation and analysis outlined above is not completely novel. It
draws much of the experimental procedure, analysis methods and study variables
from existing research in the field, such as de Groot (1965) and subsequent
replications of that original set of experiments (Newell & Simon, 1972; Gobet,
1998a). It is novel in two respects, however:
It comprises a repeated measures choice of next move task across three
positions; each of the three studies named above focused only on one
position;
It samples from club-level players only (Experts down to Class B) and
therefore serves to test some of de Groot's original conclusions, which
were based on an extremely high calibre sample including Grandmasters.
Motivation for this dissertation

The choice of subject matter for this dissertation is motivated by twin interests in
human decision-making in naturalistic settings and empirical research into
human decision-making. An enduring methodological problem that human
decision-making research faces is the design of experiments that both preserve
ecological validity (i.e. a naturalistic decision-making setting and task) and
enable the valid measurement of important variables. Chess is a rare case of a
structured and bounded decision-making environment that still affords
ecologically valid, yet well-defined, experimentation.
Structure of this dissertation

The remainder of this dissertation is structured as follows:
The Literature Review introduces the main arguments for both
recognition- and search-based theories of chess skill.
The Methodology chapter outlines the experimental design, experimental
procedure and analysis techniques undertaken.
The Analysis chapter sets out the results and analysis from the
experiment.
The Project Review reflects upon the changes in focus for the research
throughout its course, including modifications to the design, the success
of the experiment, the focusing of the analysis and the validity of the
methods.
The Conclusions chapter revisits the main findings of the analysis in the
context of the original research questions and the wider debate
concerning the nature of chess skill.
Literature review
The game of chess is ideally suited to a range of studies in cognitive psychology,
particularly memory, expertise and decision-making. Success at chess is
completely dependent upon skill and, whilst the configuration of the board and
pieces, and the rules of the game can be understood relatively quickly, a typical
chess position offers a non-trivial decision-making task, even for highly skilled
players. This is because of the inherent complexity that the game offers and,
although information about each position is known perfectly and the ultimate
goal of the game is certain, this complexity renders chess a credible domain of
interest for the study of human decision-making. There is also a substantial
amount of psychological literature on chess, perhaps because of the relatively
simple manner in which experiments can be conducted.
Cognitive psychology and chess enjoy a history of over a century of research; the
key question that has engaged psychologists throughout has been, "What
constitutes skill at chess?" Although it is generally agreed that chess skill is
based upon both recognition (the ability to match patterns based on the
possession of good patterns) and look-ahead search (essentially the ability to
compute sequences of moves), opinions are polarised and there are distinct
camps that espouse the dominance of one mechanism over the other.
Most modern research on chess skill has its foundations in the studies of the
Dutch international chess master Adriaan de Groot, whose original experiments,
conducted between 1938 and 1944, served to develop both theories of expertise
and decision-making, and corresponding experimental methods. The remainder
of this chapter is divided into sections, each of which discusses a key
development in one or both of the competing recognition-based and search-based
theories of chess skill.
The role of recognition: de Groot

De Groot (1965) was concerned with the thought processes underlying expert
chess players' choice of next move decisions. His main experiment was a
choice of next move task, conducted with a relatively small sample of good
chess players, ranging from grandmasters (including Alexander Alekhine and
Max Euwe) to Class C players (approximately average club level). De Groot
used a set of chess positions, typically middlegame positions taken from games
which he had played. De Groot set these positions up on a chessboard and asked
his subjects, assuming the role of the player on move, to think of a move and
play it on the board as if they were involved in an actual tournament game. The
only extra stipulation was that the subject thought aloud while doing so, so that
de Groot could record the way in which the subject arrived at his or her next
move. (This method is discussed in further detail in the next section.)
De Groot recorded each subject's thoughts as a verbal protocol which he then
coded, using Otto Selz's framework of productive thinking (de Groot, 1965). De
Groot was motivated by Selz's framework, which described thinking as a
"hierarchically organised linear series of operations" (de Groot, 1965, p vi) and,
in fact, sought to test it through the coding of the protocols. De Groot
demonstrated that he could successfully represent the protocols within this
framework, which, at the macro-structural level, comprises three phases: a first
phase of orientation that may include a listing of candidate moves for
consideration; a phase of elaboration whereby candidate moves are examined in
detail through the consideration of possible sequences of moves that they
precipitate; and a final phase in which a move is selected, possibly following
some form of summarisation. De Groot's coding, which was later formalised by
Newell and Simon (1972) as a Problem Behaviour Graph (PBG), captured the
history of all sequences of moves, each beginning with a base move (candidate
next move) considered by the subject. Such sequences included branching,
whereby the subject considered two or more possible sequences from some
branching move coming after the base move. Each sequence terminated in an
evaluation (positive, negative or unexpressed). Since this coding captured all the
moves considered it allowed for the reinvestigation of base moves.
De Groot did not expose every player to every position; positions A, B and C
were most commonly used and de Groot chose only to extract quantitative
variables from the encoded protocols for these positions (seen by 19, 6 and 6
players, respectively). These variables included the chosen move, the time taken
for each phase, the ordered sequence of base moves considered (candidate next
moves), the total number of moves, and variables concerning the frequency of
both immediate and non-immediate reinvestigations. De Groot had also analysed
positions A to C extensively to generate an order of move quality for each of
the legal moves in each position.
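As an illustration of how the reinvestigation variables might be derived from the ordered sequence of base moves, the following Python sketch counts both kinds. It rests on an assumed reading of the terms: an immediate reinvestigation returns to the base move of the directly preceding episode, and a non-immediate reinvestigation returns to a base move considered earlier, after at least one other base move has intervened. The function name and the example sequence are hypothetical.

```python
from typing import List, Tuple


def count_reinvestigations(base_moves: List[str]) -> Tuple[int, int]:
    """Return (immediate, non_immediate) reinvestigation counts.

    `base_moves` lists the base move of each episode in the order in
    which the episodes occur in the protocol.
    """
    immediate = 0
    non_immediate = 0
    seen = set()
    previous = None
    for move in base_moves:
        if move == previous:
            immediate += 1        # same base move as the episode just before
        elif move in seen:
            non_immediate += 1    # returned to after considering another move
        seen.add(move)
        previous = move
    return immediate, non_immediate


# Hypothetical ordered sequence of base moves from one protocol.
sequence = ["Bxd5", "Bxd5", "Nxc6", "Bxd5", "Nxd5", "Nxc6"]
immediate, non_immediate = count_reinvestigations(sequence)
```

In this invented sequence the second episode is an immediate reinvestigation of Bxd5, while the fourth and sixth episodes are non-immediate reinvestigations of Bxd5 and Nxc6 respectively.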
De Groot's first result was that stronger players chose better quality moves
than weaker players. Secondly, there was little difference between masters and
Experts [1] on the various search variables, including the total number of moves
considered (typically fewer than 100), depth of search or rate of search (number of
moves per minute). De Groot then asserted "the master does not necessarily
calculate deeper, but the variations that he does calculate are much more to the
point; he sizes up positions more easily and, especially, more accurately" (1965,
p320). Although de Groot stated that he still expected greater search abilities in
high calibre players, he conceded that such differences did not explain the
observed performance differences. Having failed to establish skill differences on
these search variables, de Groot therefore conducted a second experiment based
on a recall task, originally conducted in flawed form by Djakow, Rudik and
Petrowski in 1927 (Gobet 2004). Players were exposed to 16 positions, taken
from relatively obscure master games, each for a short length of time (between 2
and 15 seconds). After each presentation the player was requested to reproduce
the position verbally and de Groot developed a scoring scheme for assessing the
corresponding verbal protocols. The results showed, significantly, that
grandmasters outperformed weaker players. De Groot inferred that experience
(in its effect upon perceptual processes) was the contributory factor, asserting
that "the position is perceived in large complexes, each of which hangs together as
a genetic, functional and/or dynamic unit. For the master such complexes are of
a typical nature" (1965, p329, italics from original text). De Groot also
suggested that eye movements undoubtedly come into play, a hypothesis
proved, in 1996, by de Groot and Gobet (Gobet, 2004). De Groot conducted a
detailed analysis of the verbal protocols for the recall task and identified
content-specific themes that demanded differing degrees of attention. It is interesting to
compare this approach with the quantitative (information-theoretic) approach of
Chase and Simon in the development of chunking theory (see below).
[1] Experts is capitalised when referring to the class of players directly below masters and not
capitalised when referring, in general, to people possessing expertise.
Returning to the results of the choice of next move experiment, one of de
Groot's innovations was an extension of the Selzian framework of productive
thinking. De Groot noticed that players employed a method that he denoted
progressive deepening: the reinvestigation of sequences emanating from the
same base move several times, either immediately or non-immediately, with the
tendency to search both progressively wider (examining more branches) and
deeper each time before evaluating at leaf nodes. This is referred to as "rough
cut, fine cut" by Newell and Simon (1972, p752). Selz's concept of subsidiary
methods stated that human problem solving is based on, essentially, exhaustive
depth-first search in support of one plan followed by depth-first search for a
second plan if the first fails etc. (where "plan" defines the context of evaluation
of leaf nodes). De Groot effectively redefined exhaustiveness in relative terms
(1965, p270). This allowed for the reinvestigation of any base move, with the
examination of ever deeper and wider extensions to the search tree emanating
from each move. De Groot proposed that the varying criteria by which a
sequence is considered to be exhausted upon investigation or reinvestigation,
and thus the criteria by which the corresponding base move is evaluated as good
or not, are based on recognition.
De Groot's main conclusion, across both of his experiments, was that
recognition (based on the possession of perceptual chess-specific knowledge)
and the application of an effective set of heuristic goal-driven rules were
the major components of chess skill. The identification of recognition, in
particular, as a key mechanism refuted the then commonly held view that chess
skill was innate, and it had a large impact on theories of expertise that still persists.
Information processing and Problem Behaviour Graphs

The representation of human problem solving in the Selzian framework was
attractive to Herbert Simon, who viewed such an activity as, essentially,
information processing. Simon was also the originator of the concepts of
bounded rationality, which states that there are limits on human information
processing that, in turn, impose limits on human rationality, and satisficing,
which describes the sufficient, yet sub-optimal, human approach to decision-making
where bounded rationality is enforced, e.g. due to the complexity of the
decision-making environment. Chess is certainly one such environment and
there are clear parallels between satisficing and de Groot's progressive
deepening, the latter of which seeks a positive evaluation of a move even though
a thorough analysis may be lacking.
In 1965, Newell and Simon (1972) reinvestigated and replicated de Groot's
choice of next move experiment with the aim of investigating whether the
human decision-maker, in selecting his next move in chess, could be considered
an Information Processing System (IPS) and whether a thorough task analysis
would enable them to enrich their IPS model. Newell and Simon advocated the
elicitation of verbal protocols but emphasised their quantitative analysis rather
than de Groot's extensive qualitative analysis. As such they built on de Groot's
enhanced Selzian framework and formalised the coding of the verbal protocol as
a Problem Behaviour Graph (PBG).
A PBG is a descriptive chronological model of an individual's thinking
throughout the course of a problem-solving task. It concerns the navigation of a
human decision maker along sequences of linked nodes, each representing some
projected state of the environment with links representing the application of an
operator to a previous node. This forms a chronologically ordered set of sequences
of linked nodes, possibly with branching (representing the conception of two
different operators on a given node), ending at given leaf nodes. A PBG for
choosing the next move in a chess position represents, as nodes, future chess
positions that may be arrived at through the application of a sequence of moves
for white and black. Each initial move, or base move, represents one of the candidate
moves that a player conceives, and chooses from, in completing the task. Each
sequence terminates at a leaf node with an evaluation (including a non-evaluation) of the
position at that point. Note that a PBG is not equivalent to a search tree because
the latter models all sequences of moves considered by the chess player in
selecting his next move once only, whereas a PBG provides a chronological view
of that player's considerations. As such, a PBG may contain a number
of sequences beginning with the same base move, which may or may not be
different (indeed, identical sequences may or may not include different
evaluations). Whilst most of the work underpinning PBGs is due to Selz and de
Groot, Newell and Simon added the graphical formalism. To differentiate
between different sequences, they redefined de Groot's sub-phases as
episodes: distinct chains of reasoning beginning with a base move, whether it be
different from or the same as that considered beforehand.
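The distinction between a PBG and a search tree can be illustrated with a small Python sketch. It is a simplification made for illustration only: each episode is assumed to be a single unbranched line of moves, evaluations are omitted, and the moves and function names are hypothetical. The chronological list of episodes stands in for the PBG, while the nested dictionary built from it stores each distinct sequence once only, as a search tree would.

```python
from typing import Dict, List

# Chronological record of episodes (a stand-in for a PBG). Each episode is
# one line of moves starting from a base move; the moves are invented.
episodes: List[List[str]] = [
    ["Bxd5", "exd5", "Qf3"],
    ["Nxc6", "bxc6"],
    ["Bxd5", "exd5", "Qf3"],          # identical sequence, reinvestigated
    ["Bxd5", "exd5", "Qf3", "Qd8"],   # same base move, searched one ply deeper
]


def build_search_tree(lines: List[List[str]]) -> Dict[str, dict]:
    """Merge all lines into a nested dict in which each move sequence occurs once."""
    tree: Dict[str, dict] = {}
    for line in lines:
        level = tree
        for move in line:
            level = level.setdefault(move, {})
    return tree


def tree_size(tree: Dict[str, dict]) -> int:
    """Number of distinct positions (nodes) in the merged tree."""
    return sum(1 + tree_size(subtree) for subtree in tree.values())


pbg_moves = sum(len(line) for line in episodes)  # every move, in order of mention
tree = build_search_tree(episodes)
tree_moves = tree_size(tree)                     # distinct positions only
```

Here the chronological record contains twelve moves across four episodes, whereas the merged tree contains only six distinct positions under two base moves. The difference reflects exactly the reinvestigations that the chronological view preserves and the tree discards.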
The advantage of the PBG formulation is that it provides for the quantitative
analysis of the search-and-evaluation process. Newell and Simon (1972)
examined the quantitative variables derived from the protocol of a single subject
(S2) and compared them with those of de Groot's sample, noting the consensus
in results in terms of both quality of move and decision-making method. In
particular, S2 exhibited progressive deepening.
Perhaps the most important contribution of Newell and Simon's 1965 research
was their detailed analysis of the search strategies of S2 and de Groot's subjects.
They proposed a small number of principles for the generation of moves and
episodes, essentially an attempt at naming the heuristic rules that de Groot had
suggested contributed to chess skill. Newell and Simon did not find much
evidence, in the protocols, of means-ends analysis (goal-setting and the
identification and analysis of means, i.e. moves, to achieve those goals),
although they noted both that all protocols studied concerned position A, a
highly tactical position in which strategic plans are of less consequence, and
that de Groot had observed numerous examples of goal-setting in more strategic
positions (1965, pp157-9). Despite their characterisation of search strategies,
Newell and Simon share de Groot's view on the importance of recognition in
chess skill, particularly upon immediate consideration of a position and prior to
any search: players notice a small number of considerable moves, and do not
notice (or at least do not mention noticing) the large number of remaining legal
moves (Newell & Simon, 1972, p775); that is, there is a perceptual process guiding
search from the outset. This embodies the first phase in de Groot's
macro-structural model of next-move selection.
Chase and Simon's Chunking theory

Chunking theory emerged from the 1973 experiments of Chase and Simon
(Gobet, 2004) as a general theory of expertise, originally applied to chess. In
line with de Groot's conclusions, it asserts that recognition is the key mechanism
underpinning expertise. In the experiment, three classes of player (Masters,
Experts and novices) were exposed to middle and end-game positions of two
types: positions from actual games and random positions matched for the number
of pieces present. There were two tasks: the recall task was essentially a
modification of de Groot's procedure, although all positions were shown for 5
seconds and the players were subsequently asked to reconstruct them on a chess
board; the copy task differed in that the positions were not hidden from the
players during the reconstruction phase. For the positions drawn from
actual games, success at reconstruction (according to the number of pieces
correctly placed) was found to be proportional to skill level. For the random
positions, however, there were no significant differences across the three groups
of players. Chase and Simon concluded that the improved performance for more
skilled players was not due to any superiority in short-term memory, but to the
recognition of familiar patterns.
Chase and Simon (Gobet 2004) noted that, in both tasks, subjects reconstructed
pieces in groups, as defined by the intervals between piece placement in the
recall task, and by glances at the stimulus position in the copy task; further,
pieces in the same group tended to share more meaningful relations (e.g.
attacking, defending, same colour, same type etc., as judged by skilled players)
than those in different groups. Chase and Simon termed these patterns of pieces
"chunks". The experiment also provided evidence that better players possess
bigger chunks (in terms of number of component pieces) and more chunks.
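The grouping of pieces by placement interval in the recall task can be sketched in Python as follows. The sketch assumes the criterion reported later in this chapter for the 1973 recall experiments, namely that successive placements separated by less than 2 seconds belong to the same burst of activity; the function name, the piece labels and the timings are hypothetical.

```python
from typing import List, Tuple


def segment_into_chunks(
    placements: List[Tuple[str, float]],
    max_latency: float = 2.0,
) -> List[List[str]]:
    """Group placements into chunks by the interval between placements.

    `placements` holds (piece, time in seconds) pairs in order of placement.
    A piece joins the current chunk when it follows the previous piece by
    less than `max_latency` seconds; otherwise it starts a new chunk.
    """
    chunks: List[List[str]] = []
    previous_time = None
    for piece, time in placements:
        if previous_time is None or time - previous_time >= max_latency:
            chunks.append([])
        chunks[-1].append(piece)
        previous_time = time
    return chunks


# Hypothetical reconstruction: piece labels with placement times in seconds.
placements = [
    ("Kg1", 0.0), ("Rf1", 0.8), ("Pg2", 1.5),
    ("Qd8", 4.5), ("Nf6", 5.2),
    ("Bc4", 9.0),
]
chunks = segment_into_chunks(placements)
```

With these invented timings the six pieces fall into three chunks of three, two and one pieces. Changing the threshold changes the segmentation, which is why the choice of latency matters to the theory.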
Chase and Simon (Gobet 2004) asserted that chunks are stored in short-term
memory (STM) as pointers to patterns encoded in semantic long-term memory
(LTM). Essentially, chunks are akin to the conditions of productions in LTM
that associate patterns with moves. Chase and Simon also expressed time
parameters for the rate of learning (approximately 8 seconds per chunk) and
STM limits (7 chunks, in line with Miller's predictions).
In a second 1973 paper, Chase and Simon also proposed that a secondary
transient memory store, a visuo-spatial store known as the mind's eye, provides
an internal representation of the position upon which mental operations may be
carried out (e.g. the moves suggested by LTM). The position in the mind's eye is
also available to perceptual processes and thus chunks in a projected position
following a potential move may also be perceived and matched against patterns
in LTM. Thus chunking theory offers an explanation of how recognition may be
combined with mental simulation to arrive at good moves. It should be noted,
however, that the mind's eye extension to the theory is not supported by
empirical evidence since the experiment did not include a decision-making task.
Chase and Simon conducted a second experiment to demonstrate the stability of
chunks. The criterion for stability was: a chunk is considered to be repeated if at
least two thirds of its component pieces are recalled together. Stability of chunks
for Class A players was 96%, versus 65% for the master player in the sample.
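The stability criterion can be made concrete with a short Python sketch. It assumes one possible reading of the criterion, which the text above does not spell out in full: a chunk from a first recall counts as repeated if at least two thirds of its pieces appear together in a single chunk of a second recall, and stability is the percentage of first-recall chunks that are repeated. The function names and the example chunks are hypothetical.

```python
from typing import List, Set


def is_repeated(chunk: Set[str], later_chunks: List[Set[str]]) -> bool:
    """True if at least two thirds of `chunk` recurs within one later chunk.

    Integer arithmetic (3 * shared >= 2 * size) avoids comparing floats.
    """
    return any(3 * len(chunk & other) >= 2 * len(chunk) for other in later_chunks)


def stability(first_trial: List[Set[str]], second_trial: List[Set[str]]) -> float:
    """Percentage of first-trial chunks repeated in the second trial.

    Assumes `first_trial` is not empty.
    """
    repeated = sum(is_repeated(chunk, second_trial) for chunk in first_trial)
    return 100.0 * repeated / len(first_trial)


# Hypothetical chunks from two recalls of the same position.
first_trial = [
    {"Kg1", "Rf1", "Pg2"},
    {"Qd8", "Nf6", "Pe6"},
    {"Bc4", "Nc3"},
    {"Pa7", "Pb7"},
]
second_trial = [
    {"Kg1", "Rf1", "Ph2"},
    {"Qd8", "Bc4"},
    {"Nf6", "Pe6", "Nc3"},
    {"Pa7", "Pb7"},
]
percent_stable = stability(first_trial, second_trial)
```

In this invented example three of the four first-recall chunks meet the two-thirds criterion, giving a stability of 75%. The chunk containing Bc4 and Nc3 fails because its two pieces are split across different chunks in the second recall.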
Support for chunking theory comes from Charness (Gobet, 2004) who, in 1974,
conducted a recall experiment with positions presented verbally, at a rapid rate
(average latency 2.3 seconds per piece) in three ways: by Chase and Simon's
relations, by columns (on the board) or randomly. The best recall was found for
Chase and Simon's relations and the worst for the random condition.
Criticisms of chunking theory

Chunking theory was not without its critics, however. The criticisms rest on a
number of bases and include both methodological and theoretical
criticisms. Gobet and Simon (1998b) summarised the methodological criticisms
raised by many authors, including Holding (1985), and highlighted some
methodological concerns of their own, including the small sample size in the
1973 experiments and the one-to-one mapping of pieces placed in a single burst
of activity onto chunks. A single burst of activity was defined, in the 1973
recall experiments, as a sequence of piece placements with latencies of less than 2
seconds between pieces. Gobet and Simon (1998a) argued that this latency may
actually increase over the recall period. Further, a burst of activity is also
dependent upon the physical limitations of picking up all component pieces of a
chunk in one hand. The most outspoken critic of the theory was, perhaps,
Holding (1985), who advocated the roles of both search and conceptual
knowledge (rather than perceptual chunks) in chess skill. Holding's specific
arguments included the following:
Chunks may be encoded into LTM in less than 8 seconds;
The size of chunks is too small to reflect conceptual knowledge;
Although chess skill can explain memory performance, there is no
evidence for a causal relationship in the opposite direction, that is, that
memory (and recognition) explains chess skill.
The first criticism was based on recall experiments in which interpolated tasks
designed to cause STM interference (e.g. Charness's experiment of 1976,
reported in Holding, 1985) had shown no effect on memory performance,
suggesting that LTM encoding for chunks was rapid. The second criticism is
based on Holding's assertion that chunking theory does not provide a sufficient
basis for maintaining that chess memory is organised in small chunks whose
labels are held in STM. Instead it appears that chess players who actively
process the given positions are able to integrate the general characteristics of
these positions in a hierarchical, prototypical or schematic format, not necessarily
based on pairs of pieces, that constitutes an understanding of the positions
(Holding, 1985 p130). Key to this argument is Holding's inspection of both
positions and corresponding chunks from Chase and Simon's experiments. He
claimed that the actual chunks identified bear little relation to the important
playing themes in that same position and concluded: "if we assume that all the
chunks for memorising purposes are to be identified on one basis and the patterns
for move selection on another, the theory loses a good deal of its economy"
(Holding, 1985, p103). Indeed, if we accept the criterion for the stability of chunks
across experiments, it appears that better players perceive positions in a number
of ways (65% stability is a fairly low figure). The final criticism is backed up
with evidence from Holding and Reynolds' (1982) experiment with random
positions. Players of different skill levels from novice to Expert completed two
tasks: the first was a recall task and the second was a choice of next move task on
the corrected positions. As expected, there was no effect of skill on memory, but
there was a significant effect of skill on (assessed) quality of next move.
Holding and Reynolds concluded that the evidence shows that skill differences
continue to appear in situations where recognition by chunking is impossible
(Holding, 1985 p133). In light of such criticisms, Gobet and Simon replicated
the 1973 experiments and made corresponding modifications to the theory
(discussed in Gobet and Simon's template theory, below).
SEEK Theory: the contribution of Holding

Above all of Holding's specific criticisms of Chunking Theory, his central belief
was that it was basically flawed: although he accepted the result that skill has an
effect on memory for meaningful chess positions, he believed that the role of
recognition (based on memory) was insufficient in explaining chess skill.
Holding promoted the importance of search, evaluation and knowledge to chess
skill and expressed this idea in his SEEK theory. It is important to understand
Holding's distinction between the mechanisms of recognition and search
since his use of terminology differs slightly from that of other researchers. To
Holding, recognition defines the key mechanism of Chunking Theory as the
association between perceived patterns (chunks) and good moves without
search. Search involves a combination of planning a selective search through
candidate moves and sequences, and evaluating the utility of these moves to
support next move selection. Perhaps the most confusing aspect of Holding's
definitions is that he asserts that pattern recognition from semantic knowledge
also plays a key role in directing search by suggesting good moves. To Holding,
patterns may be general rather than specific chunks (1985, p174) and the
corresponding recognition mechanism is almost certainly less automatic in its
cueing of moves than that of Chunking Theory. In fact, it appears that search,
in itself, is an extremely low-level skill, involving only focusing one's evaluative
skills on different moves. It should be noted that Holding (and others) refers to
search when he really means the wider set of skills described above, i.e. search,
evaluation and knowledge, all three of which are embodied in SEEK theory.
Holding claimed that, within de Groot's verbal protocols, there was, in fact, a
relationship between skill level and both number of moves considered and speed
of search (number of moves considered per minute), although this was not
statistically significant. He argued that the real effect was obscured by the highly
tactical nature of the only position for which a meaningful number of protocols
were published, i.e. position A. Other studies have supported this claim. In
particular, Charness's 1981 experiment (Holding, 1985; Gobet, 2004), conducted
with 34 skilled players and a balance of tactical and strategic positions different
from those used by de Groot, suggests a linear relationship between skill level (in
terms of Elo points) and depth of search (in terms of number of moves). Holding
reports that average maximal depth of search increases by 1.4 plies per standard
deviation of skill (200 points) and Gobet reports that the average depth of search
increases by 0.5 plies for the same interval.
In 1979, Holding (1985) developed a single scale to evaluate positions on the
basis of advantage to one side over the other using the expert judgement of
skilled players. He then asked 50 Class A-E players to evaluate a set of
quiescent positions, with level material, from actual grandmaster games on this
scale. The players were also asked to select a next move. Evaluations were
scored in comparison with the actual outcomes of the games. The results showed
that there is an effect of skill on evaluations. In Holding and Reynolds' 1982
experiment (Holding, 1985) for recall on random positions, players were also
asked to evaluate the position immediately (after it had been corrected following
the recall task) and after 5 minutes of consideration. There were no skill
differences for correctness of evaluation at either measurement point. Holding
concluded that "evaluative skill is influenced by memory, including generic
[semantic] memory for the type of specific formations that are known to give
rise to advantages and disadvantages" (Holding, 1985 p208).
Holding's main conclusion is that differences in chess skill are due to search,
evaluation and knowledge: "the better players show greater competence in every
phase of the SEEK processes, conducting more knowledgeable evaluations, in
order to anticipate events on the chessboard" (1985, pp255-256).
Gobet and Simon's template theory

Gobet and Simon (1996) set out to test Holding's conclusion by means of a
natural experiment, observing the performance of the then world champion,
Grand Master Garry Kasparov, in both a series of matches of simultaneous games
and tournaments against expert opponents (predominantly Masters and Grand
Masters). The average time afforded to Kasparov for each move was 3 minutes
in tournament play and 3 minutes per round (all matches of simultaneous games,
played against between four and eight opponents). Gobet and Simon reasoned
that the increased time-pressure in the simultaneous games would provide
Kasparov with less time to evaluate moves and, therefore, if Holding's
conclusion were true, he should perform less well in the simultaneous games
than in the tournament. The results showed that Kasparov's performance did not
greatly differ across the two conditions. Indeed, in the simultaneous matches,
Kasparov played at the level of a very strong Grand Master. Gobet and Simon
concluded that it was Kasparov's pattern-matching that accounted for his similar
performances in both simultaneous matches and normal tournament play, and
that this result could be generalised to all expert chess players. This is supported
by a similar result from Calderwood, Klein and Randall (1988).
Gobet and Simon (1998b) asserted that some of Holding's criticisms were valid
(e.g. those concerning LTM encoding and chunk size) whilst others were
incorrect (or had been shown to be incorrect). For example, Holding's result for
skill differences for choice of next move decisions in random positions was
countered by Gobet and Simon's experimental results (1998b) that indicated that
chunking theory does predict a small skill difference in the recall of such
positions, contrary to de Groot's and Chase and Simon's earlier results and
preserving the possibility of a relationship between memory and skill. Gobet and
Simon state that Holding's main issue with chunking theory, that it consists of
pattern recognition without search, is a misunderstanding, since the mind's
eye extension to the theory clearly describes the use of pattern recognition to
support a think-ahead process, thus generating subsequent moves for
consideration (this account also largely equates pattern recognition of non-base
moves with Holding's evaluation mechanism).
In 1996, Gobet and Simon (1998a) replicated Chase and Simon's original
experiments, with some key modifications, including an increased sample size of
26 (ranging from Masters to Class A players) and computer-aiding for the
reconstruction of positions, to eliminate the physical limitations on piece
replacement in the original experiment that may have confounded results on
chunk size. The main results concurred with Chase and Simon's original study;
that is, skill effects on recall in both tasks disappeared for random positions. The
most startling difference in results, however, related to the size of chunks.
Whilst the effect of skill level on chunk size was again present, mean largest
chunk size at all skill levels was greater. In particular, for Masters this figure
was 16.8 in the recall task (compared with 7 in the original experiment), and 14
in the copying task. Moreover, some positions were reconstructed by Masters
using only one chunk.
This new data confirmed Gobet and Simon's development of chunking theory,
namely template theory (1998a, 1998b). Template theory uses the same basic
mechanism as chunking theory, so that chunks are stored in STM as pointers to
patterns in LTM; they are also used to reconstruct visuo-spatial images in the
mind's eye (the secondary transient memory store). Gobet and Simon stated that
the more typical the position, the stronger the associations that chunk will have
with semantic memory, including moves, plans and other patterns. Further, they
proposed that such positions are actually represented by templates, which are
essentially chunks with slots for variables. They therefore comprise a core
chunk and their parameters allow them to describe a range of chunks within a
class defined by the range of variable values. Templates can provide for large
constellations of pieces to be considered together where large chunks alone
cannot, since the number of chunks with, e.g. more than 10 pieces, required to
hold all meaningful patterns on those pieces would be unmanageably large.
Templates, instead, provide for the redundancy that occurs because classes of
chunks tend to share good moves, plans, tactical and strategic features etc. Gobet
and Simon emphasise, within template theory, the associations of chunks
and templates with semantic knowledge. As with chunking theory, the authors
suggested a learning time of 8 seconds for chunks and templates. Two learning
parameters are proposed. Gobet and Simon also assert that, like chunking
theory, template theory is "not limited to chess" (Gobet 1998b p.127).
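The notion of a template as a core chunk with variable slots can be illustrated with a small data structure. The sketch below is purely illustrative and is not Gobet and Simon's implementation: the class name `Template`, the square-to-piece dictionary encoding and the example pieces are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical encoding: a position maps occupied squares to piece codes.
Position = Dict[str, str]


@dataclass
class Template:
    """A core chunk plus slots whose contents may vary (illustrative only)."""
    core: Position  # pieces that must be present on given squares
    # slot square -> set of permitted contents (None means the square is empty)
    slots: Dict[str, Set[Optional[str]]] = field(default_factory=dict)

    def matches(self, position: Position) -> bool:
        # Every piece of the core chunk must stand on its square.
        core_ok = all(position.get(sq) == piece for sq, piece in self.core.items())
        # Each slot must hold one of its permitted values.
        slots_ok = all(position.get(sq) in allowed for sq, allowed in self.slots.items())
        return core_ok and slots_ok


# Assumed example: a castled white king with two variable squares.
castled_king = Template(
    core={"g1": "wK", "f1": "wR", "g2": "wP", "h2": "wP"},
    slots={"f3": {"wN", None}, "f2": {"wP", None}},
)
with_knight = {"g1": "wK", "f1": "wR", "f2": "wP", "g2": "wP", "h2": "wP", "f3": "wN"}
uncastled = {"e1": "wK", "h1": "wR", "f2": "wP", "g2": "wP", "h2": "wP"}

print(castled_king.matches(with_knight))
print(castled_king.matches(uncastled))
```

A single template of this kind covers every combination of slot values, which is the economy the theory claims over storing each large chunk separately.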
Template theory served to address the outstanding criticisms of chunking theory
in the following ways. The null effect of interference for recall of chess
positions could be accounted for by chunk size, since if fewer STM pointers are
required to encode a single position (possibly only one for Masters) then noise
will not necessarily eradicate that memory. Likewise, Holding's criticisms on
chunk size and conceptual knowledge were countered by direct modifications to
the theory, which were supported by empirical evidence. Finally, Gobet (1998a)
has used template theory to explain skill differences for search variables; this is
discussed in the next section.
The integration of pattern recognition and search

Gobet (1998a) conducted a replication of de Groot's choice of next move
experiment with 48 Swiss players (ranging from Master to Class B) using de
Groot's position A, and conducted an extensive analysis of the resultant verbal
protocols, including the generation of problem behaviour graphs (Newell &
Simon, 1972) and the extraction of the same quantitative variables as de Groot,
with the aim of comparing results and reinvestigating the effects of search
variables on quality of next move. Gobet was motivated both by empirical
evidence that opposed de Groot's result that search variables did not differ across
skill levels, e.g. due to Charness (Gobet 2004), and by the lack of replication of
de Groot's original experiment; he was undoubtedly also motivated to seek
empirical evidence to support his own work at that time with Herbert Simon in
developing template theory, since although the research was published in 1998,
the original data was collected as part of a different study in 1986. As well as a
small skill difference for the mean depth of search, Gobet discovered a skill
effect for the way in which progressive deepening was conducted. The variables
in the study characterising progressive deepening behaviour related to the
number of reinvestigations of sequences starting with the same base move; these
were sub-divided into immediate reinvestigations (same base move considered
twice in succession) and non-immediate reinvestigations (same base move
considered twice with at least one different base move considered in between),
and also maximal and total values, with the former providing the largest number
of reinvestigations (immediate or non-immediate) among all base moves
considered. The maximal number of immediate reinvestigations had a positive
association with skill level and the maximal number of non-immediate
reinvestigations had a negative association with skill level.
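The definitions of immediate and non-immediate reinvestigations given above can be sketched as a simple count over the ordered base moves of a protocol's episodes. This is a minimal sketch under the stated definitions only; the function name, the input format and the example move sequence are assumptions, and the published coding scheme may treat edge cases differently.

```python
from collections import Counter


def count_reinvestigations(base_moves):
    """Count immediate (IR) and non-immediate (NIR) reinvestigations.

    base_moves: the base move of each episode, in the order the episodes
    occurred (hypothetical input format).
    """
    ir, nir = Counter(), Counter()
    seen = set()
    previous = None
    for move in base_moves:
        if move == previous:
            ir[move] += 1    # same base move considered twice in succession
        elif move in seen:
            nir[move] += 1   # returned to after at least one different base move
        seen.add(move)
        previous = move
    return {
        "total_ir": sum(ir.values()),
        "total_nir": sum(nir.values()),
        "maximal_ir": max(ir.values(), default=0),    # largest IR for any one base move
        "maximal_nir": max(nir.values(), default=0),  # largest NIR for any one base move
    }


# Invented sequence of episode base moves, for illustration only.
episodes = ["Bxd5", "Bxd5", "Rc2", "Bxd5", "Bxd5", "Rc2"]
summary = count_reinvestigations(episodes)
print(summary)
```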
Gobet's main conclusions were that players in his sample differed along more
dimensions than those in de Groot's sample, and that the average values on all
variables (pooled across skill levels) did not differ significantly between studies.
Gobet notes that the differences he found within his sample were mainly between
Masters and Class players. Since de Groot's sample only included 2 players at
Class level, it is perhaps not surprising that such differences did not show up in
the original experiments.
Importantly, Gobet claims that his skill effects for search can still be accounted
for by pattern recognition models of chess thinking because sequences of moves
are likely to be associated with patterns: "pattern recognition should facilitate the
generation of moves in the mind's eye, permitting a smooth search" (1998a p24).
Saariluoma presented further evidence of the pattern-recognition-based search
hypothesis (Gobet 1998a, 2004) with his smothered mate experiment, in which
high calibre players were asked to choose a move that would lead to mate in a
specially devised endgame. The position was one that had an efficient, yet
unusual, sequence of moves that led to mate as well as a longer, more familiar
sequence. Players tended to choose the move at the beginning of the stereotyped
sequence.
Summary
In summary, the relative influences of recognition and search-and-evaluation on
chess skill are not fully understood. Further, the degree to which these are, in
fact, separate processes rather than alternative descriptions of the same process,
is unclear. Certainly most advocates of either theory believe that both
recognition and search mechanisms are fundamental to chess skill. For example,
de Groot's (Gobet, 2004, p120) assertion that recognition serves to direct the
look-ahead search-and-evaluation suggests that these processes are, in some
sense, interdependent. Further, Holding's (1985) conclusion that
search-and-evaluation is the dominant process is based on the assertion that better players
plan these evaluations in a more effective way. Yet Holding's "knowledgeable
evaluations" (1985, p256) might well be directed by effective pattern-matching,
which is essentially de Groot's conclusion. Gobet and Simon's template theory,
developed in part due to criticisms of chunking theory from advocates of
search-and-evaluation, provides for a credible explanation of skill differences for search
(if it is accepted that templates can store sequences of moves). This extended
theory apparently leaves no room for alternative explanations of chess skill
wherever it could be argued that patterns exist (e.g. any experimentation
involving real chess positions). It therefore offers the possibility of unifying both
recognition-based and search-based theories. To refine the template theory
explanation of skill differences on search variables, further data concerning such
differences would be of great benefit.
Further, the balance of chess research has been in favour of recall tasks, rather
than choice of next move tasks. The attractions of recall tasks (over choice of
next move tasks) in explaining chess skill are the objectivity of the measures and
the ease with which data can be analysed. Since chess skill is primarily
concerned with decision-making, however, it seems strange that there are not
more studies based on the choice of next move task. Finally, research based on
the choice of next move task, perhaps because of the analytical overheads the
task usually imposes, tends to focus on a small number of positions, often only
one, notably Gobet (1998a). An obvious danger in generalising results from a
single position is that any position effects are discounted.
Methodology
This chapter outlines the experimental design, procedure and analytical methods
employed in the research. It also includes an ethical section. The ecological
nature of the experimentation in this study meant that a great deal of relatively
unstructured data (verbal protocols) were generated through the experimental
procedure. These data were subjected to a detailed and structured (qualitative)
protocol analysis that provided a set of quantitative variables to be entered into
statistical analyses. The intermediate results of the protocol analysis offer the
best means of conveying this part of the methodology and serve to precipitate the
relevant section of the Project Review. Appendix II therefore contains details of
the protocol analysis, including an example verbal protocol and Problem
Behaviour Graph (PBG).
Participants
Eight male chess players from four different clubs in Worcestershire and the
West Midlands took part in the experiment. Although their ages were not
recorded, all had been playing chess as graded players for between 30 and 45
years (mean 34.75 years, standard deviation 5.39 years). Their British Chess
Federation (BCF) grades were converted into the Fédération Internationale des
Échecs (FIDE) standard Elo ratings using the BCF conversion formula (BCF,
2003) and subsequently mapped onto United States Chess Federation (USCF)
class divisions to facilitate comparisons between the results of this experiment
and those of existing studies (e.g. Gobet, 1998a). The players were assigned to
two skill levels according to their equivalent USCF class as described in Table 1,
below.
                                Level 1 (Expert; n=4)   Level 2 (Class; n=4)
Sample mean (BCF grading)       168                     120
Sample mean (FIDE Elo rating)   2087                    1849
Equivalent USCF class           Expert                  Class A / Class B
Equivalent Elo rating band      2000-2200               1600-2000
Table 1; Description of Skill levels of experiment players
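The conversion and class mapping summarised in Table 1 can be sketched as follows. The text cites the BCF conversion formula without stating it; the linear form used here (Elo = 5 x BCF + 1250) is an assumption, chosen because it reproduces the sample means in Table 1 to within rounding. The function names are likewise hypothetical.

```python
def bcf_to_elo(bcf_grade):
    # ASSUMPTION: Elo = 5 * BCF + 1250. The dissertation cites "the BCF
    # conversion formula (BCF, 2003)" but does not state it.
    return 5 * bcf_grade + 1250


def uscf_class(elo):
    # USCF classes are 200-point bands; only those relevant to this sample
    # are named individually.
    bands = [(2200, "Master or above"), (2000, "Expert"),
             (1800, "Class A"), (1600, "Class B")]
    for lower_bound, name in bands:
        if elo >= lower_bound:
            return name
    return "Class C or below"


# Sample mean BCF grades from Table 1.
for grade in (168, 120):
    elo = bcf_to_elo(grade)
    print(grade, elo, uscf_class(elo))
```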
Materials
Three chess positions were used in the experiment. They were positions A, B1
and C of de Groot's original choice of next move experiments (de Groot, 1965
pp88-93) and were labelled A, B and C, respectively. They were depicted as
standard chess position images on A4 card, complete with full move histories for
the games from which they were taken. The positions themselves can be found
in Appendix I.
Portable digital recording equipment, and pen and paper, were also used in the
experiment. The recording time display on the equipment was made available to
the players in place of a chess clock.
Experimental Design and procedure

A 2 x 3 mixed design experiment was conducted using the following
independent variables: Skill (Expert; Class), a between-subjects variable, and
Position (A; B; C), a within-subjects (repeated measures) variable.
The experiment, which was conducted with each participant individually and in a
quiet and undisturbed environment, consisted of a single choice of next move
task repeated across three conditions, defined by the three positions described
above (A, B and C). The procedure was essentially the same as in the original de
Groot experiments of 1938-43 (de Groot, 1965). Before the first task began the
experimenter instructed the player that he would be presented with the positions
one by one and, for each, would be required to choose his next move, as if he
were playing over the board in normal tournament play; the only difference being
that he was requested to think aloud as he did so. The experimenter clarified that
thinking aloud was not the same as providing a commentary on one's thought
process, i.e. it was simply a natural verbal expression of thought. Further, the
player was informed that the positions were from real games and were not chess
problems (typified by a single provable winning move); and that there were no
time limits imposed, although a guideline was provided: that the player should
aim to spend as much time on the task as they might reasonably expect to in a
tournament game. Once the experimenter had checked that instructions had been
understood and had gained the player's informed consent for their participation,
the task began.
The conditions were conducted sequentially with the offer of a short break
between each if required. The position was presented to the player at the same
time the recording began. Thereafter the experimenter only intervened if asked a
direct question concerning procedure or if the participant had remained quiet for
a period of approximately 30 seconds; in the latter case the experimenter
prompted the player by asking, "What are you thinking now?" Throughout the
recording and wherever necessary, the experimenter noted questions for
clarification. At the end of each condition the recording was stopped and the
experimenter requested clarification accordingly. Most such instances concerned
a misreported or unspecified move, piece or square.
Upon completion of the three iterations of the choice of next move task the
experiment concluded.
Protocol Analysis
The data collected from the experiment consisted of a single verbal protocol for
each of the eight players in each of the three positions, giving 24 protocols in total. Each
protocol was transcribed into tabular format and used to generate a Problem
Behaviour Graph (PBG) according to the coding scheme set out in de Groot
(1965), Newell & Simon (1972) and Gobet (1998a). Appendix II describes the
coding scheme in greater detail and includes an example verbal protocol and the
PBG that was generated from it. It also provides definitions of the important
elements of PBGs from which the quantitative variables may be extracted.
Derivation of quantitative variables

Table 2 describes the set of quantitative variables derived from each graph, and
their means of derivation. Although most of these variables were originally
devised by de Groot (1965) and also used by Gobet (1998a), two were novel and
are indicated in the table.
Variable: Description
Quality of Move: Subjective assessment of the quality of the chosen move (see Appendix A for the derivation of scores).
Total Time: Total time taken for choice of next move: time elapsed from initial presentation of position to confirmation of next move selection.
Time of First Phase: Total time elapsed before first Episode begins.
Number of Base Moves: Number of distinct base moves (null moves permitted).
Number of Episodes: Number of distinct Episodes of problem-solving behaviour.
Number of Nodes: Number of nodes (moves) considered, including repeated and null moves.
Total Depth: Aggregate of search depths for each episode, with null moves included in the totals. Episodic depth is defined by the longest sequence of moves, beginning with the base move, among all branches. This variable is only measured to enable the calculation of Mean and Maximal Search Depths.
Maximal Depth of Search: The maximal number among all episodic depths, with null moves omitted from the totals.
Mean Depth of Search: Mean episodic depth with null moves included; Total Depth divided by Episodes.
Standard Deviation of Depth of Search: Standard deviation of episodic depth with null moves included. This is a new variable.
Rate of Base Moves: Rate of generation of distinct base moves; Base Moves divided by Total Time.
Rate of Nodes: Rate at which nodes are considered; Nodes divided by Total Time.
Total IR: Total number of immediate reinvestigations of all base moves.
Total NIR: Total number of non-immediate reinvestigations of all base moves.
Maximal IR: The maximal IR among all base moves.
Maximal NIR: The maximal NIR among all base moves.
Number of Null Moves: Total number of null moves among all nodes. This is a new variable and is only measured to enable the calculation of Proportion of Null Moves.
Proportion of Null Moves: Proportion of total number of nodes that are null moves; Null Moves divided by Nodes. This is a new variable.
Table 2; quantitative variables derived from Problem Behaviour Graphs
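Several of the Table 2 variables are simple functions of per-episode counts, as the following minimal sketch shows. The `Episode` record, its field names and the example values are assumptions introduced for illustration, not the coding scheme of Appendix II. Rates are taken here as counts per minute, consistent with the "Number of Nodes per minute" axis of Figure 4, and the sample standard deviation is an assumed choice. Maximal Depth of Search is omitted because it excludes null moves and so needs data this sketch does not model.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Episode:
    base_move: str
    depth: int       # longest sequence from the base move, null moves included
    nodes: int       # moves considered in the episode, including null moves
    null_moves: int  # how many of those nodes were null moves


def derive_variables(episodes, total_time_min):
    """Derive a subset of the Table 2 variables from one protocol's episodes."""
    depths = [e.depth for e in episodes]
    nodes = sum(e.nodes for e in episodes)
    nulls = sum(e.null_moves for e in episodes)
    base_moves = len({e.base_move for e in episodes})  # distinct base moves
    return {
        "number_of_base_moves": base_moves,
        "number_of_episodes": len(episodes),
        "number_of_nodes": nodes,
        "total_depth": sum(depths),
        "mean_depth": mean(depths),                      # Total Depth / Episodes
        "sd_depth": stdev(depths) if len(depths) > 1 else 0.0,
        "rate_of_base_moves": base_moves / total_time_min,
        "rate_of_nodes": nodes / total_time_min,
        "proportion_null_moves": nulls / nodes if nodes else 0.0,
    }


# Invented example protocol: three episodes over six minutes.
protocol = [Episode("Bxd5", 4, 6, 1), Episode("Bxd5", 2, 2, 0), Episode("Rc2", 3, 4, 1)]
variables = derive_variables(protocol, total_time_min=6.0)
print(variables)
```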
Ethics
The only serious ethical consideration for this research is the non-disclosure of
any personally identifiable data both during and after the life of the study.
Although all data has been rendered anonymous before reporting, players'
choices of next move have been assessed and thus they may have reason to feel
that their individual performance is under scrutiny. To mitigate any such
misconceptions, the experimenter explained that each player's data was to
remain anonymous and protected from unauthorised use under the Data
Protection Act 1998. The experimenter also explained that the anonymous
results would be published as part of the MSc. dissertation. The players were
also advised of their right to withdraw from the study, even retrospectively, and
the experimenter provided contact details to each player if they wished to
exercise this right.
The experimental procedure itself was totally innocuous: there were no risks to
the players' physical or mental well-being as a result of taking part.
Analysis
Each dependent variable in Table 2 except Total Depth of Search and Number of
Null Moves was subjected to a mixed factorial analysis of variance
(ANOVA) with the between-subjects variable Skill and the within-subjects
(repeated measures) variable Position. The criterion of sphericity was satisfied
for all variables entering each analysis except for Number of Non-immediate
Reinvestigations, which was subsequently excluded from the analysis. The results for each
variable are provided in the next section in meaningful groups; details of other
tests are provided under the appropriate headings. The second section compares
the results with those of similar studies, notably Gobet (2004), and the final
section provides a higher level discussion of all findings.
Results from this study

Quality of Move
The main effect of Skill on Quality of Move is significant (F(1,6)=9.757,
MSE=15.042, p<0.05) whilst the main effect of Position on Quality of Move
(F(2,12)=3.683, MSE=6.292, p<0.06) is weakly significant; there is no interaction
effect.
Table 3 shows the actual moves selected by each player across the three
positions, together with the Quality of Move scores assigned to each of those
moves, and Figure 1, below, provides a plot of the marginal means of Quality of
Move for each Skill level across the three positions.
Skill     Position A         Position B         Position C
level     Move    Quality    Move    Quality    Move    Quality
Expert    Rc2     1          Rb8     5          Ne4     3
          Bxd5               Rb8                Kh8
          Bxd5               Rb8                Bd7
          Bxd5               Rb8                e5
Class     Rc2                Kf8                d5
          b4                 Rb8                Bd7
          b4                 Kg7                e5
          Kh1     1          h5      2          Ne4     3
Table 3; Moves chosen and Quality of Move for all players across all
positions
[Plot: estimated marginal means of Quality of Move against Skill level (Class, Expert), by Position]
Figure 1; estimated marginal means for Quality of Move

The most interesting feature of the data illustrated above is that, although
Position A appears to split Experts from Class players in terms of Quality of
Move, Quality of Move in the other two Positions is better balanced across Skill
levels. In particular, the marginal means for Quality of Move across Skill levels
in position C are almost identical (Class = 3; Expert = 3.25). Further, no player
selected a bad move in Position B, with no Quality of Move score below 2.
Time variables
There is no main effect of Skill on Total Time (F(1,6)=0.605, MSE=29.592, ns)
and, in fact, Experts apparently took longer than Class players in choosing their
next move in all three positions; the biggest difference was observed for Position
A (a mean total time of 14.5 minutes for Experts versus 9.2 minutes for Class
players). The same pattern is observed for Time of First Phase; the main
effect of Skill is non-significant here also (F(1,6)=3.604, MSE=3.604, ns).
There is, however, a significant main effect of Position on Total Time
(F(2,12)=8.117, MSE=64.528, p<0.01) (although not on Time of First Phase) and there
are no interaction effects on either time variable. The most noticeable
difference was between Position B, which was considered, on average, for 8.6
minutes, and Position C, which taxed the players for a mean time of 14.3 minutes.
Base Moves and Episodes

As with Total Time, there are also main effects of Position upon the Number of
Base Moves (F(2,12)=7.104, MSE=22.792, p<0.01) and Number of Episodes
(F(2,12)=3.285, MSE=69.542, p<0.08), although the effect is weak in the latter
case. There are no main effects of Skill, nor any interaction effects, upon either
of the two variables, whose marginal means are summarised below with the
corresponding number of legal moves, for each position.
Position   Number of Base Moves   Number of Episodes   Number of
           (marginal mean)        (marginal mean)      Legal Moves
A          4.625                  10.25                56
B          3                      7.75                 35
C          6.375                  13.625               37
Table 4; Marginal Means for Base Moves/ Episodes and Number of Legal
Moves
As can be seen in Table 4, the relationship between Position and Number of Base
Moves does not apparently stem from the number of legal moves available in
each position: an average of 4.625 base moves are generated for position A (56
legal moves) and 3 for position B (35 legal moves), yet 6.375 of the possible 37
legal moves are generated for position C. Further, it can be seen that there
appears to be a linear relationship between the mean Number of Base Moves and
the mean Number of Episodes.
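The apparent linearity can be checked informally by computing the correlation between the two sets of marginal means quoted in Table 4. The sketch below does this with a hand-written Pearson coefficient (the helper name `pearson_r` is an assumption); with only three positions it is a descriptive check rather than a test.

```python
from math import sqrt

# Marginal means from Table 4, in the order Position A, B, C.
base_moves = [4.625, 3.0, 6.375]
episodes = [10.25, 7.75, 13.625]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(base_moves, episodes)
print(round(r, 3))  # close to 1, consistent with a near-linear relationship
```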
Number of Nodes
The main effect of Skill upon Number of Nodes is significant (F(1,6)=6.593,
MSE=4056, p<0.05), as is the main effect of Position (F(2,12)=4.618,
MSE=1439.292, p<0.05) although there is no interaction effect. Inspection of the
marginal means, as illustrated in Figure 2, indicates that Experts consider more
nodes than Class players in all positions. Position A shows the greatest skill
difference (66.5 nodes for Experts versus only 17.5 for Class players) whereas
Positions B and C show smaller differences between skill levels. These latter two positions
differ greatly from each other, however, on mean Number of Nodes across Skill
levels (26.9 in Position B versus 53.63 in Position C). Further, Experts search
approximately the same number of nodes in Positions A and C, whereas Class
players search approximately the same number of nodes in Positions A and B.
[Plot: marginal means of Number of Nodes against Skill (Class, Expert), by Position]
Figure 2; Marginal Means for Number of Nodes

Finally, the distribution of Number of Nodes is shown in Figure 3. Apart from
the outlier (117 nodes searched by one of the Expert players in Position A),
Number of Nodes is fairly normally distributed with all values < 100.
[Histogram: Frequency Distribution of Number of Nodes, in bins of width 10; Mean = 41, Std. Dev = 26.51, N = 24]
Figure 3; Frequency distribution of Number of Nodes
Rate of generation
There are no effects (main or interaction) of Skill or Position on Rate of Base
Moves. The main effect of Skill level on Rate of Nodes is weakly significant
(F(1,6)=5.646, MSE=13.777, p<0.06) whilst there is no effect for Position
(F(2,12)=0.590, MSE=0.001978, ns) and no interaction effect. Better players
generate nodes more rapidly (Expert: mean 4.09, s.d. 1.03; Class: mean 2.58,
s.d. 1.48 nodes per minute), as illustrated in Figure 4.
[Plot: estimated marginal means of Number of Nodes per minute against Skill level (Class, Expert), by Position]
Figure 4; Estimated marginal means of Number of Nodes per minute

Depth of Search
The main effect of Skill for Mean Depth of Search is significant (F(1,6)=3.977,
MSE=3.899, p<0.1), although this significance is weak, and there exists a main
effect of Skill on Maximal Depth of Search (F(1,6)=18.609, MSE=70.042, p<0.01).
There are no other main effects on this group of either Skill or Position, or any
interaction effects. Note that this group includes the new variable Standard
Deviation of Depth, whose inclusion is discussed later. The relationship between
Skill level and some of the search variables is investigated below.
Predicting search variables

To investigate the predictive power of skill (taken here as the continuous variable
Elo Rating) on the two search variables Mean Depth of Search and Maximal
Depth of Search, these latter variables were first pooled across positions A, B
and C for each player by:
a. selecting the maximal search depth of all episodes undertaken to derive
Maximal Depth of Search (pooled);
b. pooling both Total Depth of Search and Number of Episodes to derive the
new quotient Mean Depth of Search (pooled).
Table 5 summarises the corresponding search data entering the analysis.
Elo      Maximal Depth   Total Depth   Number of   Mean Depth
rating   of Search       of Search     episodes    of Search
         (pooled)        (pooled)      (pooled)    (pooled)
1720     4               8             7           1.14
1780                     79            25          3.16
1925                     111           36          3.08
1970                     128           29          4.41
2010     14              170           43          3.95
2045     11              100           26          3.85
2105                     170           44          3.86
2190     9               144           43          3.35
Table 5; Pooled Mean and Maximal Depth of Search by Player
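The pooled quotient in Table 5 is simply pooled Total Depth of Search divided by pooled Number of Episodes for each player. The short sketch below recomputes it from the table's own figures as a consistency check; the variable names are assumptions.

```python
# (Elo rating, Total Depth pooled, Number of Episodes pooled, reported Mean Depth pooled)
table5 = [
    (1720, 8, 7, 1.14),
    (1780, 79, 25, 3.16),
    (1925, 111, 36, 3.08),
    (1970, 128, 29, 4.41),
    (2010, 170, 43, 3.95),
    (2045, 100, 26, 3.85),
    (2105, 170, 44, 3.86),
    (2190, 144, 43, 3.35),
]

# Mean Depth of Search (pooled) = Total Depth (pooled) / Number of Episodes (pooled)
recomputed = [(elo, total / n_episodes) for elo, total, n_episodes, _ in table5]

# True when every recomputed quotient rounds to the reported two-decimal value.
consistent = all(
    abs(total / n_episodes - reported) < 0.005
    for _, total, n_episodes, reported in table5
)
print(consistent)
```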
The regression of Maximal Depth of Search on Elo Rating is significant
(F(1,22)=10.597, MSE=59.802, p<0.05). The regression line is given by:
Maximal Depth of Search = -14.830 + 0.011 x Elo Rating
This predicts that Maximal Depth of Search increases by approximately 2.1 ply
per 200 Elo points (a single standard deviation in the Elo scale and the width of
most USCF classes).
The regression of Mean Depth of Search on Elo Rating is weakly significant
(F(1,6)=4.672, MSE=3.058, p<0.08). The regression line in this case is given by:
Mean Depth of Search = -4.888 + 0.004 x Elo Rating
This predicts an increase of 0.8 in Mean Depth of Search for every 200 Elo
points.
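As a check on the arithmetic, the Mean Depth of Search regression can be sketched in Python from the Table 5 values by ordinary least squares. This is a minimal sketch rather than the original analysis: it assumes a simple linear regression on the tabulated Elo ratings and on means rounded to two decimal places, and the function and variable names are illustrative.

```python
# (Elo rating, Total Depth of Search (pooled), Number of episodes (pooled)),
# as listed in Table 5.
players = [
    (1720, 8, 7), (1780, 79, 25), (1925, 111, 36), (1970, 128, 29),
    (2010, 170, 43), (2045, 100, 26), (2105, 170, 44), (2190, 144, 43),
]
elo = [p[0] for p in players]
# Mean Depth of Search (pooled), rounded to 2 dp as in the table.
mean_depth = [round(total / episodes, 2) for _, total, episodes in players]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy            # variation explained by the line
    ss_residual = syy - ss_regression      # variation left unexplained
    f_ratio = ss_regression / (ss_residual / (n - 2))
    return slope, intercept, f_ratio


slope, intercept, f_ratio = fit_line(elo, mean_depth)
gain_per_200 = 200 * slope
print(f"slope={slope:.4f} intercept={intercept:.2f} "
      f"F(1,{len(elo) - 2})={f_ratio:.2f}")
print(f"predicted gain per 200 Elo points: {gain_per_200:.1f}")
```

Run on the tabulated values, this should give a slope of roughly 0.004 and a predicted gain of roughly 0.8 per 200 Elo points, in line with the figures reported above; small differences in the intercept and F ratio would be expected because the tabulated means are rounded.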
Reinvestigations
There are no main effects of Skill or Position on any of the reinvestigation
variables, although the interaction effect upon Maximal Number of IR is
significant (F(1,6)=7.895, MSE=6.25, p<0.05). This is illustrated in Figure 5. The
interaction is actually disordinal: Class players generated fewer immediate
reinvestigations in Position A than in Positions B and C, whereas the opposite is
true for Experts.
Figure 5; Estimated Marginal Means for Maximal Number of IR (vertical axis:
Maximal Number of IR; horizontal axis: Skill level, Class and Expert; plotted
separately for Positions A, B and C)
Null moves
There is a main effect of Skill upon the Proportion of Null Moves (F(1,6)=7.414,
MSE=0.04596, p<0.05) (F(1,6)=6.005, MSE=3037.5, p<0.05), and no main effect
for Position, nor any interaction effect. This effect is illustrated in Figure 6,
below: Proportion of Null Moves is inversely proportional to Skill, suggesting
that better players are more likely to think in terms of complete sequences of
moves; it is noted from the verbal protocols that some players in the Class group
tended to consider sequences of own moves with null moves in place of some
opponent moves.
Figure 6; Estimated Marginal Means of Proportion of Null Moves (vertical axis:
Proportion of Null Moves; horizontal axis: Skill, Class and Expert; plotted by
Position)
Summary
The following table summarises the main effects of Skill level and Position on
each of the dependent variables entered into the analysis.
Dependent variable            Main effect   Main effect   Interaction
                              of Skill²     of Position   effect
Quality of Move               p<0.06        ns            p<0.05
Total Time                    ns            p<0.01        ns
Time of First Phase           ns            ns            ns
Number of Base Moves          ns            p<0.01        ns
Rate of Base Moves            ns            ns            ns
Number of Episodes            ns            p<0.08        ns
Number of Nodes               p<0.05        p<0.05        ns
Rate of Nodes                 p<0.06        ns            ns
Mean Depth of Search          p<0.1         ns            ns
Maximal Depth of Search       p<0.01        ns            ns
Number of IR                  ns            ns            ns
Number of NIR                 ns            ns            ns
Maximal IR                    ns            ns            p<0.05
Maximal NIR                   ns            ns            ns
Number of Reinvestigations    ns            ns            ns
                              ns            ns            p<0.05
Table 6; Summary of main effects of Skill and Position on dependent
variables
Hence the data presented suggests that Skill has a main effect upon Quality of
Move, Number of Nodes, Rate of Nodes, Mean Depth of Search, Maximal Depth
of Search and Proportion of Null Moves; and that Position has a main effect upon
Quality of Move, Total Time, Number of Base Moves, Number of Episodes and
Number of Nodes. Note that there is only one interaction effect (upon Maximal
IR) yet there are no main effects for this variable.
² p-values quoted at standard levels (0.01, 0.05) except where p>0.05. In this case the actual p-value is
quoted, rounded to 1dp.
Comparison with other studies

The results reported above are interpreted in the context of the design and sample
size. This is particularly important for comparisons with results from other
related studies, i.e. Gobet (1998a) and de Groot (1965). The sample was fairly
small, with a relatively narrow range of skill levels; in particular there
were no Masters among the sample. De Groot's sample³ included players of all
skill levels down to Class (n=14; Grandmasters=5; Masters=2; Experts=5;
Class=2). Gobet's sample was larger (n=48), with an average skill level
somewhere in between that of de Groot's sample and that of the sample used in
this study (Masters=12; Experts=12; Class A=12; Class B=12). Conversely, the
data in both of the other studies are based on Position A only, whereas this study
employed three very different types of position (see Appendix I).
Quality of Move
The results of both this study and Gobet's confirm de Groot's assertion that
better players choose stronger moves. The significance of the effect of Position
on Quality of Move in this study, however, suggests that it is more difficult to
select a good move in some positions than in others, in particular Position A.
Interestingly, the position that the players were least comfortable with (Position
C) generated the best quality moves on average. Figure 1 suggests an interaction
effect, with the tactical and complex Position A splitting the two groups
effectively and the strategic and quieter Position B showing little difference, but
the corresponding F ratio is non-significant.
³ For the purposes of comparison, this sample includes only the players for whom detailed
statistics have been extracted from their Position A protocols, courtesy of Gobet (1998a).
Time variables
Gobet (1998a) found a weakly significant result for Total Time, suggesting that
Masters choose their next move more rapidly than lower calibre players. The
results above show no differences between Experts and Class players, although
the marginal means indicate that Experts are slower than Class players (12.68
minutes versus 10.46 minutes). The implication is that there are, in fact, no
differences between players of different levels in the time taken to choose their
next move. An observation from the experiment is that some players consciously
truncated their thought processes on the basis that, in a tournament game, too
much time spent on the single choice would lead them into time trouble. Gobet
found a significant reduction in the Time of First Phase for higher calibre players,
whereas the result here is non-significant. Time of First Phase was
perhaps one of the more difficult variables to extract from the protocols due to
the poorly defined boundary it shares with the Phase of Elaboration (de Groot,
1965). Although certain players deliberately sized up the situation and discussed
general plans before entering a longer phase of search and evaluation, others
apparently focused immediately on base moves and corresponding sequences,
whilst one player spent the majority of his time apparently in the First Phase
before committing to a move. This issue is revisited in the Methodological
Discussion.
Base Moves and Episodes

Gobet's results suggest a curvilinear relationship for both variables with Skill,
since Class A players generate more base moves and episodes than either Experts
or Class B players, although only the effect on Number of Base Moves is
significant (Gobet 1998a). Perhaps unsurprisingly, with Class A and B players
pooled in this experiment, there are no significant effects of Skill. The
significant effects of Position on both Number of Base Moves and Number of
Episodes, however, again suggest that different types of position give rise to
different search and evaluation strategies irrespective of skill level, but that this
relationship is not explained by the complexity of the position (as measured by
number of legal moves). Position C demanded the widest search for base moves
and generated the most episodes; it may be argued that the character of this
position is more ambiguous than that of the other two, containing both strategic
and tactical themes. It is possible that this required players to pursue potential
tactical lines as well as more strategic moves.
Search variables⁴
De Groot (1965) based his main conclusion, that recognition is the dominant
mechanism in chess thinking, on two results suggesting that search behaviour
does not differ across skill levels (at least at the higher levels of chess skill):
1. Chess players rarely search more than 100 nodes in any position;
2. There are no significant effects of skill on any search variable (e.g. Number
of Nodes, Mean Depth of Search, Maximal Depth of Search).
Whilst both this study and Gobet's (1998a) provide evidence in support of the
first result, this study shows that Experts do search more nodes than Class
players. This is partially backed up by Gobet (1998a): although he did not find a
skill effect for Number of Nodes in position A, the average number of Nodes was
considerably lower for the Class B group (33.9) than for the other groups (58 for
Masters, 58.3 for Experts and 56.8 for Class A players; Gobet 1998a p13). The
significant difference found here, therefore, might be due, in part, to the reduced
skill range among the players in the experiment; it could be that the biggest skill
differences for this search variable are actually to be found between Experts and
Class players. This suggests that there is an improvement in search capacity up to
Expert level, beyond which this measure remains fairly constant, and that de
Groot's second result, above, does not hold below the level of Expert.
⁴ The variables in the previous groups Number of Nodes, Rate of generation and Depth of Search
are considered here together.
This study also confirms the significant result from Gobet (1998a) concerning
the effect of Skill on Mean Depth of Search. The significant result for Maximal
Depth of Search adds evidence to the argument (counter to that of de Groot) that
higher calibre players employ greater search than lower calibre players.
To investigate such effects in more detail, Charness (Holding 1985; Gobet, 2004)
and Gobet (1998) made predictions of search capabilities for different skill levels
by analysing the relationship between Elo rating and selected depth of search
variables (Maximal Depth of Search and Mean Depth of Search). Charness, in
his 1981 experiment investigating the effects of age and skill on search
capabilities, used four positions, two of which were strategic whilst the other two
were tactical in nature. Gobet used only one position, de Groot's position A,
which is highly tactical in nature. The regression equations calculated from the
pooled data in this study suggest slightly larger increases in Maximal Depth of
Search and Mean Depth of Search per 200 Elo points than evidenced by the
previous studies (see Table 7).
Prediction                                               This study   Charness   Gobet
Increase in Maximal Depth of Search per 200 Elo points   2.1          1.4        N/A
Increase in Mean Depth of Search per 200 Elo points      0.8          0.5        0.6
Table 7; predicted gain in search capabilities as a function of Elo rating
In interpreting this result it is noted that:
1. de Groot's results are based on a sample dominated by Grandmasters,
Masters and Experts;
2. Charness and Gobet found skill differences for search capabilities when
lower calibre players were more prevalent in the sample;
3. Both Charness and Gobet have suggested that the relationship between skill
level and search capabilities across all playing levels is not linear. Whilst
Charness proposes a plateau effect for high calibre players, Gobet suggests a
curvilinear relationship, whereby high calibre players actually search less due
to better recognition-led evaluation capabilities.
Given the relatively low calibre of the players in this sample, the data presented
here therefore extend the model of Gobet in suggesting that the rate of change of
search capability (as measured by Mean and Maximal Depth of Search) is
greater at lower skill levels (e.g. between Class A/B and Expert). Note that the
predictions for Mean Depth of Search are similar across three studies that used
different combinations of types of position. This backs up the result of the
previous section that states that there is no significant effect of Position on either
Mean Depth of Search or Maximal Depth of Search.
Rate of generation
The weakly significant effect of Skill on Rate of Nodes diverges from Gobet's
(1998a) result. Although neither study provides evidence for an effect of Skill on
Rate of Base Moves, Charness's 1981 result (Gobet, 1998a) suggests that
Grandmasters generate more base moves per minute than Experts. The reduced
sample size in this study might explain why such a result was not identified here.
Reinvestigations
There was a degree of convergence with Gobet (1998a) concerning
reinvestigation variables. Gobet's only significant results in this area were for
the main effects of Skill on Maximal Number of IR (p<0.005) and Maximal
Number of NIR (p<0.02) (Gobet 1998a p16). The results presented in the
previous section indicate that there are no main effects of Skill on these
variables, although there is an interaction effect.
Gobet suggested that Maximal Number of IR is proportional to Skill level,
which is backed up by the plot of marginal means of Maximal Number of IR in
this study (Figure 5). It is interesting to note that if only the data for
Position A are entered into an ANOVA the effect of Skill is actually significant
(F(1,6)=10.714, MSE=28.125, p<0.05).
A high Maximal Number of IR represents a situation where a player becomes
deeply involved in the analysis of a particular sequence (or branch) of his or her
search tree, returning to the same base move a number of times in succession. It
is also seen as evidence for progressive deepening (Gobet 2004, p110). Position
A is the most tactical of the three and it appears that Expert players become deeply
involved in the tactical analysis required to select a good move. The disordinal
nature of the interaction identified in this study also suggests that Class players
are less equipped to do the same and even tend to have longer sequences of
reinvestigations for quieter, less tactical positions.
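As a rough illustration of how such a statistic might be derived from a coded protocol, the Python sketch below counts immediate reinvestigations (IR) from the ordered list of base moves that open each episode. It assumes that an IR is an episode whose base move is the same as that of the episode immediately before it, and that Maximal Number of IR is the longest unbroken run of such episodes. The base-move sequence is hypothetical and the function name is an assumption; neither is part of de Groot's or Gobet's coding schemes.

```python
def immediate_reinvestigations(base_moves):
    """Return (number of IR, maximal number of IR in succession)."""
    total_ir = 0
    longest_run = 0
    current_run = 0
    # Compare the base move of each episode with that of the previous one.
    for previous, current in zip(base_moves, base_moves[1:]):
        if current == previous:
            total_ir += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            current_run = 0  # a different base move ends the run
    return total_ir, longest_run


# Hypothetical sequence of base moves, one per episode (not from any protocol).
episodes = ["Bxd5", "Bxd5", "Bxd5", "Nxc6", "Bxd5", "Bxd5", "Bh6"]
number_of_ir, maximal_ir = immediate_reinvestigations(episodes)
print(number_of_ir, maximal_ir)
```

In this hypothetical sequence the first base move is taken up again twice in succession and later once more, giving three immediate reinvestigations in total and a maximal run of two.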
Gobet (1998a) also asserted that Maximal Number of NIR is inversely
proportional to Skill, yet an ANOVA with the current data (Position A only)
generates a non-significant result, as Figure 7 indicates.
Figure 7; Estimated Marginal Means for Maximal Number of NIR (vertical axis:
Maximal Number of NIR; horizontal axis: Skill, Class and Expert; plotted by
Position)
Null Moves
The significant skill effect for Proportion of Null Moves suggests that better
players think in terms of completely specified sequences of moves more often
than lesser players. By way of comparison, Saariluoma and Hohlfeld (Gobet
2004)⁵ examined the proportion of null moves as a function of position type
(strategic or tactical) and found that it is greater, at approximately 12%, in
strategic positions; Charness (Gobet 2004) previously found this percentage to be
approximately 10%. Interestingly, although the result in the current study holds
for Expert players (Position B = 11%; Position A = 5.5%; Position C = 5%),
Class players search approximately 15-16% null moves irrespective of position
type. (See also Figure 6.)
⁵ Calibre of players involved in the study unspecified.
The differences in proportions across the three positions at each skill level lead to
two alternative interpretations:
1. Strategic positions (Position B) demand more generalised plan
formulation than tactical positions (Position A and, to a certain extent,
Position C). The result is an increased proportion of templates of move
sequences;
2. Better players are simply more thorough in their analysis of tactical
sequences.
Summary
The results generated by this study broadly agree with those of Gobet (1998a),
Charness (Holding, 1985; Gobet, 2004) and Saariluoma and Hohlfeld (Gobet
2004) and argue against some of de Groot's earlier conclusions. Better players
make better choices of move, as shown by de Groot (1965) and Gobet (1998a),
but they also search more, to a greater depth and more thoroughly than lesser
players. The exact relationship between skill and both capacity and depth of
search is probably not linear. It appears that the rate of increase in search
capacity plateaus at the level of Master and above, and that depth of search may
actually vary in a curvilinear fashion with skill level, with a rate of increase that
itself decreases, and actually changes sign, as skill level increases from Class B
to Grandmaster. Given the difference in calibre of players in the samples
considered across the various studies, it is entirely possible that de Groot's
results on search variables were actually correct; it is merely the applicability of
the conclusions to lower skill levels that is in question.
Project Review
This chapter reflects upon two key issues: the necessary refocusing of the
research throughout its course (including modifications both to the design and
the analysis) and the validity of the data collection and analysis methods used in
support of the choice of next move task.
Focus of research
The final dissertation is far more focused than the original research proposal
suggested it might be. The main reason for this is that one half of the study was
suspended to keep the study to a manageable size, both in a positive sense (due to
the healthy amount of material available from the choice of next move task) and
a negative sense (due to both access difficulties and increased overheads of
qualitative analysis). The original experimental design included a choice of next
move task and a personal construct elicitation task, the latter conceived with the
aim of investigating the nature of conceptual knowledge that chess players
possess. Holding (1985) postulated that conceptual knowledge, along with
search and evaluation, explains skill in chess, and one of his main criticisms of
chunking theory was that chunks were too small in size to reflect conceptual
knowledge (Gobet & Simon, 1998b). Template theory (Gobet & Simon, 1998a)
addresses this criticism by introducing larger perceptual structures known as
templates, which are large enough, in theory, to encode entire positions.
Personal Construct Psychology (PCP) is concerned with how individuals
construe the world, based on the assertion that each man possesses an
ever-changing set of hypotheses about the world that are represented on personal
constructs: essentially axes of reference characterised by contrasting poles (e.g.
we may hypothesise about people on the construct "good-bad" or we may
hypothesise about chess positions on the construct "tactical-strategic"). Much of
PCP is due to George Kelly, who also devised the Repertory Grid technique,
which includes methods for the elicitation of personal constructs (Fransella, Bell
& Bannister, 2004).
Under the assumption that personal constructs, which may exist at any level of
abstraction, are equivalent ways of classifying/ describing both templates and the
higher level schemata that they relate to, the research questions that the second
half of the study concerned, therefore, were:
• How many constructs do chess players of a given skill level possess?
• How are the construct systems of chess players organised?
• What degree of overlap is there between different chess players' construct
systems, particularly those players with similar skill levels?
• What are the most concrete constructs and do they correspond to Chase &
Simon's piece relations in chunking theory?
Thus the questions for this part of the study were fairly open-ended and the
analysis was intended to be investigative. The basic procedure chosen was the
method of triads, whereby three elements (in this case, chess positions) are
presented to the participant, who is asked a question of the form, "How are two
of these elements similar and thereby different from the third?" The context for
answering the question is defined by the research; hence here it was a choice of
next move task on each of the three positions. The similarity-difference pair
provided by the participant would form the poles of a new construct, which the
experimenter would help the participant to shape into something meaningful to
him or her. This would define a single episode of elicitation; the presentation of
a new triad would mark the next. Typically only one (or possibly two) constructs
are elicited from each triad before the next is presented. This implies that a fairly
large number of elements is employed (typically more than the number of
constructs expected).
Personal construct elicitation is most commonly used in clinical psychology as a
means of establishing a patient's views on self and others with a view to guiding
choice of therapy. As such, elements are provided simply as names or roles of
individuals in the patient's life; a triad of role names activates similarities and
differences almost immediately. In the more general case of knowledge
elicitation, elements take the form of exemplars from the participant's domain of
expertise. Unfortunately triads of such exemplars do not always instantly
activate similarities and differences since, by their very nature, the exemplar
elements require some consideration. Chess positions are typical of this type of
exemplar.
It was decided to attempt to elicit 10-15 personal constructs of chess knowledge
using 18 separate positions, presented as 18 different triads (i.e. positions are
sampled with replacement with the condition that each appears in exactly three
triads and never with the same position twice).
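One way to satisfy these constraints is a cyclic construction, sketched below in Python. This is an illustrative assumption and not necessarily the allocation actually used in the experiment: positions are numbered 0 to 17 and triad i contains positions i, i+1 and i+3 (modulo 18). The checks confirm that each position appears in exactly three triads and that no pair of positions shares more than one triad.

```python
from collections import Counter
from itertools import combinations

N_POSITIONS = 18


def cyclic_triads(n=N_POSITIONS, offsets=(0, 1, 3)):
    """Build n triads by shifting a base triad of offsets around a cycle."""
    return [tuple(sorted((i + k) % n for k in offsets)) for i in range(n)]


triads = cyclic_triads()

# How often each position is used, and how often each pair co-occurs.
appearances = Counter(p for triad in triads for p in triad)
pair_counts = Counter(
    pair for triad in triads for pair in combinations(triad, 2)
)

each_in_three = all(appearances[p] == 3 for p in range(N_POSITIONS))
no_repeated_pair = all(count == 1 for count in pair_counts.values())
print(len(triads), each_in_three, no_repeated_pair)
```

The construction works because the differences between the offsets (1, 2 and 3) are all distinct and small relative to 18, so any given pair of positions can arise from only one triad.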
Upon the first run of the experiment, it became clear that 18 triads was far too
ambitious a target for the allotted 2 hour elicitation session because the players
required time to orient themselves to each of the three positions in each triad
before they could provide any similarity-difference pairs. The experimenter
attempted to counter this by imposing a limit of 5 minutes' consideration time, but
the effect was that the actual elicitation procedure was used by the participants to
"think aloud" in analysing each of the three positions to their satisfaction, and
time quickly ran out.
No player completed more than six triads in their 2 hour session and there was no
time for construct refinement (whereby construct definitions are challenged,
developed, discarded etc. and construct hierarchies are developed by adding
new, related constructs at higher and lower levels of abstraction and possibly
linking constructs already elicited). Since the choice of next move task had
already been completed at the start of the session, this meant that the participant
had been engaged in experimentation for three hours. This is close to the limit of
concentration for a single session and, since the players were not being paid for
their involvement, it was unreasonable to expect them to continue. The
experimenter then fell back on a contingency plan, whereby construct refinement
was completed by each participant via e-mail, the experimenter having analysed
that individual's embryonic construct set and posed specific questions. Although
this was actually completed with six of the eight participants, the data were
not fully analysed due to time constraints: shared understanding of an
individual's personal constructs is severely hindered if dialogue concerning those
constructs is limited to e-mail, causing an unmanageable increase in workload.
In summary, therefore, the main reasons why personal construct elicitation failed
as part of this research study were:
1. Relatively long time required for each triadic elicitation episode due to
complexity of elements (chess positions);
2. Lack of continued access to participants (3 hour session limit);
3. Overheads on analysis imposed by e-mail completion of task.
The experimenter has retained the data and recommends that, if the study were to
be extended, this data is analysed thoroughly to establish answers to the research
questions set out above.
Choice of next move task
Thinking aloud
As de Groot himself noted (1965, p80), the validity of "thinking aloud" as a
means of expressing one's thought process may be called into question. De Groot
pioneered the use of the technique and others, in particular Herbert Simon, have
advocated its continued use for gaining insight into human problem solving
techniques. Chess is particularly well-suited to verbal protocols since it includes
a great deal of common and well-defined terminology to describe moves, tactics,
plans and positional features. There are obvious dangers in interpreting verbal
protocols as perfect records of thought, however; de Groot mentions a few of
these, e.g. incompleteness due to unconscious and rapid thought, the disruptive
influence of slowing one's thinking down to verbalise thought etc. (1965,
pp80-84). Further, individual differences in style of verbalisation cannot be ruled
out as a confounding factor in (ultimately) deriving protocol statistics. Of
the sample of eight in this experiment, some players were certainly more at ease
with thinking aloud than others. The following behaviours were observed from
different individuals:
• Long periods of silence where, it is almost certain, complete sequences
were being calculated that were never expressed;
• Long periods spent in the First Phase or Transitional Phases, followed
fairly rapidly by sequence assessments or even next-move selection,
suggesting that the First/Transitional Phase verbalisations were in fact
masking deeper calculations;
• Fairly rapid repetition of the opening moves from a sequence to
precipitate the investigation of a new branch. In these cases it was
difficult to tell whether such repetition constituted fresh consideration of
the base move (indicating a new episode) or reorientation of one's place
in a search tree (indicating a new branch in the same episode). To avoid
the introduction of experimenter bias, the former was assumed, according
to the coding scheme as described by de Groot (1965);
• The provision of a running commentary on one's thought process rather
than the direct verbalisation of thoughts. This occurred due both to
unease with thinking aloud and over-helpfulness!
Fortunately, none of these behaviours were permanent features of any
individual's thinking aloud. It is hard to believe, however, that such behaviours
were limited to the sample involved in this experiment. Mitigation for these
behaviours could involve a practice run followed by experimenter feedback,
although it could be argued that thinking aloud is a skill that can only be learnt
effectively over longer periods (particularly to combat the first behaviour).
Coding of verbal protocols and problem behaviour graphs

The coding scheme for verbal protocols and PBGs is explained in greater detail
in Appendix II. The experimenter had very few issues in coding the verbal
protocols due to the apparent universality of de Groot's coding scheme (based on
Selz's framework). The only issues arose in the identification of boundaries
between the First Phase and First Episode, particularly for players exhibiting the
second behaviour described above.
Having established the macro-structure of the verbal protocol, the task of coding
each episode as a sequence of moves in the PBG remained. In terms of
identifying player errors (e.g. in naming moves, pieces or squares) the
experimenter, a non-chess player, had very few problems, since such errors stood
out as obvious anomalies in logical sequences of moves and could easily be
corrected. (A good analogy is that of an error-correcting code: the correction can
be done without an understanding of the content.) The clarity of the coding
scheme due to de Groot (1965), Newell & Simon (1972) and Gobet (1998a) also
greatly assisted in converting verbal protocols into PBGs. There were occasions,
however, on which the apparent structure of thinking exhibited by some of the
players did not fit into the PBG formulation. Examples include:
• The expression of fragments of sequences (i.e. no base move, and location
of first move in fragment unspecified or introduced with a remark such
as, "so if at some point we could play ..."). The PBG coding scheme
forces such fragments to be coded as transitions, even though calculations
are being carried out, because it requires moves to have clearly defined
positions;
• The expression of what are, effectively, opponent base moves. Some
players apparently used a technique whereby they pretended that the
opponent was on move and started to calculate sequences from candidate
opponent moves. In the PBG formulation these must all be coded as
branches following a null base move. Whether these sorts of calculations
constitute different episodes of thought remains to be decided;
• The representation of "not" moves in the PBG. These are not quite the
same as null moves, e.g. "not Qe4".
It is recommended that, in an extended study, extensions to the PBG coding
scheme are trialled, with feedback sought on validity from chess players, and
implications for variability in the results of quantitative analysis investigated.
Conclusions
The specific research questions for this study were as follows:
• Do club-level chess players of differing calibres differ in terms of quality
of move selection?
• Do club-level chess players of differing calibres differ in terms of
capacity of search, mean and maximal search depth, and thoroughness of
search?
• To what degree do the levels of search activity in club-level players fit
with existing models of chess thinking?
The first two of these questions have been answered directly by the analysis:
Experts choose better moves than Class A/B players across both tactical and
strategic positions; Experts also search more, to a greater depth and more
thoroughly than Class A/B players. To address the final question, it is useful to
turn again to Gobet's (1998a) replication of de Groot's experiments. Gobet
provides a useful summary of which effects would be expected under both
recognition-based and search-based models of chess skill: "Both pattern
recognition and search models predict that stronger players choose better moves,
that they select moves faster, and they generate more nodes in one minute.
Search models predict that stronger players search more nodes and search
deeper. Finally, pattern recognition models predict that strong players mention
fewer base moves, reinvestigate more often the same move, jump less often
between different moves and have a shorter first phase." (Gobet, 1998a, p23).
These postulated relationships for different models of chess skill are shown in
Figure 8 below; Proportion of Null Moves is assessed to vary inversely with skill
level under search-based models, since searches should be more completely
defined, and has been added to Figure 8 accordingly.
Figure 8; postulated differences in variables for increase in skill level under
different models of chess skill
Gobet suggested that he had identified all differences expected under a
recognition-based model but that some changes expected under a search-based
model of chess skill had not been found, since there were no skill differences on
Number of Nodes. The results from this study, however, suggest that all skill
differences expected under a search-based model have been identified, although
Rate of Nodes and Mean Depth of Search only weakly. They also suggest that
there are no skill differences on any of the variables only associated with the
recognition-based model.
Two major conclusions may be drawn from this set of results. Firstly, there is
strong and continued evidence for differences in search capabilities across skill
levels in chess players, building on the results of Gobet (1998a), Charness
(Holding, 1985; Gobet, 2004) and Saariluoma and Hohlfeld (Gobet 2004). Such
evidence argues against the basis of de Groot's main conclusion (1965) that
recognition is the dominant mechanism underpinning chess skill. Proponents of
template theory (e.g. Gobet), however, argue that such continued results for
search differences across skill levels do not undermine the recognition-based
theory of chess skill itself. In response to Holding's assertion that differences in
depth of search cannot be explained by recognition-based models, Gobet states,
"this is obviously wrong, as pattern recognition should facilitate the generation of
moves in the mind's eye, permitting a smooth search." (Gobet, 1998a, p24).
Thus skill differences on variables previously associated with search-based
models of chess skill can be explained by recognition-based models.
The second major conclusion to be drawn, however, suggests that there is less
support for recognition-based theory from the results presented above. None of
the variable differences that Gobet (1998a) asserts are predicted only by
recognition-based models (and not search-based models) are significant in this
study.
These two conclusions must be placed in the context of the calibre of players
involved in the study, however. It may be that the results hold only between
Class A/B players and Experts. This, however, would provide evidence that
the better players at club level are superior primarily because of their
search capabilities and not recognition. A different model of chess skill may be
required for players below the level of Master.
Appendix I: de Groot positions

POSITION A
1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Bg5 Be7 5. e3 O-O 6. Nf3 dxc4 7. Bxc4 c5
8. Bd3 Nc6 9. O-O cxd4 10. exd4 Nb4 11. Bb1 Bd7 12. a3 Nbd5 13. Qd3 g6
14. Ne5 Rc8 15. Ba2 Bc6 16. Rac1 Qb6
WHITE TO MOVE
POSITION B
1. e4 e5 2. Nf3 d5 3. exd5 e4 4. Bb5+ c6 5. dxc6 bxc6 6. Ba4 exf3 7. Qxf3 Nf6
8. O-O Be7 9. Bxc6+ Nxc6 10. Qxc6+ Bd7 11. Qf3 O-O 12. d3 Qc7 13. Nc3 Bd6
14. h3 Bc6 15. Qe2 Rfe8 16. Qd2 Nh5 17. Qg5 Bh2+ 18. Kh1 Re5 19. Qh4 Rf5
20. Ne4 Bg3 21. Qg4 Qe5 22. Be3 Bf4 23. Bd4 Qxd4 24. Qxf5 g6 25. Qc5 Qd7
26. Qxh5 Bxe4 27. Qg4 Qxg4 28. hxg4 Bc6 29. Rfe1
BLACK TO MOVE
POSITION C
1. c4 e6 2. d4 Bb4+ 3. Bd2 Bxd2+ 4. Qxd2 f5 5. Nc3 Nf6 6. g3 d6 7. Bg2 Qe7
8. O-O-O Nbd7 9. e4 fxe4 10. Nxe4 O-O 11. Nc3 Nb6 12. Qe2 Qd7 13. Nf3 Qc6
14. b3 a5 15. a4 Nbd5 16. Nb5 Nb4 17. Bh3
BLACK TO MOVE
Assessment of quality of move

De Groot (1965, p128) conducted a thorough analysis of position A to provide an
ordering by quality of the best 22 among all 56 legal moves. He used this
analysis to assess the performance of each of his skill levels in next move
selection. Gobet (1998a) reanalysed position A to assign a quantitative score to each
move, on a scale of 1 (weak move) to 5 (winning move) (Gobet, 1998a; Gobet, 2005) [6].
This enabled him to enter quality of move into a quantitative analysis. Although
Gobet (1998a) does not report this scoring scheme for each of the 56 legal moves,
Gobet (2005) contains enough information to reconstruct this scheme.
There is less data available for positions B and C, the best source being de
Groot's ordering of the best 10 moves for the former and 9 moves for the latter (1965,
p129). To generate move quality scores, an analysis of all positions (A, B and C) was
conducted with the computer chess engine Fritz 5 (Chessbase, 1997), truncated at 13
plies. The Fritz 5 user manual asserts that Fritz analyses positions more accurately at
odd-numbered ply depths. 13 plies was the greatest odd-numbered depth to which Fritz
analysis could reasonably be conducted given the computing resources available. It
should be noted, however, that this represents an extremely strong analysis.
The Fritz analysis generated evaluations for each legal move for each position; Fritz
evaluations are based on a number of factors, of which material advantage is a key
determinant. Unfortunately, these evaluations could not be used directly because they
were not normalised (i.e. the evaluations for each move were dependent upon the static
evaluation of the starting positions, which were not matched for material advantage)
and could not easily be normalised (because the range of evaluations across all moves
was partially determined by the potential for material loss, and this was not matched
across the three positions because position B, for example, contained no queens).

[6] There is a discrepancy between these sources in terms of the lowest score awarded
(1 or 0). This study adopts the view of the most recent source. The lowest score does
not, in fact, matter, either to Gobet's analysis or the analysis conducted in this
study, since no player selected a move that has an ambiguous score.

A comparison of the Fritz analysis for position A and Gobet's subjective analysis for
the same position, however, revealed that there was a general mapping of the former
onto the latter. This mapping, outlined in Table 8, below, was applied to the moves
for positions B and C to obtain a full set of Gobet numbers for each position.
Moves according to Fritz evaluation                                          Gobet
White to move                        Black to move                           number

Moves with maximal score(s) e_max    Moves with minimal score(s) e_min       5
amongst all legal moves              amongst all legal moves

Moves with scores e where:           Moves with scores e where:
e_max - 0.1 ≤ e < e_max              e_min < e ≤ e_min + 0.1                 4
e_max - 0.3 ≤ e < e_max - 0.1        e_min + 0.1 < e ≤ e_min + 0.3           3
e_max - 1.0 ≤ e < e_max - 0.3        e_min + 0.3 < e ≤ e_min + 1.0           2
e < e_max - 1.0                      e_min + 1.0 < e                         1

Table 8; Mapping of Fritz evaluations onto Gobet Numbers
This mapping essentially partitions the legal move set into 5 groups, the first of which
contains only the move(s) with the best score (remembering that white aims to
maximise scores and black aims to minimise them) and the last of which contains
blunders (moves that result in the equivalent loss of at least a pawn in material,
worth 1.0).
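The mapping can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure actually used in the study: the function name, the dictionary input and the example evaluations are all hypothetical, and the treatment of moves lying exactly on a threshold (here taken from the White column and mirrored for Black) is an assumption.

```python
def gobet_numbers(evals, side_to_move):
    """Map engine evaluations (in pawns) to Gobet numbers 1 to 5.

    evals: dict of move -> evaluation, positive values favouring White.
    side_to_move: "white" (aims to maximise) or "black" (aims to minimise).
    """
    if side_to_move not in ("white", "black"):
        raise ValueError("side_to_move must be 'white' or 'black'")
    # Flip the sign for Black so that "bigger is better" for the mover.
    sign = 1 if side_to_move == "white" else -1
    best = max(sign * e for e in evals.values())
    scores = {}
    for move, e in evals.items():
        # Loss relative to the best move, rounded to two decimal places
        # (an assumed engine precision) to avoid floating-point noise.
        loss = round(best - sign * e, 2)
        if loss == 0:
            scores[move] = 5      # best move(s)
        elif loss <= 0.1:
            scores[move] = 4
        elif loss <= 0.3:
            scores[move] = 3
        elif loss <= 1.0:
            scores[move] = 2
        else:
            scores[move] = 1      # blunder: more than a pawn worse
    return scores


# Hypothetical evaluations for a Black-to-move position (not study data).
example = {"m1": -0.50, "m2": -0.45, "m3": -0.25, "m4": 0.10, "m5": 1.20}
print(gobet_numbers(example, "black"))
```

Working with the loss relative to the best move, rather than with raw evaluations, is what makes the scores comparable across positions whose starting evaluations differ.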
Appendix II: Protocol Analysis

Protocol Analysis begins with the transcription of the verbal protocol from digital
audio media to text. The next stage is the identification of the protocol structure: First
Phase, Episodes, Transitional Phases and Final Phase (de Groot, 1965; Newell &
Simon, 1972).
First Phase: This is characterised by general orientation, the consideration of enemy
threats and own plans, and the generation of base moves without any search or
evaluation. There is a lack of thinking focused at the level of move sequences. The
boundary between the First Phase and the first Episode is not always clear since a
player may shift gradually from generating base moves to analysing them by search
and evaluation.
Episode: An Episode is a distinct sequence of move considerations, with branching
allowed, stemming from the single consideration of a base move. If a subsequent
sequence begins from a base move then it is a new Episode, whether that base move is
different from the last or not. Sequences may be of length one and each leaf node
need not be evaluated explicitly. Thus every time a base move is mentioned at the
beginning of a sequence after the First Phase has completed, it signifies the
beginning of a new Episode, although discretion allows for two exceptions: 1. it is
obvious that base moves are being listed; 2. it is obvious that it is a rapid repetition of
the same sequence rather than a new search-and-evaluation of that sequence.
Transitional Phase: This is identical in character to the First Phase except that it
occurs between Episodes. Transitions typically occur when a player stands back
from his thought process to look again at general plans.
Final Phase: This is typified by summary statements, comparisons of base moves with
apparently no further search-and-evaluation, and move selection. It is also the last
phase.
Once each of the phases and episodes has been identified and checked, the textual
protocol is tabulated, as in Table 9, below. The time at completion of each phase or
episode is noted. The right-hand column is used to make notes on move selections
and errors on the part of the player. Thankfully it is usually relatively simple to
identify errors (e.g. in square identification) because they generate anomalies within
sequences.
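Because only completion times are recorded, the duration of each phase or episode has to be derived by differencing consecutive times. The following is a minimal sketch of that step, assuming times are written as minutes:seconds; the function names are hypothetical, and the four completion times are the first four recorded in Table 9.

```python
def to_seconds(stamp):
    """Convert an 'm:ss' completion time such as '2:46' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def segment_durations(segments):
    """Return (label, duration in seconds) for each protocol segment.

    segments: list of (label, completion time) pairs in protocol order.
    The first segment is assumed to start at time zero.
    """
    durations = []
    previous = 0
    for label, stamp in segments:
        end = to_seconds(stamp)
        durations.append((label, end - previous))
        previous = end
    return durations


# First four completion times from Table 9 (Player 8, Position B).
segments = [
    ("First Phase", "2:46"),
    ("Episode 1", "3:42"),
    ("Episode 2", "4:25"),
    ("Transition", "5:01"),
]
print(segment_durations(segments))
```

The first entry of the result is the Time of First Phase in seconds.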
Player 8, Position B

First Phase. Right. There's 3... e4, e5, knight f3, d5. Right, oh gosh, it's black
to move. Initially, he's got, er, the pawns: 7 plays 4. He's got 2 bishops for a
rook. 3 pawns and it's black to move. Hmm... The key to this, I would imagine, is
to keep... keep the threats up. Fortunately he's got the 2 bishops so he can place
quite a few threats. Trouble is if you've got 1 bishop you, um, he puts them all on
a square that you can't, er, the diagonals you can't attack but he can't do this
because you've got two bishops. Umm... is there any initial threats? Let's have a
look. Er, I can't see... I can't see... There's no sillies that you can just capture
something. OK, so what is white threatening to do? What don't we want him to do?
We don't really want him to mobilise his rooks. I think we've got to keep him...
we've got to keep him pinned down here. Obviously if we win any more pawns it's
going to be advantageous for us so he can't let us win any more pawns. Umm... I
think we don't really want him to let him... excuse me... we don't want to let him
get his rooks on the 7th or doubled up on the 7th, which would be not very well for
us. Having said that he's got to be careful. Um, right, so what would we do? What
would be a good plan? Er, right, we've... both bishops are pointing at his king.
Er... there's no open files, we can't... Hmm... I suppose we could play... Thing
is: what's he going to do if...?
[2:46]

Episode 1. If we attack a pawn? The obvious attack is his queen's knight pawn,
here. Attack it with the rook... attack it with the rook by playing rook there:
what's he going to do? If he pushes the pawn on... pushes the pawn on he's going
to end up with a position that's like Swiss cheese and the black bishop would run
rampant. He must be careful not to stagger his pawns, that it looks like colander.
If he does that and the bishops get in the middle here he'll have a hell of a
problem getting rid of them, plus the bishops will stop him from doubling up and
all the other things.
Notes: rook there = Rb8; pushes the pawn on = b3.
[3:42]

Episode 2. Now the reason I say shall we play a sort of a waiting move... let's
threaten a pawn then find a better square for this rook to go. As the king's
stranded over the... on h1 or king rook 1... can we do anything about attacking it
by using that as a springboard... attack a pawn then possibly come up the board?
Come up the board to, perhaps, knight 4 or b5?
Notes: come up the board = Rb5.
[4:25]

Transition. Are you recording in algebraic or descriptive? Because I think in
descriptive. So, er, what else? I mean there is the alternatives, er... just a
small little waiting move, say. Let's just... importantly, what can he do? What
can he do? If we start attacking he's got to defend it.
[5:01]

Episode 3. If I would, personally, if I was black... if I was white and black
played rook to queen's knight 1, I'd play queen's rook to knight 1 to defend it
because I don't like the look of playing pawn to knight 3... it's too messy...
black can start to get his black squared bishop in the holes. I personally
wouldn't move that. Er, I would defend, plus also that would give you the option
of playing pawn to queen knight 4 at a later date if you were white.
[5:46]

Episode 4. Er, so... Right, black, rook to knight 1, OK, rook to knight 1.
[5:58]

Episode 5. There is the alternative, of course, of the immediate counter-attack.
Rook to queen knight 1, rook to king 7, attacking the pawn so if black takes the
queen knight pawn, white would then take the a7 pawn, which would then leave
him... that queen rook pawn is then passed, so the rook behind it. Having said
that, he...
[6:32]

Episode 6. So, alright, rook to queen knight 1, rook to king 7, rook takes pawn,
rook takes pawn, rook takes knight pawn, he's going to lose too many of these
pawns. I don't think it's... I don't think it's going to work out for him if he
just counter-attacks. I think he might be able to do that a move or so later.
Notes: rook takes knight pawn = Rxc2, I believe, since black has already taken the
knight pawn in the third move in this sequence.
[6:55]

Episode 7. Rook to queen knight 1, rook to queen knight 1 for white, um... now I,
personally, would stop that rook from getting to king 7. Play king to bishop 1
perhaps? King to bishop 1, which would stop the rook in its tracks. Slower build
up. Mmm... yes, plus you could play... black could play...
[7:39]

Episode 8. Look, because, these sort of positions, you need an overall strategy as
opposed to plying move to move, er, and you think, right, my overall strategy
is... Or would it be? I don't like white's position insomuch as wherever he
advances a pawn, black's going to get in. If he tries to protect this with b3,
black can infiltrate... er, f3... black can infiltrate on the black squares.
Notes: protect this = the king.
[8:21]

Episode 9. He can't... the only square he can come to is this king 7, this e7, so
king to there. Hmm... tactically would it be better just to play a nice quiet move
to start with by playing king to bishop 1? He can't advance anything. Pawn,
hmm... hmm... Right, I shall... if we just play it: king to bishop 1, it's his
move. What could he do? I suppose he could play rook to king 2 but... yeah, so if
you play king to bishop 1, you play rook to king 2, he then could have pawn to
queen bishop 3 which the rook would then protect the pawn and he could then try
and advance down the centre.
[9:47]

Episode 10. Umm... yes, so I think we're back to the original idea of attacking
that pawn immediately. Rook to queen knight 1, rook to queen knight 1... Rook to
queen knight 1, er... Rook to queen knight 1, rook to queen knight 1, bishop to
there, to attack that pawn.
Notes: bishop to there = Bd7.
[10:30]

Episode 11. Hmm... I suppose we could attack that pawn immediately but... Attack,
defend, so if we played it immediately: bishop to queen 2 to attack the pawn at
knight... his knight pawn... er, his king knight pawn, he could defend it with
pawn to bishop 3. You could then, whether you want to immediately, or at a later
date, you can play pawn to king bishop 4. Hmm... I don't like it... it's getting a
bit messy because he could then play rook to the 7th to attack the pawn.
Notes: bishop to queen 2 = Bd7; rook to the 7th = Re7.
[11:32]

Episode 12. No, I think rook to queen knight 1... I would play rook to queen
knight 1. It doesn't give him many options, whereas all the other things does give
him options. I prefer him not to have many.
[11:54]

Table 9; Example verbal protocol (Player 8, Position B)
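
Several of the notes in Table 9 translate the player's descriptive notation into algebraic notation. The rule behind those translations can be sketched as below. This is a minimal illustration, assuming the square is fully qualified (e.g. "queen knight 1" rather than just "knight 1") and that the side making the move is known; the function and table names are hypothetical.

```python
# Descriptive file names (queen's side to king's side) and their
# algebraic file letters.
FILES = {
    "QR": "a", "QN": "b", "QB": "c", "Q": "d",
    "K": "e", "KB": "f", "KN": "g", "KR": "h",
}


def descriptive_to_algebraic(file_name, rank, side):
    """Convert a fully qualified descriptive square to algebraic.

    file_name: one of QR, QN, QB, Q, K, KB, KN, KR.
    rank: 1 to 8, counted from the side of the player making the move.
    side: "white" or "black".
    """
    if side not in ("white", "black"):
        raise ValueError("side must be 'white' or 'black'")
    if not 1 <= rank <= 8:
        raise ValueError("rank must be between 1 and 8")
    # Black counts ranks from the opposite edge of the board.
    algebraic_rank = rank if side == "white" else 9 - rank
    return FILES[file_name] + str(algebraic_rank)


# Black's "rook to queen knight 1" lands on b8; White's on b1.
print(descriptive_to_algebraic("QN", 1, "black"))
print(descriptive_to_algebraic("QN", 1, "white"))
```

Under-specified phrases in the protocol, such as "rook there" or "knight 4", cannot be resolved by a rule like this and still depend on the analyst's reading of the position.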
The corresponding Problem Behaviour Graph (PBG) is constructed row-by-row from each
episode in the protocol. Although moves are operators and positions are states, it
makes more sense to label the moves and not the board positions, because (a) board
positions are uniquely defined by the starting position and the sequence of moves; and
(b) board positions are difficult to represent economically in a PBG.
Each column of the PBG represents one ply (half-move). Hence all odd-numbered
columns represent the player's moves and the even-numbered columns his opponent's.
Figure 9, below, illustrates a full PBG, including the following common features:
- Null moves (each shown by a placeholder symbol representing an unspecified move).
  Note that base moves can be null moves (e.g. Episode 8);
- Evaluations (depicted by a combination of symbols at the end of each leaf
  node: + = good for player on move, ? = unclear/unspecified, - = bad for
  player on move);
- Branching (in Episode 3);
- Immediate reinvestigations of the same base move (e.g. Episodes 2-7);
- Non-immediate reinvestigation of the same base move (e.g. Episode 10).
The selected move is shown in a separate grey box at the bottom of the PBG.
[Figure 9: PBG with one row per Episode (E1 to E12) and one column per ply (1 to 4).
Base moves: E1 to E7 Rb8; E8 null move; E9 Kf8; E10 Rb8; E11 Bd7; E12 Rb8.
Selected move: Rb8.]
Figure 9; Problem Behaviour Graph for Player 8, Position B
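
As an illustration of how counts can be read off a PBG, the sketch below tallies immediate and non-immediate reinvestigations from the sequence of base moves in Figure 9. It is a hedged sketch rather than the study's own procedure: the function names are hypothetical, and treating a null base move as matching nothing is an assumption.

```python
def mover_for_column(column):
    """Odd-numbered PBG columns are the player's plies, even the opponent's."""
    return "player" if column % 2 == 1 else "opponent"


def count_reinvestigations(base_moves):
    """Count immediate and non-immediate reinvestigations of base moves.

    base_moves: one entry per Episode, in order; None marks a null move.
    Immediate: same base move as the Episode directly before.
    Non-immediate: base move seen earlier, but not in the previous Episode.
    """
    immediate = 0
    non_immediate = 0
    seen = set()
    previous = None
    for move in base_moves:
        if move is not None:
            if move == previous:
                immediate += 1
            elif move in seen:
                non_immediate += 1
            seen.add(move)
        previous = move
    return immediate, non_immediate


# Base moves of Episodes 1 to 12 as shown in Figure 9 (None = null move).
base_moves = ["Rb8"] * 7 + [None, "Kf8", "Rb8", "Bd7", "Rb8"]
print(count_reinvestigations(base_moves))
```

For this protocol the immediate reinvestigations are Episodes 2 to 7, and the non-immediate ones are Episodes 10 and 12.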
Bibliography
British Chess Federation (2003), Conversion Between BCF Grade and FIDE
Rating, http://www.bcf.org.uk/grading/how_it_works/conversion.htm (last
accessed 20th March 2005).
Calderwood, R., Klein, G. A. and Crandall, B. (1988), Time pressure, skill and
move quality in chess, American Journal of Psychology, Vol. 101, No. 4.
Chessbase GmbH (1997), Fritz 5 user's manual, Hamburg, Chessbase GmbH.
de Groot, A. (1965), Thought and Choice in Chess, USA, Basic Books Inc.
Fransella, F., Bell, R. and Bannister, D. (2004), A Manual for Repertory Grid
Technique, Chichester, John Wiley & Sons Ltd.
Gobet, F. (1998a), Chess players' thinking revisited, Swiss Journal of Psychology,
Vol. 57, pp18-32.
Gobet, F. (1998b), Expert memory: a comparison of four theories, Cognition,
Vol. 66, pp115-52.
Gobet, F. (2005), personal e-mail communication.
Gobet, F., de Voogt, A. and Retschitzki, J. (2004), Moves in Mind: the
psychology of board games, Hove, Psychology Press.
Gobet, F. and Simon, H. A. (1996), The Roles of Recognition Processes and
Look-Ahead Search in Time-Constrained Expert Problem Solving: Evidence
from Grand-Master-Level Chess, Psychological Science, Vol. 7, No. 1.
Gobet, F. and Simon, H. A. (1998a), Expert Chess Memory: Revisiting the
Chunking Hypothesis, Memory, Vol. 6, pp225-55.
Gobet, F. and Simon, H. A. (1998b), Pattern recognition makes search possible:
Comments on Holding (1992), Psychological Research, Vol. 61, pp204-8.
Holding, D. H. (1985), The Psychology of Chess Skill, USA, Lawrence Erlbaum
Associates.
Holding, D. H. and Pfau, H. D. (1985), Thinking ahead in chess, American Journal of
Psychology, Vol. 98, No. 2.
Howell, D. C. (2002), Statistical Methods for Psychology, USA, Duxbury
Thomson Learning.
Newell, A. and Simon, H. A. (1972), Human Problem Solving, USA, Prentice-Hall.