Cognition and Technology


Co-existence, convergence and co-evolution

Edited by

Barbara Gorayska
University of Cambridge

Jacob L. Mey
University of Southern Denmark

John Benjamins Publishing Company


Amsterdam / Philadelphia

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences - Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Library of Congress Cataloging-in-Publication Data
Cognition and Technology : Co-existence, convergence and co-evolution / edited by Barbara Gorayska and Jacob L. Mey.
p. cm.
Includes bibliographical references and indexes.
1. Human-machine systems. 2. Human-computer interaction. 3. Cognition. I. Gorayska, Barbara. II. Mey, Jacob.
TA167.C62 2004
004.019--dc22
ISBN 90 272 3224 5 (Eur.) / 1 58811 544 5 (US) (Hb; alk. paper)

2004050172

© 2004 John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co., P.O. Box 36224, 1020 ME Amsterdam, The Netherlands
John Benjamins North America, P.O. Box 27519, Philadelphia PA 19118-0519, USA

Table of contents

Introduction: Pragmatics of Technology
    Barbara Gorayska and Jacob L. Mey

Part I  Theoretical issues

Towards a science of the bio-technological mind
    Andy Clark
Language as a cognitive technology
    Marcelo Dascal
Relevance, goal management and cognitive technology
    Roger Lindsay and Barbara Gorayska
Robots as cognitive tools
    Rolf Pfeifer
The origins of narrative: In search of the transactional format of narratives in humans and other social animals
    Kerstin Dautenhahn
The semantic web: Knowledge representation and affordance
    Sanjay Chandrasekharan

Part II  Applications

Cognition and body image
    Hanan Abdulwahab El Ashegh and Roger Lindsay
Looking under the rug: Context and context-aware artifacts
    Christopher Lueg
Body Moves and tacit knowing
    Satinder P. Gill
Gaze aversion and the primacy of emotional dysfunction in autism
    Sarah Bowman, Lisa Hinkley, Jim Barnes and Roger Lindsay
Communicating sequential activities: An investigation into the modelling of collaborative action for system design
    Marina Jirotka and Paul Luff

Part III  Coda

The end of the Dreyfus affair: (Post)Heideggerian meditations on man, machine and meaning
    Syed Mustafa Ali
Martin Luther King and the ghost in the machine
    Will Fitzgerald

Name index
Subject index

Introduction
Pragmatics of Technology
Barbara Gorayska and Jacob L. Mey
University of Cambridge / University of Southern Denmark

In presenting earlier Cognitive Technology (CT) volumes to the public (Gorayska and Mey, 1996; Marsh, Gorayska and Mey, 1999), we as the editors took up a mode of collaboration between ourselves and our contributors that we had found useful in collections of articles: papers are submitted, accepted, and revised, but always with one purpose in mind, viz., to bring about a unified perspective incorporating the intentions of the contributors and the editors, while respecting the individual contributions' diversity and independence. In the present case, we venture to present the ideas of the contributors (and some of our own ideas) in a new garb, designed to bring out the convergence, co-existence and co-evolution that we think are characteristic of the field of Cognitive Technology. Studies in Cognitive Technology examine the human condition in the technological world. They are, as Clark (2002, p. 21 and this volume) puts it, in a very real sense, "the stud[ies] of ourselves". Such studies are an absolute necessity for the world and its inhabitants, in order for us to survive the onslaught and temptations of revolutionary new technologies and their sometimes aggressive practitioners. To properly paint this picture of a threatening, but also promising future, where technology and the cognitive makeup of its users also converge, co-evolve and co-exist, we must retrace our steps into the murky past when the discipline was originally conceived. It is no secret that the field of CT, despite all convergence, has evolved in ways that are not always easy to combine. In effect, even the term "Cognitive Technology" has been the subject of acrid disputes and ground-claiming in the past. But in this field, as in all other areas of scientific research, the important thing to do is to let things develop, co-evolve, on their own, so as to obtain the maximum convergence in the midst of coexisting, sometimes clashing views.

The contributors to this volume come from widely diverging orientations, scientific as well as geographical. Despite this, the common concern of all is: How to make the most of technology without losing our soul, as one of the editors once put it (Gorayska 1994, p. 441). If this can be obtained by offering the present collection to the interested general and specialist public, we will have reached our aim, and provided a forum for even more co-evolution and convergence, while safeguarding the peaceful coexistence of the various strands making up the pattern of cognitive-technological studies in our days.

A flashback
It all started one sunny day in 1993, in old Hong Kong, where the two editors (BG and JM) finally met, after having corresponded about various matters of common interest for a number of years. Over lunch (in which Inger Mey, Jonathon Marsh, N. V. Balasubramanian, Ho Mun Chan, Orville Lee Clubb, Laurence Goldstein, Kevin Cox and Brian Anderson also participated), the idea of CT as a new discipline, combining findings from computer science, philosophy, psychology and pragmatics, was discussed. In Hong Kong, interest in CT as an emerging academic discipline arose when a small group of academics at the City University of Hong Kong began exploring the ways in which developments in information technology had implications for human cognition. They were originally inspired by two factors. The first was the Fabricated World Hypothesis proposed by Gorayska and Lindsay (in support of their investigations of how lay people understand and cognitively utilize the concept of relevance (Gorayska and Lindsay, 1993, first presented as Gorayska and Lindsay, 1989a)):
It is usual in psychology to treat the external world as a brute given, and to see perception, memory and action as processes driven by its immutable characteristics. In fact this is not so. In learning his route, the postman does not struggle to find an algorithm to fit an arbitrary collocation of house numbers, the number system is designed to make a convenient algorithm possible. We wish to propose the Fabricated World Hypothesis. There is a deliberate allusion here to the Carpentered World Hypothesis of Segall, Campbell and Herskovits (1966). Segall et al. argued that some aspects of our conscious experience, particularly the geometrical illusions of Muller-Lyer, are a consequence of expectations induced by the regularities present in human constructions. Windows are almost always rectangles, never trapezoidal and so on. We
believe that this feedback effect of human artifacts upon the human mind is extremely widespread: and furthermore that it is the supremely important principle by which urban man controls his behaviour and releases processing capacity. It is almost banal to say that the human environment is organised around goal satisfaction. Areas of town may be most suitable for shopping, business or leisure, buildings are almost always organised around particular goals: bars, hairdressers, libraries, and so on. Within buildings, rooms are organised around goal-oriented activities: into kitchens, bathrooms, bedrooms, etc. Even within a room there may be a dining table, and similar goal-oriented functional specialization. What is not so banal is that the principles underlying the fabrication of the external environment, make an immense difference to the processes involved in perception, memory and goal-management. To put this in a challenging way: much of human memory is in the environment. The fabricated environment is such that simple algorithms suffice to generate effective behavior within it. In particular for present purposes, in the fabricated environment situations are such as to satisfy the execution conditions of very few plans at one time. (Gorayska and Lindsay, 1989b, pp. 16-17)

Another source of influence came from the work of Gorayska and Cox (1992) on user-friendly interfaces to expert systems. In the spirit of the Fabricated World Hypothesis, the authors saw expert systems as extensions of the human mind. Consequently, they argued that the locus of control with respect to the language of expression at the interface ought to be placed in the user's mind and not (as was the case then, and predominantly still is now) in the machine. Those days also witnessed the advent of multi-media, anticipated to have a tremendous influence on the delivery of education and training, mass communications and advertising, and manufacturing product design. Multi-media developments benefited from studies in the overlap area between cognitive processing and the organization and design of Information Technology (IT) equipment for the working environment. They aimed at improving overall system performance, by optimizing the effectiveness and efficiency of the human agents. The view taken by the investigators (of what came to be known as human factors) was that this efficiency was heightened by minimizing stress and fatigue and maximizing comfort, safety, and job satisfaction. Within that broad paradigm, several different areas of investigation emerged. Cognitive Ergonomics (Card, Moran, and Newell, 1983 and 1980), for example, focused mainly on things such as the direction of eye movements, spatial proximity of input sources, and degrees of complexity in information displays so as to maximize the efficiency with which both the equipment and the information
provided were used. Cognitive Engineering (Norman, 1986; Norman and Draper, 1986; Rasmussen, 1988) dealt with the notion of an internal model (the user had of a system and the system had of the user) in order to achieve optimal user-control and system-flexibility. Engineering Psychology (Wickens, 1992) provided technology designers with psychological profiles of users that helped eliminate design flaws and ensure that the capabilities and limitations of the human were accounted for in optimal design. None of these approaches, however, involved examination of the semantics, syntax, and pragmatics of information itself, and how its form of delivery might impact on the cognitive make-up of users. Such studies would have to involve factors influencing the nature and design of the interface between human cognition and IT processes, as well as products externalizing human thought processes. The Hong Kong CT interest group felt that the study of the relationship between all such IT developments and the human mental processes of forming cognitive schemata was not only timely but necessary. The term "Cognitive Technology" was coined to express the necessity of exploring the developmental co-dependence between the human mind and the tools it interacted with:
[CT] explores the nature of the information made available due to such technological advances, how as a result of this information the human/Information Technology interaction influences cognitive developments in humans and how outcomes of such interactions and influences provide feedback effects on future advances in IT. (Balasubramanian, Gorayska and Marsh, 1993, p. 4)

An elaborated version of this manifesto was first presented at the conference on "New Visions of the post-industrial society: the paradox of technological and human paradigms", in Brighton, UK (Gorayska and Mey, 1994) and appeared in print as (Gorayska and Mey, 1996c). Inger and Jacob Mey brought to the round table a complementary but distinct tradition. In Europe (especially Scandinavia), the United States, and Japan, the consideration of the processes at the interface of human cognition and technology focused on the ways that the toolness of computer artifacts interacts (or interferes) with the intentions and needs of the user, and how the user adapts to the toolness in various ways, while keeping his or her autonomy vis-à-vis the tool. Inspired and supported by the Scandinavian labor movement, Pelle Ehn and a group of workers at Aarhus University, Denmark, looked into the question of how well the user interface relates to the practice and language the user is familiar with (Ehn 1988, p. 434). This perspective was not far from that
advocated by the group working with Donald Norman in California, who as early as 1986 had published a collection of articles under the telling acronym of UCSD (to be read as either User-Centered System Design or, more whimsically, University of California at San Diego, the place where their research was being conducted). Ehn acknowledges his indebtedness to Norman's and his co-workers' ideas, when he directly refers to Hutchins et al.'s article in the 1986 Norman book on the matter of the level of tools in the user interface. While the primitive command of a Turing machine gives the user the tools to perform a task that can be done with a computer artifact, there are not many users that would be helped by this in their ordinary work practice (1988, p. 434). Ehn then refers to Alan Perlis' famous quip (also quoted by Hutchins et al., 1986, p. 101) about the "Turing tar-pit in which everything is possible but nothing of interest is easy", and its converse, the over-specialized system where operations are easy, but little of interest is possible (Hutchins et al., 1986, p. 103). It is in the interface between the tool's affordances and the human's intentions that the proper dialectic interaction can take place. Ideally, the toolness of the tool should gradually diminish, as the tool itself recedes into the background, as the useful factotum whose presence we rely on without actually ever seeing it. This is also the notion of the invisible tool, as propounded in an early article by Mey (1988) and also taken up by Norman in his 1999 book The Invisible Computer. In all these matters, what was really at stake was the question: given that adaptation is necessary to deal with an increasingly complicated technological environment, in which direction should the adaptation take place? Should the humans adapt to the computer, or should the computer be adapted to human needs?
On the face of it, an ecological, human-oriented solution would be to let the human be the adaptee, the one that is being adapted to, while the computer is the one that adapts. But the situation becomes more complex when we consider the fact that, in making the computer adapt to our needs, we have already taken a big bite out of the adaptive process: we are "natural-born cyborgs" (as Clark has expressed it; 2001a), for whom using, and identifying with, tools (including now the computer) is as natural as is breathing. In a way, we are already pre-approved for computer use: we are as much destined to be computer-consumers as we are potential acceptors and executors of credit card offers and practices. The real question was, therefore, not if we should adapt, but how we should do it, and on whose premises. This is where CT entered the picture. The adaptivity problem (Mey, 1998) that the Europeans, Japanese, and some
Americans had focused on, tied in with the developments at the Hong Kong end of the CT world. Hence it was not enough to re-invent and reverse the old slogan from the 1933 Chicago World Fair (quoted by Norman, 1996, p. 253): "Science Finds, Industry Applies, Man Conforms", to make technology the one "wh[ich] should do the conforming" (ibid.); a balanced view of adaptation should take into account the fact that man is in principle a conformist, that is, a flexible tool-maker and user; and that to get the most out of our technology, we have to include the cognitive human self. Which is precisely what CT came to be all about. As Andy Clark puts it, the tools [i.e., the technology] and culture [i.e., the cognitive environment] are indeed as much determiners of our nature as products of it (this volume, p. 31). The essence of the human dealing with the computer technology is that we are able to bridge both gulfs (in Hutchins, Hollan and Norman's terminology; 1986, p. 94): that of evaluation and that of execution, in order to be able to deal with both worlds: that of cognition and that of technology.

Establishing the ground


Following the fruitful discussions at the momentous round table lunch in 1993, the two worlds finally found a common expression in the collaborative groundwork for the First International Cognitive Technology Conference, CT95, which took place in Hong Kong in 1995. The call for papers distributed in a variety of available media read:
Cognitive Technology (CT) is the study of the interaction between people and the objects they manipulate. It is concerned with how technologically constructed tools/aids (A) bear on dynamic changes in human perception, (B) affect natural human communication, and (C) act to control human adaptation. Cognitive systems must be understood not only in terms of their goals and computational constraints but also in terms of the external physical and social environments that shape cognition. This can yield (A) technological solutions to real world problems, and (B) tools designed to be sensitive to the cognitive capabilities and affective characteristics of their users. CT takes a broader view of human capability than the current Human Computer Interface research and talks of putting more of the human into the interface without attempting to simulate humanness on machines. It is primarily concerned with how cognitive awareness in people may be amplified by technological means and considers the implications of such amplification on what it means to be human. It should appeal to researchers across disciplines
who are interested in the socio-cultural and individual implications of developments in the interface between technology and human cognition. Any technology which provides a tool has implications for CT; computer technology has special importance because of its particular capacity to provide multi-sensory stimuli and emulate human cognitive processes. (CT95 call for papers, first published on the Internet in December 1993)

At the time of CT95, several important new developments at the interface of cognition and technology were in the offing. One was the trend to consider the process of externalizing-internalizing-externalizing of the mind as a constant loop:
When we externalize our minds, we create an object. This object, in its turn, is not just an object in space: it is something that we consider, relate to, love or hate, in short, work with in our minds, hence internalize. In very simple cases, the object is just an object for vision (as Descartes seemed to think); more sophisticated considerations include the mirroring that takes place when the child discovers its own image as separate from itself [...], or when we evaluate a mind product as to its adequacy, and compare it to the original representation that we had in mind. Consequently, removing this check can have some strange and unexpected effects, as in the cases where an artist loses the use of one of his senses: the near-blind Monet, the deaf Beethoven, who continued to externalize their minds, but with unmistakably different (but not necessarily artistically inferior) outcomes. The re-internalized object is different from the one that started the externalizing process: it retains a tie to its origin, but has also become strangely independent. It now has a life of its own, and at a certain point in time, it is its turn to become externalized. This process continues until we think the result is adequate, and in the meantime, every new version interacts dialectically with the previous one. It supersedes it, but cannot replace it quite. (Gorayska and Mey, 1996a, p. 6)

The externalization-internalization-externalization loop gave rise to the second emergent factor: the dichotomy between Cognitive Technology (CT) and Technological Cognition (TC):
The term "Cognitive Technology" may be too narrow for our intentions. It serves well to describe those issues which deal with determining approaches to tool design meant to ensure integration and harmony between machine functions and human cognitive processes. Unfortunately, it does not adequately describe a number of other issues, in particular, those concerns which relate to the identification and mapping of the relationship between technological products and the processes by which human cognitive structures adapt. We see these
two types of issues as constituting related but distinct areas of investigation which are best kept separate but must be given closely aligned treatment. We therefore reserve the term "Cognitive Technology" to refer to methodological matters of tool design, and propose the term "Technological Cognition" to refer to the theoretical explorations of the ways in which tool use affects the information and adaptation of the internal cognitive environments of humans. Human cognitive environments are constituted by the set of cognitive processes which are generated by the relationships between various mental characteristics. These environments serve to inform and constrain conscious thought. Under the new schema, theoretical developments in Technological Cognition would find concrete expression in the constructed artifacts produced by Cognitive Technology. It is this dichotomy which forms the basis for our argument and the grounds from which to develop a framework for analysis. (Gorayska and Marsh, 1996, p. 28)

Third, and most importantly, the separation of methodologies for tool-design (externalization processes) and the processes by which existent tools in turn technologize the mind (internalization processes) allowed us to put the explorations of both on a firm pragmatic footing:
Cognitive technology thus turns necessarily (although not automatically) into technological cognition (Gorayska & Marsh, 1996). I said: not automatically, because the necessity is one that we need to realize and make our own. The conditions for using technology are not in the technology alone, but in the minds of the users. To vary Kant's famous dictum, cognition without technology is empty, but technology without cognition is dangerous and blind. Our minds need the computer as a tool, but we need to consciously integrate the computer into our minds before we start using it on a grand scale. In this way (and in this way only), rather than being a mindless technological contraption, the computer may become a true tool of the mind. Technology creates gadgets that can be put to use without regard for their essential functions. What, in the end, such gadgets do to ourselves and to our environment, however, is not necessarily anybody's concern in their actual conditions of use, in which the mind-captivating (not to say mind-boggling) fascination of advanced technology allows us to focus on the intermediate stage between intent and effect, the purely technological one. To take a simple example, pressing a button is, in itself, a neutral activity; yet it can in the end cause a door bell to ring just as well as it may detonate a nuclear explosion. "I just pressed a button", the pilot who launched Fat Boy on Hiroshima could have said. And if Timothy McVeigh (one of the persons indicted, and subsequently convicted, in the April 19, 1995 bombing of the Alfred P. Murrah Federal Building in Oklahoma City that took 165 lives)
would have had to kill all those 165 people by hand, he never would have gotten beyond the first two or three, especially if he had started out with the babies. Now he could just connect some wires, and leave it at that. Technology does the work; our minds are at rest. Garrison Keillor, in his News from Lake Wobegone, aired on radio station WBEZ, Chicago, on 12 August 1995, offered a philosophical reflection on Halloween pranks. His point was that you are responsible even for the unknown, and unintended, effects of the fun you have (the example was of some boys at Halloween disconnecting a box car and sending it out on the tracks). What determines your responsibility is the outcome; he called this a strictly outcome-based morality. Applying this to our subject from a slightly different point of view, one could talk of a pragmatic view of technology and cognition. The pragmatics of cognitive technology (CT) deals with technology's effects on the users and their environment; a pragmatic view of technological cognition (TC) implies that we inquire into the conditions that determine that cognition, that is the conditions under which users can cognize their technology, in order to realize the effects of their technological efforts. We need pragmatics in our CT so that it can be environmentally correct; we need pragmatics for our TC to be morally sound. (Mey, 1995; italics in original)

Recent developments in CT and in the philosophy of mind echo this statement. To take the latter first, Andy Clark in his recent book, Mindware (2001b), has proposed the concept of "wideware" to cover environmental extensions of the human mind. Under this concept, he subsumes both the Fabricated World Hypothesis ("the principles underlying the fabrication of the external environment make an immense difference to the processes involved in perception, memory and goal-management") and our earlier observation that technology externalizes processes (algorithms) which originate in the mind:
what is distinctive about human thought and reason may depend on a much broader focus than that to which cognitive science has become most accustomed, one that includes not just body, brain, and the natural world, but the props and aids (pens, papers, PCs, institutions) in which our biological brains learn, mature and operate. A short anecdote helps set the stage. Consider the expert bartender. Faced with multiple drink orders in a noisy and crowded environment, the expert mixes and dispenses drinks with an amazing skill and accuracy. But what is the basis of this expert performance? Does it all stem from finely tuned memory and motor skills? By no means. In controlled psychological experiments (Beach 1988, cited in Kirlik, 1988, p. 707), it becomes clear that expert skill involves a delicate interplay between internal and environmental factors. The experts
select and array distinctly shaped glasses at the time of ordering. They then use these persistent cues so as to help recall and sequence the specific orders. Expert performance thus plummets in tests involving uniform glassware, whereas novice performances are unaffected by any such manipulations. The expert has learned to sculpt and exploit the working environment in ways that transform and simplify the task that confronts the biological brain. Portions of the external world thus often function as a kind of extraneural memory store. (Clark, 2001b, p. 141)

The need to consider moral issues in tool-design, accentuated by the pragmatic processes of tool use, has been reiterated by Chan and Gorayska (2001) in their "Critique of Pure Technology". Echoing Wittgenstein (1953), they remind us that the meaning of a piece of technology is a consequence of how it is actually being used. Following Kant (1781), they point out that the way technologies will be used by people is limited by human inherent and enjoined perceptive capabilities. Therein lies a danger of tool misuse. What we need to work towards is an increased awareness 1) of the unthinking use of pure technology, i.e. technology decontextualized from the cognitive processes activated through its use, and 2) of the processes through which we become habituated to the cognitive scaffold (Clark, 1997) that technology provides. How can this be achieved?
This can be achieved through an open dialogue, an ongoing and collective deliberation of how our environment and our lives should be shaped by technical means and what kinds of risks we as the public can collectively bear. Such collective deliberations, which ultimately lead to collective responsibility in the face of adversity, require access of a wider public to details of the design and the already known evidence of potential short-term and long-term risks in adopting a given technology. [...] The parallel [to Kant] in the case of pure technology is that we also need to acknowledge the limits of technology and admit the fact that human-related problems in a technology-oriented society have to be dealt with not only at the level of cognitive, rational, technology, but also at the cognitive, emotive, psychological and spiritual levels. Most importantly, they will have to be dealt with at the ethical, social and political levels as well. (Chan and Gorayska, 2001, p. 474)

By virtue of bringing the pertinent questions of CT to the attention of a wider audience, this book is an invitation to such public, collective deliberations.


The CT agenda
It has been said that you don't really understand a scientific problem area until you have been able to formulate some pertinent questions belonging to its remit. In this perspective, it seems to us that the first question above is subordinated to the second one, and that this latter question encapsulates the answers to the first one. For, if we can argue successfully for the need for some urgent problems to be solved in the area of CT, then of course the need for a forum discussing those questions has been established, and the audience will manifest itself: "If you build it, they will come." If we endorse CT as the study of the pragmatic cognitive processes ongoing at the dialectic boundary area between human tool users and tools used by humans, we will immediately be able to recognize a whole array of problems:
- What is the tool to a human, and what is the human to a tool? (see Gorayska, Marsh and Mey, 2001)
- What does it mean to say that the human comes first in technology?
- Where are the limits for computerizing human activities, if there are any?
- How does the tool, especially the computer tool, change the mind (as opposed to, or as a complement to, the mind changing the (computer) tool)?
- Can we have too much technology (the computer taking over)?
- What are the secondary (hidden) effects of this technology (cf. Salomon 1993; Gorayska and Mey, 1996a)?

More importantly, however, one issue that looms large in such considerations is the pertinent methodological question that has permeated CT (conceived of as a scientific discipline) to date, namely: What methods do we as investigators need to employ when we study technological cognition for the purpose of designing or evaluating cognitive tools? In other words, How do we study and apply the pragmatics of technology? And more concretely, What (kind of) question(s) do we want to address in CT, and how are we going to address them? If we want to find out what we do when we do CT in design we must first ask ourselves what it means to do CT in general. Since CT is essentially to do with the relationship between humans and their tools, the following general aspects of doing CT are worthy of attention:

The question of what constitutes a cognitive tool. All artifacts are in some measure cognitive artifacts and can be situated on a continuum of purposeful use between the extremes of raw material and the mind (Gorayska, Mey and Marsh, 2001). Inasmuch as we employ them as mental prostheses, AI
robots (Pfeifer, 2002, reprinted in this volume), event modeling languages (Jirotka and Lu, this volume), or natural language (Dascal, 2002, reprinted in this volume) that help designers analyze and better understand phenomena in the real world, are in fact instances of cognitive tools. So are narratives (Dautenhahn, 2002, reprinted in this volume). Could the same be said about care-givers who help newborn babies acquire cognitive skills by simply interacting with them (Lindsay and Gorayska, 2002, reprinted in this volume)? Is it more fruitful, from the standpoint of CT, to dene cognitive tools in purely functional terms? Could there be natural cognitive technologies, i.e., spontaneously evolved, functionally determined, often modular (depending on relevance discontinuities; Lindsay and Gorayska, op. cit.), mental processes that constitute the techne of the mind, as rst proposed by Meenan and Lindsay (2002) and further evidenced by El Ashegh and Lindsay and by Bowman et al. (both this volume)? The question of co-evolution of the mind and tools it creates. Can minds and tools indeed co-evolve and, if so, how can this evolutionary co-dependence be shown? For interesting discussions relevant to this issue see, for example, Mithen (1996), Lindsay (1999), Clark (2002, reprinted here) and Dautenhahn (2002, reprinted here). The question of adaptivity vs. adaptability, as raised by Mey (1998), and discussed above: can we distinguish between the two in practice? Some examples of how technology can coerce behavioral change in its users can be found in Gills chapter (this volume). The question of the transparent tool, the tool that you use without noticing that you are using it (following Mey 1988, Norman 1999): what cognitive eects, if any, is it likely to have on their users? Dautenhahn (2002, reprinted in this volume) provides an excellent example. Arguments presented in Ali (2002) and Lueg (2002), both reprinted in this volume, are also relevant to this question. 
The dangers of the can-must-can spiral, or the dialectics of tool overbidding: once you have created a tool that can do a thing, you are obliged to use it, and create a tool that can do even more, which then puts you under the obligation to use that tool, and so on ad innitum. (We have seen sad examples of this spiral in the arms race, or the race to ll our lives with useless gadgetry that requires us to buy even more useless gadgetry, and so on.) The unthinking use of technology (discussed by Chan and Gorayska, 2001): can we predict it and how can we avoid it? The meaning of a tool lies in its use. If misapplied with little thought, the tool use can have disastrous

Pragmatics and Technology

13

consequences. The extreme case of such misapplication of use with respect to the computer that brackets emotion from cognition has come to be known in CT as the Schizophrenia Problem (Janney, 1997, further discussed in Ali, 2002, reprinted here). The related question of human wants and needs: is what we want, really what we need, or are our technological needs the result of someone elses wants? In other words, we have to realize that cognition is not neutral, but always embodied in some human or technological context (Clark, 2000 & 2002, reprinted in this volume). What are the social, psychological and cognitive mechanisms in enjoining need? For some answers to this question, see Lindsay and Gorayska (2002, reprinted in this volume) and Lindsay (1996). The dialectics between a routine operation facilitating our daily needs, and the mind-numbing sterilization of our daily activities through routine operation (possibly resulting in the computer users screen rage, in analogy to the frustrated drivers road rage): what technologies do we need to overcome that sterilization? In this connection, a provocative piece by Will Fitzgerald (this volume) may serve as a poignant reminder. In humanizing technology, there are two directions our work can take: one is from the inside out, the other from the outside in. By this we mean that we either can take our point of departure in the human (head and body) and try to get our technological environment to t the perceived dimensions that we have of it; or we can work from the environment as a given, and try to t the human (body and head) into it: which direction maximizes human benets? We will dwell some more on this distinction in the following. (See also the arguments in Jirotka and Lu, this volume.) 
The question of motives and benets: we need to carefully consider what motivates technology inventors and developers, what species the broader scientic frameworks within which they operate, and furthermore, what motivates design perspectives. Some such perspectives, proposed for CT, include Empirical Modeling (Beynon, 1995), Cognitive Dimensions (Blackwell et al., 2001; Kutar et al., 2001), the Reective Questioning Perspective (Gorayska and Marsh, 1996 & 1999), Goal-Based Relevance Analysis (Lindsay and Gorayska, 2002, reprinted in this volume, and Lindsay (1996)), and, with relevance to the latter, the Aordance View (Chandrasekharan, this volume). The paramount question is whether such considerations of these and other proposed perspectives on practice can put us in a better position to understand, at a deeper level, a design process? Can this benet the formulation of a broader, synthesizing CT methodology?

The question of contributions from the related disciplines: to what extent can empirical research in those disciplines further the design of humane technologies? Arguments, experiments and case studies relevant to this question are presented in this volume in the chapters by Roger Lindsay and Barbara Gorayska (2002, reprinted here), Marina Jirotka and Paul Luff, Hanan A. El Ashegh and Roger Lindsay, Sarah Bowman et al., Satinder Gill, and Kerstin Dautenhahn (2002, reprinted here).

Changing nature, changing us


To grasp the distinction alluded to above (inside out vs. outside in), let's walk through an imagined example of a prosthesis (cf. Mey, 2000): an artificial limb, such as a hand. If we want to define the artificial hand from the inside out, we go about defining it by its specific functions, as we see them from the point of view of our body and our understanding of its functions in relation to our needs. If we define the hand from the outside in, the emphasis is on the existing conditions, and the question to ask is: What kind of hand (or prosthesis, in general) would fit these conditions, and how are we going about technologizing them? Among the questions that may turn up here are:

How much of a role do visual and tactile resemblance play in defining the hand?
Is a hook acceptable as a prosthesis (cf. the movie Edward Scissorhands)?
Does the prosthesis have to be attached to a body, or can it operate independently, by remote control?
Do we have to call the prosthesis a hand at all (given that we use the word hand metaphorically in a number of contexts where the resemblance is nil, e.g. when we talk about the hands on a dial or a clock, or a side or direction, as in the Japanese Yamanote, the hilly section of Tokyo, lit. the mountain's hand)?

Given our conception of CT, how would we deal with these problems? Here, typically, the question of design in tool manufacturing crops up (a much maligned term, since it has been taken over by certain professional designers who operate more for their own good than for that of others). As to the conditions for designing a tool from the outside in, these are similarly well-known: one starts out with a specification of the problem, then goes on to figure out how the tool will fit in.

Consider the activity of fruit- or berry-picking. We consider the human hand as the primary tool for this activity: from time immemorial, people have been picking berries with their fingers, apples and pears with their hands. But consider the boys who want to get some walnuts out of a tree. The nuts hang too high, and they cannot reach them by hand. What do they do? They throw branches and other objects into the tree, trying to make the nuts fall down. This may work for nuts, which don't get damaged by the fall. But if you were to do this to avocados, the result would be disastrous. Here, the environmental conditions dictate another solution, which one of us (JM) in fact saw practiced in Brasília (the capital of Brazil) one day, when walking to school: a man was maneuvering a tall pole with a cup-like attachment at the end. Placing the pole with the cup under the avocado, he then carefully manipulated a little knife that was attached to the end of the pole, with the help of strings running down the length of the pole. After several tries, the avocado landed successfully in the cup and was lowered safely down to the owner of the tree (or the user, whichever the case might have been; we believe nobody in Brasília owns those trees outside the apartment blocks). In this case, the hand was completely externalized and detached from the body; it fulfilled its functions in part because of its non-likeness to a real hand (one wouldn't even call such a prosthesis a hand, we believe). From the primitive stick thrown into the tree to this sophisticated contraption, we notice an increasing instrumentalization (along the lines sketched in Gorayska et al., 2001): the primitive tool becomes a sophisticated instrument. But at the same time, its role as a prosthesis shifts character: from being in touch with the body, it becomes part of the environment rather than of the human.

Externalizing our activities, thus, does not mean that we have to replicate them; they can be reborn in different shapes, sometimes even unrecognizably or unidentifiably so. Imagine, for example, that some genetic engineering process were able to produce an avocado with a hard shell, a bit like a coconut. Then the whole operation of avocado picking could be effected by a device like the one we use for cherry harvesting: a tractor with a big mechanical arm clamps the tree, spreads out a sheet underneath it, and proceeds to shake the cherries out of the tree and onto the sheet, which is then folded mechanically and emptied into the container attached to the tractor. The activity of fruit picking, which was originally thought of as specifically hand- and finger-oriented, has now been transformed into something which happens completely outside the body. Using these examples as our guideline, we are now able to formulate a few more pertinent questions to test our CT practice:

How do we have to think about the material conditions and parameters for using a prosthesis, or for that matter, any tool? Taking, again, the case of the hand, we would have to distinguish between US and European parameters for the opening of doors: the up-down movement that is typical for European latch-type door openers is strategically different from the rotating movement that is appropriate to the turning knobs that are standard in the US. We should also be thinking of other affordances for hand interaction: a technology that would be proper for berry picking would not necessarily be fit for hand-shaking at conventions, for instance, or for the apostolatus manuum, the apostolate of the hands, as Catholic priests were wont to call their greeting obligations at parish and other religious events.

To properly assign functions to the prosthesis (the hand), we need to consider the various functional movements that are involved in hand-to-world interaction, and differentiate our prosthesis accordingly: grasping vs. picking in relation to the nature of the objects (solid vs. scattered, flat vs. protruding, substantial vs. leafy, and so on); pushing vs. pulling (e.g., buttons vs. knobs); vertical vs. horizontal direction (levers vs. handles); lifting vs. stroking (e.g., a baby or a cat).

An interesting example of how the design of the prosthesis feeds back into the design of the natural hand, and vice versa, is furnished by the development and progress of the common computer instrument known as the mouse. In the beginning, the mouse was just a simple button on a wire, not much different from a lamp switch, that could be moved around. When the mouse went on line, that is, when its movements could be said to emulate and control the movements of the cursor on the screen, new possibilities turned up. Some researchers started to think about the fact that the human user not only had hands, but also feet; and that the mouse-like instrument, by having only one button, did not exploit the full capacity of the human hand steering it, which had five fingers (plus the fact that there were two hands; but since people had to use at least one of them for typing, this was never a serious option). Hence, some of these people set out to design mice that could remedy this defect. The five-finger mouse and the foot mouse were actually developed as early as the mid-eighties, in Japan, at the Department of Computer Sciences of Osaka University, where JM was witness to how the only person (a graduate student) who was able to manipulate both the foot mouse and the five-fingered mouse (and both at the same time!) obtained quite striking results from this successful coordination. However, as one of the researchers who had developed the five-finger gadget (and who was not even able to operate it himself) remarked: "It's a nice toy, but the only person who will ever be able to use it is Katoo-san, and it took him almost two months."

The moral of the story is, perhaps, not that the invention of the five-fingered mouse (or the foot mouse) in itself was contrary to the ideas of CT, as we see them, but that this particular invention was developed out of a wrong idea about making prostheses, namely that they should be as close as possible to the human body part that they replaced. The relevance of using fingers was overlooked. In actual fact, there isn't much that we specifically use all of our five fingers for, singly and individually (except if we are prestidigitators or consummate piano players). Most of the time, we use the hand as such, or the fist, e.g., for throwing a ball, or punching someone on the nose. Here, the five fingers cooperate in the movement and do not work on their own, as in the case of the five-fingered mouse. In other words, the practice of CT that went into this invention was misguided in that it took the wrong point of departure; or better, the point of departure itself was wrong, given the circumstances. The world of the mouse and the keyboard does not require more than one button to regulate the movement of the cursor. Even if there are two or three buttons on a mouse, our hunch is that most users will be content to use just the one, and resort to the other, or other two, only in cases of necessity, such as when executing special functions (which, of course, could also be realized in other ways, just as is the case with most of the mouse's functions, which can be replaced by a control stroke plus a character, if we desire to avoid tendonitis, computer-caused carpal tunnel syndrome, and other side benefits of mousing!).

Conclusion
As the above examples and discussions have shown, the need to do CT does not stop at the edge of the human-computer interface. CT penetrates far down into the reaches where the cognitive mind reigns supreme, and from where it prepares its forays into the world of technology. In the other direction, CT reaches out to the farthest shores of technological development, bringing back the lost tribes of tooling humans into the cognitive fold. In this perspective, we can think of prostheses and other (computerized) tools as instruments helping to effect this return to humanity, the cognitive tools being extensions of the human body (such as in the case of the artificial limb) and of the human mind (as in Gorayska and Lindsay's fabricated world or Clark's wideware), but at the same time devices that change the body and inspire the mind.

An analogy from music may help us to understand this double problematic. It has become commonplace these days among music buffs to demand that baroque and other older music be played on original instruments. That usually means that we construct instruments that are as close as possible to their originals (which, in itself, is a dubious proposition: no modern Stradivarius will ever come close to the beauty of sound and shape that is inherent in the few remaining originals). But one could attack the problem of reproducing technology in a quite different way, viz., by asking: If Bach had lived today, would he have been happy to play only on his old foot-and-hand driven, bellows-powered monster of a church organ, if a modern, electrically powered and electronically steered instrument were available to him? Would Beethoven have preferred to hammer out his sonatas on the Hammerklavier rather than on a contemporary Yamaha or Steinway grand? (Of course, he was deaf anyway, so perhaps it wouldn't matter in his case.) We think the answer is obvious. Being great minds, not merely great composers and/or performers, these cognitive geniuses would have recognized the tool for what it was: an extension of the mind, not an impediment to its development. They would have jumped at the innovative potential of the new tools, and have dispatched, perhaps with a sad smile, the old instruments to the junkyard of musical history, to be retrieved only for nostalgic purposes. Using the new tools, they would have created a totally new music, in dialectical interaction with their modern instruments.

On the other hand, using the retro technology of our days, the most we can do in playing Bach on a replicated 17th century clavichord is to mechanically re-create the old work, but without the inspiration and interaction that went into the original performance, which itself was a unique product of the dialectics between then-cognition and then-technology. Similarly, when dealing with modern technological and cognitive artifacts, it behooves us to take a bold stance and demand of ourselves that we face the challenges hands-on and with open eyes. Technological cognition inspires cognitive technology; it should never clamp the brakes on our development, whether cognitive or technological. Cognitive Technology, as a scholarly discipline, sees it as its purpose in life to offer a modest contribution towards realizing that aim, a push in what we believe is the right direction.

References
Ali, S. M. (2002). The end of the Dreyfus Affair: (Post)Heideggerian meditations on man, machine and meaning. International Journal of Cognition and Technology 1(1), 85–96. Reprinted in this volume.
Balasubramanian, N. V., B. Gorayska & J. Marsh (1993). Establishment of a Cognitive Technology research group. Technical Report TR-93-04. Department of Computer Science. Hong Kong: City University of Hong Kong.
Beach, K. D. (1988). The role of external mnemonic symbols in acquiring an occupation. In M. N. Gruneberg & R. N. Sykes (Eds.), Practical Aspects of Memory: Current Research and Issues 1, pp. 342–346. New York: Wiley.
Beynon, M. (1995). Empirical modelling for educational technology. In J. Marsh, C. L. Nehaniv & B. Gorayska (Eds.), Proceedings of the Second International Cognitive Technology Conference, pp. 54–68. August 25–28, Aizu, Japan.
Blackwell, A. F., C. Britton, A. Cox, T. R. G. Green, C. Gurr, G. Kadodo, M. S. Kutar, M. Loomes, C. L. Nehaniv, M. Petre, C. Roast, C. Roe, A. Wong & R. M. Young (2001). Cognitive dimensions of notations: Design tools for Cognitive Technology. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 325–341. Berlin: Springer.
Bowman, S., L. Hinkley, J. Barnes & R. Lindsay (this volume). Gaze aversion and the primacy of emotional dysfunction in autism. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 267–301. Amsterdam: John Benjamins.
Card, S. K., T. Moran & A. Newell (1980). The keystroke-level model for user performance time with interactive systems. Communications of the ACM 23, 396–410.
Card, S. K., T. Moran & A. Newell (1983). The psychology of human-computer interaction. Hillsdale, N. J.: Erlbaum.
Chan, H-M. & B. Gorayska (2001). Critique of pure technology. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 463–476. Berlin: Springer.
Chandrasekharan, S. (this volume). The Semantic Web: Knowledge representation and affordance. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 153–172. Amsterdam: John Benjamins.
Clark, A. (1997). Being there: Putting the brain, body, and world together again. Cambridge, Mass.: MIT Press.
Clark, A. (2001a). Natural-born cyborgs? In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 17–24. Berlin: Springer.
Clark, A. (2001b). Mindware. Oxford: Oxford University Press.
Clark, A. (2002). Towards a science of the bio-technological mind. International Journal of Cognition and Technology 1(1), 21–33. Reprinted in this volume.
CT95 (1993). Call for Papers for the First International Cognitive Technology Conference. Internet publication. Hong Kong: City University of Hong Kong.
Dascal, M. (2002). Language as a Cognitive Technology. International Journal of Cognition and Technology 1(1), 35–61. Reprinted in this volume.
Dautenhahn, K. (2002). The origins of narrative: In search of the transactional format of narratives in humans and other social animals. International Journal of Cognition and Technology 1(1), 97–123. Reprinted in this volume.
Ehn, P. (1988). Work-oriented design of computer artifacts. Stockholm: Arbetslivscentrum (distributed by Almqvist & Wiksell International).
El Ashegh, H. A. & R. Lindsay (this volume). Cognition and body image. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 175–223. Amsterdam: John Benjamins.
Fitzgerald, W. (this volume). Martin Luther King and the Ghost in the Machine. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 345–353. Amsterdam: John Benjamins.
Gill, K. S. (1996). Information Society: New media, ethics and Postmodernism. Berlin: Springer.
Gill, S. (this volume). Body moves and tacit knowing. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 241–265. Amsterdam: John Benjamins.
Gorayska, B. (1994). How to lose the soul of language. Journal of Pragmatics 22, 536–547.
Gorayska, B. & K. Cox (1992). Expert systems as extensions of the human mind. AI & Society 6, 245–262.
Gorayska, B. & R. Lindsay (1989a). Metasemantics of relevance. The First International Congress on Cognitive Linguistics. Print A265. L. A. U. D. (Linguistic Agency at the University of Duisburg) Catalogue: Pragmatics, 1989. Available from http://www.linse.uni-essen.de:16080/linse/laud/shop_laud.
Gorayska, B. & R. Lindsay (1989b). On relevance: Goal dependent expressions and the control of planning processes. Technical Report 16. School of Computing and Mathematical Sciences. Oxford: Oxford-Brookes University. (First published as Gorayska and Lindsay 1989a.)
Gorayska, B. & R. Lindsay (1993). The roots of relevance. Journal of Pragmatics 19(4), 301–323.
Gorayska, B. & J. Marsh (1996). Epistemic technology and relevance analysis: Rethinking Cognitive Technology. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 27–39. Amsterdam: North Holland.
Gorayska, B. & J. Marsh (1999). Investigations in Cognitive Technology: Questioning perspective. In B. Gorayska, J. Marsh & J. L. Mey (Eds.), Humane interfaces: Questions of methods and practice in Cognitive Technology, pp. 17–43. Amsterdam: North Holland.
Gorayska, B., J. Marsh & J. L. Mey (2001). Cognitive Technology: Tool or instrument. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 1–16. Berlin: Springer.
Gorayska, B. & J. L. Mey (1994). Cognitive Technology. In K. S. Gill (Ed.), Proceedings of the conference on New Visions of the Post-industrial Society: The paradox of technological and human paradigms, SEAKE Centre, Brighton 1994. Reprinted in K. S. Gill (Ed.) 1996, pp. 287–294.
Gorayska, B. & J. L. Mey (1996a). Of minds and men. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 1–24. Amsterdam: North Holland.
Gorayska, B. & J. L. Mey (Eds.) (1996b). Cognitive Technology: In search of a humane interface. Amsterdam: North Holland.
Gorayska, B. & J. L. Mey (Eds.) (1996c). AI & Society 10, Special Issue on Cognitive Technology.
Hutchins, E. L., J. D. Hollan & D. A. Norman (1986). Direct manipulation interfaces. In D. A. Norman & S. W. Draper (Eds.), User-centered computer design, pp. 87–124. Hillsdale, N. J.: Erlbaum.
Janney, R. W. (1999). Computers and psychosis. In J. P. Marsh, B. Gorayska & J. L. Mey (Eds.), Humane Interfaces: Questions of methods and practice in Cognitive Technology, pp. 71–79. Amsterdam: Elsevier Science. (An earlier version of this paper appeared as Janney, R. W. (1997). The prosthesis as partner: Pragmatics and the Human-Computer Interface. In J. P. Marsh, C. L. Nehaniv & B. Gorayska (Eds.), Proceedings of the Second International Cognitive Technology Conference CT97: Humanizing the Information Age, pp. 1–6. IEEE Computer Society Press.)
Jirotka, M. & P. Luff (this volume). Communicating sequential activities: An investigation into the modelling of collaborative action for system design. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 303–330. Amsterdam: John Benjamins.
Kant, I. (1781). Critique of pure reason. (Translated by Norman Kemp Smith.) London: MacMillan, 1933.
Kirlik, A. (1988). Everyday life environments. In W. Bechtel & G. Graham (Eds.), A Companion to Cognitive Science, pp. 702–712. Oxford: Blackwell.
Kutar, M. S., C. L. Nehaniv, C. Britton & S. Jones (2001). The Cognitive dimensions of an artifact vis-à-vis individual human users: Studies with notations for the temporal specification of interactive systems. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 344–355. Berlin: Springer.
Lindsay, R. (1996). Cognitive Technology and the pragmatics of impossible plans: A study in Cognitive Prosthetics. AI & Society 10, 273–288. Special issue on Cognitive Technology.
Lindsay, R. (1999). Can we change our minds? The impact of computer technology on human cognition. In B. Gorayska, J. Marsh & J. L. Mey (Eds.), Humane interfaces: Questions of methods and practice in Cognitive Technology, pp. 45–69. Amsterdam: North Holland.
Lindsay, R. & B. Gorayska (2002). Relevance, goal management and Cognitive Technology. International Journal of Cognition and Technology 1(2), 187–232. Reprinted in this volume.
Lueg, C. (2002). Looking under the rug: Context and context aware artifacts. International Journal of Cognition and Technology 1(2), 287–302.
Marsh, J., B. Gorayska & J. L. Mey (Eds.) (1999). Humane interfaces: Questions of methods and practice in Cognitive Technology. Amsterdam: North Holland.
Meenan, S. & R. Lindsay (2002). Planning and the neurotechnology of social behaviour. International Journal of Cognition and Technology 1(2), 233–274.
Mey, J. L. (1988). CAIN and the invisible tool: Cognitive Science and the Human-Computer Interface. Journal of the Society of Instrument and Control Engineers (Tokyo) 27(1), 247–252.
Mey, J. L. (1995). Cognitive Technology – Technological Cognition. In Proceedings of the First International Cognitive Technology Conference, August 1995, Hong Kong. Reprinted in AI & Society (1996) 10, 226–232.
Mey, J. L. (1998). Adaptability. In Concise encyclopedia of pragmatics, pp. 5–7. Oxford: Elsevier Science.
Mey, J. L. (2000). The computer as prosthesis. Hermes, Journal of Linguistics 24, 14–29.
Mithen, S. J. (1996). The prehistory of the mind: A search for the origins of art, religion and science. London: Orion Books Ltd.
Norman, D. A. (1986). Cognitive Engineering. In D. A. Norman & S. A. Draper (Eds.), User Centred Systems Design, pp. 31–63. Hillsdale, N. J.: Erlbaum.
Norman, D. A. (1993). Things that make us smart. Reading, Mass.: Addison-Wesley.
Norman, D. A. (1999). The invisible computer. Cambridge, Mass.: MIT Press.
Norman, D. A. & S. W. Draper (Eds.) (1986). User-centered computer design. Hillsdale, N. J.: Erlbaum.
Pfeifer, R. (2002). Robots as cognitive tools. International Journal of Cognition and Technology 1(1), 125–143. Reprinted in this volume.
Rasmussen, J. (1988). Information Processing and Human-Machine Interaction: An approach to Cognitive Engineering. New York: North Holland.
Salomon, G. (Ed.) (1993). Distributed cognition: Psychological and educational considerations. Cambridge: Cambridge University Press.
Segall, M. H., D. T. Campbell & M. J. Herskovits (1966). The influence of culture on visual perception. Indianapolis: Bobbs-Merrill.
Wickens, C. (1992). Engineering Psychology and Human Performance. 2nd edition. New York: Harper Collins.
Wittgenstein, L. (1995). Philosophical Investigations. Oxford: Blackwell.

Part I

Theoretical issues

Towards a science of the bio-technological mind*


Andy Clark
Indiana University

"Soon, perhaps, it will be impossible to tell where human ends and machines begin." (Maureen McHugh, China Mountain Zhang, p. 214)

1. A sketch

The study of Cognitive Technology is, in a very real sense, the study of ourselves. Who we are, what we are, and even where we are, are all jointly determined by our biological natures and the web of supporting (and constraining) technologies in which we live, work and dream. We humans, I would argue, are naturally pre-disposed (in ways unique to our species) to create cascading torrents of non-biological structure within which to think and act. We do not need neural implants and prosthetic limbs to count as Natures very own Cyborgs. For we are, and long have been, bio-technological symbionts: reasoning and thinking systems spread across biological matter and the delicately codetermined gossamers of our socio-technological nest. This tendency towards bio-technological hybridisation is not an especially modern development. On the contrary, it is an aspect of our humanity which is as basic and ancient as the use of speech, and which has been extending its territory ever since. We see some of the cognitive fossil trail of the Cyborg trait in the historical procession of potent Cognitive Technologies that begins with speech and counting, morphs rst into written text and numerals, then into early printing (without moveable typefaces), on to the revolutions of moveable typefaces and the printing press, and most recently to the digital encodings that bring text, sound and image into a uniform and widely transmissible format. Such technologies, once up-and-running in the various appliances and institutions

26

Andy Clark

that surround us, do far more than merely allow for the external storage and transmission of ideas. They actively re-structure the forms and contents of human thought itself. And there is no turning back. Whats more, the use, reach and transformative powers of these technologies are escalating. New waves of user-sensitive technology will bring this ageold process to a climax, as our minds and identities become ever more deeply enmeshed in a non-biological matrix of machines and tools, including software agents, vast searchable databases, and daily objects with embedded intelligence of their own. We humans have always been adept at shaping our minds and skills to t our current tools and technologies. But when those tools and technologies start trying to t us, in turn when our technologies actively, automatically, and continually tailor themselves to us, just as we do to them then the line between tool and user becomes imsy indeed. Such technologies will be less like tools and more like part of the mental apparatus of the person. They will remain tools in only the thin and ultimately paradoxical sense in which my own unconsciously operating neural structures (my hippocampus, my posterior parietal cortex) are tools. I do not really use my brain. There is no user quite so ephemeral. Rather, the operation of the brain makes me who and what I am. So, too, with these new waves of sensitive, interactive technologies. As our worlds become smarter, and get to know us better and better, it becomes harder and harder to say where the world stops and the person begins. What are these technologies? They are many, and various. They include potent, portable machinery linking the user to an increasingly responsive worldwide-web. But they include also, and perhaps ultimately more importantly, the gradual smartening-up and interconnection of the many everyday objects which populate our homes and oces. This brief note, however, is not going to be about new technology. 
Rather, it is about us, about our sense of self, and about the nature of the human mind. The goal is not to guess at what we might soon become, but to better appreciate what we already are: creatures whose minds are special precisely because they are naturally geared for multiple mergers and coalitions. Cognitive technologies, ancient and modern, are best understood (I suggest) as deep and integral parts of the problem-solving systems we identify as human intelligence. They are best seen as proper parts of the computational apparatus that constitutes our minds. If we do not always see this, or if the idea seems outlandish or absurd, that is because we are in the grip of a simple prejudice: the prejudice that whatever matters about MY mind must depend solely on what goes on inside my own biological skin-bag, inside the ancient fortress of skin and skull. But this fortress has been built to be breached. It is a

Towards a science of the bio-technological mind


structure whose virtue lies in part in its capacity to delicately gear its activities to collaborate with external, non-biological sources of order so as (originally) to better solve the problems of survival and reproduction. Thus consider two brief examples: one old (see the Epilogue to Clark, 1997) and one new. The old one first. Take the familiar process of writing an academic paper. Confronted, at last, with the shiny finished product, the good materialist may find herself congratulating her brain on its good work. But this is misleading. It is misleading not simply because (as usual) most of the ideas were not our own anyway, but because the structure, form and flow of the final product often depend heavily on the complex ways the brain co-operates with, and depends on, various special features of the media and technologies with which it continually interacts. We tend to think of our biological brains as the point source of the whole final content. But if we look a little more closely, what we may often find is that the biological brain participated in some potent and iterated loops through the cognitive technological environment. We began, perhaps, by looking over some old notes, then turned to some original sources. As we read, our brain generated a few fragmentary, on-the-spot responses which were duly stored as marks on the page, or in the margins. This cycle repeats, pausing to loop back to the original plans and sketches, amending them in the same fragmentary, on-the-spot fashion. This whole process of critiquing, re-arranging, streamlining and linking is deeply informed by quite specific properties of the external media, which allow the sequence of simple reactions to become organised and grow (hopefully) into something like an argument. The brain's role is crucial and special. But it is not the whole story. 
In fact, the true power and beauty of the brain's role is that it acts as a mediating factor in a variety of complex and iterated processes which continually loop between brain, body and technological environment. And it is this larger system which solves the problem. We thus confront the cognitive equivalent of Dawkins' (1982) vision of the extended phenotype. The intelligent process just is the spatially and temporally extended one which zigzags between brain, body and world. Or consider, to take a superficially very different kind of case, the role of sketching in certain processes of artistic creation. Van Leeuwen, Verstijnen and Hekkert (1999) offer a careful account of the creation of certain forms of abstract art, depicting such creation as heavily dependent upon "an interactive process of imagining, sketching and evaluating [then re-sketching, re-evaluating, etc.]" (Op. cit., p. 180). The question the authors pursue is: why the need to sketch? Why not simply imagine the final artwork in the mind's eye and then execute it directly on the canvas? The answer they develop, in great detail and


using multiple real case-studies, is that human thought is constrained, in mental imagery, in some very specific ways in which it is not constrained during on-line perception. In particular, our mental images seem to be more interpretatively fixed: less able to reveal novel forms and components. Suggestive evidence for such constraints includes the intriguing demonstration (Chambers and Reisberg, 1989) that it is much harder to discover (for the first time) the second interpretation of an ambiguous figure (such as the duck/rabbit) in recall and imagination than when confronted with a real drawing. Good imagers, who proved unable to discover a second interpretation in the mind's eye, were able nonetheless to draw what they had seen from memory and, by then perceptually inspecting their own unaided drawing, to find the second interpretation. Certain forms of abstract art, Van Leeuwen et al. go on to argue, likewise depend heavily on the deliberate creation of multi-layered meanings: cases where a visual form, on continued inspection, supports multiple different structural interpretations. Given the postulated constraints on mental imagery, it is likely that the discovery of such multiple interpretable forms will depend heavily on the kind of trial and error process in which we first sketch and then perceptually (not merely imaginatively) re-encounter visual forms, which we can then tweak and re-sketch so as to create a product that supports an increasingly multi-layered set of structural interpretations. This description of artistic creativity is strikingly similar, it seems to me, to our story about academic creativity. The sketch-pad is not just a convenience for the artist, nor simply a kind of external memory or durable medium for the storage of particular ideas. Instead, the iterated process of externalising and re-perceiving is integral to the process of artistic cognition itself. 
One useful way to understand the cognitive role of many of our self-created cognitive technologies is thus as affording complementary operations to those that come most naturally to biological brains. Consider here the connectionist image (Rumelhart, McClelland and the PDP Research Group, 1986; Clark, 1989) of biological brains as pattern-completing engines. Such devices are adept at linking patterns of current sensory input with associated information: you hear the first bars of the song and recall the rest, you see the rat's tail and conjure the image of the rat. Computational engines of that broad class prove extremely good at tasks such as sensori-motor co-ordination, face recognition, voice recognition, etc. But they are not well-suited to deductive logic, planning, and the typical tasks of sequential reasoning. They are, roughly speaking, "Good at Frisbee, Bad at Logic": a cognitive profile that is at once familiar and alien. Familiar, because human intelligence clearly has something of that flavour. Yet


alien, because we repeatedly transcend these limits, planning family vacations, running economies, solving complex sequential problems, etc., etc. A powerful hypothesis, which I first encountered in Rumelhart, Smolensky, McClelland and Hinton (1986), is that we transcend these limits, in large part, by combining the internal operation of a connectionist, pattern-completing device with a variety of external operations and tools which serve to reduce various complex, sequential problems to an ordered set of simpler pattern-completing operations of the kind our brains are most comfortable with. Thus, to borrow the classic illustration, we may tackle the problem of long multiplication by using pen, paper and numerical symbols. We then engage in a process of external symbol manipulations and storage so as to reduce the complex problem to a sequence of simple pattern-completing steps that we already command, first multiplying 9 by 7 and storing the result on paper, then 9 by 6, and so on. The value of the use of pen, paper, and number symbols is thus that, in the words of Ed Hutchins:
[Such tools] permit the [users] to do the tasks that need to be done while doing the kinds of things people are good at: recognising patterns, modelling simple dynamics of the world, and manipulating objects in the environment. (Hutchins, 1995, p. 155)
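
The long-multiplication illustration can be made concrete with a minimal sketch. It is not drawn from Rumelhart et al. or Hutchins; the function name `long_multiply` and the list `paper` are hypothetical, introduced only for illustration. The list stands in for the sheet of paper, and each inner step is a single-digit product of the kind a person recalls from a memorised times table.

```python
def long_multiply(a: int, b: int):
    """Multiply two non-negative integers via single-digit products only.

    Hypothetical illustration: `paper` plays the role of the external
    medium on which each simple step is written down.
    """
    paper = []   # external record: (digit of a, digit of b, partial product)
    total = 0
    a_digits = [int(d) for d in reversed(str(a))]  # least significant first
    b_digits = [int(d) for d in reversed(str(b))]
    for i, db in enumerate(b_digits):
        for j, da in enumerate(a_digits):
            step = da * db                   # one memorised times-table fact
            shifted = step * 10 ** (i + j)   # restore the place value
            paper.append((da, db, shifted))  # "store the result on paper"
            total += shifted                 # running sum of the written lines
    return total, paper


# 67 x 9: first 9 by 7, then 9 by 6, as in the text.
total, paper = long_multiply(67, 9)
print(paper)  # [(7, 9, 63), (6, 9, 540)]
print(total)  # 603
```

On this reading, the hard sequential problem never has to be held "in the head" at once: the external record carries the intermediate state, and the agent contributes only a series of simple recall steps.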

This description nicely captures what is best about good examples of cognitive technology: recent word-processing packages, web browsers, mouse and icon systems, etc. It also suggests, of course, what was wrong with many of our first attempts at creating such tools: the skills needed to use those environments (early VCRs, word-processors, etc.) were precisely those that biological brains find hardest to support, such as the recall and execution of long, essentially arbitrary, sequences of operations. See Norman (1999) for further discussion. The conjecture, then, is that one large jump or discontinuity in human cognitive evolution involves the distinctive way human brains repeatedly create and exploit various species of cognitive technology so as to expand and re-shape the space of human reason. We, more than any other creature on the planet, deploy non-biological elements (instruments, media, notations) to complement our basic biological modes of processing, creating extended cognitive systems whose computational and problem-solving profiles are quite different from those of the naked brain. In this way human brains maintain an intricate cognitive dance with an ecologically novel, and immensely empowering, environment: the world of symbols, media, formalisms, texts, speech, instruments and culture. The computational circuitry of human cognition thus flows both within and beyond the head, through this extended network in ways which


radically transform the space of human thought and reason. Such a point is not new, and has been well-made by a variety of theorists working in many different traditions. This brief and impressionistic sketch is not the place to delve deeply into the provenance of the idea, but some names to conjure with include Vygotsky, Bruner, Latour, Dennett, Hutchins, Norman and (to a greater or lesser extent) all those currently working on so-called "situated cognition". My own work on the idea (see Clark 1997, 1998, 1999, 2001a) also owes much to a brief collaboration with David Chalmers (see our paper, "The Extended Mind", in Analysis 58(1), 1998, pp. 7–19). I believe, however, that the idea of human cognition as subsisting in a hybrid, extended architecture (one which includes aspects of the brain and of the cognitive technological envelope in which our brains develop and operate) remains vastly underappreciated. We cannot understand what is special and distinctively powerful about human thought and reason by simply paying lip-service to the importance of the web of surrounding Cognitive Technologies. Instead, we need to understand in detail how our brains dovetail their problem-solving activities to these additional resources, and how the larger systems thus created operate, change and evolve. In addition, and perhaps more philosophically, we need to understand that the very ideas of minds and persons are not limited to the biological skin-bag, and that our sense of self, place and potential are all malleable constructs ready to expand, change or contract at surprisingly short notice.

2. Some questions

The challenge, then, is to take these tempting but impressionistic ideas and to turn them into a genuine science (or sciences) of the technologically-scaffolded mind. This is new territory, and even the shape of the major problems and issues remains largely up for grabs. But some major ones look to me to include:

2.1 Origins

Since no other species on the planet builds such varied, complex and constantly evolving designer environments as us, what is it that allowed this process to get off the ground in our species in such a spectacular way? Otherwise put, even if it is the designer environments that, in a familiar, boot-strapping kind of way, make us what we now are, what biological difference lets us build/discover/use them in the first place?


This is a serious, important and largely unresolved question. Clearly, there must be some (perhaps quite small) biological difference that lets us get our collective foot in the designer environment door: what can it be? Contenders might include biological innovations for greater neural plasticity, and/or the extended period of protected learning called childhood (see Quartz, 1999; Quartz and Sejnowski, 1997; Griffiths and Stotz, 2000). Thus, Griffiths and Stotz argue that the long human childhood provides a unique window of opportunity in which "cultural scaffolding [can] change the dynamics of the cognitive system in a way that opens up new cognitive possibilities" (op. cit.). These authors argue against what they nicely describe as the "dualist account" of human biology and human culture, according to which biological evolution must first create the anatomically modern human and is then followed by the long and ongoing process of cultural evolution. Such a picture, they suggest, invites us to believe in something like a basic biological human nature, gradually co-opted and obscured by the trappings and effects of culture and society. But this vision (which is perhaps not so far removed from that found in some of the more excessive versions of evolutionary psychology) is akin, they argue, to looking for the true nature of the ant by removing the distorting influence of the nest. Instead we humans are, by nature, products of a complex and heterogeneous developmental matrix in which culture, technology and biology are pretty well inextricably intermingled. In short it is a mistake to posit a biologically fixed human nature with a simple wrap-around of tools and culture. For the tools and culture are indeed as much determiners of our nature as products of it.

2.2 Our self-image as a species

How should the recognition of ourselves as naturally bio-technological hybrids affect our views of human nature? 
How do these ideas fit with, or otherwise impact, accounts which emphasize ancestral environments (see, e.g., Pinker, 1997)? At the very least we must now take into account a plastic evolutionary overlay which yields a constantly moving target, an extended cognitive architecture whose constancy lies mainly in its continual openness to change. Even granting that the biological innovations which got this ball rolling may have consisted only in some small tweaks to an ancestral repertoire, the upshot of this subtle alteration was a massive leap in cognitive potential. For our cognitive machinery is now intrinsically geared to self-transformation, artifact-based expansion, and a snowballing/bootstrapping process of computational and


representational growth. The machinery of human reason (the environmentally extended apparatus of our distinctively human intelligence) may thus be rooted in a biologically incremental progression while simultaneously existing on the far side of a precipitous cliff in cognitive-architectural space.

2.3 Social policy

Educational policy, and social policy in general, need to be geared to our best current scientific image of the human mind. What educational and social policies best serve a race of constantly changing bio-technological hybrids? How can contemporary art help us to better understand these aspects of our nature? (Performance artists like Stelarc (www.stelarc.va.com.au) are tackling this latter issue head-on, with work in which the biological and technological merge and change places.)

2.4 The mechanisms of co-adaptation

The complex fit between biological brains and technological scaffolding depends on a two-way give-and-take. Brains need to be plastic enough to factor the technologies deep into their problem-solving routines. And the technologies need (over cultural-evolutionary time at first, and most recently, during their own life-spans) to adapt to better fit the users. We urgently need to understand the multiple factors and forces that shape this complex dynamic. (See, e.g., Norman's (1999) work on human-centered technologies and Daniel Dennett's (1995) work on the "Cranes of Culture".)

2.5 Types of scaffolding

The single most important task, it seems to me, is to better understand the range and variety of types of cognitive scaffolding, and the different ways in which non-biological scaffoldings can augment (or impair) performance on a task. For example, there is interesting work comparing reasoning using self-constructed external props (e.g., a diagram you draw to help you think about a problem) and reasoning using found or given props (the diagrams in a textbook, say). There is also work on the role of individual differences (of cognitive style, etc.) 
and their impact on the usefulness of specific types of external structure. (For both the above, see Cox, 1999.) And there are detailed studies of the specific properties of various kinds of prop and aid (e.g., Scaife and Rogers' (1996) work


on graphical representations). These bodies of work cry out for extension and unification. The Holy Grail here is a taxonomy of different types of external prop, and a systematic understanding of how they help (and hinder) human performance. Such an understanding would also have immediate implications for the design process itself (see Norman, 1999; Dourish, 2001).

2.6 Collective effects

A major part of our cognitive environment is other people, and their distinctive knowledge-bases. How can new technologies help us make the most of this resource? The use of collaborative filtering techniques, in which individuals' activities leave electronic traces that can be used to automatically generate new knowledge (e.g., the familiar Amazon prompt: "people who bought such-and-such also liked"), is one simple tool whose power is not yet fully appreciated or exploited. But the potential is vast. For some discussion, see Bonabeau and Theraulaz (2000).

2.7 Frameworks and organizing principles

What general principles and concepts will allow us to make systematic sense (indeed, to make a science) of the bio-technological mind? Should we (following Hutchins, 1995) think in terms of the flow and transformation, through a series of external tools and media, of representations? This is, in effect, to extend traditional cognitive scientific approaches to mind outwards. Or should we be creating new analytic tools and approaches, perhaps borrowing ideas from dynamical systems theory and the study of complex, coupled systems (see, e.g., Thelen and Smith, 1994; Kelso, 1995; and discussion in Clark, 1997)? What key concepts will help make unified sense of the complex and varied roles of external scaffolding? Contenders include:

Off-loading: Several writers (e.g., Dennett, 1996, pp. 134–135) stress the way cognitive technologies can be used to off-load work from the biological brain to external arenas. 

Complementarity: While straightforward off-loading certainly occurs, especially with regard to shifting stuff from our limited short-term memory out into the world, it is surely only part of the story. In my own recent work (e.g., Clark, 2001b, ch. 8), the focus is more on complementarity, and on the way external stuff can be configured so as to do the kinds of thing that biological brains


either don't do at all, or do fairly badly (think of the role of the artist's sketchpad as described above).

Transformation: Yet another approach (Rumelhart et al., 1986; Clark, 1998) takes the key notion to be that of problem transformation. Here, the external aids turn the problems that need to be solved (to perform a given task) into the kinds of problems brains like ours like best (e.g., using pen and paper to transform complex arithmetical tasks into sequences of simple ones).

Stabilization of Environments: Hammond et al. (1995) deploy the notion of environmental stabilization to analyze the problem-solving activity of embodied, environmentally active artificial agents. Such agents act so as to keep the environment steady in ways that allow the easy reuse of once-successful plans and stratagems (e.g., putting pots and pans away in the same places, so as to reuse a cooking routine).

Do we need just one or all of these concepts (and are there more)? Do they all emerge as special cases of some deeper organizing principle? And within what kinds of explanatory framework are they best deployed?

3. Conclusions

The project of understanding human thought and reason is easily misconstrued. It is misconstrued as the project of understanding what is special about the human brain. No doubt there is something special about our brains. But understanding our peculiar profiles as reasoners, thinkers and knowers of our worlds requires an even broader perspective: one that targets multiple brains and bodies operating in specially constructed environments replete with artifacts, external symbols, and all the variegated scaffoldings of science, art and culture. Understanding what is distinctive about human reason thus involves understanding the complementary contributions of both biology and (broadly speaking) technology, as well as the dense, reciprocal patterns of causal and co-evolutionary influence that run between them. Turning this kind of vision into a genuine science of the bio-technological mind is a massive task, calling for interdisciplinary co-operation on a whole new scale. I hope and believe that this volume will contribute to a crucial forum for that important endeavor.


Note
* Section 1 is an expanded version of a text which appeared as "Natural-Born Cyborgs" in John Brockman's Reality Club (2000). It is electronically published at http://www.edge.org, and is reproduced with permission.

References
Bonabeau, E. & G. Theraulaz (2000). Swarm Smarts. Scientific American 282(3), 72–79. March 2000.
Chambers, D. & D. Reisberg (1989). Can Mental Images Be Ambiguous? Journal of Experimental Psychology: Human Perception and Performance 11(3), 317–328.
Clark, A. (1989). Microcognition: Philosophy, Cognitive Science and Parallel Distributed Processing. Cambridge, MA: MIT Press.
Clark, A. (1997). Being There: Putting Brain, Body and World Together Again. Cambridge, MA: MIT Press.
Clark, A. (1998). Magic Words: How Language Augments Human Computation. In J. Boucher & P. Carruthers (Eds.), Language and Thought, pp. 162–183. Cambridge: Cambridge University Press.
Clark, A. (1999). An Embodied Cognitive Science? Trends In Cognitive Sciences 3(9), 345–351.
Clark, A. (2001a). Reasons, Robots and The Extended Mind. Mind And Language 16(2), 121–145.
Clark, A. (2001b). Mindware: An Introduction to the Philosophy of Cognitive Science. New York & Oxford: Oxford University Press.
Clark, A. & D. Chalmers (1998). The Extended Mind. Analysis 58, 7–19.
Cox, R. (1999). Representation, Construction, Externalised Cognition and Individual Differences. Learning and Instruction 9, 343–363.
Dawkins, R. (1982). The Extended Phenotype. New York & Oxford: Oxford University Press.
Dennett, D. (1995). Darwin's Dangerous Idea. New York: Simon and Schuster.
Dennett, D. (1996). Kinds of Minds. New York: Basic Books.
Dourish, P. (2001). Where The Action Is. Cambridge, MA: MIT Press.
Griffiths, P. E. & K. Stotz (2000). How the mind grows: A developmental perspective on the biology of cognition. Synthese 122(1–2), 29–51.
Hammond, K., T. Converse & J. Grass (1995). The Stabilization of Environments. In P. Agre & S. Rosenschein (Eds.), Computational Theories of Interaction and Agency, pp. 304–327. Cambridge, MA: MIT Press.
Hutchins, E. (1995). Cognition In The Wild. Cambridge, MA: MIT Press.
Kelso, S. (1995). Dynamic Patterns. Cambridge, MA: MIT Press.
Latour, B. (1993). We Have Never Been Modern. Cambridge, MA: Harvard University Press.
Norman, D. (1999). The Invisible Computer. Cambridge, MA: MIT Press.
Pinker, S. (1997). How the Mind Works. New York: Norton.
Quartz, S. (1999). The Constructivist Brain. Trends In Cognitive Sciences 3(2), 48–57.
Quartz, S. & T. Sejnowski (1997). The Neural Basis of Cognitive Development: A Constructivist Manifesto. Behavioral and Brain Sciences 20, 537–596.
Rumelhart, D., P. Smolensky, J. McClelland & G. Hinton (1986). Schemata and Sequential Thought Processes in PDP Models. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition 2, pp. 7–57. Cambridge, MA: MIT Press.
Scaife, M. & Y. Rogers (1996). External Cognition: How Do Graphical Representations Work? International Journal of Human-Computer Studies 45, 185–213.
Thelen, E. & L. Smith (1994). A Dynamic Systems Approach to the Development of Cognition and Action. Cambridge, MA: MIT Press.
Van Leeuwen, C., I. Verstijnen & P. Hekkert (1999). Common Unconscious Dynamics Underlie Common Conscious Effects: A Case Study in the Iterative Nature of Perception and Creation. In J. Scott Jordan (Ed.), Modeling Consciousness Across the Disciplines, pp. 179–219. Lanham, MD: University Press of America.

Language as a cognitive technology*


Marcelo Dascal
Tel Aviv University

1. Introduction

In our time we live surrounded by objects and devices we have created that are able to perform complex tasks whose performance used to demand much concentration and mental effort from us. One can say that we have managed to transfer to these objects and devices some of the capacities that were considered, until not long ago, typical of and exclusive to the human intellect. In this sense, we have created "cognitive artifacts" (Hutchins, 1999) or, more generally, "cognitive technologies" (Gorayska and Mey, 1996). These inventions save a considerable portion of our intellectual power, setting it free for the performance of higher cognitive tasks: those for which we have not yet been able, and, as some believe (e.g., Dreyfus, 1971, 1992), never will be able, to invent mechanical, computational or other kinds of ersatz. Behind every technology created by humankind, be it the wheel, agriculture, or the cellular telephone, there is, of course, a lot of cognitive effort. But this does not make it, as such, a cognitive technology in the sense in which I propose to use this expression. What I have in mind are the main uses of a technology, rather than the process of its creation or its effects. Primarily, the wheel serves for transportation; agriculture, for food production; the cellular phone, for communication. Secondarily, these technologies can also be useful for cognition: transportation allows us to participate in scientific conferences where we are supposed to learn something; nourishing ourselves gives us also mental energy; the cellular phone can occasionally serve to communicate cognitive content. Cognitive technologies, however, are those that either have been designed for cognitive uses or else have been appropriated for such uses. They can of course retain their original uses and have non-cognitive effects such as the production of jobs, the waging of war, or space travel. 
By cognitive technology (CT) I mean, thus, every systematic means, material or mental, created by humans that is significantly and routinely used


for the performance of cognitive aims.1 By cognitive aims I mean either mental states of a cognitive nature (e.g., knowledge, opinion, belief, intention, expectation, decision, plan of action) or cognitive processes (e.g., perception, memorization, conceptualization, classification, learning, anticipation, the formulation of hypotheses, demonstration, deliberation, evaluation, criticism, persuasion, discovery) that lead to cognitive states or help to reach them.2 Natural languages (NL), unlike formal languages created for specific purposes, can hardly be considered, as such, prototypical artifacts, for they have not been purposefully designed. Yet they evolved genetically and culturally in view of certain human needs, and some of their features may have been appropriated (deliberately or not) in order to satisfy needs other than those whose pressure caused them to emerge in the first place. Insofar as such needs are cognitive, it seems to me appropriate to view the corresponding features of natural languages and their use as cognitive technologies. Researchers and developers in many fields of technology have been increasingly interested in natural languages. In an ad published in Time magazine a few years ago, the Daimler-Benz corporation asks, "How will man and machine work together in the future?", and replies: "With the spoken word." The ad reveals that one of the corporation's divisions is currently researching and developing sophisticated new word-based systems, which will bring to life the Biblically sounding prophecy: "Man will talk to Machine and Machine will respond for the benefit of Man." Language-based technology also occupies a central position in MIT's multimillion Oxygen project, which is heralded as the next revolution in information technology, comparable to those achieved by the introduction of the computer and the internet. 
Oxygen will address people's inherent need to communicate naturally: we are not born with keyboard and mouse sockets but rather with mouths, ears and eyes. In Oxygen, speech understanding is built-in and all parts of the system and all the applications will use speech (Dertouzos, 1999, p. 39). The intense current interest in NL-based technologies, however, is almost entirely focused on one of its functions: human-machine communication. Thus, in spite of its central position in the Oxygen system, the designers seem to confine NL to its communicative role in the human-computer interface.3 For example, the system is intended to be able to satisfy people's need to find useful information by being able to understand and respond adequately to a user who simply says "Get me the big red document that came a month ago" (Dertouzos, 1999, p. 39). This would no doubt be a great communicative achievement, for it would require endowing machines with advanced syntactic,


semantic, as well as pragmatic processing abilities. But no less important is to realize how fundamental for satisfying the cognitive need to nd useful information is to fully use the syntactic, semantic and pragmatic potential characteristic of NL which is in fact what humans do in order to nd useful information. The Daimler-Benz ad seems to come close to realizing the cognitive potential of NL, for its heading declares: Language is the picture and counterpart of thought. Nevertheless, the research it announces is primarily concerned with the interface issue: And the machines will understand the words and respond. They will weld, or drive screws, or paint, or write they will even understand dierent languages. Such a focus on the use of spoken language as the upcoming predominant mode of human-machine interface orients research toward important issues such as auditory pattern recognition and automated voice production, identication of eventual sources of misunderstanding, elimination of articial constraints in communication (naturalness), eortlessness (no need of special learning), comfort (user-friendliness), ubiquity (widespread use of miniaturized devices, with minimal energy requirements, literacy requirements, etc.), security (through voice recognition), and, more recently, human-computer conversation in specic environments, the incorporation of non-verbal channels in human-computer communication, and social agent technologies.4 Most of these applications rely upon sophisticated cognitive means, namely those that subtend the communicative use of language. But they are not, as such, cognitive technologies in the sense dened above. These technologies are concerned with the role of NL in the cognitive processes themselves, regardless of whether and how they may be communicated to other humans or machines. It is to this cognitive use of NL, so far overlooked by researchers and developers of new technologies, that the approach here proposed wants to call attention. 
In my opinion, until this potential of NL is tapped, the truly revolutionary effect of the technological appropriation of NL will not be achieved. In terms of the above definition of CT, the question of whether certain aspects of NL are properly viewed as CTs is independent of the current or prospective state of technological development. In other words, this question is independent of the question whether a presumed CT aspect of language can be implemented by computational or other devices, enhanced by such devices, or used in interfaces with them. These questions depend upon the design and development of artifacts capable of simulating, enhancing or making use of the cognitive-technological features of NL, rather than upon the existence and use of such features themselves. Of course, the better we understand the nature of
NL's cognitive-technological functions, the better placed we will be to develop the corresponding artifacts. We may eventually reach the conclusion that not all of these functions can be satisfactorily emulated by such artifacts. In this sense, the approach proposed here might provide some relevant input to the issue of what computers can't do. I believe this approach will also be valuable for a better understanding of why a proper handling of the cognitive uses of language is crucial for the development of other, not necessarily linguistic, more efficient cognitive technologies, for the design of humane interfaces, and more generally for the epistemology and philosophical anthropology of the digital culture. However, since my primary concern here is to show how several aspects of language and language use can be fruitfully conceptualized as cognitive technologies, the exploration of these further implications of such a conceptual shift will have to be left for another occasion.
In the next section, I present a number of parameters in terms of which a typology of cognitive technologies in general can be outlined. Section 3 summarizes the main antagonistic positions in the traditional debate about the primacy of thought over language or language over thought and proposes to re-frame this debate with the help of the notion of cognitive technology. Section 4 analyzes some examples of linguistic structures and language use as possible candidates for language-based cognitive technologies. In the Conclusion (Section 5), I point out some of the gains to be derived from viewing language as a cognitive-technological resource.

2. Towards a typology of cognitive technologies

In addition to their being directed at either cognitive states or processes (as pointed out in the Introduction), it is convenient to distinguish cognitive technologies according to other significant parameters.

2.1 Strong and weak cognitive technologies

Mental states can be distinguished (from each other) according to their modal qualifications. For instance, an epistemic state can be certain or probable, intuitive or explicit, definitive or hypothetical, justified or accepted without justification, etc. Cognitive processes, in turn, can be oriented towards reaching mental states endowed with any of these modalities. A logical or mathematical demonstration, for instance, leads to an epistemic state of definitive certainty, whereas argumentation or deliberation may lead at most to a doxastic state of
holding a belief which, although well grounded, is only provisional.5 Cognitive technologies vary according to the modal aims of their designers. When designers choose those modalities one could call strong (certainty, irrevocability of the conclusions, etc.), they usually seek to endow the proposed technology with an algorithmic decision procedure which is entirely error-free and therefore irrevocable in its results. When they content themselves with weak modalities, they can employ less rigid algorithms (e.g., non-monotonic or probabilistic logics), which do not ensure that the results cannot be called into question and modified.

2.2 Integral and partial cognitive technologies

Integral technologies are those that provide for the full execution of a given cognitive aim, without requiring any human intervention. Partial technologies are those that provide only "helps" for the performance of a given cognitive aim. These helps make it easier for a human agent to perform the task, but cannot dispense with his or her intervention. Often the designers' ambitions lead them to propose maximalist projects of the first kind; but quite often, when they realize the enormous difficulties involved, they are likely to be less ambitious and satisfy themselves with partial technologies. The failure of the projects of full mechanical translation in the 50s and 60s thus led (not without first wasting hundreds of millions of dollars) to the more modest current projects, in which the technologies developed merely suggest to the human translator alternative translations. The translator not only has to choose the most appropriate one; s/he must also edit it quite heavily in order to finally produce an acceptable text.6 This evolutionary pattern from more to less ambitious technologies for a certain cognitive aim is quite frequent. However, maximalist ambitions tend to reappear whenever new technological and scientific developments make the conditions seem ripe for achieving integral aims.

2.3 Complete and incomplete cognitive technologies

One should further distinguish between the pragmatic notion of an integral technology in the above sense and the notion of a syntactically and/or semantically complete technology. The latter has to do with the ability of a technology to cover completely a given domain or ensemble of objects with respect to some desired property. For instance, if one creates an alphabet of traffic signs in order to express through the combinations of its signs all the instructions to
be given to drivers, and if the alphabetical system in question has no means to express one of these instructions, it is incomplete. It may be incomplete either due to the insufficiency of its formation rules or due to that of its transformation rules.7 In so-called non-standard logics, degrees of completeness are distinguished, so that one can talk of weakly complete systems, very weakly complete systems, and so on.8

2.4 Constitutive and non-constitutive cognitive technologies

Some technologies are constitutive of certain cognitive states or processes, whereas others are not. The former are such that without them certain cognitive operations cannot be performed. The latter, although extremely useful for the facilitation of the achievement of certain cognitive aims, are not a sine qua non for that. An example of the first kind could be the alleged necessity of supercomputers in order to generate very large numbers so as to be able to decide whether numbers endowed with certain arithmetical properties exist; or, more generally, the alleged necessary reliance on computational technologies in order to prove certain theorems. An example of the second kind is the dramatic increment in the efficacy of many of our cognitive performances thanks to computers, in spite of the fact that the latter have not (so far) become indispensable for the former. It is not easy to discern whether a given technology is constitutive or not. The endless debate about whether language is a necessary condition for thought, discussed in Section 3, illustrates such a difficulty.

2.5 External and internal cognitive technologies

Cognitive technologies can be external or internal. The former consist in physical devices or processes that are instrumental in achieving cognitive aims. The most ubiquitous example today of this kind of technology is the computer. But it is not the only one.
Its predecessor the abacus, as well as paper and pencil, graphs and diagrams, and even the book can be included in this category. Discussions of cognitive technologies usually focus on such external (or prosthetic, as they are often called) technologies. The significance of internal technologies should not be overlooked, however. Internal cognitive technologies are mental procedures thanks to which we can improve our cognitive activity. In this category one should include, for instance, mnemonic techniques that improve our capacity for storing and accessing information, formal methods of reasoning that permit one to draw correct
conclusions from given premises, definitions that clarify and fix the meaning of concepts, and so on. Underlying these mental technologies there are, according to current belief, cerebral physical processes. But so far the mental has not been successfully reduced to its underlying neural layer. What characterizes the internal technologies, even in cases where they employ external devices, is the fact that they are part and parcel of the cognitive processes themselves at the mental level, rather than attempts to reduce to or replace such processes by devices or processes operating at another level.

3. Language and thought: Re-framing the classical debate

Language has been and still is conceived as having communication as its primary function. In this respect, it serves to convey thought or other forms of cognitive content, but need not play any role in the formation of the thoughts it conveys. Descartes, who considered the ability to use language appropriately in any context to be a distinctive trait of humans (as opposed to animals and machines) and insisted that this ability shows that humans have minds, categorically ruled out the possibility that language may be constitutive of thought processes such as reasoning, as suggested by Hobbes. In the same vein, Turing (1950) considered that success in linguistic communication is a test for determining the presence of intelligence in a machine, but did not claim that this would also show that intelligence consists in the ability to manipulate linguistic symbols.9 Both Descartes and Turing assumed that the capacity to use language appropriately for communication requires high cognitive abilities, and therefore can testify to the existence of mind or intellect. To argue that language itself has a crucial function in cognition would not only violate Descartes's mind-body dualism (which perhaps wouldn't bother Turing), but would also seem to involve an egg-chicken circularity that would certainly bother Turing, as it bothered many others earlier.10
As opposed to such a view of the relationship between language and mind as purely external, the former having only an indicative role vis-à-vis the latter, other thinkers have argued that language is much more intimately connected with mental life. For such thinkers, language plays an essential role in cognition.
They argue that it is constitutive of fundamental cognitive processes (Hobbes), necessary for their performance (Leibniz), responsible for their historical emergence (Condillac), determinant of their structure and content (Whorf), required for their explanation (Sellars), the behavioral substrate of thinking and
other mental processes (Watson), an essential component of the social, cultural or ontological context where thought and other aspects of mental life take place (Vygotsky, Mead, Geertz, Heidegger), and so on. The centuries-old debate on the nature of the relationship between language and thought was mesmerized by these polar positions regarding which one of them is, in some sense, dependent upon the other.11 Under close scrutiny, however, both sides in the debate acknowledge the existence of language-thought interactions that do not fit the sweeping versions of their claims. For example, avowed externalists like Bacon and Locke undertake to criticize language as a dangerous source of cognitive mistakes and suggest methods (which gave rise to the attempt to elaborate scientific languages) to avoid such a danger. Yet, in so doing, they in fact admit that thought is not impervious to the influence of language. On the other side of the fence, Leibniz, who argued forcefully for the view that language and other semiotic systems are indispensable for even moderately complex forms of cognition, acknowledged the non-linguistic character of certain kinds of knowledge, such as intuitive and clear but not distinct knowledge. As in many other debates in the history of ideas, the tendency to focus on mutually exclusive dichotomous positions renders them insoluble and to some extent sterile. I have suggested elsewhere that, instead of focusing exclusively on the primacy or dependency issue when discussing the relationship between language and thought, it might be more useful to consider the details of how language is actually used in mental processes.12 The application to language of the notion of cognitive technology as defined above provides, I submit, a fruitful way of further exploring this suggestion.

4. Language as environment, resource and tool of cognition

Language's presence in human life is overwhelming. Poets have excelled in evoking the subtle ways in which words penetrate every corner of our mind,13 and, as we have seen in the preceding section, some thinkers have seen in language an essential and inevitable component of mental processes. This fact is not necessarily good or useful, if evaluated from the point of view of specific cognitive and practical aims. Hence the recurrent attempts to identify those aspects of language that are deemed to be pernicious and to propose a variety of therapies to filter them out. However justified it may be, such a critique testifies to the importance of language's influence on cognition.

Without going as far as Heidegger, who claimed that language is "the house of being", I would say it is certainly a major component of the context of thinking.14 Without going as far as Geertz (1973, p. 83), who claimed that language, being one of our key cultural resources, is "ingredient, not accessory, to human thought", I would rather emphasize that it is a ready-at-hand resource that thinking can easily make use of. Without suggesting, as does Watson, that thinking is nothing but sub-vocal speech,15 I would claim that certain linguistic resources do become sharp cognitive tools that afford the emergence and performance of certain types of cognition. The label cognitive technology is, of course, more straightforwardly applicable to those aspects of language that were shaped into cognitive tools, both because of their specific cognitive function and because they comport an element of design. But one should not overlook the fact that such tools emerge from a background where language's potential and actual role as a cognitive environment and resource is unquestionable. In fact, the relationship between these three levels is dynamic and multi-directional. Just as environmental properties of language (e.g., sequential ordering) can give rise to resources (e.g., narrative structure) and thence to tools (e.g., explanatory strategies), so too a tool (e.g., a successful metaphor created in order to understand a new concept) can become a resource (a frozen metaphor) and then recede into the environmental background (e.g., by becoming incorporated into the semantic system as a lexical polysemy).

4.1 Language as environment

As an environment of thought, language, through its sheer overwhelming presence in the mind, influences cognition independently of our awareness or will. Perhaps the most important of the environmental contributions of language to cognition derives from its general structural properties.
Languages are articulated systems; linguists describe them as consisting in a double articulation comprising two different sets of basic units and principles of organization: units of meaning (lexemes or morphemes) and, say, units of sound (phonemes).16 These units, in turn, can be combined and recursively recombined in rule-governed ways at different levels, a mechanism that accounts for language's impressive generative potential. This elaborate analytic-combinatorial system provides a natural model for other cognitive activities where the segmentation of complex wholes into parts, the abstraction of recurrent units and patterns from their actual context of use, and their use in any number of
other contexts are performed. The application of such a model to cognitive needs other than strictly linguistic ones need not be deliberately undertaken, but the fact that we are familiar with it and master it perfectly in our daily language use certainly grants it a privileged position in our practice and conceptualization of how cognition in general does and should proceed. No wonder that Descartes, Leibniz, Locke and many other thinkers used this analytic-combinatorial model as the core of their epistemology, and that Condillac considered the availability of language as a sine qua non for moving from sensation to the higher level of cognitive ability he calls analysis, without which humans would not be able to generate distinguishable and recurrently usable ideas.
Another fundamental influence of the linguistic environment on cognition derives from the fact that language is a rule-based system. The power of the notion of rule is apparent in the attraction it exerts on a child's mind, as soon as the child gives up its egotistic privilege of creating its own communicative symbols and submits to the socially imposed linguistic rules. Not only does the child attempt to impose absolute exception-free regularity on the linguistic rules themselves through the well-attested phenomenon of over-generalization (e.g., by regularizing irregular verbs: "eated" instead of "ate", "shaked" instead of "shook"). It also projects this strict rule model onto other activities such as games, where no violations of the rules are tolerated. The strong appeal of the computation model of the mind, as well as of its earlier counterpart, the mind-machine analogy, may ultimately derive from our familiarity with the machine-like rules of grammar.17
The sequential organization of speech, another structural characteristic of language, imposes upon oral communication a linear and unidirectional pattern. This pattern is imitated in cognitive processes, even when they are not communication-oriented.
As a result, an ante-post, step-by-step ordering of thoughts acquires a privileged canonical status, where what comes first is assumed to be, in some sense, cognitively prior to what comes after. Such a priority may be interpreted as logical, epistemological, causal, psychological, or chronological, but the pattern is the same, and tends to be viewed as an indication that a cognitive process that follows it is rational. Obviously, this pattern does not fit all cognitive processes, some of which (e.g., associative thought) display rather a net-like structure.18 Speech permits deviations from linear and unidirectional thematic order (e.g., digressions, flashbacks), and writing and electronic text-processing provide further means for so doing (e.g., footnotes, hypertext). But the fact that such deviations are perceived as exceptions to the
standard linear pattern implies that both linguistically and cognitively they must be sparingly used and their use needs to be especially justified. In this sense, the environmental influence of the linguistic sequential model obstructs, rather than helps, the performance of certain cognitive processes.19
The analytic-combinatorial, rule-based, and sequential models are not, however, the only ones the linguistic environment provides for cognitive imitation. Adam Smith observed that modern analytic languages stand to ancient synthetic languages, as far as their simplicity and systematicity are concerned, as early machines, which are extremely complex in their principles, stand to more advanced ones, which produce their effects with fewer wheels and fewer principles of motion. Similarly, in language "every case of every noun, and every tense of every verb, was originally expressed by a particular distinct word, which served for this purpose and for no other. But succeeding observations discovered, that one set of words was capable of supplying the place of all that infinite number, and that four or five prepositions, and half a dozen auxiliary verbs, were capable of answering the end of all the declensions, and of all the conjugations in the ancient languages" (Smith 1761, pp. 248-249). But he immediately made clear that the language-machine analogy breaks down as soon as one goes beyond grammar: "The simplification of machines renders them more and more perfect, but this simplification of the rudiments of languages renders them more and more imperfect, and less proper for many of the purposes of language" (ibid.). Smith had in mind the expressive needs that language must provide for, such as eloquence and beauty or, more generally, its ability to express not only the thought but also the spirit and the mind of the author (Smith 1983, p. 19).
It is for such purposes that the simplified machinery of analytic languages is inadequate due to their inherent prolixness, constraint, and monotony (Smith 1761, p. 251). Since even a paradigmatic analytic language such as English obviously overcomes these inadequacies and provides for the expressive needs mentioned (didn't Shakespeare write in English?), it must do so, if we follow Smith's argument, by evolving some non-mechanical means that compensate for its mechanical limitations. Smith's point can be generalized. First, the expressive needs for which more than the rules of grammar are needed comprise not only lofty literary-rhetorical ideals, but also down-to-earth everyday communicative needs. Second, there is no difference between analytic and synthetic languages in this respect; in fact, no known natural language can dispense with additional wheels and principles of motion, other than the syntactic and semantic ones, in order to fulfill its expressive and communicative duties. Such additions to the basic linguistic
system range from syntactic rules that permit one to transform or adjust the output of the basic syntactic rules without substantial meaning change to devices that allow one to say one thing and mean another. The former can be compared to the addition of epicycles to the Ptolemaic system in order to cope with the observed phenomena, without modifying its methodological assumption about the kinds of wheels and principles that are supposed to account for the machine's competence.20 The latter, studied mainly by pragmatics and rhetoric, are generally believed to obey different kinds of rules, of a heuristic rather than an algorithmic nature.21 A particularly significant feature of the pragma-rhetorical component of a linguistic system is that it sometimes achieves its aims by resorting to explicit violations of the system's rules, be they the algorithmic ones (as in metaphor, puns, and nonsense poetry) or the heuristic ones (as in conversational implicatures). A rule-based system that employs different kinds of rules and does not rule out, but rather permits and even exploits the violation of its own rules, is extremely valuable from a cognitive point of view. For it provides a living and effective model for many important cognitive processes that are open-ended, flexible, creative, and yet not aleatory. It also shows that there is an alternative to treating rule violations, as virtually all NL interfaces and applications to date do, as mistakes to be corrected (sometimes automatically, thereby irritating users).
Apart from its generic influence, the linguistic environment can have quite specific effects upon cognition, which should not be overlooked. An interesting case is the presumed role of language in causing deviations from logically valid forms of reasoning. For example, there is evidence that the evaluation of invalid syllogisms as valid has to do with an atmosphere effect, produced by the particular linguistic formulation of the premises.
Thus, syllogisms whose premises were both affirmative and universal tended to be viewed as having also an affirmative and universal conclusion, irrespective of whether the disposition of the subject and predicate terms in the premises logically warranted such a conclusion (Woodworth and Sells, 1935; Evans, 1982, pp. 89-90). Similarly, a robust finding in studies of conditional reasoning using Wason's well-known selection task is the linguistically-driven matching bias. The subjects in this task are given a conditional sentence referring to a set of four cards laid down on a table. Each card has a letter on one side and a number on the other. The subjects are asked to determine which cards they would have to turn in order to tell whether the sentence is true or false. The matching bias consists in the fact that subjects tend to pick those cards that match those named in the
sentence, regardless of whether they verify or falsify it.22 Further, allegedly pernicious, specific examples of linguistic influence on cognition will be mentioned in the next section.

4.2 Language as resource

Under this rubric I include those aspects of language that are regularly and, for the most part, consciously put to use for cognitive purposes, with minimal elaboration. They deserve to be considered technologies insofar as the choice of a particular linguistic feature stands in a means-end relationship with the cognitive purpose in view. An example of a linguistic resource widely employed for an extremely important cognitive purpose is the use of words for gathering, organizing, storing, and retrieving information. This has been done for so long that it is taken for granted and we are unaware of its linguistic-cognitive underpinnings as well as of the fact that in its current uses (be it in printed indices or in computerized search engines) its potential is far from being fully exploited. For, whereas the value of words for tracing relevant information lies in their semantic content, most applications make use only of their graphic form in order to locate matching graphic strings that are assumed to lead to semantically relevant material. The cognitive burden of sorting out truly relevant information is for the most part left to the user. Few systems make use of the truly relevant linguistic resource for information storage and retrieval, the resource humans naturally and effortlessly use, namely the rich semantic structure of natural languages.23 The semantic network of language is based on a set of semantic relations that connect expressions in a variety of ways: as synonyms, near-synonyms, paraphrases, analytic, super-ordinate, subordinate, belonging or not to a semantic field, antonyms, contrary, contradictory, etc.
By structuring the mental lexicon, such a network is an inescapable resource the mind constantly resorts to in most of its cognitive operations, which rely upon conceptual similarities and differences. The semantic network also comprises information such as connotations, prototypes, and factual information that belongs rather to the mental encyclopedia but, being standard, widely known and normally activated in the understanding of linguistic expressions, counts as semantically relevant.24 This extension makes the network an even more useful and constantly used cognitive resource, only minimally exploited to date by technologies that make use of thesauri.
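
The contrast drawn above between locating matching graphic strings and exploiting semantic relations can be made concrete with a minimal sketch. The Python fragment below is an illustration only: the three-sentence corpus, the RELATED table of semantic relations, and the function names string_match and semantic_match are all hypothetical, invented for this example, and do not describe any actual retrieval system.

```python
# Toy corpus: invented sentences, for illustration only.
DOCUMENTS = {
    "d1": "the physician examined the patient",
    "d2": "a doctor wrote the prescription",
    "d3": "the car was repaired by a mechanic",
}

# Hypothetical fragment of a semantic network: each word is linked to
# related expressions (here, synonyms and super-ordinate terms).
RELATED = {
    "doctor": {"physician", "clinician"},
    "physician": {"doctor", "clinician"},
    "car": {"automobile", "vehicle"},
}


def string_match(query, documents):
    """Return ids of documents that contain the query as a literal word."""
    return sorted(
        doc_id for doc_id, text in documents.items() if query in text.split()
    )


def semantic_match(query, documents, related):
    """Return ids of documents containing the query or a related expression."""
    terms = {query} | related.get(query, set())
    return sorted(
        doc_id for doc_id, text in documents.items() if terms & set(text.split())
    )


print(string_match("doctor", DOCUMENTS))             # ['d2']
print(semantic_match("doctor", DOCUMENTS, RELATED))  # ['d1', 'd2']
```

With the query "doctor", literal matching finds only the sentence containing that exact string, whereas lookup through the toy network also retrieves the sentence that mentions a physician. Under these assumptions, part of the burden of sorting out relevant material shifts from the user to the system, although a realistic network would have to be far richer than this table.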

The possibility of precision afforded by natural languages should not make us overlook the wide variety of syntactic, semantic and pragmatic means they have for expressing indeterminacy, an umbrella term here used to refer to phenomena such as indefiniteness, ambiguity, polysemy, unspecificity, imprecision and vagueness. Although considered a hindrance from the point of view of certain cognitive needs, such linguistic means are, from the point of view of other cognitive needs, an asset. For instance, they are an invaluable (perhaps indispensable) resource for the cognitive processes that begin with a foggy initial intuition which they undertake to clarify in a stepwise way, or vice versa, for those processes that seek to sum up the gist of a theory, an argument, or a story. They are also essential for conceptualizing those situations in which the mind hesitates between alternatives, none of which seem to fall clearly into well-defined categories. While we often wish everything could be clearly classified as either black or white, good or bad, true or false, we often stumble at borderline cases, which force our mind to abandon dichotomous thought and rather think in terms of gradual, continuous, and vague concepts (Gullvåg and Naess, 1995).
Language also provides its users with a repertoire of ready-at-hand, more or less conventionalized patterns that can be put to use not only communicatively but also cognitively. This repertoire ranges from phrases and sentences to full-fledged discursive structures. It includes, among other things, formulaic expressions, conventional metaphors, proverbs, topoi,25 argumentative formulae, dialogue patterns, and literary forms.26 These resources are ready-at-hand for organizing thought.
The existence of argumentative canonical formulae structured by prepositions and adverbs such as "if ... then", "but", "either ... or", "therefore" provides directives to the reasoner, which allow her to complete what is missing, to determine if something in her reasoning is irrelevant or redundant, etc. So too the current canonical form of the scientific article, say, in psychology, provides guidelines not only for the presentation of the author's results, but also for the way in which his mental and practical steps leading to such results should be executed.27
Before concluding this sample of language-based cognitive resources, I want to mention a number of related linguistic devices that, I believe, are extremely important for cognition. Consider the parenthetical "I believe" I have employed in the preceding sentence. Its position could have been occupied, instead, by "I know", "I am sure", "I have no doubt", "I hypothesize", "I submit", "I argue", "I contend" or, with slight syntactical modifications, by "I wonder", "I doubt", "allegedly", etc. Some of these expressions express propositional attitudes; others, the illocutionary force of speech acts. Both act as operators on propositional
contents, which reflect the variety of different degrees of commitment, epistemic status, intentions, etc. with which the mind may relate to such contents. They thus belong to a family of expressions which perform a distinction between two layers of content, the one referring to or modulating the other. The most familiar linguistic devices of this kind are metalinguistic operators such as quotation, thanks to which natural languages can act as their own metalanguage. As a whole, these linguistic resources correspond to and reveal the inherent reflexivity of human mental processes, i.e., the fact that cognition is conscious of itself, and therefore involves metacognition.28 It seems to me that this is not a one-way road, leading from metacognition to its linguistic expression, but at least a two-way road, in which the existence of metalinguistic resources should also be credited with the enhancement of metacognitive awareness and its development. The mechanism of joint attention (towards perceptual objects, towards each other in an interaction), for example, which is a necessary ingredient of intentional communication (Brinck, 2001), involves the recognition of the other's attentional state, as well as awareness of one's own. Similarly, the mother's attribution of intentions to the infant has been suggested to play a decisive role in the infant's development of her self-perception as an intentional agent (De Gelder, 1981; discussed in Dascal, 1983, pp. 99ff.).
Let us now turn to the alleged negative effects of language as a resource. The careless cognitive use of linguistic resources has been blamed, throughout the centuries, for inducing all sorts of cognitive mistakes. The indiscriminate use of linguistically productive (and legitimate) patterns of word generation (e.g., white → whiteness) has been held responsible for yielding in fact vacuous terms (e.g., nothingness) which are the source of conceptual confusion and pointless dispute (Locke).
The existence in language of general terms was blamed for inducing the false belief that there are general ideas and general objects (Berkeley). Natural language categorization, based on vulgar knowledge, was considered to be the most dangerous of the idols that threaten scientic thinking (Bacon). Vagueness was considered incompatible with logic and therefore utterly unreliable (Russell). Reliance on grammatical analogy was blamed for causing category mistakes and systematically misleading the understanding (Ryle). A list of pseudo-problems in which generations of metaphysicians were entangled was added to languages long list of cognitive decits (Carnap). Uncritical linguistic practice was singled out as the most dangerous cause whether deliberate or not of cognitive distortion, manipulation, and ultimately un-sanity (Count Korzybski and the General


Marcelo Dascal

Semantics movement).[29] And so on. This small sample of criticism certainly shows that language's influence on cognition can indeed be pernicious. But it also highlights the extent and variety of this influence. The lesson to be drawn, as in many other cases, is simply that we must be aware of this variety, extent, and sometimes insidious nature, so as to be able to rely upon the linguistic environment of thought only judiciously.

4.3 Language as tool

A language-based cognitive technology can be viewed as a tool when it is the result of the engineering of linguistic resources for a specific cognitive task. Let us consider some examples.

The linguistic resource of explaining the meaning of one term by correlating it with a string of other terms that define the former has been sharpened into the powerful cognitive tool of formal definition. This tool permits the creation of special terminology (new terms for new concepts, or redefinition of existing terms) or of new notational systems.[30] Usually the model of definition adopted in these cases is the classical one, i.e., the specification of necessary and sufficient conditions. But natural language semantics also provides other models of capturing concepts, e.g., in terms of similarity to a prototypical member of the denoted class or in terms of clusters of properties which are hierarchically organized in terms of their centrality or weight, although none of them is per se necessary or sufficient. Such non-classical models are characteristic of so-called natural kind terms (Achinstein, 1968). The elaboration of each of these kinds of definition yields different types of linguistic tools or technologies that are fit for different cognitive purposes.

The various forms of indeterminacy available in natural languages can be shaped into cognitive tools. For example, the linguistic possibility of generating scales of quantifiers, making them as subtle as desired (e.g., everyone, virtually everyone, almost everyone, most of the people, the majority of the people, some people, nearly nobody, virtually nobody, nobody, etc.) can give rise to rigorous systems of quantification other than the standard one. The same is true of linguistic tense systems that can be elaborated into a variety of temporal logics. And vagueness has been elaborated semantically into fuzzy logic, which permits rigorous reasoning with vague concepts (Zadeh, 1975; see also Black, 1963), as well as pragmatically, into a dynamic interpretive tool for gradually increasing precision until what appears to be an agreement or disagreement is shown to be in fact a pseudo-agreement or a pseudo-disagreement (Naess, 1966).
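The semantic elaboration of vagueness into fuzzy logic can be illustrated with a minimal sketch. It assumes the operators commonly associated with Zadeh's approach (minimum for conjunction, maximum for disjunction, complement for negation). The predicate `tall`, its thresholds of 160 cm and 190 cm, and all function names are hypothetical choices made only for illustration; they are not taken from the works cited.

```python
def tall(height_cm: float) -> float:
    """Degree in [0, 1] to which a height counts as 'tall'.

    The 160 cm and 190 cm thresholds are arbitrary illustrative values.
    """
    if height_cm <= 160.0:
        return 0.0
    if height_cm >= 190.0:
        return 1.0
    # Linear ramp between the two thresholds.
    return (height_cm - 160.0) / 30.0


def fuzzy_and(a: float, b: float) -> float:
    """Conjunction of two membership degrees, taken as their minimum."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Disjunction of two membership degrees, taken as their maximum."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Negation of a membership degree, taken as its complement."""
    return 1.0 - a


if __name__ == "__main__":
    degree = tall(175.0)
    # 'tall and not tall' is not simply false for a borderline case.
    print(degree, fuzzy_and(degree, fuzzy_not(degree)))
```

On this sketch a vague predicate is not forced into a yes/no answer: a borderline height receives an intermediate degree, and compound statements inherit degrees from their parts, which is one simple way of reasoning rigorously with a vague concept.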


Formulaic expressions can become powerful cognitive tools. A remarkable example, analyzed by Reviel Netz (1999), is the role of linguistic formulae in ancient Greek mathematics. Netz shows that, in contrast to deduction in modern mathematics, where one resorts to typographic symbols, thus opting for exploiting a visual resource, Greek mathematics made use of formulaic expressions, a linguistic resource presumably of oral origin. He analyzes in detail Book II of Euclid's Elements, identifying and sorting out the 71 such formulaic expressions, i.e., highly repetitive and standardized phrases and sentences, which make up most of the text. He argues that deduction as a cognitive tool may have been made possible due to the systematic use of such formulae: "The constant re-shuffling of objects in substitutions may be securely followed, since it is no more than the re-fitting of well-known verbal elements into well-known verbal structures. It is a game of decomposition and recomposition of phrases, indeed similar to the jigsaw puzzle of putting one heroic epithet after another, which is to a certain extent what epic [Homeric] singers did" (Netz 1999, p. 161).

If we jump from mathematics to religion, we may find in the Hindu mantra a similar phenomenon. The nature and role of mantras is quite controversial, as is patent from the papers in Alper's (Ed., 1989) collection. Some scholars even doubt their linguistic nature, and most view them as belonging to the religious ritual, where according to some they are akin to prayer. Nevertheless, there is no doubt that, at least in some of their variants, they are self-addressed linguistic or quasi-linguistic tools whose main purpose is to play a definite role in their users' mental life. This is the case, for instance, in Yogi meditation; and a classical text presumably of the 3rd or 4th century, the Arthaśāstra, goes as far as attributing to it impressive intellectual effects: "a mantra accomplishes the apprehension of what is not or cannot be seen, imparts the strength of a definite conclusion to what is apprehended, removes doubt when two courses are possible [and] leads to inference of an entire matter when only a part is seen" (quoted by Alper 1989, p. 2).

Literary resources can also develop into cognitive tools par excellence. Tsur and Benari (2001) have shown how a specific poetic device, composition of place, employed in meditative poetry, is designed to overcome the linear and conceptual character of language so as to convey such non-conceptual experiences as meditation, ecstasy or mystic insights and thus to express the ineffable (Tsur and Benari, 2001, p. 231). In the same vein, Zamir (2002b) shows, through a close reading of Hamlet, how literature is able to create awareness of ineffable content.[31] In both cases, I would suggest, literary tools not only express or induce certain mental states, but also in a sense create the


conditions for the very existence of these states in the first place.

As a last example of a linguistic resource that gives rise to a cognitive tool, I would like to mention the dialectical use of dialogue structures. Ever since Plato, philosophers developed what can be seen as a genre, the philosophical dialogue, in order to expound their ideas. Its formal structure, however, carries with it cognitive requirements quite different from other forms of exposition. To expound one's ideas for a specific interlocutor and to defend them against her specific objections (even if both the character and objections are fictional creations of the writer) requires techniques of persuasion, argumentation and justification other than those used in a linear text that addresses a generalized, non-present, and unknown reader. Other dialectical techniques developed independently, in oral rather than in written form. In the Middle Ages, codified forms of debate such as the disputatio and the obligatio evolved and success in them became part of the academic requirements to obtain a university degree. But the cognitive implications of these practices transcended both pedagogical needs and the Middle Ages. For the basic idea that a rational debate should obey a set of principles that define the duties of the defendant and the opponent, the types of moves they are allowed to perform, and what will count as a victory, remains in force, in fact, up to this day, even though the contents of such principles have changed. There is no space here to trace the development of dialectical techniques, which involves an interesting interplay between logic and rhetoric, culminating with dialogical logic on the one hand and the new rhetoric on the other. What is important to realize, for the purposes of this paper, is that the ensemble of techniques thus developed transcends pedagogical, expository, or communicative ends, for it becomes a powerful tool for actually implementing the idea that at the core of rationality lies the practice of critical thought.[32] In this sense, a system of electronic argumentation should be designed not only to improve one's ability to express oneself (Sillince, 1996), but also as a tool to improve one's ability to think rationally.

5. Concluding remarks

In this chapter I have proposed to look at language not only as a communicative interface between cognitive agents, but as a technology involved in cognition itself. I surveyed instances of how language functions as an environment, a resource, and a tool of cognition. Some of these examples are more easily


acknowledged as cognitive technologies than others, but all of them share the main characteristics I have attributed to this notion. They contribute systematically and directly to cognitive processes and to the achievement of cognitive aims. And all of them are clearly language-based. In terms of the parameters presented in Section 2, most of the examples of language-based cognitive technologies discussed are internal, and await the eventual development of external counterparts; in spite of optimistically exaggerated claims of some designers, virtually all such extant developments are partial rather than integral; some of the language-based cognitive technologies are useful for strong cognition, others for weak cognition, and still others for both; very few purport to be complete; and only a few of them have been suggested to be constitutive.

By emphasizing the direct contribution of language-based technologies to cognition, I want to stress that they are not mediated by the communicative use of language, the kind of use that monopolizes the attention of designers of human-computer interfaces. I obviously do not deny the importance of the latter, but I think the justified desire to develop humane interfaces and, in general, humane technologies, requires a better understanding of how the human mind makes use of and is affected by naturally evolved or designed technologies. In this respect, this paper should be seen as a contribution to the incipient field of an epistemology of cognitive technology (Gorayska and Marsh, 1996). By focusing on language, it connects this field with one of the main philosophical achievements of 20th century thought, the linguistic turn, which transformed language into the fulcrum of research in philosophy, psychology, the social sciences, and the cognitive sciences.
In his intriguing book Meaning in Technology, Arnold Pacey defends a worldview in which human relationships and human purposes may have a closer connection with technological progress than sometimes seems possible (Pacey, 2001, p. 38). He distinguishes between the prevalent detached approach to science and technology and a participatory approach, in which we feel ourselves to be involved in the system on which we are working (p. 12). According to him, it is the latter that endows technology with meaning. Pacey might have found support for his insights in the present paper. Not only because we have an intimate participatory relationship with language in general and language-based cognitive technologies in particular, but also because such technologies are, ultimately, the technologies of meaning par excellence.


Notes
* I have presented some of the ideas put together in this paper, in one way or another, in the following forums: Dialogue Analysis 2000 (International Association for Dialogue Analysis, Bologna, June 2000); Limited Cognitive Resources: Inference and Reference (Center on Resource Adaptive Cognitive Processes, Saarbrücken, October 2000); IV Encontro Brasileiro Internacional de Ciência Cognitiva (Marília, Brazil, December 2000); and Ciencia, Tecnología y Bien Común: La Actualidad de Leibniz (Universidad Politécnica de Valencia, Spain, March 2001). I thank the organizers as well as the participants who enlightened me with their comments and criticism.
1. Notice that my definition is substantially narrower than those attributed to this term by other researchers (e.g., Dautenhahn 2000).
2. It should be noticed that some of the expressions in these two lists of illustrations (e.g., demonstration, persuasion, decision, etc.) display the well-known process/product ambiguity. This is why they can belong both to the list of states and to that of processes.
3. See Zue (1999).
4. On the three last items, see for example the papers presented in Proceedings (2000), as well as those collected in Cassell et al. (Eds., 2000) and Dautenhahn (Ed., 2000).
5. I have proposed a distinction between demonstration and argumentation as preferred moves in different types of polemics in Dascal (1998a).
6. For a critique of the initial projects of mechanical translation, which pointed out the insufficiency of linguistic theory to support them, see Bar-Hillel (1964, Chapters 10–14).
7. Usually, in the first case it is said that it is syntactically incomplete, while in the second it is said to be semantically incomplete. In both cases, however, semantics, in the broad sense of correspondence between a symbolic system and the properties it purports to represent, is involved. The formation rules in fact select a set of well-formed formulae or combinations of symbols according to some criterion of well-formedness that is supposed to correspond to some property (e.g., grammaticality in a linguistic system or propositionality in the propositional calculus), whereas the transformation rules select a set of derivation relations between formulae that is supposed to correspond to another property (e.g., meaning invariance in the standard model of generative grammar or validity in the propositional calculus).
8. See Anderson & Belnap (1975, pp. 403ff.).
9. Some passages in Turing's paper may suggest that he took success in playing the imitation game (i.e., what I called the test) as an operational definition of intelligence, and thus, from the point of view of behaviorism, as equivalent to it, rather than a sign of it. See, for example, Block (1981) and Richardson (1982). Eli Dresner tried to persuade me that this is the case, but he concedes that Turing definitely does not describe himself as a behaviorist (personal communication).
10. Among them Rousseau and Adam Smith (cf. Dascal 1978 and Forthcoming).
11. For an analysis of this debate, see Dascal (1995), where several of the authors mentioned in this and the preceding paragraphs are discussed. Those interested particularly in Leibniz


and Hobbes should consult Dascal (1998b and 1998c, respectively). On the implications of this debate for AI and current work in the philosophy of mind and of language, see Dascal (1992b, 1997a) and Dresner & Dascal (2001).
12. I coined the term psychopragmatics for the branch of pragmatics that deals not with the social uses of language such as communication (a task reserved for sociopragmatics) but with the mental uses of language. See Dascal (1983) and references therein.
13. For example: "We thought a day and night of steady rain / was plenty, but it's falling again, downright tireless / Much like words / But words don't fall exactly; they hang in there / In the heaven of language, immune to gravity / If not to time, entering your mind / From no direction, travelling no distance at all, / And with rainy persistence tease from the spread earth / So many wonderful scents" (Robert Mezey, "Words"; quoted in Aitchison, 1994, p. v). The images employed in this poem capture several of the environmental properties of language described in Section 4.1.
14. In this respect, I am much more moderate than Winograd & Flores, who interpret Heidegger's dictum as claiming that nothing exists except through language (Winograd and Flores, 1986, p. 68).
15. Watson later rejected this reductionist claim (see Watson & McDougall, 1928).
16. In fact, linguistic articulation goes well beyond this, since one can identify sub-phonemic features out of which phonemes are formed, as well as supra-lexical meaningful compounds such as idioms, whose meaning cannot be accounted for in terms of lexical-syntactic composition.
17. "Grammar itself is a machine / Which, from innumerable sequences / selects the strings of words for intercourse / When the words have vanished, grammar's left, / And it's a machine / Meaning what? / A totally foreign language" (Lars Gustafsson, "The machines", quoted by Haberland, 1996, p. 92).
18. The philosopher Gilles Deleuze, who describes this kind of structure using the botanical model of the rhizome, rather than the now popular neural net model, has highlighted its centrality for understanding the multi-layered complexity of human thought and its expression. See Deleuze & Guattari (1976, 1980).
19. A striking example of the sheer linguistic difficulty in overcoming this obstacle is Alejo Carpentier's story Viaje a la semilla (in Carpentier, 1979, pp. 63–94). The story moves backwards from a current event to the seed whence it derives. In spite of the author's ingenious efforts, however, it becomes apparent that it is virtually impossible to neutralize the temporal order embedded in various levels of linguistic structure.
20. For example, Smith pointed out one of these expressive devices used to circumvent the basic syntactic order in English (subject-verb-object), namely the anteposition of whatever is most interesting in the sentence (Smith 1983, p. 18), which is accounted for in modern syntactic theory in terms of an epicyclic rule.
21. For example, the maxims that govern conversation according to Grice. For further exploration, application, and theoretical grounding of these and other pragmatic rules and principles, see Dascal (2003). For a critique of the view that, since conversation is not ruled


by constitutive rules of a grammatical kind, it is not, properly speaking, a rule-governed phenomenon, see Dascal (1992a).
22. For discussion and interpretation of the matching bias phenomenon, see Evans (1982, pp. 140–144) and Dascal (1987).
23. Leibniz devoted much thought, in his projects for an encyclopedia and its role in the art of discovery, to the cognitive role of a variety of types of indexing. See Dascal (2002).
24. On the notion of semantic relevance, see Achinstein (1968). On the difficulty of establishing a clear distinction between dictionary and encyclopedia, see Peeters (2000) and Cabrera (2001).
25. Topoi, loci communes, or commonplaces occupied a central place in humanist education in the Renaissance and the early modern period. Dozens of Commonplace-Books were printed at the time, and students were required to write and use their own commonplace lists. Such a practice not only established shared forms of expression, but also shared conceptual tools, which thus constituted a background of mental structures guiding the thought and understanding of educated persons throughout Europe for at least two centuries. For a study of this linguistic-based cognitive resource, see Moss (1996).
26. Some of these resources have been put to use in computer applications. Chinese word processors, taking advantage of the Chinese habit of systematically using proverbs (mainly four-character ones), propose to the writer possible proverbial continuations once the first two characters of the proverb are typed. Attempts to simulate and exploit the dialogical resources of natural language for human-computer interfaces are now proliferating. The pioneer classic ELIZA employed a number of phrasal structures routinely occurring in non-directive psychotherapy in order to create the impression of a real dialogue between therapist and patient (Weizenbaum, 1966). The MUD robot-agent JULIA, like ELIZA, employs lists of common queries and a matching procedure in order to generate natural-looking conversation with users (cf. Foner, 2000). More recent rule-based systems of dialogue and conversation (e.g., Kreutel and Matheson, 2000; Webb, 2000) are no doubt much more sophisticated and useful tools than ELIZA, but they still remain excessively subordinated, in my opinion, to the rule-following model.
27. For a rhetorical analysis of the scientific paper and its evolution from the 17th century onwards, see Gross (1990) and Gross et al. (2002).
28. For a sample of research on metacognition, see Metcalfe & Shimamura (1994). For the relationship between metacognition and consciousness, see Newton (1995), and for its relationship with conversation, see Hayashi (1999). For a critique of the exaggerated emphasis on metacognitive abilities in education, see Roth (2004).
29. A striking example of the use of language for alleged scientific purposes is Scientology. This religious movement, based on the science of Dianetics, claims to provide its followers with a cognitive technology that allows them to achieve the status of Clears, essentially through linguistic manipulation. For an analysis of this phenomenon, see Mishori & Dascal (2000).


30. Lavoisier, who was in this respect a follower of Condillac, viewed his new chemical nomenclature as having cognitive implications far beyond those of a mere terminological reform (cf. Bensaude-Vincent, 1993).
31. Zamir (2002a) also proposes an epistemological account of how literature can express and eventually generate cognitive content that the resources of philosophical discourse are unable to capture.
32. See Astroh (1995), Barth (1992), Dascal (1997b, 1998a, 2000) and references therein.

References
Achinstein, P. (1968). Concepts of science: A philosophical analysis. Baltimore: The Johns Hopkins Press.
Aitchison, J. (1994). Words in the mind (2nd ed.). Oxford: Blackwell.
Alper, H. P. (1989). Introduction. In H. P. Alper (Ed.), pp. 1–14.
Alper, H. P. (Ed.) (1989). Mantra. Albany: State University of New York Press.
Anderson, A. R. & N. D. Belnap, Jr. (1975). Entailment: The logic of relevance and necessity, vol. 1. Princeton: Princeton University Press.
Astroh, M. (1995). Sprachphilosophie und Rhetorik. In Dascal et al. (Eds.), pp. 1622–1643.
Bar-Hillel, Y. (1964). Language and information. Reading, MA & Jerusalem: Addison-Wesley & Magnes Press.
Barth, E. M. (1992). Dialogical approaches. In Dascal et al. (Eds.), pp. 663–676.
Bensaude-Vincent, B. (1993). Lavoisier: Mémoires d'une révolution. Paris: Flammarion.
Black, M. (1963). Reasoning with loose concepts. Dialogue 2, 1–12.
Block, N. (1981). Psychologism and behaviorism. The Philosophical Review 80, 5–43.
Brinck, I. (2001). Attention and evolution of intentional communication. Pragmatics & Cognition 9(2), 255–272.
Cabrera, J. (2001). Words, worlds, words. Pragmatics & Cognition 9(2), 313–327.
Carpentier, A. (1979). Cuentos completos. Barcelona: Bruguera.
Cassell, J., J. Sullivan, S. Prevost & E. Churchill (Eds.) (2000). Embodied conversational agents. Cambridge, MA: The MIT Press.
Dascal, M. (1978). Aporia and theoria: Rousseau on language and thought. Revue Internationale de Philosophie 124/125, 214–237.
Dascal, M. (1983). Pragmatics and the philosophy of mind, vol. 1: Thought in Language. Amsterdam: Benjamins.
Dascal, M. (1987). Language and reasoning: Sorting out sociopragmatic and psychopragmatic factors. In B. W. Hamill, R. C. Jernigan & J. C. Bourdreaux (Eds.), The role of language in problem solving II, pp. 183–197. Amsterdam: North Holland.
Dascal, M. (1992a). On the pragmatic structure of conversation. In H. Parret and J. Verschueren (Eds.), (On) Searle on Conversation, pp. 35–56. Amsterdam: Benjamins.
Dascal, M. (1992b). Why does language matter to artificial intelligence? Minds and Machines 2, 145–174.


Dascal, M. (1995). The dispute on the primacy of thinking or speaking. In Dascal et al. (Eds.), pp. 1024–1041.
Dascal, M. (1997a). The language of thought and the games of language. In M. Astroh, D. Gerhardus, and G. Heinzman (Eds.), Dialogisches Handeln: Ein Festschrift für Kuno Lorenz, pp. 183–191. Heidelberg: Spektrum Akademischer Verlag.
Dascal, M. (1997b). Critique without critics? Science in Context 10(1), 39–62.
Dascal, M. (1998a). Types of polemics and types of polemical moves. In S. Čmejrková, J. Hoffmannová, O. Müllerová & J. Světlá (Eds.), Dialogue analysis VI, vol. 1, pp. 15–33. Tübingen: Max Niemeyer.
Dascal, M. (1998b). Language in the mind's house. Leibniz Society Review 8, 1–24.
Dascal, M. (1998c). O Desafio de Hobbes. In L. Ribeiro dos Santos, P. M. S. Alves & A. Cardoso (Eds.), Descartes, Leibniz e a Modernidade, pp. 369–398. Lisboa: Colibri.
Dascal, M. (2000). Controversies and epistemology. In Tian Yu Cao (Ed.), Philosophy of science (= Vol. 10 of Proceedings of the Twentieth World Congress of Philosophy), pp. 159–192. Philadelphia: Philosophers Index Inc.
Dascal, M. (2002). Leibniz y las tecnologías cognitivas. In A. Andreu, J. Echeverría & C. Roldán (Eds.), Ciencia, tecnología y el bien común: La actualidad de Leibniz, pp. 159–188. Valencia: Universidad Politécnica de Valencia.
Dascal, M. (2003). Interpretation and Understanding. Amsterdam: Benjamins.
Dascal, M. (Forthcoming). Adam Smith's theory of language. In K. Haakonssen (Ed.), The Cambridge Companion to Adam Smith. Cambridge: Cambridge University Press.
Dascal, M., D. Gerhardus, K. Lorenz & G. Meggle (Eds.) (1992–5). Philosophy of Language: A handbook of contemporary research, vols. 1 & 2. Berlin & New York: Walter de Gruyter.
Dautenhahn, K. (2000). Living with intelligent agents: A cognitive technology view. In K. Dautenhahn (Ed.), Human cognition and social agent technology, pp. 415–426. Amsterdam: Benjamins.
De Gelder, B. (1981). Attributing mental states: A second look at mother-child interaction. In H. Parret, M. Sbisà & J. Verschueren (Eds.), Possibilities and limitations of pragmatics, pp. 237–250. Amsterdam: Benjamins.
Deleuze, G. & F. Guattari (1976). Rhizome. Paris: Minuit.
Deleuze, G. & F. Guattari (1980). Mille plateaux. Paris: Minuit.
Dertouzos, M. L. (1999). The future of computing. Scientific American 281(2), 36–39.
Dresner, E. & M. Dascal (2001). Semantics, pragmatics, and the digital information age. Studies in Communication Sciences 1(2), 1–22.
Dreyfus, H. (1971). What computers can't do. New York: Harper & Row.
Dreyfus, H. (1992). What computers still can't do. Cambridge, MA: The MIT Press.
Dror, I. E. & M. Dascal (1997). Can Wittgenstein help free the mind from rules? The philosophical foundations of connectionism. In D. M. Johnson & C. E. Erneling (Eds.), The future of the cognitive revolution, pp. 217–226. New York: Oxford University Press.
Evans, J. St. B. T. (1982). The psychology of deductive reasoning. London: Routledge & Kegan Paul.
Foner, L. (2000). Are we having fun yet? Using social agents in social domains. In K. Dautenhahn (Ed.), Human cognition and social agent technology, pp. 323–348. Amsterdam: Benjamins.
Geertz, C. (1973). The interpretation of cultures. New York: Basic Books.


Gorayska, B. & J. Marsh (1996). Epistemic technology and relevance analysis: Rethinking cognitive technology. In Gorayska & Mey (Eds.), pp. 27–39.
Gorayska, B. & J. Mey (Eds.) (1996). Cognitive technology: In search of a humane interface. Amsterdam: Elsevier.
Gross, A. G. (1990). The rhetoric of science. Cambridge, MA: Harvard University Press.
Gross, A. G., J. E. Harmon & M. Reidy (2002). Communicating science: The scientific article from the 17th century to the present. Oxford: Oxford University Press.
Gullvåg, I. & A. Naess (1995). Vagueness and ambiguity. In Dascal et al. (Eds.), pp. 1407–1417.
Haberland, H. (1996). And we shall be as machines, or should machines be as us? On the modeling of matter and mind. In Gorayska & Mey (Eds.), pp. 89–98.
Hayashi, T. (1999). A metacognitive model of conversational planning. Pragmatics & Cognition 7(1), 93–145.
Hutchins, E. (1999). Cognitive artifacts. In R. A. Wilson & F. C. Keil (Eds.), The MIT encyclopedia of the cognitive sciences, pp. 126–128. Cambridge, MA: The MIT Press.
Kreutel, J. & C. Matheson (2000). Information states, obligations and intentional structure in dialogue modelling. In Proceedings, pp. 80–86.
Metcalfe, J. & A. P. Shimamura (Eds.) (1994). Metacognition: Knowing about knowing. Cambridge, MA: The MIT Press.
Mishori, D. & M. Dascal (2000). Language change as a rhetorical strategy. In Harish Narang (Ed.), Semiotics of language, literature and cinema, pp. 51–67. New Delhi: Books Plus.
Moss, A. (1996). Printed commonplace-books and the structuring of renaissance thought. Oxford: Oxford University Press.
Naess, A. (1966). Communication and argument: Elements of applied semantics. Oslo: Universitetsforlaget.
Netz, R. (1999). Linguistic formulae as cognitive tools. Pragmatics & Cognition 7, 147–176.
Newton, N. (1995). Metacognition and consciousness. Pragmatics & Cognition 3(2), 285–297.
Pacey, A. (2001). Meaning in technology. Cambridge, MA: The MIT Press.
Peeters, B. (Ed.) (2000). The lexicon-encyclopedia interface. Amsterdam: Elsevier.
Proceedings (2000). Proceedings of the 3rd International Workshop on Human-Computer Conversation (Bellagio, July 2000).
Richardson, R. (1982). Turing tests for intelligence: Ned Block's defense of psychologism. Philosophical Studies 41, 421–426.
Roth, M. (2004). Theory and praxis of metacognition. Pragmatics & Cognition 12(1), 153–168.
Sillince, J. A. A. (1996). Would electronic argumentation improve your ability to express yourself? In B. Gorayska & J. L. Mey (Eds.), pp. 375–387.
Smith, A. (1761). Considerations concerning the first formation of languages and the different genius of original and compounded languages. In J. R. Lindgren (Ed.), The Early Works of Adam Smith, pp. 225–251. New York: Augustus M. Kelley Publisher, 1967.
Smith, A. (1983). Lectures on rhetoric and belles lettres, ed. J. C. Bryce & A. S. Skinner. Oxford: Clarendon Press.
Tsur, R. & M. Benari (2001). Composition of place, experiential set, and the meditative poem. Pragmatics & Cognition 9(2), 201–234.


Turing, A. M. (1950). Computing machinery and intelligence. Mind 59, 433–460.
Watson, J. B. & W. McDougall (1928). Battle of behaviorism: An exposition and an exposure. London: K. Paul, Trench, Trubner & Co.
Webb, N. (2000). Rule-based dialogue management systems. In Proceedings, pp. 164–169.
Weizenbaum, J. (1966). ELIZA: A computer program for the study of natural language communication between man and machine. CACM 9, 36–45.
Winograd, T. & F. Flores (1986). Understanding computers and cognition: A new foundation for design. Reading, MA: Addison-Wesley.
Woodworth, R. S. & S. B. Sells (1935). An atmosphere effect in syllogistic reasoning. Journal of Experimental Psychology 18, 451–460.
Zadeh, L. A. (1975). Fuzzy logic and approximate reasoning. Synthese 30, 407–428.
Zamir, T. (2002a). An epistemological basis for linking philosophy and literature. Metaphilosophy 33(3), 321–336.
Zamir, T. (2002b). Doing nothing. Mosaic 35(3), 167–182.
Zue, V. (1999). Talking to your computer. Scientific American 281(2), 40–41.

Relevance, goal management and cognitive technology*


Roger Lindsay and Barbara Gorayska
Psychology Department, Oxford Brookes University / SPS, University of Cambridge

1. Introduction

Understanding what is relevant is absolutely fundamental to every cognitive operation carried out by humans, from low-level feature recognition to high-level problem solving. In Artificial Intelligence (AI) work the importance of relevance is easily passed over. AI programs are written by human programmers in such a way as to ensure that all the cognitive resources needed by the program are available at exactly the time the program needs them. Indeed, most of the challenge involved in programming computers comes from having to anticipate and make available what is relevant at different stages of a processing cycle, and having to exclude information and operations that are irrelevant.

The central contention underlying this chapter can be expressed as a positive and a negative thesis. The negative thesis is that the central role of relevance in cognition passes largely unacknowledged in cognitive neuroscience, despite the fact that neuroscientists are forced to employ or grapple with the concept at every turn. The positive thesis is that by according to relevance the central role that it should properly have in explaining cognition, it is possible to clear up a considerable number of issues and problems that presently seem mysterious in connection with problem-solving, ethics, symbol-connection hybridism and the motivation-action nexus.

The first researchers with an interest in cognitive science to realise that relevance is important were Sperber and Wilson (1986/1995). Their updated Relevance Theory (Sperber and Wilson, 2004) is briefly summarized below (preserving, in as much as possible, the authors' own terminology and style of expression).

1.1 Sperber and Wilson's Theory of Relevance

According to Sperber and Wilson (1986/1995), their Theory of Relevance (RT) is a cognitive psychological theory which aims to explain, in cognitively realistic terms, the mental processes people employ when they overtly communicate. It is set within, and further elaborates on, the Gricean framework of Inferential Pragmatics, where intended communicative acts, which include, but are not limited to, natural language utterances, are comprehended in situational contexts rather than merely decoded from, or coded in, what is strictly being said or ostensively done. It attempts to provide an empirically plausible account of human cognition at the interface between intention, action, and the language-mediated, perceivable world.

Any cognitive psychological theory relies on how the mind is understood at the time of its formulation. RT is no exception. When it was first developed (Sperber and Wilson, 1986), it adopted the mental architecture proposed by Fodor (1983), which postulated that the mind comprised a largely undifferentiated central thought processor (a reflective reasoning mechanism) and a set of peripheral input modules, or faculties, of which the language module was one. Comprehending verbal and non-verbal behaviour intended to communicate was thus inferential (employing intuitive and spontaneous inference) but dissociated from the cognitive processes that related mental states to other forms of action. This dissociation, albeit weakened, is still in place in the current version of RT, updated (Sperber and Wilson, 2004, summarized in this section) to better fit the modern, highly modular view of the mind in the cognitive sciences.

What is now proposed is a dedicated inferential comprehension module which, according to the authors, is comparable to an Intention Detector, or an Eye Direction Detector (Leslie 1991; Premack & Premack, 1995; Baron-Cohen, 1995). It has its own proprietary concepts and mechanisms, which do not have to be learnt but come as a substantial innate endowment. This module, they say, is a part of a more general module for processing motivated action, but comprises special-purpose inferential comprehension procedures (or heuristics) attuned to, and taking advantage of, the regularities in the communicative domain. (Note that, in principle, a substantial innate endowment does not rule out procedures and heuristics that have to be learnt. If so, the proposed module is a prime candidate for inclusion into the category of natural technologies discussed in Meenan & Lindsay (2002), El Ashegh & Lindsay and Bowman et al. (both this volume). Further empirical research is necessary to validate this point.)

Exploring this possibility, of a separate specialized comprehension sub-module, is worthwhile, Sperber and Wilson argue, because of the disparate nature of the phenomena involved. Firstly, the range of actions an agent can possibly intend in situational contexts is limited, while the range of meanings a speaker can intend in any situation is virtually unlimited. Secondly, a single level of metarepresentation is generally sufficient for attributing intentions to agents (regular mind-reading), while several layers of metarepresentations are typically necessary for inferential comprehension. It is therefore unclear, they say, how the communicator's meaning (a communicative intention) could be inferred by the same standard procedures that attribute intention to actions or, by the same token, if this were so, how a child of two who failed on regular first-order false belief tasks could recognize and understand the multi-levelled representations involved in verbal comprehension.

The inferential comprehension module


RT postulates that the search for relevance is basic to human cognition and that people exploit it when communicating. Modifications to world models happen within cognitive environments when assumptions, the basic building blocks of these models, become manifest (i.e., are available to conscious awareness). Overt communicative signals from speakers (or communicators) provide evidence of their intentions and create precise and predictable expectations of relevance sufficient to guide the recipients (at whom communication is directed) towards the speaker's meaning. Note a departure from Grice (1961 and compendium: 1989) in abandoning the Cooperative Principle, the maxims of Quality, Quantity, and Manner and the role of maxim violations. For details see Sperber and Wilson (1986/1995 and 2004). Upon receiving the communicative signals (a sight, a sound, an utterance or a memory), recipients access from memory available assumptions (background information or contexts) that, when connected with the input signals (and not otherwise), yield conclusions (contextual implications) that make a worthwhile difference to their world models (by answering queries, improving what is known on a given topic, eliminating a doubt, confirming a suspicion, or correcting a mistake).

To Sperber and Wilson, relevance of a communicative input is a matter of degree: it is a function of cognitive costs and rewards. The greater the positive cognitive effects and the lower the costs of mental processing expended to achieve them (mainly due to the relative salience of input stimuli), the more relevant the (preferred interpretation of the) input signal. For this reason, selected input stimuli are not just relevant but are more relevant than any alternative available, and are, hence, the outcome of making the most efficient use of available resources. Note that the corollary of this view is that in RT the purpose of communicating is narrowed down to, and the relevance of what is said, seen or remembered is sought in association with, a mere desire to improve one's understanding of the world, i.e., the world model, by either adding new assumptions to it or by strengthening or weakening the already entertained assumptions within it. (For further discussion, see Gorayska and Lindsay, 1993.)

Framing the notion of degrees of relevance in comparative rather than quantitative terms, RT bypasses the problem of how the ratio of effect and effort is to be measured in real time in psychologically plausible ways. Note that computation itself is effort-expending and not all cognitive factors, e.g., levels of attention, are measurable. (For detailed criticisms, see, e.g., Sperber and Wilson, 1987.) Effort and effect are treated as non-representational dimensions of mental processes, and comparative judgments of relevance are presumed to be intuitive rather than absolute, numerical ones.

Consequently, the First, or Cognitive, Principle of Relevance (a regularity specific to the communicative domain) claims that humans automatically maximize relevance (because of the way their cognitive mechanisms have evolved due to constant pressure for increased efficiency). We automatically perceive relevant stimuli, activate in memory relevant assumptions, or spontaneously infer conclusions in the most productive way. Further, the degree of relevance that the audience settles for in comprehending communicative stimuli is optimal. In inferential communication ostensive stimuli are designed to attract attention. Communicators intend both to inform others of something and, at the same time, also to inform them of their desire to inform.
Consequently, the Second, or Communicative, Principle of Relevance claims that all ostensive stimuli convey a presumption of their own optimal relevance. They are the most relevant that communicators can and want to produce. What this means to the audience in terms of judging efforts vis-à-vis effects is that the designed stimuli are at least relevant enough to be worth processing, and as easy as possible to understand. This leads straightforwardly to the Relevance-theoretic comprehension procedure (a fast and frugal heuristic) whereby the path of least effort is followed in automatic computing of cognitive effects. Linguistically-encoded word meanings provide clues to the communicator's meaning. Inferential processes are employed both in deriving explicatures (completing decoded logical forms of utterances, i.e., conceptual representations of what is said, that are fragmentary or incomplete due to linguistic indeterminacy: disambiguating, resolving reference, or lexical-pragmatic processes such as narrowing or loosening in figurative uses of language, etc.) as well as implicatures (conclusions drawn from the explicatures and the background information).

Comprehension of what is being communicated is an on-line cognitive task of the recipient whose goal is to formulate and confirm plausible hypotheses about the communicator's meaning. In a highly parallel manner three subtasks are executed: (1) constructing appropriate hypotheses about the explicit content of utterances (explicatures), (2) constructing appropriate hypotheses about the intended contextual assumptions (implicated premises), and (3) constructing appropriate hypotheses about the intended contextual implications (implicated conclusions). Upon the evidence provided, interpretive hypotheses about the communicator's meaning are tested in order of accessibility via mutual adjustment of context, content and cognitive effects. The process terminates when the first plausible hypothesis is entertained, which is then considered most plausible and therefore most relevant in the context.

Sperber and Wilson show that outcomes of the inference comprehension procedures (or heuristics), hence the operations of the mechanism in comprehending communication, are empirically testable: for example, predictable variations can be witnessed in deriving the speaker's intended meaning or her deception due to degrees of sophistication in (meta)representation capacity of the interpreter (noticeable in child development).
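The comprehension procedure summarized above (follow a path of least effort, test interpretive hypotheses in order of accessibility, stop at the first one that meets the expectation of relevance) can be illustrated with a minimal Python sketch. The numeric `effort` and `effects` scores, the `Hypothesis` type and the `comprehend` function are hypothetical conveniences introduced only for this illustration; RT itself treats effort and effect as non-representational and comparative, not as numbers.

```python
from typing import NamedTuple, Optional, Sequence


class Hypothesis(NamedTuple):
    """A candidate interpretation (illustrative only; not part of RT)."""
    reading: str
    effort: float   # assumed stand-in for processing cost (lower = more accessible)
    effects: float  # assumed stand-in for positive cognitive effects


def comprehend(hypotheses: Sequence[Hypothesis],
               expected_relevance: float) -> Optional[Hypothesis]:
    """Test hypotheses in order of accessibility and stop at the first
    one whose effects satisfy the expectation of relevance."""
    for hypothesis in sorted(hypotheses, key=lambda h: h.effort):
        if hypothesis.effects >= expected_relevance:
            return hypothesis  # later, costlier readings are never examined
    return None  # no reading met the expectation


# Hypothetical readings of the word "bank" in some utterance.
candidates = [
    Hypothesis("river bank", effort=2.0, effects=0.5),
    Hypothesis("financial bank", effort=1.0, effects=2.0),
    Hypothesis("blood bank", effort=3.0, effects=2.5),
]
chosen = comprehend(candidates, expected_relevance=1.0)
```

In this toy case the most accessible reading is accepted even though a costlier one would score higher on effects, which is the sense in which the heuristic is fast and frugal rather than exhaustive.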
The operations of the mechanism can also serve to explain why in selection tasks such as the Wason task, responses of the subjects can be seen as the output (of deriving optimal relevance) according to the linguistic evidence (clues) provided: people choose options that are supported by the different situational contexts (that manipulate the effort and effect factors) made explicitly available to them (Sperber, Cara and Girotto, 1995). (See also Evans (1982) and Dascal (1987) who show that people also choose options explicitly named in utterances.)

1.2 Limitations of Sperber and Wilson's Theory of Relevance

Though Sperber and Wilson deserve a great deal of credit for their vision in coming to appreciate the importance of the relevance construct, their vision, at least as far as it is so far realised in print, remains seriously limited. For example, according to Sperber and Wilson, relevance relationships only exist between propositions, and hence relevance is fundamentally a relationship between symbols or symbol strings. By contrast, we will argue below that relevance is the key concept underlying all forms of cognitive processing, non-verbal as well as propositional, connectionist as well as symbolic, not just one of a number of important concepts in one or another sub-domain of cognition.

Further, as indicated above, Sperber and Wilson (2004) suggest that the special-purpose inferential comprehension procedures (or heuristics) underlying the linguistic relevance relations with which they almost exclusively deal in their published work are a part of a more general module for processing motivated action. However, there is almost nothing in their work that offers insight into how actions are planned, or how motivation impinges upon this process. The theory is, even on a charitable reading, only loosely coupled to mechanisms such as Working Memory and the Episodic Buffer (Baddeley, 2001) or Norman and Shallice's SAS (Norman and Shallice, 1986).

Last but not least, Sperber and Wilson clearly intend their theory to be interpreted as a theory of cognitive processing that is in some sense supposed to be implemented in the brains of human agents. Again, however, there are few hints in their work that identify the actual neuropsychological mechanisms that are responsible for processing relevance information.

Our intention in the sections below is to describe a theory of relevance that rectifies these omissions: our theory seeks to explain how relevance connects with motivation. Because relevance is a key variable in goal management, the theory tries to link relevance processing with the constructs of psychology by suggesting that relevance plays a fundamental part in problem-solving and action planning. Finally, the theory claims that relevance is neuropsychologically grounded because it is the mechanism by which associative processing in neural networks is converted into hypothesis testing in symbolically represented problem spaces.
The first step in delineating our theory is to explain how the concept of relevance is intrinsically bound up with the process of goal management. This will be followed by a discussion of the cognitive function of ethics in managing goals. The paper will end with some consideration of how the proposed theory of relevance processing and goal management can be put to technological use.

2. The ontogenesis of relevance

The conceptual intimacy of the link between relevance and goal management derives from the fundamental fact that relevance is a goal-dependent predicate: that is to say, whether something can be accurately or meaningfully described as relevant depends upon the prior specification of a goal (Gorayska and Lindsay, 1989a, b; Gorayska and Lindsay, 1993; Lindsay, 1996a). Lindsay, Gorayska, and Cox (1994) have reported evidence which suggests that subjects can reliably match plan elements to goals, and while they can readily formulate effective plans to achieve specified goals using relevant plan components generated by other subjects, they are quite unable to do so when the plan components provided for them are not relevant. Gorayska and Lindsay (1989a, b, 1993, 1995) and Lindsay and Gorayska (1995) have offered a formal definition of relevance that attempts to capture this goal-dependent character:

P is relevant to G iff G is a goal, and P is an essential element of some plan that is sufficient to achieve G.

Several computer-based problem-solving systems have been developed which employ relevance (defined as above) as a central theoretical construct (Gorayska et al., 1992; Gorayska and Tse, 1993; Tse, 1994; Ip, Gorayska and Lok, 1994; Gorayska, Tse and Kwok, 1997; Zhang, 1993; Zhang, Nealon, and Lindsay, 1993; Johnson, 1997). It would seem that the idea has some practical utility for supporting AI systems which have at least a limited capability for reasoning and dialogue. As important as the evidence that planning systems based upon the processing of relevance relations are sufficient to generate goal-oriented action plans, however, is the fact that a relevance-based theory can also supply a solution to a problem that at present fatally afflicts symbol-based AI models of reasoning and problem-solving.

AI systems for planning and reasoning almost all operate within a set of assumptions developed by Newell and Simon (Newell and Simon, 1972; Newell, 1990; Vera and Simon, 1993). This framework is often called the Symbolic Search Space Paradigm (SSSP) approach (Partridge, 1991). According to SSSP, a problem consists of a set of givens (objects or events), a set of operations, and a set of goals.
Application of every allowable operation in every possible order to the givens generates the state space of a problem. Any sequence of allowable operations is a plan. Solution of a problem requires the identification of a sequence of operations that can be applied to the given state of a problem so as to transform it into the goal state. Problem-solving may be difficult because such solution paths are sparsely distributed within the state space, and because a solver has no direct access to the state space but must construct its own symbolic representation of it. A symbolic representation of the state space of a problem which is constructed with the aim of locating an effective plan for solving the problem is called a problem space. A problem space may differ from the state space of a problem by over- or under-inclusion of objects or operations.
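The SSSP vocabulary above (givens, operations, goals, plans) and the formal definition of relevance given earlier in this section can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the tea-making domain, the `Operation` type, the bounded breadth-first search, and the reading of "essential" as "the plan fails once every occurrence of the element is removed". It is not a reconstruction of any of the systems cited in the chapter.

```python
from collections import deque
from typing import NamedTuple


class Operation(NamedTuple):
    name: str
    needs: frozenset    # facts that must hold before the operation applies
    adds: frozenset     # facts made true by the operation
    removes: frozenset  # facts made false by the operation


def apply_plan(givens, plan):
    """Run a plan from the givens; return the final state, or None if a step is inapplicable."""
    state = frozenset(givens)
    for op in plan:
        if not op.needs <= state:
            return None
        state = (state - op.removes) | op.adds
    return state


def achieves(givens, plan, goal):
    state = apply_plan(givens, plan)
    return state is not None and goal <= state


def sufficient_plans(givens, operations, goal, max_len=4):
    """Breadth-first enumeration of plans (up to max_len steps) that reach the goal."""
    found, queue = [], deque([()])
    while queue:
        plan = queue.popleft()
        if achieves(givens, plan, goal):
            found.append(plan)
            continue  # a plan that already succeeds is not extended further
        if len(plan) < max_len:
            for op in operations:
                if apply_plan(givens, plan + (op,)) is not None:
                    queue.append(plan + (op,))
    return found


def is_relevant(op, givens, operations, goal, max_len=4):
    """P is relevant to G iff P is an essential element of some plan sufficient for G."""
    for plan in sufficient_plans(givens, operations, goal, max_len):
        if op in plan:
            without = tuple(step for step in plan if step != op)
            if not achieves(givens, without, goal):
                return True
    return False


# Hypothetical toy domain.
BOIL = Operation("boil", frozenset({"kettle", "water"}), frozenset({"hot water"}), frozenset())
BREW = Operation("brew", frozenset({"hot water", "teabag"}), frozenset({"tea"}), frozenset())
WHISTLE = Operation("whistle", frozenset(), frozenset({"tune"}), frozenset())
GIVENS = frozenset({"kettle", "water", "teabag"})
OPERATIONS = [BOIL, BREW, WHISTLE]
GOAL = frozenset({"tea"})
```

In this domain `is_relevant` returns True for BOIL and BREW and False for WHISTLE: whistling occurs in some successful plans, but never essentially. Note that the check has to enumerate complete plans before it can say what is relevant.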

A central difficulty for AI research, and for theories of human problem-solving, is the question of how problem spaces are constructed. Once the givens and operations are known, generating an effective plan is more-or-less mechanical. For some relatively formal problems such as chess playing, the issue is trivial: the problem space must include a representation of the chessboard, the pieces, and the allowable moves. In most cases, however, construction of the problem space is by far the most challenging aspect of problem-solving in AI. In practice, this difficulty has been handled by hand-crafting the problem space, that is to say, by using human intuitions to decide what objects and operations are to be used. This tactic bootstraps over the problem, but leaves in its wake the worry that the models that employ it are little more than well-intentioned fakes, appearing successful only because they are being fed a preprocessed version of difficult problems that disguises their limitations. The SSSP framework is immensely powerful and has been successfully applied in AI models of perception, robotics, reasoning in formal and natural language and many types of learning. However, it is clear that for genuinely creative problem-solving to occur, particularly with problems which are incompletely or informally defined, it is essential that a better understanding is achieved of problem-space construction.

If human beings can make reliable judgements concerning the relevance of plan elements to goals, it seems possible that problem spaces might be constructed on the basis of such judgements: first, objects and operations relevant to some goal are retrieved; then standard problem-solving, in the sense of trying to locate an effective plan to transform givens to goal, may proceed. There is, however, a serious obstacle to this proposal: the analysis of relevance offered above defines the relevance of an element in terms of whether or not it is a component of an effective plan.
This information cannot be available prior to the existence of the plan, so how can relevance information form part of the input to the planning process? The analysis of problem-solving and relevance processing offered so far is assumed to occur entirely within SSSP systems that manipulate symbolic representations. One possible way in which the circularity identified above may be broken is for relevance information to originate outside the symbol processing system. How plausible is this suggestion in the case of human information processing? We wish to argue that both ontogenetically and psychogenetically, the case is a strong one.

Though it is inappropriate to argue the case in detail here, it seems plausible that human infants are capable of learning, and have considerable problem-solving competence, before they have symbolic planning capabilities (Cohen and Strauss, 1979; Gottlieb and Krasnegor, 1985). Not only is there a strong a priori case that such presymbolic learning is essential, as symbols must be interpreted before they can be used, and the process of symbol interpretation must require learning, but there is an accumulating body of evidence that the cognitive competences of human infants in early life can be more successfully explained by non-symbolic connectionist models than by models sharing SSSP assumptions (Elman, Bates, Johnson, Karmiloff-Smith, Parisi and Plunkett, 1996; Plunkett, McLeod and Rolls, 1998). Similarly, there is mounting evidence that adults can show evidence of learning through improved performance without being able to symbolically represent or articulate their knowledge in some domains (Mack and Rock, 1998) or prior to the development of explicit access to relevant knowledge in others (Reber, 1993). It is obvious that learning requires the acquisition of relevance information. If it is true that learning can precede symbolic planning, then it must be true that the acquisition of relevance information can precede symbolic planning.

The claim that relevance information may be a precursor of symbol processing raises an immediate further question: what pre-symbolic mechanisms could make relevance information available? A plausible answer is that two distinct processing systems might exist: one a higher-level system running on interpreted (or grounded, Harnad, 1990; but see also Kay, 2001) symbols, the other a sub-symbolic system that does not itself use symbols, but that generates outputs capable of satisfying the requirements of its symbol-based partner. The view that human beings have access to two different processing systems has arisen in a variety of forms. Perhaps the best known is the controlled versus automatic processing distinction of Shiffrin and Schneider (Schneider and Shiffrin, 1977; Shiffrin and Schneider, 1977).
This distinction is now treated as a core element in two-process theories of action control such as Norman and Shallice's Supervisory Attentional System (Norman and Shallice, 1986). In visual perception it is now generally accepted that there are two quite separate cortical pathways involved in processing sensory signals: the ventral route, terminating in the temporal lobe, is believed to process the kind of 'what' information originating from the fovea of the eye and carries out detailed featural analysis of static objects. The dorsal route terminates in the parietal lobe and is thought to process 'where' information originating from peripheral vision, and to be concerned with controlling movement and directing eye movements (Mishkin, Ungerleider and Macko, 1983; Baizer, Ungerleider and Desimone, 1991; Boussaoud, di Pellegrino and Wise, 1996). In memory, it is now common wisdom that a conscious declarative memory system underlies performance in recognition and recall whilst a quite separate non-declarative or procedural system is responsible for implicit memory phenomena in priming and associative tasks (Squire, 1992). In learning, there is widespread acceptance that distinct learning systems encode very different sorts of information, one system inducing rules while a second system memorises instances (Shanks and St. John, 1993). Shanks and St. John suggest that the system which memorises instances is based on connectionist principles, and the learning exhibited is responsible for the phenomenon known as implicit learning (Cleeremans, 1993), which is less likely to be available for verbal report. In contrast [they claim] connectionist models do not fit well with our understanding of the explicit hypothesis testing also found in the grammar learning literature. The separate operation of the two systems subserving language is perhaps most readily evident to human agents, who have full awareness of the propositional content and semantic aspects of speech but are entirely unconscious of grammatical operations and the processes underlying speech perception and production. Broca had begun to assemble evidence for the neurological independence of these two subsystems shortly after the middle of the nineteenth century (Broca, 1861).

It is unlikely to be a coincidence that within each of the major sub-domains of cognition (action, visual perception, memory, learning and language) there seem to be two distinct systems operating in parallel. In each case, one system requires attention at input, content is explicit and directly available to conscious report processes, and real-time responses are generally fairly slow. The other system operates with unattended or incidentally processed material, is implicit, unavailable to consciousness and generally supports fast non-verbal responses.
It does not seem to involve straying much beyond the available evidence to suggest that this duplex architecture is a general structural feature of human cognition, and that in every case the characteristics of the unconscious, non-symbolic member of the pair seem to resemble those associated with connectionist systems, whilst the conscious and explicit processes correspond more closely with what might be expected in systems conforming to the assumptions of the SSSP. Smolensky (1988), among others, has also proposed that human cognition is likely to be subserved by connectionist systems whose operations are integrated with symbol processing mechanisms.

The evidence seems to strongly indicate that a type of cognitive processing exists that is non-symbolic, largely or entirely implicit, and possibly mediated by connectionist mechanisms. This sub-symbolic system seems to support cognitive operations in many tasks when they are carried out by very young children. In adults sub-symbolic processing seems to occur when attention is not paid, or is not available (Mack and Rock, 1998). Could sub-symbolic processing of this type yield relevance information? There is every reason to believe that it could. A connectionist system seeks to maximise the probability of certain classes of feedback by varying the probability of outputs as a function of input. In effect, a connectionist system seeks to converge upon a set of input/output contingencies that do not require it to make any further adjustments to its internal parameters. Without knowing anything about the internal processes of such a device, and merely by treating it as a black box, a second system monitoring the behaviour of a connectionist learning system could infer relevance in the following way:

An input I is relevant to a goal G when I causes some variation in output which is associated with a change in the value of feedback.

Roughly, this principle says that if a second system is monitoring a connectionist system during learning, the monitoring system can infer that an input is relevant to the goal state sought by the connectionist system when the output of the latter changes as a result of receiving that input. The validity of the inference arises because the change in output must be an effect of feedback being used to modify internal parameters, and hence of progress towards the goal. Johnson (1997) has demonstrated, by building a working AI model, that using this associative relevance principle, a higher-order symbolic planning system can successfully capture and use relevance information.

It is important, and not without interest, to note that the goal of a connectionist system is usually not represented within the system itself, as it would have to be within an SSSP problem-solver. For example, consider a connectionist system designed to analyse satellite data on CO2 emissions from earth-surface industrial installations.
The goal of the system might be to generate one output when emissions exceed an acceptability threshold, and another when they do not. The problem is complicated by the fact that atmospheric variation interacts with source variation, and the system only has access to data collected from beyond the atmosphere. In training the system, the standard against which output accuracy will be judged is independently measured emission levels at the earth's surface. The design objective (one kind of goal) is to identify as unacceptable all and only those installations that exceed threshold values at ground level. This abstract specification of the system goal will be found nowhere within the system itself. Even the operationalised proxy for the designer goal used in training, a list of acceptable and unacceptable installations in terms of surface measurements, cannot be an essential part of the system, as the system is intended to correctly classify installations for which the true value is unknown.

For connectionist problem-solvers then, the system goal can, in general, only be inferred from the behaviour of the system. When such problem-solvers are artefacts designed by humans, the system developer can tell us what the design objective was, what the training criterion was and so forth, but even then the criterion might have been ill-chosen (and so fail to match the design objective), or the system may not have been trained on all contingencies falling under the criterion (so that there are behaviour/criterion mismatches in some situations). In either eventuality, the result will be that in some circumstances the observed goal (implicit in system behaviour) will differ from the intended goal (as described by the system originator). In the case of organic connectionist systems that have developed via evolution rather than an intentional design process, there are no design objectives to consult. Extraction of the system goal will require an analysis of those interactions between the structure and behaviour of the organism within which the system is embedded and the environmental constraints under which it behaves that affect the probability of survival and reproduction. Fortunately, the practical unavailability of system goals need not impede inferences about relevance. If, following a particular input-output pairing, a feedback-driven modification of system parameters occurs, then the input triggering the change must be relevant to the system goal, whatever that goal may be.

This may seem a long way from the more familiar controlled versus automatic processing distinction which has now become a familiar landmark in the skills literature. The claim is that controlled processing is slow, serial, conscious, error-prone and characterises the early stages of skill acquisition. Automatic processing is fast, parallel, accurate, unconscious, and occurs when a skill is thoroughly mastered.
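The black-box inference described above, in which a monitoring system treats an input as relevant once it sees that input's output and feedback both vary across presentations, can be illustrated with a deliberately small Python sketch. The perceptron learner, the OR task, and the `RelevanceMonitor` class are all assumptions chosen for brevity; this is not Johnson's (1997) model, and the way "variation" is operationalised here is only one possible reading of the principle.

```python
class Perceptron:
    """Stand-in connectionist learner; the monitor never reads its weights."""

    def __init__(self, n_inputs):
        self.w = [0.0] * n_inputs
        self.b = 0.0

    def respond(self, x):
        total = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if total > 0 else 0

    def learn(self, x, feedback):
        # Perceptron rule: parameters change only when feedback is non-zero.
        self.w = [wi + feedback * xi for wi, xi in zip(self.w, x)]
        self.b += feedback


class RelevanceMonitor:
    """Second system: sees only (input, output, feedback) triples, never the goal."""

    def __init__(self):
        self.last_seen = {}    # input -> (output, feedback) at its previous presentation
        self.relevant = set()  # inputs inferred to be relevant to the learner's goal

    def observe(self, x, output, feedback):
        previous = self.last_seen.get(x)
        if previous is not None:
            prev_output, prev_feedback = previous
            # Output varied for this input, together with a change in feedback.
            if output != prev_output and feedback != prev_feedback:
                self.relevant.add(x)
        self.last_seen[x] = (output, feedback)


def train(task, epochs=5):
    learner, monitor = Perceptron(2), RelevanceMonitor()
    for _ in range(epochs):
        for x, target in task:
            output = learner.respond(x)
            feedback = target - output  # the training criterion lives out here
            monitor.observe(x, output, feedback)
            learner.learn(x, feedback)
    return learner, monitor


# Hypothetical task: logical OR over two binary inputs.
OR_TASK = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
learner, monitor = train(OR_TASK)
```

After training, the monitor has flagged (0, 0), (0, 1) and (1, 0) but not (1, 1), because the learner's response to (1, 1) never varied. The sketch therefore supports positive inferences only: an unflagged input is not thereby shown to be irrelevant.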
However, if there are two systems, there seems no good reason why the parallel version should be inoperative during the early stages of learning, and if both systems operate throughout the course of learning, the high error rate during the early stages may as well result from poorer quality decisions by the parallel system as from the serial system. Let us propose an alternative scenario. Two learning systems operate in tandem throughout the course of learning. The symbolic planning system is serial, learns by hypothesis testing, which is usually all-or-none, executes relatively slowly, and depends upon relevance information from its connectionist fellow. The connectionist system is parallel, learns slowly and usually incrementally, but executes quickly. The time course of learning could be considerably compressed if relevance information from the connectionist system was used by the symbolic planning system to construct an effective plan, which in turn was fed back to the connectionist system, allowing more rapid convergence on a set of network weights which approximated to optimum performance. This architecture may also seem more satisfactory from a functional perspective than a dual component system in which first one, and then the other component does nothing, and the relationship between the two is largely unknown. The proposed architecture has a further advantage: a puzzling feature of relevance is that it seems to have both a subjective and an objective characterisation; this can readily be accounted for by assuming that, while a connectionist learning system seeks to exploit objective relevance information signalled by changing relationships between input, output, and feedback, symbolic plans are always hypothetically related to the external world and are thus subjective in the sense that they may incorrectly represent relationships underlying the regularities they seek to capture.

What we have attempted to establish to date is that it is possible that relevance information could become available to a symbolic planning system from a connectionist learning system which operates in parallel with it, and that this suggestion is compatible with views of human information processing which are widespread in the current literature. Our argument goes beyond this, however: we want to suggest that relevance is an essential theoretical construct which underpins all symbolic planning processes. An analogy might be made with familiarity in the domain of memory. The case for the utility of familiarity as a construct depends in part upon the demonstration that it has explanatory value, but the case for arguing that familiarity is a real dimension of information processing in memory is bolstered by evidence that the familiarity assignment mechanism can malfunction, for example in cases of déjà vu, or in the moments immediately preceding an epileptic seizure (Penfield and Roberts, 1959).
Is there any evidence that the relevance assignment process can function inappropriately in a similar way? Part of our answer is that inappropriate relevance assignment is a common human experience which is often responsible for problem-solving failure. (See further Lindsay, Gorayska and Cox, 1994.) A second part is that inappropriate relevance assignment is a common feature of cognitive disorder in conditions such as schizophrenia: for example, Meehl (1962) has discussed cases of what he calls "cognitive slippage" which exhibit precisely the features to be expected as a result of dysfunctional assignment of relevance. Similarly, Cutting (1985) cites a wealth of cases of delusional thinking such as the following:
A young man felt that the whole of London was uncertain and strange. He could only be sure of the date if he bought a newspaper that had been printed

Roger Lindsay and Barbara Gorayska

outside London, and only sure of the year if he visited a well known beauty spot to the south of London where there was a fallen tree whose age he knew by the number of rings on the trunk. (Cutting 1985, p. 319)

It seems hard to deny that cases such as this involve a failure to make appropriate relevance judgements. It is not that a pathology of relevance does not exist; rather, disorders of relevance processing have been attributed to other causes. Together with the demonstrations described elsewhere (Gorayska et al., 1992; Tse, 1994; Ip, Gorayska and Lok, 1994; Gorayska, Tse and Kwok, 1997; Zhang, 1993; Zhang et al., 1993; Johnson, 1997) that relevance can be used to support learning and problem-solving in AI systems, it would seem that the considerations reviewed here establish at least a prima facie case for regarding relevance as a neglected, but important, component of symbolic planning processes.

The effect of irrelevant information on problem solving received considerable attention from researchers in the 1930s. Woodworth and Schlosberg (1954) conclude their classic text on experimental Psychology with a discussion of the "atmosphere effect" which includes a review of some of this research. It is reported that in studies of syllogistic reasoning Woodworth and Sells (1935) hypothesised that the global impression or "atmosphere" of the premises was an important factor in erroneous reasoning (Woodworth and Schlosberg 1954, p. 846). Woodworth and Schlosberg also note that the atmosphere effect is not confined to syllogisms. In speaking or writing you are likely to make the verb agree with the singular or plural atmosphere of the subject phrase instead of with the grammatical subject, as in the examples:
The laboratory equipment in these situations were in many instances essentially the same as those used before. Is trial and error blind or not? (Woodworth and Schlosberg 1954, p. 847)

Whilst the phrase "atmosphere effect" is a useful and evocative label, it does nothing to explain the phenomena that Woodworth and Schlosberg discuss. An explanation may however lie in inappropriate relevance attribution, which in turn induces subjects to set up problem spaces incorrectly. This phenomenon not only arises spontaneously in syllogistic reasoning and syntactic concordance, but is sometimes engineered by conjurers and experts in legerdemain, as well as by school children in constructing riddles. A well-known example of the latter category is the riddle which proceeds by reminding a dupe that there is an English word meaning a humorous saying or story, which is spelt J-O-K-E and is pronounced "joke", and that there is also an English word meaning "of, or to do with, the common people", which is spelt F-O-L-K and pronounced "folk". The
victim is then asked how to pronounce the word for the white of an egg. Few give the correct answer: A-L-B-U-M-E-N. The initial information concerning spelling is in fact irrelevant, but serves to induce the respondent to believe that the required answer rhymes with "joke" and "folk". Victims of this riddle presumably establish a problem space which contains some plan such as: "find a word which refers to part of an egg and which rhymes with joke. Pronounce the word which is found." For the riddle to work, respondents must also fail to fully process and check all of the information in the riddle question. This failure to exhaustively check information has also been reported in the case of semantic illusion sentences such as: "How many animals of each type did Moses take into the ark?" Respondents commonly reply "two" to this question, even after repeating the question aloud, and when fully aware that it was Noah, not Moses, who was the protagonist in the biblical flood story (Erickson and Mattson, 1981; Reder and Kusbit, 1991).

The egg-white example uses redundant preliminary information to establish a fallacious presumption of relevance. There is another type of riddle sentence which, unlike the Moses illusion but like the egg-white trick, uses irrelevant supplementary information to induce a false answer to a perfectly straightforward question. The paradigm form of the riddle sentence is illustrated by the following example: "I am going to name a colour and ask you a question. Answer the question as quickly as you can: White. What do cows drink?" Most subjects who have not encountered the problem before erroneously respond "milk" to this query, even though they are fully aware that cows drink water. It is easy to demonstrate that the incorrect "milk" response can equally well be induced by presenting a sheet of white paper immediately before the query, instead of using the word "white".
The colour-query problem provides a useful way of demonstrating the difficulty that people can have in correctly assigning relevance to materials associated with a problem. In the cases cited by Woodworth and Schlosberg (1954), it is easy to dismiss the atmosphere effect as a relatively trivial failure to process local syntactic cues concerned with number or negative/affirmative information. The fact that non-linguistic visual input can also misdirect the verbal question-answering process seems to support the much more general theoretical claim that the locus of the atmosphere effect is a modality-neutral problem space in which symbolic planning
processes operate over elements rightly or wrongly classified as relevant to some goal. Colour-query sentences are not intrinsically deceptive, but when they are preceded by a colour cue, an incorrect answer to the query is primed by the cue and is in many cases articulated instead of the correct answer. We suggest that this occurs because the colour cue is wrongly identified as relevant and, as a result, is included within the problem space. At the least, the colour-query illusion demonstrates that errors in cognitive processing can occur because of failures in relevance processing: if such errors can be triggered by playground riddles, it seems likely that the same phenomenon occurs in less contrived situations. The fact that cognitive processing is also affected when the query sentence is linguistic but the misleading cue is not may be interpreted as at least weakly supporting the suggestion that the locus of the effect is a modality-independent problem space established to support the symbolic planning of responses.

3. The origin and function of goals

Goals are symbolic representations of states of the world, or of a planning system itself, which are the target of planning processes. Planning processes are attempts to sequence symbolic representations of actions and objects in a manner that allows a goal to be achieved. Planning processes are applied to models of the world and goals, to produce goal-plan pairings which are believed to be sufficient to shift the world from its current state to the target state when the plan is implemented. Goals are always and necessarily abstract and symbolic, though they usually stand for, or represent, states that are not. Goals arise from two different sources:

a. Cognitive Goals

Most goals are part of complex goal chains, and can perhaps more properly be classified as sub-goals. A goal is cognitive if its achievement contributes to the construction or execution of a higher order plan. Any fully-specified goal must be associated with goal satisfaction conditions (GSCs). GSCs are conditions which an agent believes that the world will satisfy when it is in the goal state. For example, in a chess game the goal is to win; the GSCs are the conditions for believing that checkmate has been achieved. The GSCs for a cognitive goal are derived from the requirements of the higher order plan to which its achievement would contribute. Its justification (the answer to a question such as: "Why are you doing x?") is entirely in terms of the grounds for believing that the contribution it makes to a higher order plan is essential or efficient, and that the
higher order plan will be effective in achieving the higher order goal with which it is associated.

b. Terminal Goals

The top goal of a complex goal chain does not contribute to a higher order plan. It is therefore non-cognitive. We call a non-cognitive top-goal of this sort a terminal goal. The justification of a terminal goal is exclusively in terms of the desirability of the state brought into existence by the achievement of that goal. All cognitive goals derive their justification ultimately from the terminal goal at the head of the chain of which they form a part.

How are GSCs for terminal goals specified? The specification cannot derive from the requirements of a higher order plan, because no higher order plan exists. The question "where do terminal goals come from?" is equivalent in the human case to the question: "how does cognition interface with motivation?" It has been claimed already that goals are symbolic representations. The relationship between any symbolic expression and the world is hypothetical; that is to say, the symbolic expression is one of a set of possible models of the world. A terminal goal must therefore be the consequence of a hypothesis concerning the relation between some possible state of the world that does not currently exist and the motivational system of the cogniser. How could a system that generates such hypotheses develop, and what would its function be?

It seems likely that motivation can control behaviour in the absence of intermediary symbolic processes. Human neonates vary their behaviour as a function of motivational states, as do non-human organisms. In many cases this occurs when there is no evidence of a capacity to manipulate symbols. The lack of a capacity for symbolic representation is not, however, demonstrable, as it requires the proof of a negative existential proposition.
For present purposes, the absence of a developed capacity for symbolic representation in most nonhumans and in human newborns will be assumed on grounds of parsimony. Motivation can be conceptualised as the result of a set of subsystems which produce an output in proportion to an increase or a reduction in the value of some system variable, such as the concentration of substances in the bloodstream, or tension in a sphincter muscle. Let us assume that motivational changes of this kind are without effect until they reach some threshold value. A system which is controlled by motivation without cognitive mediation will behave in one of two ways: either the effect of the output of a motivational subsystem reaching threshold will be to nonspecifically increase the intensity and variability of behaviour, or, when some "fixed action-pattern" (Tinbergen, 1953) is available, this will be executed as soon as the appropriate "releasers" (ibid.)
are detected. The problems and limitations associated with such systems are obvious and severe: survival depends upon evolution having anticipated all important needs of the organism and provided appropriate releasers when and where they are required; energy will be wasted on inappropriate behaviour in contexts where consummation is not possible; motivational effects will interfere with one another; prioritisation and deferred gratification are impossible. No doubt all manner of inhibitory links between motivational subsystems could be developed, but the effect of this will be to make increasingly complex bets about the ecological context of behaviour. For example, an organism which suppresses motivational impulses to mate in favour of vigorous action until food is ingested might fail both to eat and to reproduce in difficult times.

In the absence of symbolic control, the somewhat bleak picture presented above describes a human being as well as it does any other organism. If human neonates lack the capacity for symbolic control of behaviour, then the features described might be expected to characterise their behaviour. Indeed it does seem to be true that some motivational subsystems in newborns operate to switch on fixed action patterns such as those involved in excretion, whereas others increase the variability and intensity of behaviour until a carer intervenes with a diagnosis of the motivational source of the problem, and the offer of whatever is necessary to alleviate it. The point of importance here is that the organism need not necessarily know (indeed in the absence of a symbolic capacity, cannot know, as knowledge is propositional, and therefore symbolic) either what is the motivational cause of its extreme behaviour, or what is required to prevent the cause from operating.
The human carer for a newborn acts as an external cognitive support system, proposing hypotheses about underlying motivational states and testing them by changing the infant's state in various ways. When an organism does not have a carer, it must simply allow itself to be carried along on the current of motivationally driven behaviour until the appropriate context for consummation is encountered. Evolution must ensure a close match between motivation, behaviour, and what exists in the environment to be encountered, if the organism is to survive.

If an organism is capable of symbolic representation, then it can learn to internalise the operations carried out by a carer in the case of a human infant. An orphan creature surviving alone could similarly relate a symbolic representation of its motivational condition to a representation of the state of the world that has previously changed that condition in a desirable way. In both cases the same sequence of events must occur:

a. Symbolic representation of the motivational condition
b. Symbolic representation of a world state in which condition (a) does not exist
c. Derivation from (a) and (b) of a criterion by which the achievement of the appropriate world state can be identified
d. Symbolic representation of an action sequence which can cause the world state in (b) to exist.

The predisposing motivational context provides the activation conditions for the goal: some aspect of these activation conditions is taken as criterial for triggering the whole goal structure, just as some satisfaction condition is taken as criterial for achievement of the goal. The criterial activation condition might be a feeling, a set of sensations, or a type of motor activity. The symbolic representation of actions capable of causing the world to enter the state which changes the motivational condition is a plan, and (a)–(d) above, taken together, define a goal-plan pair.

In the AI and problem-solving literature, all of the objects, or "givens", which are required to solve a planning problem are usually available. For instance, appropriate pieces and a chessboard are provided when a planner is required to solve a chess problem. In real-world planning a problem-solver does not usually have all required objects to hand. In these circumstances planning processes must specify what objects or material are required to achieve a particular goal, as well as the operations to be performed. It is undesirable for plan implementation to proceed until all of the objects which the plan requires are available. A plan will therefore be associated with implementation conditions; a plan will not normally be executed until these implementation conditions are satisfied.

Goals are the cognitive representations of self-diagnosed needs and wants, together with prescriptions for the states of the world which it is believed will satisfy them.
The diagnosis of their own motivational state which individuals arrive at may well be wrong, and it is always possible that a third person who is more experienced or insightful can provide a more accurate diagnosis. It seems likely that in the case of human beings, diagnosis of motivation is never perfect or complete, though no doubt it continues to improve throughout life. Failure to correctly detect and represent a source of motivation will result in behaviour which does not correspond to any cognitive goal, and which is not under cognitive control. Behaviour of this kind is often said to be the result of "unconscious motivation".

There is an important distinction to be made between positive and negative goals. The distinction is important because the two types of goal can have quite
different relationships to planning processes. It is hypothesised that wants and needs are unavailable to consciousness unless and until they are symbolically represented as goals. Planning processes are intrinsically and necessarily cognitive and symbolic: plans represent sequences of operations which have not yet occurred, and which, if implemented, will result in states of the world which do not yet exist. The point of symbolically representing needs, wants, threats, and aversions as goals is to enable planning processes to be brought to bear upon them. Positive goals represent states of the world that a plan is intended to achieve. Negative goals represent constraints on plans intended to achieve positive goals. The children's game Snakes and Ladders provides a helpful analogy: planning seeks to achieve the positive goal "ladders", while avoiding the negative goal "snakes".

The constraints imposed upon planning processes by the necessity to avoid negative goals do not necessarily make planning processes more complex or difficult. One of several major problems for planning systems is the combinatorial explosion problem. This problem exists because the number of plans to be generated and evaluated rises exponentially with the number of plan elements. With only a small number of elements, the number of possible plans can easily exceed the processing capacity of any realistic system. Negative goals prune planning trees, that is, they reduce the number of plans which need to be considered. In this way, planning may actually be made easier by the existence of negative goals. It is unlikely that negative goals alone suffice to eliminate the combinatorial explosion problem. A similar device can, however, be used to make any planning problem tractable: this is the introduction of further quite arbitrary constraints on planning, which have no other function but to limit the class of possible plans.
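The pruning argument can be made concrete with a toy sketch. It rests on assumptions that are not in the text: a plan is taken to be an ordering of plan elements (so the unpruned count grows factorially, which is at least as fast as the exponential growth described), and a negative goal is modelled as a forbidden pair of consecutive steps. The function name `count_plans` and the parameter `forbidden` are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple


def count_plans(
    elements: Iterable[str],
    forbidden: FrozenSet[Tuple[str, str]] = frozenset(),
) -> Tuple[int, int]:
    """Return (complete plans, partial plans explored).

    A plan is an ordering of all elements (an illustrative assumption).
    A pair (x, y) in `forbidden` stands in for a negative goal: no plan
    may perform y immediately after x.
    """
    complete = 0
    explored = 0

    def extend(prefix: Tuple[str, ...], remaining: FrozenSet[str]) -> None:
        nonlocal complete, explored
        explored += 1
        if not remaining:
            complete += 1
            return
        for step in sorted(remaining):
            # Prune: abandon this branch as soon as it violates a negative goal.
            if prefix and (prefix[-1], step) in forbidden:
                continue
            extend(prefix + (step,), remaining - {step})

    extend((), frozenset(elements))
    return complete, explored


# Unconstrained planning versus planning under one negative goal.
unpruned = count_plans("abcd")
pruned = count_plans("abcd", frozenset({("a", "b")}))
```

In this sketch a violating branch is cut before any of its continuations are generated, so the constraint reduces the planner's work rather than adding to it.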
It seems at least possible that the need for such arbitrary constraints is the functional explanation for the existence of subjective preferences in human beings. By definition, subjective preferences have no other functional justification, and it is hard to understand how a propensity for subjective preferences could evolve without some functional value. If some preferences need to exist to enable planning to occur, but their value can be assigned quite arbitrarily, then it becomes comprehensible that biology should dictate that people have preferences, but individuals should be free to decide what they are. If this suggestion is correct, it would seem to follow that the existence of subjective preferences in a species is an indication that the species is capable of symbolic planning.

This analysis of terminal goals and their relation to motivation implies that there are two distinct, and possibly even competing, control systems for human behaviour. The initial control system (ICS) is non-symbolic, unconscious, and
inefficient in terms of minimising the energy expended to satisfy a given want or need. The goal management system (GMS) is symbolic, conscious, and capable of achieving high levels of efficiency. The problem for evolution is how to pass control from ICS to GMS when the symbolic representations required by GMS cannot be specified in advance of experience. We suggest that, in effect, a neonate is equipped with an on-board computer, which seeks to symbolically represent motivational conditions, states of the world that will modify them, and plans that will produce those states of the world. The symbolic planning system is a second-order system creating symbolic representations which capture regularities in the behaviour of the first-order system, which operates under the autonomous control of the ICS. As appropriate symbolic representations are established, so the occurrence of motivational activation conditions for goals already represented will result in control passing to the on-board computer, which can utilise symbolic information in memory to modify those motivational conditions more efficiently.

It is hypothesised that consciousness is intimately tied to planning processes, but that, once formulated, plans may be executed automatically: the activation conditions for a goal can initiate an action sequence directly without any need for further conscious planning processes to occur. There is a clear resemblance here to distinctions such as those between automatic and controlled processing (Shiffrin and Schneider, 1977) and supervised versus schema-controlled action (Norman and Shallice, 1986). Conscious symbolic planning thus has a managerial role in the regulation of behaviour: identifying problems, generating solutions, and delegating the implementation of effective plans to lower level agencies whenever possible. The set of goals identified and available at a particular point in the developmental history of the system constitutes the system's model of its own motivations.
There is no reason why this model should ever be complete or accurate. Consequently individuals might be expected to misidentify some sources of motivation, and to fail to identify others; the result of these failures will be the frustration, conflict, and control of behaviour by forces of which they are unaware, which we observe in other people all the time. Oddly, there seems to be some resemblance between the processes assumed in this theoretical analysis and those of Freudian Psychoanalysis. For example, transfer of control from ICS to GMS has some features in common with the Freudian idea that "where Id is, there Ego shall be"; and the possibility of continued control by ICS where motivated behaviour has not been symbolically captured by GMS might be expected to result in behaviour the origins of which are unavailable to consciousness.
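The account of goal-plan pairs and of control passing from ICS to GMS can be summarised as a data structure plus a dispatch rule. What follows is a minimal sketch under stated assumptions, not a model proposed in the text: motivational levels are represented as numbers in a dictionary, the threshold value is arbitrary, and the names `GoalPlanPair` and `select_control` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Drives = Dict[str, float]  # assumed: motivational variables scaled 0..1
World = Dict[str, bool]    # assumed: facts about available objects


@dataclass
class GoalPlanPair:
    name: str
    activation: Callable[[Drives], bool]   # criterial activation condition
    satisfied: Callable[[Drives], bool]    # goal satisfaction condition (GSC)
    implementable: Callable[[World], bool] # implementation condition
    plan: List[str]                        # symbolic action sequence


def select_control(
    drives: Drives,
    world: World,
    known: List[GoalPlanPair],
    threshold: float = 0.7,  # assumed value, for illustration only
) -> Tuple[str, List[str]]:
    """Decide which system controls behaviour for the current state."""
    for pair in known:
        if pair.activation(drives) and not pair.satisfied(drives):
            if pair.implementable(world):
                return "GMS", pair.plan
            # The plan is deferred until its implementation conditions hold.
            return "GMS", ["defer: implementation conditions not met"]
    # No symbolic representation captures the motivation: ICS takes over.
    if any(level >= threshold for level in drives.values()):
        return "ICS", ["increase intensity and variability of behaviour"]
    return "none", []


eat = GoalPlanPair(
    name="eat",
    activation=lambda d: d.get("hunger", 0.0) >= 0.7,
    satisfied=lambda d: d.get("hunger", 0.0) < 0.2,
    implementable=lambda w: w.get("food_available", False),
    plan=["locate food", "eat"],
)
```

With `eat` as the only represented goal, a state of high thirst falls through to the ICS branch, which corresponds to behaviour driven by a motivation the system has not yet symbolically captured.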

Some of the advantages resulting from the shift from ICS to GMS and establishing cognitive goals are summarised in Figure 1 below. Evidence for the employment of goal-driven categorisation and an extended discussion of its utility is provided by Barsalou (1991). The importance of cognitive goals in controlling dynamic aspects of behaviour (Figure 1, point 8) is discussed, for example, by Carver and Scheier (1982) and Hyland (1987, 1988). Oatley and Jenkins (1992) have also explicitly linked cognitive goals to emotion: "Emotions have a biological basis. The most plausible current hypothesis is that they function (a) within individuals in the control of goal priorities, and (b) between people to communicate intentions and set outline structures for interaction" (Oatley and Jenkins, 1992, p. 78).

In the account we offered earlier of the origin of relevance information, we proposed that a symbolic planning system runs in tandem with a connectionist learning system and establishes relevance relationships by seeking systematic relationships between input to the connectionist system, and variations in output and feedback. In our account of goals, we have proposed that a symbolic planning system establishes symbolically represented goals by seeking systematic relationships between motivational conditions and those states of the world that change them. We have noted already (see pp. 68–78 above) that this theoretical claim is consistent with evidence that in many cognitive domains two distinct processing systems exist. A clear example comes from work on the "blindsight" phenomenon, whereby visual information can apparently continue to guide forced-choice motor responses, even though, as a result of cortical damage, such visual information is not available to verbal report or to guide planning processes (Weiskrantz, 1988).

Benefits of Cognitive Goals

1. Generation and testing of symbolically represented hypotheses
2. Goal management, including use of goals as plan elements, and goal prioritisation
3. Plan optimisation
4. Voluntary control of behaviour
5. Relation of current goals to symbolically represented material in memory
6. External control of behaviour via symbolic input
7. Enabling goal-driven categorisation
8. Provision of reference criteria for dynamic aspects of behaviour

Figure 1. A summary of some of the benefits conferred by establishing cognitive goals.

4. A goal management system

On grounds both of parsimony and theoretical coherence, it would seem reasonable to suggest that there is a single goal management system which utilises (and may lose contact with) perceptual information, motivational information, and relevance information as part of an integrated process of establishing goals, diagnosing when it is appropriate to seek them, and formulating plans for their achievement. There is a good deal of plausibility in the general notion that the symbolic representational system which mediates understanding of the world is unified and second-order. It is not essential for motivation, perception, or action (how could it be, without symbols having innate meaning?), for newborns can be motivated, can perceive, and can initiate action. The symbolic planning system can, however, offer the possibility of rational planning and systematic management. It can hugely enhance the efficiency of learning by making available hypothesis testing as an additional problem-solving procedure. But most importantly, it can create a whole new world of possibilities through communication and cooperation.

The existence of a goal management system implies the necessity for planning processes that use goal representations as their components. This in turn implies the existence of metagoals and heuristics for establishing and manipulating goals. Whilst we do not intend to discuss these processes in detail in the present paper, some examples of the kind of heuristics we have in mind are presented in Figure 2 below. Many of these heuristics involve plans that incorporate other agents. Goal management inevitably forms a bridge between individual cognition and social processes. We next turn to ethical reasoning, an aspect of cognition with respect to which social behaviour is brought centre stage.
Though there have been some instructive explorations of the development of ethical values (Kohlberg, 1981), Cognitive Psychology and Cognitive Science have almost entirely neglected ethical problems and ethical decision-making. This neglect is unfortunate, as it is difficult to see how the cognitive regulation of human action, understanding of the actions of others, and comprehension of action-related discourse can dispense with ethical notions. Ethical language and concepts constitute such a large proportion of human discourse that it is almost inconceivable that such terms and concepts are devoid of cognitive significance.

Metagoals and their Implications

1. Bring as much behaviour as possible under cognitive control: identify and symbolically represent as many needs and wants as possible.
2. Plan so as to maximise the value of positive goals achieved. Three factors will contribute to decisions here: the value of each goal in the set about which a decision is to be made; the number of goals which can be attempted simultaneously or during the interval over which the decision process ranges; the expectancy that goals will be achieved if an attempt is made to achieve them.
3. Plan so as to minimise the number of negative goals experienced. Negative goals will include the experience of failure. This metagoal will therefore entail decisions concerning the competence of the system itself, the probable effectiveness of plan proposals, and the level of difficulty of tasks to be attempted. A system's estimation of its own competence is hypothesised to be equivalent to the construct labelled "self-esteem" in studies of human judgement and performance.
4. Monitor achievement and switch goals if progress is unsatisfactory. This metagoal can be related to emotion via the role of control theory in goal-setting (Hyland, 1988).
5. Plan to optimise parameters such as time and cost (effort, and derivatives such as money, materials, etc.).
6. Store frequently used plans.
7. Optimise planning and representation processes. This metagoal will require the relationship between plans and states of the world to be represented in memory as simply as possible.
8. Enlist the cooperation of other agents whenever this is advantageous. Other agents may assist not only in developing and executing plans, but also in, for example, developing symbolic representations for the efficient use of memory. 7 and 8 together suggest that science may be a socially externalised institution for the discovery of efficient data compression algorithms. This would explain why parsimony features as such an important constraint on scientific theorising; something which has hitherto proved resistant to explanation.
9. Increase the capability of the system for planning and goal representation. GMS must have learning goals, as well as performance goals.
10. Learn from other agents under appropriate conditions. The potential gains from utilising the experience of others are clear. There are however many important questions to be asked about the circumstances under which a cognitive system can reasonably decide to modify its permanent memory on the authority of another. Clearly such attributes of the authority as credibility, consistency, honesty, integrity, character, etc., are relevant. In our own society, free rein is usually given to professional teachers to modify and shape the cognitive system of children. Quality control of the modification process is handed over to external agencies.
11. Try to maximise the probability that other agents can be relied upon. This will entail, minimally, choosing friends and teachers carefully, but less directly, advocacy of honesty, discouraging deception, insistence that promises be kept, etc. 10 and 11 together make a case for the pragmatic utility of ethics in the management of cognitive processes. This case is further explored below.
12. Teach other agents if this might enhance their power to cooperate or to assist others to do so.

Figure 2. Examples of metagoals and heuristics used in the cognitive management of goals.
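Metagoal 2 names three factors: the value of each goal, the number of goals that can be attempted, and the expectancy of success. One simple way these might be combined is sketched below. The combination rule (value multiplied by expectancy, then the best `capacity` candidates chosen) is an assumption made for illustration; the figure lists the factors but does not say how they interact. The names `Candidate` and `choose_goals` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    value: float       # assumed: numeric worth of achieving the goal
    expectancy: float  # assumed: estimated probability of success, 0..1

    @property
    def expected_value(self) -> float:
        # Assumed combination rule: value weighted by expectancy.
        return self.value * self.expectancy


def choose_goals(candidates: List[Candidate], capacity: int) -> List[Candidate]:
    """Pick the goals to attempt in one decision interval.

    `capacity` is the number of goals that can be attempted in the
    interval; the highest expected-value candidates are selected first.
    """
    ranked = sorted(candidates, key=lambda c: c.expected_value, reverse=True)
    return ranked[: max(capacity, 0)]


candidates = [
    Candidate("ambitious", value=10.0, expectancy=0.2),
    Candidate("safe", value=4.0, expectancy=0.9),
    Candidate("moderate", value=6.0, expectancy=0.5),
]
chosen = choose_goals(candidates, capacity=2)
```

Under these assumptions the highly valued but unlikely goal is passed over in favour of two goals with better prospects, which also bears on metagoal 3, since attempting fewer improbable goals reduces the experience of failure.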

5. Goal management and the cognitive function of ethics

The function of ethical language has been largely ignored, both by Linguistics and Cognitive Psychology. In a recent study by one of the authors (RL), in which undergraduates were tape-recorded while discussing the design for a laboratory practical, no less than 45% of sentences contained some element which could loosely be classified as ethical, such as "ought", "should", "good", etc. We do not wish to make a great deal of this, and accept that different samples of people on different days or with different topics might yield different data. But the finding does serve to underscore what is probably obvious: ethical language is a ubiquitous feature of the everyday cognitive environment.

Philosophy is probably much to blame for the lack of attention to the cognitive function of ethics. There is an apocryphal story of an Oxford graduate who was asked whether he had learned anything which had endured from his study of the philosophy of ethics. He replied that he had learned never to use the word "good", as it was clear to him that no-one knew what it meant. Ethics in general has become almost inextricably mired in metaphysics, and this has served to distract attention from its cognitive function. This is in spite of the fact that a number of relatively recent ethical philosophers (Urmson, 1950; Hare, 1952; Hampshire, 1960; Gauthier, 1963) have offered theories of ethical language which emphasise its role in making and influencing decisions, and in planning actions. It seems likely that the major reasons for the failure of Linguistics and Cognitive Psychology to take up and apply these theoretical analyses are: a bias towards individualism, and the absence of any satisfactory theoretical framework which can offer an account of how ethical principles can be related to cognitive mechanisms, and of how ethical language can be used by one person to influence the planning processes of another.
The theory outlined in this paper attempts to overcome individualist bias, and to explain how ethical language and cognitive mechanisms are interrelated. To begin quite generally, we believe, along with many other recent commentators, that syntactic and semantic approaches to language have proved incapable of providing any real insight into language mechanisms. A more promising approach is to seek to analyse language processes at the level of pragmatics. On this assumption an articulated sentence is an implemented plan intended to influence the behaviour of its audience. The locus of its intended effect is the symbolic planning processes of the recipient of the sentence. On the basis of arguments presented earlier, this means that a


Roger Lindsay and Barbara Gorayska

sentence may have its impact upon the recipient's goals, or plans to achieve a goal (holding objects and operations constant), or upon the objects and operations employed in planning. In the GMS framework, sentence comprehension is the process by which the hearers of a sentence recover the intended effect of that sentence upon their own planning processes; utterance comprehension is the process by which hearers seek to reconstruct the plan which led a speaker to utter a particular sentence. These may differ: a hearer may interpret the sentence "Fire!" to mean that the goal of vacating a building should be adopted, but the plan leading to its utterance may be to steal the hearer's briefcase. This approach has been growing in popularity over recent years, both in Linguistics (Levy, 1979) and in Artificial Intelligence (Bruce, 1975; Allen and Perrault, 1978; Cohen and Perrault, 1979; Appelt, 1985; Carberry, 1990). Its main attractions are that it moves beyond analyses of language which are little more than impoverished paraphrases in some logical formalism; that it embeds language processing in more general cognitive operations, and in action planning in particular; and that it offers some solution to the problem of indirect speech, which has proved quite intractable within syntactic and semantic approaches to language. To illustrate this last problem: it has not proved possible to find convincing syntactic or semantic grounds for treating "Buy me a drink", "My glass is empty again", "I wonder if the barman will cash a cheque?", and "Would you like another?" as linguistically equivalent. It is much less difficult to see how each could be uttered with a view to producing an equivalent effect on the planning processes of an unobservant companion. We have proposed elsewhere (Gorayska and Lindsay, 1989a, b; Lindsay, 1996b) that the processing of goals, plans, and plan elements is constrained by social factors as well as by the characteristics of the physical world.
Other social agents may impose or enjoin the adoption of particular goals, or the use of prescribed plans or plan elements to achieve a goal which has been freely selected. In our culture imperatives are used to signal an attempt to impose constraints upon symbolic planning processes, while ethical terms are used to enjoin an agent to adopt a goal, a plan, or a plan element. To give one or two examples: "you should try to get a job" is to enjoin a goal; "drive on the left" is an imposed plan; "you ought to take a bottle of wine to the party" is an enjoined plan element. A probable objection to this claim is that at best it can only apply to prudential and not to genuinely moral reasoning, the distinction which Kant (1953) sought to capture by contrasting hypothetical with categorical imperatives. Our answer has three components. Firstly, metaphysics has hampered understanding of ethical language, not helped it; there is no respectable independent justification for the notion of objective moral ends. Secondly, there is no convincing case for the claim that words like "ought" are ambiguous: "It is surely more plausible to argue that ought is not multiply ambiguous, that different equivalents correspond not to different uses of ought, but only to the differing grounds underlying the judgement" (Gauthier 1963, p. 21). Thirdly, the use of "ought" and related words for the purpose of enjoinment is best construed as a signal that the narrow interests of the agent are not the only interests at stake: the prudential concerns of social groups of which the agent is a member are also relevant, and these concerns may be presented as if they are objective constraints on an agent's choice of action.

There now exists a well-established AI technology concerned with problem-solving, decision-making and action planning, which is known as expert systems research. The basic premise upon which this research is founded is that intelligent decisions and actions require the application of logical operations to knowledge. Expert systems thus consist essentially of an inference engine which controls inference, and a knowledge base which contains appropriately represented knowledge. Knowledge consists of facts, rules, and heuristics, which are elicited from human experts. Though humans are indubitably more complex than current expert systems, these distinctions are often helpful in considering human cognition. In the present context, the question suggested by analogy with expert systems is whether ethical knowledge which constrains human action planning can be captured as facts, rules and heuristics. Our view is that not only can ethical knowledge be accommodated within this framework, but that some traditional ethical problems cease to be troublesome when so conceptualised.
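The division of an expert system into an inference engine and a knowledge base of facts, rules and heuristics can be sketched in a few lines of Python. This is only an illustrative toy, not a description of any particular expert system: the function names (`forward_chain`, `choose_plan`), the `forbidden:` naming convention, and the example facts, rules, plans and scores are all assumptions introduced here. Rules are treated as hard constraints that remove candidate plans, while heuristics merely rank the plans that remain.

```python
def forward_chain(facts, rules):
    """Toy inference engine: apply rules until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def choose_plan(plans, facts, rules, heuristics):
    """Drop plans containing a forbidden step, then rank the rest by heuristics."""
    derived = forward_chain(facts, rules)
    permitted = [
        plan for plan in plans
        if not any(f"forbidden:{step}" in derived for step in plan)
    ]
    if not permitted:
        return None
    # Heuristics only score plans; they give no guarantee of the best outcome.
    return max(permitted, key=lambda plan: sum(h(plan) for h in heuristics))


# Hypothetical knowledge base (illustrative content only).
facts = {"wallet belongs to another"}
rules = [(["wallet belongs to another"], "forbidden:take wallet")]  # "do not steal"
heuristics = [lambda plan: 1 if "keep promise" in plan else 0]      # "promises should be kept"

plans = [
    ("take wallet", "buy gift"),
    ("borrow money", "buy gift"),
    ("keep promise", "earn money", "buy gift"),
]

best = choose_plan(plans, facts, rules, heuristics)
print(best)
```

In this sketch the rule eliminates the first plan outright, and the heuristic prefers the third plan over the second without proving it is the better one.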
Propositions such as "x is good" are clear candidates for ethical facticity, closely resembling as they do such naturalistic facts as "grass is green". Similarly, "do not steal" has both the form and the substance of a rule. Perhaps the most interesting of the three categories, however, is that of ethical heuristics. Heuristics are decision-making principles which can profitably be employed when it is too costly to compute and evaluate every possible cognitive option. An important feature of a heuristic is that though in general it increases the probability of a positive outcome, there is no guarantee that this is so. For example, in chess playing, where the number of alternative moves rapidly exceeds exhaustive computability, such heuristics as "dominate the centre squares" and "protect your queen" are often used. No matter how diligently such heuristics are employed, their use will not ensure victory. Plausible candidates for heuristic status in the area of ethical knowledge are principles such as "killing is wrong" and "promises should be kept". Much philosophical energy has been expended on demonstrating that universal application of these latter principles results in ethical paradoxes when, for example, killing is morally justifiable, or promise-keeping results in manifest harm. These paradoxes cease to puzzle when the principles which give rise to them are interpreted as heuristics when invoked by an agent in action planning, and hence as bound to yield negative outcomes in some cases; and as enjoinment devices when invoked by a third party, which function to constrain the action planning of others, not to capture ethical absolutes in propositional form.

It is widely acknowledged that limitations on processing time and capacity make it inevitable that human beings use heuristics in planning and decision-making (Kahneman and Tversky, 1982; Kahneman, Slovic, and Tversky, 1982). Most studies to date which have presented empirical evidence demonstrating the effects of heuristic use have focused on the processes of memory search and retrieval. There is a strong case for arguing that effects of heuristic use might be expected to manifest themselves most strongly in the domain of action planning, particularly where agents must allow for interactions between their own behaviour and that of others. Certainly, limitations on computability will drastically restrict decision-making in this area. The problem of planning actions which affect, and are affected by, other agents is precisely the domain to which ethical principles apply, and it is therefore tempting to suggest that much of the sophisticated wrangling over the universality of ethical principles in which philosophers have engaged is best interpreted as dramatic evidence of the operation of cognitive heuristics. A compelling recent example is Baron's (1994) discussion of nonconsequentialism.
Baron argues that while rational agents should seek to maximise the extent to which they achieve their goals, there are classes of cases in which people demonstrably fail to do so. For example, in laboratory studies, subjects consistently award third-party compensation on the basis of the cause of an injury rather than the benefit of the compensation, they ignore deterrent effects in decisions about punishment, and they resist coercive reforms that they judge to be beneficial (Baron, 1994, p. 1). He infers that in such cases defective decision rules are being employed which should be identified and corrected. No-one would similarly be tempted to suggest that an expert system which produced suboptimal decisions in some cases was malfunctioning: this is exactly what would be expected of a heuristic-based decision-making system. Similarly, if consequentialist principles function as heuristics, then the reported cases of tolerated suboptimality are entirely to be expected.
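The point that a heuristic-based system is expected to be suboptimal in some cases, without thereby malfunctioning, can be made concrete with a small sketch. Everything in it is hypothetical: the three scenarios, the outcome values and the function names are invented purely for illustration and are not drawn from Baron's data. The heuristic applies the principle "promises should be kept" without evaluating outcomes, while the exhaustive chooser evaluates every option.

```python
# Each hypothetical scenario maps an available action to an invented outcome value.
scenarios = [
    {"keep promise": 5, "break promise": 1},
    {"keep promise": 4, "break promise": 2},
    {"keep promise": -3, "break promise": 2},  # here promise-keeping causes harm
]


def heuristic_choice(scenario):
    """Apply the heuristic directly; no outcome is computed or compared."""
    return "keep promise"


def exhaustive_choice(scenario):
    """Evaluate every option and pick the one with the best outcome."""
    return max(scenario, key=scenario.get)


# True where the cheap heuristic happens to coincide with exhaustive evaluation.
agreement = [heuristic_choice(s) == exhaustive_choice(s) for s in scenarios]
print(agreement)
```

On these invented values the heuristic matches exhaustive evaluation in the first two scenarios and diverges in the third, which is the pattern of occasional, tolerated suboptimality described above.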


If actions affect other agents, and actions are partially determined by knowledge, then the mechanisms of enjoinment and imposition might be expected to apply even to knowledge that is not intimately connected with action planning, as ethical knowledge obviously is. We believe that these processes do operate on knowledge which is relatively naturalistic. To give an example: in Western society, university science is one of the main institutionalised vehicles of knowledge manufacture. As the process of knowledge refinement is endless, those involved in the refinement process are aware that knowledge is always provisional and revisable. In teaching and disseminating the results, however, belief in currently accepted theory is enjoined. It is that which students and the public must believe and act upon, if their actions are to have the best chance of success. The consequence is the frequently noted paradoxical tendency of apologists for science to claim both that current theory is demonstrated truth, and that current theory will inevitably be modified and revised in the future.

There are two implications of the claim that the mechanisms of imposition and enjoinment are the central devices by which a group seeks to generate consistency and coherence in the beliefs and actions of its members. One is that imposition and enjoinment are important determinants of between-group cognitive differences. The other is that what is enjoined and imposed within a group is crucial to an understanding of the cognitive processes of group members. Explanation of cognitive differences between cultures and sub-cultural groups has not in general been an area in which Psychology has been conspicuously successful. Early approaches assigned a constitutional origin to any observed differences; later, linguistic differences were invoked, for example by Whorf (1956) for cross-cultural differences and by Bernstein (1964, 1971) for differences between subcultures.
More recently attention has turned to ecological variables. Irvine and Berry (1988), for example, propose an ecological interpretation of the Law of Cultural Differentiation which asserts that all populations have the same perceptual and cognitive processes and the same potential for cognitive and perceptual development, but ecological and cultural factors prescribe what shall be learned and at what age; consequently different cultural environments lead to the development of different patterns of ability (Ferguson, 1956, p. 121) (Kagitcibasi and Berry, 1989, p. 498). The mechanisms by which differences in ability arise are, to say the least, underspecified. The theoretical approach developed here and elsewhere attempts to be much more specific about what these mechanisms are.


One important factor is what we have called the fabricated world effect (see Gorayska and Lindsay, 1989a, b and further discussion in Section 6 below), which can massively amplify intrinsic ecological differences. For example, instead of investing time in remembering a complex and arbitrary environment, people may instead choose to construct cities laid out according to simple algorithms: the world is adapted to the limitations of memory rather than vice versa. Similarly, buildings are frequently organised around goal satisfaction, with specific rooms to eat, sleep and cook, or specific buildings to drink, dance, or borrow a book. In this way, commonly sought goals in a culture become realised as physical structures, and goal-oriented planning is conditioned by the physical possibilities that exist.

A second way in which cultural differences in cognition can be induced is by the use of imposition and enjoinment to constrain knowledge and planning processes. Initially the locus of imposed and enjoined constraints is the symbolic planning system, but as we have suggested earlier, once successful plans are formulated they are likely to be downloaded to an automatic, non-symbolic, action generation system, which means that the constraints initially imposed via ethical and imperative forms of language eventually become unconscious and habitual. If this account is correct, then human planning activities are, at least in part, shaped and controlled by enjoinment, through the use of ethical language. As particular ethical exhortations are usually, if not necessarily, derived from more general ethical values and principles, it should follow that the goals, plans, and actions that arise as a result of the enjoinment process are systematically related to the ethical values and principles that underlie them.
This in turn suggests that ethical values and principles can be used by outside observers to explain regularities in the behaviour of members of a group within which those values are current, and that they can be employed by group members themselves, to assist in understanding the behaviour of their fellows. Some of the ways in which ethical principles and maxims can contribute to the cognitive processes underlying action planning are summarised in Figure 3 below. Ethical values and principles operate to reduce the cognitive demands of goal selection and planning, to enable cooperative goal-seeking, and to facilitate the construction of mental models of other agents, derived from their actions, which permit cognitive recovery of their goals. The cognitive benefits of ethical constraints on reducing the number of possible plans to be evaluated in action-planning contexts are closely analogous to the effects of subjective preference, discussed earlier (see p. 82).

Cognitive Functions of Ethical Values
1. Facilitating choice of goals
2. Reducing the range of acceptable plans and plan elements to be considered
3. Conveying plans not validatable within a single lifetime experience, e.g., rules for a good life
4. Providing a framework for cooperative goal-seeking
5. Enabling socialisation via enjoinment
6. Offering a source of principles to explain the behaviour of other agents
7. Allowing the credibility assessment of agents attempting to enjoin knowledge

Figure 3. Some cognitive functions of ethical values.

It is now widely accepted that AI models that seek to understand human text about the social and physical world require access to a rich set of physical principles and social conventions. An example is given by Charniak (1972): "Jane was invited to Jack's birthday party. She wondered if he would like a kite. She went to her room and shook her piggy bank. It made no sound." Charniak argues that a text comprehension system would need to know, at least, that: Guests are expected to take gifts to birthday parties. A kite is a suitable gift for a child. Purchasing a gift requires money. Money is often kept in piggy banks, particularly by children. Piggy banks containing coinage emit sound when shaken.

In the absence of ethical principles, the actions, including dialogue, of artificial cognitive systems cannot be comprehensible as human action is; nor, without access to ethical values as a source of hypotheses, can artificial systems understand the behaviour of humans, learn from them, or cooperate with them. If machines are to be treated as agents, or even to understand the behaviour of agents, they must first be given ethical values.

It might be admitted that ethical values and principles have some connection with action and action planning, and yet be denied that this confers any particular plausibility on the model we have proposed, which assigns a fundamental role to the processing of relevance information. Is there any good reason to assert that ethical reasoning and relevance processing are closely connected? We believe that there is. If ethical knowledge is represented in memory as abstract rules and general principles and heuristics, much of ethical reasoning must be concerned with determining which principles and heuristics are relevant to particular contexts in which action is required. If we are correct, then the notion of relevance must play a central, though unacknowledged, part in ethical debate. In support of our contention, we quote from two philosophical texts concerned with ethics. These texts were chosen simply because they were readily available in the circumstances under which we are writing: any of a wide range of others would no doubt serve as well.

Singer's (1963) book is concerned with the problems of generalisation associated with the Kantian categorical imperative: crudely, the claim that an agent ought to carry out only those acts which they are prepared to will as universal law, that is, to advocate that any agent ought to carry out under similar circumstances. We take Singer's book as an example because it is centrally concerned with those judgements generally regarded as uniquely ethical. Singer notes that there are serious problems involved in deciding who is similar to an agent making an ethical judgement, and when their circumstances are similar, and when not. He argues that while some similarities, and differences, can always be specified, not all of them will be relevant ones. "The generalisation principle must be understood in the sense that what is right for one person must be right for every relevantly similar person in relevantly similar circumstances" (Singer, 1963, p. 19, author's italics). In discussing specific moral principles, Singer declares: "though they hold in all circumstances in the sense that violations of them are always wrong, they are not relevant in all circumstances. On the contrary they are relevant only where the corresponding rule is relevant. The principle that it is always wrong to kill for the sake of killing is not relevant where killing is not involved" (ibid. p. 109). By contrast with Singer, Gauthier (1963) denies that there are uniquely categorical (or ethical) judgements.
He believes that all ethical reasoning is practical, that is, rooted in the specific context in which action is necessary. His dependence on the notion of relevance is, however, no less: "a person need not consider all wants in determining what to do. Only a minute fraction of all wants have any possible relevance to the situation in which he is to act, or to what he may do. Indeed, most of the wants in each person's practical basis never do nor can enter into his reasoning, although it is impossible to provide criteria to determine, in advance, just which wants may possess practical relevance" (Gauthier, 1963: 86).

The quotations provided above only illustrate the point we have argued; they cannot prove its truth. Nonetheless, the considerations to which we have drawn attention add up to a powerful case for reconsidering the role of ethics in cognition. We have argued that ethical knowledge plays an important part in action planning; that ethical language is used to shape the cognitive processes of members of social and cultural groups; that ethical values and principles are the main dimensions along which social actions are interpreted and understood; that artificially intelligent systems must act upon and have access to ethical values and principles if they are to participate in dialogue with humans and act in a manner which is considered meaningful; and finally, that because of the part played by ethical knowledge in action planning, relevance information is central to the use of ethical knowledge and, though never acknowledged, has consistently been invoked as a prominent element of ethical discussion.

6. The fabricated world hypothesis

This hypothesis was first proposed by Gorayska and Lindsay (1989a, b). The name assigned to the hypothesis was intended to reveal its debt to the earlier carpentered world hypothesis of Segall, Campbell and Herskovits (1966). In the 1950s and 60s a large body of evidence was accumulated indicating that preliterate, rural tribespeople were less susceptible than urban dwellers to a range of visual illusions. Segall, Campbell and Herskovits suggested that this might be because constant exposure to regular rectilinear structures in the carpentered world of modern cities predisposed the perceptual systems of city folk to interpret two-dimensional arrays of straight lines as geometrically regular figures in three-dimensional space. People from a rural environment in which rectilinearity is almost never encountered were less likely to use the same visual processing heuristic, and hence experienced the illusion to a lesser extent.

Lindsay and Gorayska suggested that cognitive effects created or amplified by the environment were not confined to visual perception. Built environments, or fabricated worlds, have very different information processing properties from natural environments. The human cognitive system is only able to control motor vehicles travelling at scores or hundreds of miles per hour because highways afford an artificially predictable context in which relevant change is relatively rare and usually signalled well in advance, for example by road signs. Millions of people a year find their way through unfamiliar airports, often without assistance, because of the informational structure of airport organisation. Strangers can navigate abroad because engineers have constructed an environment that embodies informational structures already in the head of the traveller. If we know what people know, we can design an environment that fits their pre-existing knowledge structures like a glove. Novel environments can be mastered without new learning because they have been engineered to fit old learning.

The idea that the human cognitive system is not a stand-alone device, but part of a larger unit which incorporates elements of the external environment (the parts we have learned about), has arisen in several contexts. Andy Clark (2001) uses the term "wideware" to capture the fact that memory is outside in the world as well as in the head. Taking the same idea, that human cognition must be viewed as part of a larger whole that incorporates the learning environment, O'Regan and Noë (2002) have sought to explain unexpected perceptual phenomena such as change blindness. This term is used to refer to the discovery that, for example, when a visual image is modified even as it is being inspected, if the introduction of the change is masked by a flicker, the change to the image, even though perceptually gross, is often not noticed. Similarly, people frequently fail to notice when a receptionist interacting with them bends below a desk, and is replaced by a different person who continues seamlessly with the interaction. O'Regan and Noë argue that change blindness occurs because people do not have access to a direct representation of the perceptual environment in memory that can be compared with current input to detect change. But it seems that we do: phenomenology gives us a false impression of the nature of our own consciousness. Instead of a cognitive representation of external reality inside our heads, O'Regan and Noë suggest that the world is its own memory: perceptual memory doesn't need to remember everything, only the motor coordinates of where information not currently relevant can be found if needed. An AI system built on similar assumptions, which directly uses relevance information to control what parts of a visual array are seen, was described by Ip, Gorayska and Lok (1994).
O'Regan and Noë, however, draw much more far-reaching implications from the idea that the environment is embedded in human cognitions: "we suggest that the basic thing people do when they see is that they exercise mastery of the sensorimotor contingencies governing visual exploration. Thus, visual sensation and visual perception are different aspects of a person's skillful exploratory activity." The existence of sensory modalities can readily be explained by this approach: "the difference between seeing and hearing is to be explained in terms of the different things that we do when we see and hear", and there are implications for understanding consciousness itself: "a way of thinking about the neural bases of perception and action is needed that does not rest on the false assumption that the brain is the seat of consciousness."

These recent suggestions that mind and the environment are an integrated whole seem to increase considerably the salience of a relevance-based theory of cognition. Relevance-based processing provides a mechanism that is sufficient to explain how the mind-environment duplex operates: action planning confers relevance, because only the set of perceptual features that can form part of action plans currently being developed or executed is available within the symbolic problem space. Associative relationships developed within the connectionist component of the cognitive system provide at least the initial set of features to which the problem space has access. This arrangement will naturally cause the development of cognitive modules and sensory modalities as a collateral outcome. When two problem space generators use disjoint operator sets they are functionally independent processing systems: cognitive goals constrain the set of plans that will work; the set of plans that will work constrains the set of plan elements that can be used in implementation. When operator sets are disjoint, relevance discontinuity exists between the two systems: operators used by one problem-space generator are never relevant to the other.

Whilst many theorists (e.g., Fodor, 1983, and most present-day neuropsychologists) have proposed that the mind is modular in organisation, it has proved impossible to locate a neurophysiological basis for this organisation. Damage to the brain can sometimes cause specific deficits in language, perception or memory, but damage to identical parts of the brain in other patients does not invariably lead to similar neuropsychological impairments. Modules seem to exist in the mind, but not in the brain. A GMS processing relevance information appears capable of showing how this can occur. The Fabricated World Hypothesis goes beyond the bare assertion of the environmental embodiment of mind, to draw attention to the technological possibilities of manipulating mind by engineering the environment.
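The claim that disjoint operator sets yield functionally independent systems can be illustrated with a short Python sketch. It is only one reading of the idea, under stated assumptions: each problem-space generator is represented simply as a named set of operators, and the generator and operator names below are hypothetical. Generators that share at least one operator, directly or through a chain of other generators, are grouped together; groups with no shared operators are separated by a relevance discontinuity.

```python
def functional_modules(generators):
    """Group generators so that different groups share no operators.

    generators: dict mapping a generator name to its set of operators.
    Returns a list of sets of generator names.
    """
    modules = []  # list of (names, operators) pairs, pairwise disjoint in operators
    for name, operators in generators.items():
        merged_names = {name}
        merged_ops = set(operators)
        remaining = []
        for names, ops in modules:
            if ops.isdisjoint(merged_ops):
                remaining.append((names, ops))  # relevance discontinuity: keep apart
            else:
                merged_names |= names           # shared operator: same module
                merged_ops |= ops
        remaining.append((merged_names, merged_ops))
        modules = remaining
    return [names for names, _ in modules]


# Hypothetical generators and operators, for illustration only.
generators = {
    "speech": {"articulate", "parse"},
    "reading": {"parse", "fixate"},
    "reaching": {"extend arm", "grasp"},
}

modules = functional_modules(generators)
print(modules)
```

Here "speech" and "reading" fall into one module because they share the operator "parse", while "reaching" forms a separate module whose operators are never relevant to the other two.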

7. The cognitive technology of goal management

Let us suppose that the theory of relevance processing and goal management that we have described is true. What are the pragmatic implications? How can this knowledge be put to technological use?

a. Controlling problem spaces via connectionist learning and modifying connectionist models via problem spaces

The theory of cognitive goal management via relevance relationships proposes that the unconscious and automatic control of actions is handled by a connectionist learning system that provides the base inputs to a conscious symbolic hypothesis testing module, which operates by developing explicit propositional models of the behaviour resulting from lower-level connectionist control processes. A symbol-based cognitive system that is sufficient to learn by imitating the behaviour of others must also be sufficient to learn by symbolically modelling the system's own behaviour. However, the set of operators available to the symbolic processor is restricted to the set of operators designated as relevant by connectionist learning processes. The more powerful hypothesis testing paradigm available within the symbol-based learning system may require the use of operators not designated as relevant by connectionist processes, and hence cognitively unavailable, or available only in propositional form as intellectual beliefs that seem contrary to intuitions or unconnected with how a person acts. One practical implication of the goal-management theory is that action planning at the symbolic level can be facilitated, and actions can be brought into line with beliefs, by providing learning experiences that add new operators into the problem space. For example, the beliefs developed by theoretical physicists about the relationship between matter and gravity are symbolically coherent, but are usually found to be counter-intuitive and cognitively complex, presumably because they can only be handled by a serial-operation working memory with restricted processing capacity. However, the mass-gravity relationship can be modelled by using a rubber membrane distorted by objects of varying mass that are placed upon its surface. Experimenting with a model of this kind is directly training the connectionist action-control system to operate with an extended set of action possibilities. But the utility of this particular training procedure could only be established by working through the higher-level symbolic system; it could not have been arrived at directly by observing the effect of action contingencies.
In a way, there is nothing new here: every teacher knows that if intellectually abstract and complex relationships can be experienced as physically instantiated by some model, learning is facilitated. What we are providing is an explanation of why this occurs, and an insight into the mechanisms underlying the process, so that these processes become easier to manipulate in a principled way.

b. Using shared learning to develop cognitively efficient, fabricated environments (eliminating relevance discontinuities)

It is old news that operating environments can be made user-friendly by ensuring that they meet user requirements. It probably follows from Norman and Shallice's (1986) model of action control that user requirements have more to do with habitual patterns of action controlled by the unconscious implementation of schemas in the contention scheduling process, than with consciously controlled behaviour. These familiar principles are surely given a new spin when we suspect that the reason schema-driven action is unconscious is that it results from connectionist processes operating in a biologically primitive action control system. Further, modification of this system cannot occur directly through top-down influences from the symbolic processor, because connectionist systems cannot be symbolically programmed. Changing someone's beliefs about effective actions in a given domain will only change their behaviour when slow serial voluntary control is exercised, or when, as a result of changed beliefs, an agent systematically practises new stimulus-response contingencies to bring actions and beliefs into line. Think about a concert pianist learning a new finger movement: knowing how the action is to be executed at the level of belief may provide the incentive for practice and the feedback by which practice is guided, but the belief is no substitute for the practice.

Now it becomes quite clear that if what we mean by meeting user needs is enabling the direct transfer of schema-controlled actions, we cannot establish user needs by interviewing users, or by asking them to complete questionnaires. Cognitive tools built on the basis of such data, generated by what are probably the most commonly used methodologies in interface engineering, should match performer beliefs, but this provides no assurance that they will support skilled action. The new model suggests that we should first seek discrepancies between the two control systems, then, when discrepancies exist, decide which system it is most important that a new tool interfaces with. Ramachandran (1996) has demonstrated how cognitive control of sensory feedback from (phantom) limbs can be established by modulating visual input (using mirrors to fool the connectionist system into acting as if an amputated limb is still present), when it is completely impossible to affect it via a person's beliefs.
Marcel (1992) asked GY, a much-studied individual with a right-sided hemianopia and blindsight, to respond in 3 ways to a 200 ms bright light in his blind field. The 3 response modes were: blink right eye, forefinger button-press, and say 'yes'. Marcel's results showed the fastest responses for blinking, then for button pressing, with vocalisation being slowest. GY commonly dissociated, e.g., said 'yes' with an eyeblink and 'no' with a finger response on the same trial. Vocal responses were slowest and least accurate. Marcel found similar results when normal participants were presented with the same task. Findings such as those of Ramachandran and Marcel clearly demonstrate that our cognitive systems are not as they appear to phenomenal analysis: unified and under the control of conscious will. When we speak to another person, it is already clear that the message we transmit is available only to one department in the bureaucracy of mind: a department that is ignorant of much
of what goes on elsewhere, and unable to control many of the events of which it has knowledge. Cognitive science is beginning to reveal the true nature of the Wizard-of-Oz-style mechanisms that really operate the levers and pulleys of our minds. Cognitive technology must follow in the train of cognitive science, developing techniques by which the various subsystems of mind can be influenced or modified, as their properties become manifest. (For an early example of this practice, see Meenan and Lindsay (2002); for further discussion of desirable methods and practice in Cognitive Technology see Gorayska and Mey (1996), Gorayska and Marsh (1996 & 1999), Marsh et al. (1999), and Gorayska, Marsh and Mey (2001).)

c. Modifying social action planning via ethical engineering

Most people already believe that, in a dimly understood way, the ethical principles and precepts shared by a society have some effect upon the quality of social life that results. The theoretical analysis of cognition as a goal management system, within which ethical constraints play an essential role in ensuring the tractability of action planning computations, provides a much clearer picture of how ethical beliefs can be causally effective. This analysis offers a naturalistic account of ethics whilst still retaining some basis for claiming that ethical precepts are not just empirical generalisations; that they cannot (as heuristics rather than assertions) be falsified; that they play a cognitively essential part in the production of socially acceptable behaviour; and that they promote the well-being of groups of which an agent is part, as well as serving the interests of individuals themselves. Questions about relative efficiency and effectiveness naturally follow any functionalist analysis, and the naturalistic account of ethics we have offered above is no exception. Until now, no explanations have been available of how ethical reasoning operates at a cognitive level.
Instead, most people seem to operate with a vague notion that virtue is its own reward, viz. acting according to moral precepts is itself virtuous, and this will either enhance an agent's social desirability in the present or result in some form of posthumous remuneration. Our analysis suggests that the connection between ethics and theistic epistemologies is solely of historical interest. Social agents need to operate under the constraint of principles of some kind, to moderate the computational burden of social action planning. Societies will benefit if the principles employed tend to promote the achievement of ends that are in the general rather than the individual interest, or that include precepts related to outcomes that are too remote from individual behaviour to otherwise figure in their action planning. Examples of the former kind might relate to smoking in public, vaccinating
children against childhood diseases, driving safely, or limiting noise pollution resulting from personal sound sources. These are all examples of cases in which the well-being of the community is likely to benefit from the acceptance of constraints upon the behaviour of individual agents. Examples of moral precepts associated with remote ends might relate to environmental conservation issues: for example, constraints upon waste disposal practices or energy use. The point that we are making is that historically, societies shaped the behaviour of their members by accepting ethical principles that reduced the frequency of acts such as murder, or promoted the occurrence of actions thought to be desirable, such as promise-keeping, marital fidelity, or truthfulness. Usually, gains or losses recorded and administered by a deity were a crucial part of an apparatus that operated to enjoin conformity. It seems that consensually accepted ethical belief systems have tended to become abandoned along with religion, even though religion provides only a fanciful scheme for policing whatever ethical systems are adopted, rather than a rationale for any particular set of ethical principles. In the end, the fact that some deity likes promise-keeping is no more an explanation of why promise-keeping is good or right than the fact that promise-keeping is approved by Great-Aunt Emily, or the family dog. The profound unfashionability of ethical systems may also be connected with a sense of otherness associated with them, perhaps because of their association with theism. Ethical systems are revealed to humankind, handed down on tablets of stone, or presented in some equally mysterious manner. There is little sense that they are human products like other cognitive tools that can be sharpened or revised as needs and circumstances change. Because of this, ethical systems have become antiquated and irrelevant-seeming: curious vestiges of the past.
They incorporate injunctions not to covet a neighbour's wife (absurdly when the neighbour's wife in question is skilfully disporting herself on a silver screen with the intention of inducing covetousness), but do not condemn a multitude of behaviours we commonly recognise as evils, such as corporate greed, paedophilia and environmental pollution. If our analysis is correct, the cognitive need for principles that simplify the task of action planning will not go away, nor will there be any diminution of the social benefits of acting according to heuristics that promote community wellbeing or serve goals too remote for individual agents to learn how their behaviour is related to them. Contemporary societies need to understand how ethical beliefs operate, to agree upon ethical precepts that are relevant to contemporary goals, and via educational programmes to promulgate such precepts so that
everyone can benefit from constraints upon the action planning processes of individual agents that are in tune with the times.

Note
* This paper is an extended and updated version of Lindsay and Gorayska (1994).

References
Allen, J. F. & C. P. Perrault (1978). Participating in Dialogue: Understanding via Plan Deduction. Proceedings, Canadian Society for Computational Studies of Intelligence.
Appelt, D. E. (1985). Planning English Sentences. Cambridge: Cambridge UP.
Baddeley, A. D. (2001). The episodic buffer: a new component of working memory? Trends in Cognitive Sciences 4 (11), 417-423.
Baizer, J. S., L. G. Ungerleider & R. Desimone (1991). Organisation of visual inputs to the inferior temporal and posterior parietal cortex in Macaques. Journal of Neuroscience 11, 187-194.
Baron, J. (1994). Nonconsequentialist decisions. Behavioral and Brain Sciences 17(1), 1-10.
Baron-Cohen, S. (1995). Mindblindness: An Essay on Autism and Theory of Mind. MIT Press, Cambridge MA.
Barsalou, L. W. (1991). Deriving categories to achieve goals. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory, 27, pp. 1-64. San Diego, CA: Academic Press. [Reprinted in A. Ram & D. Leake (Eds.), Goal-driven learning (1995), pp. 121-176. Cambridge, MA: MIT Press/Bradford Books]
Bernstein, B. (1964). Elaborated and restricted codes: their social origins and some consequences. American Anthropologist 66, 55-69.
Bernstein, B. (1971). Class, codes and control vol. 1: Theoretical studies towards a sociology of language. London: Routledge & Kegan Paul.
Boussaoud, D., G. di Pellegrino & S. P. Wise (1996). Frontal lobe mechanisms subserving vision-for-action versus vision-for-perception. Behavioural Brain Research 72, 1-15.
Bowman, S., L. Hinkley, J. Barnes & R. Lindsay (this volume). Gaze aversion and the primacy of emotional dysfunction in autism. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 267-301. Amsterdam: John Benjamins Publishing Company.
Broca, P. P. (1861). Remarques sur le siège de la faculté du langage articulé, suivies d'une observation d'aphémie. Bulletin de la Société Anatomique 36, 330-57.
Bruce, B. C. (1975). Belief Systems and Language Understanding. BBN Technical Report No. 2973.
Carberry, S. (1990). Plan recognition and its use in understanding dialogue. In A. Kobsa & W. Wahlster (Eds.), User Models in Dialogue System, pp. 133-62. Berlin: Springer Verlag.
Carver, C. S. & M. F. Scheier (1982). Control Theory: a useful conceptual framework for Personality-Social, Clinical and Health Psychology. Psychological Bulletin 92(1), 111-35.
Charniak, E. (1972). Towards a model of children's story comprehension. Unpublished doctoral dissertation, MIT.
Clark, A. (2001). Mindware. Oxford: Oxford University Press.
Cleeremans, A. (1993). Mechanisms of Implicit Learning. Cambridge, Mass.: MIT Press.
Cohen, M. B. & M. S. Strauss (1979). Concept acquisition in the human infant. Child Development 50, 419-424.
Cohen, P. & C. R. Perrault (1979). Elements of a Plan Based Theory of Speech Acts. Cognitive Science 3, 177-212.
Cutting, J. (1985). The Psychology of Schizophrenia. Edinburgh: Churchill Livingstone.
Dascal, M. (1987). Language and reasoning: Sorting out sociopragmatic and psychopragmatic factors. In B. W. Hamill, R. C. Jernigan & J. C. Bourdreaux (Eds.), The role of language in problem solving II, pp. 183-197. Amsterdam: North Holland.
El Ashegh, H. A. & R. Lindsay (this volume). Cognition and Body Image. In B. Gorayska & J. L. Mey (Eds.), Cognition and Technology: Co-existence, convergence, co-evolution, pp. 175-223. Amsterdam: John Benjamins Publishing Company.
Elman, J. L., E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi & K. Plunkett (1996). Rethinking Innateness: a Connectionist Perspective on Development. Cambridge, MA: MIT Press.
Erickson, T. & M. Mattson (1981). From words to meaning: a semantic illusion. Journal of Verbal Learning and Verbal Behaviour 20, 540-551.
Evans, J. T. St. B. (1982). The psychology of deductive reasoning. London: Routledge & Kegan Paul.
Ferguson, G. A. (1956). On transfer and the abilities of man. Canadian Journal of Psychology 10, 121-31.
Fodor, J. A. (1983). The Modularity of Mind. Cambridge, Mass.: MIT Press.
Gauthier, D. P. (1963). Practical Reasoning. Oxford: Clarendon Press.
Gorayska, B. & R. O. Lindsay (1989a). Metasemantics of relevance. The First International Congress on Cognitive Linguistics. Print A265. L.A.U.D. (Linguistic Agency at the University of Duisburg) Catalogue: Pragmatics, 1989. Available from http://www.linse.uni-essen.de:16080/linse/laud/shop_laud
Gorayska, B. & R. O. Lindsay (1989b). On relevance: Goal dependent expressions and the control of planning processes. Technical Report 16. School of Computing and Mathematical Sciences. Oxford: Oxford-Brookes University. (First published as Gorayska and Lindsay 1989a.) Available at http://cogtech.org/publicat.htm
Gorayska, B. & R. O. Lindsay (1993). The Roots of Relevance. Journal of Pragmatics 19, 301-323.
Gorayska, B. & R. O. Lindsay (1995). Not Really a Reply More Like an Echo. Journal of Pragmatics 23, 683-686.
Gorayska, B., R. Lindsay, K. Cox, J. Marsh & N. Tse (1992). Relevance-Derived Metafunction: How to Interface Intelligent Systems Subcomponents. Proceedings of the AI Simulation and Planning in High Autonomy Systems Conference, Perth Australia, 8-11 July 1992, pp. 64-72. Los Alamitos: IEEE Computer Society Press.
Gorayska, B. & J. Marsh (1996). Epistemic Technology and relevance analysis: Rethinking Cognitive Technology. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 27-39. Amsterdam: North Holland.
Gorayska, B. & J. Marsh (1999). Investigations in Cognitive Technology: Questioning perspective. In B. Gorayska, J. Marsh & J. L. Mey (Eds.), Humane interfaces: Questions of methods and practice in Cognitive Technology, pp. 17-43. Amsterdam: North Holland.
Gorayska, B., J. Marsh & J. L. Mey (2001). Cognitive Technology: Tool or Instrument. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Cognitive Technology: Instruments of mind, CT01. Lecture Notes in AI 2117, pp. 1-16. Berlin: Springer.
Gorayska, B. & J. L. Mey (1996). Of minds and men. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In search of a humane interface, pp. 1-24. Amsterdam: North Holland.
Gorayska, B. & N. Tse (1993). The Goal Satisfaction Heuristic in a Relevance Based Search. Technical Report TR-0393. Department of Computer Science, City Polytechnic of Hong Kong.
Gorayska, B., N. Tse & W. H. Kwok (1997). A Goal Satisfaction Condition as a Function Between Problem Spaces and Solution Spaces. Technical Report TR-9706. Department of Computer Science, City University of Hong Kong. Available at http://cogtech.org/publicat.htm
Gottlieb, G. & N. A. Krasnegor (1985). Measurement of Audition and Vision in the First Year of Postnatal Life: A Methodological Overview. Norwood, NJ: Ablex.
Grice, H. P. (1961). The causal theory of perception. Aristotelian Society Proceedings, Supplementary Volume 35, 121-152. Reprinted in Grice 1989: 224-247.
Grice, H. P. (1989). Studies in the Way of Words. Harvard University Press, Cambridge MA.
Hampshire, S. (1960). Thought and Action. London: Chatto and Windus.
Hare, R. M. (1952). The Language of Morals. Oxford: Clarendon Press.
Harnad, S. (1990). The Symbol Grounding Problem. Physica D 42, 335-346.
Hyland, M. E. (1987). Control theory interpretations of psychological mechanisms of depression: comparison and integration of several theories. Psychological Bulletin 102, 109-21.
Hyland, M. E. (1988). Motivational control theory: an integrative perspective. Journal of Personality and Social Psychology 55, 642-51.
Ip, H., B. Gorayska & W. Y. Lok (1994). Relevance-Directed Vision using Goal/Plan Architecture. Proceedings of the Third Pacific Rim International Conference on Artificial Intelligence, Beijing, 16-18 August 1994, pp. 945-951.
Irvine, S. H. & J. W. Berry (1988). Human Abilities in Cultural Context. New York: Cambridge UP.
Johnson, G. (1997). Neuron to Symbol: relevance information in hybrid systems. PhD Thesis. Oxford Brookes University, UK.
Kagitscibasi, C. & J. W. Berry (1989). Cross-Cultural Psychology: Current Research and Trends. Annual Review of Psychology 40, 493-531. Palo Alto: Annual Reviews Inc.
Kahneman, D. & A. Tversky (1982). On the study of statistical intuitions. Cognition 11, 123-141.
Kahneman, D., P. Slovic & A. Tversky (1982). Judgements under uncertainty: Heuristics and biases. Cambridge: Cambridge UP.
Kant, I. (1953). Groundwork of the Metaphysics of Morals. Translated by H. J. Paton as: The Moral Law. London: Hutchinson's University Library.
Kay, K. (2001). Machines and the Mind: Do artificial intelligence systems incorporate intrinsic meaning? Harvard Brain Review 8, Spring. http://www.hcs.harvard.edu/~husn/BRAIN/vol8-spring2001/ai.htm. Accessed September 2002.
Kohlberg, L. (1981). The Philosophy of Moral Development: Moral Stages and the idea of justice. New York: Harper and Row.
Leslie, A. (1991). The theory of mind impairment in autism: Evidence for a modular mechanism of development? In A. Whiten (Ed.), Natural Theories of Mind: Evolution, Development and Simulation of Everyday Mindreading, pp. 63-78. Oxford: Blackwell.
Levy, D. M. (1979). Communicative Goals and Strategies: Between Discourse and Syntax. In T. Givon (Ed.), Syntax and Semantics, pp. 183-210. New York: Academic Press.
Lindsay, R. O. (1996a). Cognitive Technology and the pragmatics of impossible plans - a study in Cognitive Prosthetics. AI & Society 10, 273-288. Special issue on Cognitive Technology.
Lindsay, R. O. (1996b). Heuristic Ergonomics and the Socio-Cognitive Interface. In B. Gorayska & J. L. Mey (Eds.), Cognitive Technology: In Search of a Humane Interface, pp. 147-58. Advances in Psychology 113. Amsterdam: North-Holland, Elsevier Science.
Lindsay, R. O. & B. Gorayska (1994). Towards a Unified Theory of Cognition. Unpublished manuscript.
Lindsay, R. O. & B. Gorayska (1995). On Putting Necessity in its Place, with R. Lindsay. Journal of Pragmatics 23, 343-346.
Lindsay, R. O., B. Gorayska & K. Cox (1994). The Psychology of Relevance. Unpublished manuscript; available at http://cogtech.org/publicat.htm
Mack, A. & I. Rock (1998). Inattentional blindness. Cambridge, MA: MIT Press.
Marcel, A. J. (1993). Slippage in the unity of consciousness. In G. R. Bock & J. Marsh (Eds.), Experimental and Theoretical Studies of Consciousness, pp. 168-80. CIBA Foundation Symposium 174. Chichester, UK: John Wiley & Sons.
Marsh, J., B. Gorayska & J. L. Mey (Eds.) (1999). Humane interfaces: Questions of methods and practice in Cognitive Technology. Amsterdam: North Holland.
Meehl, P. E. (1962). Schizotaxia, schizotypy, schizophrenia. American Psychologist 17, 827-838.
Meenan, S. & R. Lindsay (2002). Planning and the Neurotechnology of Social Behaviour. International Journal of Cognition and Technology 1(2), 233-274.
Mishkin, M., L. G. Ungerleider & K. A. Macko (1983). Object vision and spatial vision: two cortical pathways. Trends in Neurosciences 6, 414-417.
Newell, A. (1990). Unified Theories of Cognition. Cambridge, Mass.: Harvard University Press.
Newell, A. & H. A. Simon (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.
Norman, D. A. & T. Shallice (1986). Attention to action: Willed and automatic control of behavior. In R. Davidson, G. Schwartz & D. Shapiro (Eds.), Consciousness and Self-Regulation. New York: Plenum Press.
Oatley, K. & J. Jenkins (1992). Emotion. Annual Review of Psychology 43, pp. 55-85.
O'Regan, J. K. & A. Noë (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences 24 (5). Available at www.bbsonline.org.
Partridge, D. (1991). A new guide to artificial intelligence. Norwood, NJ: Ablex.
Penfield, W. & L. Roberts (1959). Speech and brain mechanisms. Princeton, NJ: Princeton University Press.
Plunkett, K., P. McLeod & E. T. Rolls (1998). An Introduction to Connectionist Modelling Cognitive Processes. Oxford: Oxford University Press.
Premack, D. & A. J. Premack (1995). Origins of human social competence. In M. S. Gazzaniga (Ed.), The Cognitive Neurosciences, pp. 205-218. Cambridge, Mass.: MIT Press, A Bradford Book.
Ramachandran, V. S. (1996). Synaesthesia in phantom limbs induced with mirrors. Proceedings of the Royal Society London 263, 377-386.
Reber, A. (1993). Implicit learning and tacit knowledge: an essay on the cognitive unconscious. New York: Oxford University Press.
Reder, L. & G. Kusbit (1991). Locus of the Moses Illusion: Imperfect Encoding, Retrieval, or Match? Journal of Memory and Language 30, 385-406.
Schneider, W. & R. M. Shiffrin (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review 84, 1-66.
Segall, M. H., D. T. Campbell & M. J. Herskovits (1966). The Influence Of Culture On Visual Perception. Indianapolis/NY: Bobbs-Merrill.
Shanks, D. R. & M. F. St. John (1993). Characteristics of Dissociable Learning Systems. Brain and Behavioural Sciences 17(3), 367-447.
Shiffrin, R. M. & W. Schneider (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review 84, 127-90.
Singer, M. G. (1963). Generalisation in Ethics. London: Eyre and Spottiswoode.
Smolensky, P. (1988). On the Proper Treatment of Connectionism. Behavioral and Brain Sciences 11(1), 1-59.
Sperber, D., F. Cara & V. Girotto (1995). Relevance theory explains the selection task. Cognition 57, 31-95.
Sperber, D. & D. Wilson (1986/1995). Relevance: Communication and Cognition. 2nd edition. Oxford: Blackwell.
Sperber, D. & D. Wilson (1987). Précis of Relevance: Communication and Cognition. Behavioral and Brain Sciences 10, 697-754.
Sperber, D. & D. Wilson (2004). Relevance theory. In G. Ward & L. Horn (Eds.), Handbook of Pragmatics, pp. 607-632. Oxford: Blackwell. Also available at http://www.dan.sperber.com/relevance_theory.htm. Accessed: September 2002 and January 2004.
Squire, L. R. (1992). Declarative and Non-declarative memory: multiple brain systems supporting learning and memory. Journal of Cognitive Neuroscience 4, 232-243.
Tinbergen, N. (1953). The Herring Gull's World. London: Collins.
Tse, N. (1994). The Learning Mechanism in GEPAM. MPhil Dissertation. Computer Science Department, City University of Hong Kong.
Urmson, J. O. (1950). On Grading. Mind 59 (234), 145-169.
Vera, H. A. & H. A. Simon (1993). Situated Action: a Symbolic Interpretation. Cognitive Science 17 (1), 7-48. January-March.
Weiskrantz, L. (1988). Some contributions of neuropsychology of vision and memory to the problem of consciousness. In A. J. Marcel & E. Bisiach (Eds.), Consciousness in Contemporary Science, pp. 183-199. Oxford: Clarendon Press.
Whorf, B. L. (1956). Language, Thought and Reality. (Ed. J. Carroll). Cambridge, Mass.: MIT Press.
Woodworth, R. S. & H. Schlosberg (1954). Experimental Psychology. London: Holt, Rhinehart, Winston.
Woodworth, R. S. & S. B. Sells (1935). An Atmosphere effect in formal syllogistic reasoning. Journal of Experimental Psychology 18, 451-60.
Zhang, X. H. (1993). A Goal-Based Relevance Model and its Application to Intelligent Systems. Ph.D. Thesis, Oxford Brookes University, Department of Mathematics and Computer Science, October, 1993.
Zhang, X. H., J. L. Nealon & R. O. Lindsay (1993). An Intelligent User Interface for Multiple Application Systems. Research and Development in Expert Systems IX: Proceedings of Expert Systems 92, The Twelfth Annual Technical Conference of the British Computer Society Specialist Group on Expert Systems, Cambridge, 1992. Cambridge: CUP.

Robots as cognitive tools*


Rolf Pfeifer
University of Zurich

1. Introduction

Cognitive Technology is concerned with the relationship between humans and machines, in particular how the human mind can be explored via the very technologies it produces. In this chapter, I explore the use of robots as cognitive tools. While in science fiction scenarios the focus is often on how robots change society and our daily lives, I restrict myself here to the exploration of how robots can be productively used as tools for cognitive science, including the cognitive scaffolding that robots provide for our understanding of embodied cognitive processes.

The idea to design artifacts to explore human and other forms of intelligence goes back to the early days of artificial intelligence. From the 1950s until the mid-80s, the period in which the so-called classical paradigm was predominant, the goal was mostly to develop algorithms for cognitive processes, 'cognitive' being a very general term for mental processes. Examples include playing games such as chess and checkers, solving cryptarithmetic puzzles, performing medical diagnosis, proving mathematical theorems, or processing of written natural language text. This classical paradigm, which deliberately abstracts from the physical level and focuses on symbol processing, can be naturally represented as algorithms. The models developed for these kinds of tasks typically had a very centralized, hierarchical, top-down organization. We will see later that these characteristics are not appropriate to describe naturally intelligent systems. It turned out that this approach had severe limitations, as can be seen, for example, from the failure of expert systems (for detailed arguments, see, e.g., Clancey, 1997; Pfeifer and Scheier, 1999; Vinkhuyzen, 1998; Winograd and Flores, 1986). In the mid-80s Rodney Brooks of the MIT Artificial Intelligence Laboratory suggested that we forget about logic and problem solving, that we do away with high-level symbolic processing and focus
on the interaction of agents with the real physical world (Brooks, 1991a, b). This interaction is, of course, always mediated by a body, i.e., his proposal was that artificial intelligence be embodied. As a consequence, many researchers in the field started using robots, which by definition are embodied systems, as their workhorse. What originally seemed nothing more than yet another buzzword turned out to have profound consequences and radically changed our thinking about intelligence, behavior, and society in general; a change in which robots as cognitive tools have been instrumental. Since the methodology employed by artificial intelligence, traditional or embodied, is a synthetic one, it is briefly introduced at the beginning. Then I outline the concept of embodiment and provide a set of case studies to illustrate different kinds of implications. The implications are integrated into an approach that has been called developmental robotics, which I introduce next. Finally, I attempt to characterize what we have learned by using robots as cognitive tools.

2. Synthetic methodology

Research in artificial intelligence employs a synthetic methodology, i.e., an approach that can be succinctly characterized as 'understanding by building': by developing artifacts that mimic certain aspects of the behavior of natural systems, a deeper understanding of that behavior can be acquired. There are three steps in the synthetic methodology: (1) building a model of some aspect of a natural system, (2) abstracting general principles of intelligence, and (3) applying these abstract principles to the design of intelligent systems. Examples of behaviors one might be interested in are how humans recognize a face in a crowd, how physicians arrive at a diagnosis, how rats learn about a maze, how dogs can run so quickly and at the same time catch a Frisbee, or how insects manage to get back to their nests after having found food. The models of interest are artifacts, either computer programs as in classical artificial intelligence, or robots as in embodied artificial intelligence. In the embodied approach simulations are used as well, but they are of a particular type, the so-called embodied agent simulations. They include physically realistic models of an environment and of the agent's sensory and motor interactions with that environment. While biologists might be satisfied with an accurate model of a biological system's behavior (step 1), from the perspective of artificial intelligence, and in
particular from the perspective of the present paper, it is important to extract general principles (step 2). Those principles not only include principles of designing intelligent agents but also principles concerning scaffolding. By scaffolding we mean the structuring of the environment with the goal to enable the achievement of complex tasks through interaction with this structured environment (e.g., Clark, 1997). Robots employed as cognitive tools may provide precisely this perspective. Abstracting from purely biological principles is also a prerequisite for engineering applications (step 3) because in engineering the best solution is, in general, not as close a copy of the biological system as possible but some abstraction and modification thereof. In this way, the engineer can also exploit means not available to biological systems such as new types of sensors, media of communication, actuators, etc. Steps 1, 2 and 3 are not sequential: they are pursued partly in parallel and in an iterative way. The synthetic methodology contrasts with the analytic one where a given system is analyzed in a top-down manner, as is the standard way of proceeding in science. The synthetic approach, building aspects of the system one is trying to understand, has proved enormously successful: if one attaches a camera to a computer in order to develop a perceptual system, one's attention becomes immediately attracted to the relevant problems. For example, it becomes obvious that trying to map a pixel array from a camera image onto an internal symbolic representation is not going to work. As an aside, it is interesting to note that science in general is becoming increasingly synthetic as illustrated, for example, by the rapid growth of the computational sciences.

3. Case studies

This section provides a number of demonstrations of the sorts of insights that can be gleaned using robots as cognitive tools. Two important points must be mentioned upfront. First, for the better part, research performed in the field of robotics is of a very traditional nature. It is based on a so-called sense-think-act, or sense-model-plan-act cycle: there is an input (sense), this input is then mapped onto some internal representation (model) which is used for planning (plan), and finally the plan is executed in the real world (act). Modeling and planning together form the 'think' part of the cycle. This cycle is closely related to the idea of hierarchical, centralized, symbol processing systems as they have been employed in the classical approach. Thus, if one is after truly interesting results and if one wants to explore the true implications of embodiment, merely
building robots is not enough since robots can be used in very traditional ways. Instead, one has to apply a novel research strategy such as the framework outlined in embodied cognitive science (e.g., Pfeifer and Scheier, 1999). In the case studies below I will show how robots can be employed to explore embodiment. As we will see, this requires an entirely new way of thinking and necessitates reflecting on the interaction with the real world which is messy and not as neat as the world of computer programs. Second, embodiment has two main types of implications: physical and information theoretic. The former are concerned with physical forces, inertia, friction, vibrations, and energy dissipation, i.e., anything that relates to the (physical) dynamics of the system. The latter are concerned with the relation between sensory signals, motor control, and neural processing. While in the traditional approach the focus is on the internal control mechanisms, or the neural substrate, in the new approach the focus is now on the complete organism which includes morphology (shape, distribution and physical characteristics of sensors and actuators, limbs, etc.) and materials. One surprising consequence is that, often, problems that seem very hard if viewed from a purely computational perspective turn out to be easy if the embodiment and the interaction with the environment are appropriately taken into account. For example, given a particular task environment, if the morphology is right, the amount of neural processing required may be significantly reduced (e.g., case study 1). Because of this perspective on embodiment, entirely new issues are raised and need to be taken into account.
As I will illustrate in the following case studies, one important issue concerns the so-called ecological balance, i.e., the interplay between the sensory system, the motor system, the neural substrate, and the materials used (Hara and Pfeifer, 2000; Pfeifer, 1996b; Pfeifer, 1999, 2000; Pfeifer and Scheier, 1999). I will begin with a simple robotics experiment, the Swiss Robots (Case study 1) and an example from artificial evolution (Case study 2), which illustrate mostly the relation between behavior, sensor morphology, and internal mechanism. Then I will discuss motor systems, in particular biped walking, where the exploitation of (physical) dynamics as well as the interrelationship between morphology and control are demonstrated. This will be followed by an introduction to developmental robotics which incorporates the major implications of the embodied approach to artificial intelligence and cognitive science. Because of its importance, I will devote an entire section to it (Section 4).
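
The sense-model-plan-act cycle described at the beginning of this section can be made concrete with a minimal sketch. The one-dimensional world, the dictionary keys and all function names below are hypothetical and were chosen only for illustration; they are not taken from any particular robot architecture. The sketch merely shows that, in this scheme, everything of interest happens in a central pipeline that operates on an internal representation.

```python
from typing import Dict, List


def sense(world: Dict[str, int]) -> Dict[str, int]:
    """Sense: read the raw input (here: signed distance to the goal)."""
    return {"distance_to_goal": world["goal"] - world["robot"]}


def model(percept: Dict[str, int]) -> Dict[str, int]:
    """Model: map the input onto an internal representation."""
    return {"goal_offset": percept["distance_to_goal"]}


def plan(internal_model: Dict[str, int]) -> List[str]:
    """Plan: derive a complete action sequence from the internal model."""
    offset = internal_model["goal_offset"]
    step = "right" if offset > 0 else "left"
    return [step] * abs(offset)


def act(world: Dict[str, int], steps: List[str]) -> None:
    """Act: execute the finished plan in the (toy) world."""
    for step in steps:
        world["robot"] += 1 if step == "right" else -1


def sense_model_plan_act(world: Dict[str, int]) -> List[str]:
    """One pass through the cycle; model and plan form the 'think' part."""
    percept = sense(world)
    internal_model = model(percept)
    steps = plan(internal_model)
    act(world, steps)
    return steps


if __name__ == "__main__":
    toy_world = {"robot": 2, "goal": 5}
    print(sense_model_plan_act(toy_world), toy_world)
```

Nothing in this pipeline depends on the shape of the robot or on where its sensors are mounted, which is the aspect that the case studies bring into focus.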

Figure 1. The Swiss Robots. (a) Robot with IR sensors and neural network implementing a simple avoidance reflex. (b) Clustering process. (c) Explanation of cluster formation. (d) Changed morphology: modified sensor positioning (details: see text).

Case study 1: The Swiss Robots

The Swiss Robots (Figure 1a) can clean an arena cluttered with Styrofoam cubes (Figure 1b), which is why they are called Swiss Robots. They can do this even though they are equipped only with a simple obstacle avoidance reflex based on infrared (IR) sensors. The reflex can be described as "stimulation of right IR sensor, turn left" or "stimulation of left IR sensor, turn right". If a robot happens to encounter a cube head-on, there will be no sensory stimulation because of the physical arrangement of the sensors. The robot will move forward and at the same time push the cube until it encounters another one on the side (Figure 1c), at which point it will turn away. If the position of the sensors is changed (Figure 1d), the robots no longer clean the arena, although the environment and the control program are exactly the same (for more detail, the reader is referred to Maris and te Boekhorst, 1996; Pfeifer and Scheier, 1998; or Pfeifer, 1996a, 1999). A powerful idea illustrated by this example is that, if the morphology is right, control can become much simpler (in this case a simple obstacle avoidance reflex leads to clustering behavior). This point will be further illustrated below when I discuss the trade-off between morphology and
control in the case study on the evolution of the morphology of an insect eye on a robot.
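
The avoidance reflex of Case study 1 is simple enough to be written down in a few lines. The following is a minimal sketch and not the controller that ran on the actual robots: the function name, the normalized sensor readings, the threshold, and the tie-break when both sensors fire are all assumptions made for illustration.

```python
def avoidance_reflex(left_ir, right_ir, threshold=0.5):
    """Return a steering command from two IR readings in [0, 1].

    Hypothetical sketch: names, threshold and tie-break are assumptions,
    not the code used on the Swiss Robots.
    """
    if right_ir > threshold and right_ir >= left_ir:
        return "turn_left"    # stimulation of right IR sensor, turn left
    if left_ir > threshold:
        return "turn_right"   # stimulation of left IR sensor, turn right
    return "forward"          # no stimulation: keep moving straight ahead


# A cube met head-on stimulates neither side-facing sensor,
# so the robot keeps driving and pushes the cube along.
print(avoidance_reflex(0.0, 0.0))
```

Nothing in this rule mentions cubes or clusters. In the account given above, the clustering arises only from the combination of this reflex with the placement of the sensors and the size of the cubes.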

Case study 2: Evolving the morphology of an insect eye on a robot

When sitting in a train, looking out the window in the direction of travel, a light point, say a tree, will travel slowly across the visual field as long as the tree is well in front and far away. The closer we are to the tree, the more the tree will move to the side, and the faster it will move across the visual field. This is called the phenomenon of motion parallax; it is solely a result of the geometry of the system-environment interaction and does not depend on the characteristics of the visual system. If the agent is moving at a fixed lateral distance to an object with a constant speed, we may want its motion detectors to deliver a constant value to reflect the constant speed. Assume now that we have an insect eye consisting of many facets or ommatidia. If they are evenly spaced, i.e., if the angles between them are constant (Figure 2a), different motion detector circuits have to be used for each pair of facets. If they are more densely spaced toward the front (Figure 2b), the same circuits can be used for motion detection in the entire eye. Indeed, this has been found to be the case in certain species of flies (Franceschini et al., 1992), where the same kind of motion detectors, the so-called Elementary Motion Detectors (EMDs), are used throughout the eye because the motion parallax is compensated away, so to speak. This is an illustration of how morphology can be traded for computation: a kind of preprocessing is performed by the morphology. How this trade-off is chosen depends on the particular task environment or, in natural systems, on the ecological niche: natural evolution has come up with a particular solution because morphology and neural substrate have co-evolved.

In order to explore these ideas, Lichtensteiger and Eggenberger (1999) evolved the morphology of an insect eye on a real robot (Figure 3). They fixed the neural substrate: the elementary motion detectors, which were taken to be the same for all pairs of facets, were not changed during the experiment. They used a flexible morphology in which they could adjust the angles at which the facets were positioned (Figure 3a), and an evolution strategy (Rechenberg, 1973) to evolve the angles for the task of maintaining a minimal lateral distance to an object. The results confirm the theoretical predictions: the facets end up with an inhomogeneous distribution with a higher density towards the front (Figure 3c). The idea of space-variant sensing (e.g., Ferrari et al., 1995; Toepfer et al., 1998) capitalizes on this trade-off and is gaining rapid acceptance in the field of robot vision.
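
The geometric argument of Case study 2 can be checked numerically. For an agent moving at constant speed v past a static point at lateral distance d, the point seen at angle theta from the heading moves across the visual field with angular velocity (v/d) sin^2(theta): slowly when it is far ahead and fastest when it is directly to the side. The sketch below is an illustration under these idealized assumptions, not the procedure used in the Eyebot experiment; the function names and the numbers are hypothetical. It computes facet angles such that the point passes from one facet to the next in equal time.

```python
import math


def angular_velocity(theta, speed, lateral_distance):
    """Angular velocity (rad/s) of a static point seen at angle theta (rad) from the heading."""
    return speed / lateral_distance * math.sin(theta) ** 2


def equal_time_facet_angles(n_facets, lateral_distance, x_front):
    """Viewing angles for which a point at the given lateral distance passes
    from one facet to the next in equal time at constant speed.

    The point's along-track offset x runs in equal steps from x_front
    (far ahead) down to 0 (directly to the side); theta = atan2(d, x).
    """
    step = x_front / (n_facets - 1)
    return [math.atan2(lateral_distance, x_front - i * step) for i in range(n_facets)]


angles = equal_time_facet_angles(n_facets=5, lateral_distance=1.0, x_front=4.0)
gaps = [b - a for a, b in zip(angles, angles[1:])]
print([round(math.degrees(a), 1) for a in angles])
print([round(math.degrees(g), 1) for g in gaps])
```

The angular gaps between neighbouring facets grow from the front towards the side, i.e., the facets are most densely packed at the front. With such a layout, one and the same detector tuned to a single facet-to-facet transit time serves every pair of facets.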

Figure 2. Trading morphology for computation. (a) Evenly spaced facets imply different motion detection circuits for different pairs of facets. (b) Inhomogeneous distribution of facets implying that the same neural circuits can be used for motion detection throughout the entire eye. (Labels in the figure: fast transition, constant transition, slow transition.)

Although these examples are very simple and obvious, they demonstrate the interdependence of morphology and control in sensory systems, a point that should always be explicitly taken into account but has to date not been systematically studied. Similar considerations apply to the motor system.

Case study 3: The passive dynamic walker

I start with an example illustrating the relation between morphology, materials, and control. The passive dynamic walker by Steve Collins (originally suggested by McGeer, 1990a, b), illustrated in Figure 4a, is a robot (or, if you like, a mechanical device) capable of walking down an incline without any actuation whatsoever. In other words, there are no motors and there is no control on the robot; it is brainless, so to speak. In order to achieve this task, the passive dynamics of the robot, its body and its limbs, must be exploited. This kind of walking is very energy efficient, but its ecological niche (i.e., the environment
in which the robot is capable of operating) is extremely narrow: it consists only of inclines of certain angles. The strategy is to build a passive dynamic walker, and then to extend its ecological niche and have the robot walk on a flat surface (and later in more complex environments) by adding only a little actuation and control. Energy efficiency is achieved because in this approach the robot is operated near one of its eigenfrequencies. A different approach has been taken by the Honda design team (see Figure 4b). There the goal was to have a robot that could perform a large number of movements. The methodology was to record human movements and then to reproduce them on the robot, which leads to relatively natural behavior of the robot. On the other hand, control is extremely complex and there is no exploitation of the intrinsic dynamics as in the case of the passive dynamic walker. The implication is that the movement is not energy efficient. It should be noted that even if the agent is as complex as the Honda robot, there is nothing in principle that prevents the exploitation of its passive dynamics.

There are two main conclusions that can be drawn from these examples. First, it is important to exploit the dynamics in order to achieve energy-efficient and natural kinds of movements. The term "natural" does not only apply to biological systems. Artificial systems also have their intrinsic natural dynamics. Second, there is a kind of trade-off or balance between exploitation of the dynamics, simplicity of control and amount of neural processing: the better the exploitation of the dynamics and the simpler the control, the less neural processing will be required, and vice versa.

At this point one might be tempted to say "Well, this is all very interesting, but how does it relate to the goal of artificial intelligence, i.e., understanding and building intelligent systems? How can robots help us make progress towards this goal?" I do not have an answer to these questions. However, I would like to propose an approach which might bring us closer to this goal: developmental robotics. Using this approach, I will show how the ideas developed so far in this paper can be taken one step further. This requires a bit of a digression into the foundations of cognition and its development.
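
The remark in Case study 3 that energy efficiency comes from operating near an eigenfrequency can be made concrete with the simplest possible model. Treating a freely swinging leg as an ideal pendulum is an assumption introduced here purely for illustration (the passive dynamic walker itself is considerably more complex), and the function name and leg lengths are hypothetical.

```python
import math


def pendulum_eigenfrequency(length_m, gravity=9.81):
    """Natural frequency (Hz) of an ideal simple pendulum for small swings."""
    return math.sqrt(gravity / length_m) / (2 * math.pi)


# Longer legs swing more slowly; a gait driven at this rhythm needs only
# small pushes to keep going, whereas any other rhythm has to be forced.
for length in (0.25, 0.5, 1.0):
    print(f"leg length {length} m: {pendulum_eigenfrequency(length):.2f} Hz")
```

In this idealization the preferred stepping rhythm is set by leg length and gravity alone, which is one way of seeing why the morphology, rather than the controller, determines what counts as a natural movement.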

4. Developmental robotics: a synthesis

Developmental robotics designates an approach whose goal is to design robots in which cognition develops as the robots interact with their physical and social environment over extended periods of time. For the purposes of the present paper I restrict myself to the physical environment. As I will show below, in this process morphology and materials play an essential role, and robots that utilize both can help us acquire a deeper understanding of the developmental processes. Before going into the details, let me briefly argue why development is essential to our understanding of cognition.

Figure 3. Evolving the morphology of an insect eye. (a) The Eyebot used for experiments on motion parallax. (b) The experiment seen from the top. The robot has to maintain a minimal lateral distance to an obstacle (indicated by the vertical light tube) by modifying its morphology, i.e., the positioning of the facet tubes. This is under the control of an evolution strategy. The same EMDs are used for all pairs of facets. (c) Final distribution of facets from three different runs. The front of the robot is towards the right. In all of the runs, the distribution is more dense towards the front than on the side. In all of them, there are no facets directly in the front of the robot. This is because of the low resolution (the aperture) of the tubes.

Figure 4. Two approaches to robot building. (a) The passive dynamic walker, (b) the Honda robot.

Rationale

Before I can introduce the approach, however, I need to make a short comment on categorization. One of the most fundamental abilities of agents (animals, humans, and robots) in the real world is their capacity to make distinctions: food has to be distinguished from non-food, predators from con-specifics, the nest from the rest of the environment, and so forth. This ability is also called categorization and forms the basis of concept formation and, ultimately, of high-level cognition. It turns out that making distinctions in the real world is very hard, since the proximal stimulation (the stimulation on the retina) from one and the same object varies greatly depending on distance, orientation, and lighting conditions. Moreover, the agent is confronted with a continuously changing stream of sensory stimulation which, in addition, strongly depends on the current behavior of the agent. Categorization in the real world is not well understood. As the vast literature on computer and robot vision documents, categorization and object recognition cannot be achieved by simply mapping the pixel array from a camera onto some form of internal representation. In categorization behavior, processes of sensory-motor coordination play an essential role.

Often, when a process is poorly understood, a developmental perspective may shed new light on it: once we understand how a particular ability came about, we may have a deeper understanding of it. The basic idea of a developmental approach, i.e., the attempt to understand cognition by investigating its ontogenetic development, is shared by an increasing number of researchers (e.g., Clark, 1997; Edelman, 1987; Elman et al., 1996; Metta et al., 1998; Thelen and Smith, 1994, to mention but a few). Thelen and
Smith, for example, argue that while in human infants behavior is initially highly sensory-motor and is directly coupled to the system-environment interaction, during development some processes become decoupled from the direct sensory-motor interaction, but the underlying neural mechanisms remain essentially the same. The discovery of mirror neurons (for an overview see, e.g., Rizzolatti et al., 2000), i.e., neurons that are equally activated when performing or just observing an action, adds validity to this view. The question of what the mechanisms are through which, over time, this decoupling from the environment takes place is, to my knowledge, an unresolved research issue.

And here is where robots might come into play, in spite of the fact that they are extremely simple compared to a human: abstractions, simplifications, and leaving out detail are always necessary in order to achieve explanatory power. In robots we can record the sensory-motor and internal states (e.g., of a neural network) and trace the entire developmental history. In particular, we can measure, and thus get a precise image of, the patterns of sensory stimulation that originate from an interaction with the real world such as grasping a cup. These patterns form, so to speak, the raw material for the neural substrate to process. It has been shown (Lungarella and Pfeifer, 2001; Pfeifer and Scheier, 1997, 1999; Scheier, Pfeifer, and Kuniyoshi, 1998) that the continuously varying, highly complex sensory stimulation is significantly simplified if the interaction is in the form of a sensory-motor coordination, i.e., behavior in which there is a tight coupling between the action and the sensory stimulation as, for example, in foveating, in moving towards an object, or in grasping. In other words, sensory patterns are induced by sensory-motor coordination. Note that sensory-motor coordination does not only include information processes but also physical processes. This sensory-motor coordination has strong information theoretic implications in that it reduces the complexity of the sensory-motor patterns, and this reduced complexity is a prerequisite for learning and development.

Exploiting morphology and materials

If our objective is to model human development, it seems natural to specifically employ a humanoid robot for this purpose, i.e., a robot whose morphology, at least superficially, resembles that of a human. Such robots typically have very many degrees of freedom and are, as such, hard to control. However, a look at the natural system may provide some inspiration towards a solution for the robots which then, in turn, may provide insights into the functioning of the natural (the human) system.

To this end, let us pursue the idea of exploiting the dynamics a little further and include material properties, which can also be conveniently exploited when designing actual robots. Most robot arms available today work with rigid materials and electrical motors. Natural arms, by contrast, are built of muscles, tendons, ligaments, and bones, materials that are non-rigid to varying degrees. All these materials have their own intrinsic properties like mass, stiffness, elasticity, viscosity, temporal characteristics, damping, and contraction ratio, to mention but a few. These properties are all exploited in interesting ways in natural systems. For example, there is a natural position for a human arm which is determined by its anatomy and its material properties. Grasping an object like a cup with the right hand is normally done with the palm facing left, but could also be done, with considerable additional effort, the other way around, i.e., with the palm facing right. Assume now that the palm of your right hand is facing right and you let go. Your arm will immediately turn back into its natural position. This is not achieved by neural control but by the properties of the muscle-tendon system. On the one hand, the system acts like a spring: the more you stretch it, the more force you have to apply and, if you let go, the spring moves back into its resting position. On the other hand, there is intrinsic damping. Normally, reaching an equilibrium position and damping are conceived of in terms of electronic (or neural) control, whereas in this case they are achieved (mostly) through the material properties. If these ideas are applied to robots, control becomes much simpler.

Many researchers have started building artificial muscles (for reviews of the various technologies see, e.g., Kornbluh et al., 1998, and Shahinpoor et al., 2000) and use them for robots (Figure 5). ISAC, a service robot, and the artificial hand by Lee and Shimoyama use pneumatic actuators, Cog uses series elastic actuators, and the Face Robot uses shape memory alloys. Facial expressions also provide an interesting illustration of the point made here. If the facial tissue has the right sorts of material properties in terms of elasticity, deformability, stiffness, etc., the neural control for the facial expressions becomes much simpler. Take, for example, smiling. It involves the entire face, but its actuation is very simple: the complexity is added by the tissue properties.
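
The arm example above, a spring with intrinsic damping that returns to its resting position without neural control, can be reproduced in a short simulation. This is a sketch under strong assumptions: a single joint modelled as a linear torsional spring-damper, with made-up stiffness, damping and inertia values; the function name is hypothetical. The point is only that the update rule contains no controller term.

```python
def release_arm(angle0, stiffness=20.0, damping=6.0, inertia=1.0, dt=0.001, duration=5.0):
    """Simulate a passive joint released from angle0 (rad); the rest position is 0.

    Hypothetical model: inertia * acceleration = -stiffness * angle - damping * velocity.
    There is no control input anywhere in the loop.
    """
    angle, velocity = angle0, 0.0
    trajectory = [angle]
    for _ in range(int(duration / dt)):
        acceleration = (-stiffness * angle - damping * velocity) / inertia
        velocity += acceleration * dt   # semi-implicit Euler: update velocity first
        angle += velocity * dt          # then position, using the new velocity
        trajectory.append(angle)
    return trajectory


trajectory = release_arm(angle0=1.5)
print(f"start {trajectory[0]:.2f} rad, end {trajectory[-1]:.4f} rad")
```

The joint settles at its rest position purely because of the assumed material properties. A controller for such a joint only needs to specify where the rest position should be, not how to get there or how to stop.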

Implications of morphology and materials for neural processing

Although artificial muscles have been and are increasingly used in robotics, their intrinsic dynamic properties (elasticity, damping, providing constraints through the muscle-tendon system) have to date not really been exploited. However, for developmental robotics their exploitation will be essential for a number of reasons.

Figure 5. Robots with artificial muscles. (a) ISAC (pneumatic actuators). (b) Cog (series elastic actuators). (c) Lee-Shimoyama hand (pneumatic actuators). (d) Face Robot (shape memory alloys).

Because of the material properties of the muscle-tendon system, control is not only simpler, but highly decentralized, thus freeing the control systems from a lot of neural processing. In addition, and more specifically, the human hand-arm-shoulder system has a particular morphology (or anatomy): the arms with the hands and fingers facing mostly inwards, i.e., towards the body. Assume now that there is random neural stimulation of the muscles in this system. Rather than performing a random movement, the arm will swing in a highly constrained fashion, the constraints being given by the anatomy and the material properties of the muscle-tendon system. For example, the palm with (the inside of) the finger tips will roughly face in the direction of the arm movement. Thus, if the hand hits an object (or the body for that matter), it will most likely hit it with the palm or the finger tips. Because the latter have an
extremely high density of sensors (in particular for touch), there will be rich haptic sensory stimulation. Assume also that a grasp reflex is triggered as the hand hits an object. On the one hand, this will generate rich sensory stimulation originating from the densely spaced sensors on the finger tips and, on the other hand, there is a high chance that the object will be brought into the visual field by this movement; perhaps the agent will even stick it into its mouth, thus creating an additional rich pattern of sensory stimulation. This way, temporarily stable patterns of sensory stimulation are induced in several sensory channels (the visual, the haptic, and the proprioceptive one). In other words, correlations are induced which can be exploited by the neural system for forming cross-modal associations, a process deemed essential in concept formation (e.g., Thelen and Smith, 1994). Over time, the information acquired through one sensory channel (e.g., the visual one) can become a partial predictor for the information obtained from a different one (e.g., the haptic or olfactory one).

Because the sensory stimulation and the state of the neural networks controlling a robot can be fully recorded, the sensory patterns induced can be quantitatively analyzed, as demonstrated by Lungarella and Pfeifer (2001). This type of analysis provides the basis for exploring models of neural mechanisms of categorization from which an artificial developmental process can be bootstrapped that might eventually lead to behavior that we would call high-level cognition. Moreover, this type of analysis can then be used to formulate hypotheses about the neural processing and control in natural systems, because currently relatively little is known about the true sensory stimulation and the internal processes (though great strides have been made by employing brain imaging techniques). This closes the loop from natural systems to robots and back to natural systems. The story is much longer, but these comments should illustrate the basic ideas.
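
The idea that sensory-motor coordination induces correlations across sensory channels can be illustrated with a toy calculation on recorded time series. The sketch below is not the analysis of Lungarella and Pfeifer (2001): the data are made up, the variable names are hypothetical, and a plain Pearson correlation stands in for the information theoretic measures mentioned in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up recordings. In the coordinated case, the visual and the haptic
# channel are both driven by the same grasping movement.
movement = [math.sin(0.1 * t) for t in range(200)]
visual = [2.0 * m + 0.5 for m in movement]
haptic = [0.8 * m - 0.1 for m in movement]
# In the uncoordinated case, the haptic channel follows an unrelated rhythm.
haptic_unrelated = [math.sin(0.37 * t + 1.0) for t in range(200)]

coupled = pearson(visual, haptic)
uncoupled = pearson(visual, haptic_unrelated)
print(f"coordinated: {coupled:.2f}, uncoordinated: {uncoupled:.2f}")
```

In the coordinated case one channel predicts the other almost perfectly, whereas in the uncoordinated case it carries hardly any information about it. This is the sense in which a coordinated interaction hands the neural substrate structured rather than arbitrary input.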

5. Summary and conclusions

Let us briefly summarize the main points from the various case studies. Traditionally, the focus of research in artificial intelligence has been on the control architecture (in the form of computer programs or neural network models). By contrast, in the present approach we have looked at complete agents and their interaction with the real world. It has been shown that there is an intricate relation between morphology, materials and control and that all these aspects, not only control, are essential in understanding intelligent behavior. The
Didabot experiment demonstrated that by exploiting the constraints of the ecological niche (Styrofoam cubes of a particular size, closed arena), control can be enormously simplified. Experiments with the Eyebot showed how appropriate morphology can perform some of the preprocessing which, because it is performed by the physics, is fast and for free. The passive dynamic walker illustrated the exploitation of morphology to achieve natural locomotion with no or, in general, little control and high energy efficiency. Finally, the approach of developmental robotics integrates all of these ideas by incorporating processes of sensory-motor coordination where sensor morphology (physical nature and distribution of the sensors on the agent), morphology of the motor system (anatomy), and materials (properties of the muscle-tendon system) all work together to provide the basis for cognitive development.

As mentioned earlier, one cannot say with certainty that these ideas would not have evolved had it not been for the robots, but certainly, using robots as cognitive tools has helped a great deal for the following reasons:

- Robots are physical systems interacting with the real world. Designing and building robots for particular tasks forces the designer's attention onto the fundamental issues, and there is no glossing over hard problems.
- Although robots are different from natural systems, because of their nature as real-world devices they are subject to the same physical conditions as natural systems, which makes them excellent candidates for the synthetic methodology.
- Sensory-motor and internal state can be measured, recorded into time series files and analyzed, thus providing an objective basis from which to bootstrap a process of ontogenetic development. Recording sensory stimulation and internal state is only possible to a very limited extent in natural systems.
- Different control schemes can be explored (network or otherwise), and experiments can be performed with different sensory and motor systems and with different materials, which may or may not exist in nature. Because the systems can be engineered, there is much more flexibility in experimentation than if one works with natural systems, as in the analytic framework. This is essential in generating testable hypotheses for biologists.
- Because experiments are performed on artifacts in the real world, these artifacts may be exploited for practical applications.
- Robots provide an excellent vehicle for communication in transdisciplinary projects, which is notoriously hard as researchers from different fields not only have different backgrounds, but also use different languages.

As mentioned in the introduction, it is interesting to view this approach of using robots as cognitive tools in terms of scaffolding. By designing and building robots we structure our research environment in ways that enable new, more sophisticated types of interactions. This is one of the central goals of Cognitive Technology. I expect the potential of this approach to increase with progress in embodied cognitive science in general, and with robotics technology and material science in particular.

Going back to our initial characterization of Cognitive Technology as a discipline concerned with how the human mind can be explored via the very technologies it produces, I have pointed to one particular way in which this might be achieved, namely by employing the synthetic methodology. Historically speaking, technology has always shaped the way human nature has been perceived. Recently, computer technology has suggested the very powerful and popular metaphor of the "brain as a computer". In this paper, I have suggested that, by instantiating embodied systems, robotics technology provides an entirely novel and more appropriate perspective on the functioning of the human mind. More importantly, from the viewpoint of Cognitive Technology, it also opens the window for further investigations into the workings of the tool-empowered, embodied mind.

Note
* This paper contains significant portions of two papers entitled "Embodied Artificial Intelligence" (Pfeifer, 2001) and "Teaching powerful ideas with autonomous mobile robots" (Pfeifer, 1996a). I would like to thank Barbara Gorayska for suggesting that I write this chapter to further Cognitive Technology research. I would also like to thank the members of the Artificial Intelligence Laboratory for many discussions and for their patience in discussing the same issues with me over and over again. Last but not least, I would like to thank the Swiss National Science Foundation, which has generously supported this research with grants #1165310.01 and #2061372.00.

References
Brooks, R. A. (1991a). Intelligence without representation. Artificial Intelligence, 47, 139-160.
Brooks, R. A. (1991b). Intelligence without reason. Proceedings International Joint Conference on Artificial Intelligence-91, 569-595.
Clancey, W. J. (1997). Situated cognition. On human knowledge and computer representations. Cambridge, UK: Cambridge University Press.
Clark, A. (1997). Being there: Putting brain, body, and world together again. Cambridge, Mass.: MIT Press.
Edelman, G. E. (1987). Neural Darwinism. The theory of neuronal group selection. New York: Basic Books.
Elman, J. L., E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi & K. Plunkett (1996). Rethinking innateness. A connectionist perspective on development. Cambridge, Mass.: MIT Press.
Ferrari, F., P. Q. J. Nielsen & G. Sandini (1995). Space variant imaging. Sensor Review 15, 17-20.
Franceschini, N., J. M. Pichon & C. Blanes (1992). From insect vision to robot vision. Philosophical Transactions of the Royal Society, London B, 337, 283-294.
Hara, F. & R. Pfeifer (2000). On the relation among morphology, material and control in morpho-functional machines. In J. A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W. Wilson (Eds.), From Animals to Animats 6. Proceedings of the sixth International Conference on Simulation of Adaptive Behavior 2000, pp. 33-40.
Kornbluh, R. D., R. Pelrine, J. Eckerle & J. Joseph (1998). Electrostrictive polymer artificial muscle actuators. In Proceedings of 1998 IEEE International Conference on Robotics and Automation, pp. 2147-2154. New York, N. Y.: IEEE.
Lichtensteiger, L. & P. Eggenberger (1999). Evolving the morphology of a compound eye on a robot. In Proceedings of the third European Workshop on Advanced Mobile Robots (Eurobot99) (Cat. No.99EX355), pp. 127-134. Piscataway, N. J.: IEEE.
Lungarella, M. & R. Pfeifer (2001). Robots as cognitive tools: information theoretic analysis of sensory-motor data. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, 2001, pp. 245-252.
Maris, M. & R. te Boekhorst (1996). Exploiting physical constraints: heap formation through behavioral error in a group of robots. In Proceedings of IROS 96, IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1655-1660.
McGeer, T. (1990a). Passive dynamic walking. International Journal of Robotics Research 9, 62-82.
McGeer, T. (1990b). Passive walking with knees. In Proceedings of the IEEE Conference on Robotics and Automation 2, pp. 1640-1645.
Metta, G., G. Sandini & J. Konczak (1998). A developmental approach to sensori-motor coordination in artificial systems. In Proceedings of IEEE Conference on Systems, Man and Cybernetics, pp. 11-14. San Diego, USA. Piscataway, N. J.: IEEE Service Center.
Pfeifer, R. (1996a). Teaching powerful ideas with autonomous mobile robots. Computer Science Education 7, 161-186.
Pfeifer, R. (1996b). Building Fungus Eaters: Design principles of autonomous agents. In P. Maes, M. Mataric, J. A. Meyer, J. Pollack & S. W. Wilson (Eds.), From Animals to Animats 4, Proceedings of the 4th International Conference on Simulation of Adaptive Behavior, pp. 3-12. Cambridge, Mass.: A Bradford Book, MIT Press.
Pfeifer, R. (1999). Dynamics, morphology, and materials in the emergence of cognition. In W. Burgard, T. Christaller & A. B. Cremers (Eds.), KI-99: Advances in Artificial Intelligence. Proceedings of the 23rd Annual German Conference on Artificial Intelligence, Bonn, Germany, 1999, Lecture Notes in Computer Science 1701, pp. 27-44. Berlin: Springer.
Pfeifer, R. (2000). On the role of morphology and materials in adaptive behavior. In J. A. Meyer, A. Berthoz, D. Floreano, H. Roitblat & S. W. Wilson (Eds.), From Animals to Animats 6. Proceedings of the sixth International Conference on Simulation of Adaptive Behavior 2000, pp. 23-32.
Pfeifer, R. (2001). Embodied Artificial Intelligence: 10 years back, 10 years forward. In R. Wilhelm (Ed.), Informatics: 10 years back, 10 years ahead. Lecture Notes in Computer Science, pp. 294-310. Berlin: Springer.
Pfeifer, R. & C. Scheier (1997). Sensory-motor coordination: the metaphor and beyond. Robotics and Autonomous Systems 20, 157-178.
Pfeifer, R. & C. Scheier (1998). Representation in natural and artificial agents: an embodied cognitive science perspective. Zeitschrift für Naturforschung 53c, 480-503.
Pfeifer, R. & C. Scheier (1999). Understanding intelligence. Cambridge, Mass.: MIT Press (2nd printing 2000, paperback edition).
Rechenberg, I. (1973). Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Stuttgart: Frommann-Holzboog.
Rizzolatti, G., L. Fogassi & V. Gallese (2000). Cortical mechanisms subserving object grasping and action recognition: A new view of the cortical motor functions. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences, pp. 539-552. Cambridge, Mass.: MIT Press.
Scheier, C., R. Pfeifer & Y. Kuniyoshi (1998). Embedded neural networks: exploiting constraints. Neural Networks 11, 1551-1569.
Shahinpoor, M., Y. Bar-Cohen, J. O. Simpson & J. Smith (2000). Ionic Polymer-Metal Composites (IPMC) as biomimetic sensors, actuators & artificial muscles: A review. http://www.unm.edu/~amri/paper.html
Thelen, E. & L. Smith (1994). A dynamic systems approach to the development of cognition and action. Cambridge, Mass.: MIT Press, Bradford Books.
Toepfer, C., M. Wende, G. Baratoff & H. Neumann (1998). Robot navigation by combining central and peripheral optical flow detection on a space-variant map. In Proceedings of the Fourteenth International Conference on Pattern Recognition, pp. 1804-1807. Los Alamitos, CA: IEEE Computer Society.
Vinkhuyzen, E. (1998). Expert systems in practice. Unpublished Ph.D. Dissertation, University of Zurich.
Winograd, T. & F. Flores (1986). Understanding computers and cognition. Reading, Mass.: Addison-Wesley.

The origins of narrative


In search of the transactional format of narratives in humans and other social animals*
Kerstin Dautenhahn
University of Hertfordshire

1. Introduction: The social animals

Humans share fundamental cognitive and behavioral characteristics with other primates, in particular apes (orangutan, gorilla, chimpanzee, bonobo). Although it is widely accepted that humans and other apes have a common ancestor and that human behavior and cognition are grounded in evolutionarily older characteristics, many people still insist that human intelligence and human culture are unique and qualitatively different from those of most (if not all) non-human animals. Traditionally, human language has often served as an example of a unique characteristic. However, due to Donald Griffin, the founder of the field of cognitive ethology, it is recognized as a valid endeavor to study the evolutionary continuity of mental experiences (Griffin, 1976). Humans are not discontinuous from the rest of nature.

The particular topic of this chapter is narrative. With a few exceptions (Read and Miller, 1995), most discussions on the narrative mind have neglected the evolutionary origins of narrative. Research on narrative focuses almost exclusively on language in humans (see, e.g., Turner, 1996). Similarly, narrative is often conceived of as a (sophisticated) art form, rather than serving a primarily communicative function. The work presented here1 argues that human narrative capacities are not unique and that an evolutionary continuity exists that links human narratives to transactional narrative formats in social interactions among non-human animals. Also, from a developmental point of view, we argue that narrative capacities develop from pre-verbal, narrative, transactional formats that children get engaged in with their parents and peers. Instead of focusing on differences between humans and other animals, we point out
similarities and evolutionary, shared histories of primates with specific regard to the origins and the transactional format of narratives.

The chapter sets off by reviewing the main arguments of a debate that is currently being conducted intensively in primatology and anthropology, namely, that the primary function of human language might have been its capacity to afford coping with increasingly complex social dynamics. Based on this framework of the social origin of human intelligence, we discuss the Narrative Intelligence Hypothesis (NIH), first suggested in (Dautenhahn, 1999), which points out the intertwined relationship between the evolution of narrative and the evolution of social complexity in primate societies. The underlying assumptions and arguments are discussed in greater detail. The NIH as referred to in this paper consists of the following line of arguments:

a. Individualized societies are a necessary (but possibly not sufficient) substrate for the evolution of narratives. In such societies members know each other individually.
b. The specific narrative format of such transactions serves an important communicative function among primates and has possibly evolved independently in other groups of species that live in individualized societies. Narrative capacities co-evolved in order to cope with increasingly complex dynamics.
c. The evolution of communication in terms of narrative language (story-telling) was an important factor in human evolution that has shaped the evolution of human cognition, societies and human culture. The use of language in a narrative format provided an efficient means of social grooming that maintains group coherence.
d. Pre-verbal transactions in narrative format bootstrap a child's development of social competence and social understanding.
e. Human cultures, which are fundamentally narrative in nature, provide an environment that young human primates are immersed in and facilitate not only the development of a child into a skilled story-teller and communicator, but also the development of an autobiographical self.

The NIH is speculative and part of ongoing research. The particular contribution of this chapter is that it discusses in more detail the transactional and canonical format of narrative that can be found in different verbal and non-verbal social interactions among primates, and in the preverbal communication of human infants.² While this chapter discusses work in progress, it is hoped that future research in this area can lead to a theory of the (social) origins of
narrative. Essential for the development of such a theory is empirical evidence. The current paper only provides supporting material that helps in (a) synthesizing ideas from various research fields and (b) formulating the NIH. In Section 5 we discuss experiments that are needed in order to test/falsify a theory on the social origins of narrative. The NIH implies a better understanding of the origins of narrative intelligence in humans and other animals. Such an understanding can point out issues relevant to the design of narrative technology. Therefore, Section 6 concludes this paper by discussing implications of the NIH for technology that meets the social and cognitive needs of human story-tellers.

2. The Social Brain Hypothesis

Primate societies are individualized societies with complex types of social interactions, social relationships and networks. In individualized societies, group members individually recognize each other and interact with each other based on a history of interactions as part of a social network. Many mammal species (such as primates, elephants, and cetaceans) live in highly individualized societies, as do bird species such as corvids and parrots. Preserving social coherence and managing cooperation and competition with group members are important aspects of living in individualized societies. Dealing with such a complex social field often requires sophisticated means of interaction and communication, which are important for the Narrative Intelligence Hypothesis discussed in this article.

2.1 Primate group sizes and the neocortex

Why do humans have, relatively speaking, large brains? No other organ of the human body consumes as much of the body's energy (20%), even at rest, while making up only 2% of an adult's body weight. How can human primates afford such an expensive organ? What were the particular selective pressures in human evolution that led to such costly brains, or to put it differently, what are brains good for? In the context of human (or generally primate) intelligence, the Social Intelligence Hypothesis (SIH), sometimes also called the Machiavellian Intelligence Hypothesis or Social Brain Hypothesis, suggests that the primate brain and primate intelligence evolved in adaptation to the need to operate in large groups
where the structure and cohesion of the group required a detailed understanding of group members (cf. Byrne and Whiten, 1988; Whiten and Byrne, 1997; Byrne, 1997). Given that maintaining a large brain is very costly, it is assumed that the necessity to evolve social skills (which allow interpreting, predicting and manipulating conspecifics) has been a prominent selective factor accelerating primate brain evolution. Identifying friends and allies, predicting the behavior of others, knowing how to form alliances, manipulating group members, and making war, love and peace are important ingredients of primate politics (de Waal, 1982). Thus, there are two interesting aspects of primate sociality: it served as an evolutionary constraint which led to an increase of brain size in primates, which, in turn, led to an increased capacity to further develop social complexity.

Research in primatology that studies and compares cognitive and behavioral complexity in and among primate species can shed light on the origins of primate cultures and societies. Particularly relevant for the theme of this chapter are the potential relationships between social complexity and brain evolution. A detailed analysis by Dunbar and his collaborators (Dunbar, 1992, 1993, 1998) suggests that the mean group size N is a function of relative neocortical volume CR (the volume of the neocortex divided by the volume of the rest of the brain; see formula (1) and Figure 1).
\log_{10}(N) = 0.093 + 3.389 \, \log_{10}(C_R) \qquad (1)

This correlation does not provide hard evidence, which is fundamentally difficult to obtain for many aspects of the evolution of animal (and human) minds, but it supports the argument that social complexity might have played a crucial role in primate brain evolution. In order to manage larger groups, bigger brains might provide the required processing capacity. No such correlations have been found when comparing the increase in neocortex size with the complexity of the environment, such as the size of the home range of a species.³ The causality and complexity of the argument "complex social dynamics led to larger neocortices" are still not completely understood, but in primatology it is currently widely acknowledged that social complexity provided an important, and possibly causal, factor in the evolution of primate (social) intelligence.

How can primate societies cope with an increase in the number of group members and relationships among them? How are social networks and relations established and maintained? How are cohesion and stability preserved? What are the mechanisms that serve as "social glue"?

[Figure 1: mean group size (annotated as social complexity) plotted against neocortex ratio]

Figure 1. Group size plotted against neocortex ratio (logarithmic scales). Correlations were found, e.g., in 36 primate genera (Dunbar, 1993). Similar relationships (not necessarily on the same grade as the primate regression) have been found in carnivores, some insectivores (Dunbar and Bever, 1998), cetaceans (Marino, 1996), and some bats (Barton and Dunbar, 1997). Thus, it seems that a common relationship between social complexity and encephalization (the relationship between brain size and body size) can be found in animal species that live in stable social groups, although each species might live in very distinctive environments, with very distinctive brains and very different evolutionary histories.

2.2 Preserving cohesion in primate societies: Grooming and language

Judging from our own experience as members of human society, communicating via language seems to be the dominant mechanism for preserving social cohesion. However, non-human primates in the wild do not seem to use a human-like language. Here, social cohesion is maintained over time by social grooming. Social grooming patterns generally reflect social relationships; they are used as a means to establish coalition bonds, for reconciliation and consolation, and for other important aspects of primate politics. Social grooming is a one-to-one behavior extended over time, which constrains the amount of time an animal can spend on it, given other needs such as feeding, sleeping, etc. Also, cognitive constraints limit the complexity of social dynamics that primates can cope with, as discussed in the following paragraph.

Given the neocortical size of modern humans, Dunbar (1993) extrapolated from the non-human primate regression (relative neocortical volume vs. group size) and predicted a group size of 150 for human societies. This number limits the number of relationships that an individual human can remember and monitor. It is the upper group size limit which still allows social contacts that can be maintained and interaction histories that can be remembered, thus supporting effective coordination of tasks and information flow via direct person-to-person contacts. The number 150 is supported by analyses of contemporary and historical human societies. But how do humans preserve cohesion in groups of 150 individuals, a function that (physical) social grooming serves in non-human primate societies? In terms of survival needs (resting, feeding, etc.), primates can only afford to spend around 20% of their time on social interactions and social grooming, much less than a group size of 150 requires. It was therefore suggested by Dunbar (1993) that, in order to preserve stability and coherence in human societies, human language has evolved as an efficient mechanism of social bonding, replacing the social grooming mechanisms of non-human primate societies, where direct physical contact affords only much smaller groups. Following this argument, language allowed an increase in group size while still preserving stability and cohesion within the group. The next section elaborates this argument further by analyzing which particular features of communication via language make it an efficient "social glue" in human societies.
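
As a worked illustration of formula (1), the following minimal Python sketch evaluates the regression for a given neocortex ratio. The function name predicted_group_size and the human neocortex ratio of 4.1 are assumptions made for illustration: the ratio is the value commonly quoted from Dunbar's analysis and is not stated in this chapter. With that input the regression yields a value of roughly 148, which is consistent with the rounded figure of 150 used above.

```python
import math

# Coefficients of formula (1): log10(N) = 0.093 + 3.389 * log10(CR)
INTERCEPT = 0.093
SLOPE = 3.389


def predicted_group_size(neocortex_ratio: float) -> float:
    """Return the mean group size N predicted by formula (1).

    neocortex_ratio is CR: neocortex volume divided by the volume
    of the rest of the brain. It must be positive.
    """
    if neocortex_ratio <= 0:
        raise ValueError("neocortex ratio must be positive")
    log_n = INTERCEPT + SLOPE * math.log10(neocortex_ratio)
    return 10 ** log_n


# Assumption: a human neocortex ratio of about 4.1 (not given in this chapter).
HUMAN_NEOCORTEX_RATIO = 4.1

if __name__ == "__main__":
    n = predicted_group_size(HUMAN_NEOCORTEX_RATIO)
    print(f"Predicted human group size: {n:.1f}")
```

Because formula (1) is a power law in CR, small changes in the assumed ratio shift the predicted group size considerably, so the result is best read as a rough estimate rather than a precise limit.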

3. The Narrative Intelligence Hypothesis

According to the primatologist Richard Byrne (Byrne, 1997), in the context of the evolution of human intelligence, the Social Intelligence Hypothesis offers little explanation for the evolution of specific ape and human kinds of intelligence (e.g., involving mental representations): clear evidence for a systematic monkey-ape difference in the neocortex ratio is lacking. Great apes do not form systematically larger groups than monkeys do, which draws attention to physical rather than social factors (e.g., tool use, processing plant food, etc.). Why have human apes in particular evolved sophisticated representational and mental skills? Are there any candidate factors that could have accelerated the evolution of human intelligence? If the evolution of language played an important role, as suggested by others (e.g., Dunbar, 1993; Donald, 1993), what are the particular characteristics of language that matter?

3.1 What is special about language?

A closer look at the ontogeny of language and narrative, i.e., the role of language in the development of children, can give important hints about the special characteristics of language. Studies in developmental psychology of how children develop narrative skills show that narratives play a crucial role in how young human primates become socially skilled individuals (cf. Nelson, 1993; Engel, 1995). Narrative psychology considers stories the most efficient and natural human way to communicate, in particular, to communicate about others (Bruner, 1987, 1990, 1991). According to Read and Miller, stories are universally basic to conversation and meaning making, and as developmental and cross-cultural studies suggest, humans appear to have a readiness, from the beginning of life, to hear and understand stories (Read and Miller, 1995, p. 143).

The Narrative Intelligence Hypothesis (Dautenhahn, 1999) interprets such observations from the ontogeny of human language in the context of primate evolution. It proposes that the capacity to communicate in stories co-evolved with increasing social dynamics among our human ancestors, in particular the necessity to communicate about third-party relationships (which in humans seems to reach the highest degree of sophistication among all apes; cf. gossip and manipulation (Sinderman, 1982)). According to the NIH, human narrative intelligence might have evolved because the format of narrative is particularly suited to communicating about the social world. Thus, in human evolution, we can observe an evolutionary trend from physical contact (non-human primates) to vocal communication and language (hominids), to communicating in stories (highly enculturated humans living in complex societies), correlated with an increase in complexity and sophistication of social interaction and communication.
This trend demonstrates the evolution of increasingly efficient mechanisms for time-sharing the processes of social bonding. While physical grooming is generally a dyadic activity, language can be used in a variety of ways, extending the dyadic use in dialogues to, e.g., one-to-many communication as it is used extensively in the mass media (television, books, email, etc.) today. It can be estimated (Dunbar, 1993) that the human bonding mechanism of language is about 2.8 times as efficient as social grooming (the non-human primate bonding mechanism). Indeed, Dunbar's studies indicate that conversational groups usually consist of one speaker plus two or three listeners. Of course, larger groups can be formed easily, but in terms of actively participating and following different arguments within
the group, 1+2(3) seems to be the upper limit for avoiding information processing overload in the primate social brain. Also, because of its representational nature, language affords documentation, preservation in storage media and transmission of (social) knowledge to the next generation, as well as communication between geographically separated locations (Donald, 1993).

3.2 Narrative, the social context and meaning

Discussions in the social domain (e.g., on social relationships and feelings of group members) are fundamentally about personal meaning (Bruner, 1990). Narrative might be the natural format for encoding and transmitting meaningful, socially relevant information (e.g., emotions and intentions of group members). Humans use language to learn about other people and third-party relationships, to manipulate people, to bond with people, and to break up or reinforce relationships. Studies suggest that people spend about 60% of conversations on gossiping about relationships and personal experiences (Dunbar, 1993). Thus, a primary role of language might have been to communicate about social issues, to get to know other group members, to synchronize group behavior, and to preserve group cohesion.

To summarize, the following strategies of coping with a complex social field in primate societies were outlined in the preceding sections:

stage 1: non-verbal, physical, social grooming as a means of preserving group cohesion, limited to one-to-one interaction
stage 2: communicating about social matters and relating to others in the narrative format of transactions with non-verbal enacted stories (see Section 4)
stage 3: using language and verbal narratives in order to cope with social life

The Narrative Intelligence Hypothesis suggests that the evolution and development of human narrative capacities might have gone through these different stages, not replacing preceding stages, but adding additional strategies that extend an individual's repertoire of social interaction.
These range from physical contact (e.g., in families and very close relationships) to preverbal narrative communication in transactions with others (let alone the subtleties of body language and non-verbal social cues, not necessarily conscious (cf. Hall, 1968; Farnell, 1999)), to developing into a skilled story-teller within the first years of life and refining these skills throughout one's life. The next section gives a few examples of where we might find narratives in the behavior of
humans, other animals, and possibly even artifacts. To begin with, we need to have a closer look at the specific canonical format of narrative.

4. In search of narratives

4.1 What are narratives?

Many definitions and theories of narrative exist in the literature. In the following we select and discuss a few definitions. With respect to adult literature and conversation, what we usually mean by narrative is a story with the following structure: First, a certain introduction of the characters is given (making contact between individuals, actors, listener and speaker). Then, the story develops a plot, namely a sequence of actions over time that conveys meaning (value, pleasurable, not pleasurable), usually with a high point and a resolution (reinforcement or break-up of relationships), and focuses on unusual events rather than stereotypical events. Note that such a structure is typical for adult narratives. Children's narratives can have rudiments of this structure but still count as narratives that describe the personal experience of the story-teller.

Children are not born as perfectly competent story-tellers. The format of narrative story-telling is rapidly learnt by typically developing children during their first years of life. Here, the social environment is crucially important in developing and shaping children's narrative skills (Engel, 1995; Nelson, 1989, 1993); narrative skills are socially constructed. The narrative styles and abilities of children develop during the daily interactions and conversations that they participate in and listen to. The environment, e.g., parental input, shapes and influences this development. Children's narrative styles and abilities reflect particular styles, values and conventions used in families, cultures, etc. Story-telling development is a highly social and interactive process; tutoring is usually informal and playful (Engel, 1995). The typical adult story format of beginning, middle and end is usually mastered by 3-year olds. The following two examples show the differences between a typical story of a 2-year old girl and a 5-year old boy, both telling the story to their mother:
We went trick and treating. I got candy. A big red lolly-pop and I lost my hat. (Engel, 1995, p. 16)

Once there was a monster that lived where other monsters lived just like him. He was very nice. He made bad people good. He lived always happy. He loved
to play with kids. One day he gets caught in a hurricane. The lights went off except there was flashlights there. He jumped into the ocean. He meeted all the fish. And he lived in water. (Engel, 1995, p. 71)

Thus, children's stories can occur in rudimentary forms (see first example), or can be elaborated (second example), from which the step towards fully-fledged adult story-telling seems relatively small. Note that even the rudimentary story told by a 2-year old is successful in terms of its meaning and its communicative function: a significant experience in the child's life is recalled and reconstructed.

While many discussions of the format of narrative focus on narratives in oral format, in this chapter we refer to narrative in a wider sense, including written and spoken narratives. Note that structural aspects relating to the syntax of a story are not our primary concern. Story grammars (Mandler, 1984), i.e., notational, rule-based systems for describing regularities and formal structures in stories (e.g., in traditional folk-tale or problem-solving stories), are not the kind of narrative formats that are the focus of this article. Instead, we stress the transactional nature of narratives and the way that narratives convey meaning, can create intersubjectivity and are embedded in a social context. According to this perspective, we tentatively propose the following definition of narrative:

A narrative consists of verbal (spoken or written) or non-verbally enacted social transactions with the following necessary properties:

a. Narratives have an important communicative function. We use stories to guide and shape the way we experience our daily lives, to communicate with other people, and to develop relationships with them. We tell stories to become part of the social world and to know and reaffirm who we are (Engel, 1995, p. 25). Thus, the major topic of narratives is the social field, involving transactions of intentional, social agents acting in a social context. Narratives are means to create intersubjectivity between people who communicate with each other, or between ourselves and our former or future self, which leads us to the second important property of narrative:

b. Remembered experience, when put in the format of a narrative, allows us to think about the past (Engel, 1995, p. 26) and to go back in time. More generally, narrative extends the temporal horizon from the present (the here and now), to the past (how things used to be) and to the future (how things might be) (cf. Nehaniv et al., 1999). Narratives allow us to travel back and forth in time, to create imaginary or alternative realities, to re-interpret the past, and in this way are fundamentally different from communicative non-narrative events that are limited to the immediate present. One might speculate that,
because narratives extend the temporal horizon, they are crucial to the development of a self (Nelson, 1989, 1993), an autobiographic self. But what is the format of narrative that provides all this, namely, creating intersubjectivity and extending the temporal horizon? We suggest that:

c. The narrative follows a particular transactional format which, in its simplest form, found in preverbal children and possibly other non-verbal non-human animals, consists of the following sequence: canonical steady state, precipitating event, restoration, and a coda marking the end. This transactional format was suggested by Bruner and Feldman (1993); see Section 4.2. Other transactional formats of narrative might exist, but, for the purpose of this paper, we focus on this simple format suggested by Bruner and Feldman.

In the fields of narrative psychology and narrative intelligence, Jerome Bruner's theories and work have been very influential (Bruner, 1987, 1990, 1991). Particularly relevant to this chapter is Bruner's notion that stories primarily deal with people and their intentions; they are about the social and cultural domain rather than the domain of the physical world. Narratives are often centered on subjective and personal experience. According to Bruner (1991), narrative is a conventional form that is culturally transmitted and constrained. Narrative is not just a way of representing or communicating about reality; it is a way of constituting and understanding (social) reality. Unlike scripts (Schank and Abelson, 1977), which describe regular events, narratives are about unusual events, things worth telling (Bruner, 1991). Narratives describe people or other intentional and mental agents, acting in a setting in a way that is relevant to their beliefs, desires, theories, values, etc., and they describe how these agents relate to each other.
Although narrative capacities (understanding and producing stories) are shaped by society, they clearly develop in an individual (cf. Nehaniv, 1997; Dautenhahn and Coles, 2001) and carry important meaning for the individual agent. For example, stories that children tell to themselves play an important part in a child's ability to make meaning of events (cf. Nelson, 1989; Engel, 1995). Nevertheless, stories, at least for fundamentally social animals such as humans, are most effective in communication in a social context:
We converse in order to understand the world, exchange information, persuade, cooperate, deal with problems, and plan for the future. Other human beings are a central focus on each of these domains: We wish to understand other people and their social interactions; we need to deal with problems
involving others; and other people are at the heart of many of our plans for the future. (Read and Miller, 1995, p. 147)

Human culture has developed various means of artistic expression (sequential visual arts, dance, pantomime, comics, literature, etc.) which are fundamentally narrative in nature, conveying meaning about people and how people relate to the world. Children who are immersed in human culture and exposed to those narratives develop into skilled story-tellers, as is shown in the following story called "Jig Jag's Day", written by a 9-year old girl when asked to write a story about a robot. This story and the one mentioned in Section 4.2 were part of a project with typically developing children, summarized in (Bumby and Dautenhahn, 1999). The story fits Bruner's criteria very well:
Once there was a robot called Jig Jag and Jig Jag lived in the countryside. One day Jig Jag's lights started to flash, that meant that the robot had an idea. I think I will go for a walk, so Jig Jag went into a field with some sheep in it and the silly robot tried to talk to the sheep, Silly, silly, Jig Jag. Next Jig Jag saw some cows in the next field, so silly Jig Jag tried to talk to the cows! After that Jig Jag went to the shops, he wanted to buy some bolts and oil. So Jig Jag went into the hardware shop, but poor Jig Jag set the alarm off. So Jig Jag went into another hardware store across the road. So the robot tried to get into the shop but again Jig Jag set the alarm off. So poor Jig Jag had to go home empty handed.

4.2 Narratives and autism

Traditionally, psychologists interested in the nature and development of narratives have viewed narratives in terms of human verbal storytelling. Interestingly, Bruner and Feldman (1993) proposed the narrative deficit hypothesis of autism, a theory of autism that is based on a failure of infants to participate in narrative construction through preverbal transactional formats. Children with autism generally have difficulty in communication and social interaction with other people. A variety of competing theories attempt to explain the specific communication and social deficits of people with autism (Jordan, 1999). Among them is the well-known Theory of Mind (TOM) (cf. Leslie, 1987; Baron-Cohen, 1995). TOM models of mindreading have a clear modular, computational and metarepresentational nature. However, the TOM explanation of autistic deficits is controversial, and other researchers suggest that primary deficits in emotional, interactive, or other factors central to the embodied and intersubjective nature of social understanding might be causing
autism (e.g., Rogers and Pennington, 1991; Hobson, 1993). Deficits in narrative skills have been observed in children with autism (e.g., Loveland et al., 1990; Charman and Shmueli-Goetz, 1998). Bruner and Feldman's theory suggests that autistic deficits in communication and social interaction can be explained in terms of a deficit in narrative communication skills. This theory, which differs from TOM, assumes that transactional capacities, and the lack thereof, are at the heart of autistic deficits. As we discuss later in this chapter, this work gives important hints about the transactional structure of narratives, a structure that we believe is of wider importance, not limited to the specific context of autism.

What exactly is a narrative transactional format? Bruner and Feldman distinguish different stages. They suggest that the first transactional process is about reciprocal attribution of intentionality and agency. The characteristic format of preverbal transactions is, according to Bruner and Feldman, a narrative one, consisting of four stages:

1. canonical steady state
2. precipitating event
3. a restoration
4. a coda marking the end.

An example is the peek-a-boo game, where (1) mutual eye gaze is established between infant and caretaker, (2) the caretaker hides her face behind an object, (3) the object is removed, revealing the face again, and (4) "Boo!" marks the end of the game.

Let us consider the following story called "Weebo", told by an 11-year old girl:
In America there was a professor called Peter Brainared and in 1978 he created a robot called Weebo. Weebo could do all sorts of things: she could create holograms, have a data bank of what the professor was going to do, show cartoon strips of what she was feeling like by having a television screen on top of her head which could open and close when she wanted to tell Peter how she felt. And she could record what she saw on television or what people said to her. Weebo looked like a flying saucer about as big as an eleven year old's head also she could fly. Peter Brainared had a girlfriend called Sarah and they were going to get married but he didn't turn up for the wedding because he was too busy with his experiments so she arranged for another one and another one but he still didn't turn up, so she broke off the engagement and when he heard this he told Weebo how much he loved her and she recorded it, went round to Sarah's house and showed her the clip on her television screen to show Sarah how much he loved her and it brought Sarah and Peter back together.

Bruner and Feldman's four stages of the transactional narrative format are clearly identifiable in this written narrative:

1. introduction of setting and actors
2. Peter misses the wedding and is sad
3. Weebo comes to the rescue: she shows Sarah how much Peter loves her
4. happy ending: Sarah and Peter are back together
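
For readers interested in narrative technology, the four-stage sequence can be written down as a simple data structure. The Python sketch below is an illustrative encoding only: the names Stage, Episode and follows_canonical_format are hypothetical and are not part of Bruner and Feldman's work. It checks nothing more than the order of the stages, so it does not capture the meaning-making and intersubjectivity that this chapter treats as central to narrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Stage(Enum):
    """The four stages of the transactional format, in canonical order."""
    STEADY_STATE = 1
    PRECIPITATING_EVENT = 2
    RESTORATION = 3
    CODA = 4


@dataclass(frozen=True)
class Episode:
    """One step of a transaction, labelled with its stage (hypothetical type)."""
    stage: Stage
    description: str


def follows_canonical_format(episodes: List[Episode]) -> bool:
    """Return True if the episodes cover the four stages once each, in order."""
    return [episode.stage for episode in episodes] == list(Stage)


# The peek-a-boo game as described in the text.
PEEK_A_BOO = [
    Episode(Stage.STEADY_STATE, "mutual eye gaze between infant and caretaker"),
    Episode(Stage.PRECIPITATING_EVENT, "caretaker hides her face behind an object"),
    Episode(Stage.RESTORATION, "object is removed, revealing the face again"),
    Episode(Stage.CODA, "'Boo!' marks the end of the game"),
]

# The Weebo story, using the four-stage reading given in the text.
WEEBO = [
    Episode(Stage.STEADY_STATE, "introduction of setting and actors"),
    Episode(Stage.PRECIPITATING_EVENT, "Peter misses the wedding and is sad"),
    Episode(Stage.RESTORATION, "Weebo shows Sarah how much Peter loves her"),
    Episode(Stage.CODA, "Sarah and Peter are back together"),
]

if __name__ == "__main__":
    for name, story in (("peek-a-boo", PEEK_A_BOO), ("Weebo", WEEBO)):
        print(name, follows_canonical_format(story))
```

A sequence that merely lists events without a restoration or coda would fail this check, which loosely mirrors the contrast between describing and narrating, although the labelling of each episode still has to be supplied by a human reader.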

Interestingly, although a central protagonist in the above story is a robot, it is depicted as an intentional agent (Dennett, 1987), embedded in a social context and behaving socially.

Bruner and Feldman suggest that the problems of people with autism in the social domain are due to an inability early in their lives to engage in appropriate transactions with other people. These transactions normally enable a child to develop a narrative encoding of experiences that allows it to represent culturally canonical forms of human action and interaction. Normally, this leads a child, at 2-3 years of age, to rework experiences in terms of stories until she ultimately develops into a skilled story-teller (Engel, 1995). As research by Meltzoff, Gopnik, Moore and others suggests, transactional formats play a crucial role very early in a child's life, when she takes the first steps towards becoming a mindreader and socially skilled individual: reciprocal imitation games are a format of interaction that contributes to the mutual attribution of agency (Meltzoff and Gopnik, 1993; Meltzoff and Moore, 1999), and immediate imitation creates intersubjective experience (Nadel et al., 1999). Mastering interpersonal timing and the sharing of topics in such dyadic interactions supports children's transition from primary to pragmatic communication. It seems that imitation games with caretakers play an important part in a child's development of the concept of person (Meltzoff and Gopnik, 1993; Meltzoff and Moore, 1999), and are a major milestone in the development of social cognition in humans. As we mentioned above, studies by Bruner and Feldman (1993) and others (e.g., Loveland et al., 1990) indicate that children with autism seem to have difficulty in organizing their experiences in a narrative format, as well as difficulty in understanding the narrative format that people usually use to regulate their interactions.
People with autism show a tendency to describe rather than to narrate, lacking the specific causal, temporal and intentional pragmatic markers needed for story-making. A preliminary study with high-functioning children with autism, reported by Bruner and Feldman (1993), indicates that, although they understood stories (gave appropriate answers when asked questions during the reading of the story), they showed great difficulty in retelling the story, i.e., composing a story, based on what they knew. The stories they told preserved many events and the correct sequence, but lacked the proper emphasis on important and meaningful events, events that motivated the plot and the actors. The stories lacked the narrative bent and did not conform to the canonical cultural expectations that people have in ordinary social interaction. Such a lack of meaning-making makes conversations in ordinary life extremely difficult, although, as Bruner and Feldman note, people with autism can show a strong desire to engage in conversations (Bruner and Feldman, 1993).

4.3 Narratives in animal behavior?

Stories have an extended temporal horizon: they relate to past and future, and they are created depending on the (social) context. Do animals use (non-verbal) narrative formats in transactions? Studies, e.g., with bonobos, Grey parrots and dolphins, on animal language capacities usually focus on teaching the animals a language (using gestures, icons or imitating human sounds), and test the animals' language capacities primarily in interactions with humans (Savage-Rumbaugh et al., 1986; Pepperberg, 1999; Herman, 2002). In the wild, the extent to which animals use a communication system as complex as human language is still controversial. For example, dolphins and whales are good candidates for sophisticated communicators. However, we argue that looking for verbal and acoustic channels of communication might disguise the nonverbal, transactional nature of narratives, as shown in preverbal precursors of narratives in the developing child, and possibly evolutionary precursors of (non-verbal) narrative that can be found in non-human animals. Michael Arbib (2002) proposes an evolutionary origin of human language in non-verbal communication and body language that can be found in many social species (e.g., mammals, birds).
He suggests that imitation (and the primate mirror neuron system (Gallese et al., 1996)) provided the major mechanisms that facilitated the transition from body language and nonverbal imitation to verbal communication. Arbib's work supports the arguments as presented in this chapter, namely, proposing a) the existence of a strong link between non-verbal, preverbal and verbal communication, and b) stressing the important role of dynamic formats of interactions, such as imitative games, in the development of social communication.

With this focus on interactional structure and non-verbal narratives, what can stories in non-human primate species look like, and how can we recognize them? To date we are not aware of any hard empirical evidence for storytelling capacities in non-human animals. However, it is known that primates are excellent politicians in primate societies, involving extensive knowledge about direct (one-to-one) and third-party relationships. Primate behavior is not confined to fulfilling their immediate biological needs. Actions taken by an individual need to consider the social context, the primate social field. Primatologists know numerous examples of interactions that cannot be understood without assuming that the animals are aware of the social context. Note that any description of animal behavior can be biased by the narrative mind of the human observer, the story-teller. When watching a paramecium under a microscope, we can use our imagination to make up a story about an intentional agent that is hungry, chases prey, searches for a mate, etc. However, in the case of single-cell organisms, it is safe to assume that their social field is far less developed (if at all) than in primate or other social species. Because of this danger of using imagination and anthropomorphism to attribute a narrative structure to animal behavior, below we give examples of stories of animal behavior told by primatologists who have been working for many years with their subjects, and who are more likely than untrained observers to report on observable sequences of events and their own well informed interpretations of the animals' intentions and motivations. Let us consider Frans de Waal's description of an event of reconciliation in chimpanzees.
On this occasion Nikkie, the leader of the group, has slapped Hennie during a passing charge. Hennie, a young adult female of nine years, sits apart for a while feeling with her hand on the spot on her back where Nikkie hit her. Then she seems to forget the incident; she lies down in the grass, staring in the distance. More than fifteen minutes later Hennie slowly gets up and walks straight to a group that includes Nikkie and the oldest female, Mama. Hennie approaches Nikkie, greeting him with soft pant grunts. Then she stretches out her arm to offer Nikkie the back of her hand for a kiss. Nikkie's hand kiss consists of taking Hennie's whole hand rather unceremoniously into his mouth. This contact is followed by a mouth-to-mouth kiss. Then Hennie walks over to Mama with a nervous grin. Mama places a hand on Hennie's back and gently pats her until the grin disappears. (de Waal, 1989, pp. 39, 42)

This example shows that the agent (Hennie) is interacting with an eye to future relationships, considering past and very recent experiences. Hennie, Nikkie and Mama have histories, autobiographic histories as individual agents (Dautenhahn, 1996), as well as a history of relationships among each other and as members of a larger group. Although the event might be interpreted purely on the basis of behavioristic stimulus-response rules, for many primatologists the interpretation of the event in terms of intentional agents and social relationships is the most plausible explanation. Interestingly, Hennie's interaction with Nikkie can be interpreted in terms of the canonical format of narrative transactions among intentional agents described in Section 4.2:
1. canonical state: greeting: soft pant grunts
2. precipitating event: Hennie reaches out to Nikkie (attempt at reconciling relationship)
3. restoration: kissing (relationship is restored)
4. end: Hennie is comforted by Mama

The second example we discuss is a different type of primate social interaction, namely, tactical deception whereby the animal shifts the target's attention to part of its own body. In this particular case the animal (a female Olive baboon) distracts the target (a male Olive baboon) with intimate behavior.
One of the female baboons at Gilgil grew particularly fond of meat, although the males do most of the hunting. A male, one who does not willingly share, caught an antelope. The female edged up to him and groomed him until he lolled back under her attentions. She then snatched the antelope carcass and ran. Cited in (Whiten and Byrne, 1988, p. 217)

Here, the analysis in terms of transactional narrative formats looks as follows:
1. canonical state: male brings antelope, female waits
2. precipitating event: distraction by grooming
3. restoration: female snatches food and runs away (resolution, female achieves goal)
4. end: female eats meat (not described)

Episodes of animal behavior as described above are very different from other instances of structured and sequential animal behavior, such as the chase-trip-bite hunting behavior of cheetahs. Also, the alarm calls of vervet monkeys (Cheney and Seyfarth, 1990), although serving an important communicative function in a social group and having a component of social learning, are not likely to be narrative in nature. It is not the short length of such calls that makes it difficult to interpret them in terms of narrative; rather, it is the fact that their primary function is to change the behavior of others as a response to a non-social stimulus, i.e., the sight of a predator, causing an appropriate behavior such as running to the trees after hearing a leopard alarm. The narrative format in animal behavior, on the other hand, refers to communicative and transactional contexts where communication is about the social field, i.e., group members, their experiences and relationships among them. Narratives are constructed based on the current context and the social context (communicator/speaker plus recipients/audience). The primate protagonists described above apparently interacted with respect to the social context, i.e., considering the social network and relationships among group members, with the purpose of influencing and manipulating others' mental states. Thus, such kinds of non-verbal narratives are fundamentally social in nature. Table 1 summarizes the role of narratives in human ontogeny and phylogeny as discussed above.
Table 1.

Human Ontogeny
1. Transactions with a narrative format in infant-caretaker interactions (Dyadic interactions, direct relationships, preverbal children)
2. Language in a narrative format: narratives spoken, written (Direct and third party relationships, verbal humans)

Primate Phylogeny
1. Non-verbal transactions in narrative format: narratives enacted in primate social interactions (Direct and third party relationships, primates). Primary mechanism for social bonding: Grooming
2. Language in a narrative format: narratives spoken, written (Direct and third party relationships, humans). Primary mechanism for social bonding: Language

A lot more work is necessary for a more detailed analysis of narrative formats in animal behavior. For example, the characteristics of the transactional format that Bruner and Feldman (1993) suggested need to be elaborated, possibly revised or replaced, and might need to be adapted to specific constraints of the primate social field. So our interpretation can only give a first hint of what aspects one might be looking for when searching for narrative formats in animal behavior.

5. How could the narrative intelligence hypothesis be tested?

If human language and narrative intelligence, rooted in nonverbal narrative skills in non-human primates, have evolved to deal with an increasing need to communicate in more and more complex societies, what predictions can be made based on this hypothesis? How could the Narrative Intelligence Hypothesis be tested? What are important research directions based on the importance of narrative in animals and artifacts?

Let us first consider how the NIH might be tested or falsified. As with other hypotheses on the origin of primate/human intelligence and language, animal behavior and communicative abilities are not directly documented in the fossil record. They can only be inferred indirectly from anatomical features (e.g., the vocal system that is necessary to produce a human-like language) and remains that indicate social structures (e.g., remains of nests or resting places, or groups of animals that died together). However, recent primate species that could serve as models of ancestors of the human species might give clues of what groups of primate species one might analyze if one wants to trace the origins of human narrative intelligence. Possible narrative structures confirmed in primate behavior might then be correlated with the complexity of the social field in these species. Today's primates show a great variety of social organizations and group living. The Narrative Intelligence Hypothesis would predict that comparative studies of communicative and, in particular, narrative formats of interactions across primate species with different social organizations can identify a correlation between the complexity of the narrative format and an increasing complexity of the primate social field. Such an increase of social complexity need not be limited to group size. It could also cover all other aspects of social complexity, such as an increasing number of different types of interactions and roles of group members, and the dynamics of how the social network can change and adapt to changes. Such stages of social organization can be related to behavioral as well as cognitive and mental capacities of primates. The NIH suggests a search for the narrative format in interactions, a format that is so efficiently suited to communicate and deal with the complexity of social life.

What kind of research directions and research methods could the NIH inspire?
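The comparative prediction stated above, that the complexity of the narrative format should increase with the complexity of the social field, could in principle be examined as a rank correlation across species. The following is only a minimal sketch under stated assumptions: the chapter does not propose a particular statistic, the choice of Spearman's rank correlation is ours, and the two score lists are invented placeholders (not measurements of any species) that serve solely to make the example runnable.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical placeholder scores for five imaginary species (NOT empirical
# data): one score for social-field complexity, one for narrative complexity.
social_complexity = [1.0, 2.0, 3.0, 4.0, 5.0]
narrative_complexity = [0.5, 1.5, 1.0, 3.0, 4.0]

rho = spearman(social_complexity, narrative_complexity)
print(f"Spearman rho for the placeholder scores: {rho:.2f}")
```

A rank-based measure is used here because any scoring of "narrative complexity" would probably be ordinal at best; how such scores could be obtained from behavioral observations is exactly the open problem the text describes, and the sketch does not address it.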

Testing with robotic and computational models

Robots have been increasingly used as models for understanding behavior, and sensori-motor control, in humans and other animals. Similarly, robots might have their place in the study of the origins of narrative intelligence. In an initial study (Dautenhahn and Coles, 2001) we investigated precursors of narrative based on episodic memory in autonomous robots. Following a bottom-up, Artificial Life approach towards narrative (Dautenhahn and Nehaniv, 1998; Nehaniv and Dautenhahn, 1998), we studied a single robot that could remember sequences of events (pre-narratives). A particular goal in this project was to study minimal experimental conditions of how story-telling might emerge from episodic memory. An initial experiment (Dautenhahn and Coles, 2001) showed that story-telling could be beneficial even to a single agent (cf. Nehaniv, 1997), since it increased the behavioral variability of the robot. The benefit of communicating episodic memory has also been shown in multi-agent simulation studies (Ho et al., 2004). Such research with an experimental, computational and robotic test-bed demonstrates a bottom-up approach towards studying narrative and how it can arise and evolve from pre-narrative formats (e.g., episodic memory abilities and formats that are necessary, but not sufficient, for narratives, as discussed in previous sections) in agents and agent societies. Also, it can provide a means to design and study narrative robots with meaningful narratives that are grounded in the robot's own experiences of, and means of interacting with, the world and other agents (including robots), so as to contribute to the robot's agenda to survive. The work described above indicates how artifacts might be used as scientific instruments to explore and experimentally test the design space of narrative intelligence. Narratives in this sense need to have a meaning for an (intentional) agent.
The approach of using artifacts as experimental test beds has been used successfully for many years in the areas of Adaptive Behavior and Artificial Life, yielding many interesting results that (a) help understand animal behavior and (b) help design life-like artifacts, in this case artifacts with narrative skills.
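To make the notion of a pre-narrative more concrete, the idea of an agent that remembers sequences of events and can replay them may be sketched as follows. This is a toy illustration only, not the architecture used in Dautenhahn and Coles (2001): the class `EpisodicAgent`, its methods, the `revisit:` action labels and the rule that remembered events enlarge the set of available actions are all our assumptions, chosen to show one simple way in which replaying episodic memory could increase behavioral variability.

```python
import random
from collections import deque


class EpisodicAgent:
    """Hypothetical toy agent with a bounded episodic memory."""

    def __init__(self, capacity=5, seed=0):
        # Bounded memory: once full, the oldest event is dropped.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def perceive(self, event):
        """Store one event at the end of the remembered sequence."""
        self.memory.append(event)

    def tell(self):
        """Replay the remembered events in the order they occurred."""
        return list(self.memory)

    def options(self, default_actions, use_story):
        """List candidate actions; a replayed story adds one per event."""
        candidates = list(default_actions)
        if use_story:
            # Assumption: each remembered event suggests an extra action.
            candidates.extend(f"revisit:{event}" for event in self.tell())
        return candidates

    def act(self, default_actions, use_story):
        """Choose one candidate action at random."""
        return self.rng.choice(self.options(default_actions, use_story))


agent = EpisodicAgent(capacity=3)
for event in ["wall", "light", "food", "wall"]:
    agent.perceive(event)

story = agent.tell()
print(story)
print(agent.options(["wander"], use_story=True))
```

With a capacity of three, the first event has already been forgotten when the story is told, so the replayed sequence is a window onto recent experience, not a full history. Whether such a sequence deserves to be called a narrative is, as the text stresses, a separate matter: a memory of this kind is at most necessary, not sufficient.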

Study and analysis of animal narrative capacities

Since the Narrative Intelligence Hypothesis does not assume any fundamentally novel development in the transition from nonverbal (through evolution) or preverbal (development) to verbal narrative intelligence, a detailed study and analysis of the structure and format of animal narrative communication is required in order to develop a proper theory. Many animal species are highly social and use non-verbal means of body language in interaction and communication. Narrative intelligence has a communicative function (as a means of discourse and dialogue). However, it also has an individual dimension (understanding and thinking in terms of narrative, recreating a self). Revealing narrative structure in animal communication might, therefore, further our understanding of meaningful events in the lives of these animals.

Interesting open research questions (this is not an exhaustive list)

Relationship between preverbal and verbal narrative intelligence in humans (ontogeny)
Relationship between nonverbal narrative intelligence in non-human animals and narrative intelligence in humans (phylogeny)
The format of nonverbal narrative intelligence in animals (Species specific? Specific to social organization of animal societies?)
Can we identify narrative formats of interaction in different animal species?

The work presented in this chapter is a small first step towards developing a theory of narrative that shows the evolutionary and developmental continuum of narrative capacities in humans and other animals. However, if, as we argued above, narrative and the narrative formats of transaction are deeply rooted in our ontogeny and phylogeny, then these provide important constraints and requirements for the design of artifacts that can meet the cognitive and social needs of Homo narratus.

6. Homo narratus: Implications for Human Society and Technology

There are many implications of the Social Brain Hypothesis and the Narrative Intelligence Hypothesis for technology development. Human cognitive and narrative capacities are constrained by evolution and development. Even technological extensions and enhancements (new media, new means of communication, new interfaces and implants) need to operate within the boundaries set out by biology. Firstly, imagined relationships might stand in for human beings, in particular when the real social network is smaller than 150. With the help of books, television or email we can easily know (by name or sight) more than 150 people, e.g., have more than 150 phone numbers stored in our mobile's database. However, these are not the types of individually known kin, friends, allies, or even enemies who are mutually known over an extended period of time so that the term relationship applies. In particular, mass media such as television can give us the illusion that we know news presenters, talk show hosts, movie stars, comic or video game characters, etc. The roles of friends and social partners might be filled by such imagined partners, and might serve a role similar to real human networks (Dunbar, 1996). However, any such relationships are uni-directional; feelings such as love and admiration can only be expressed from a distance and will (realistically) not be returned. Recently emerging interactive agent technology adds another dimension to such imagined friends: virtual or robotic agents that give the illusion of life, namely, show appearance and behavior of real humans, such as embodied conversational agents (Cassell et al., 2000). However, no matter how many virtual and robotic friends will become members of our social network, these extensions are not without limits. There are biological limits, constrained by the cognitive group size limit of 150 that characterizes the size of social networks of human primates. As Dunbar argues (1996), modern information technology might change a number of characteristics of how and with whom and with what speed we communicate, but will not influence the size of social networks, nor the necessity of direct personal contact that is needed to provide trust and credibility to social relationships. "Yet underlying it all are minds that are not infinitely flexible, whose cognitive predispositions are designed to handle the kinds of small-scale societies that have characterized all but the last minutes of our evolutionary history." (Dunbar, 1996, p. 207). We cannot escape our biology, as Dunbar (1992, p. 469) put it: "species will only be able to invade habitats that require larger troops than their current limit if they evolve larger neocortices." Consequently, for us to exceed the magic number 150, our environmental niche would have to change so that larger group sizes have a selective advantage and biological evolution (if it still applies to the human species today) can select for larger neocortices.
Expanding this argument to a hypothetical super-human species that might evolve, we might speculate that this Homo narratus would have enhanced narrative intelligence that enables the species to deal with an increasing group size. It is impossible to predict what the stories of the future might look and sound like: Will they be beautifully complex and experience-rich? Will language itself have changed, adapting to an enhanced need to deal with a complex social field? Generally, we can expect that empowering human skills of forming and maintaining social networks might be advanced by supporting the development of narrative skills in children and adults. As we have shown in this chapter, narratives are not only entertaining and fun; they serve an important cognitive function in the development of social cognition and a sense of self (Dennett, 1989). Humane technology needs to respect human narrative grounding (Nehaniv, 1999).

The narratives of the future might reflect our ability to preserve coherence and structure in human societies that consist of increasingly fragmented, temporally and geographically distributed, social networks. In shaping this development it is important to investigate the evolutionary heritage of our narrative capacities and the natural boundaries it provides. Also, appreciating the stories other non-human animals tell will allow us to put our familiar stories-as-we-know-them into the broader perspective of stories-as-they-could-be.

Notes
* I would like to thank Barbara Gorayska and three anonymous reviewers for very helpful comments on a previous version of this paper. Chrystopher Nehaniv helped with many discussions on narrative over the past few years. Penny Stribling and Tim Luckett gave me very useful pointers to literature on autism and narrative.
1. This article is a modified version of K. Dautenhahn (2001). See also related work in (Dautenhahn, 1999) and (Dautenhahn, 2003).
2. The relationships between narrative, on the one hand, and culture and autobiography, on the other hand, are only touched upon in this chapter but are discussed in more detail elsewhere (Dautenhahn, 1999; Dautenhahn, 2003).
3. Note that group size as such is not the only indicator of social complexity: other researchers have found, e.g., that primate species with relatively larger neocortices exhibit more complex social strategies than species with smaller neocortices (Pawlowski et al., 1998).

References
Arbib, M. (2002). The mirror system, imitation, and the evolution of language. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in Animals and Artifacts. Cambridge, MA: MIT Press.
Baron-Cohen, S. (1995). Mindblindness. Cambridge, MA, London, England: A Bradford Book, The MIT Press.
Barton, R. A. & R. I. M. Dunbar (1997). Evolution of the social brain. In A. Whiten & R. W. Byrne (Eds.), Machiavellian Intelligence II: Extensions and Evaluations, pp. 240-263. Cambridge: Cambridge University Press.
Bruner, J. (1987). Actual Minds, Possible Worlds. Cambridge, MA: Harvard University Press.
Bruner, J. (1990). Acts of Meaning. Cambridge, MA: Harvard University Press.
Bruner, J. (1991). The Narrative Construction of Reality. Critical Inquiry 18(1), 1-21.
Bruner, J. & C. Feldman (1993). Theories of mind and the problem of autism. In S. Baron-Cohen, H. Tager-Flusberg & D. J. Cohen (Eds.), Understanding other Minds: Perspectives from Autism. Oxford: Oxford University Press.
Bumby, K. & K. Dautenhahn (1999). Investigating Children's Attitudes Towards Robots: A Case Study. In K. Cox, B. Gorayska & J. Marsh (Eds.), Proceedings of the Third International Conference on Cognitive Technology: Networked Minds (CT99), pp. 359-374. (Available at www.cogtech.org)
Byrne, R. W. (1997). Machiavellian intelligence. Evolutionary Anthropology 5, 172-180.
Byrne, R. W. & A. Whiten (Eds.) (1988). Machiavellian Intelligence. Oxford: Clarendon Press.
Cassell, J., J. Sullivan, S. Prevost & E. Churchill (Eds.) (2000). Embodied Conversational Agents. Cambridge, MA: MIT Press.
Charman, T. & Y. Shmueli-Goetz (1998). The relationship between theory of mind, language, and narrative discourse: an experimental study. Current Psychology and Cognition 17(2), 245-271.
Cheney, D. L. & R. M. Seyfarth (1990). How Monkeys See the World. Chicago: University of Chicago Press.
Dautenhahn, K. (1996). Embodiment in animals and artifacts. In Proceedings of the AAAI Symposium on Embodied Cognition and Action, pp. 27-32. Menlo Park, California: AAAI Press.
Dautenhahn, K. (1999). The lemur's tale: Story-telling in primates and other socially intelligent agents. In M. Mateas & P. Sengers (Eds.), Proceedings of the AAAI Symposium on Narrative Intelligence, pp. 59-66. Menlo Park, California: AAAI Press.
Dautenhahn, K. (2001). The Narrative Intelligence Hypothesis: In Search of the Transactional Format of Narratives in Humans and Other Animals. In M. Beynon, C. L. Nehaniv & K. Dautenhahn (Eds.), Proceedings of the Fourth International Cognitive Technology Conference, CT2001: Instruments of Mind, pp. 248-266. Berlin: Springer Verlag.
Dautenhahn, K. (2003). Stories of Lemurs and Robots: The Social Origin of Story-Telling. To appear in M. Mateas & P. Sengers (Eds.), Narrative Intelligence, pp. 63-90. Amsterdam & Philadelphia: John Benjamins.
Dautenhahn, K. & S. Coles (2001). Narrative Intelligence from the bottom up: A computational framework for the study of story-telling in autonomous agents. Journal of Artificial Societies and Social Simulation (JASSS) 4(1), January 2001.
Dautenhahn, K. & C. L. Nehaniv (1998). Artificial life and natural stories. In Proceedings of the Third International Symposium on Artificial Life and Robotics, Volume 2, pp. 435-439.
Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press.
Dennett, D. C. (1989/91). The origins of selves. Cogito 3, 163-73, Autumn 1989. Reprinted in D. Kolak and R. Martin (Eds.) (1991), Self & Identity: Contemporary Philosophical Issues. New York: Macmillan.
de Waal, F. (1982). Chimpanzee Politics: Power and sex among apes. London: Jonathan Cape.
de Waal, F. (1989). Peacemaking among Primates. Cambridge, MA: Harvard University Press.
Donald, M. (1993). Precis of Origins of the modern mind: Three stages in the evolution of culture and cognition. Behavioral and Brain Sciences 16, 737-791.
Dunbar, R. I. M. (1992). Neocortex size as a constraint on group size in primates. Journal of Human Evolution 20, 469-493.
Dunbar, R. I. M. (1993). Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences 16, 681-735.
Dunbar, R. I. M. (1996). Grooming, Gossip and the Evolution of Language. London, Boston: Faber and Faber Limited.
Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary Anthropology 6, 178-190.
Dunbar, R. I. M. & J. Bever (1998). Neocortex size predicts group size in carnivores and some insectivores. Ethology 104, 695-708.
Engel, S. (1995/99). The Stories Children Tell: Making Sense of the Narratives of Childhood. New York: W. H. Freeman and Company.
Farnell, B. (1999). Moving Bodies, Acting Selves. Annual Review of Anthropology 28, 341-373.
Gallese, V., L. Fadiga, L. Fogassi & G. Rizzolatti (1996). Action recognition in the premotor cortex. Brain 119, 593-609.
Griffin, D. R. (1976). The question of animal awareness: Evolutionary continuity of mental experience. New York: The Rockefeller University Press.
Hall, E. T. (1968). Proxemics. Current Anthropology 9(2-3), 83-95.
Herman, L. M. (2002). Vocal, social, and self imitation by bottlenosed dolphins. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in Animals and Artifacts. Cambridge, MA: MIT Press.
Ho, W. C., K. Dautenhahn, C. L. Nehaniv & R. te Boekhorst (2004). Sharing memories: An experimental investigation with multiple autonomous autobiographical agents. In F. Groen, N. Amato, A. Bonarini, E. Yoshida & B. Kröse (Eds.), Intelligent Autonomous Systems 8 (IAS8), pp. 361-370. IOS Press.
Hobson, P. (1993). Understanding persons: the role of affect. In S. Baron-Cohen, H. Tager-Flusberg & D. J. Cohen (Eds.), Understanding other minds, Perspectives from autism, pp. 204-227. Oxford: Oxford University Press.
Jordan, R. (1999). Autistic Spectrum Disorders: An introductory handbook for practitioners. London: David Fulton Publishers.
Leslie, A. M. (1987). Pretence and representation: The origins of Theory of Mind. Psychological Review 94(4), 412-426.
Loveland, K. A., R. E. McEvoy & B. Tunali (1990). Narrative story telling in autism and Down's syndrome. British Journal of Developmental Psychology 8, 9-23.
Mandler, J. M. (1984). Stories, scripts, and scenes: Aspects of schema theory. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Marino, L. (1996). What can dolphins tell us about primate evolution? Evolutionary Anthropology 5(3), 81-86.
Meltzoff, A. N. & A. Gopnik (1993). The role of imitation in understanding persons and developing a theory of mind. In S. Baron-Cohen, H. Tager-Flusberg & D. J. Cohen (Eds.), Understanding other minds, Perspectives from autism, pp. 335-366. Oxford: Oxford University Press.
Meltzoff, A. N. & M. K. Moore (1999). Persons and representation: why infant imitation is important for theories of human development. In J. Nadel & G. Butterworth (Eds.), Imitation in Infancy, pp. 9-35. Cambridge: Cambridge University Press.
Nadel, J., C. Guerini, A. Peze & C. Rivet (1999). The evolving nature of imitation as a format of communication. In J. Nadel & G. Butterworth (Eds.), Imitation in Infancy, pp. 209-234. Cambridge: Cambridge University Press.
Nehaniv, C. L. (1997). What's Your Story? Irreversibility, Algebra, Autobiographic Agents. In K. Dautenhahn (Ed.), Proceedings of the AAAI Symposium on Socially Intelligent Agents, pp. 150-153. Menlo Park, California: AAAI Press.
Nehaniv, C. L. (1999). Story-Telling and Emotion: Cognitive Technology Considerations in Networking Temporally and Affectively Grounded Minds. In K. Cox, B. Gorayska & J. Marsh (Eds.), Proceedings of the Third International Conference on Cognitive Technology: Networked Minds (CT99), pp. 313-322. (Available at www.cogtech.org)
Nehaniv, C. L. & K. Dautenhahn (1998). Embodiment and Memories: Algebras of Time and History for Autobiographic Agents. In R. Trappl (Ed.), Proceedings of the 14th European Meeting on Cybernetics and Systems Research, pp. 651-656. Vienna: Austrian Society for Cybernetic Studies.
Nehaniv, C. L., K. Dautenhahn & M. J. Loomes (1999). Constructive Biology and Approaches to Temporal Grounding in Post-Reactive Robotics. In G. T. McKee & P. Schenker (Eds.), Sensor Fusion and Decentralized Control in Robotics Systems II, Proceedings of The International Society for Optical Engineering (SPIE), Volume 3839, pp. 156-167.
Nelson, K. (Ed.) (1989). Narratives from the crib. Cambridge, MA: Harvard University Press.
Nelson, K. (1993). The psychological and social origins of autobiographical memory. Psychological Science 4(1), 7-14.
Pawlowski, B., C. B. Lowen & R. I. M. Dunbar (1998). Neocortex size, social skills and mating success in primates. Behaviour 135, 357-368.
Pepperberg, I. M. (1999). The Alex Studies. Cognitive and Communicative Abilities of Grey Parrots. Cambridge, MA: Harvard University Press.
Read, S. J. & L. C. Miller (1995). Stories are fundamental to meaning and memory: For social creatures, could it be otherwise? In R. S. Wyer (Ed.), Knowledge and Memory: the Real Story, pp. 139-152. Hillsdale, NJ: Lawrence Erlbaum Associates.
Rogers, S. J. & B. F. Pennington (1991). A theoretical approach to the deficits in infantile autism. Development and Psychopathology 3, 137-162.
Savage-Rumbaugh, E. S., K. McDonald, R. A. Sevcik, W. D. Hopkins & E. Rubert (1986). Spontaneous symbol acquisition and communicative use by pygmy chimpanzees (Pan paniscus). Journal of Experimental Psychology: General 115, 211-235.
Schank, R. C. & R. P. Abelson (1977). Scripts, Plans, Goals and Understanding: An Inquiry into Human Knowledge Structures. Hillsdale, NJ: Erlbaum.
Sindermann, C. J. (1982). Winning the Games Scientists Play. New York & London: Plenum Press.
Turner, M. (1996). The Literary Mind. Oxford: Oxford University Press.
Whiten, A. & R. W. Byrne (1988). The manipulation of attention in primate tactical deception. In R. W. Byrne & A. Whiten (Eds.), Machiavellian Intelligence, pp. 211-237. Oxford: Clarendon Press.
Whiten, A. & R. W. Byrne (Eds.) (1997). Machiavellian Intelligence II: Extensions and Evaluations. Cambridge: Cambridge University Press.

The semantic web


Knowledge representation and affordance*
Sanjay Chandrasekharan
Carleton University, Ottawa

Introduction
The World Wide Web is a complex socio-technical system, and can be understood in many ways. One dominant view looks at the Web as a knowledge repository, albeit a very disorganized one, and the challenge is to get the maximum knowledge out of it in the minimum time possible. Most of the Semantic Web effort (which develops standards for metadata), and the work on search engines, assume this view of the Web. However, there's another way of understanding the web, which is to view it as an action-enabling-space, where you can buy, sell, bid, book, gamble, play games, debate, chat, etc. There is not much of an effort to understand and classify the web from this point of view. For instance, there are no search engines that allow you to search exclusively for possible actions, like sending_flowers, buying_tickets, booking_rooms, etc., though all these are activities possible over the Web. And there are no action metatags. The primary reason for this absence is the overarching nature of the first view, the web-as-information one, which subsumes the action-space view. This results in information about actions being treated as just another kind of information. So, if you need to know about buying tickets and booking rooms, you search Google (or Froogle). And if you need to execute the action of buying tickets or booking a room, you search Google again, probably using the same keywords. In this chapter, I make a distinction between these two ways of approaching the Web, and argue that the design of the Semantic Web should focus more on possible actions humans and artificial agents can execute on the Web. This means we should develop ways to distinguish between search for actions and


search for knowledge. In particular, I argue that the current design of top-down, exhaustive ontologies does not consider the representation of possible actions on the Web. The following are the two major theoretical assumptions of this chapter: The web is a world-mediating system (Clark, 2001). According to Clark, it mediates between users and a part of the world, often by manipulating machine representations of the world. State changes in the software system may cause state changes or side effects in the real world. In his article in xml.com, Clark explains this notion using the following example:
Consider a web-based banking application. Performing banking tasks by using a web application is functionally equivalent to performing them at the bank's physical location. There are obvious phenomenological differences to the user in each case, but there aren't any differences to the user's bank account. A $100 withdrawal from a teller is equivalent, in all respects relevant to the bank account itself, to a $100 web application withdrawal. A web-based funds transfer just is a funds transfer, as a matter, among other things, of convention and institutional fact.

From this view of the web as a world-mediating or action-mediating space, it follows that the development of the Semantic Web (developing tags that allow documents and other entities to describe themselves to programs) involves building action-infrastructure for agents, both human and artificial ones. That is, the design of the Semantic Web is akin to designing environments that support human actions in the world: environments like cockpits, kitchens and studios. The Semantic Web effort is thus about designing action-enabling information structures in the web world, to fit the actions web agents want to perform. The difference from cockpits and kitchens is that the actions performed on the web are based on linguistic acts, and therefore the environment designed to fit those actions is also a linguistic one.

The view of the web as world-mediating and action-enabling turns the structure provided by the Semantic Web into affordances for action¹ (Norman, 1993 and 1998; Reed, 1996; Gibson, 1979), or action-oriented representations (Clark, 1997). However, the commonly accepted view is that Semantic Web structures are knowledge representation structures, designed to facilitate knowledge recovery and inference. So should the Semantic Web be creating affordances or knowledge representation? Or both? Is there a distinction between the two? What advantage, if any, do affordances offer? How do


affordances fit in with current approaches to agent design? These are the questions I will be tackling in this chapter. The chapter is structured as follows. In Section 1, I will describe briefly a framework for understanding how humans and other organisms generate structures in the world to help them (or others) perform actions better. In Section 2, I introduce a classification of agent-environment relationships, to understand how adding structure to the world fits in with current agent design methodologies. In Section 3, I consider the Semantic Web as an instance of changing the world for better cognition. Section 4 applies the insights gained from the previous sections to the design of ontologies. Section 5 discusses the design of category-based ontologies and affordance-based ontologies from a cognitive load-balancing point of view. Section 6 considers the question we started with (whether the Semantic Web should provide affordances or knowledge representation) and provides the conclusion.

1. Distributed cognition and the web

When organisms generate structures in the world for action, it results in tailoring the world to the agent's capabilities, in such a way that the world contributes to cognition and action at run-time. The generation of such congenial structures for action, in physical and representational environments, is explored by the Distributed Cognition (DC) framework (Hutchins, 1995; Hollan et al., 2000; Kirsh, 2001). Within Distributed Cognition, Kirsh (1996), and to some extent Hutchins (1995), have explored such world-changing in detail. Kirsh's analysis considers how animals change their environment to make their tasks easier. He identifies two kinds of structure animals create in the environment, physical and informational. An example of physical structure created in the environment for improved action is tools used by animals, for instance Caledonian crows using twigs to probe out insects from the ground. The crows even redesign their tools, by making probes out of twigs bitten from living trees, and even wires in laboratory conditions, as illustrated by Weir and colleagues recently (Weir et al., 2002). An example of informational structure created for action is people reorganizing their cards in a game of gin rummy. In this case, the player is using the cards to encode her plans externally. The card grouping tells the player what she needs to do; she does not have to remember it. In Kirsh's terms, the player


makes a call to the world when she uses the grouping of the cards, thus making the world part of cognition. The gin rummy algorithm is distributed across the player and the card set. The action of sorting the card set reorganizes the environment for mental rather than physical savings. Kirsh and Maglio (1994) term these kinds of actions epistemic actions, as different from pragmatic actions. Epistemic action changes the world to provide agents knowledge; pragmatic action changes the world for the actual physical execution of the task. According to Kirsh and Maglio, the first kind of structure created in the environment, informational structure, furthers cognitive congeniality, as against physical congeniality. We will term such structures, which improve cognitive congeniality for agents, epistemic structure. Many animals create epistemic structures in the world to reduce their own and others' cognitive complexity. Wood mice (Apodemus sylvaticus) distribute small objects, such as leaves or twigs, as points of reference while foraging. They do this even under laboratory conditions, using plastic discs. Such way-marking diminishes the likelihood of losing interesting locations (Stopka and MacDonald, 2003) during foraging. Red foxes (Vulpes vulpes) use urine to mark food caches they have emptied. This marking acts as a memory aid and helps them avoid unnecessary search (Henry, 1977, reported in Stopka and MacDonald, 2003). Ants drop pheromones to trace a path to a food source. Many mammals mark up their territories. Plants develop colors and smells to attract pollinators, sometimes even to fight predators (Heiling et al., 2003; Beck, 2001). The bower bird creates colorful nests to attract mates (Zahavi and Zahavi, 1997). Many birds advertise their desirability as mates using some form of external structure, like colorful tails, bibs etc. (Bradbury and Vehrencamp, 1998).
Other animals have signals that convey important information about themselves to possible mates and even predators (Zahavi and Zahavi, 1997). At the most basic level, cells in the immune system use antibodies that bind to attacking microbes, thereby marking them. Macrophages use this marking to identify and destroy invading microbes. Bacterial colonies use a strategy called quorum sensing to know that they have reached critical mass (to attack, to emit light, etc.). This strategy involves individual bacteria secreting molecules known as auto-inducers into the environment. The auto-inducers accumulate in the environment, and when they reach a threshold, the colony moves into action (Silberman, 2003). These kinds of structures (usually termed signaling) form a very important aspect of animal life across biological niches. We will use one case of signaling to analyze the advantages provided by changing the informational structure of the environment. Consider the peacock's tail, the paradigmatic instance of an


animal signal. The tail's function is to allow female peacocks (peahens) to make a mating judgment, by selecting the most-healthy male (Zahavi and Zahavi, 1997). The tail reliably describes the inner state of the peacock, that it is healthy (and therefore has good genes). The signal is reliable because only a peacock with enough resources can afford to produce a flamboyant tail. If you are a sickly male, you cannot spend resources to produce ornaments. Thus the health of the peacock is directly encoded in the tail; the peacock carries its internal attributes on its tail, so to speak. To see the cognitive efficiency of this mechanism, imagine the peahen having to make a mating decision without the existence of such a direct and reliable signal. The peahen will need to have a knowledge base of how the internal state, of health, can be inferred from behavioral and other cues. Let's say good dancing, lengthy chase of prey, long flights (peacocks fly short distances), tough beak and good claws are cues for the health of a peacock. To arrive at a decision using these cues, first the peahen will need to know these cues, and that some combinations of them imply that the male is healthy. Armed with this knowledge, the female has to sample males for an extended period of time, and go through a lengthy sorting process based on the cues (rank each male on each of these cues: good, bad, okay). Then it has to compare the different results, keeping all of them in memory, to arrive at an optimal mating decision. This is a computationally intensive process. The tail allows the peahen to shortcut all this computation, and go directly to the most-healthy male in a lot. The tail provides the peahen a single, chunked cue, which it can compare with other similar ones to arrive at a decision. The tail provides a standardized way of arriving at a decision, with the least amount of computation. The peacock describes itself using its tail.
Reliable self-description, like the peacock's tail, is one of nature's ways of avoiding long-winded sorting and inference. In using self-describing metadata structures like XML, we are seeking to emulate nature's design in the Semantic Web. The peacock example (and others above) shows that the reduction of others' cognitive complexity using externally stored informational structures is very common, and it can be considered one of the building blocks of nature. Signaling exists at all levels of nature, from bacteria to plants, crickets, gazelles and humans. Note that the signal provides cognitive congeniality to the receiver, and not to the sender. The sender, for instance the peacock, gains because he has an interest in being selected for mating. This is very similar to the case of semantic mark-up, where the cognitive congeniality (less processing load) is for the reader; the encoder does the marking up because she stands to gain in some way.
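
The contrast between cue-based inference and a single chunked signal can be made concrete with a small sketch. It is only an illustration, not a model proposed in this chapter: the cue names follow the example in the text, while the `Male` record, the 0 to 2 scoring of each cue, and the assumption that the tail agrees with the cues are all invented for the purpose.

```python
from dataclasses import dataclass, field

# Cue names follow the text; the numeric scoring (0 = bad, 1 = okay, 2 = good)
# is an assumption made for this illustration.
CUES = ("dancing", "chase_length", "flight_length", "beak", "claws")


@dataclass
class Male:
    name: str
    tail: int                                  # the single, chunked signal
    cues: dict = field(default_factory=dict)   # cue name -> score 0..2


def choose_by_cues(males):
    """Rank every male on every cue, hold all totals in memory, then compare."""
    observations = 0
    totals = {}
    for male in males:
        total = 0
        for cue in CUES:                       # one observation per male per cue
            total += male.cues[cue]
            observations += 1
        totals[male.name] = total
    return max(totals, key=totals.get), observations


def choose_by_tail(males):
    """Compare one self-describing signal per male."""
    best = max(males, key=lambda male: male.tail)
    return best.name, len(males)               # one observation per male


males = [
    Male("a", tail=3, cues=dict.fromkeys(CUES, 1)),
    Male("b", tail=9, cues=dict.fromkeys(CUES, 2)),
    Male("c", tail=5, cues=dict.fromkeys(CUES, 1)),
]

print(choose_by_cues(males))   # ('b', 15)
print(choose_by_tail(males))   # ('b', 3)
```

Both routes pick the same male here only because the toy data assumes the tail is a reliable signal; the point of the sketch is the difference in the number of observations the receiver has to make and remember.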


Even though signaling is a basic structure of cognition, it has received very little attention from agent design methodologies. Many researchers have considered the role of stigmergy in changing environment structure. Stigmergy is a coordination mechanism where the action of one individual in a colony triggers the next action by others (Susi, 2001). It is a form of indirect communication, and has been a favoured mechanism for situated AI because it avoids the creation of explicit representations. Signaling, on the other hand, is closer to being a representation, and therefore more useful in understanding the creation of representations, as in the case of socio-technical systems like the Web. In the following section I develop a framework to understand how epistemic structures like signals fit in with agent-environment relationships in current agent design.

2. Agent design based on epistemic structure

I categorize agent design into four frameworks. To illustrate these four frameworks, I will use the problem of providing disabled people access to buildings. There are four general approaches to solve this problem.

Approach I: This involves building an all-powerful, James Bond-style vehicle that can function in all environments. So it can run, jump, fly, climb spiral stairs, raise itself to high shelves, detect curbs, etc. This design does not incorporate detailed environment structure into the vehicle; it is built to overcome limitations of all environments.

Approach II: This involves studying the vehicle's environment carefully and using that information to build the vehicle. For instance, the vehicle will take into account the existence of curbs (and their being short), stairs generally being non-spiral and having rails, the level of elevator buttons, etc. So it will have the capacity to raise itself over short curbs, climb short flights of straight stairs by making use of the rails, etc. Note that the environment is not changed here.

Approach III: This involves changing the environment. For instance, building ramps and special doors so that a simple vehicle can have maximum access. This is the most elegant solution, and the most widely used one. Here the environment is changed, so that it contributes to the agent's action. Our analysis will focus on this approach.

Approach IV: The fourth one is similar to the first one, but here the environment is all-powerful instead of the vehicle. The environment becomes smart, and the building detects all physically handicapped people, and


glides a ramp down to them, or lifts them up etc. This solution is an extreme case of approach III; we will ignore it in the following analysis.

Now, the first approach is similar to the centralized AI methodology, which ignores the structure provided by specific environments during design. The environment is something to overcome; it is not considered a resource. This methodology tries to load every possible environment on to the agent, as centrally stored representations (see footnote 2). The agent tries to map the encountered world on to this internal template structure, and when the template structure does not obtain in the world, fails (see Figure 1, centralized AI).

The second approach is similar to the situated AI model promoted by Rodney Brooks (1991).² This methodology recognizes the role of the environment as a resource, and analyses and exploits the detailed structure that exists in the environment while building the agent. Notice that the environment is not changed here. This is a passive design approach, where the environment is considered a given. (See Figure 1, Brooksian AI.)

In the third approach, the designer actively intervenes in the environment and gives structure to it, so that the agent can function better. This is Active Design, or agent-environment co-design. The idea is to split the intelligence load: part to the agent, part to the world. This is agent design guided by the principle of distributing cognition, where part of the computation is hived off to the world. Kirsh (1996) terms this kind of using the world to compute Active Redesign. (See Figure 2, Active Design.) This design principle underlies many techniques to minimize complexity. At Kirsh's physical level, the Active Design principle can be found in the building of roads for wheeled vehicles. Without roads, the vehicles will have a hard time, or all vehicles will need to have tank wheels. With roads the movement is a lot easier for average vehicles.
This principle is also at work in the intelligent use of space, where people organize objects around them in a way that helps them execute their functions (Kirsh, 1995). Kitchens and personal libraries (which use locations as tags for identifying content) are instances of such use of space in cognition. A good example of active design at Kirsh's information level (the cognitive congeniality level) is bar coding. Without bar coding, the checkout machine in the supermarket would have to resort to a phenomenal amount of querying and object-recognition routines to identify a product. With bar coding, it becomes a simple affair. The Semantic Web enterprise is another instance of Active Design at the information level.³ The effort is to create structure in an information


Passive Design Approach 1: Centralized AI
Design time: The designer abstracts structure from the environment and stores it within the agent, imposing his/her structure on the agent.
Run-time: The agent queries for structure, compares the stored structure with the perceived one, and fails if there is no match.

Passive Design Approach 2: Brooksian AI
Design time: The environment is studied and query-action associations are developed, to exploit structure in the environment at run-time. Little or no structure is stored within the agent.
Run-time: The agent queries constantly for external structure and executes an action if the structure obtains. More robust design.

Figure 1. Passive approaches to agent design. The environment is considered a given; the designer makes no changes to the environment.

environment (the Web) so that software and human agents can function effectively in it. The Active Design principle is also at work in the Auto-ID⁴ and the Physical Markup Language efforts, which try to develop low-cost Radio-frequency Identification (RFID) tags and a common standard to store information in such tags. These tags can be embedded in products, quite like meta tags in web pages. Such tagged objects can be easily recognized by agents fitted with RFID readers (for instance, robots working in a recycling plant). The tags essentially create a referable world for such agents (see Chandrasekharan and Esfandiari, 2000, for more on the relation between agents and worlds). I consider the Auto-ID effort as an extension of the web, because most of these tagged objects will be tracked by supply-chain applications over the web; some such applications already exist.


Active Design Approach
Design time: Doping the world. The designer actively intervenes in the environment and adds structure to it.
Run-time: The agent queries for added structure and executes an action when the structure obtains.

Figure 2. Active design, or agent-environment co-design. In the third approach, the knowledge is split equally between the agent and the environment. The world is doped in a way that it has some necessary properties. The agent and the environment evolve together. In the fourth case (which is not illustrated here), it is the environment that is designed, and the agent is assumed to have minimal capabilities.

These tagged objects would thus become the web's real-world nodes, and could be manipulated over the web. The Active Design approach is applied at the social level as well, especially in instances involving Trust. Humans actively create structure in the environment to help others make trust decisions. Formal structure created for trust includes credit ratings, identities, uniforms, badges, degrees, etc. These structures serve as reliable signals for people to make trust decisions. Less reliable, and more informal, structures we create include standardized ways of dressing, talking, etc. The fourth approach in our agent design taxonomy is the ubiquitous/pervasive computing idea. This is an extreme version of the Active Design approach. Real design can be seen as a combination of two or more of these approaches. As illustrated by the examples, the third approach is the most elegant one: change the world, redesign it, so that a minimally complex agent can work effectively in that world.
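
The first three approaches in this section can be contrasted in a deliberately simplified sketch of the building-access problem. Everything in it is hypothetical: the feature names, the rule list and the `add_ramp` step are assumptions chosen to show where the structure lives in each approach, not a description of any real agent architecture.

```python
# An environment is modelled as a dict of perceivable features (an assumption).

def centralized_agent(env, stored_template):
    """Approach I: map the world onto an internally stored template."""
    if all(env.get(key) == value for key, value in stored_template.items()):
        return "enter"
    return "fail"              # the template does not obtain in the world


def situated_agent(env, rules):
    """Approach II: query the unchanged world and act on structure that obtains."""
    for feature, action in rules:
        if env.get(feature):
            return action
    return "fail"


def add_ramp(env):
    """Approach III, design time: the designer adds structure to the world."""
    redesigned = dict(env)     # copy, so the original description is untouched
    redesigned["ramp"] = True
    return redesigned


def simple_agent(env):
    """Approach III, run-time: a minimal agent that relies on the added structure."""
    return "roll_up_ramp" if env.get("ramp") else "fail"


building_a = {"entrance": "straight_stairs", "rails": True}
building_b = {"entrance": "spiral_stairs", "rails": False}
template = {"entrance": "straight_stairs", "rails": True}
rules = [("rails", "climb_with_rails"), ("low_curb", "raise_over_curb")]

print(centralized_agent(building_b, template))   # fail
print(situated_agent(building_b, rules))         # fail
print(simple_agent(add_ramp(building_b)))        # roll_up_ramp
```

In this toy setting the simplest agent succeeds only after the world has been redesigned, which is the sense in which part of the intelligence load is moved out of the agent.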


3. Semantic Web and Active Design

Active Design is a minimal design framework based on the principle of distributing cognition to the world. In this design methodology, there is an active effort to split the cognitive load and push it to the world, thereby making the environment work for the agent. The Semantic Web effort can be considered an instance of Active Design, because the effort is to provide machine-understandable structure to documents and programs (the environment in this case) so that other programs (software agents) and people can work better. The structure provided stabilizes the environment (Hammond et al., 1995; Kirsh, 1995) for particular functions other agents want to execute. This way of looking at meta-tags, as structure generated to fit agents' functions, puts meta-tags closer to the ecological psychology notion of built environment and created affordances (Reed, 1996; Gibson, 1979) than to knowledge representation. All standardizations result in such stable environments for actions. For instance, library classification schemes stabilize document environments for search. In the case of the Semantic Web, there is stabilization at different levels; the different metadata formats create different structure. Thus XML provides standardized syntax, and RDF provides a standardized description of resources. Notice that while low-level standardization (like XML) provides stabilization that supports a variety of functions, high-level standardization (like library organizations and filename extensions) is usually designed to fit particular functions, for instance high-level search, instead of cue-based search. How does the change in the environment reduce complexity? The complexity reduction happens through the focusing of structures to function (see Agre and Horswill, 1997, for an elaboration of this point). The computational and neural basis of this process is extremely complex and poorly understood.
For design purposes, we can say that the created structure in the environment fits the function (or action) the agent seeks to fulfil or perform. A good example from the animal world is again the tail of the peacock, which fits the mate selection function of the peahen: the tail provides information exclusively for that function, and it exists just for that purpose. An artifact example is bar coding. The information in the barcode is extremely focused to the functioning of the supermarket check-out machine. It provides the machine with a number, which it uses to retrieve information such as the name of the product, weight, date of expiry and price from the point-of-sale computer. It does not retrieve information such as that the container is round, or is made of plastic, or that it was made during foreman Bill's shift at a factory


in Paradise Falls. There could be functions that need such information. For instance, one can imagine using the barcode to categorize containers (retrieve the type of container: made of plastic), and a machine in a recycling plant using that information to sort plastic containers. Notice that the check-out machine has no use for this information, and that the presence of this information in the file it retrieves will only add complexity to the check-out machine's decision-making. The optimal design is where the agent (the check-out machine) gets just the information it needs, like price, etc. Similarly, the recycling machine needs only information on the type of material, not the prices and date of expiry and on whose shift the container was made. For Active Design to work best, the structure provided to the environment should focus on the function an agent needs to perform. A uniform, generalised structure that is potentially useful for all agents is not an efficient structure. Such a generalised structure would only add complexity to the processing any single agent needs to perform, because such a structure would increase search and inference, as it is not focused to the function the agent wants to perform.
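
The idea that each agent should receive only the structure its function needs can be sketched as a barcode lookup with function-specific views. This is a minimal illustration under stated assumptions: the barcode number, the record fields, the field values and the `VIEWS` table are all invented for the example and do not describe any real point-of-sale system.

```python
# Hypothetical product file keyed by barcode number; the fields mirror the
# kinds of information mentioned in the text, with invented values.
PRODUCT_FILE = {
    "0123456789012": {
        "name": "coffee",
        "weight_g": 250,
        "expiry": "2005-01-31",
        "price": 4.50,
        "container_shape": "round",
        "container_material": "plastic",
        "made_on_shift": "foreman Bill",
    },
}

# Each function names only the fields it needs (an assumption of this sketch).
VIEWS = {
    "checkout": ("name", "weight_g", "expiry", "price"),
    "recycling": ("container_material",),
}


def lookup(barcode, function):
    """Return only the fields relevant to the function the agent performs."""
    record = PRODUCT_FILE[barcode]
    return {key: record[key] for key in VIEWS[function]}


print(lookup("0123456789012", "recycling"))   # {'container_material': 'plastic'}
```

The check-out machine and the recycling machine read the same barcode, but neither has to search through fields that serve the other's function.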

4. Ontologies: Knowledge representation vs. affordance

Formal ontologies, an aspect of the Semantic Web, are currently designed as a generalized structure that supports any function an agent wants to perform. A formal ontology is a specification of a conceptualization (Gruber, 1993). In less formal terms, it is a standardized description of concepts and their relations. An ontology metatag essentially classifies a document as being part of a knowledge domain (or a real world domain). It also supports inference by providing a pointer to a file that describes the categories and the possible relationships between categories that exist within that domain. This is very similar to the functioning of a barcode, except that ontologies are exhaustive. So if you have a document with the metatag USE-ONTOLOGY ID="cs-dept-ontology", this means the agent encountering the document should use the cs-dept-ontology,⁵ which is available at a URL, to categorise the document and its content elements. This ontology formally captures content elements found in computer science department pages and their relationships. Note that this is a generalized structure, and can be used by any agent, for any function. Imagine the use case of an agent that roams the web and collects the names and e-mail addresses of all the faculty members in computer science departments. Once the ontology part of the Semantic Web is in place, all the agent


needs to do is look at the ontology meta-tag on a web page. If it says cs-dept-ontology, then the agent looks up the ontology, and then parses the document to see whether the faculty categories mentioned in the cs-dept-ontology exist within the document, and if one of them does, gets that category, name and e-mail of the faculty member. This provides computational efficiency for the agent, because it doesn't have to infer from the page content whether the page is a computer science professor's or not. However, notice that this simple agent is using only a snippet of the detailed categories and relationships captured by the formal ontology; the rest of the categories and relationships only add computational complexity for this agent. So even though there is some computational advantage, there is an overall wastage of resources. On the side of the designer and user, notice that the harvesting of computer science professors' e-mail addresses is not a function that the developer of the ontology, and the designer of the site, necessarily wanted to support.⁶ If the professors' pages refer to ontology snippets (as in the case of bar code files), and provide just enough information to support desirable functions, this kind of exploitation of structure would be more difficult. Now, think of the popular use case of a software agent going out on the web to find and book the best room for your holiday. Suppose hotel sites visited by the web agent use the hotel-ontology meta-tag. The hotel-ontology tag allows the agent to know that it is on a hotel site, and to infer that hotels have rooms, with states available and unavailable. It also allows the agent to know the rent for a room, and to know that rooms have features like air-conditioning. However, the hotel-ontology meta-tag and the generalized description of hotels at some URL don't provide the agent with a computationally efficient way of booking a room, given its preferences.
Categorizing an entity is not enough to decide how to go about using the entity (think of VCRs). To actually execute the action of booking a room, the agent needs a more action-oriented tag set, something like check_dates, compare_prices and book_room. In some cases, generalized structures will just not work. An example from the Physical Markup Language (PML) domain illustrates this. PML allows designers to provide structure to everyday objects using RFID (Radio-frequency identification) tags. If these objects are tracked using the web, they could be considered as real-world nodes of the Semantic Web. Suppose that we want to mark up a coffee cup so that a housemaid robot can find the cup and bring us coffee. We can mark up the cup using current formal ontologies in the following manner:


[Object]
    Container
        Cup
            Coffee cup
                Jim's Coffee cup
[Measure-ont]
    LengthUnit 20 cm
    WeightUnit 200 Grams
    VolumeUnit 77.9 cm³

Let us call this the property model of ontologies. This markup in an RFID tag (or a retrieved file) identifies the cup and provides some of its properties, and the robot can detect this cup using its RFID reader. Like the barcode, this information allows the robot to short cut object recognition routines. However, to execute the robot's action of filling the cup with coffee, this information is not enough. This is because the markup does not say what the functions supported by the cup are. To use the above information to execute its function, the robot has to know that coffee cups are meant to be filled with coffee, and that the procedure to fill a cup with coffee is to hold it open-side up under the coffee machine's tap after switching it on. It also has to know that it should not hold the cup upside down once the coffee is filled. Finally, the robot has to know where it can find coffee cups and what actions to select from its repertoire of actions to use on the cup to fetch coffee. A much more useful informational structure for the robot would be:
[ontology: cyc (container); object: coffee cup; properties: radius (3 cm), height (20 cm); volume (77.9 cm³); owner: Marvin; best_supported_function: get_coffee; supported_actions: grasp, lift, hold, fill; constraints: (this_side_up, touch_pot_lip); default_location: kitchen cupboard 1;]

Let us call this the affordance model of ontologies. Here the self-description provided by the cup explicitly tells the robot what functions it supports, and leads to a lot less inference by the robot. Of course, the cup can be used for many other functions, such as measuring rice, holding candles, as a paperweight, etc. But these are not the prototypical functions of a cup, and to put all these functions in the tag would make the structure-creation a never-ending exercise. It is better to put in the prototypical functions, and leave the rest to the resourcefulness of the agents encountering the cup, as is the case with humans. This description exploits the selective representation model of the world (Mandik and Clark, 2002), where an organism is considered to perceive and cognize a relevant-to-my-lifestyle world, as opposed to a world-with-all-its-perceptual-properties. In this view, a rabbit seeing a looming shape does not


take an action by trying to categorize the shape using generalized property-based queries like "does it have spots", "is it red" or "does it have a snout", etc. It uses self- and action-oriented queries like "is it big", "is it moving towards me", "is it acting predator-like", etc. This kind of querying is much more efficient computationally than property-based querying, because it avoids the assembly of properties into task-relevant structure, look-up of objects based on these properties, and selection of action. Action-oriented queries chunk many operations into one. This efficiency in computation is reflected in structures generated in nature by organisms, like markers and mating signals, which are tailored to particular actions, and are action-oriented representations (Clark, 1997). On the property side, the above tag just includes the properties that are required for the agent to execute the functions suggested/desired. This makes the job of the tag designer (the PML equivalent of the ontology designer) a lot easier, because function-based tagging means that a lot less information needs to be put in. Interestingly, this affordance-based tag structure has some positive side effects:

1. The agent can find out its location by sending out a query and collating the default locations returned by objects. If most of the objects reply to the query with kitchen as their default location, the agent can infer with a high probability that it is in the kitchen. This is very similar to how humans infer their location during a power blackout: if the objects you encounter are kitchen objects, you are in the kitchen. A functional location is an assembly of functional objects.
2. States of objects can be inferred from locations (e.g., in sink means dirty).
3. The navigation route for a task can emerge from the objects involved in the task. For instance, cup → coffee pot → user for the coffee-bringing task.
4. Objects can be returned to their stable locations at the end of the day.
None of these can be achieved using the rst category-based generalized structure, which does not take into account the locations of the cups.
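The first three side effects can be illustrated with a minimal Python sketch. It is not the PML tag format discussed in this chapter: the `Tag` class, its field names, and the sample objects are all hypothetical, chosen only to show how little inference the agent needs once locations and functions travel with the object.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tag:
    """Hypothetical affordance-style tag; all field names are illustrative."""
    name: str
    functions: Tuple[str, ...]
    default_location: str  # the room where the object normally lives
    current_location: str  # the spot where the object is right now


def infer_location(replies: List[Tag]) -> Tuple[str, float]:
    """Side effect 1: majority vote over the default locations returned."""
    counts = Counter(tag.default_location for tag in replies)
    location, votes = counts.most_common(1)[0]
    return location, votes / len(replies)


def infer_state(tag: Tag, state_by_location: Dict[str, str]) -> str:
    """Side effect 2: read an object's state off its current location."""
    return state_by_location.get(tag.current_location, "unknown")


def route_for_task(object_names: List[str], tags: Dict[str, Tag],
                   user_location: str) -> List[str]:
    """Side effect 3: the route is the objects' locations, ending at the user."""
    return [tags[name].current_location for name in object_names] + [user_location]


# Assumed sample data: four tagged objects answering the agent's query.
nearby = [
    Tag("cup", ("hold_liquid", "drink_from"), "kitchen", "sink"),
    Tag("coffee pot", ("brew", "pour"), "kitchen", "counter"),
    Tag("plate", ("hold_food",), "kitchen", "shelf"),
    Tag("book", ("read",), "study", "counter"),
]
tags_by_name = {tag.name: tag for tag in nearby}

location, confidence = infer_location(nearby)
cup_state = infer_state(tags_by_name["cup"], {"sink": "dirty"})
route = route_for_task(["cup", "coffee pot"], tags_by_name, "living room")
```

In this sketch the agent's confidence is simply the share of replies naming the winning location; a real agent would presumably weigh replies in a more careful way.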

5. Cognitive load balancing

The basic distinction between the two ontology designs above is the way the cognitive load is divided between the agent and the environment. Traditionally, functions have been considered as something the agent brings to the world (or objects), and to execute its function the agent needed to know only the properties of the object. The properties are considered to be the object's only contribution to the action. The agent can infer whether a given object supports the function, based on the properties the object possesses.

In classical AI methodology, the agent had an internal knowledge base of properties possessed by objects. Actions were selected by matching properties in the knowledge base to the properties gleaned from objects. The Semantic Web effort moves away from this design, by shifting the agent's knowledge base into the world, and storing it in objects (or URLs linked to objects) using self-descriptions. This design does away with the object-identification process and the inference involved in finding relationships between properties. However, the rest of the model remains the same, including the inference-driven model of action-selection and context-identification. This means that other problems in agent design, like context-identification and action-selection, remain challenging.

In contrast, the affordance approach also shifts some parts of the action and context to the object, as possibilities, or preferences, for action. If the actions the agent wants to execute are the same ones the object affords, there is a better fit between the world and the agent, and there is less cognitive overhead. The context embedded in the object allows the agent to make better decisions. From the point of view of making the environment work for agents, function-based ontologies of this kind are much more useful and efficient than general-purpose, exhaustive ontologies, which require extensive search and inference on the part of the agent. The detailed categories and hierarchical relations encoded in such ontologies make them cumbersome to develop, and computationally intensive to use.
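The difference in cognitive load between the two designs can be sketched as follows. This is only an illustration under stated assumptions: the knowledge base, the property names and the function names are invented for the example and do not come from any ontology discussed here.

```python
from typing import Dict, Set

# Hypothetical agent-side knowledge base: function -> properties it requires.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "hold_liquid": {"concave", "rigid", "watertight"},
}


def supports_by_properties(object_properties: Set[str], function: str,
                           knowledge_base: Dict[str, Set[str]]) -> bool:
    """Property-based design: the agent infers support by matching properties."""
    required = knowledge_base.get(function)
    if required is None:
        return False  # the agent knows nothing about this function
    return required <= object_properties  # subset test over all requirements


def supports_by_affordance(afforded_functions: Set[str], function: str) -> bool:
    """Affordance design: the object states what it supports; one lookup."""
    return function in afforded_functions


# Assumed self-descriptions of the same cup under the two designs.
cup_properties = {"concave", "rigid", "watertight", "ceramic", "white"}
cup_affordances = {"hold_liquid", "drink_from"}
```

Under the property-based design the agent can only act on functions its own knowledge base covers; under the affordance design the cup itself reports "drink_from", even though the agent holds no rule for it.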
In the affordance approach, actions/functions embedded in objects act as lenses that edit out unnecessary structure, both for agents and for tag designers. In the Semantic Web, function-oriented ontologies help users, agents and search engines to easily discover web pages that provide functions like "buy", "sell" and "bid", allowing them to distinguish the information part of the web from the functional part. Action-oriented ontologies will allow the web to be split along action and knowledge domains, and allow for separate and detailed searches in both.

Another advantage of putting actions/functions in web pages is that we can create a network of functions, by linking pages by function, instead of topic. This is an instance of a point made by Cox (1999), who argues that externalized representations help unconnected cognitive systems interact. This is not possible if the function resides within agents. Another related advantage is the ability of agents to exploit the functions as paths (as in the coffee-bringing case), which is also not possible if the function resides within the agent. Also, as pointed out earlier, by providing focused ontology snippets that support desirable functions, we could potentially keep out eavesdroppers like e-mail harvesting bots.
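A network of functions of the kind described above could be sketched as below. The page URLs and the function labels are hypothetical, and the code assumes each page already carries a set of function tags; it only shows how such tags would let the web be indexed, linked and split by function rather than by topic.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def index_by_function(pages: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each function (e.g. "buy") to the pages that declare it."""
    index = defaultdict(set)
    for url, functions in pages.items():
        for function in functions:
            index[function].add(url)
    return dict(index)


def function_links(index: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Link every pair of pages that share at least one function."""
    links = set()
    for urls in index.values():
        links.update(combinations(sorted(urls), 2))
    return links


def split_web(pages: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Separate the functional part of the web from the information part."""
    functional = {url for url, functions in pages.items() if functions}
    return functional, set(pages) - functional


# Assumed sample pages with hypothetical function tags.
pages = {
    "https://shop.example/cups": {"buy"},
    "https://auction.example/cups": {"buy", "bid"},
    "https://auction.example/sell": {"sell"},
    "https://wiki.example/cup": set(),
}
index = index_by_function(pages)
links = function_links(index)
functional, informational = split_web(pages)
```

An agent looking for somewhere to buy a cup would consult `index["buy"]` only, and never touch the informational page.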

6. Meta-tag pluralism

Now, to the question we started with. Should the Semantic Web effort be directed at such action-oriented structures exclusively? Interestingly enough, the action-oriented approach does not contradict the design of top-down, exhaustive ontologies, or other top-down category-based structures. This is because affordances and categories exist at different levels, and serve different purposes. Top-down ontologies and other structures, with their exhaustive listings of categories and relationships, establish a standard way of using terms. This leads to better interoperability.

The view of function acting as a lens (allowing agents to focus only on the environment structure needed for action) is based on computational efficiency, and functional structuring of task environments. This is a higher-level view, and does not contradict the standardization role of top-down structures like ontologies. The affordance approach just makes a distinction between low-level interoperability structures and high-level function-oriented structures. The design confusion comes from using a low-level structure, designed for interoperability, to enable actions. It can be made to work to some extent, but the design is not efficient. A rough analogy would be trying to develop a user manual using just labels of components and hierarchical ordering of components.

The design implications of the affordance view are the following. First, the designer of an individual web page or an RFID tag should not have to put in (or refer to) an entire exhaustive ontology. The designer should be able to put in, or refer to, ontology snippets, focused on the particular functions she wants the object to serve. This is similar to the bar-code design. She should be able to pick and choose these snippets in any way she wants to, without restrictions based on hierarchy or inheritance. Second, there need to be ontologies for actions and standard locations, and these ontologies need to be accessible as snippets as well. And finally, there need to be graphical tools that allow users to mix and match these ontology snippets to create action-supporting tags, even networks of them.

Note that the action-oriented approach makes such tagging more user-friendly, because it is hard for people to think of categories supporting actions if they don't know what actions they want to support. If users can pick and choose the actions they want to support, the selection of categories becomes easier. Action-orientation may thus help solve the vexing problem of designing user-friendly tools for generating meta-tags.
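The mix-and-match idea can be sketched as a small tag-composition routine. Everything here is an assumption made for illustration: the snippet library, the property names and the `compose_tag` helper are hypothetical and do not correspond to any existing ontology or tool.

```python
from typing import Dict, List, Set

# Hypothetical snippet library: each function names only the properties it needs.
SNIPPETS: Dict[str, Set[str]] = {
    "hold_liquid": {"volume_ml", "watertight"},
    "drink_from": {"rim_diameter_mm", "has_handle"},
    "measure": {"volume_ml", "graduated"},
}


def compose_tag(name: str, functions: List[str], default_location: str,
                snippets: Dict[str, Set[str]]) -> Dict[str, object]:
    """Build a tag from freely chosen snippets, with no hierarchy or inheritance."""
    unknown = [function for function in functions if function not in snippets]
    if unknown:
        raise KeyError(f"no snippet available for: {unknown}")
    properties: Set[str] = set()
    for function in functions:
        properties |= snippets[function]  # only what the chosen functions need
    return {
        "name": name,
        "functions": sorted(functions),
        "properties": sorted(properties),
        "default_location": default_location,
    }


# The designer picks the actions first; the categories follow from them.
cup_tag = compose_tag("cup", ["hold_liquid", "drink_from"], "kitchen", SNIPPETS)
```

Because the designer starts from the actions to be supported, the property list is derived rather than chosen, which is the sense in which action-orientation is expected to make tagging easier.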

So the answer to the question we began with ("Is the Semantic Web about affordance or knowledge representation?") is this: it is both. The property-based approach seeks to create a common vocabulary, and the affordance-based approach seeks to exploit the existence of this common vocabulary for higher-level functions. The second approach advocates the use of elements of a common vocabulary as affordances in objects, i.e. as action-structures focused on functions. This is similar to the way humans (and other organisms) use ontologies: we access only action-relevant parts of perceived objects, and they are accessed in a customized manner to fit tasks or functions. We are selective about the parts of the environment we attend to during a task, and we almost never use all the properties of an object to execute a task.

The extension of this principle to the Web is quite natural if we take the stance of the Web as an action-mediating space. The inclusion of function-oriented structure as a lens in a self-describing object or website allows agents who need to perform that function to easily detect that structure, and only that structure, and use the structure to efficiently perform the task, or a set of interconnected tasks.

Notes
* A significant portion of this chapter was developed while the author was pursuing a predoctoral fellowship with the Adaptive Behavior and Cognition (ABC) Group of the Max Planck Institute for Human Development, Berlin. I would like to thank Dr. Peter Todd of the ABC group for supporting and sharpening the ideas reported here.

1. I am not committed to the environmental determinism inherent in the Gibsonian view of affordances, where organisms are considered not to have any kind of mental representations.

2. Brooks questioned the traditional picture of artificial intelligence, where an idealized representation of the world is stored within the agent and compared with the environment at run-time. The program executes actions based on these comparisons. Brooks observes that, to be ported to a system, this notion of intelligence needs an objective world model provided by the programmer. This model would then be compared against situations and agents in the real world. Brooks has convincingly argued that this is not a robust way of building intelligence, because the world does not always fit the models made by the programmer. Instead, Brooks advocates a design where the designer considers the environment's structure in detail and builds low-level perception-action pairs (like obstacle-run_away) based on that structure. The agent constantly queries the environment to gain information on the structure of the environment, and acts on the basis of that information. This is a more robust design. Unfortunately, in the process of developing this design framework, Brooks took a stance against representations, which has resulted in this design framework not being applied much in representational domains like the Web.

3. Interestingly, if we consider the Semantic Web effort as the most recent development in the history of processing natural language, it follows the three design levels outlined above. First, in the era of NLP, language was considered as something that could be processed by using centrally stored rules and representations (first design approach). Then came automatic classification, the idea of trying to understand the structure of a document based on its context and domain (environment), using pattern analysis and vocabularies (second approach). And now we have the Semantic Web, where the designer actively seeks to change the document environment, by providing structure to the document.

4. http://www.autoidlabs.mit.edu/index.htm

5. This example is based on SHOE. For details see: http://www.cs.umd.edu/users/hendler/sciam/walkthru.html

6. In animal signaling, this kind of undesirable exploitation of informational structures by others is termed eavesdropping. For example, the songs male crickets sing to attract females are used by some parasitic flies to locate the male crickets and deposit their eggs on them. The flies kill the cricket when the eggs hatch. The problems the Semantic Web effort faces from signal exploitation, and from its more potent complement, dishonest signaling, are topics by themselves, but they are beyond the scope of this paper.

References
Agre, P. & I. Horswill (1997). Lifeworld Analysis. Journal of Artificial Intelligence Research 6, 111-145.
Beck, C. (2001). Chemical signal mobilizes reserve units. Max Planck Research, Science magazine of the Max Planck Society 4/2001, 62-63.
Bradbury, J. W. & S. L. Vehrencamp (1998). Principles of Animal Communication. Sunderland, Mass.: Sinauer Associates.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence 47(1-3), 139-160.
Chandrasekharan, S. & B. Esfandiari (2000). Software Agents and Situatedness: Being Where? Proceedings of the Eleventh Mid-west Conference on Artificial Intelligence and Cognitive Science, Menlo Park, CA: AAAI Press, pp. 29-32.
Clark, A. (1997). Being There: Putting brain, body, and world together again. Cambridge, Mass.: MIT Press.
Clark, G. K. (2001). The Politics of Schemas, available at: http://www.xml.com/pub/a/2001/01/31/politics.html
Cox, R. (1999). Representation construction, externalised cognition and individual differences. Learning and Instruction (Special issue on learning with interactive graphical systems) 9, 343-363.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gruber, T. R. (1993). A translation approach to portable ontologies. Knowledge Acquisition 5(2), 199-220.
Hammond, K. J., T. M. Converse & J. W. Grass (1995). The stabilization of environments. Artificial Intelligence 72(1-2), 305-327.
Heiling, A. M., M. E. Herberstein & L. Chittka (2003). Crab-spiders manipulate flower signals. Nature 421, 334.
Hollan, J. D., E. L. Hutchins & D. Kirsh (2000). Distributed cognition: A new theoretical foundation for human-computer interaction research. ACM Transactions on Human-Computer Interaction 7(2), 174-196.
Hutchins, E. (1995). How a cockpit remembers its speeds. Cognitive Science 19, 265-288.
Kirsh, D. & P. Maglio (1994). On distinguishing epistemic from pragmatic action. Cognitive Science 18, 513-549.
Kirsh, D. (1995). The Intelligent Use of Space. Artificial Intelligence 73, 31-68.
Kirsh, D. (1995b). Complementary Strategies: Why we use our hands when we think. Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society, pp. 212-217. Hillsdale, NJ: Lawrence Erlbaum.
Kirsh, D. (1996). Adapting the Environment Instead of Oneself. Adaptive Behavior 4(3/4), 415-452.
Kirsh, D. (2001). The Context of Work. Human Computer Interaction 16, 305-322.
Mandik, P. & A. Clark (2002). Selective Representing and World Making. Minds and Machines 12, 383-395.
Norman, D. A. (1993). Things That Make Us Smart. Reading, MA: Addison-Wesley Publishing Company.
Norman, D. A. (1998). Affordances and Design, available at: http://www.jnd.org/dn.mss/affordances-and-design.html
Reed, E. S. (1996). Encountering the World: Toward an Ecological Psychology. New York: Oxford University Press.
Silberman, S. (2003). The Bacteria Whisperer. Wired Online, 11.04, April 2003, available at: http://www.wired.com/wired/archive/11.04/quorum.html
Stopka, P. & D. W. Macdonald (2003). Way-marking behaviour: An aid to spatial navigation in the wood mouse (Apodemus sylvaticus). BMC Ecology, published online, http://www.biomedcentral.com/1472-6785/3/3
Susi, T. & T. Ziemke (2001). Social Cognition, Artefacts, and Stigmergy: A Comparative Analysis of Theoretical Frameworks for the Understanding of Artefact-mediated Collaborative Activity. Cognitive Systems Research 2(4), 273-290.
Weir, A. S., J. Chappell & J. Kacelnik (2002). Shaping of Hooks in New Caledonian Crows. Science 297(5583), 981.
Zahavi, A. & A. Zahavi (1997). The Handicap Principle: A missing piece of Darwin's puzzle. Oxford: Oxford University Press.

Part II

Applications

Cognition and body image


Hanan Abdulwahab El Ashegh and Roger Lindsay
Department of Psychology, University of Cambridge / Department of Psychology, Oxford Brookes University

Introduction

1. Natural technology and cognition

Though Cognitive Technology is a new discipline, human beings have been developing and employing cognitive technologies for thousands of years. When we consider non-natural information processing devices such as computers, it is generally accepted that a software program enabling some computation to be executed is a functional artefact and hence falls within the domain of technology. If the program embodies knowledge or beliefs, as most non-trivial programs do, then the technology involved is cognitive. Now, suppose that instead of being developed to run on a computer, a program, consisting of a set of procedures for generating a specific set of outcomes, was developed to allow humans to accomplish a particular mental function: for example, multiplying numbers, or remembering a list of words exceeding the span of immediate memory. The resulting program is no less a contribution to technology, and no less cognitive, just because the relevant procedures were developed to be executed by a human information processing device. There seems to be no principled reason for withholding the term "cognitive technology" when procedures enabling specific functions to be computed more efficiently or effectively are developed to constrain human information processing operations, rather than those of inorganic devices. Accordingly, we assume from the outset that many contributions to cognitive technology, including, conspicuously, the development of spoken language and arithmetic, consist of procedures developed by humans to enhance their own mental operations. Meenan and Lindsay (2002) introduce the term "natural technology" to describe those cases in which no physical artefact is involved beyond the mental apparatus itself, for example, knowing how to multiply numbers, as opposed to getting the same result by using a calculator:
Technology is the use of knowledge with the intention of bringing about positive change in the world. Cognitive technology is centrally concerned with those uses of knowledge which modify mental competencies and capabilities. Sometimes mental capabilities are modified through the use of physical objects external to the agent: writing, and the use of calculators are examples. In other cases, for example in the use of speech, or mnemonic systems to improve memory, cognitive artefacts are used to modify the user's mind, or the minds and behaviour of other agents without the use of physical mediation. We shall refer to cases of the latter sort as examples of natural technology. Natural technology includes the deliberate use of action sequences to modify the behaviour of other people: the domain of social behaviour. (Meenan and Lindsay, 2002, pp. 234/5)

Natural technology also includes cases in which individual human physical competence is extended, for example by developing teachable skills such as singing, juggling, or using an abacus. It excludes evolved skills such as eating and walking that do not need to be taught formally. Such competences do not fall within the domain of technology because the procedures that underlie them were not intentionally developed (or fortuitously discovered, but intentionally applied) to achieve an explicit goal. Evolution has no goals and hence produces no technological artefacts. Goldberg (2001) has made similar claims about the relationship between culturally transmitted software and brain processes:
The whole history of human civilisation has been characterised by a relative shift of the cognitive emphasis from the right hemisphere to the left hemisphere owing to the accumulation of ready-made templates of various kinds. These cognitive templates are stored externally through various cultural means, including language, and are internalised by individuals in the course of learning as cognitive prefabricates (Goldberg 2001, p. 52)

Despite more than half a century of startlingly rapid progress in non-natural cognitive technology (e.g., programming computers), humans remain capable of many learned cognitive achievements that still cannot be emulated by machines. The study of natural cognitive artefacts thus offers rich possibilities for transferring technology from the natural to the non-natural domain. Perhaps more importantly, treating learned human mental competencies as portable cognitive artefacts promotes a theoretical and methodological stance which presupposes that they are all analysable in terms of information processing resources and procedures. Perhaps the major obstacle to this approach is a formidable difficulty characterised as the "decoupling" or "unitisation" problem. Essentially, this is the problem of identifying the basic components that make up human cognition. Does language comprehension result from the application of a perception module? Or the application of a general intelligence module? Or is it the result of applying many special-purpose modules, such as phoneme identification, lexeme identification, syntactic analysis and so on?

The term "module" has fallen into some disrepute, at least partly because Fodor's (1983) identification of modules with faculties such as perception and language has not proved to be helpful. Cognitive modularity has also often been identified with symbolic sequential stage models, which in turn are widely believed to have been fatally undermined by the development of connectionist alternatives. The latter belief is simply mistaken. The claim that there are functionally specialised processing units with relatively wide-band internal communication and relatively narrow-band communication with other units does not imply anything about the nature of the processing operations within a unit. Moreover, there is now a tremendous accumulation of neuropsychological evidence demonstrating selective loss of specific competencies whilst equally complex, and apparently closely related, abilities are spared (for example, loss of the ability to recognise human faces whilst object recognition remains normal, and vice versa). This evidence irresistibly suggests that, at least in part, the human cerebral cortex and the information processing operations it carries out are functionally organised. That is to say, specific regions of the brain are specialised for particular tasks.
The trouble is that complex information processing tasks can be accomplished in many different ways, and in most performance domains psychological evidence is too weak to discriminate between the possibilities. One of the main challenges within this research framework at present is to find sets of cognitive operations that map onto the functional organisation of the cortex. Successful mappings confer validity on both neuropsychological theories of brain function and information processing analyses of behaviour. At a deeper level, they support the realist view that the brain actually does process information in a manner similar to non-natural artefacts such as computers.

A strong case can be made for a function-specific processing module that is responsible for generating cognitive representations of the physical self. It is easy to assume that human beings have a generic capability to represent entities in the physical world by constructing symbolic models based upon perceptual information. Using essentially the same mechanism, one instant a person might symbolically model a tree or a dog, and the next, another person or themselves. There is a good deal of persuasive evidence that this account is false. At the "hard" end of the evidence spectrum, clinical phenomena such as anosognosia and phantom limb experiences demonstrate that people frequently maintain cognitive representations of their own body that are grossly inaccurate. In anosognosia, patients may insist that they have full control over a hand or leg, even though it is actually paralysed and moribund as a result of brain damage. In phantom limb disorder, patients attribute pain to a limb that has long since been removed as a result of accident or surgery. These cognitive failures are highly specific to physical self-image: sufferers do not show general perceptual failure, nor widespread faulty inference, nor yet erroneous judgements about the intactness of other people.

There is also a plethora of "softer" evidence. People often seem to misperceive their own physical characteristics, denying with apparent honesty and real indignation the evident fact that they are overweight; or, conversely, subjecting themselves to harsh dietary and exercise regimes to rid themselves of "excessive" weight. Others might claim that an entirely unexceptionable nose is, in truth, intolerably large, or that breasts or buttocks are shamefully small, or embarrassing in their magnitude. Again, clinical syndromes have been identified: eating disorders such as anorexia nervosa are commonly claimed to involve misperception of one's own body as fat, even when it may be chronically undernourished and when this is shockingly evident to all except the sufferer (see Note 1). Such conditions are highly specific to judgements about the physical self, and do not seem to result from any general impairment in ability to judge the dimensions of objects or other people. Another clinical syndrome, body dysmorphia, applies when people unreasonably regard a feature of their body as shameful or disfiguring (see Note 2).

2. Body schema and body image

In spite of the central part played by physical conceptions of self in determining the quality of people's lives and even their mental health, theoretical understanding of the processes involved is poorly developed and the relevant academic literature is shot through with confusion. The most striking manifestation of this is the co-existence of two rival constructs, each supposed to underpin a person's conception of physical self. One of these models, body schema (see Note 3), has been developed by neurologists to explain the consequences of brain damage; the other, body image (see Note 4), is a construct primarily used to explain psychological disorders that have own-body dissatisfaction as a central feature. Whilst body schema is usually considered to be a perceptual model of the body, body image is generally believed to be primarily a cognitive model that possesses social and emotional components. Outside neuropsychology, body image has come to be used as part of a loose characterisation of the ego or social self (Bower, 1977), but such formulations are unfortunately not articulated with sufficient explicitness to be useful.

It seems that there is widespread agreement in the neuropsychological literature that we experience and describe our bodies with the assistance of a multidimensional cognitive construct generally designated the body schema (this agreement is not by any means universal; see Note 5). When psychological rather than neurological mechanisms (such as emotions and values) are the focus of attention, researchers tend to implicate a second multidimensional cognitive construct known as the body image (Gallagher, 1986; Fisher, 1990; Gallagher and Cole, 1995). The relationship between these two constructs has been little explored. Gallagher (1986) analysed recent psychological studies on the relationship between body image and body schema. He concluded that the operations of the body schema sometimes place constraints on intentional consciousness. In particular, Gallagher suggests that changes in various aspects of body schema can affect the way subjects perceive their own bodies, that is: change in body schema can cause change in body image. Gallagher's conclusion inevitably raises the question of whether body schema and body image are truly independent at all. Evidence that one construct causally affects the other is prima facie also evidence that there are not two constructs but only one. This suspicion is reinforced by the fact that "body image" is invariably used in the absence of any clear definition.

In an attempt at clarifying matters, Altabe and Thompson (1996) suggested that one concept of body image is an internalized view of one's appearance that drives behaviour and influences information processing (Altabe and Thompson 1996, p. 190). This definition assimilates body image to the more widely accepted concept of a cognitive schema. Thompson and Altabe report a series of studies, the outcome of which they take to support the idea that body image cognitions act like schemas. Slade (1994) has argued that an individual's perception of their own body is highly influenced by cognitive, affective, attitudinal, and other variables. Slade proposes a general schematic model of body image as a loose mental representation of the body that is influenced by at least seven sets of factors:


These sets are the history of sensory input to body experience, the history of weight change fluctuation, cultural and social norms, individual attitudes to weight and shape, cognitive and affective variables, individual psychopathology, and biological variables. (Slade, 1994, p. 501)

The body schema and body image constructs thus seem to have long coexisted as rival constructs supported by overlapping, but distinct, bodies of evidence. Body schema tends to be preferred by researchers who are interested in brain mechanisms, but not in psychological processes; body image tends to be employed when evaluative processes are of most interest. There seems to be an opportunity to integrate these two constructs in a productive way. Specifically, we propose that it is possible to explain phenomena in both domains with greater theoretical economy. Instead of two distinct cognitive representations of self, one underpinning perception and action, the other underlying attitudes to self and social behaviour, we propose that there is only one internal model of physical self, but that this is operated upon by an evaluative process. We now turn to the clinical literature on disordered experience of body.

3. A unified model of body image representation

Two theoretical constructs have been used to provide a basis for mental representations of physical self. Body schema is a construct supported by hard neurological evidence that can explain perceptual and motor dysfunctions caused by brain pathology, but that says little or nothing about psychological dysfunctions involving self-representation. Body image incorporates the judgemental and evaluative information essential to explain psychopathologies such as eating disorders and Body Dysmorphic Disorder (BDD), but has been specified only in vague terms and has relatively little empirical support. We have already suggested above that these two constructs are ripe for integration.

The neurological evidence strongly supports the suggestion that the human brain contains a specific representation of physical self that can be directly affected by insults to the cortex. The psychological evidence suggests that own-body representations can be dramatically affected by belief and judgement. It does not appear that anorexics perceive their own bodies accurately but persist in believing they are overweight despite this perceptual evidence. And not all BDD sufferers see their noses or buttocks as contoured within normal limits, but insist nonetheless that they are grossly disfigured. Rather, in both of these classes of disorder the evidence is compelling that distortions of judgement accompanied by obsessive preoccupation with specific bodily features can result in a modification of the manner in which those bodily features are mentally represented.

Evidence of this kind seems to suggest that in some individuals, judgement of their own bodily dimensions is defective. This impairment of judgement might result from distortion of a normal magnitude estimation process, such as might be caused by a calibration error; or the basic process may operate normally, but estimation of one quantity, such as size, may be inappropriately affected by another (e.g., self-worth). There does seem to be a strong case for arguing that judgements about the physical self involve a special-purpose body image generator, the operation of which plays an essential part in the cognitive processes underlying body image judgements, and impairment of which helps to explain body image disturbances.

More speculatively, it is possible that there are two distinct subtypes of body image disorder. The first arises from a negative view of an undistorted internal representation of body; it is associated with internalized cultural norms and depression, and is alleviated by drugs that are effective for the latter condition (see Note 2). The second subtype of body image disorder seems more likely to result from a faulty cognitive representation of body. This type of disorder is expected to be independent of cultural stereotypes and depressive illness, but may be related to defective perceptual and cognitive processes. Investigations of conditions such as anosognosia suggest that knowledge of bodily impairments resulting from neurological lesions may not be cognitively available, despite such impairments being grossly apparent to others. This in turn seems to imply that cognitive awareness of body does not result from direct perception, but is mediated by an internal representational process that may fail to register impairment.
This hypothetical system for representing body will be referred to as the Body Image Generator (BIG), an internal cognitive mechanism that underlies an individual's view of his/her physical appearance and provides the images towards which evaluations of the physical self are directed. It is important to remember throughout that we are concerned here with cases of pathological belief that one's body is defective, not those cases in which a person must emotionally adjust to a real bodily defect such as limb loss, obesity or facial disfigurement. The assumption that there is a dedicated cognitive system that has a distinct and independent neurological locus allows the two subtypes of body image disorder (BID) distinguished above to be more clearly differentiated.

i. Type 1 BDD: actual body parameters are normal, and BIG generates an accurate image of the body, but individuals judge their body image as defective because of inappropriate norms or the application of negative schemas to themselves.
ii. Type 2 BDD: actual body parameters are normal, but BIG generates an image that is anomalous in some way (e.g., excessively large in some dimensions). Individuals with this impairment may have normal bodies, and normal evaluation criteria, but still perceive their body as defective.

It seems probable that these two cases are not completely independent, in that judgements about one's body image may be capable of causally modifying the image (Gallagher, 1986; Gallagher and Cole, 1995). This capability might have many psychological advantages; for example, it would allow people to mitigate the distress caused by undesirable physical characteristics that are outside their control. In this way, the disfigured, disabled or aesthetically unfortunate could perhaps avoid or reduce the debilitating effects of depression. Though there is considerable evidence to support the claim that negative attitudes to one's self can lead to negative changes in body image, there is presently no evidence that positive attitudes to self can enhance cognitive representations of one's physical self. However, the absence of such evidence is hardly surprising: individuals with an excessively positive image of their bodies are hardly likely to turn up in the psychiatric consulting room, as those with BDD do.

The possibility of representations of the physical self being modified via psychological processes creates a clear functional rationale for isolating self-representations from the mechanisms responsible for producing representations of other features of the physical world. It would be unfortunate, for instance, if perceiving oneself as thinner to avoid psychological discomfort resulted in universal misperception of breadth. There is one further implication: we have noted that negative attitudes to self are largely, if not entirely, culturally mediated.
As cultural factors can only begin to operate after considerable learning has already taken place, it seems likely that BIG is not hardwired in from birth but is a cognitive construct that incorporates cultural values and assumptions. Myers and Biocca (1992) refer to the "elastic body image" and have reported research suggesting that a female's body image is constructed with reference to a number of source models: the socially represented ideal body, the individual's internalized ideal body, the present body image and the object body image. Myers and Biocca's research on female university students concludes that an individual's body shape perception can be changed with less than 30 minutes' exposure to television. They also found that young females had a tendency to overestimate their body size, which seemed to indicate that they may have internalized the idealized body image
presented by advertising. The evidence that BIG is cognitively malleable and adapts to cognitive context seems to justify the assumption that it falls under the characterisation of natural technology presented at the beginning of the present paper. Finally, though there might be psychological benefits from enhancing representations of one's physical self, there is an inevitable Faustian downside associated with a mechanism that offers this possibility. Negative attitudes to self will be equally capable of producing negative distortions in body image. There is apparently an increase in people suffering from negative body image at present. For example, Gordon (1992) reports that:
A survey of 33,000 women, of varying age and employment levels in the early 1980s revealed that 75% felt that they were too fat, even though according to conservative weight tables only 25% were actually overweight. The particular body parts that caused the most distress were the thighs, hips and stomach. When asked whether they would be happiest a) losing weight b) hearing from an old friend c) a date with a man you admire or d) success at work; 42% indicated losing weight, 21% indicated dating and 22% indicated work success. (Gordon, 1992, p. 71)

Another survey of over 1000 high school students in Berkeley, California revealed that 56% of 12th grade girls considered themselves to be overweight, whereas objective measure revealed only 25% to be moderately or extremely overweight (Gordon, 1992, p. 80).

Gordon (1992, p. 32) comments that: "it is not surprising that disorders of body image, in which people have difficulties seeing themselves accurately, have become rampant." The present review argues that in making this statement Gordon is confusing the two hypothesized subtypes of BDD. It is not surprising, in an era of mass transmission of images of human perfection, that people might tend to judge themselves accurately, but why they should fail to see themselves accurately requires further explanation. In the investigation described below, it is intended to measure the accuracy with which people judge various aspects of their own body and investigate whether such judgements are correlated with general intelligence and more specific spatial and magnitude judgements. It is anticipated that the data will allow us to decide whether defective representations of body result from generic cognitive inadequacies, or might with more plausibility be attributed to a special purpose module such as BIG. In the following section a full account of the methodology employed to test this hypothesis is reported.

4. Method

4.1 Participants

The sample was an opportunity sample in that, though a truly random sampling procedure was not employed, the method of selecting participants does not introduce biases relevant to the hypotheses under investigation. Fifty males and fifty females, ranging in age from 18 to 55, participated in the study. Seventy of the participants were undergraduate or graduate students at Oxford Brookes University. The remaining 30 participants were drawn from people who regularly work out at gymnasiums or were acquaintances of participants already in the experiment.

4.2 Materials

The psychometric tests used are listed and described below. A tripod-mounted Olympus CL 840 1.3 megapixel digital camera was used to produce the photographic images on which size judgements were based. Participants made their judgements whilst viewing digital images displayed on a laptop computer. The objects used in this experiment were a table and a chair, also digitally displayed on a laptop screen. To allow quantification of magnitude judgements, a physical measuring tool (the Estimation Caliper) was designed and constructed specifically for the experiment. The Estimation Caliper and examples of objects used in the study are illustrated in Figures 1–3. All photographic images were taken under controlled conditions (e.g., standard lighting, standard background, standard distance of 3 metres from the camera, and with the tripod/camera always set at a standard distance of 133 centimetres from the floor).

Figure 1. The Estimation Caliper.

Figure 2. Digital image of table.

Figure 3. Digital image of chair.

4.3 Psychometric tests

Psychometric measures were used to allow quantification of a number of independent variables that the research literature suggests might be associated with body-size misjudgement. Participants completed 5 questionnaires and an interview-style VOSP Battery:

VOSP (Visual Object/Space Perception Battery)
BDI (Beck Depression Inventory)
BDDQ-R (Body Dysmorphic Disorder Questionnaire, revised)
STAI-X (State-Trait Anxiety Inventory)
EDI (Eating Disorder Inventory)
EPI (Eysenck Personality Inventory)

The psychometric measures are described more fully below.

4.3.1 The Visual Object/Space Perception Battery (VOSP)

The VOSP was incorporated into the study to provide assessments of the general competence and accuracy of participants on perceptual tasks. The VOSP includes the following sub-tests:

1. Shape Detection
2. Incomplete Letters
3. Silhouettes
4. Object Decision
5. Dot Counting
6. Progressive Silhouettes
7. Position Discrimination
8. Number Location
9. Cube Analysis

Warrington and James (1991), who developed the VOSP, report that:
The (VOSP) Visual Object and Space Perception Battery consist of nine tests each designed to assess a particular aspect of object or space perception, while minimizing the involvement of other cognitive skills. The VOSP will enable an assessor to compare the scores of a subject with those of a normal control sample and those obtained by patients with right- and left-cerebral lesions. Although a theoretical issue was the original motivation for each of these tests, it was their pragmatic strength in terms of their selectivity and sensitivity that determined their selection for inclusion in the battery. They are all untimed and should be administered at a pace suitable to the individual patient. The tests can be administered singly, in groups, or as a whole battery; and, apart from the initial screening test, in any order. This battery of eight visual object and space perception tests (VOSP) has been developed, validated and standardized in the Psychology Department of the National Hospital for Neurology and Neurosurgery, Queen Square, London. The majority of these tests require very simple responses. Each was devised to focus on one component of visual perception, while minimizing the involvement of other cognitive skills. The tests are all untimed and should be administered at a pace suitable to the individual patient. (See Appendix F.) (Summary information taken from Warrington and James, 1991)

4.3.1.1 VOSP subtest 1: Shape Detection. The test stimuli are random patterns, on half of which a degraded X is superimposed. The subject is required to judge whether the X is present or absent. The 20 stimulus items are preceded by two practice items (A and B), which are used to explain the task. Warrington and James (1991) advise that participants who score below 15 on this subtest should be considered to have failed this screening test, and that it would therefore be inappropriate to administer the remainder of the VOSP battery.

4.3.1.2 VOSP subtest 2: Incomplete letters. Neuropsychological studies have established that patients with right-hemisphere lesions may have a selective deficit in reading degraded letters. Incomplete letters were constructed by photographing a letter through a random mask so that either 30 per cent or 70 per cent of the letter was obliterated. The test stimuli are 20 stimulus letters
(degraded by 70 per cent) and the two practice items F and B (degraded by 30 per cent) that are used to explain the task. Participants are shown the practice items and asked to name them. The test is abandoned if the participant is unable to name or identify the practice items. Participants are then told that the remaining capital letters are rather more incomplete and asked to name or identify each one. The total number correct (maximum = 20) is recorded (Warrington and James, 1991).

4.3.1.3 VOSP subtest 3: Silhouettes. This subtest is based on findings that recognition of common objects from an unusual view may be selectively impaired in patients with lesions in posterior regions of the right hemisphere. Participants are shown animal silhouettes, told that they are drawings of an animal, and asked to name them (e.g., dog, camel). The silhouettes were constructed from outline drawings of each object rotated through varying degrees from the lateral axis. The test was constructed to be of graded difficulty, ranging from very easy silhouettes that could be identified by all participants to difficult silhouettes that only a proportion of the normal sample identified. The test consists of 15 silhouette drawings of animals and 15 silhouette drawings of inanimate objects. The silhouettes of these two sets are arranged in order of difficulty. The total number of silhouettes named or identified (maximum = 30) is recorded as the score (Warrington and James, 1991).

4.3.1.4 VOSP subtest 4: Object decision. The origin of the Object Decision test was the finding that patients with right hemisphere lesions had a significant selective deficit in the selection of the real object when presented with a two-dimensional silhouette drawing of an object together with three nonsense shapes (Warrington and James, 1991). The test stimuli consisted of two-dimensional silhouette drawings of objects constructed from the original 3D shadow images by tracing the projected outline of the object at an angle of rotation at which approximately 75% of a normal control group could identify it. Distracter items were constructed to be similar object-like shapes but are in fact entirely imaginary. The object decision test consists of 20 arrays, each of which displays one real two-dimensional object together with three distracter items. Participants must point to the real object. The number of correct choices (maximum = 20) is recorded (Warrington and James, 1991).

4.3.1.5 VOSP subtest 5: Progressive silhouettes. A series of 10 silhouettes was constructed by varying the angle of view from 90 degrees rotation to 0 degrees
rotation of the lateral axis. The test consists of two series, a gun and a trumpet. The first silhouette of the series is presented and used to explain the task. Silhouette drawings of an object are presented which become progressively easier to identify. With each new silhouette version the participant is asked to name the object. The number of trials required to identify each object is summed and recorded as the score (maximum trials = 10 + 10) (Warrington and James, 1991).

4.3.1.6 VOSP subtest 6: Dot counting. The test stimuli consist of arrays of black dots on a white card. There are two arrays each of five, six, seven, eight and nine dots and each array is randomly arranged. The maximum distance of a dot from the centre of a card was 120 mm and the minimum distance between dots was 10 mm. The first array is used to explain the task and to confirm (Warrington and James, 1991).

4.3.1.7 VOSP subtest 7: Position discrimination. Each test stimulus consists of two adjacent horizontal squares, one with a black dot (5 mm) printed exactly in the centre and one with a black dot just off centre. In each of the 20 stimuli the off-centre dot is in a different position within the square; in ten stimuli the centre dot is in the left square and in ten in the right square. The first test stimulus is used to explain the task: "One of these two dots is exactly in the centre of the square. I want you to point to the dot that is in the centre. If you are not certain I would like you to guess." Subjects who consistently choose the square to the left or the right should be reminded to "Look at both squares before deciding." The number of correct choices is recorded (Warrington and James, 1991).

4.3.1.8 VOSP subtest 8: Number location. The ten stimuli consist of two squares (62 mm × 62 mm), one above the other with a small gap between them. The top square contains randomly placed numbers (1–9) and the bottom square a single black dot corresponding to the position of one of the numbers. The position of the dot is different in each of the stimulus cards and there are four different number arrays. The task is to identify the number that corresponds with the position of the dot. There are two practice stimulus cards that are used to explain the task: "One of the numbers in the square corresponds with the position of the dot in the square; tell me the number that matches the position of the dot." If there is an error on the first card, the subject should be told the correct number before proceeding to the second practice card, and the test should be abandoned if the subject fails to get either practice card at least approximately correct. The ten
test cards are presented, the number selected is noted, and the total number of correct responses is recorded (maximum = 10) (Warrington and James, 1991).

4.3.1.9 VOSP subtest 9: Cube analysis. The test stimuli consist of black outline representations of a 3D arrangement of square bricks. There are two practice items, which are used to explain the task, and ten stimuli. The two practice stimuli are representations of three bricks. The ten test stimuli are graded in difficulty by increasing the number of bricks from five up to 12 and by including hidden bricks. The subject is told: "This is a drawing of some solid bricks; how many solid bricks are represented in the drawing?" If an error is made on either of the practice items the task is explained again. The task is abandoned if the subject is unable to count the bricks in both practice items. The ten stimulus items are presented and on the first occasion, if an omission error is made to a hidden brick, the subject is asked to "Try again and remember the bricks that are underneath the other bricks." The subject's response is noted and the total number of correct counts (maximum = 10) is recorded (Warrington and James, 1991).

4.3.2 The Eysenck Personality Inventory (EPI)

The EPI was designed around a theory of personality developed by Hans Eysenck (Eysenck and Eysenck, 1964). This is a nomothetic theory, meaning that it is applicable to all people and more or less rigorously testable. Eysenck believed that all people could be placed on a pair of independent continua, dealing respectively with Extraversion-Introversion and Neuroticism-Stability. He argued that the basis of personality was genetic, and, specifically, that the degree of Extraversion depended crucially upon the level of arousal in the Ascending Reticular Activating System of the brain. Eysenck's personality inventory can be taken as an example of a personality questionnaire. It is a psychometric test; it aims to measure particular psychological characteristics. In this case it is measuring extroversion and neuroticism, the two dimensions which Eysenck believed to be sufficient to describe an individual's personality.

4.3.3 The Eating Disorders Inventory (EDI)

The Eating Disorder Inventory (EDI; Garner, Olmstead and Polivy, 1983; Garner et al., 2003) is a 64-item, 6-point forced-choice inventory assessing several behavioural and psychological traits common in two eating disorders, bulimia and anorexia nervosa. The EDI, a self-report measure, may be utilized as a screening device, outcome measure, or part of typological research. It is not reported to be a diagnostic test for anorexia nervosa or bulimia; rather, it is
designed as a self-report measure of psychological and behavioural traits common in anorexia nervosa and bulimia (Eysenck, ibid.). It can be administered to a population of ages 12 and over. There are 8 subscale scores: drive for thinness, bulimia, body dissatisfaction, ineffectiveness, perfectionism, interpersonal distrust, interoceptive awareness, and maturity fears. The EDI is recommended to delineate subtypes of anorexia nervosa in clinical or research settings, and the average administration time of the test is 15–25 minutes (Garner, Olmstead and Polivy, 1983; Garner et al., 2003).

4.3.4 The State-Trait Anxiety Inventory (STAI-X)

The State-Trait Anxiety Inventory (STAI) was designed as a research instrument for the study of anxiety in adults (Spielberger, 1970). It is a 40-item self-report assessment device, which includes separate measures of state and trait anxiety. According to the author, state anxiety reflects a transitory emotional state or condition of the human organism that is characterized by subjective, consciously perceived feelings of tension and apprehension, and heightened autonomic nervous system activity. State anxiety may fluctuate over time and can vary in intensity. In contrast, trait anxiety denotes relatively stable individual differences in anxiety proneness . . . and refers to a general tendency to respond with anxiety to perceived threats in the environment. Scores on the STAI have a direct interpretation: high scores on their respective scales mean more trait or state anxiety and low scores mean less. Both percentile ranks and standard (T) scores are available for male and female working adults in three age groups (19–39, 40–49, 50–69), and for male and female high school and college students, male military recruits, male neuropsychiatric patients, male medical patients, and male prison inmates (Spielberger et al., 1983).

4.3.5 The Body Dysmorphic Disorder Questionnaire revised (BDDQ-R)

BDDQ questions are in a self-report format. The BDDQ mirrors the DSM-IV diagnostic criteria for BDD and scores indicate whether these criteria are met by particular patients. The BDDQ assesses whether BDD is likely to be present. (See Appendix III for an example of a completed BDDQ.) The BDDQ can suggest that BDD is present but cannot provide a definitive diagnosis. The final diagnosis must be determined by a trained clinician in a face-to-face interview (Phillips, 1996b).

4.3.6 The Beck Depression Inventory (BDI)

The original version of the BDI was introduced by Beck, Ward, Mendelson, Mock
and Erbaugh in 1961. The BDI was revised in 1971 (Groth-Marnat, 1990). The original and revised versions have been found to be highly correlated (Lightfoot and Oliver, 1985, cited in Groth-Marnat, 1990). The BDI is a 21-item self-report rating inventory measuring characteristic attitudes and symptoms of depression. Participants require a fifth- to sixth-grade reading age to adequately understand BDI questions and the inventory takes approximately 10 minutes to complete (Groth-Marnat, 1990). The content of the BDI was obtained by consensus from clinicians regarding symptoms of depressed patients (Beck et al., 1961). The revised BDI items are consistent with six of the nine DSM-III categories for the diagnosis of depression (Groth-Marnat, 1990). Each item provides an assessment of a specific component of depression. The 21 components are: 1. Sadness; 2. Pessimism; 3. Sense of failure; 4. Social withdrawal; 5. Guilt; 6. Expectation of punishment; 7. Dislike of self; 8. Self Accusation; 9. Suicidal ideation; 10. Episodes of crying; 11. Indecisiveness; 12. Change in body image; 13. Retardation; 14. Insomnia; 15. Fatigability; 16. Loss of appetite; 17. Dislike of self; 18. Expectation of punishment; 19. Loss of Weight; 20. Loss of Weight; 21. Low level of energy. Scores are summed over the twenty-one questions to obtain the total. The highest score on each of the twenty-one questions is three, so the highest possible total for the whole test is sixty-three. The lowest possible score for the whole test is zero. One score is added per question (the highest rated if more than one option is circled).

Interpretation of depression scores

05–09   Normal fluctuations in affect
10–18   Mild to moderate depression
19–29   Moderate to severe depression
30–63   Severe depression

Below 4 = Possible denial of depression, faking good; this is below usual scores for normals.
Over 40 = This is significantly above even severely depressed persons, suggesting possible exaggeration of depression; possibly characteristic of histrionic or borderline personality disorders. Significant levels of depression are still possible (Groth-Marnat, 1990).
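The scoring rule and the interpretation bands described above can be illustrated with a short sketch. It is not part of the original study materials: the function names `bdi_total` and `interpret_bdi` are hypothetical, the band boundaries are taken as read from the table, and a total of exactly 4 is left unclassified because the table does not assign it to any band.

```python
from typing import Optional, Sequence, Tuple

# Score bands as read from the interpretation table (Groth-Marnat, 1990).
BDI_BANDS = (
    (5, 9, "Normal fluctuations in affect"),
    (10, 18, "Mild to moderate depression"),
    (19, 29, "Moderate to severe depression"),
    (30, 63, "Severe depression"),
)


def bdi_total(circled: Sequence[Sequence[int]]) -> int:
    """Sum the 21 item scores, taking the highest option when several are circled."""
    if len(circled) != 21:
        raise ValueError("the BDI has 21 items")
    total = 0
    for options in circled:
        if not options or any(o not in (0, 1, 2, 3) for o in options):
            raise ValueError("each item needs at least one rating from 0 to 3")
        total += max(options)
    return total


def interpret_bdi(total: int) -> Tuple[Optional[str], Optional[str]]:
    """Return (band, validity note); either is None where the table is silent."""
    if not 0 <= total <= 63:
        raise ValueError("BDI totals range from 0 to 63")
    band = next((label for lo, hi, label in BDI_BANDS if lo <= total <= hi), None)
    if total < 4:
        note = "Possible denial of depression"
    elif total > 40:
        note = "Possible exaggeration of depression"
    else:
        note = None
    return band, note
```

Each participant's responses are passed as one list of circled options per item, so a participant who circled both 0 and 2 on one item contributes 2 for that item.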

4.4 Procedure

To impose comparability between judgements of own versus other bodies, all judgements of body size were made using digital photographs taken under standard conditions of illumination and from a standard position and distance. The three classes of object photographed (own-body, other-body, objects) were intended to permit inferences about the specificity of underlying cognitive mechanisms (e.g., are all objects, including human bodies, misjudged in the same way, or do human body judgements differ from judgements of nonhuman objects? Does the accuracy of other-body judgements differ from own-body judgements? etc.). Digital photographs were transferred to a laptop PC and later displayed on the computer screen for participants to make size estimations. To eliminate errors associated with verbal processes or the language used to express judgement, size estimations were made using a caliper that was purpose-built for the present study. There were two sessions, each of about 45 minutes' duration. In session 1 participants first had their photographs taken in the standard poses required by the design of the study. Physical measurements such as height and weight were then recorded, and finally participants were asked to complete the psychometric tests. A second testing session was then scheduled to occur within 7 days of session 1. In session 2, participants viewed their own images on a laptop screen and were asked to make judgements of physical size using the estimation caliper. They were then asked to look at digital images of another person and of 2 objects, and asked to make size judgements of those images. Instructions for sessions 1 and 2 are presented below.

4.4.1 Session 1

Each participant was tested individually. Participants were asked to stand with their toes on a line of tape stuck to the floor at 250 cm distance from a digital camera. The line was located so as to ensure that participants were photographed against a plain background with no features that could be used as cues to size. The position of the digital camera was also marked by a tape line on the floor at 250 cm distance from the mark used to position participants. The digital camera was mounted and fixed on a tripod 150 cm from the floor. The room was illuminated by fluorescent strip lights supplemented by natural light from windows set in the wall behind the camera. The digital camera's flash facility was used for every photograph. Participants were photographed from two positions: facing the camera (front) and in profile (i.e., facing 90 degrees to the camera: side). Examples of standard photographs of participants appear in Figure 4.

Figure 4. Digital images of participant in standard poses from front and side. Panels: shoulders, waist and thighs, each shown from front and side.

4.4.1.1 Initial instructions to participants. The following instructions were read aloud to each participant:
Please stand facing the camera behind the line on the floor labelled "stand behind this line". One picture of you will be taken facing the camera and another facing to the side. These pictures will be saved onto a computer and used by you later to make body size estimates of yourself. The estimates will be those of your shoulders, waist, and thighs. Then you will be asked to look at images of another person and to make the same estimates of them from the front view and side view of shoulders, waist and thighs. Following this you will be asked to look at digital images of two objects, one of a table and one of a chair, and asked to estimate the width and length of both. All these judgements will be made using a measuring caliper, which the experimenter will show you after taking your two photos. If you have any questions please feel free to ask. If you are uncomfortable with any of these procedures you are free to pull out of the experiment at any time.

After these instructions had been read, photographs of the participant from front and side were taken.

4.4.1.2 Body height and weight. Height was measured by the experimenter and scales were used to record body weight. From weight and height together, the height : weight ratio can be calculated, which is a reasonable indicator of whether an individual is over- or underweight for his/her age, gender and height.
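As a small illustration of the ratio mentioned above, the following sketch divides height by weight. The units (centimetres and kilograms) and the function name `height_weight_ratio` are assumptions made for the example; the text does not state which units or which exact formula were used in the study.

```python
def height_weight_ratio(height_cm: float, weight_kg: float) -> float:
    """Height divided by weight; units (cm, kg) are assumed for illustration."""
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height and weight must be positive")
    return height_cm / weight_kg
```

Under these assumed units, a lower ratio corresponds to more weight per unit of height, so the value only becomes informative once it is compared with norms for age and gender.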

4.4.1.3 Psychometric tests. Participants were seated and asked to complete the five questionnaires and the VOSP interview-style psychometric test. Participants were given standard instructions on how to complete the VOSP before attempting this.

4.4.2 Session 2

4.4.2.1 Own body measurement: Procedure and instructions. Participants were shown on a computer screen the standard digital images of themselves taken in Session 1 and asked to make estimates of body width across shoulders, waist and thighs, first as viewed from the front, and then as viewed from the side. To eliminate problems caused by language and measurement terminology, all estimates were made by adjusting the caliper to match estimated body size and reading off the measurements in centimetres.

Instructions
Please hold the caliper at arm's length in front of your body at about the height of your shoulders, and look at the digital images on the computer screen in front of you. You are standing exactly 1 metre away from the screen. Please do not move forward or shift your upper body forward, as this will distort the standard arrangements that have been set for all participants. Try to maintain this same distance throughout the experiment. You are going to be making judgements about the width of your own shoulders, waist and thighs as they appear in the image. Please move the caliper to the width you believe to be correct for these three body dimensions and then tell the experimenter what measurement in centimetres it comes out to on the caliper. After making all three estimates from the front view, please repeat the procedure from the side view, also for width of shoulders, waist and thighs.

4.4.2.2 Other person measurement: Procedure and instructions. Participants were shown two sets of standard digital images of another person on a computer screen, and asked to make the same three size estimates for each set using the caliper as a measuring instrument.

Instructions
Please hold the caliper at arm's length in front of your body at about the height of your shoulders, and look at the digital images on the computer screen in front of you. You are standing exactly 1 metre away from the screen. Please do not move forward or shift your upper body forward, as this will distort the
standard arrangements that have been set for all the participants. You are going to be making judgements about the width of this person's shoulders, waist and thighs as they appear in the image. Please move the caliper to the width you believe to be correct for these three body dimensions and then tell the experimenter what measurement in centimetres it comes out to on the caliper. After making all three estimates from the front view of the person you are looking at on the screen, please do the same from the side view, also for width of shoulders, waist and thighs.

4.4.2.3 Object measurements: Procedure and instructions. Standardised digital images of two common objects (illustrated in Figures 2 and 3 above) were shown to participants and they were asked to make height and width judgements of each.

Instructions
Please stand behind the line. This line is exactly 1 metre from the computer screen you are looking at. Please do not move forward or shift your body, as this will distort the standard arrangements that have been set for all participants. Please look at each of the images and by adjusting the measuring caliper, try to estimate how wide you think the width of the chair is and its height from the ground. Then read out the measurement in centimetres as marked on the caliper. Finally, please make similar size estimates for the table. These instructions are also written on the screen in front of you. If you are uncomfortable with these proceedings or want to stop at any time you are free to do so.
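One way the caliper readings from Session 2 could be turned into error scores is sketched below. This is an illustration under stated assumptions rather than the authors' scoring procedure: error is assumed to be the signed difference between the caliper estimate and a reference size in centimetres, the reference values are assumed to be available from some independent measurement, and the names `view_error` and `overall_error` are hypothetical.

```python
from statistics import mean
from typing import Mapping

DIMENSIONS = ("shoulders", "waist", "thighs")


def view_error(estimates_cm: Mapping[str, float],
               reference_cm: Mapping[str, float]) -> float:
    """Mean signed error (estimate minus reference) over the three dimensions."""
    return mean(estimates_cm[d] - reference_cm[d] for d in DIMENSIONS)


def overall_error(front_est: Mapping[str, float], front_ref: Mapping[str, float],
                  side_est: Mapping[str, float], side_ref: Mapping[str, float]) -> float:
    """Average of the front-view and side-view errors (assumed definition)."""
    return mean([view_error(front_est, front_ref),
                 view_error(side_est, side_ref)])
```

With this convention a positive score means overestimation of size and a negative score means underestimation; an absolute-error variant would discard that direction.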

5. Results

Table 5.1 shows that there were significant negative correlations between Eating Disorders Inventory (EDI) and Self Error, and between EDI and Height-Weight Ratio. There are also significant correlations between BDDQ-R scores and both Other Error and Average Error. Negative correlations are reported between Gender and both Self Error and Other Error. Significant correlations were observed between Overall Front Self Error and EDI subscales for Drive for Thinness and Body Dissatisfaction. These correlations are presented in Table 5.2 below. There were also significant correlations between Overall Self Error (average of Front and Side) and EDI subscales for Body Dissatisfaction, Drive for Thinness, and Perfectionism. These 3 subscales are related to body image concern, but not necessarily to Eating Disorders.
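For readers who wish to check how a Pearson correlation of the kind reported here relates to sample size, a minimal sketch follows. It uses only the standard formulas for r and for the associated t statistic with n − 2 degrees of freedom; the function names are hypothetical and no data from the study are reproduced.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1 or n < 3:
        raise ValueError("need |r| < 1 and n >= 3")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

The resulting t value is compared with the t distribution on n − 2 degrees of freedom to obtain the significance level; for example, `t_statistic(0.260, 100)` gives the statistic for a correlation of 0.260 in a sample of 100.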

Table 5.1. Correlations between a range of test battery scores and errors in estimating the size of own body dimensions, object size, and the body dimension of other people. Correlations between test scores and the height: weight ratio of participants also appear in the table.
Self Error Other Error Object Error Average Error Height : Weight Ratio r = 0.260* r = 0.141 r = 0.020 r = 0.163 r = 0.004 r = 0.060 r = 0.000 r = 0.244 r = 0.046 r = 0.079 r = 0.025 r = 0.023 r = 0.014 r = 0.036

Key to Table r/r = Pearson correlation; n = 100 p < 0.05, p < 0.01*, p < 0.005** r = 0.260* r = 0.082 r = 0.029 r = 0.049 r = 0.120 r = 0.154** r = 0.213 r = 0.219 r = 0.046 r = 0.008 r = 0.089 r = 0.088 r = 0.044 r = 0.054 r = 0.029 r = 0.145 r = 0.121 r = 0.049 r = 0.055 r = 0.130 r = 0.192** r = 0.107 r = 0.118 r = 0.604* r = 0.116 r = 0.140 r = 0.088 r =-0.008 r = 0.013 r = 0.152 r = 0.148 r = 0.436* r = 0.136 r = 0.016 r = 0.006 r = 0.090

Mean Eating Disorders Inventory

Mean Visual Object & Spatial Perception

Beck Depression Inventory

Body Dysmorphic Disorder Questionnaire

State Anxiety

196 Hanan Abdulwahab El Ashegh and Roger Lindsay

Trait Anxiety

Gender

Eysenck Personality Inventory (Psychosis)

Eysenck Personality Inventory (Neurosis)

Eysenck Personality Inventory (Lie Scale)

Table 5.2. Correlation between EDI subscales and error scores when judging one's own body size and the size of physical objects.
EDI (Body Dissatisfaction) EDI (Ineffectiveness) EDI (Perfectionism) EDI (Interpersonal Distrust) EDI (Interoceptive Awareness) EDI (Maturity Fear)

EDI Key to Table r/r = Pearson correlation (Drive for Thinness) n = 100 p < 0.05 , p < 0.01*, p < 0.005** r = 0.157 r = 0.083 r = 0.012 r = 0.021 r = 0.100 r = 0.091 r = 0.016 r = 0.047 r = 0.109 r = 0.207 r = 0.106 r = 0.045 r = 0 .148 r = 0.074 r = 0.084 r = 0.073 r = 0.071 r = 0.049 r = 0.080 r = 0.080 r = 0.097 r = 0.022 r = 0.151 r = 0.219 r = 0.094 r = 0.245 r = 0.127 r = 0.175 r = 0.077 r = 0.220 r = 0.312** r = 0.205 r = 0.270** r = 0.062 r = 0.004 r = 0.232 r = 0.160 r = 0.139 r = 0.161 r = 0.262** r = 0.184 r = 0.051 r = 0.142 r = 0.204 r = 0.104 r = 0.229 r = 0.249 r = 0.316** r = 0.086 r = 0.197 r = 0.239 r = 0.079 r = 0.267** r = 0.035 r = 0.095 r = 0.293** r = 0.022 r = 0.069 r = 0 .106 r = 0.271** r = 0.079 r = 0.112 r = 0.114 r = 0.100 r = 0.060 r = 0.010

EDI (Bulimia)

Front Self Error Shoulders r = 0.140

r = 0.044

Front Self Error Waist

r = 0.295**

r = 0.247

Front Self Error Thighs

r = 0.184

r = 0.106

Overall Front Self Error

r = 0.295**

r = 0.135

Side Self Shoulders Error

r = 0.010

r = 0.112

Side Self Waist Error

r = 0.258**

r = 0.186

Side Self Thighs Error

r = 0.316**

r = 0.260**

Overall Self Side Error

r = 0.168

r = 0.061

Overall Self Error (side & front)

r = 0.318**

r = 0.160

Object Error Chair

r = 0.002

r = 0.009

Cognition and body image 197

Object Error Table

r = 0.109

r = 0.058

As Table 5.3 shows, there were significant correlations between Overall Front-Other Error & EDI Subscales Drive for Thinness, Body Dissatisfaction & Interpersonal Distrust. Correlations were also observed between Overall-Other Error & the EDI subscale of Interpersonal Distrust. Table 5.4 reports significant correlations between the VOSP Position Discrimination subscale & Front-Self Shoulder errors, Front-Self Waist errors and Side-Self Shoulders errors. There were also significant correlations between the VOSP Number Location subscale and both Front-Self Shoulder errors and Side-Self Shoulders errors. Finally, there was a significant correlation between the VOSP Dot Counting sub-scale and Front-Self Shoulder errors. Table 5.5 shows that there were no significant correlations between scores on any VOSP subscale and errors in making judgements about the body size of other people. Similarly, Table 5.6 shows that no significant correlations were observed between errors in judging the size of one's own body or the size of objects and scores on the Beck Depression Inventory, the Body Dysmorphic Disorder Questionnaire-Revised, the State-Trait Anxiety Inventory or the Eysenck Personality Inventory. As with judgements of self, so with judgements of others: Table 5.7 shows that there were no significant correlations between errors in judging the size of other people and scores on the Beck Depression Inventory, the Body Dysmorphic Disorder Questionnaire-Revised, the State-Trait Anxiety Inventory or the Eysenck Personality Inventory.
Psychometric measures were treated as independent variables in interpreting the data and estimation error scores were treated as dependent variables. Estimation error scores consisted of variables directly measured, e.g., Self-Front Shoulder error refers to percent error in frontal estimation of shoulder width. Averaging errors in frontal estimation of shoulders, waist and thighs yields Self-Front Error; averaging profile estimation of shoulders, waist and thighs yields Self-Side Error, etc. Pearson correlations were used to check that individual variables over which averages were computed did not behave differently from the average. For example, when a correlation between Height/Weight Ratio and the summary variable Total Self Error is reported, it was established that the correlation with the summary variable was a reasonable reflection of the correlation with the three variables over which the average was computed (the operational criterion for this was that the

Table 5.3. Correlation between EDI subscales and error scores when judging the dimensions of other people's bodies.
EDI (Body Dissatisfaction) EDI (Ineffectiveness) EDI (Perfectionism) EDI (Interpersonal Distrust) EDI (Interoceptive Awareness) EDI (Maturity Fear) EDI Average

EDI EDI Key to Table r/r = Pearson correlation (Drive for (Bulimia) Thinness) n = 100 p < 0.05, p < 0.01*, p < 0.005** r = 0.134 r = 0.057 r = 0.036 r = 0.127 r = 0.053 r = 0.060 r = 0.028 r = 0.074 r = 0.103 r = 0.073 r = 0.156 r = 0.011 r = 0.046 r = 0.275** r = 0 .004 r = 0.148 r = 0.105 r = 0.044 r = 0.093 r = 0.138 r = 0.056 r = 0.157 r = 0.132 r = 0.103 r = 0.012 r = 0.135 r = 0.028 r = 0.002 r = 0.492** r = 0.009 r = 0.319** r = 0.021 r = 0.143 r = 0.011 r = 0.330** r = 0.137 r = 0.083 r = 0.143 r = 0.059 r = 0.483** r = 0.038 r = 0.348** r = 0.036 r = 0.316 r = 0.308** r = 0.322** r = 0.179 r = 0.033 r = 0.042 r = 0.053 r = 0.178 r = 0.159 r = 0.165 r = 0.080 r = 0.051 r = 0.163 r = 0.037

Front Other Error Shoul- r = 0.128 der

r = 0.078

Front Other Error Waist r = 0.300** r = 0.233

Front Other Error Thighs r = 0.251

r = 0.133

Overall Front Other Error r = 0.288** r = 0.189

Side Other Error Shoulders

r = 0.000

r = 0.070

Side Other Error Waist

r = 0.062 r = 0.129

Side Other Error Thighs r = 0.056 r = 0.061 r = 0.059 r = 0.058 r = 0.249

Overall Side Other Error r = 0.063 r = 0.056

Overall Other Error

r = 0.158

r = 0.159

r = 0.420** r = 0.047

r = 0.090

Table 5.4. Correlations between scores on the Visual Object & Spatial Perception Battery (VOSP) and errors in estimating one's own body dimensions.
VOSP (Cube Analysis) VOSP (Silhouettes) VOSP (Average)

VOSP Key to Table r/r = Pearson correlation (Object Decision) n = 100 p < 0.05, p < 0.01*, p < 0.005** r = 0.103 r = 0.049 r = 0.118 r = 0.113 r = 0.025 r = 0.055 r = 0.084 r = 0.100 r = 0.019 r = 0.003 r = 0.078 r = 0.014 r = 0.059 r = 0.054 r = 0.087 r = 0.029 r = 0.084 r = 0.076 r = 0.055 r = 0.114 r = 0.028 r = 0.016 r = 0.028 r = 0.037 r = 0.087 r = 0.016 r = 0.066 r = 0.019 r = 0.155 r = 0.184 r = 0.072 r = 0.137 r = 0.074 r = 0.016 r = 0.010 r = 0.003 r = 0.115 r = 0.012 r = 0.079 r = 0.122 r = 0.104 r = 0.035 r = 0.089 r = 0.039 r = 0.115 r = 0.077 r = 0.070 r = 0.055 r = 0.064 r = 0.048 r = 0.092 r = 0.085 r = 0.072 r = 0.085 r = 0.004 r = 0.017 r = 0.033 r = 0.042 r = 0.134 r = 0.042 r = 0.058 r = 0.055 r = 0.021 r = 0.010 r = 0.018 r = 0.006 r = 0.012 r = 0.062 r = 0.118 r = 0.104 r = 0.055

VOSP (Progressive Silhouettes) VOSP (Shape detection) VOSP (Incomplete letters) VOSP (Dot Counting) VOSP (Position Discrimination) VOSP (Number Location)

Front Other Error Shoul- R = 0.001 r = 0.131 ders

Front Other Error Waist R = 0.027

r = 0.082

Front Other Error Thighs R = 0.040

r = 0.097

Overall Front Other Error R = 0.027

r = 0.133

Side Other Error Shoulders

R = 0.003 r = 0.020

Side Other Error Waist

R = 0.037

r = 0.009

Side Other Error Thighs R = 0.097

r = 0.080

Overall Side Other Error R = 0.074

r = 0.043

Overall Other Error

R = 0.060

r = 0.0114 r = 0.047

Table 5.5. Correlations between Visual Object & Spatial Perception Battery (VOSP) scores and errors in judging the bodily dimensions of others.
VOSP (Shape detection) VOSP (Incomplete letters) VOSP (Dot Counting) VOSP (Position Discrimination) VOSP (Number Location) VOSP (Cube Analysis) VOSP (Silhouettes) VOSP (Average)

VOSP Key to Table r/r = Pearson correlation (Object Decision) n = 100 p < 0.05, p < 0.01*, p < 0.005** r = 0.148 r = 0.091 r = 0.304* r = 0.109 r = 0.122 r = 0.106 r = 0.041 r = 0.073 r = 0.020 r = 0.157 r = 0.005 r = 0.013 r = 0.027 r = 0.190 r = 0.170 r = 0.098 r = 0.117 r = 0.229 r = 0.021 r = 0.095 r = 0.014 r = 0.000 r = 0.137 r = 0.140 r = 0.287* r = 0.095 r = 0.209 r = 0.101 r = 0.139 r = 0.030 r = 0.142 r = 0.111 r = 0.221 r = 0.077 r = 0.235 r = 0.160 r = 0.014 r = 0.129 r = 0.122 r = 0.086 r = 0.107 r = 0.100 r = 0.199 r = 0.260* r = 0.233* r = 0.287* r = 0.018 r = 0.001 r = 0.142 r = 0.105 r = 0.033 r = 0.011 r = 0.153 r = 0.014 r = 0.232 r = 0.029 r = 0.107 r = 0.072 r = 0.017 r = 0.066 r = 0.091 r = 0.107 r = 0.042 r = 0.077 r = 0.007 r = 0.012 r = 0.114 r = 0.049 r = 0.029

VOSP (Progressive Silhouettes)

Front Self Error Shoulders r = 0.192

r = 0.093

Front Self Error Waist

r = 0.010 r = 0.154

Front Self Error Thighs

r = 0.063 r = 0.071

r = 0.148 r = 0.052 r = 0.001

Overall Front Self Error

r = 0.061

r = 0.068

Side Self Shoulders Error r = 0.237

r = 0.095

r = 0.296* r = 0.113

Side Self Waist Error

r = 0.133

r = 0.004

r = 0.175 r = 0.010 r = 0.016 r = 0.192 r = 0.004 r = 0.068

Side Self Thighs Error

r = 0.074 r = 0.076

Overall Self Side Error

r = 0.197

r = 0.072

Overall Self Error

r = 0.126

r = 0.092

Object Error

r = 0.224

r = 0.125

Table 5.6. Correlations between various measures of psychopathology and errors in judging one's own bodily dimensions and in judging the dimensions of objects.
Trait Anxiety Eysenck Personality Eysenck Personality Eysenck Inventory (Psychosis) Inventory Personality (Neurosis) Inventory (Lie Scale) r = 0.090 r = 0.019 r = 0.070 r = 0.095 r = 0.122 r = 0.116 r = 0.043 r = 0.138 r = 0.136 r = 0.054 r = 0.010 r = 0.036 r = 0.038 r = 0.093 r = 0.088 r = 0.034 r = 0.016 r = 0.029 r = 0.066 r = 0.128 r = 0.093 r = 0.039 r = 0.125 r = 0.008 r = 0.063 r = 0.062 r = 0.014 r = 0.014 r = 0.006 r = 0.145

Beck Depression Body State Anxiety Key to Table r/r = Pearson correlation Inventory Dysmorphic Disorder n = 100 Questionnaire p < 0.05, p < 0.01*, p < 0.005** r = 0.048 r = 0.050 r = 0.047 r= r = 0.109 r = 0.065 r = 0.128 r = 0.112 r = 0.120 r = 0.109 r = 0.088 r = 0.005 r = 0.046 r = 0.049 r = 0.147 r = 0.125 r = 0.149 r = 0.176 r = 0.152 r = 0.089

Self Front Error Shoulders r = 0.148

r = 0.000

Self Front Error Waist

r = 0.132

r = 0.007

Self Front Error Thighs

r = 0.078

r = 0.145

Overall Front Self Error

r = 0.019

r = 0.086

Side Self Shoulders Error

r = 0.049

r = 0.082

Side Self Waist Error

r = 0.007

r = 0.022

Side Self Thighs Error

r = 0.042

r = 0.098

Overall Self Side Error

r = 0.020

r = 0.079

Overall Self Error

r = 0.008

r = 0.013

Object Error

r = 0.046

r = 0.008

Table 5.7. Correlations between various measures of psychopathology and errors in judging the bodily dimensions of other people.
State Anxiety Trait Anxiety Eysenck Personality Eysenck Personality Eysenck Disorder Inventory Disorder Inventory Personality (Psychosis) (Neurosis) Disorder Inventory (Lie Scale) r = 0.159 r = 0.043 r = 0.114 r = 0.155 r = 0.132 r = 0.097 r = 0.032 r = 0.059 r = 0.091 r = 0.140 r = 0.116 r = 0.054 r = 0.029 r = 0.187 r = 0.023 r = 0.159 r = 0.049 r = 0.028 r = 0.032 r = 0.016 r = 0.043 r = 0.012 r = 0.099 r = 0.066 r = 0.123 r = 0.004 r = 0.038 r = 0.077 r = 0.088

Beck Depression Body Dysmorphic Key to Table r/r = Pearson correlation Inventory Disorder Questionnaire n = 100 p < 0.05, p < 0.01*, p < 0.005** r = 0.045 r = 0.101 r = 0.135 r = 0.016 r = 0.108 r = 0.080 r = 0.086 r = 0.005 r = 0.079 r = 0.118 r = 0.088 r = 0.127 r = 0.004 r = 0.072 r = 0.066 r = 0.090 r = 0.051 r = 0.102 r = 0.107 r = 0.089

Front Other Error Shoulders

r = 0.127

r = 0.087

Front Other Error Waist

r = 0.067

r = 0.127

Front Other Error Thighs

r = 0.064

r = 0.050

Overall Other Front Error r = 0.111

r = 0.071

Side Other Error Shoulders r = 0.159

r = 0.237

Side Other Error Waist

r = 0.233

r = 0.157

Side Other Error Thighs

r = 0.142

r = 0.128

Overall Side Other Error

r = 0.097

r = 0.252

Overall Other Error

r = 0.130

r = 0.192

Overall Object Error

r = 0.046

r = 0.008

r = 0.145

Pearson r value for the average did not differ by more than 0.3 from the r value for any of the components of the average). Pearson correlations were also used as a filter to select those independent variables showing an association with the summary dependent variables. Observed correlations between subscales of the Eating Disorders Inventory and average overall error scores are reported in Table 5.8 and correlations between overall averages for other measures of psychopathology and average overall error scores are reported in Table 5.9 below.

5.1 Outcome of factor analysis

Fourteen variables were entered into the factor analysis (Age, Gender, Height/Weight Ratio, Body Dissatisfaction Inventory, Body Dysmorphic Disorders Questionnaire, Eating Disorders Inventory Striving for perfection subscale, Eating Disorders Inventory Body dissatisfaction subscale, Eating Disorders Inventory Drive for thinness subscale, Average Visual Object and Space Perception Test score, Average Self-Front error, Average Self-side error, Average Other-front error, Average Other-side error, and Average Object Error). A Principal Components Analysis with Varimax rotation extracted five factors with eigenvalues greater than 1.0, which together accounted for 63.6% of the variance. Factor naming and interpretation were based upon variables with factor loadings of greater than 0.4. Factors meeting this criterion are summarised in Table 5.10 and illustrated in Figure 5. These factors are briefly discussed below.

Factor 1 (24.3% of variance): Negative female body image. The variables loading onto this factor were: Gender (0.9), Eating Disorders Inventory striving for Perfection subscale (0.6), Eating Disorders Inventory

Figure 5. Scree plot of percent of variance explained by Factors 1 to 5.
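The construction of the summary error scores, and the check that a summary correlation fairly reflects its components (no component r differing from it by more than 0.3), can be sketched as follows. This is an illustration under stated assumptions: the sign convention (estimate minus actual, so that underestimation is negative), the function names, and all numbers are hypothetical and are not taken from the study's data.

```python
def percent_error(estimate_cm, actual_cm):
    """Signed percent error: negative values mean underestimation (assumed convention)."""
    return 100.0 * (estimate_cm - actual_cm) / actual_cm


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def summary_is_representative(r_summary, r_components, tolerance=0.3):
    """True if no component r differs from the summary r by more than tolerance."""
    return all(abs(r_summary - r) <= tolerance for r in r_components)


# Hypothetical frontal estimates for one participant: (estimate, actual) in cm.
front = {"shoulders": (42.0, 44.0), "waist": (30.0, 32.0), "thighs": (50.0, 50.0)}
front_errors = {part: percent_error(e, a) for part, (e, a) in front.items()}

# Self-Front Error is the average of the three frontal percent errors.
self_front_error = mean(list(front_errors.values()))
print(front_errors, self_front_error)

# Hypothetical correlations of one test score with the summary and its parts.
print(summary_is_representative(0.26, [0.14, 0.30, 0.18]))  # all within 0.3
print(summary_is_representative(0.26, [0.14, 0.60, 0.18]))  # one component too far
```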

Table 5.8. Significant correlations between subscales of the Eating Disorders Inventory and average overall error scores.
EDI EDI (Body (I) Dissatisfaction) p < 0.01** p < .01** EDI EDI (Perfectionism) (ID) EDI (IA) EDI (Maturity Fear) EDI Mean

EDI (Drive for Thinness)

EDI (Bulimia)

Front Self Shoulders Error

Front Self Waist Error

p < 0.01**

Front Self Thighs Error p < 0.01** p < 0.01** p < 0.01** p < 0.01** p < 0.01** p < 0.01** p < 0.01** p < 0.01 p < 0.01** p < 0.01**

Overall Front Self Error

p < 0.01**

Side Self Shoulders Error

Side Self Waist Error

p < 0.01**

Side Self Thighs Error

p < 0.01**

p < 0.01**

Overall Side Self Error

Overall Self Error (Side & Front)

p < 0.01**

Table 5.9. Significant correlations between overall averages for various measures of psychopathology and average overall error scores.
Other Error p < 0.01** Object Error Average Error Height : Weight Ratio

Self Error

Eating Disorders Inventory Aver- p < 0.01** age

Average Visual Object & Space Perception Battery

Beck Depression Inventory p < 0.005*** p < 0.01**

Body Dysmorphic Disorder-Q

State Anxiety

Trait Anxiety p < 0.01**

Gender

p < 0.01**

Table 5.10. Factor Loadings with Varimax rotation.

Factor Loadings > 0.4 Factor 2 0.431 0.631 Factor 3 Factor 4 Factor 5

Factor 1

Age

Gender

0.507

Posneg 0.790 0.644 0.633 0.760 0.461

0.887

Beck Depression Inventory

Eating Disorder Inventory Average

Height : Weight Ratio

Average Visual Object & Spatial Perception Battery

Average Front Self Error

0.715

Average Self Side Error 0.772 0.779

0.762

Average Front Other Error

Average Side Other Error

Object Error

0.956

body dissatisfaction subscale (0.6), Eating Disorders Inventory drive for thinness subscale (0.6), Average Self-Front error (0.5), Average Self-side error (0.7), and Average Other-front error (0.7). The most satisfactory interpretation of this factor seems to be that women who strive for perfection, who are dissatisfied with their body characteristics and who have a high drive towards thinness tend to underestimate their own bodily dimensions. [N.B. the mean value for the self estimation variables is negative (Front: -0.8; Side: -9.5), so a negative factor loading implies that as values on other variables on which the factor loads positively increase, so scores on this variable get smaller, i.e. become more negative.] There is also an association with overestimation of the dimensions of other people when viewing them frontally.

Factor 2 (11.9% of variance): Age-dependent body-dissatisfaction. The variables loading onto this factor were: Age (0.5), Body Dissatisfaction Index (0.7), Body Dysmorphic Disorders Questionnaire (0.7), and Eating Disorders Inventory body dissatisfaction subscale (0.4). The factor interpretation seems to be that the younger a person is, the more dissatisfied they are with the physical parameters of their own body.

Factor 3 (10.1% of variance): Self-Objectivity. The variables loading onto this factor were: Height/Weight Ratio (0.8), Eating Disorders Inventory body dissatisfaction subscale (0.5), Eating Disorders Inventory drive for thinness subscale (0.5), Average Visual Object and Space Perception Test score (0.5). Factor 3 seems to indicate that people who are overweight, and who perceive object and spatial relations accurately, tend to have high body dissatisfaction and a high drive for thinness.

Factor 4 (9.3% of variance): Inattention to body-size of others. The variables loading onto this factor were: Eating Disorders Inventory striving for Perfection subscale (0.4), Average Other-front error (0.4), and Average Other-side error (0.9). This factor indicates that people who do not strive for perfection tend to make errors in estimating the physical size of other people's bodies.

Factor 5 (8.1% of variance): Accuracy orientation. The variables loading onto this factor were: Eating Disorders Inventory striving for Perfection subscale (0.6), Average Visual Object and Space Perception Test score (0.5), and Average Object Error (0.8). The most satisfactory interpretation for Factor 5 would seem to be that people who strive for perfection also tend to accurately perceive objects and spatial relations, and to make few errors in estimating the physical size of objects.
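The two selection rules used in the factor analysis above, retaining components with eigenvalues greater than 1.0 and interpreting only loadings greater than 0.4 in magnitude, can be sketched in a few lines. This is not the authors' analysis: the eigenvalues and loadings below are hypothetical, the function names are invented for illustration, and the sketch assumes a principal components analysis of a correlation matrix (so that the eigenvalues sum to the number of variables). Only the five percentages in the last lines are taken from the text.

```python
def percent_variance(eigenvalues):
    """Percent of total variance carried by each component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def retained_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Indices of components whose eigenvalue exceeds the cutoff."""
    return [i for i, ev in enumerate(eigenvalues) if ev > cutoff]


def salient_loadings(loadings, threshold=0.4):
    """Variables whose loading magnitude exceeds the interpretation threshold."""
    return {name: v for name, v in loadings.items() if abs(v) > threshold}


# Hypothetical eigenvalues for a six-variable correlation matrix (they sum to 6).
eigenvalues = [2.4, 1.3, 1.05, 0.6, 0.4, 0.25]
kept = retained_by_eigenvalue(eigenvalues)
explained = percent_variance(eigenvalues)
cumulative = sum(explained[i] for i in kept)

# Hypothetical loadings of four variables on one rotated factor.
factor = {"Gender": 0.89, "Age": 0.12, "Self-side error": -0.72, "Object error": 0.05}
print(kept, round(cumulative, 1), salient_loadings(factor))

# The five percentages reported in the text sum to 63.7,
# which agrees with the stated 63.6% total up to rounding.
reported = [24.3, 11.9, 10.1, 9.3, 8.1]
print(round(sum(reported), 1))
```

The sketch deliberately stops short of the rotation step; Varimax redistributes variance among the retained factors without changing their combined total.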

6. Discussion

The investigation described in the present report had a broad as well as a narrow purpose. The narrow purpose was to examine experimentally the hypothesis that cognitive representations of the physical self are mediated by mechanisms that are psychologically and neurologically independent of the cognitive operations that model the physical world of space and objects. The broader purpose was to seek evidence that natural technology exists. The rationale here was to consider a cognitive function that seems as basic and fundamental as a cognitive function can be: the set of processes by which a human organism represents itself as a physical and social agent. The survival of an organism depends crucially upon the accuracy and validity of the planning processes that underlie its ability to act in and upon the physical world. In representing the physical self, if anywhere, it might be expected that evolution would have ensured that underlying processes are user-proof. Just as physical processes, such as the control of heart or kidney function, are not susceptible to voluntary control, so, if cognitive processes are hardwired, self-representation by an agent is so cognitively fundamental that user intervention would surely be impossible. We have argued for an alternative view. Effective human action depends not only upon efficient and accurate computational processes, but also upon motivational factors involving relatively abstract conceptions such as self-worth. Why should agents who attach no value to their own life develop or implement a plan to avoid death at the hands of a predator or an enemy? In human society self-worth is intimately bound up with conceptions of one's physical self. Attractiveness to others is not the sole determinant of self-worth, but it is plausible to suppose that it makes an important contribution.
This line of thinking suggests why it might be important that representations of physical self should be cognitively mutable: human agents may sometimes need to see themselves not as others see them. If physical self were just one more spatial object, represented within a general cognitive system for modelling the physical world, it would be difficult to mis-represent the self without constantly calling attention to the deceit by over-riding normal scaling operations whenever the self was the object of cognition. If self-deception were the objective, this process would be self-defeating. Whilst it might be possible to achieve the same end by introducing appropriate distortions into all spatial and object modelling, there would be a high price to pay for this, as all action plans would then be computed from spatial models with reduced validity. Given the premiss that misrepresentation of physical self can be cognitively desirable under some circumstances, the most obvious way of achieving this end is by developing a special-purpose modelling system that incorporates exactly the sought distortions, but applies them to no object other than the self. However, except in cases such as hereditary disorders, the misrepresentations required will almost always depend upon contingencies within an individual's life and cultural context: becoming overweight in a culture valuing youth-like slimness; facial disfigurement in a culture valuing intactness and beauty, and so forth. The self-modelling distortions necessary to conceal or eliminate such aberrations could not be known in advance by evolution, nor could the aesthetic values arising from cultural context. Hence, the modelling process must necessarily be developed within the cognitive lifetime of the agent. It follows that the self-modelling process cannot be hardwired and must instead use cognitive software; in other words, it must constitute an example of what we have referred to as natural technology. What then is to be expected in the data if cognitive representation of physical self is hardwired (or at least, not susceptible to distortion by psychological factors)? And what is to be expected if the system for self-representation has developed with an inbuilt capability to incorporate culturally desirable distortions?
If self-representation is not a special-purpose module, then errors in judgements concerning the physical parameters of one's own body should be highly correlated with errors in judgement concerning the bodies of others, and with errors in judging the spatial characteristics of inanimate objects. Judgements about body and object size should also be positively correlated with measures of spatial intelligence and measures of perceptual accuracy. None of these things should show any correlation with personality variables such as anxiety or depression scores, or with psychometric measures of attitudes to self or one's own body. When the data is subjected to factor analysis, accuracy of size and spatial judgements should load onto one factor that is entirely independent of personality and attitude variables. If the natural technology hypothesis is correct, the expected outcomes are quite different. Judgements about physical self should show no or low correlations with judgements about the bodily dimensions of other people and physical objects. Because the self-representation module is hypothesized to be sensitive to psychological processes (indeed this is its raison d'être), a positive correlation with personality and attitude measures is to be expected. Factor analysis should yield a rather more complex factor solution with physical self-judgements loading onto the same factor as some personality measures, but showing independence from factors associated with size judgements about objects and other people. The most striking feature of the results we have reported is the outcome of the factor analysis, which suggests that errors in judging one's own physical dimensions from a photograph, errors in judging the size of other people and errors in judging the size of objects such as chairs and tables are assigned to quite separate and independent factors. High error scores in judging one's own bodily dimensions are associated with females (0.9), with body dissatisfaction (0.6) and with a drive for thinness (0.6), particularly amongst participants who strive for perfection (0.6) [Factor 1, see Table 5.10 above]. Inaccuracy in making size judgements of others' bodies (Front 0.4; Side 0.9) is also associated with striving for perfection, but this time the association is negative (-0.4) [Factor 4, see Table 5.10 above]. The lower participants score on striving for perfection items, the less accurate their judgement of others tends to be. Finally, accurate judgement of the size of objects (0.8) is negatively associated (0.8) with VOSP scores (0.5) and positively associated with striving for perfection (0.6) [Factor 5, see Table 5.10 above]. Factor 5 is perhaps least surprising: to the extent that VOSP scores measure what they are intended to measure, namely accurate perception of objects and spatial relations, this is precisely the outcome to be expected. It seems likely that the association with the striving for perfection subscale of the EDI reflects the importance that participants attach to making accurate judgements.
If people accurately perceive objects and spatial relations, and accuracy is important to them, then they make relatively few errors in judging object dimensions. Conventional theories of size estimation, assuming a unitary perceptual apparatus applied in a standard manner to all classes of input, would predict the same negative association between VOSP scores and errors for judgements of self-size and other-size. The fact that VOSP scores showed little or no association with error scores in either case (see Tables 5.4 and 5.5) suggests that there are important differences in the cognitive operations underlying perceptual judgements about objects and perceptual judgements about people. Size estimation requires a human judge to cognitively represent the person or object depicted in a photograph, to retrieve from memory an object of known size, to adjust the two cognitive representations to the same scale, and finally to read off the size of the unknown parameters in the perceptual representation. The absence of a VOSP connection with errors in estimating the physical parameters of people suggests that individual differences in error rate between participants arise from the cognitive components of the operation (retrieval and comparison), rather than the perceptual component. The accuracy of the scaling operation seems to be related to the striving for perfection subscale of the EDI. Adult participants are highly skilled at perceptual scaling, so the observed error rate may well reflect the effort that they are prepared to expend in achieving accuracy. In estimating object size, if they have accurately perceived and represented the object, and they strive for accuracy in judgement, then low error scores result. In judgements about other people, judgements are accurate except amongst participants who are not predisposed to strive for accuracy. If this account is correct, then judgement errors about objects and judgement errors about others can be explained by the same basic cognitive process, with observed differences in performance resulting from a greater contribution of perceptual factors when objects are judged, and from differences in the extent to which participants strive for accuracy between the two cases. When judging objects, people who try atypically hard make lower than average errors; when judging other people, participants who do not strive for accuracy make higher than average errors. The anomalous case is the error rate in conditions in which people make size judgements about their own body, and in this condition, the tendency to underestimate their own body parameters is associated with female participants in particular. There is no association with VOSP scores, so the errors do not result from faulty perception or spatial judgement.
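The represent, retrieve, compare account described above can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that each stage contributes a multiplicative gain to the final estimate; this multiplicative form, the function names and every number are hypothetical and are not part of the study. It shows only that an error introduced at one stage survives even when the other stages are exact.

```python
def estimate_cm(true_cm, perception_gain=1.0, retrieval_gain=1.0, comparison_gain=1.0):
    """Toy multiplicative model: each stage scales the size passed to the next.

    A gain of 1.0 means the stage is exact (hypothetical simplification).
    """
    return true_cm * perception_gain * retrieval_gain * comparison_gain


def percent_error(estimate, actual):
    """Signed percent error: negative values mean underestimation."""
    return 100.0 * (estimate - actual) / actual


TRUE_WIDTH_CM = 30.0  # hypothetical true width of the judged dimension

# Every stage exact: the estimate equals the true width.
accurate = estimate_cm(TRUE_WIDTH_CM)

# Low effort at the comparison stage only.
low_effort = estimate_cm(TRUE_WIDTH_CM, comparison_gain=1.08)

# Mis-scaled standard retrieved from memory; perception and comparison exact.
mis_scaled = estimate_cm(TRUE_WIDTH_CM, retrieval_gain=0.9)

for label, est in [("accurate", accurate), ("low effort", low_effort), ("mis-scaled", mis_scaled)]:
    print(f"{label}: {percent_error(est, TRUE_WIDTH_CM):+.1f}%")
```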
There is a positive association with striving for perfection, so that if this variable can legitimately be interpreted as an index of the effort applied in trying to judge accurately, it would appear that the comparison component of the represent, retrieve, compare model is not the source of the increased error rate. It seems more likely that the errors in self-estimation are related to the retrieval component. If the retrieved image of known size is incorrectly scaled, accurately mapping it onto the perceptual representation derived from the stimulus photograph would still produce incorrect size estimations. We have argued earlier in this chapter that there seems to be a strong theoretical case for supposing that cognitive operations related to the physical self depend upon the existence of a Body-Image Generator. It seems reasonable to use this term to refer to the cognitive process by which an image of one's own body is produced from memory. The implicit claim that own-body images are not processed by the same image generator that handles objects and other people is consistent with the evidence that self-size estimates do not seem to be scaled in the same way as object-size estimates or other-size estimates. Evidence from raw correlations seems to support this theoretical analysis. Table 5.1, Table 5.4 and Table 5.5 show that though there are significant correlations between Total VOSP scores and Height/Weight Ratio (see Table 5.1), there are no correlations between Total VOSP and any of the other body size and object error estimations (see Table 5.4 and Table 5.5). It was hypothesized in advance of the study that inaccurate perception of one's own body would not be related to any general deficiency in perception or spatial judgement. Instead, it was suggested there may be two distinct sub-types of BDD, both resulting from faulty operation of the BIG mechanism. One BDD subtype was hypothesized to result from a faulty cognitive representation of one's own body. This type of BDD (Type 1 BDD) was expected to be independent of cultural stereotypes and depressive illness: cases in which, though the actual body is normal, BIG generates an image that is anomalous in some way (e.g., excessively large in some dimensions). Individuals with this impairment may have normal bodies, and normal evaluation criteria, but will still perceive their body as defective. The second type of BDD (Type 2 BDD) was expected to be associated with a negative view of an undistorted cognitive representation of body, and associated with internalized cultural norms, depression, and possibly side-effects of drugs prescribed to alleviate emotional imbalance: cases in which the actual body is normal, and BIG generates an accurate image of the body, but individuals judge their body image as defective because of inappropriate norms or the application of negative schemas to themselves. The data from the present study, particularly the factor analysis outcomes summarized above, provide clear support for the existence of Type 1 BDD.
As expected, this BDD subtype is associated with high body dissatisfaction scores and self-estimation errors, but is independent of VOSP scores (as perception is not compromised) and of Height/Weight Ratio, as body dissatisfaction is not supposed to result from the actual physical characteristics of the body. The association with gender is extremely strong (0.9), and its positive sign reflects the coding used in the database, in which females were coded as 1 and males as 0. Hence, the positive association indicates that Type 1 BDD occurs predominantly amongst females. If Type 1 BDD resulted from a structural or genetic deficiency of some kind, it might be expected to occur with roughly equal frequency in both genders. A genetic link between sex and cognitive characteristics is not biologically impossible, but despite an almost obsessive search for male/female differences in cognition, little of substance has


yet been found (see, for example, Maccoby and Jacklin, 1974). If this argument is accepted, the conclusion that some females in our culture suffer psychological distress because of faulty cognitive representations of their own body has two implications. Firstly, the characteristics of BIG, the operations of which appear to underlie the problem, are the result of learning, and as such, are likely to be localized in time and culture. This does indeed seem to be indicated by the data, particularly the association with gender. Though the present study cannot support a detailed analysis of the mechanisms giving rise to inaccurate cognitive representations of body image, a process that would be sufficient to do so can readily be imagined. In a culture that assigns value to females according to their body size and shape, those females who attach importance to social approval will experience distress if their bodies do not conform to the most highly valued body-stereotype (image-stereotype mismatch). Distress could be reduced if image-stereotype mismatch can be eliminated, and this can be achieved either by generating and maintaining a distorted cognitive representation of a high-value female body stereotype, or by generating and storing a distorted cognitive representation of one's own body. The cost of this strategy will come from having to deal with a constant stream of perceptual information that is discrepant with respect to stored cognitive self-representations. This constant challenge to the veridicality of cognitions can be removed either by systematically misperceiving reality, or by bringing reality into line by changing one's bodily characteristics to conform to the stored self-image, e.g., by dieting. An optimistic implication of this analysis is that if the characteristics of BIG that underlie Type 1 BDD are learned, then they can also be modified via re-learning processes.
This suggests that in extreme cases of BDD, re-educating BIG through a systematic training regime intended to produce more accurate judgements of own-body parameters might offer an effective clinical intervention strategy. This would sharply contrast with current clinical approaches based upon attempts to directly modify behaviours, such as eating habits, or to reduce distress by administering drugs such as tranquillizers. Such approaches have been notoriously unsuccessful. If the source of dissatisfaction with their own bodies amongst females lies in cognitive processes, then only cognitive interventions are likely to work, and it is unsurprising that attempts to solve the problem by modifying consequential behaviour or emotions have not been successful.

The evidence for Type 2 BDD is indicative rather than compelling, as the present study used normal and not clinical participants, the data showing the presence of few or no cases of eating disorder (mean score on EDI = 3.6. Criterion score for ED is > 10 on all subscales. The number of participants


exceeding the criterion score was therefore zero). Nor were there any cases of clinical depression present in the sample (mean score on BDI = 10.5; the number of participants exceeding the criterion score for moderate depression was 13, with no participants exceeding the score of 30 required for severe depression). It is consequently unsurprising that associations between variables directly associated with pathology were insufficiently robust to emerge in the factor analysis. A correlation between EDI Total scores and average overall Front and Side Self estimation error was observed, but as Tables 5.2, 5.3 and 5.8 show, correlations were predominantly between estimation errors concerning Self and specific EDI subscales: Drive for Thinness (EDI-DT), Body Dissatisfaction (EDI-BD), and Perfectionism (EDI-P). These correlations imply, not an association between body estimation errors and eating disorders, but rather a correlation between estimations of body size and the EDI sub-scales related to body image.

The correlational data do, however, suggest that some of the expected relationships are present in the data in a weak form, emerging when the power of the analysis is increased by averaging over variables such as photograph angle (Front or Side) and Self- and Other-judgements (these relationships are reported in Tables 5.8 and 5.9). To take one example, it has been suggested in the research literature (Phillips, 1996b) that there is an association between depression and body over- or under-estimation. Results from the present study weakly support this claim, as there was a significant correlation between scores on the Beck Depression Inventory and the overall average error score in estimating size (r = 0.01; n = 100; p < 0.01; see Table 5.1). However, the fact that the relationship was barely significant, even when all size estimation tasks were combined, means that it would be unwise to place much reliance on this finding.
The correlational data also showed a significant association between scores on the Body Dysmorphic Disorder Questionnaire and overall average error score in body parameter estimation conditions (r = 0.154; n = 100; p < 0.005; see Tables 5.1 and 5.9). This association has also been reported in the research literature (Phillips, 1996b). The general claim is that people scoring high on BDDQ-R are liable to cognitive-perceptual distortions in the way that they see their own bodies: imagining features that are not present, or exaggerating size or shape. Such individuals also tend to have preoccupations with imagined defects in appearance, so that if a slight physical anomaly is present, the person's concern is markedly excessive (DSM-IV, 1997). Dissatisfaction with body does not appear to be correlated with trait or state anxiety, as there is no significant correlation between either measure and errors in size estimation (of Self, Others or Objects).
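
Whether a correlation of a given size counts as significant depends jointly on the magnitude of r and on the number of participants. As a rough illustration only, and not a reconstruction of the analysis actually carried out in the study, the following Python sketch approximates a two-tailed p-value from r and n using the Fisher z-transformation. The function name `fisher_z_p_value` and the example values of r are hypothetical, chosen for illustration, and are not taken from the study's tables.

```python
import math


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for a Pearson correlation r
    computed from n paired observations.

    Uses the Fisher z-transformation, under which atanh(r) is roughly
    normally distributed with standard error 1 / sqrt(n - 3) when the
    true correlation is zero. This is an approximation, not an exact test.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-tailed tail probability of a standard normal deviate.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Illustrative values only; these are NOT the study's correlations.
    for r in (0.05, 0.20, 0.30):
        p = fisher_z_p_value(r, 100)
        print(f"r = {r:.2f}, n = 100, approximate p = {p:.4f}")
```

An exact test would refer the statistic to the t distribution with n - 2 degrees of freedom; the normal approximation is used here only so that the sketch needs nothing beyond the standard library.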


The theoretical analysis offered in the present study suggests that syndromes involving body dissatisfaction, misperceptions of body size and shape and eating disorders may have two distinct aetiologies. In Type 1 BDD, the origin of any eventual pathology seems to lie in defensive cognitive distortions strategically introduced to control or suppress dissatisfaction with one's own body, particularly in females. Cognitive representations embodying scaled-down Self-dimensions or scaled-up Other-dimensions may eventually produce secondary behavioural and emotional problems, for example by misleading a person into being inappropriately tolerant of excessive bodyweight in comparison to the scaled-up representations of others. Type 2 BDD begins with an emotional problem. For example, depression may cause self-misattribution of negative characteristics: "Everything about me is bad, being overweight is bad, so I must be overweight." Alternatively, high anxiety may cause a preoccupation with physical aspects of one's body that produces scaling errors via attentional processes, just as a mouth ulcer or facial blemish can subjectively assume exaggerated size because of the attention it attracts. Neither Type 1 nor Type 2 BDD is hypothesized to be caused directly by eating disorders, but rather it is supposed that they lead to unrealistic cognitive representations which interfere with the cognitive mechanisms normally involved in body-size regulation.

Taking a broader theoretical perspective on the findings we have reported, the data seem to provide reasonably robust support for two conclusions. Firstly, the cognitive mechanism underlying perception of the physical characteristics of a person's own body seems to be independent of the mechanisms involved in perceiving objects or other people's bodies. The evidence for this takes the form of a double dissociation (Teuber, 1955): some variables affecting own body perception do not affect object and other-body perception, and vice versa.
Double dissociation evidence is generally taken to be strongly indicative that two systems are independent. Secondly, the data also show that own body perception is affected by psychological variables and judgements that would seem to depend upon cultural norms. Though somewhat surprising in the context of standard and lay assumptions about the perception of the physical world and one's own body in particular, these findings are exactly what would be expected if own body perception is the product of natural technology manifest in the form of a special-purpose module. In addition to making sense of some rather surprising data, this theoretical framework is also productive in that, at least potentially, it leads to new approaches to clinical intervention in a domain within which old approaches have a poor record of success.


Notes
1. Eating disorders is one of two main classes of psychological body image disorder having relatively high prevalence (see Note 2 below for discussion of body dysmorphism, the second main group of disorders). Eating disorders involve serious disturbances in eating behavior, such as extreme and unhealthy reduction of food intake or severe overeating, as well as feelings of distress or extreme concern about body shape or weight. Eating disorders are not due to a failure of will or behaviour; rather, they are real, treatable medical illnesses in which certain maladaptive patterns of eating take on a life of their own. The main types of eating disorders are anorexia nervosa and bulimia nervosa. A third type, binge-eating disorder, has been suggested but has not yet been approved as a formal psychiatric diagnosis. Eating disorders frequently develop during adolescence or early adulthood, but some reports indicate their onset can occur during childhood or later in adulthood (Spearing, 2001, p. 1). Eating disorders are far more common amongst females than males. Only an estimated 5 to 15 percent of people with anorexia or bulimia (Andersen, 1995) and an estimated 35 percent of those with binge-eating disorder are male (Spitzer et al., 1993).

An estimated 0.5 to 3.7 percent of females suffer from anorexia nervosa in their lifetime (American Psychiatric Association Work Group on Eating Disorders, 2000). Symptoms of anorexia nervosa include: resistance to maintaining body weight at or above a minimally normal weight for age and height; intense fear of gaining weight or becoming fat, even though underweight; disturbance in the way in which one's body weight or shape is experienced, undue influence of body weight or shape on self-evaluation, or denial of the seriousness of the current low body weight; infrequent or absent menstrual periods (in females who have reached puberty) (Spearing, 2001).
People with anorexia nervosa see themselves as overweight even though they are dangerously thin. The process of eating becomes an obsession. Unusual eating habits develop, such as avoiding food and meals, picking out a few foods and eating these in small quantities, or carefully weighing and portioning food. People with anorexia may repeatedly check their body weight, and many engage in other techniques to control their weight, such as intense and compulsive exercise, or purging by means of vomiting and abuse of laxatives, enemas, and diuretics. Girls with anorexia often experience a delayed onset of their first menstrual period (Spearing, 2001). The mortality rate among people with anorexia has been estimated at 0.56 percent per year, or approximately 5.6 percent per decade, which is about 12 times higher than the annual death rate due to all causes of death among females aged 15-24 in the general population (Sullivan, 1995). The most common causes of death are complications of the disorder, such as cardiac arrest or electrolyte imbalance, and suicide.

Approximately 1.1 percent to 4.2 percent of females have bulimia nervosa in their lifetime (American Psychiatric Association Work Group on Eating Disorders, 2000). Symptoms of bulimia nervosa include: recurrent episodes of binge eating, characterized by eating an excessive amount of food within a discrete period of time and by a sense of lack of control over eating during the episode; recurrent inappropriate compensatory behaviour in order to prevent weight gain, such as self-induced vomiting or misuse of laxatives, diuretics, enemas, or other medications (purging), fasting, or excessive exercise. The binge eating and inappropriate compensatory behaviours both occur, on average, at least twice a week for 3 months.


Self-evaluation is unduly influenced by body shape and weight. Because purging or other compensatory behaviour follows the binge-eating episodes, people with bulimia usually weigh within the normal range for their age and height. However, like individuals with anorexia, they may fear gaining weight, desire to lose weight, and feel intensely dissatisfied with their bodies. People with bulimia often perform the behaviours in secrecy, feeling disgusted and ashamed when they binge, yet relieved once they purge (Spearing, 2001).

Community surveys have estimated that between 2 percent and 5 percent of Americans experience binge-eating disorder in a 6-month period (Bruce and Agras, 1992; Spitzer et al., 1993). Symptoms of binge-eating disorder include: recurrent episodes of binge eating, characterized by eating an excessive amount of food within a discrete period of time and by a sense of lack of control over eating during the episode. Binge-eating episodes are associated with at least 3 of the following: eating much more rapidly than normal; eating until feeling uncomfortably full; eating large amounts of food when not feeling physically hungry; eating alone because of being embarrassed by how much one is eating; feeling disgusted with oneself, depressed, or very guilty after overeating; marked distress about the binge-eating behaviour. Binge eating occurs, on average, at least 2 days a week for 6 months. Binge eating is not associated with the regular use of inappropriate compensatory behaviours (e.g., purging, fasting, excessive exercise). People with binge-eating disorder experience frequent episodes of out-of-control eating, with the same binge-eating symptoms as those with bulimia. The main difference is that individuals with binge-eating disorder do not purge their bodies of excess calories. Therefore, many with the disorder are overweight for their age and height.
Feelings of self-disgust and shame associated with this illness can lead to further bingeing, creating a cycle of binge eating (Spearing, 2001).

Obsessive concern with dieting among adolescents has frequently been linked to a general dissatisfaction with their bodies (Huenemann et al., 1966). Huenemann et al. (1966) found that, among US ninth graders, 50% of boys and 65% of girls said that they were trying to do something about their weight. Tobin-Richards, Boxer, and Petersen (1983) reported that adolescent girls were less satisfied with their weight than adolescent boys. This satisfaction was linked to perceived body weight, with girls expressing most satisfaction when they perceived themselves as underweight and least satisfaction when they perceived themselves as overweight. Concern with thinness and dieting has been linked to an increasing prevalence of eating disorders among adolescent girls. Concern with weight, dieting, and body image is apparently associated with eating behaviours such as fasting, crash diets, binge eating, and self-induced vomiting and with the use of laxatives, diuretics, and diet pills (Greenfeld et al., 1995). Furthermore, a strong desire for thinness has been associated with problems of eating behaviour (Lundholm and Littrell, 1986).

2. The second main group of body image disorders is Body Dysmorphic Disorder. A condition known as dysmorphophobia was first described as long ago as 1886 as a "subjective feeling of ugliness or physical defect which a patient feels is noticeable to others, although he/she has appearance within normal limits" (Morselli, 1886, p. 100). Body Dysmorphic Disorder (BDD) was first recognized as a distinct disorder only in DSM-IV (1997) and placed within the category of somatoform disorders (disorders having a pattern of


recurring multiple, clinically significant, somatic complaints; summary of DSM-IV, p. 486). According to DSM-IV, the diagnostic criteria for Body Dysmorphic Disorder are:
i. Preoccupation with an imagined defect in appearance. If a slight physical anomaly is present, the person's concern is markedly excessive.
ii. The preoccupation causes clinically significant distress or impairment in social, occupational, or other important areas of functioning.
iii. The preoccupation is not better accounted for by another mental disorder (e.g., dissatisfaction with body shape and size in Anorexia Nervosa).

Body dysmorphic disorder (BDD) is a distressing and sometimes psychologically disabling preoccupation with an imagined or slight defect in appearance. BDD is regarded as an Obsessive-Compulsive spectrum disorder that appears to be relatively common. BDD often goes undiagnosed, however, due to patients' reluctance to divulge their symptoms because of secrecy and shame. Any body part can be the focus of concern (most often, the skin, hair, and nose), and most patients engage in compulsive behaviours, such as mirror checking, camouflaging, excessive grooming, and skin picking. Approximately half are delusional, and the majority experience ideas or delusions of reference (Phillips et al., 1996). Nearly all patients suffer some impairment in functioning as a result of their symptoms, some to a debilitating degree. Psychiatric hospitalization, suicidal ideation, and suicide attempts are relatively common. Phillips et al. (1996, p. 126) note that depression and suicide are frequent complications of BDD, other manifestations including obsessional preoccupation with an imagined appearance defect; clinically significant distress or functioning impairment; impoverished social interactions because of embarrassment or shame; attempts to camouflage the perceived deformity with clothing, makeup, or hair; and use of non-psychiatric treatment (i.e., dermatologic or plastic surgery), with some patients even attempting surgery on themselves.

As clinical depression involves negative attitudes to pretty much everything, it might be thought unsurprising that this may include negative attitudes to features of one's physical self. However, Phillips (1996a) is clearly of the opinion that the causal direction is from BDD to depression: "Patients with BDD are more prone to major depression. In clinical settings, 60% of patients with BDD have major depression, and the lifetime risk for major depression in these patients is 80%. Patients with this co-morbid duo are at risk for suicide. Determining if depressed patients have BDD is important because the treatment is different. Usually, major depression occurs as a result of the BDD, not vice versa." (Phillips, 1996a, p. 156).

Body dissatisfaction seems to be systematically related to body size distortion. The relationship between body dissatisfaction and body distortion was examined in an experiment by Gardner-Rick and Tockerman (1993). These investigators assessed Body Image accuracy using the Colour-a-Person Test (Wooley and Roll, 1991), and a computer-based TV video that measured distortions resulting from inaccurate body size estimations and self-ideal discrepancies. Results showed there was a significant correlation between body dissatisfaction and body distortion (Gardner-Rick and Tockerman, 1993). A number of studies by Altabe and Thompson (1996) support the claim that body image acts like a cognitive structure. For example, social comparison enhances the body image schema priming effect. Additionally, high trait distress individuals tended to be more sensitive to priming of both body image information and self-relevant information (Altabe and Thompson,


1996). Altabe and Thompson conclude that these studies supported the interpretation of body image as a mental representation (ibid., p. 191).

Treatment for people with BDD involves accurate assessment, proper diagnosis, and adherence to a medical regimen. According to Kirksey et al.: "Although reportedly difficult to treat, selective serotonin reuptake inhibitors (SSRIs) and cognitive-behavioural therapy have been effective for some individuals with BDD. Examples of SSRIs that have resulted in improvement are: Prozac™ (fluoxetine), Anafranil™ (clomipramine), Luvox™ (fluvoxamine), Zoloft™ (sertraline) and Paxil (paroxetine)" (Kirksey, Goodroad, Butensky and Holt-Ashley, 2000, http://www.ispub.com). Psychoactive drugs that appear to be clinically effective in treating BDD are substances generally reported to be beneficial in cases of depression. This suggests that successful drug therapy is associated with BDD resulting from negative evaluations of body image rather than faulty body image representations. The present review is most concerned with BDD resulting from misjudgements of body size and shape, and it is unlikely that such cases would respond to drugs designed to relieve depression. Lack of sensitivity to anti-depressant drugs may even serve as a criterion for BDD resulting from cognitive misrepresentation of body image.

3. Bonnier (1905) seems to have first used the term "body schema" in reporting observations of patients with brain lesions affecting their bodily experiences. In Bonnier's study the experiences of interest resulted from vestibular dysfunction, which, according to Bonnier, alters the way that a subject experiences spatial aspects of the body. He referred to the disordered experiences that resulted from this dysfunction as "aschématie" (Bonnier, 1905, p. 606).
Head and Holmes (1911) studied the pattern of impairment in postural sensation following lesions in various parts of the central nervous system to conclude that: "[t]he final product of the tests for the appreciation of posture or passive movement rises into consciousness as a measured postural change. For this combined standard against which all subsequent changes of posture are measured before they enter consciousness we propose the word schema" (Head and Holmes, 1911, p. 246). Head (1926) claimed that a body schema is a model or representation of one's own body that constitutes a standard against which postures and body movements are judged. This representation can be considered the result of comparisons and integrations at the cortical level of past sensory experiences (postural, tactile, visual, kinaesthetic and vestibular) with current sensations. This gives rise to an almost completely unconscious plastic reference model that makes it possible to move easily in space and to recognise the parts of one's own body in all situations.

4. The term "body image" seems to have been first employed (also originally within neurology and neuropsychology) to incorporate cognitive elements that were excluded from the body schema concept, such as wishes, emotional attitudes and interactions with others. Schilder (1935) incorporated visual and emotional aspects, referring to a schema as a three-dimensional image, a self-appearance of the body and a unit of its own, influencing mental life by means of the emotional value invested in it. Though Lhermitte (1939) emphasised the representational component, derived from memory, he too preferred the term "body image" to "body schema". The Gestalt-oriented neurologist Conrad (1933) regarded body image as a particularly good example of his theory of mental function: the whole (a psychological body image) was greater than the sum of the parts (the physiological contributions from the various sense organs).


5. Kinsbourne (1993) has attacked the idea that there is a single system responsible for representing the physical self: "There is no evidence for the existence of body schema in consciousness, bits of which can be nibbled away by disease. There is no set of localized deficits of regional body awareness, which, when fitted together like jigsaw pieces, cover the total body surface. Observation suggests that the representation of somatic awareness is object-centred, not space-centred, the objects in this instance being the body parts. We cannot achieve a simultaneous over feel of all the body. We can, however, shift attention to one or another body part at a time, just like to one or other object in a visual display" (Kinsbourne, 1993, p. 71). Despite the intensity with which Kinsbourne asserts his position, the terms "body image" and "body schema" are deeply embedded in psychology and appear to have demonstrable clinical utility. Equally important, Kinsbourne's argument seems to be more a claim about how mental representations of body are implemented than a denial that they exist. No-one would say that a document wasn't represented in the memory of a computer merely because it was spread over a variety of memory locations.

References
Altabe, M. & J. K. Thompson (1996). Body Image: A Cognitive Self-Schema Construct? Cognitive Therapy & Research 20(2), 171-193.
Andersen, A. E. (1995). Eating disorders in males. In K. D. Brownell & C. G. Fairburn (Eds.), Eating disorders and obesity: a comprehensive handbook, pp. 177-187. New York: Guilford Press.
American Psychiatric Association Work Group on Eating Disorders (2000). Practice guideline for the treatment of patients with eating disorders (revision). American Journal of Psychiatry 157 (1 Suppl), 1-39.
Beck, A. T., C. H. Ward, M. Mendelson, J. Mock & J. Erbaugh (1961). An inventory for measuring depression. Archives of General Psychiatry 4, 561-571.
Bonnier, P. (1905). L'aschématie. Revue Neurologique 13, 605-609.
Bower, F. L. (1977). Normal Development of Body Image. New York: John Wiley Medical Publishers.
Bruce, B. & W. S. Agras (1992). Binge eating in females: a population-based investigation. International Journal of Eating Disorders 12, 365-73.
Conrad, K. (1933). Der Konstitutionstypus, theoretische Grundlegung und praktische Bestimmung. Berlin: Springer.
Eysenck, H. J. & S. B. G. Eysenck (1964). Manual for the Eysenck Personality Inventory. London: ULP.
Fisher, S. (1990). The evolution of psychological concepts about the body. In T. F. Cash & Y. T. Pruzinsky (Eds.), Body Images: Development, Deviance and Change. New York: The Guilford Press.
Fodor, J. A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Gallagher, S. (1986). Body Image and Body Schema: a conceptual clarification. Journal of Mind & Behaviour 4, 541-554.


Gallagher, S. & J. Cole (1995). Body Image and Body Schema in a Deafferented Subject. Journal of Mind & Behaviour 16, 369-89.
Garner, D. M., M. P. Olmstead & J. Polivy (1983). Development and validation of a multidimensional eating disorders inventory for anorexia nervosa and bulimia. International Journal of Eating Disorders 2, 15-34.
Garner, D. M., M. P. Olmsted & J. Polivy (2003). Handbook of the Eating Disorders Inventory. Lutz, Florida: Psychological Assessment Resources, Inc.
Gardner-Rick, M. & Yale R. Tockerman (1993). Genetic Social and General Psychology Monographs 119 (1), 125-145. New York: Heldref Publications.
Goldberg, E. (2001). The Executive Brain. New York: Oxford University Press.
Gordon, R. A. (1992). Anorexia And Bulimia: Anatomy of A Social Epidemic. Oxford: Blackwell Publishers.
Greenfeld, D., D. Mickley, D. M. Quinlan & P. Roloff (1995). Hypokalemia in outpatients with eating disorders. American Journal of Psychiatry 152, 60-3.
Groth-Marnat, G. (1990). The handbook of psychological assessment (2nd ed.). New York: John Wiley & Sons.
Head, H. (1926). Aphasia and kindred disorders of speech. London, Cambridge: University Press.
Head, H. & G. Holmes (1911). Sensory Disturbance from Cerebral Lesions. Brain 34, 102-254.
Huenemann, R. L., L. R. Shapiro, M. C. Hampton & B. W. Mitchell (1966). A longitudinal study of gross body composition and body conformation and their association with food and activity in a teenage population. American Journal of Clinical Nutrition 18, 325-38.
Kinsbourne, M. (1993). Orientational Bias Model of Unilateral Neglect. Evidence from attentional gradients within hemispace. In I. H. Robertson & J. C. Marshall (Eds.), Unilateral Neglect: clinical and experimental studies, pp. 63-85. Hillsdale, NJ: Erlbaum.
Kirksey, K. M., B. K. Goodroad, E. A. Butensky & M. Holt-Ashley (2000). Body Dysmorphic Disorder in an Adolescent Male Secondary to HIV-related Lipodystrophy: A Case Study. The Internet Journal of Advanced Nursing Practice 4 (2), 1-14. Accessed from http://www.ispub.com on 9 Jan 2004.
Lhermitte, J. (1939). L'image de Notre Corps. Paris: Nouvelle Revue Critique.
Lundholm, J. K. & J. M. Littrell (1986). Desire for thinness among high school cheerleaders: Relationship to disordered eating and weight control behaviors. Adolescence 21, 573-579.
Maccoby, E. E. & C. N. Jacklin (1974). The Psychology of Sex Differences. Stanford, CA: Stanford University Press.
Meenan, S. & R. O. Lindsay (2002). Planning and the Neurotechnology of Social Behaviour. International Journal of Cognition and Technology 1 (2), 233-274.
Morselli, E. (1886). Sulla Dismorfofobia e sulla Tafefobia. Bulletino academia della Scienze Mediche di Genova 6, 100-119.
Myers, P. & F. Biocca (1992). The Elastic Body Image: The Effect of Television Advertising and Programming on Body Image Distortions in Young Women. Journal of Communication 42(3), 108-133.


Phillips, K. A. (1996a). An open study of buspirone augmentation of serotonin-reuptake inhibitors in body dysmorphic disorder. Psychopharmacological Bulletin 32, 175-80.
Phillips, K. A. (1996b). The Broken Mirror: Understanding and Treating Body Dysmorphic Disorder. Oxford: Oxford University Press.
Phillips, K., K. Atala & R. Albertini (1996). Case study: body dysmorphic disorder in adolescents. Journal of the American Academy of Child and Adolescent Psychiatry 34 (9), 1216-20.
Schilder, P. (1935). Image and Appearance of the Human Body. London: Kegan Paul.
Slade, P. D. (1994). What is body image? Behaviour Research and Therapy 32(5), 497-502.
Spearing, M. (2001). Eating Disorders. NIH Publication No. 01-4901. Bethesda, Maryland: NIMH.
Spielberger, C. D., H. L. Gorsuch, R. E. Lushene, P. R. Vagg & G. A. Jacobs (1983). Manual for the State-Trait Anxiety Inventory (STAI). Palo Alto, CA: Consulting Psychologists Press.
Spitzer, R. L., S. Yanovski, T. Wadden, R. Wing, M. D. Marcus, A. Stunkard, M. Devlin, J. Mitchell, D. Hasin & R. L. Horne (1993). Binge eating disorder: its further validation in a multisite study. International Journal of Eating Disorders 13(2), 137-53.
Sullivan, P. F. (1995). Mortality in anorexia nervosa. American Journal of Psychiatry 152(7), 1073-4.
Teuber, H-L. (1955). Physiological Psychology. Annual Review of Psychology 9, 267-96.
Tobin-Richards, M., A. M. Boxer & A. C. Petersen (1983). The psychological significance of pubertal change: Sex differences in perceptions of self during early adolescence. In J. Brooks-Gunn & A. C. Petersen (Eds.), Girls in Puberty: Biological and Sociological Perspectives, pp. 127-154. New York: Plenum.
Warrington, E. K. & M. James (1991). Visual Object and Spatial Perception Battery. Bury St Edmunds, UK: Thames Valley Test Company.
Wooley, O. W. & S. Roll (1991). The Color-A-Person Body Dissatisfaction Test: Stability, internal consistency, validity, and factor structure. Journal of Personality Assessment 56 (3), 395-413.

Looking under the rug


Context and context-aware artifacts*
Christopher Lueg
University of Technology, Sydney

Introduction
A widely shared expectation in research communities with a strong belief in technological progress is Weiser's (1991) vision that technologies will "weave themselves into the fabric of everyday life until they are indistinguishable from it." The idea is that embedded and invisible technology calms our lives by removing the annoyances. A decade later, technological progress indeed allows for the development of "intelligent" gadgets that are much smaller and more powerful than the bulky desktop computers that were around when the vision was formulated. Everyday life is shaped by people and what they do, how they do it, and how they perceive what they are doing. Computers, however, still lack the intuitive understanding of usage situations that humans naturally have.

Computational artifacts that exhibit a notion of "context-awareness" are expected to address this problem. Attributing context-awareness to computational artifacts means that artifacts are to some extent capable of sensing the context in which they are being used. The idea is that artifacts determine this context and adapt their functionality to what might be helpful in the respective context. According to Gupta et al. (2001), tremendous progress in context-awareness is required in order to achieve invisibility in pervasive computing.

The idea of a context-aware mobile phone nicely illustrates the potential benefit of context-aware artifacts. It is easy to imagine a context-aware mobile using context aspects to determine the level of intrusiveness that would be appropriate when trying to notify the user of incoming calls (e.g., Lueg 2001). Notifications could range from ringing (quite intrusive) to buzzing or vibrating (less intrusive). The mobile might even suppress notifications of less important calls (not intrusive at all). One could even imagine that the mobile answers certain calls while presenting others to the user. Context aspects sensed by a context-aware mobile might include the user's identity, the user's location, and the user's current schedule, which might be available electronically from his or her personal digital assistant (PDA). Other examples of context-aware artifacts are cooperative buildings, intelligent rooms, personal assistants, etc.

Building context-aware artifacts requires operationalizing notions of context. People usually have some kind of intuitive understanding of what context might be, and upon request they are able to list a virtually indefinite number of aspects in their environment that they would consider relevant to a given situation. This means that to a man with the intent to build a context-aware artifact, almost every aspect of the surrounding world might appear to be context, the matter to be operationalized in the artifact. This is actually a variation of the aphorism "To a man with a nail everything looks like a hammer", which itself is a variation of the well-known aphorism "To a man with a hammer everything looks like a nail" (Gorayska and Marsh, 1999). The first expression beautifully captures the problem underlying context-aware artifacts: the generation of context and the problem of determining observer-independent descriptions that could be operationalized in computational artifacts. Even describing context in computational terms seems to be rather difficult.

Drawing on a range of related disciplines, I will illustrate that the way humans use context is quite different from the way computational artifacts might use context. A less obvious issue underlying the development of context-aware artifacts is that the generation of context involves a notion of responsibility for the course of action. I will argue that the very idea of context-aware artifacts is closely related to much older ideas about intelligent machines pursued (with limited success) in the realm of classical Artificial Intelligence.
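The context-aware mobile phone described in this introduction can be made concrete with a small sketch. The following Python fragment is purely illustrative and is not taken from any of the cited systems: the class names, the three context aspects, and the set of "quiet" locations are assumptions introduced here to show what operationalizing context amounts to, namely a fixed mapping from a few sensed aspects to a predefined behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Notification(Enum):
    RING = "ring"          # quite intrusive
    VIBRATE = "vibrate"    # less intrusive
    SUPPRESS = "suppress"  # not intrusive at all


@dataclass
class PhoneContext:
    """Context aspects a phone might sense (hypothetical, for illustration)."""
    caller_is_important: bool
    user_in_meeting: bool   # e.g. read from the schedule on the user's PDA
    location: str           # e.g. "office", "theater", "street"


def choose_notification(ctx: PhoneContext) -> Notification:
    """Map sensed context aspects to a level of intrusiveness."""
    quiet_place = ctx.location in {"theater", "library"}  # assumed list
    if quiet_place or ctx.user_in_meeting:
        # In a setting modeled as quiet, only important calls get through.
        if ctx.caller_is_important:
            return Notification.VIBRATE
        return Notification.SUPPRESS
    return Notification.RING


ctx = PhoneContext(caller_is_important=False, user_in_meeting=True, location="office")
print(choose_notification(ctx).value)  # prints "suppress"
```

Everything the phone "knows" about the situation is contained in the three fields of `PhoneContext`; any aspect the designer did not anticipate simply does not exist for the artifact.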

Context-aware artifacts and definitions of context


A sound understanding of how context-aware artifacts are typically implemented helps us to understand the notions of context that are operationalized in these artifacts and allows us to illustrate what can reasonably be expected in terms of human-like context-awareness. In Lueg (2002c) I have argued that context-aware artifacts can be seen as a subclass of socially adept technologies (Marsh, 1995). It is therefore important to note that not all socially adept technologies need to be context-aware. There may be situations in which designers of socially adept technologies can exploit the fact that humans are good at creating and adapting to situations (consider, for example, how people react when confronted with robots like Kismet or Cog).

The 2001 HCI special issue on context-aware artifacts is a rich and highly relevant resource. In the anchor article of the special issue, Dey et al. (2001) start with a definition given in Webster's Dictionary, "the whole situation, background or environment relevant to some happening or personality", and argue that this definition is too general to be useful in context-aware computing. After considering a number of definitions they finally arrive at a definition of context that is based on information that characterizes a situation, and that is relevant to the interaction between a user and his or her application:
Any information that can be used to characterize the situation of an entity, where an entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the application themselves. Context is typically the location, identity and state of people, groups and computational and physical objects. (Dey et al., 2001, p)

Like many other definitions of context in the technically oriented literature, this definition suggests that context is understood as a kind of model or representation of a particular type of situation. The term "situation" seems to comprise everything, whereas "context" only consists of specific aspects that are distilled from a particular situation. Examples of such aspects listed by Dey et al. (2001) include location, identity and state of people, groups and computational and physical objects. Hull et al. (1997) mention identity, locations, companions, vital signs, air quality, and network availability as examples of context aspects. The underlying assumption seems to be that such aspects can be used to identify a user's current situation, which means it is assumed that the context aspects characterize that situation. Elsewhere (e.g., Lueg 2002a) I have discussed that there may be significant differences between what designers of context-aware artifacts define as context in verbal descriptions and what is actually operationalized in context-aware artifacts. These differences directly affect the capabilities of context-aware artifacts, as the capabilities of artifacts depend on the context models that are actually implemented.

Artifacts, context and situations


In what follows I try to explain from a number of different perspectives, such as epistemology, sociology and phenomenology, why it is so difficult to implement context-awareness in artifacts.

From a logic-oriented perspective, the problem of defining context in computational terms is related to the frame problem (e.g., Pylyshyn, 1987) in classical, representation-based Artificial Intelligence (AI). Roughly, the frame problem is about what aspects of the world would have to be included in a sufficiently detailed world model, and how such a world model could be kept up-to-date when changes occur. The frame problem has been under investigation for more than two decades and it seems reasonable to state that it is intractable in realistic settings (e.g., Dreyfus, 2001). The frame problem is often considered a more technical problem as it is about keeping models of the world up-to-date. However, it can also be interpreted from an epistemology-oriented point of view, in the sense that a world model defines what is known about the world. The frame problem is then also an epistemological problem, as the richness of the model determines what can be inferred from it: aspects of the world that are neither included in the model nor derivable from it do not exist in the world of the model.

Another major problem is defining context in a precise and, in particular, an observer-independent way. One of the main reasons is that situations are not given but negotiated among the persons involved in the situation. Agre (2001) discusses how people use the various features of their physical environment as resources for the social construction of a place, i.e., it is through their ongoing, concerted effort that the place, as opposed to space, comes into being. An artifact will be incapable of registering the most basic aspects of this socially constructed environment. Context-aware artifacts may fail annoyingly as soon as a system's (wrong) choices become significant.
Dourish (2001) discusses from a phenomenology-oriented point of view how meaning arises in the course of action: the meaning of a technology is not inherent in the technology but arises from how that technology is used. Designers may influence how artifacts are being used but they have no absolute control. Humans are in principle conformists, flexible tool-makers and users (Gorayska and Mey, 2002). In more practical terms, this means that people may use an artifact in a way that is different from what has been envisioned by the artifact's designer. This is important as artifacts embed certain assumptions, and this holds for context-aware artifacts as well.

Furthermore, the use of artifacts is socially negotiated. Cars, or powerful computers on one's desktop, can be used as examples to illustrate how the use of artifacts may be (re-)negotiated. Both artifacts can be used as effective tools for transporting things and processing data, respectively, but both may also function as status symbols (Wenger, 1998). Context-aware artifacts, however, would not be involved in such negotiations, which means that they are hardly able to recognize the outcome. Accordingly, artifacts would not be aware of what they represent and what other artifacts represent. A practical example is a socially adept agent (Marsh, 1995) that may not be aware of the above-mentioned social status of a car. In a discussion with humans, the agent might treat a specifically equipped sports car as if it were a regular car. The sports car actually is a car, but treating it as such may be embarrassing in certain situations.

This discussion points to the current understanding that, contrary to artifacts, humans are situated in their physical and social environment. The term "situated" has its origins in the sociology literature in the context of the relation of knowledge, identity, and society (Clancey, 1997a). With respect to the core aspects of situatedness, researchers from fields as different as ethnomethodology, cognitive science, and anthropology are arguing in a similar direction, although individual positions may still vary significantly. Suchman (1987, 1993), for example, investigated situational aspects of human behavior and has shown that the meaning of situations (and thus the significance of actions) is generated rather than given. The coherence of situated action is tied in essential ways to local interactions contingent on the actor's particular circumstances. Clancey (1997a) argued in a similar direction by emphasizing the relation of perception, action, and knowledge. He claims that every human thought and action is situated, because what people perceive, how they conceive of their activity, and what they physically do develop together. Lave (1991) emphasized that perception, action, and even knowledge have to be considered in relation to identity and culture.
Lave actually proposed replacing the term "situated activity" with "socially situated practice" or, where appropriate, "situated learning", in order to stress that perception and activity are tightly bound to culture and identity.

These days, using the term "situated" is a bit complicated as the term is used with a variety of different meanings in the literature. Clancey (1997a, p. 23) explained that, in particular, the overwhelming use of the term in Artificial Intelligence research since the 1980s has reduced its meaning from something conceptual in form and social in content to merely interactive or located in some time and place. Even in non-traditional AI research the term "situated" is used in varying ways. In a discussion centered around embodiment, Dautenhahn et al. (2002) concluded that the concept of situatedness can easily be applied to the social domain, by extending the physical environment to the social environment. A socially situated agent acquires information about the social, as well as the physical, domain through its surrounding environment, and its interactions with the environment may include the physical as well as the social world. Considering the origins of the term "situated", the notion "socially situated" is a pleonasm, indicating that much work is still needed to combine the different research directions.

The difference between physical and social aspects matters in the context of context-aware artifacts, in particular, as sensing physical aspects of the environment is typically much easier than sensing the social construction of the world. Robertson's (2000) study of the social construction of a business situation can be used to illustrate the difference. Conducting a workplace observation in a software company, Robertson attended weekly meetings over a period of seven months, making separate video and audio recordings of relevant meeting activities. One of the questions to be answered was what designers actually do during these meetings:
Amongst the talk, laughter and other activities, there was clearly a pattern to each meeting. Individuals reported what they had done while apart. Others would ask questions and each person's work would be discussed by the group. Then another person would report on her work. This process continued until everyone, who had worked on the project through the week, had told the others what she had done. Reporting was always followed by a period of shared designing, where the group worked together on some aspect of the design. Then, towards the end of the meeting, the work for the next week would be negotiated and allocated. (Robertson 2000, p. 126)

Robertson notes that from an observer's perspective it was easy to divide the group's meeting into different stages, such as reporting, discussion, shared design, negotiations of future work, and finally allocation of work. One of the central findings of the workplace observation, however, was that the participants in the process did not describe their work with such labels: "[...] they did not bother with names for specific stages in their work, as they lived it, at all." (Robertson 2000, p. 126) Robertson concludes:
[...] naming the stages in the design work in this way excludes entirely the work of coordination and negotiation that made the process they represent possible in the first place. Moreover, this communicative work had been identified by the designers themselves as the work they most wanted supported. (Robertson 2000, p. 126)

The most important point for this chapter about context-aware artifacts is how the process unfolded:
[...] people did all these kinds of cooperative design work while sitting round a table talking together. At times they moved around the room, entered or left the room and moved various objects around; but there were no formal changes of position, no discernible interactional difficulties and certainly no upheaval when they changed from one kind of work to another. [...] Whatever they did was always accomplished by different combinations of their purposeful, embodied actions. (Robertson 2000, p. 126)

The important point here is that the business situation changed although most context indicators that could be sensed by technical artifacts did not appear to undergo any recognizable changes. Robertson's study demonstrates that situations are negotiated among those participating in the situation. This means that even if a particular situation meets the description of a "business meeting" context at some stage, the situation may change into an informal get-together and vice versa.

The idea of a context-aware meeting room (or some other kind of "intelligent" room) can be used to illustrate why the re-negotiation of situations is important in the context of this chapter. Using currently available technology, such a room could possibly sense many aspects, such as the electronic schedule, the number of persons in the room, and the prevailing clothing. Based on this information, the room could compute that the current context is a "business meeting" context (and not "morning tea" or an unplanned, informal get-together) and could instruct attendees' mobile phones not to disturb the meeting; business-related information like the latest share prices could be projected onto the room's multi-purpose walls, and so on. The problem is that changes to the situation as subtle as those observed by Robertson are hardly recognizable for currently available technology: the (defined) "business meeting" context would not change while the situation as experienced by those involved would. This means that the once formal business meeting may have changed into an informal get-together and vice versa, unrecognized by the intelligent room. The difference seems to be minor, but once the meeting's nature has changed it may, for example, no longer be appropriate to project business-related information on walls (as it would be embarrassing to demonstrate that the hosting company's fancy technology did not recognize this simple change in the meeting situation).
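The meeting room example can be sketched in a few lines of code. The fragment below is a hypothetical illustration, not a description of any existing system: the sensed aspects, the threshold of two persons, and the action names are assumptions made here. It shows that the computed context is a pure function of the sensor readings, so two situations that differ only in their socially negotiated meaning are indistinguishable to the room.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoomReadings:
    """Aspects an 'intelligent' room might sense (illustrative only)."""
    scheduled_meeting: bool   # from the electronic schedule
    persons_present: int
    formal_clothing: bool     # "prevailing clothing", however it is sensed


def classify_context(r: RoomReadings) -> str:
    """Predefined context model: readings in, context label out."""
    if r.scheduled_meeting and r.persons_present >= 2 and r.formal_clothing:
        return "business meeting"
    return "informal get-together"


def room_actions(context: str) -> List[str]:
    """Behavior the room attaches to each context label."""
    if context == "business meeting":
        return ["silence mobile phones", "project share prices"]
    return []


# The participants have drifted from formal business into informal chat,
# but nothing the room can sense has changed.
during_formal_part = RoomReadings(True, 5, True)
during_informal_chat = RoomReadings(True, 5, True)

assert classify_context(during_formal_part) == classify_context(during_informal_chat)
print(room_actions(classify_context(during_informal_chat)))
```

The room keeps projecting share prices during the informal chat because the change took place in the participants' negotiation of the situation, not in any variable of the model.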
Elsewhere (Lueg, 2001, 2002a) I have argued that it is, in particular, the social connotation of the term "situated" that allows us to highlight the differences between "context" as implemented in context-aware artifacts and the "situation" that is modeled. I understand a situation as a potentially unlimited resource that is continuously interpreted and re-interpreted in the course of action. I understand Clancey's (1997b) statement that "situations are conceptual constructs, not places or problem descriptions" as support for the cognitive aspect of my definition. The observations made during Robertson's (2000) study stress the importance of the negotiation aspect. Situations are also observer-relative, by which I mean that there is no single observer who defines what constitutes a situation. Those who are involved in a situation create and maintain their own interpretations of the situation by using the situation as a resource. Again, Robertson's (2000) observations of the stages she as observer could identify can be used to illustrate the importance of being involved in a situation.

By contrast, the notion of context as operationalized in context-aware artifacts is an expression of a certain interpretation of a situation, a model of a situation. Such a model is observer-dependent and not part of the unfolding situation. Thus the model is no longer open to re-interpretation: the meaning of aspects included in a model is more or less determined. This lack of openness to re-interpretation matters as (individual) participants may decide to assign significance to aspects of the environment not considered significant by the model's designers. As mentioned before, context is about what people consider relevant in a situation. Winograd (2001) summarizes the context problem as follows: features of the world become context through their use; something is not context because of its inherent properties but because of the way it is used in (human) interpretation.
Dreyfus (2001) has argued that it should be no surprise that no one has been able to program a computer to respond to what is relevant, as human beings respond only to changes that are relevant given their bodies and their interests.

To sum up, I do not see much that suggests that artifacts will soon become context-aware in the sense that they will be able to recognize situations in a non-trivial way. As Erickson (2002) put it: context-awareness exhibited by people appears to be quite different from what can be implemented in computational systems. This does not question, however, the value of research on context-aware artifacts. Goodwin and Duranti (1992, p. 2) have maintained that "it does not seem possible at the present time to give a single, precise, technical definition of context, and eventually we might have to accept that such a definition may not be possible." They have also noted, however, that providing a formal, or simply explicit, definition of a concept such as context can lead to important analytic insights because such a definition can expose inconsistencies and insights that were not visible before.

Considering these problems, it is hardly surprising that the artificial intelligence problem is still largely unsolved. As Michael Dertouzos, Director of the Laboratory for Computer Science at MIT, pointed out in July 2000: "The AI problem, as it's called – of making machines close enough to how human beings behave intelligently – has not been solved. Moreover, there's nothing on the horizon that says, 'I see some light.' Words like 'artificial intelligence,' 'intelligent agents,' 'servants' – all these hyped words we hear in the press – are statements of the mess and the problem we're in." (quoted in Dreyfus, 2001, p. 8). Dertouzos' statement stresses the importance of carefully looking at what can reasonably be expected from today's and tomorrow's technology.

Looking under the rug


It is interesting to note that a number of issues discussed in the previous section seem to emerge again and again. Questions concerning the modeling of context and the explaining of inferences (see below) were already discussed during the Seventies and Eighties in the context of artificial intelligence and expert systems. A more recent emergence of these questions could be observed during the hype around intelligent software agents and personal assistants in the mid-Nineties.

Interestingly, none of these technologies have delivered what their proponents envisioned. Although it was expected that expert systems would "change the way businesses operate by altering the way people think about solving problems" (Harmon and King, 1985, p), it is fair to say that "the vaunted potential of expert systems has never been realized" (Davenport and Prusak, 1998, p). A few so-called expert systems are around these days but these systems typically operate in rather constrained settings; the idea of expert systems as universally applicable problem solvers and replacements for human expertise has largely been abandoned. Interestingly, the idea of expert computers is still popular: the news magazine Newsweek reports in its September 2002 issue that according to a poll conducted by George Washington University, experts expect that in 2008 expert system software will compete with lawyers, doctors and other professionals (Foroohar, 2002, p. 67).

A recent review of promises made during the early software agent hype was disillusioning as well: "[...] not much discernible progress has been made post 1994 [the year in which the popular ACM special issue on software agents was published], perhaps because researchers have failed to address the practical issues surrounding the development, deployment and utilization of industrial-strength production systems that use the technology. We note that once greater effort is placed on building useful systems, not just prototype exemplars, the real problems inherent in information discovery, communication, ontology, collaboration and reasoning, will begin to be addressed." (Nwana and Ndumu, 1999).

In what follows, I discuss a few issues concerning what is being implemented in context-aware artifacts and other technologies incorporating models of human behavior. For example, Bellotti and Edwards (2001) point out that in many situations where context-aware systems have been proposed or prototyped, human initiative is frequently required to determine what to do next. They conclude that intelligibility and accountability are two key features which must be supported by context-aware systems so that users may make informed decisions based on context. Intelligibility means that context-aware systems should be able to present to their users what they know, how they know it, and what they are doing about it. Accountability means that systems must enforce user accountability when they seek to mediate user actions that impact others.

As outlined in Lueg (2001), I believe that providing for intelligibility and accountability will help gain a better understanding of the responsibilities involved in the design of context-aware artifacts. It is questionable, however, whether intelligibility and accountability help overcome the inherent limitations of context-aware artifacts. Similar demands have been discussed extensively in the artificial intelligence field in the context of expert systems and robotics. The problem is that explaining inferences works best in rather simple settings, and becomes increasingly difficult the more complex the setting is. Projecting future implications of proposed actions is even harder.
Now the crux is that Bellotti and Edwards' (2001) demands for intelligibility and accountability are only necessary in settings that are already so complex that context-aware artifacts are no longer able to, or not allowed to, make decisions on their own, i.e., without human supervision. Accordingly, demands for intelligibility and accountability are likely to be intractable when applied to complex real-world settings.

Scenarios from the realm of robotics can be used to illustrate the practical impact of this intelligibility issue. Robots like the ones discussed below can be seen as socially adept technologies. As mentioned before, socially adept technologies are closely related to the idea of context-aware artifacts, and arguably the robots discussed below would need to be capable of context-awareness. The large mobile robot described by Brooks (2002) is expected to be capable of negotiating who goes first in a tight corridor. It is expected that the robot will understand the same natural head, eye, and hand gestures people usually understand. In a situation in which the robot fails to understand certain gestures, accounting for intelligibility would mean that the robot starts dumping lists of sensor readings and inferences (or more aggregated information like "I sensed that you moved your head such and such. According to rule 123 this means xy") because the robot's designer hopes this helps the confronted user understand why the robot failed to understand his or her gestures.

Another scenario from the realm of robotics is the robotic cab driver featured in the science fiction movie Total Recall. Upon arrival on the planet Mars, the hero (played by the actor Arnold Schwarzenegger) is chased by some evil guys. With a bit of luck, he makes it into a fully automated cab. The robotic driver recognizes that a new guest has entered his cab and starts querying for a destination. Being on the run, the hero cries something like "just go". The robot, however, has not been programmed to understand such utterances and the accompanying panic gestures. Rather than just going ahead, the robot starts nagging for a destination. Completely unaware of the dramatic nature of the situation, the robotic driver wastes valuable time. The hero finally resolves the situation by kicking the robotic driver out of the cab.

In both situations, it is questionable whether intelligibility would help resolve the problematic situation. Furthermore, the robotic cab driver scenario is also a nice example of the difficulties designers face when preparing artifacts for real-world situations, as it is a situation apparently not considered by the robot's designer. The problem is not that such an escape situation is not exactly a common situation but that context-aware artifacts are based on predefined context models.
This means that the designers of such robots would have to foresee all possible situations, from escape situations to other emergency situations. Considering these issues, my conclusion was, and still is, that designers of context-aware artifacts should take care that users are able to overrule a context-aware artifact in such a way that the artifact's behavior no longer interferes with the situation negotiated among those participating in a situation. Mobile phones can be switched off, but more complex artifacts such as mobile robots may be more difficult to overrule. The challenge, still, is making artifacts both easier to comprehend and easier to use.
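What intelligibility and overruling might look like in the simplest case can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of Bellotti and Edwards' (2001) proposal: the class `OverridableArtifact`, its single `in_meeting` reading, and the wording of the explanations are all hypothetical. The artifact records what it decided and on what evidence, and a user's override always takes precedence over the context model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Decision:
    action: str
    evidence: List[str]   # what the artifact knows and how it knows it


@dataclass
class OverridableArtifact:
    """Hypothetical artifact that explains itself and can be overruled."""
    user_override: Optional[str] = None
    log: List[Decision] = field(default_factory=list)

    def decide(self, readings: Dict[str, bool]) -> str:
        # The user's choice always wins over the predefined context model.
        if self.user_override is not None:
            return self.user_override
        if readings.get("in_meeting"):
            decision = Decision("suppress call", ["schedule says: in meeting"])
        else:
            decision = Decision("ring", ["no meeting scheduled"])
        self.log.append(decision)
        return decision.action

    def explain(self) -> str:
        """Intelligibility: report the last decision and its evidence."""
        if not self.log:
            return "no decisions yet"
        last = self.log[-1]
        return f"{last.action} because " + "; ".join(last.evidence)

    def overrule(self, action: str) -> None:
        """Let the user fix the behavior regardless of sensed context."""
        self.user_override = action


phone = OverridableArtifact()
phone.decide({"in_meeting": True})
print(phone.explain())   # prints "suppress call because schedule says: in meeting"
phone.overrule("ring")
print(phone.decide({"in_meeting": True}))   # prints "ring"
```

With one rule and one reading the explanation is easy to follow. The argument above is that such explanations stop being helpful precisely in the complex settings where they would be needed, which is why the override path matters more than the explanation.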

Conclusions
In this chapter I have looked at the limitations of context-aware artifacts and I have discussed some of the implications of these limitations. Exposing limitations does not mean that work on context-aware artifacts is not valuable; building context-aware artifacts complements more theoretical research into context and may contribute to a better understanding of the complexity of human behavior and human social life. In this sense, this work is as valuable as work on robotics, which is also increasing our understanding of the amazing complexity of human beings.

From a more practical point of view, considering the limitations of the state of the art in context-awareness suggests that we need to be very careful when designing such technologies, as they are more likely to fail than to succeed when trying to recognize situations. Socially responsible design (Lueg, 2001) means it should always be possible to overrule decisions made by context-aware artifacts. Many such problems could be circumvented, however, if humans were kept in the loop (Erickson 2002; also Lueg, 2002c, d).

With regard to the general idea of context-aware artifacts, it is still unclear whether such artifacts would actually be able to deliver the expected benefit. In most cases, people are well aware of their situation and have considerable expertise in using artifacts in appropriate ways (e.g., there is no real need for context-aware mobile phones, as most people turn off their mobiles anyway during a theater performance because they know that mobiles ringing during theater performances are annoying). People are also good at recognizing changes to situations, as they are participating in the negotiations that lead to these changes. From a human-computer interaction (HCI) point of view, the question therefore is: what is the benefit of making artifacts context-aware over making artifacts easier to use? (Lueg, 2001).
In a way, the context-aware artifacts hype (and parts of the closely related ubiquitous computing idea) can be seen as the latest (but almost certainly not the last) wave of (classical) artificial intelligence. Broadly speaking, AI can be seen as the approach of using technology to model and replicate intelligent human behavior in such ways that machines work as if they were human. Research in this area tends to focus on pre-planned behavior and fixed meanings, at the expense of the situatedness of human action and the re-negotiation of situations. AI had to learn the hard way that human behavior involves not only thinking but also acting and being in the world (Clark, 1997), and has moved from modeling cognitive processes in isolation to modeling behaviors in situ. This is where AI meets context-aware artifacts, ubiquitous computing and the idea of technology calming our lives by removing the annoyances (Weiser, 1991). Many researchers in context-aware artifacts and ubiquitous computing do not consider their work as AI research, but a closer look reveals that these research directions address quite a few issues that traditionally were investigated in AI. As a consequence, today's researchers may run into problems, such as the frame problem or the problem of reliably predicting human behavior, that have been haunting AI researchers for decades (see Lueg, 2002c, d for a more detailed discussion).

Note
* This chapter is based on work (Lueg, 2002b) presented at the workshop "The Philosophy and Design of Socially Adept Technologies" at the ACM SIGCHI Conference on Human Factors in Computing Systems (Minneapolis, MN, USA, April 2002). Discussions at the workshop helped shape a number of arguments. The author is grateful to Barbara Gorayska and Toni Robertson for many stimulating discussions and to the anonymous reviewers for insightful comments on the draft version of this chapter. The rug metaphor is pirated from Tom Erickson's CACM article.

References
Agre, P. E. (2001). Changing places: contexts of awareness in computing. Human-Computer Interaction 16(2–4), 177–192.
Bellotti, V. & K. Edwards (2001). Intelligibility and accountability: human considerations in context-aware systems. Human-Computer Interaction 16(2–4), 193–212.
Brooks, R. (2002). Humanoid robots. Communications of the ACM 45(3), 33–38.
Clancey, W. (1997a). Situated cognition. Cambridge: Cambridge University Press.
Clancey, W. (1997b). The conceptual nature of knowledge, situations, and activity. In P. Feltovich, R. Hoffman, & K. Ford (Eds.), Expertise in context, pp. 247–291. The AAAI Press.
Clark, A. (1997). Being there. Cambridge, Mass.: MIT Press.
Davenport, T. H. & L. Prusak (1998). Working knowledge. Boston, Mass.: Harvard Business School Press.
Dautenhahn, K., B. Ogden & T. Quick (2002). From embodied to socially embedded agents: implications for interaction-aware robots. Cognitive Systems Research. Special Issue on Situated and Embodied Cognition. Amsterdam: Elsevier. In print.
Dey, A. K., D. Salber & G. D. Abowd (2001). A conceptual framework and a toolkit for supporting the rapid prototyping of context-aware applications. Human-Computer Interaction 16(2–4), 97–166.
Dourish, P. (2001). Seeking a foundation for context-aware computing. Human-Computer Interaction 16(2–4), 229–241.
Dreyfus, H. (2001). On the Internet. London; New York: Routledge.
Erickson, T. (2002). Some problems with the notion of context-aware computing. Communications of the ACM 45(2), 102–104.
Foroohar, R. (2002). Life in the grid. Newsweek, pp. 60–67.
Goodwin, C. & A. Duranti (1992). Rethinking context: An introduction. In A. Duranti & C. Goodwin (Eds.), Rethinking context: Language as an interactive phenomenon. Cambridge [England]; Melbourne: Cambridge University Press.
Gorayska, B. & J. Marsh (1999). Investigations in cognitive technology. In B. Gorayska, J. Marsh & J. Mey (Eds.), Humane interfaces: questions of methods and practice in cognitive technology, pp. 17–43. Amsterdam; Oxford: Elsevier/North Holland.
Gorayska, B. & J. L. Mey (2002). Pragmatics of technology. International Journal of Cognition and Technology 1(1), 1–21.
Gupta, S., W. C. Lee, A. Purakayastha & P. Srimani (2001). An overview of pervasive computing. IEEE Personal Communications, 8–9.
Harmon, P. & D. King (1985). Expert systems: Artificial intelligence in business. New York: J. Wiley.
Hull, R., P. Neaves & J. Bedford-Roberts (1997). Towards situated computing. In Proceedings of the First International Symposium on Wearable Computers (ISWC 97), pp. 146–153. IEEE.
Lave, J. (1991). Situated learning in communities of practice. In L. B. Resnick, J. M. Levine & S. D. Teasley (Eds.), Perspectives on Socially Shared Cognition, pp. 63–82. American Psychological Association, Washington, DC, USA. Third Printing April 1996.
Lueg, C. (2001). On context-aware artifacts and socially responsible design. In W. Smith, R. Thomas & M. Apperley (Eds.), Proceedings of the Annual Conference of the Computer Human Interaction Special Interest Group of the Ergonomics Society of Australia (OZCHI 2001), pp. 84–89. ISBN 0729805042.
Lueg, C. (2002a). Operationalizing context in context-aware artifacts: benefits and pitfalls. Informing Science 5(2), 43–47. ISSN 1521-4672.
Lueg, C. (2002b). Looking under the rug: on context-aware artifacts and socially adept technologies. Proceedings of the Workshop The Philosophy and Design of Socially Adept Technologies at the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 2002). National Research Council Canada NRC 44918.
Lueg, C. (2002c). On the gap between vision and feasibility. Proceedings of the International Conference on Pervasive Computing (PERVASIVE 2002). Lecture Notes in Computer Science (LNCS) 1414, pp. 45–57. Berlin; Heidelberg: Springer.
Lueg, C. (2002d). Representations in pervasive computing. Paper presented at the Inaugural Asia Pacific Forum on Pervasive Computing, 31 October – 1 November 2002, Adelaide, Australia. Paper available at http://www.sta-it.uts.edu.au/~lueg/abstracts/inauguralforum02.html.
Marsh, S. (1995). Exploring the socially adept agent. Proceedings of the First International Workshop on Decentralized Intelligent Multi-Agent Systems (DIMAS 95), pp. 301–308.
Nwana, H. & D. Ndumu (1999). A perspective on software agents research. Knowledge Engineering Review.
Pylyshyn, Z. (Ed.) (1987). The robot's dilemma: the frame problem in artificial intelligence. Norwood, N.J.: Ablex Publishing Corporation.
Robertson, T. (2000). Building bridges: negotiating the gap between work practice and technology design. Human-Computer Studies 53, 121–146.
Suchman, L. (1987). Plans and situated actions: the problem of human-machine communication. New York: Cambridge University Press.
Suchman, L. (1993). Response to Vera and Simon's situated action: a symbolic interpretation. Cognitive Science 17, 71–75.
Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. Cambridge: Cambridge University Press. Quotes from first Paperback Edition 1999.
Weiser, M. (1991). The Computer for the 21st Century. Scientific American 265(3), 66–75. Reprinted in IEEE Pervasive Computing 1(1), 19–25, 2002.
Winograd, T. (2001). Architectures for context. Human-Computer Interaction 16(2–4), 401–419.

Body Moves and tacit knowing*


Satinder P. Gill
Centre for the Study of Language and Information (CSLI), Stanford University, USA

Introduction
Body Moves are rhythmic coordinations in communication between at least two people. In performing them, we indicate the state of our connection and understanding, most importantly the degree of our contact and commitment within a communication situation (Gill, Kawamori, Katagiri, Shimojima, 2000). Tacit knowing is the unspoken dimension of human knowledge, formed in practice or experience and in our personal self, with others. It is essential for skilled performance. Body Moves enable the formation of tacit knowing, and its performance, in communication. In this chapter we develop this relation between body and cognition through three examples of collaborative design activities, taken from an ethnographic study of landscape architects (UK) (Gill, 1997) and a project on conceptual design activity in the interactive workspaces lab at Stanford University (Gill, 2002; Gill and Borchers, 2003). The analysis of these design activities involves the study of the interaction between people and the objects they manipulate, a core concern of Cognitive Technology (Gorayska and Mey, 1995), which addresses how external and social environments shape cognition. Salient concepts are practice, experience, tacit knowing, representation, parallel-coordinated action, and co-ordinated autonomy.

Practice and experience


In the winter of 1996/1997, Gordon, an apprentice landscape architect with company BETA, sent a set of completed coloured maps that he had made at the company's Welsh office to John, a senior architect based at its headquarters in North England. The company was going to make a bid for project work to reshape a major road in North Wales where the frequency of traffic accidents was high, and these coloured maps were part of the depiction of the changes to the road design and their effects upon the landscape. For example, colours depicted old woodland and new woodland. To Gordon's surprise, John judged the colours he had used to be wrong and decided that the maps needed to be re-coloured correctly. Company BETA had barely two weeks left to submit its bid, and re-colouring all these maps was no small task. John brought in other experienced landscape architects at his branch to help, and asked Gordon to travel up from Wales and re-colour the maps with them. It was felt that Gordon lacked experience and that the only way he was going to get it was by experiencing the doing of colouring in a shared practice. The problem of seeing the colours was partly due to the company's economic condition. BETA was downsizing, as a result of which Gordon was the sole landscape architect left at the Welsh branch.
Architects, however, do not interpret the material in isolation when they first handle it. In talking aloud and moving pens over paper, they engage the other person(s) in their conceiving. This, it is suggested, enables one person to adapt to another person's view, producing the conditions for a coherent development of the design (Gill, 1997), and a process of seeing-as (interpretation) until they come to see (unmediated understanding) (Tilghman, 1988). The same holds for colouring activity: as the apprentice colours with the team and more experienced architects, he/she learns how they select, for example, a specific shade of blue to set against a particular shade of green (seeing-as), to create a pleasing effect that looks professional (Gill, op. cit.). Because of the distance between the two branches and because of their commitments, John had been unable to visit Gordon and work with him. Instead, he had sent him a set of previously coloured maps (examples of experience), colour-coded keys, and a set of instructions.
These are descriptive and propositional forms of expression, all located in the experience of the architects at the North England branch. For Gordon, they are outside his experience, and he brings his own to bear in interpreting these fragmented representations of practice. In his study of how a team of geochemists judge when material fibres in a reaction vat are jet black, Goodwin (1997) shows how simply saying "jet black" is not sufficient for helping an apprentice measure and make this judgement competently. Rather, the blackness of black is learnt through physically working with the fibre, and in talking about the experience, transforming private sensations and hypotheses into public events that can be evaluated and confirmed by a more competent practitioner. Geochemists use their bodies as media that experience the material being worked with through a variety of modalities. In the case of the apprentice, Gina, in Goodwin's study, her interlocutor's ability to recognize and evaluate the sensation she is talking about requires co-participation in the same activity.
Gordon's failure to correctly interpret the forms of expression sent to him is an example of how breakdown can take place when co-participation is missing from the interpretation process, and of how essential co-participation is for repair within a distributed apprenticeship setting. Knowledge clearly becomes more than a matter of applying learnt rules; it is a matter of learning rule-following (Johannessen, 1988) within the practices that constitute it. The need for him to colour with the other architects, in order to be able to correctly interpret any such future fragments that might be sent to him, shows that experiencing in co-presence carries powerful tacit information. Gordon's acquired knowledge will be evident in his skillful performance of these forms of expression.
The terms "forms of expression" and "representations of practice" are used here as equivalent in meaning; they denote a range of human action, artifacts, objects, and tools. Human action includes cues, which may be verbal, bodily, or of interaction with a physical material world (tools, e.g., pens, light tables, etc.), and the construction of physical boundary objects (e.g., colour, maps, sketches, masterplan sketches, masterplans, plans, functional descriptive sketches, photographs, written documents, etc.). The dilemma of the distributed setting is that even in the future, any interpreting or understanding that Gordon, as an apprentice, does of similar or different fragments of knowledge will still take place in isolation, and the feedback from his local colleagues will be based on their seeing-as (interpretation based on their experience) and not on seeing (as they lack sufficient skill in this domain to understand without interpreting).
In the rest of this chapter, we develop an analysis of how experiencing the performances of representations of practice, and moving with these representations in a joint design activity, consists in specific types of behavioural alignments between actors in an environment, which we call Body Moves. This analysis builds on previous work (Gill, 2002; Gill and Borchers, 2003a) by developing the relationship between Body Moves and tacit knowing. This will help us to better understand, conceptually and practically, how Body Moves facilitate knowledge transformation and knowledge acquisition. Further, some findings are presented from the study at Stanford that show how the use of artifacts that do not permit designers to act at the surface (e.g., drawing) at the same time, i.e., in parallel, inhibits collaborative activity. Body Moves that have a parallel movement structure are termed Parallel Coordinated Moves (PCM). They embody autonomy; hence we analyse how coordinated autonomy is part of tacit knowing and is managed in collaboration.
In this research, knowledge is considered as a process that is dynamically represented in actors' behaviours with each other and with tools, technologies and other artifacts within an environment. These behaviours involve the senses of touch, sound, smell, and vision. The motivation behind this perspective on knowledge is to understand how we form and transform it in communication. This is located within a framework that sees cognition as a dynamic system that co-evolves and emerges through the interaction of mind, body, and environment. The co-evolution includes Body Moves (Kinesics and Kinaesthetics) that give the cognitive dynamic system meaning.

Body Moves
A focus on the body, specifically body moves, originated as an attempt to expand into the area that has been called pragmatic acts, and thereby to widen the narrow conception of strict natural language pragmatics (Gill, Kawamori, Katagiri, Shimojima, 2000). Non-verbal communication has already gained strong ground in its significance for understanding the human interface, the point at which interaction occurs (Gill et al., 2000), and thereby for the design of interactive systems. Body Moves provide us with a further insight into the nature and operation of co-presence (Good, 1996), which is an essential component of human understanding. Co-presence denotes, simply, how we are present to each other, be this in the same physical space or in differing physical spaces (e.g., computer-mediated spaces, or mobile-technology-mediated spaces). Being present may be described as a precondition for communication, and the nature of this precondition has a bearing upon how we coordinate with each other.
Body Moves are coordinated rhythms of body, speech, and silence, performed by participants orienting within a shared activity. These rhythms create what we term contact, i.e., a space of engagement between persons, and take sequential and parallel forms. These rhythms are described as behavioural alignments, and they occur at the level of meta-communication (Allwood et al., 1991; Scheflen, 1975; Bateson, 1955; Shimojima et al., 1997). Body Moves are a special case of information flow in dialogue and are considered as a form of interactional synchrony (Birdwhistell, 1979; Kendon, 1970), and as metapragmatic (Mey, 2001). Drawing upon the idea of the composite signal (Clark, 1996; Engle, 1998), these Body Moves have been conceived as Composite Dialogue Acts, formed of various combinations of gesture, speech, and silence (Gill et al., 2000). Our work on Body Moves indicates the construction/establishment of mutual ground within a space of action. By Body Move we do not refer to the physical movement; rather, we target the act that the movement performs.

Metacommunication
In the early work on Body Moves (Gill et al., 1999, 2000), their metacommunicative quality was located within a framework drawn from conversation theory, where information is conveyed upon the triggering of cueing facts that convey a variety of information about the conversation situation (Shimojima et al., 1997), rather than its content. Such cueing facts are fillers and responsives. These function as discourse markers, the particular nature of which can be identified by prosody and phoricity (Kawamori et al., 1998). Such interjections in speech determine discourse structures and the nature of the co-ordination taking place (Schiffrin, 1987; Kawamori et al., 1998). Body Moves were seen to have the quality of a cueing fact. However, this posed a challenge for handling rhythmic coordinations that did not fit the sequential structure of a cueing system. In Gill (2001), a non-sequential rhythmic coordination, called the Parallel Coordinated Move (PCM), was further analysed by drawing upon joint activity theory (Clark, 2000) and synchronous communication studies of body and speech coordination (Kendon, 1970). Recently, in Gill (2003c), a return to Scheflen's (1975) and Bateson's (1955) formative work has led to a further understanding of how the various forms of Body Moves are meta-communicative. To explain Bateson's (1955) formulation of metacommunicative behaviour, Scheflen considers the relation between kinesics and language: the former (kinesics) can qualify or give instructions about the latter (language) in a relation that Bateson called metacommunicative, whereby the movement of the body helps in clarifying meaning by supplementing features of the structure of language (op. cit., p. 11). Body Moves show just this, and contribute further to the idea that the structure of language lies in its performance. The theory of Body Moves is intrinsically about meta-communication.

Body Moves and the tacit dimension


In performing Body Moves we engage with the representations of the tacit dimension of another's actions and move with them, for example, in a design activity, or to form a shared identity. Action is the performance, whilst its tacit dimension is its basis, which is sensed, grasped, and responded to. The representation of the tacit dimension of action is the structure of the form of its expression. In engaging with the representation of the tacit dimension of another's actions, we are resonating with the communicative structures being performed. In order to be able to do so, we both draw upon experience and are experiencing in the same moment.
It may help here to turn to Polanyi's discussion of the body and tacit knowing (Polanyi, 1966), particularly in learning tasks that involve the body, such as playing chess, dance, etc. He describes a skilled human performance as a comprehensive entity, and when two people share the knowledge of the same comprehensive entity, two kinds of indwelling meet. The performer co-ordinates his moves by dwelling in them as parts of his body, while the watcher tries to correlate these moves by seeking to dwell in them from outside. He dwells in these moves by interiorising them. By such exploratory indwelling the pupil gets the feel of a master's skill and may learn to rival him (Polanyi, op. cit., p. 30). Gordon and Gina are undertaking such exploratory indwelling in discerning colours through movement and their senses, in order to share the (experiential) knowledge of the same comprehensive entity (e.g., the colour black, or aesthetic judgment). For Polanyi, the body is the ultimate instrument of all external knowledge, and wherever some process in our body gives rise to consciousness in us, our tacit knowing of the process will make sense of it in terms of an experience to which we are attending. In performing Body Moves, the ability to grasp and sense someone's motions, and respond to them appropriately (skillfully), is based on experience (tacit knowing of the process) and experiencing (experience to which we are attending). It is spontaneous action.
Further, tacit knowing has two terms: the proximal, which includes the particulars, and the distal, their comprehensive meaning. A simple example is a wood full of trees: the wood is the distal term, and the trees are the proximal term, consisting of particulars. When we look at a wood, we are aware of its trees but do not look at each specific tree in order to understand that this is a wood in front of us. How do we achieve this understanding of the entity, the wood? Polanyi develops a theory about the processes of such understanding or comprehensive meaning, called tacit knowing. We draw upon this to consider how we resonate with structures in communication.
Polanyi has a specific meaning in using the word comprehension. A comprehensive entity has a number of levels that form a hierarchy. These levels are structures of sets of particulars that constitute the entity. Each level (set of particulars) has its own principles that operate under the control of the next level up. For example, in performing a speech, the voice you produce is shaped into words by a vocabulary. However, the operation of higher levels cannot be accounted for by the laws governing the particulars that form the lower levels. For example, you cannot derive a vocabulary from phonetics. The two terms of tacit knowing, proximal and distal, are described as being two levels of reality, and between them there is a logical relation which corresponds to the fact that the two levels are the two terms of an act of tacit knowing which jointly comprehends them.
There are two important facts here. Firstly, in resonating with the particulars of each level in the communication structure, we do so by being aware of the particulars of the entity, such as a gesture, for attending to it. This is distinct from saying that we resonate with the particulars, the elements of the gesture, by attending to them. If we did so, this function of the particulars, i.e., enabling us to attend to the entity, would be cancelled and we would lose sight of the gesture itself, because all we would see is fragmented elements. A popular example of this in discussions about skill is playing the piano. If in the middle of performing a piece of music we suddenly began to focus on the movements of each of our fingers, we would have difficulty in being able to play. Tacit knowing is about achieving the performance of playing the piano such that the finger movements and the piano keys are invisible to us, as an extension of our selves.¹ In a similar sense, our ability to grasp communication cues and the content that they frame is invisible to us until we feel uncomfortable in the communication situation, at which point we become aware of its particulars. Understanding how we have this ability, through tacit knowing, would provide a further insight into the nature of experiencing and the connecting of self with other self(ves).
Body Moves are composites of the gesture, speech, and silence of the participants together, not of the individuals (as in the idea of a composite signal (Clark, 1996; Engle, 1998)). This is an important distinction. Our skill in communication, as an individual, is contingent on our skill in performing with another self and needs to be understood as such. In other words, the understanding of the representations of the tacit dimension of another's action is expressed in the skilled performance with the other, be this to agree, disagree, negotiate, acknowledge, or simply to act at the same moment with the other (simultaneously). The last kind of performance, to act at the same moment, is of a different nature from the others. It is a parallel and coordinated action, whereas the others are sequential actions that take the form of action and response moves. This distinction becomes important in the discussion and analysis that follows. Body Moves necessarily involve at least two people sharing the knowledge of the same comprehensive entity, namely, of their joint skilled human performance. These comprehensive entities include, apart from our own performance, both the performance of other persons and these persons themselves (Polanyi, op. cit., p. 49).
In this paper, we distinguish between two variations of tacit knowing that arise from differing conditions of time and space relations between persons moving together. The Body Moves that have been identified so far have two information and knowledge functions. Sequential Body Moves (SBM) carry and maintain the flow of information in interaction (Gill et al., 2000) through action-reaction responses. Parallel Coordinated Moves (PCM), however, facilitate the tacit transformation of this flow, termed knowledge transformation (Gill, 2001). In The Tacit Dimension, Polanyi described a relation between emergence and comprehension as existing when an action creates new comprehensive entities. Parallel Coordinated Moves are multi-activity gestural coordinations, where different but related projects are being expressed in the body actions of the participants at the same time. This fusion provides the conditions for tacit transformation in a new plane of understanding from the prior sequential interactions, and as a result they create new comprehensive entities, expressed in rhythm, body and speech. The collaborative features of these moves enable the participants to negotiate and engage in the formation of a common ground (Gill, 2002).

Tacit and explicit knowing


We know that if the representations of the tacit lie outside one's experience, then they become what some have termed propositional knowledge (cf. Josefson, 1987; Gill, S. P., 1988, 1995). This is either meaningless for the participant or cannot be interpreted or used by him/her in accordance with the background of understanding and practices against which it has been expressed (as in the case of Gordon). Cooley's (1996), Rosenbrock's (1996), Gill, K. S.'s (1996) and Gill, S. P.'s (1995, 1996) work on tacit and explicit knowledge shows that their relationship to each other is one of continuous emergence, where one aspect builds on and is shaped by the other (see Figure 1, reproduced from Gill, S. P., 1995, p. 15). In saying explicit, they mean a range of things that share the common feature of abstraction, emphasizing data and information, e.g., the formalisation of ideas, organizational rules, text-book type information, and so on. The idea of explicit can also be applied to any form of expression, such as a word, "black", that has a specific meaning located in an experience. As we become skilled in practices around, say, the word "black", we will be able both to comprehend this and other forms of expression, gaining a tacit knowing of them, and to create new forms of expression based on our experiences. These are new entities that are comprehensive for those who create them, and they need to be comprehended by others.

[Figure 1 diagram. Labels: Tacit Knowledge, Explicit Knowledge.]
Figure 1. The expansion of knowledge leads to a reciprocal expansion of tacit knowledge required for using the new explicit knowledge.
[Reproduced from Gill, S. P. Dialogue and Tacit Knowledge for Knowledge Transfer, PhD Dissertation (1995), p. 15.]

Co-presence
I asked John whether, if it were possible for his team and Gordon to colour maps together in a distributed setting with the help of some hypothetical computer-mediated technology, he would be interested in exploring this possibility. John declared that this was not a matter for technology, but quite simply that Gordon lacked experience and that the only way he would acquire it was by colouring with them in the same space. His conviction made me reflect on what it means to share a space and be present, as a precondition to acquiring experience; experience that would have helped Gordon to interpret the examples of previously coloured maps for similar bids, colour keys, and instructions that had all been sent to aid him in understanding how to colour the maps.
Being present is a bodily experience, and involves all the human senses. In various cultures we draw upon various levels of our senses. For instance, the Maori rub noses in greeting each other, Russians kiss on the mouth, and in some Arab cultures, people bring their faces close enough to smell the breath of the other. All these acts are part of gauging one person's sense of another, essential to building the trust that is required for committed engagement. Placing a glass pane between two people in any of these situations would block their tacit ability to interpret their relation to each other, and thereby comprehend each other's meaning, through the impacts between their bodies, and would require them to focus on the visual and speech channels, which have limited bandwidth for tacit knowing.
John was certain that once Gordon had this experience of colouring with him in the same physical space, he would have no trouble in the future in aligning his aesthetic seeing-as (Tilghman, 1988) with theirs when given such materials or representations (exemplars) to interpret and see, wherever he might be. Seeing-as requires interpretation, and Tilghman terms this mediated understanding. Once you have the skill to see, you can understand without interpretation, and just perform. The tacit knowing that Gordon had acquired would be retrieved and made active by sensing (Reiner, 2003) in his act of seeing. The role of mind and imagination is important for such retrieval in sensing, which brings together past (memory), present, and future. In such sensing, our minds draw upon our bodies: wherever some process in our body gives rise to consciousness in us, our tacit knowing of the process will make sense of it in terms of an experience to which we are attending (Polanyi, op. cit., p. 15).
Each situation we are in is also unique, and in engaging with each other we recognize ourselves as persons in the performance. Tilghman (1988) gives a wonderful example of this in Charles Le Brun's series of faces. Le Brun painted these in the seventeenth century to illustrate the various emotions that painters could be asked to represent. What is striking is that any number of them could be substituted for another without loss. What is missing is any setting or context to make the emotion determinate (op. cit., p. 312). Persons and contexts together constitute the formation of memories through movements in co-presence that Gordon would draw upon to colour any future maps.

Body Moves and grounding


How we come to ground our communication with each other is core to understanding what collaboration is about (Clark, 1996). Much work on grounding has given us a deep insight into how the sequences in our speech and

Body Moves and tacit knowing

251

body communication are nely tuned in time, and how incrementally they serve to build common ground in the specic communication situation. Body Moves expands upon the sequence structure to the study of parallel structures, and involves the relationship between these two. Parallel structures create, and operate, within a dierent time-space connectivity than sequential structures. Sequences emphasise time whilst in parallels time and space are both emphasised. In Body Moves, the grounding process is considered within the communicative frame of the engagement space. This is a communication space where communicative orientations (e.g., metacommunication) lie in a matrix relation to spatial orientations. Within the engagement space, dynamics of interaction is considered in terms of both action and meaning. The process of moving from the state of information ow (Sequential Body Moves) to knowledge transformation (possible Parallel Coordinated Moves) is seen as a process of grounding. This grounding occurs along the integration of the two axes of the tacit and explicit dimensions of human knowing (Jerry Gill, 1995), the awareness and the activity axes, that in their integration form a third axis, of cognitivity [see Figure 2]. These axes allow us to move from tacit to explicit knowing whilst retaining the subsidiary awareness and bodily understanding of the tacit dimension of that explicit knowing. Here we need to briey backtrack. In the discussion on the tacit dimension, we spoke of the relationship between the tacit and explicit dimensions of knowing. Continuing with Polanyis conception, tacit knowing has two terms, proximal and distal, that correspond to two levels of reality; the lower one enabling us to attend to higher one, from particulars to the entity. The explicit dimension of knowing is the ability to refer to the entity. 
It lies in language, whether this be an exemplar of the tacit, such as a coloured map, or a coding of the tacit, such as a colour key, or a description of the tacit in rules, such as a set of instructions. The relation between the tacit and the explicit dimensions of knowing is the multi-axis interface that makes communication possible, and that enables us to say 'black' and comprehend 'black' at the same moment with someone else. Within knowledge transformation, aspects of both the explicit and the tacit dimensions that are particular to the situation of that moment are located in the past, present and future simultaneously. To reflect further on this interface, we explore the idea of skill (tacit knowing) in interaction. It involves understanding how and when to move between the individual (self) and the group (self with other self(s)) such that one is successful in this performance.


Figure 2. Cognitivity: the axes of mediating and grounding. (Diagram labels: Conceptual, Explicit Knowing, Cognitivity, Activity, Focal, Awareness, Bodily, Tacit Knowing.)

[Reproduced from Gill, J. H., The Tacit Mode (1995), p. 39, Figure 2.1.]

Take Reiner's example of two basketball players beautifully timing and coordinating their joint act of tossing and catching a basketball (Reiner, 2003). They perform as if their bodies know how to precisely time actions, assess future movements and impart correct velocities without formalism. In using this example, Reiner is questioning the idea that cognition consists in symbolic processing. 'A robotic player would have to assess the velocity and distance of the ball, assess its own velocity, calculate the time until the ball and the player are in the same point in space, calculate the velocity needed for his hand to catch the ball, change the position of his hand accordingly. Simultaneously watch other near-by players, predict their intentions and capabilities, and then plan the appropriate velocities, momentum and forces, all with the tough, precise, timing constraints' (p. 5). This example captures how the integration of the awareness and activity axes gives rise to the cognitivity to move with accuracy and judge with accuracy, whilst relying on the subsidiary and bodily dimensions of tacit knowing to bound this precision.

Skilled cooperative action, be this in basketball, colouring maps, or discerning a particular blackness of black, involves the participants in the communicative situation being able to understand it and to know how and when to respond appropriately for the purpose(s) at hand. This has been described as the performance of knowledge in co-action (Gill and Borchers, 2003) and is a form of intelligence for sustainable interaction. We are not conscious of it and it is invisible to us, as an extension of our self, until there is a problem that causes us to become aware of it. Skilled communication, or being a skilled performer of knowledge, in a team, involves this ability to move effectively between one's self and one's self with another self, from 'seeing-as' to 'seeing'. In Body Moves, this skill is performed in grasping the cues or representations of the tacit dimension of each other's action, as bodies move from sequential interaction to parallel coordinated actions and back again. Coordinated structures of these behaviours can be extended and transformed through technology to form interactional spaces with more dimensions.

Co-ordinated autonomy
The movement from sequence to parallel action involves the management of coordinated autonomy, i.e., the management of self and self-with-other that is essential to sustain the communication and engagement within collaborative activities. Coordinated autonomy is one dimension of being co-present, and it is culturally determined.2 In parallel coordinated moves, coordinated autonomy has a special quality of awareness. Parallel action itself is not a sufficient condition, as one could be autonomous without awareness of the other. As we will show later in this paper, autonomy without attending to the other can be disruptive to collaborative joint activities. Action that is parallel and coordinated, however, involves each person being aware of the other simultaneously. Such quality of awareness is important for tacit knowing.

The Parallel Coordinated Move


The Parallel Coordinated Move (PCM) was first identified whilst analyzing a five-minute video excerpt of landscape architects working on a conceptual design plan. It occurred only once and lasted 1.5 seconds. It was the first time in that session that the disagreement between two architects was able to find a resolution, and it involved both their bodies acting on the surface at the same time, even whilst presenting alternative design plans. One of them was silent and the other speaking. It enabled the grounding in the communication to come into being (Gill, 2002; Gill, 2003b)3 by enabling an open space for the negotiation of differences. The opening and closure of the PCM is by action-response Body Moves. For example, the focus move involves a movement of the body towards the area the speaker or actor is attending to, i.e., the space of bodily attention, and in response causes the listener or other party to move his or her body towards the same focus.

In order to understand the PCM further, and to gather more examples for analysis, a number of configurations for a similar task were set up to collect further video data. Part of this study is reported here.4 One task set is for dyads of students to design shared dorm living spaces. We also collected data and made a preliminary analysis of group activity where students are using multiple large-scale surfaces, i.e., SMARTBoards.5 These are electronic whiteboards. The PCM is explored as a category of Body Move that has its own set of variable configurations, where the basic common defining feature is that participants act at the same time. The examples drawn upon are of actions taking place upon various surfaces, and the contexts within which these occur. Such actions, for example, can be to indicate ideas or proposals with a pen or finger or hand. There is also a consideration of those cases where only one participant has physical contact with the surface, in order to glean some understanding of what the function of touching is, and to reflect on that back to the case of parallel action. We have sought to capture the management of both the body and speech spaces within a task where you need to produce something together and agree upon it.

The study
The experiment6 involved two drawing surfaces, used by different sets of subjects: a whiteboard and a SMARTBoard. The SMARTBoard is a large-scale, touch-sensitive, computer-based graphical user interface (an electronic whiteboard). SMART technology does not permit two people to touch the screen at the same time, i.e., it does not allow for parallel action at the surface of the task being undertaken, and therefore provides a useful body of data with which to analyse how such action affords collaborative activity. The contrast between drawing at the SMARTBoard and at the whiteboard was expected to reveal whether or not there are particular differences in body moves and in gesture and speech coordination at these interfaces.


The experiment
Two cameras were positioned to capture side views of subjects and one camera to capture a view from behind. Microphones were attached to the subjects. Eleven subjects (3 female, 8 male students) were recruited. Three (seniors) had a general familiarity with the iRoom, while the other 8 (juniors) had a rudimentary concept. All subjects were briefed on the room's functionality, including an explanation of how to use the SMARTBoards and the related tools. Specifically, the subjects were shown how to draw on the SMARTBoard using the coloured pens, how to erase using the eraser, and how to use the computer-based components including a wireless keyboard and mouse. It is important to note that the subjects were not directly informed that the SMARTBoard can only receive one source of input at a time. There were seven sessions: four at the SMARTBoard, three at the whiteboard. This activity takes place in the iRoom, which is the laboratory of the Stanford Interactive Workspaces project7 (Guimbretiere, Stone, and Winograd, 2001).

Our observations8 reveal that the participants' commitment, politeness, and attention to each other become reduced at a single SMARTBoard, showing behaviours that are in marked contrast to those of users at a whiteboard. Furthermore, the quality of the resulting design is lower when using the SMARTBoard.9 Acting in parallel, e.g., drawing on the surface at the same time, involves a degree of autonomy. We have observed patterns of movement from sequential (e.g., turn-taking) to parallel actions, as part of this design activity, and suggest that coordinated autonomous action is part of sustainable collaborative activity.
In a related study, when a group of four users has three SMARTBoards available to them, there appears to be a transposition of the patterns of autonomy and cooperation that one finds between a dyad working on a whiteboard (Borchers, Gill, and To, 2002; Gill and Borchers, 2003), although the body moves and parallel actions that constitute them take a very different form. At the SMARTBoard, when one has to wait one's turn to act at the surface, it may be that (a) it takes longer to build up the experiential knowledge of that surface than if one could move onto it when one needs to, and (b) there is a time lag for the other person working with you to experience with you, in a manner of speaking, your experience of the situation, i.e., there is an awareness lag. As we know, awareness is important for tacit knowing. The former and the latter difficulties are, we suggest, linked because of this experiential dimension of tacit or implicit knowledge. With multiple boards in parallel use, awareness of the experience, i.e., of one person of other persons, seems more fluid than that of a dyad at the one board, evident in the movements around the boards to gather and disperse where rhythms in behavioural alignments were halted.

The study suggests that mediating interfaces that are to support collaborative human activities involving sustainable and committed engagement of self and interpersonal self (self with other self(ves)) (Gill and Borchers, 2003a) need to be able to support parallel coordinated activity. One aspect of this engagement is contact. Contact is an important dimension of Body Moves. It indicates the nature and degree of commitment of persons to each other. The configuration of the body's physical space influences the strategies to gauge and manage contact. In the following examples we consider (a) parallel coordinated moves, and (b) how coordinated autonomy is managed as a communicative strategy.

Example of a Parallel Coordinated Move


When a designer is making contact with the surface to act upon it, whilst the other person is doing so too, there is an attempt to engage with the body field of the other person, as in the case of the landscape architects. It also happens in the example below (Figure 3). The designer on the right side, closest to us (E), enters the body field of the other one (F) who is currently drawing (action), and uses his index finger to trace out a shape to indicate a bed. He is proposing this idea to (F), who is drawing, to get his opinion (negotiation). Both action and negotiation are operating at the surface. The body field of the person drawing (F) is not disturbed, and as we know from the discussion of the engagement space, this indicates a high degree of contact and is identifiable as a Parallel Coordinated Move (PCM).

Transcription coding scheme

In the example presented below we will use the following conventions to encode the Body Moves (BM) and Communicative Acts (CA).

{}           body movements with speech
[]           body movements as turns (i.e., no speech)
|            indicates the point at which body actions start
(1,2,3,4,5)  tag reference to specific moment of body move in the pictures (1), (2), (3), (4), (5)

E: Do you want us to do like a bed        E: CA:Suggest (1)
{E moves in to the whiteboard, index finger point touches the surface; F at the surface about to draw}.        (1)
E: and then that then
{E's finger traces the outline of a bed        E: PCM (1-2)
{F is drawing in a line towards the left, his head nods}        F: PCM (1-2)
E: one here        E: CA:Suggest (3)
{E moves his hand down and taps the surface, of the space that he has just traced, with the back of his hand};        E: BM:Dem-Ref (3)
{F traces the outline of the bed in the air, moving his hand straight to the left and back and then down}.        F: BM:B-Check (3-4)
E: and then | one here        E: CA:Suggest (4-5)
{E lifts his hand up and taps the board again with the back of his hand; at | F moves his hand back to original position        E: BM:Dem-Ref (4-5)
Silence
[E lifts hand off and away from the surface, as F is about to touch it with his pen]
F: ye        F: CA:Ack
{F puts pen back on paper};        (5)
[E's body begins moving back]
E: and maybe do like a dresser between them
{E's body moves back to rest-reflection position; F is drawing the beds}

Figure 3. A Parallel Coordinated Move. [Pictures (1) to (5).]

(F) acknowledges (E)'s proposal by tracing the proposed idea above the surface of the board (Figure 3, pictures 3 and 4) with his pen, whilst (E) taps a position of one bed with the back of his hand on the surface to locate it. Through gesture (B-Check), (F) checks the proposal that (E) is making through gesture (Dem-Ref) and speech (Suggest). After tracing, (F) continues to draw, and his pen touches the surface (picture 5) at the same time as (E) begins to lift his hand away. There is no break in the fluidity of the rhythm of the coordination between them (of body and speech).

Parallel Coordinated Actions and co-ordinated autonomy


When body fields overlap, you have simultaneous coordinated autonomy within parallel coordinated moves. In the study, there are many instances of parallel actions taking place at the surfaces of the table and whiteboard, and attempts to do so at the SMARTBoard when only one such board is available.


Figure 4. Moving to act autonomously in parallel.

In this example [Figure 4 above], (E) is standing back, watching and talking, and (F) is drawing on the whiteboard. (F) has his body positioned to accommodate (E) by slightly opening it, slanted to the right, to share the engagement space with (E). At some point, (E) looks to the left of (F) to an area on the whiteboard and moves towards it [picture 2]. He picks up another felt pen and begins to draw as well [picture 3]. As (E) touches the surface, (F) shifts his body and alters his posture so that it is now open, slanted to the left, and increases contact with (E). Both are now acting in parallel. This shift occurs in silence.

At the SMARTBoard [Figure 5 below], (C) is standing back whilst (D) is drawing. He looks and moves to a position to the right of (D), on the SMARTBoard. He leans in to the surface but cannot draw because he has first to wait for (D) to end his turn. (D), without looking up, speaks, and his utterance causes (C) to turn his body back to look at him. As he cannot yet act, (C) moves back from the surface and waits, and as he is doing so, he breathes in deeply in frustration. (D) notices him, pauses his drawing, turns to look at him and moves back from the zone of action,10 allowing (C) to move into it [Figure 6, picture 3]. Once (C) is acting, i.e., drawing, (D) continues with his drawing on the SMARTBoard [Figure 7]. The result is a disturbance on the board, and a jagged line cuts across from (D)'s touch point to (C)'s, causing them both surprise and laughter [Figure 7, picture 2]. (D) momentarily forgot that you cannot touch the surface at the same time. The need to act whilst another is acting is not a conscious one. This autonomy in co-action seems to be part of the coordinated collaborative process but at a metacommunicative level.

Figure 5. Waiting to act.

Figure 6. Giving the turn.

Figure 7. Problem in drawing on the SMARTBoard together.

In this example [Figures 5-7], we see that (C)'s attempt to act is frustrated until his need to act is noticed, at which point the turn to act is offered to (C) by (D). It is significant that they recognise each other's need to act, signal this need (moving body away, distancing) and respond to it (speech and body), and further, that they forget the limitations of the surface to afford them this need. In contrast, the whiteboard permitted [Figure 4] a more fluid movement around the surface, as there was no enforced pause by the surface, and no turn-taking required on one designer's part to permit the other person to act. These examples are of parallel coordinated actions that involve autonomy, where autonomy occurring in simultaneity involves awareness of and attendance to the state of engagement in the space between participants and the surface(s).

When a designer at the SMARTBoard does not easily give the turn to the other one, we observe various strategies to force it. These include moving close to the board and inside the visual locus of the drawing space in a quick motion, or moving back and forth, or reaching for a pen, or looking at the pen, or simply reaching out and asking for the pen the other person is currently using, or just moving right in front of the body of the person currently drawing, thereby forcing them back, and taking a pen from the pen holder. As either person can act at the whiteboard, there is no need for such strategies.


In contrast to the SMARTBoard, at the whiteboard autonomous performance by one person that is not occurring in co-action can bring a reaction to regain co-action. In an example below [Figure 8, pictures (1-6)], (E) looks up and stands to draw something higher up on the board, just after (F) has knelt down to draw beside him. (E) altered his position such that the contact within the engagement space became too low for (F) to be aligned with him in order to act.

Figure 8. Attempt, disturbance, and regaining of parallel and coordinated action.

(F) attempts to regain contact so that he can work with (E), first by speech [picture 3] and, when that fails, by using Body Moves to attempt contact [picture 4] and focus [picture 5].11

Discussion
The SMARTBoard makes visible those actions that are invisible, or are extensions of ourselves, when acting at a whiteboard or a drawing table. The structure in communication becomes visible only when there is some kind of breakdown (Winograd and Flores, 1986, p. 68). Visibility is problematic for tacit knowing as it inhibits action and awareness of the focal. We see this structure in communication in the acts of leaning one's hands in the drawing space, rubbing something out whilst another is drawing, checking something by pointing at it, or touching the surface with a finger or hand to think about an idea, etc. These are part of our capacity to connect our selves with the external world and form meaning. When these actions are inhibited or have to be negotiated, the fluidity of sharing an engagement space in an interactive drawing task becomes altered by the kinds of communication strategies available to persons to collaborate.

We have analysed three basic elements of collaboration and cooperation in joint activities: the skill to grasp and respond to the representations of the tacit dimension of our actions (e.g., in Body Moves, gestures, sounds); the ability to coordinate this grasping and responding in a rhythmic synchrony of sequential and parallel coordinated actions; and coordinated autonomy that occurs within parallel coordinated movements and involves awareness of and attendance to the state of engagement in the space between us and interfaces. We have found that simultaneous synchrony in co-action, such as drawing, or being able to touch the surface together, provides for a certain kind of awareness of states of contact within an engagement space. We have established that this kind of awareness of the other is essential for tacit knowing and the emergence of tacit knowing, hence parallel coordinated actions are conditions for it. Furthermore, the multi-dimensional expression of ideas in combinations of activity, using a pen, hand or finger to sketch ideas, allows them to be located in one's self and made clear for the other person, whereby contact with each other's ideas can be made with the body through motion and physical contact at the surface, as well as through speech.

The analysis of parallel coordinated movements shows the importance of coordinated autonomous behaviour for sustainable collaborative activity, as it facilitates negotiation and cooperative behaviour. Without it, the designers use disturbance strategies to achieve autonomous action. Autonomy is a function of being able to act together at the same time. Coordinated simultaneous autonomy is a function of being able to act and attend to each other at the same time. In the merging of the differences, each self maintains his/her identity, yet is able to be aware of and be with the other without disturbing each other's action. The moment of a parallel coordinated move arrives after a set of sequential Body Moves that build up the knowledge base for grounding. In this mergence of action, tacit knowing can be achieved, and can give rise to new comprehensive entities or new ideas that embody the tacit knowing of both selves. It is in this mergence that knowledge transformation can take place.
A challenge for designing mediating interfaces is that they afford us our human skills of engaging with each other and forming tacit knowing. One result of the study of the SMARTBoards is the design of more contact affordances, e.g., software to permit the simultaneous operation of multiple functions at the surface. From the study of the landscape architects we conclude that the solution to gaining experience lies in the integration of the axes of tacit and explicit knowing, and this makes for our ability to experience a representation of practice and understand it for the purposes at hand. From the analysis presented here, it is clear that an interface, for collaborative activity, that can manage this integration needs to handle both sequential and parallel coordinated Body Moves and the conveyance of the persons.

The messy details of any situated activity (Goodwin, 1997) make up our capacity for perceptual discernment, and this involves the various modalities of our human senses. Body Moves, as multi-modal communicative acts, lie within the rhythmic pulse of interactive space and time that carries these messy details of a setting or context or the spirit of a person, and in so doing, they mediate them in their performance, enabling tacit knowing. The processes of integration for tacit and explicit knowing give us an insight into the nature of cognitivity, and a theoretical frame from which to explore the relation between embodied mind, technology and environment, rooted in the pragmatics of knowing. This bears directly upon the pragmatics of Cognitive Technology, which seeks to understand the effects of technology on users and their environment, and the conditions under which users can cognize their technology in order to realize the effects of their technological efforts (Mey, 1995). The extension of self (the invisible interface) operates through awareness and activity in order to achieve that state of cognitivity. It is a frame of cognition within which to reflect on CT's question about the effects of the transparent tool (that you use without noticing that you are using it) (Mey 1988; Norman 1999) without the mind-body split.

In summary, the discussion develops a conceptual frame for cognitivity and collaborative action that can inform the analysis of human-technology symbiosis and the design of mediating interfaces. The idea of the mediating interface supports the argument that the locus of control with respect to the language of expression at the interface ought to be placed in the user's mind and not in the machine (Gorayska and Cox, 1992), where mind is seen in terms of cognitivity. The discussion has built on the concepts of invisibility, parallel coordinated moves, tacit and explicit knowing, and coordinated autonomy.

Notes
* Thanks and acknowledgements to Jan Borchers for collaborating with me on the HCI study of large surfaces, and to Terry Winograd for his support of this work in the iSpaces Project at Stanford. Thanks to Timo Saari and Seija Kulkki for their support of this work, undertaken whilst the author was with CKIR (Centre for Knowledge and Innovation Research). Thanks also to Masahito Kawamori for his work with me on the fundamental coding of Body Moves whilst at NTT. Lastly, thanks to Jacob Mey for his support of Body Moves as part of Pragmatic Theory.

1. The idea of invisibility and extension of self is drawn from Polanyi's work on Personal Knowledge (1964).

2. The relationship between coordinated autonomy and culture is being developed by the author in a forthcoming paper.

3. See Gill (2002) for a deeper analysis of the PCM.

4. For a more in-depth and wide-ranging analysis, see Borchers, Gill, and To (2002), and Gill and Borchers (2003a).

5. Borchers, J. O., Gill, S. P., To, T. (2002).

6. Gill, S. P., Sethi, R., Martin, S. (2001).

7. http://graphics.stanford.edu/projects/iwork/

8. For a preliminary discussion of this research, see: Borchers, J. O., Gill, S. P., and To, T. (2002). Multiple Large-Scale Displays for Collocated Team Work: Study and Recommendations, Technical Report, Stanford University. For a more fully developed analysis, see Gill, S. P., and Borchers, J. O. (2003a).

9. This could in part be due to the awkwardness of the interface for producing smooth drawings.

10. In Gill and Borchers (2003a) we have developed the idea of zones of interaction, namely reflection, action, and negotiation, and we describe how these are managed and carried by Body Moves within the Engagement Space. In other words, these are not fixed spatial locations. This is being developed further in another paper by Gill.

11. Gill et al., 2000.

References
Allwood, J., J. Nivre & E. Ahlsen (1991). On the Semantics and Pragmatics of Linguistic Feedback. Gothenburg Papers in Theoretical Linguistics 64, 1-39.

Bateson, G. (1955). The Message. This is the Play. In B. Schaffner (Ed.), Group Processes. Vol. II. New York: Macy.

Birdwhistell, R. L. (1970). Kinesics and Context. University of Pennsylvania.

Borchers, J., S. Gill & T. To (2002). Multiple Large-Scale Displays for Collocated Team Work: Study and Recommendations. Technical Report. Stanford University.

Clark, H. H. & E. F. Schaefer (1989). Contributing to discourse. Cognitive Science 13, 259-294.

Clark, H. H. (1996). Using Language. Cambridge: Cambridge University Press.

Cooley, M. (1996). On Human-Machine Symbiosis. In K. S. Gill (Ed.), Human Machine Symbiosis: The Foundations of Human-centred Systems Design, pp. 69-100. London: Springer.

Engel, R. (1998). Not Channels but composite signals: speech, gesture, diagrams and object demonstrations are integrated in multi-modal explanations. In M. A. Gernsbacher & S. J. Derry (Eds.), Proceedings of the Twentieth Annual Conference of the Cognitive Science Society, pp. 321-327. Mahwah, N. J.: Erlbaum.

Gill, J. H. (2000). The Tacit Mode. Michael Polanyi's Postmodern Philosophy. New York: SUNY Press.

Gill, K. S. (1996). The Foundations of Human-Centred Systems. In K. S. Gill (Ed.), Human Machine Symbiosis: The Foundations of Human-centred Systems Design, pp. 1-68. London: Springer.

Gill, S. P. (1995). Dialogue and Tacit Knowledge for Knowledge Transfer. PhD Dissertation, University of Cambridge.

Gill, S. P. (1996). Designing for Knowledge Transfer. In K. S. Gill (Ed.), Human Machine Symbiosis: The Foundations of Human-centred Systems Design, pp. 313-360. London: Springer.

Gill, S. P. (1997). Aesthetic Design: Dialogue and Learning. A Case Study of Landscape Architecture. AI & Society 9, 273-285.

Gill, S. P. (2002). The Parallel Coordinated Move: Case of a Conceptual Drawing Task. Published Working Paper: CKIR, Helsinki. ISBN 9517916604.

Gill, S. P. & J. Borchers (2003). Knowledge in Co-Action: Social Intelligence in Collaborative Design Activity. AI & Society 17(3), 322-339. An adaptation of a conference paper presented at Social Intelligence Design 2003, Royal Holloway, London.

Gill, S. P., R. Sethi & S. Martin (2001). The Engagement Space and Gestural Coordination. In C. Cave, I. Guaitella & S. Santi (Eds.), Oralite et Gestualite: interactions et comportements multimodaux dans la communication. (Proceedings of ORAGE 2001, International Conference on Speech and Gesture), pp. 228-231. Aix-en-Provence, France.

Gill, S. P., M. Kawamori, Y. Katagiri & A. Shimojima (2000). The Role of Body Moves in Dialogue. RASK 12, 89-114.

Good, D. A. (1996). Pragmatics and Presence. AI & Society 10 (3&4), 309-14.

Goodwin, C. (1997). The Blackness of Black: Colour Categories as Situated Practice. In B. Lauren Resnick, R. Saljo, C. Pontecorvo & B. Burge (Eds.), Discourse, Tools and Reasoning: Essays on Situated Cognition, pp. 111-140. Berlin, Heidelberg, New York: Springer.

Goodwin, C. (in press). Pointing as Situated Practice. To appear in S. Kita (Ed.), Pointing: Where Language, Culture and Cognition Meet. Hillsdale: Erlbaum.

Gorayska, B. & K. Cox (1992). Expert systems as extensions of the human mind. AI & Society 6, 245-262.

Guimbretiere, F., M. Stone & T. Winograd (2001). Stick it on the Wall: A Metaphor for Interaction with Large Displays. Submitted to Computer Graphics (SIGGRAPH 2001 Proceedings).

Josefson, I. (1987). The nurse as an engineer. AI & Society 1, 115-126.

Kawamori, M., T. Kawabata & A. Shimazu (1998). Discourse Markers in Spontaneous Dialogue: A corpus based study of Japanese and English. Proceedings of 17th International Conference on Computational Linguistics (COLING-ACL98).

Kendon, A. (1970). Movement Coordination in Social Interaction: Some examples described. Acta Psychologica 32, 100-125.

Mey, J. (1995). Cognitive Technology - Technological Cognition. Proceedings of the First International Cognitive Technology Conference, August 1995, Hong Kong. Reprinted in AI & Society (1996) 10, 226-232.

Mey, J. (1998). Adaptability. In: Concise Encyclopedia of Pragmatics, pp. 5-7. Oxford: Elsevier Science.

Mey, J. (2001). Pragmatics. An Introduction. Oxford: Blackwell.

Norman, D. (1999). The invisible computer. Cambridge, Mass.: MIT Press.

Polanyi, M. (1964). Personal Knowledge: Towards a post critical philosophy. New York: Harper and Row.

Polanyi, M. (1966). The Tacit Dimension. Doubleday. Reprinted version, 1983, Gloucester, Mass.: Peter Smith.

Reiner, M. & J. Gilbert (in press). The Symbiotic Roles of Empirical Experimentation and Thought Experimentation in the Learning of Physics. International Journal of Science Education.

Rosenbrock, H. H. (1988). Engineering as an Art. AI & Society 2, 315-320. London: Springer.

Scheflen, A. E. (1974). How Behaviour Means. Exploring the contexts of speech and meaning: Kinesics, posture, interaction, setting, and culture. New York: Anchor Press/Doubleday.

Schiffrin, D. (1987). Discourse Markers. (Studies in Interactional Sociolinguistics, 5). Cambridge: Cambridge University Press.

Tilghman, B. R. (1988). Seeing and Seeing-As. AI & Society 2(4), 303-319.

Winograd, T. & C. F. Flores (1986). Understanding Computers and Cognition: A new foundation for design. Norwood, N. J.: Ablex Press.

Gaze aversion and the primacy of emotional dysfunction in autism


Sarah Bowman, Lisa Hinkley, Jim Barnes and Roger Lindsay
Department of Psychology, Oxford Brookes University

Introduction: Autism and Cognitive Technology


Autism is relevant to cognitive technology in two important ways. Firstly, there is now available a wide range of computer-based assistive technology (Dautenhahn, Werry, Salter and te Boekhorst, 2003; Moore and Calvert, 2000; Alcade, Navarro, Marchena and Ruiz, 1998; Huttinger, 1996; Chen and Bernard-Opitz, 1993). Computer-based technologies appear to be well suited to the cognitive limitations of autistic individuals because they do not require social interaction; they are rule-governed, predictable and controllable; they incorporate very clear-cut boundary conditions; they are naturally monotropic (one topic at a time); they are able to match the individual's attention tunnel; they are context-free; error-making is safe; and there are options for non-verbal or verbal expression. Secondly, there is some ground for hope that understanding the cognitive deficits associated with autism will enhance our ability to emulate the natural software (or mindware: Clark, 2000) underlying human social cognition. This would allow computers to be programmed to behave more like human agents, hence facilitating human-computer interaction, and would also feed back into assistive technology, enabling autism sufferers to be better assisted and allowing assistive devices to be designed in a more principled manner. Though the second aspect of the relationship between autism and cognitive technology is the least developed and, perhaps, the most exciting, there appears to be a serious obstacle to progress in this area. This derives from the possibility that the deficits associated with autism are primarily cognitive. As there is increasing evidence that autism has a substantial genetic component, the implication of cognitive primacy is that the cognitive processes that are dysfunctional in autism have a genetic


basis. Contemporary theories of autism (see section below) have almost exclusively taken this view, arguing both that cognitive deficits are primary and that these result from neurological deficiencies in an innate theory of mind module, an innate Central Executive system, an innate global processing system or an innate mirror neuron system subserving social imitation. If autistic deficits are indeed innate and cognitive, they obviously cannot result from defective learning and hence cannot be examples of natural technology (El Ashegh and Lindsay, this volume; Meenan and Lindsay, 2002). Natural technology refers to cognitive software, or mindware, that is developed and transmitted as a cultural artefact and that is capable of (sometimes substantially) enhancing innate human capabilities. Writing and arithmetic are obvious examples. However, despite the recent emphasis on innate cognitive dysfunction as the source of autistic deficits, we argue below that current evidence does not compel this conclusion, and indeed that the preoccupation of researchers with cognitive deficits has caused them to neglect the possibility that cognitive deficits in autism result from strategies developed to deal with genetically based emotional dysfunction. The neurological systems underlying emotion are much more likely to be biologically grounded than cognitive processes. An accumulating body of evidence seems to implicate emotional rather than cognitive dysfunction as fundamental in autism, and hence there appears to be an increasingly realistic prospect that secondary cognitive deficits can be analysed as cases of natural technology (see, for example, Ramachandran, undated MS) and that new and more effective interventions can be developed on the basis of such analyses. In the present chapter, we will briefly present arguments against the most influential cognitive theories of the autistic deficit, and then with similar brevity we will summarise the evidence that points to a locus in emotional dysfunction.
We will then report an experimental study that seeks to test cognitive theories of autism against theories assigning primacy to emotional dysfunction.

The autistic syndrome


Autism is a pathology of development, probably having a genetic basis or component (Badner and Gershon, 2002; Bailey, Palferman, Heavey and Le Couteur, 1998; Bailey, Le Couteur, Gottesman, Bolton, Simonoff, Yuzda and Rutter, 1995). Some researchers have argued that autism is caused by intrauterine or environmental toxins, but the evidence for such attributions remains flimsy. Since the seminal work by Leo Kanner (1943) on Early Infantile Autism,


a cluster of related conditions has grown up around the core disorder that he identified.1 The autistic syndrome may be characterised by a triad of cognitive impairments, in social interaction, communication and imagination (Wing and Gould, 1979), or it may be seen as a point upon, or segment of, a continuum of autistic spectrum disorders which also includes conditions such as Semantic-Pragmatic Disorder, Rett's Disorder, Childhood Disintegrative Disorder, Asperger's Disorder and Pervasive Developmental Disorder (e.g., Bishop, 1989; Rutter and Schopler, 1985; Schopler, 1987; Wing, 1988; Wing and Gould, 1979). There is a wide range of autistic deficits. Sensory impairments are frequently present; attentional dysfunction is usual; a third or more of autistic individuals never develop functional speech; when speech is present, echolalia, palilalia and disturbance of intonation and other paralinguistic features are common. There is often intolerance of environmental change, focus of attention on restricted aspects of the stimulus world and stereotyped and repetitive behaviour such as hand flapping and rocking. In the social domain, it has been claimed that autistic disorders affect (1) understanding of the mental-physical distinction (the capacity to differentiate between mental and physical events); (2) understanding that the brain controls both mental and physical functions; (3) distinguishing between appearance and reality; (4) comprehending that other people may have false beliefs; (5) understanding that people hold different knowledge about situations; (6) recognising and using mental state words; (7) engaging in imaginative play; (8) understanding emotions and emotional behaviour; (9) following gaze and detecting other people's intentions, using figurative speech and employing or interpreting deception (Baron-Cohen, 1995; Baron-Cohen, Tager-Flusberg, and Cohen, 2000).

Theories of autism
Explanations of autistic dysfunction are almost as varied as the pattern of symptoms involved. Initially, faulty emotional learning was emphasised, often accompanied by the suggestion that high achieving parents provided poor models for the acquisition of non-verbal communication skills (e.g., Bettelheim, 1967). Since the work of Rimland (1964), however,2 explanations of autism have accepted the view that brain dysfunction lies at the root of autistic disorders, but have made very different proposals as to the form this dysfunction takes. Contemporary theoretical discussions about the origin and cause of autism often seem to conflate and confuse issues from quite different levels of explanation.


One key debate, quite clear in early publications, is whether autism is primarily a pattern of deficits resulting from emotional dysfunction that distorts subsequent cognitive development, or whether acquired brain damage causes cognitive malfunction that directly produces the manifest symptoms. The issue here is whether cognitive deficits are primary or secondary. A related debate focuses on how best to characterise the core cognitive deficits associated with autism. Are they attentional, perceptual, communicative or social? A third debate has tended to overshadow all others in the last decade or so, as transgenerational data on autism have begun to suggest a genetic basis for the disorder. This debate has focused on what innate cognitive modules, or functions, or systems might be disabled by the genetic malfunction responsible for the disorder. One group of theorists argues that a cognitive module responsible for modelling the mental states of others is selectively damaged; a rival body of research attributes the genetic fault to defects in a cognitive control system; while others claim that deficits result from inability to switch between local and global processing, or from lack of special purpose neurones that enable humans to imitate the behaviour of other people. As we illustrate below, the central assumption of recent research has been that some innate cognitive system or module is defective in autistic individuals, and that the function of this system can be identified by establishing the core cognitive deficits associated with autism. Though no-one doubts that emotional disorder is a prominent feature of the autistic syndrome, early suggestions that this emotional dysfunction might be fundamental have tended to become lost, as assumptions about the primacy of cognition over affect, and the search for innate cognitive modules, have become the dominant ideas shaping the course of research.
The most influential cognitive theories of the autistic deficit are Mindblindness Theory, Executive Dysfunction Theory, Weak Central Coherence Theory and Imitation Deficit/Mirror Neuron Theory. The theory of Mindblindness (also known as lack of a theory of mind) (Premack and Woodruff, 1978; Baron-Cohen, 2002, 2003) claims that autism results from an absent or dysfunctional system for modelling the mental states of other people.3 Executive Dysfunction Theory (Rumsey and Hamburger, 1988; Hughes, Russell and Robbins, 1994) claims that the autistic deficit is associated with the Central Executive (Baddeley, 1986; 1990), a hypothetical cognitive control system widely believed to underlie problem solving, planning processes, the generation of novel responses and the suppression of irrelevant or intrusive behaviours.4 Weak Central Coherence theory (Frith and Happé, 1994; Happé, 1999)5 argues that autistic individuals are locked into a preoccupation with the sensory detail


of a stimulus and unable to attend to the whole gestalt. The most recent theory of autistic dysfunction, the Imitation Deficit/Mirror Neuron Theory (IM/MNT),6 emphasizes neurological structure rather than cognitive mechanisms per se, claiming that specific neurones responsible for the human ability to imitate others (Rogers and Pennington, 1991) are absent or dysfunctional. The four theories differ in the balance they strike between cognitive processes and neurological structures; all are cognitive in the sense that they restrict themselves to cold cognition (Abelson, 1963): the mechanisms invoked are those related to attention, perception, learning, memory, knowledge and belief, and the vocabulary is that of information processing rather than that of feelings and affect. This is well illustrated by the main experimental paradigm used to demonstrate cognitive deficits in autism, a false belief task known as the Sally-Ann task. In the Sally-Ann task, a child participant is told the following story (or one of a large number of equivalent variants): "Sally places her marble in a box and leaves the room. While she is gone, Ann moves the marble into a basket. Sally then returns to the room to get her marble." The child is then asked where Sally will look for her marble. Most normally developing 4-year-olds will answer correctly (that Sally will look for the marble in the box) because they appreciate that Sally does not know the marble has been moved. Autistic children generally get the question wrong, even when their mental age is well above 4 years. Findings such as this were originally claimed to imply that autistic children lack a theory of mind (ToM): they find it difficult or impossible to distinguish what Sally knows from what they themselves know (Baron-Cohen et al., 2000; Frith, 1985).
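The logic of the Sally-Ann task can be made concrete with a small simulation. The sketch below is purely illustrative and is not taken from any published model of theory of mind: it assumes a toy world in which an agent's belief about the marble is updated only when that agent witnesses a move, and all names in it (Agent, run_sally_ann and the two answer functions) are hypothetical.

```python
# Illustrative sketch only: a toy model of the Sally-Ann false-belief task.
# All names here (Agent, run_sally_ann, ...) are hypothetical and are not
# drawn from the chapter or from any published model.

class Agent:
    def __init__(self, name):
        self.name = name
        self.present = True            # is the agent in the room?
        self.believed_location = None  # where this agent thinks the marble is


def run_sally_ann():
    """Play out the story; return the marble's actual location and Sally."""
    sally, ann = Agent("Sally"), Agent("Ann")
    agents = [sally, ann]

    def move_marble(place):
        # Only agents who witness the move update their belief.
        for agent in agents:
            if agent.present:
                agent.believed_location = place
        return place

    actual = move_marble("box")      # Sally places her marble in the box
    sally.present = False            # Sally leaves the room
    actual = move_marble("basket")   # Ann moves the marble into the basket
    sally.present = True             # Sally returns to get her marble
    return actual, sally


def answer_with_belief_tracking():
    # Answer based on what Sally knows (her possibly false belief).
    _, sally = run_sally_ann()
    return sally.believed_location


def answer_from_own_knowledge():
    # Answer based on what the observer knows (the actual location).
    actual, _ = run_sally_ann()
    return actual
```

In this toy model, answer_with_belief_tracking() gives the box and answer_from_own_knowledge() gives the basket, the latter being the answer that autistic children generally give. The sketch only separates the two kinds of answer; it says nothing about why belief tracking might fail, which is the question on which the theories discussed here differ.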
Rival theories have tended to accept the centrality of the Sally-Ann task, but have tried to explain the findings associated with it in terms of other constructs, such as inability to deal with modelling others' mental states because of impoverished Central Executive function, or inability to imitate other minds because of an absence or paucity of mirror neurons. This focus on knowledge and belief predisposes towards neglect of another aspect of disordered behaviour in people with autism: disorder of emotion and affect.

Autism and emotional dysfunction


In contrast to the four cognitive theories of autism, Stress Overload Theory takes the view that autistic behaviour results from attempts to cope with an excess of fear and anxiety which rise to distressing levels as environmental stimuli increase in novelty or complexity. Because they are intrinsically complex and


unpredictable, social stimuli are particularly liable to overload the autistic individual's coping mechanisms. On this view, emotional dysfunction is primary, and cognitive deficits are secondary consequences of strategies developed to deal with inappropriate and excessive emotion. Intolerance of change in the environment, repetitive behaviour and avoidance of social inputs are all natural strategies to adopt if an individual's goal is to moderate novelty and complexity. An explanation of autistic deficits as secondary cognitive consequences of primary emotional dysfunction offers two reasons for optimism. First, this view suggests that the cognitive features of the syndrome may be acquired, and hence reversible through training. To employ the terminology we introduced earlier, autistic behaviour may result from failure to spontaneously acquire the natural technology normally employed to regulate social behaviour, without implying that the cognitive system is intrinsically incapable of acquiring it. Second, if the core problems observed in autism result from emotional disturbances, even if these are driven by genetic factors, the prospects of intervention using drugs become much more positive. A drug that replaces lost insights into the minds of other people is improbable; drugs that assist in emotional control are already available and might be therapeutically effective if administered before inappropriate cognitive adaptations to the core emotional condition have taken place. The suggestion that autistic disorders are caused by emotional dysfunction is as old as autism itself. Unfortunately, in addition to proposing an explanation in terms of affective dysfunction, Kanner (1943) also claimed that this dysfunction arose from the early emotional experiences of a child, specifically from the child's reaction to cold and inexpressive mothers. The subsequent stigmatisation of parents of autistic children was both cruel and unjustified (Rimland, 1964).
The realisation that already-suffering parents were inappropriately blamed because of the belief that emotional dysfunction was the primary locus of the autistic deficit had an unexpected consequence. In later debates about autism, the view that emotional dysfunction is not the primary cause but a secondary consequence of a physical disorder of the cognitive system has become transformed into a dogma. However, the case for emotion as the source of autistic deficits remains strong. The DSM-IV (American Psychiatric Association, 1994) criteria for Autistic Disorder include:
• marked impairment in the use of multiple nonverbal behaviours, such as eye-to-eye gaze, facial expression, body postures, and gestures to regulate social interaction


• a lack of spontaneous seeking to share enjoyment, interests, or achievements with other people (e.g., by a lack of showing, bringing, or pointing out objects of interest)
• lack of social or emotional reciprocity
• encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is abnormal either in intensity or focus
• apparently inflexible adherence to specific, nonfunctional routines or rituals
• stereotyped and repetitive motor mannerisms (e.g., hand or finger flapping or twisting or complex whole-body movements)
• persistent preoccupation with parts of objects

Many of these features either directly indicate emotional dysfunction or can be interpreted as behavioural responses to emotional disturbance. For example, avoidance of eye-contact, preoccupation with detail and stereotyped, repetitive behaviour are all strategies that might be expected in individuals trying to avoid the excessive anxiety that results from sensory and emotional overload (Gillingham, 2000).8 It has been widely reported that some of the major difficulties experienced by autistic individuals are associated with the expression and understanding of emotion (Bemporad, Ratey and O'Driscoll, 1987; Loveland et al., 1989; Hobson and Lee, 1989). However, it is not always acknowledged that stimulus contexts associated with emotion invariably involve the processing of novel and complex social information, and that novel and complex information is liable to cause distress to autistic individuals even when it is not social in nature. Recently, explicit evidence of anxiety problems in autistic children has begun to emerge (Amaral and Corbett, in press). Muris et al. (1998) examined the presence of co-occurring anxiety symptoms in 44 children with autism spectrum disorder. The sample included 15 children with autism, and 29 with pervasive developmental disorder-not otherwise specified (PDD-NOS). They found that more than 80% of the children met criteria for at least one anxiety disorder. Gillott et al. (2001) compared high-functioning children with autism to two control groups, children with specific language impairment and normally developing children, on measures of anxiety and social worry. Children with autism were found to be more anxious on both indices. Four of the six factors on the anxiety scale were elevated, with obsessive-compulsive disorder and separation anxiety showing the highest elevations. Despite the emphasis placed by clinicians and practitioners on emotional disturbance, academic theories have focused almost exclusively upon actual or


possible deficits in cognition. Presumably, the emotional dysfunction that is such a central feature of the autistic syndrome, rather than being denied, is being neglected on the grounds that it can be accounted for as a secondary consequence of one or more cognitive deficits. However, the assumption that cognitive deficits are primary, and emotional problems secondary, does not appear to be supported by any significant body of evidence. We therefore believe that it is of considerable importance to bring the possibility that autistic deficits are rooted in emotional disorder back into centre field. Accordingly, we have presented the Stress Overload Theory as an additional possibility that can be tested against the cognitive alternatives.

Neuropsychology of autism
Three different neurological systems of the brain have been proposed as the site of autistic dysfunction: the frontal lobes, cerebral hemisphere integration and the amygdala. The frontal lobes have been implicated both by theorists who argue that mental state information is processed by generic executive functions (e.g., Frye et al., 1995, 1996) and by Mindblindness theorists (Baron-Cohen, 1994; Happé, 1996). Impairment of executive functions has long been associated with damage to prefrontal areas (e.g., Luria, 1966; Fuster, 1989; Duncan, 1986; Shallice, 1998). There is substantial evidence from brain injury victims, animal lesion studies and functional imaging studies that different aspects of executive functions are subserved by neural systems located in anatomically separated regions of the prefrontal cortex (e.g., Luria, 1966; Fuster, 1989; Robbins, 1996; Shallice and Burgess, 1996). Though such reports are still few in number, imaging studies have suggested that part of the left medial frontal cortex may indeed be implicated in autism (e.g., Frith, 2001). The amygdala, which is a hemispherically bilateral structure, also shows a different pattern of activation in autistic individuals from that in neuro-typical people when they are presented with socially meaningful stimuli such as faces showing emotional expressions. This has led proponents of the Mindblindness theory of autistic deficits to claim that the amygdala is centrally involved in the cognitive processes underlying normal social behaviour (Baron-Cohen et al., 2000). However, recently reported work by Amaral and Corbett (in press) suggests that damage to the amygdala is not sufficient to produce the social deficits observed in autism.7 Amaral and Corbett observe that "[r]ecent data from studies in [their] laboratory on the effects of amygdala lesions in the macaque monkey are at variance with a fundamental role for the amygdala in social behavior" and


conclude that "an important role for the amygdala is in the detection of threats and mobilizing an appropriate behavioral response, part of which is fear. If the amygdala is pathological in subjects with autism, it may contribute to their abnormal fears and increased anxiety rather than their abnormal social behavior" (Amaral and Corbett, in press). Crucially, though Amaral and Corbett's work confirms the involvement of the amygdala in autistic dysfunction, it suggests that the effects of damage to the amygdala are mediated by impairments to subsystems involved in emotion processing, rather than arising from direct interference with social cognition. The neuropsychological basis for weak central coherence has yet to be precisely specified. Most work within this framework has taken the view that weak central coherence results from the nature of the computational processes underlying cognition, rather than being associable with particular cortical structures.8 Accordingly, such neuropsychological evidence as is currently associated with this theory does not rule out a primary locus for autistic dysfunction in the brain areas underlying emotion rather than cognition. It has been noted that IM/MNT benefits from its ability to precisely specify the neurological structures (mirror neurons) responsible for imitation behaviour and to assign them a specific cortical location in areas F4 and F5 in the ventral premotor cortex, with the probable involvement of similar structures in the superior temporal sulcus (Gallese and Goldman, 1998). However, the precision with which relevant cortical structures are specified also exposes this theory to disconfirmation, and a recent PET study by Decety, Chaminade, Grèzes and Meltzoff (2002) has produced findings that directly contradict its claims.
This investigation found that imitation behaviour is associated with activity in quite different areas of the cortex:

"The left inferior parietal is specifically involved in the imitation of the other by the self, whereas the right homologous region is more activated when the self imitates the other. Overall these results favor the interpretation of a lateralisation in the posterior part of the brain related to self versus others-related information respectively in the dominant versus the non-dominant hemisphere." (Decety, Chaminade, Grèzes and Meltzoff, 2002, p. 271)

The conclusion we draw from this review of the neuropsychology of autism is that areas of the prefrontal cortex and the amygdala are probably implicated in the syndrome. These structures undoubtedly play a central role in interfacing the limbic system responsible for emotional experience with systems in the premotor and orbito-frontal cortex that are involved in the formulation of intentions and the generation of action schemas. However, presently


available neuropsychological evidence is not sufficient to justify the view that dysfunctional cognition causes abnormal emotional experiences, as against the view that inappropriate emotional experience is the cause of cognitive dysfunction. Again, there seems every reason to keep under consideration the possibility that deficits in emotion processing are the primary source of autistic disorder.

Emotion and face recognition


Emotion is expressed in distinct facial expressions, and particular areas of the face are linked with different emotions such as happiness, surprise, sadness, anger, fear and disgust (Darwin, 1872). Whilst autistic children may understand and express simple emotions such as anger, they typically do not express more complex emotions such as surprise (Dennis et al., 2000) and are significantly impaired at recognising emotions (Hobson, 1986a, 1986b; Weekes and Hobson, 1987). When autistic children are presented with photographs of faces, at least some have been found to have difficulty in using photographs to identify faces and to correctly attribute emotion on the basis of facial expression in photographs (Tantam, Monahan, Nicolson and Stirling, 1989). Autistic children differ from control children in recognising pictured faces when orientation is manipulated (Langdell, 1978), are less able than controls to recognise emotional expressions and are also much less accurate in identifying a right-way-up face (Tantam, Monahan, Nicolson and Stirling, 1989). Autistic children, however, have been found to be as able as controls to label upside-down faces, and there is a much bigger difference within the control groups between right-way-up faces and upside-down faces than there is within the autistic groups. Tantam, Monahan, Nicolson and Stirling concluded that it was likely that this difficulty would apply to actual faces as well as pictures of faces and might explain some autistic individuals' difficulty in social interactions. Though high-functioning autistic children are far more able at matching simple emotions than their lower-functioning counterparts (Dennis et al., 2000), they are less able than control participants at identifying emotions when matched for non-Verbal Mental Age (nVMA) (Tantam, Monahan, Nicolson and Stirling, 1989). However, there are no differences between autistic and control groups when they are matched for Verbal Mental Age (VMA) (Ozonoff, Pennington and Rogers, 1990).
Celani, Battacchi and Arcidiacono (1999) suggested that this is because the recognition of facial expressions depends upon the use of analytic (or local) processing and holistic (or global) processing. Celani et al. speculated that analytic processing takes place in the left hemisphere,


where information is perceived in terms of its properties and an inferential understanding of facial expression is gained. Holistic processing may depend more upon the right hemisphere (Buck, 1984) and involve sub-cortical structures in the limbic system that respond to displays as a whole and hence may enable direct apprehension of emotional meaning. Celani et al. hypothesised that the holistic processing associated with the right hemisphere is impaired or unavailable in autism, so that autistic individuals are forced to employ left hemisphere analytic processes to compare faces on the basis of component features of emotional expressions (Celani et al., 1999). "The eyes are the window to the soul" is a statement often attributed to Leonardo da Vinci and, at least for humans, the eyes are indeed extremely important social cues. Direction of gaze is an important non-verbal cue for turn taking in face-to-face conversation, and noting the pattern of a person's gaze can reveal much about their intentions and their mental state. We learn much about the minds of others by observing their eyes: whether they like us, what they are thinking about, and what they want. This has led to the suggestion that gaze interpretation is a form of mindreading (Baron-Cohen, 1994). Such mindreading allows the extraction of clues about the focus of another person's attention, desires, and even beliefs, and a lack of sensitivity to gaze may underlie the impairments in social and cognitive abilities that are observed in autism. Humans may have an internal eye direction detector (EDD), which may play a particularly important part in intentionality detection, perhaps the most important component in a mindreading system (Baron-Cohen, 1995). Evidence for the existence of an EDD has come from an experiment showing that infants as young as 3 months of age can detect the direction of a person's gaze using information from the eyes alone (Hood, Willen and Driver, 1998).
This has been interpreted as supporting the theory that an EDD mechanism is present in infants from a young age, and that infants thus have access to a mechanism that allows them to direct their own attention so as to match that of others, a phenomenon sometimes known as joint attention. Autistic individuals display highly atypical gaze behaviour, particularly with respect to following the eye movements of other people. Hobson et al. (1986a, b) suggest that autistic individuals attend to different features in the face, seeming to make less use of the eye area than do control participants in experimental studies. In early reports gaze avoidance was considered a central feature of conditions such as autism and Asperger's (O'Connor and Hermelin, 1967). More recent researchers have suggested that rather than avoiding gaze, autistic individuals may restrict gaze-sampling to quick glances at the eyes of


others (Volkmar and Mayes, 1990), though Baron-Cohen (1995) appears to believe that gaze in autistic people is unimpaired, and Dickerson has pointed out that the claim that autistic children are able to use gaze selectively requires the assumption that they can use it competently to some degree (Dickerson, Rae, Stribling, Dautenhahn, Ogden and Werry, in press). Normal individuals seem to scan other people's faces by fixating in repeated cycles for approximately a third of a second, dwelling on the eyes longer than on other facial areas (Argyle, 1994). The relationship between hypothetical cognitive deficits and atypicalities in gaze and eye-contact is still a poorly understood issue in autism research. Cognitive deficit theories can only explain dysfunctional processing of gaze information as a result of the absence of a theory of mind, inability to employ global processing, insufficient Central Executive resources, or lack of the mirror neurons that allow social behaviour to be interpreted. Another possibility is that eye-gaze stimuli adversely stimulate the autonomic nervous system in autism, perhaps inhibiting the normal development of a theory of mind. If so, any deficit in autism is likely to be much more low-level and pervasive than a cortical theory of mind module (Keeley, 2002, p. 4). When autistic children are separated from an adult by a substantial barrier, they look more at that adult than if the barrier were not present, even though the adult is in full view and looking at the child in both conditions (MacConchie, 1973, cited in Richer and Cross, 1976). MacConchie suggests that the barrier reduces the probability of a social interaction between the adult and the child, thereby also reducing the threat normally associated with eye contact. Also, children with autism look more at adults and engage in less flight behaviour when the adults have both eyes covered than when the eyes are visible (Richer and Cross, 1976).
These studies lend support to the suggestion that avoidance of eye contact is a means of reducing the emotional threat posed by social contact. Tasks requiring the discrimination of the direction of gaze, such as those in an imaging study by Kawashima et al. (1999), activate an area in the left amygdala equally in both eye contact and no eye contact conditions. However, a region in the right amygdala only becomes activated during the eye contact condition (Kawashima et al., 1999). These results implicate the amygdala in the processing of gaze information: the left amygdala in the general processing of gaze direction, with specific involvement of the right amygdala when another individual directly makes eye contact with the person whose activation levels are being imaged. This reinforces the suggestion that impaired amygdala function may be associated with autism.

Gaze aversion and emotional dysfunction in autism 279

We conclude that, even when matched for nVMA, autistic participants are less able to process facial expressions related to emotions than normal controls. As there is abundant neuropsychological evidence associating impairment of the amygdala with the symptoms of autism, it seems likely that such impairments contribute to the disorder. There is some evidence that a special purpose EDD exists and that this system is present even in very young infants. It is possible that EDD does not develop in autistic children as it does in other children. It still seems to be an open question, however, whether deficits associated with cognitive systems such as EDD are primary deficits, carrying other consequences in train, or whether they are secondary problems, resulting for example from abnormal emotional reactions to social stimuli.

Experiments
Below we report two studies designed to discriminate between cognition- and emotion-based theories of autistic deficits. Previous researchers have failed to distinguish between a passive failure to make use of high-value information from the eyes of others (cue blindness) and an active tendency to avoid the gaze of others (cue aversion). Cognitive theories predict cue blindness: since, on these theories, autistic children cannot process social cues, there is no reason why they should seek to avoid looking at the areas of the face where such cues are located. Emotion-based theories, on the other hand, predict cue aversion: because social cues are associated with excessive anxiety, autistic individuals will avoid processing them. In both cases important information will be lost, and cognitive processing will be impoverished as a result. But the underlying mechanisms are nonetheless quite distinct.
The first study we report was intended to investigate whether the sample of autistic children participating in the study do indeed show impaired ability to identify emotional expressions using information from the eye region of photographed faces in comparison with controls. The design required participants to identify emotional expressions from whole faces, from faces with the eyes deleted, or from the eyes only with the remainder of the face deleted.
The second study compares the regions of complete faces that autistic and control participants attend to. The logic of the study is this: photographs of faces are briefly presented to participants. Superimposed on the face, in the mouth, cheek or eye region, is a heart-shaped target. Participants are required to press a key as soon as they detect the heart shape, and reaction times are measured. When the display has terminated, participants are asked where the target was located. If

280 Sarah Bowman, Lisa Hinkley, Jim Barnes and Roger Lindsay

cognitive deficit theories are correct, the time taken to correctly identify the target location should be equal across locations. Even if autistic participants cannot process social cues, there is no reason why the heart shape should take longer to detect in the eye region than in the mouth or cheek region. Emotional deficit theories, however, predict that autistic participants will selectively avoid attending to the region of the face around the eyes because this area is rich in social cues. On these theories, targets located by the eyes should take longer to detect than targets located by the mouth or by the cheek.

Experiment 1
This experiment examines whether autistic children have more difficulty than normal controls and learning disabled controls at identifying emotions from pictures of the eyes alone. The accuracy of emotion identification is compared using eyes-only displays, whole-face displays and no-eyes displays. It is hypothesised that the autistic group will fare particularly badly in the eyes-only condition.

Method

Participants
Three groups of participants were used for this experiment. The first group (n = 17) were attending a special school and all had a diagnosis of autism using established criteria. The second group (n = 14) all had learning disabilities and also attended a special school. The third group (n = 18) were control children attending a mainstream school. Participants were matched for non-verbal mental age (nVMA). All participants were males aged between 5 and 16 from one of two schools in South Essex. As autism affects more males than females (ratio 4:1), the study was confined to male participants in order to avoid having to deal with the problem of what proportion of female participants should appear in the study. Learning-disabled controls were used alongside normal controls because 75% of autistic individuals have a learning disability as well and, indeed, learning disability was a criterion for acceptance at the school where the study was carried out. Children with severe learning disabilities, Attention Deficit Disorder (ADD) or Attention Deficit Hyperactivity Disorder (ADHD), and very low-functioning autistic children, were excluded from the study, as they would have found difficulty in completing the tasks employed. Experiments 1 and 2 employed the same participants, but for any particular participant there was an interval of at least two weeks between experimental sessions. This interval allowed each session to be kept as brief as possible, as well as minimising the risk of transfer effects and between-group differences in attention or memory span.

Apparatus
A laptop computer running a Superlab programme was used to present faces from Ekman and Friesen (1978). Pictures of faces displaying emotion were presented with only the eyes visible (just-eyes condition), everything but the eyes visible (no-eyes condition) or the whole face visible (whole-face condition). Five emotional expressions were used as targets: happy, sad, angry, afraid and surprised. Raven's Progressive Matrices (standard) was used as the nVMA test. Four different faces were used in a counterbalanced design, with each participant viewing two of the four available faces (Appendix 4).

Procedure
The experiment was preceded by a training session intended to ensure that all participants were equally competent at identifying facial expressions of emotions and attaching verbal labels to them. Participants were seated in front of a laptop computer and told that they would see a picture of a face followed by a second screen showing five more faces. The sets of five faces were made up of the original facial expression and four different facial expressions. Participants were then asked to name the member of the set of five which looked most like the previous face. The experimenter then named the five different emotions illustrated in the second screen. The display and naming exercise was repeated as often as necessary until participants could confidently identify the emotions. The training phase was useful in familiarising participants with the task and in ensuring that between-group differences in emotion recognition using whole and partial faces were not due to pre-existing differences in recognising or labelling the expressions presented. A pilot study run on five children showed that, to prevent the training phase from requiring any more than a trivial amount of new learning, some synonymous expressions would have to be accepted as equivalent to specific emotion words for some participants: for example, 'smiley' for 'happy', 'frightened' for 'afraid', 'cross' for 'angry', 'shocked' for 'surprised' and 'upset' for 'sad'.
Participants were then told that some of the pictures would show all of a face, some would show just the eyes and others would have the eyes blacked out 'so that you can't see them'. If they were unsure they were instructed to guess. They were then asked if they had any questions. The programme displayed 30 pictures on the laptop screen (2 face identities × 5 emotions × 3 display types). The pictured faces showed one of five different emotions (happy, sad, angry, afraid or surprised). Each face was displayed for 2000 ms, and was followed by a screen showing the five emotions in words, each with a schematic picture next to it taken from a standardised bank of emotional expressions used by the school for autistic and learning disabled children. This acted as a prompt for the answer. A set of practice examples was administered first, with a gap afterwards for questions or concerns. If the participant said they did not know, they were encouraged to guess. The participant gave the experimenter their answers and the experimenter entered the response on the computer. This was to make the experiment accessible to as many children as possible, including some lower functioning autists. Reaction times were not recorded in Experiment 1.
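The stimulus set described above can be enumerated in a short sketch. It assumes a full crossing of the three factors, which the stated total of 30 pictures implies. The identity labels `face_A` and `face_B` are hypothetical placeholders, since the chapter does not name the faces, and nothing here reflects how the Superlab script itself was written.

```python
from itertools import product

# Hypothetical placeholders: the chapter does not name the two face
# identities that each participant viewed.
FACE_IDENTITIES = ["face_A", "face_B"]
# Emotions and display types as listed in the Apparatus and Procedure text.
EMOTIONS = ["happy", "sad", "angry", "afraid", "surprised"]
DISPLAY_TYPES = ["whole-face", "no-eyes", "just-eyes"]


def build_trials():
    """Return one trial per identity x emotion x display-type combination."""
    return [
        {"identity": identity, "emotion": emotion, "display": display}
        for identity, emotion, display in product(
            FACE_IDENTITIES, EMOTIONS, DISPLAY_TYPES
        )
    ]


trials = build_trials()

# Count how many trials fall under each display type.
trials_per_display = {
    display: sum(trial["display"] == display for trial in trials)
    for display in DISPLAY_TYPES
}

print(len(trials))
print(trials_per_display)
```

Under this crossing each display type contributes 10 trials, which agrees with the maximum score of 10 per display type stated in the caption of Table 1.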

Results
Table 1 shows the mean number of emotional expressions correctly identified by autistic, learning disabled and control participants.
Table 1. Mean number of emotional expressions correctly identified by autistic, learning disabled and control participants. All figures are out of a maximum of 10. Standard deviations are in brackets.

Display type   Autistic (n = 17)   Learning disabled (n = 14)   Control (n = 18)
Eyes only      3.00 (0.37)         5.14 (0.41)                  7.50 (0.36)
No eyes        4.41 (0.43)         1.92 (0.47)                  5.78 (0.42)
Whole face     4.82 (0.45)         3.36 (0.50)                  7.44 (0.44)


Analysis of Variance was performed on the mean number of correct responses with display type as a within-participant factor and group as a between-participant factor. There was a significant main effect of display type (F = 10.155, df = (2,45), p = 0.001) and of group (F = 33.201, df = (2,46), p = 0.001). There was also a significant display type × group interaction (F = 9.463, df = (4,90), p = 0.001). Post hoc analyses were conducted to further examine these effects.
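The degrees of freedom reported here can be checked against the sample sizes (17 + 14 + 18 = 49 participants, three groups, three display types). The sketch below is one reading and not a procedure the chapter states: it assumes that the within-participant effects were tested multivariately, with Wilks' lambda for the interaction, because that assumption reproduces the reported values, whereas a univariate repeated-measures test would give 92 error degrees of freedom. The function names are illustrative only.

```python
def between_groups_df(n_total, n_groups):
    """Degrees of freedom for a between-participant main effect."""
    return (n_groups - 1, n_total - n_groups)


def multivariate_within_df(n_total, n_groups, n_levels):
    """Degrees of freedom for a multivariate test of a within-participant factor.

    The factor's n_levels conditions are reduced to n_levels - 1 contrasts.
    """
    n_contrasts = n_levels - 1
    error_df = n_total - n_groups
    return (n_contrasts, error_df - n_contrasts + 1)


def wilks_interaction_df(n_total, n_groups):
    """Exact-F degrees of freedom for Wilks' lambda with two contrasts.

    Only valid when the within-participant factor has three levels,
    i.e. exactly two contrasts.
    """
    hypothesis_df = n_groups - 1
    error_df = n_total - n_groups
    return (2 * hypothesis_df, 2 * (error_df - 1))


N_TOTAL, N_GROUPS, N_DISPLAY_TYPES = 49, 3, 3

print(between_groups_df(N_TOTAL, N_GROUPS))                        # (2, 46)
print(multivariate_within_df(N_TOTAL, N_GROUPS, N_DISPLAY_TYPES))  # (2, 45)
print(wilks_interaction_df(N_TOTAL, N_GROUPS))                     # (4, 90)
```

The same assumption also matches the per-group analyses: `multivariate_within_df(17, 1, 3)` gives (2, 15), the degrees of freedom reported for the display type effect in the autistic group.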

Autistic Group
For the autistic group there was a significant main effect of display type (F(2,15) = 6.631, p = 0.009). Performance was found to be worse in the eyes-alone condition (mean = 3.00) than in the no-eyes condition (4.41; F(1,16) = 8.727, p = 0.009) or in the whole-face condition (4.82; F(1,16) = 11.828, p = 0.003). However, there was no difference between the no-eyes (4.41) and whole-face conditions (4.82; F(1,16) = 0.662, p = 0.442). These findings suggest that autistic children find it more difficult to identify emotion when only the eyes are presented as cues to emotional expression.

Learning Disabled Control Group


For the learning disabled controls there was a significant main effect of display type (F(2,12) = 11.706, p = 0.002). Performance was superior with eyes-only displays (5.14) than with no-eyes displays (1.92; F(1,13) = 25.288, p = 0.001) or whole-face displays (3.36; F(1,13) = 11.085, p = 0.005). Furthermore, recognition of emotion was superior with whole-face displays (3.36) than with no-eyes displays (1.92; F(1,13) = 7.514, p = 0.017).

Normal Control Group


For the normal control group there was a significant main effect of display type (F(2,16) = 8.413, p = 0.003). Superior emotion recognition was found with eyes-only displays (7.50) than with no-eyes displays (5.78; F(1,17) = 13.038, p = 0.002). However, eyes-only displays (7.50) could not be distinguished from whole-face displays (7.44; F(1,17) = 0.027, p = 0.871). Furthermore, normal controls performed better with whole-face (7.44) than with no-eyes displays (5.78; F(1,17) = 17.0, p = 0.001).
Further analysis examined differences between the three participant groups for the different types of displays. For the eyes-only displays the autistic group (3.00) performed significantly worse than learning disabled controls (5.14; p = 0.001) or normal controls (7.50; p = 0.001). Learning disabled controls also performed worse than normal controls (p = 0.001). For the no-eyes displays the autistic group (4.41) performed better than learning disabled controls (1.92; p = 0.001), and their performance on these displays was only slightly worse than that of normal controls (5.78), a difference which verged on significance (p = 0.082). However, the learning disabled controls were worse at this task than the normal controls (p = 0.001). For the whole-face displays, the autistic group (4.82) and learning disabled controls (3.36) could not be distinguished (p = 0.113). However, the normal control group (7.44) performed better than the autistic group (p = 0.001) or the learning disabled controls (p = 0.001).
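The pattern behind the display type × group interaction can be read directly off the cell means. As a purely descriptive sketch, the snippet below takes the means from Table 1 and computes, for each group, the difference between eyes-only and no-eyes accuracy. The name `eye_advantage` is an illustrative label rather than a measure the authors define, and the differences carry no information about significance, which rests on the F tests reported above.

```python
# Mean number correct (maximum 10), copied from Table 1.
TABLE_1_MEANS = {
    "autistic": {"eyes_only": 3.00, "no_eyes": 4.41, "whole_face": 4.82},
    "learning_disabled": {"eyes_only": 5.14, "no_eyes": 1.92, "whole_face": 3.36},
    "control": {"eyes_only": 7.50, "no_eyes": 5.78, "whole_face": 7.44},
}


def eye_advantage(means):
    """Eyes-only accuracy minus no-eyes accuracy for one group."""
    return round(means["eyes_only"] - means["no_eyes"], 2)


eye_advantages = {
    group: eye_advantage(means) for group, means in TABLE_1_MEANS.items()
}

for group, advantage in eye_advantages.items():
    print(f"{group}: {advantage:+.2f}")
```

Only the autistic group shows a negative difference (3.00 against 4.41): the other two groups were more accurate with the eyes alone than with the eyes removed.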

Discussion
The autistic children's performance was found to be worse in the eyes-alone condition than in the no-eyes condition or in the whole-face condition. However, there was no difference between the no-eyes and whole-face conditions. These findings suggest that autistic children find it more difficult to identify emotion when only the eyes are presented as cues to emotional expression. By contrast, for the learning disabled control group performance was found to be best for the eyes-only displays, followed by whole-face displays and lastly no-eyes displays. This group were most successful when they could rely on the eye region for successful categorisation of emotion; however, the whole-face displays may have provided extra information that distracted these participants from the all-important eye region, perhaps by providing extra cues. The normal control participants performed best in the conditions in which eyes were presented (eyes-only and whole-face); they were much poorer in the no-eyes condition. Unlike the learning disabled control group, however, performance in the eyes-only and whole-face conditions could not be distinguished, suggesting that the extra information provided in the whole-face condition did not confuse these participants as much as it did the learning disabled control participants.
The procedure used in Experiment 1 evidently differs from real-life emotion identification in that recognition decisions are based on images of faces that are both still and incomplete; hence it is legitimate to doubt whether the findings of the study generalise to emotion identification in real life. However this may be, the experimental procedures employed in the present study appear to have been successful in demonstrating real differences between autistic and control participants. Furthermore, the observed differences seem to be in line with expectations based on previous research, including studies that were not as experimentally constrained as the present study. The findings of the present study support those of Tantam et al. (1989), who found that when matched for nVMA the autistic participants are less able than control participants at emotion recognition tasks. Celani et al. (1999) had previously found that autistic individuals process information at a more elemental level than control participants do. In the present study the autistic individuals did almost equally well in the no-eyes and whole-face conditions, suggesting that they only use part of a face to recognise emotion. It is possible that they employed a feature-based approach to emotion recognition as suggested by Hobson et al. (1988). Similarly, the data are consistent with the hypothesis that the normal controls used a more holistic approach. In summary, autistic participants were less accurate at emotion recognition using just the eyes than were the control participants, supporting the hypothesis that autistic children gain less information about emotion from the eye region than normal or learning disabled control children.

Experiment 2
The second experiment sought to investigate the way in which autistic children pay attention to facial features by measuring the time taken to identify a shape when it is sometimes superimposed on the eye region of a pictured face, sometimes on the cheek, and sometimes on the mouth. The study examined the effects of group (autistic, learning disabled control and normal control) on the time taken to locate a heart shape superimposed on the eye, cheek or mouth region of facial displays.

Method

Participants
The same 49 participants took part in Experiments 1 & 2.


Procedure
Participants were told that they would be shown more pictures of faces but that this time each picture would have a heart shape hidden somewhere on the face (see Appendix 1 for examples). The experimenter then showed the participant an example of the heart shape. They were told to press the space bar as soon as they saw the shape and then to tell the experimenter where it was. They were told that the shape would be on the eye, the mouth or the cheek. Participants were also informed that the experimenter wanted to see how fast they could respond and so they should press the space bar as soon as they saw the heart. At this point any questions participants had were answered and they were given a few practice trials to ensure they understood the requirements of the study. The experiment, which consisted of 12 pictures shown sequentially on the screen, then proceeded. All participants were 100% accurate at locating the heart shape.

Results
Table 2 presents mean response times for locating a superimposed heart shape on the eye, mouth and cheek regions of pictured faces for the three groups of participants. A two-factor Analysis of Variance with group as a between-participants factor and display type as a within-participants factor was carried out on the reaction time data. Significant main effects of display type (F(2,45) = 4.455, p = 0.017) and of group (F(2,46) = 8.096, p = 0.001) were found. The group × display type interaction verged on significance (F(4,90) = 2.263, p = 0.069).

Table 2. Mean reaction times (in milliseconds) for locating a superimposed shape on a pictured face by autistic, learning disabled and control participants. Standard deviations are in brackets.

Display type   Autistic (n = 17)   Learning disabled (n = 14)   Control (n = 18)
Eye            2575.5 (184.6)      1912.4 (203.5)               1762.6 (179.5)
Mouth          1902.1 (133.0)      1490.5 (146.6)               1789.4 (129.3)
Cheek          2600.9 (202.9)      1617.7 (223.7)               1627.2 (197.4)


Post hoc analyses were conducted to further examine these effects.

Autistic Group
For this group the main effect of display type was found to be significant (F(2,15) = 6.15, p = 0.01). The autistic group took significantly longer to locate the shape in the eye condition (2575.5) than in the mouth condition (1902.1; F(1,16) = 12.641, p = 0.003). However, there was no significant difference between the eye (2575.5) and cheek conditions (2600.9; F(1,16) = 0.08, p = 0.931). They also took longer in the cheek condition (2600.9) than in the mouth condition (1902.1; F(1,16) = 4.831, p = 0.043).

Learning Disabled Control Group


For this group the main effect of display type was not significant (F(2,12) = 1.928, p = 0.188).

Normal Control Group


For this group the main effect of display type was not significant (F(2,16) = 1.413, p = 0.272).
Further post hoc analysis sought to compare the three participant groups in terms of their response times to the different types of displays. For the eye displays, there was a main effect of group on response times (F(2,46) = 5.547, p = 0.007). Autistic participants were notably slower at responding to eye stimuli, with responses slower than those of both learning disabled controls (2575.5 vs. 1912.4; p = 0.063) and normal controls (2575.5 vs. 1762.6; p = 0.011). However, the learning disabled controls and the normal controls could not be statistically distinguished (p = 0.858). For the mouth displays the effect of group was not significant (F(2,46) = 2.30, p = 0.112). Finally, for the cheek displays the effect of group was significant (F(2,46) = 6.80, p = 0.003). Response times for the autistic children in this condition (2600.9) were significantly longer than those of either learning disabled controls (1617.7; p = 0.013) or normal controls (1627.2; p = 0.008). Learning disabled controls and normal controls could not be statistically distinguished (p = 1.00).
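The competing predictions set out before Experiment 1 can be compared with the cell means in a small descriptive sketch. It takes the mean reaction times from Table 2 and computes, for each group, how much longer the target took to detect at the eye than at the mouth. The name `eye_cost` is an illustrative label rather than a term used by the authors, and the raw differences say nothing about significance, which rests on the tests reported above.

```python
# Mean reaction times in milliseconds, copied from Table 2.
TABLE_2_MEAN_RT_MS = {
    "autistic": {"eye": 2575.5, "mouth": 1902.1, "cheek": 2600.9},
    "learning_disabled": {"eye": 1912.4, "mouth": 1490.5, "cheek": 1617.7},
    "control": {"eye": 1762.6, "mouth": 1789.4, "cheek": 1627.2},
}


def eye_cost(mean_rt):
    """Extra time (ms) needed to detect the target at the eye versus the mouth."""
    return round(mean_rt["eye"] - mean_rt["mouth"], 1)


def slowest_location(mean_rt):
    """Target location with the longest mean reaction time."""
    return max(mean_rt, key=mean_rt.get)


eye_costs = {group: eye_cost(rt) for group, rt in TABLE_2_MEAN_RT_MS.items()}
slowest = {group: slowest_location(rt) for group, rt in TABLE_2_MEAN_RT_MS.items()}

for group in TABLE_2_MEAN_RT_MS:
    print(f"{group}: eye cost {eye_costs[group]:+.1f} ms, slowest at {slowest[group]}")
```

The autistic group shows the largest eye cost, the direction that cue aversion predicts. The learning disabled controls also show a numerically positive cost, but their display type effect was not significant, so only the autistic group's pattern is supported by the statistics.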


Discussion
Autistic children identified the heart-shaped target fastest in the mouth condition. They were significantly slower in the eye and in the cheek conditions. In other words, the further away from the eyes the target was located, the more quickly it was detected. For the learning disabled control group and the normal control group there was no effect of shape location on response times. For the eye displays, the autistic group responded more slowly than either control group, and the two control groups did not differ from one another in terms of response times. For the mouth displays, the response times of the three participant groups could not be statistically distinguished. For the cheek displays there was also an effect of group, with autistic children being significantly slower to respond than either control group.
The between-group differences in response to display type are important. Because the stimuli were semi-naturalistic, the visual contrast between the superimposed heart shape and its background was not completely identical across facial areas, the complexity of the background probably differed also, and the heart shapes were located at different spatial positions on the screen (e.g., the shape located over the eye was physically higher than that located near the mouth). If all participant groups had found the eye displays to be more difficult, the explanation would probably lie in physical differences between this and the other display types. However, the fact that specific eye display difficulty was confined to the autistic group suggests that physical aspects of the stimuli are not responsible. It remains possible that autistic participants have some hitherto unsuspected difficulty with low-contrast or high-location stimuli, but at present this possibility is gratuitous. By contrast, the prior research evidence that autists have difficulty in processing information associated with gaze and eye location is compelling.
The findings are consistent with the hypothesis that autistic children are gaze averse and actively avoid looking at the eye region of faces. Neither control group showed any difference between the three conditions, indicating that they have neither preference nor aversion for the eye (or indeed for any other) region of faces. Langdell (1978) carried out an early investigation of the behaviour of autistic participants using whole photographs of faces and parts of photographs. The ability of autistic children to sort photographs of simple emotions (whole photographs and half-face photographs) was examined. Both control and autistic groups were able to sort the full photographs and those of the lower halves of faces, but the autistic group were impaired in sorting the photographs of the upper half of the face. The results of the present study are consistent with Langdell's findings in that reaction times for the upper half of the face (the eye and cheek conditions) were significantly longer than for the lower half of the face (the mouth condition).
In the light of the theoretical discussion presented earlier, this evidence that there is positive gaze aversion in autistic children seems to support hypotheses based upon emotional dysfunction rather than cognitive deficits. Whilst cognitive deficit theories differ in the precise mechanism that is claimed to be defective, all maintain that autistic people are unable to process certain kinds of information in displays. It is difficult to understand why autistic individuals should selectively avoid sources of information to which they are effectively blind. Emotion-based theories, such as the Stress Overload Theory, on the other hand, predict exactly the outcome we have reported: autistic people will seek to avoid novel and complex aspects of the stimulus environment that threaten to overwhelm their cognitive coping strategies. In the case of human faces, this would be expected to result in avoidance of the eye region because of the complexity and unpredictability of gaze information.

General discussion
A great deal of information is obtained from the eyes of normal individuals. The eyes are used to assess whether someone is being truthful, whether their smile is genuine and what they are planning to do next. The eyes may also be used to flirt, to share a joke, or to discreetly direct the attention of another person (Vertegaal, Slagter, Van der Veer and Nijholt, 2000; Argyle and Cook, 1976; Kendon, 1967). Autistic individuals conspicuously fail to display these abilities under ordinary conditions, though there is no positive evidence that they do not possess them and some evidence that they do demonstrate competence in directing gaze (Dickerson, Rae, Stribling, Dautenhahn, Ogden and Werry, in press). To the extent that the latter is true, the selective use of gaze observed in people with autism may be under some volitional control. Baron-Cohen (1994) believes that an important function of human eye and gaze monitoring is the determination of mental states, or what he calls 'mindreading'. He also suggests that a mechanism called the Eye Direction Detector is absent or defective in autistic individuals, resulting in a lack of sensitivity to gaze and in impairments in social and cognitive abilities (Baron-Cohen, 1995). The data from the present study strongly suggest that these impairments are a secondary consequence of a primary emotional dysfunction. Autistic individuals do not cognitively process social information because they avoid the sources of such information as part of a stress management strategy.
There is considerable evidence linking dysfunction of the amygdala to the symptoms of autism. As mentioned earlier, the experiment by Kawashima et al. (1999) associates the left amygdala with the general processing of gaze direction but implicates the right amygdala when eye contact with another individual is made. The authors conclude that these results demonstrate the involvement of the human amygdala in the processing of social judgements of other people based on the direction of their gaze. The amount and type of activity in the cerebellar, mesolimbic and temporal lobe regions of the brain has also been claimed to differ significantly between autistic participants and control participants when processing facial expressions (Critchley et al., 2000). This seems compatible with the suggestion that the circuits controlling the flow of information between emotion centres and the frontal cortex may be the primary source of dysfunction in autistic people.
The review of theoretical analyses of autistic deficits, the review of the neuropsychological evidence, and the new data that we report all seem consistent with the suggestion that the primary dysfunction underlying autistic disorders is associated with the emotional systems rather than the cognitive systems of the brain. This is encouraging for two reasons. Firstly, theories that postulate innate cognitive modules, such as a Theory of Mind module, have a poor track record in neuropsychology (e.g., Fodor, 1983). Such theories, along with others claiming generic cognitive dysfunction, are particularly fragile in the case of autism, as the condition does not emerge until the age of three or later, and the evidence for defects present from birth is necessarily inferential.
Though patterns of selective sparing and loss associated with adult brain damage confer plausibility on the suggestion that cognitive processes have some kind of modular organisation in adults, this provides no basis at all for believing that cognitive modules are present from birth. Indeed, Lindsay and Gorayska (2002) have argued that modular structure may well result from mindware organisation: the problem spaces that underlie goal-oriented action planning effectively become independent modules if they share neither goals nor plan elements. This functional independence is described by Lindsay and Gorayska as relevance discontinuity. Secondly, if the social and cognitive deficits associated with autism are not themselves genetically determined, but result from adaptation to emotional dysfunction, it may be useful to regard them as the result of defective natural technology (Meenan and Lindsay, 2002; El Ashegh and Lindsay, this volume; Ramachandran, Undated MS).


Natural technology appears to exist in both spontaneously developed and induced forms. Speech acquisition, for example, involves mindware representations of grammatical, semantic and pragmatic relationships, but these do not have to be explicitly taught; indeed they are acquired even in considerably impoverished learning environments (Lenneberg, 1967). Writing, on the other hand, is a natural cognitive technology the development of which must be induced by explicit teaching. Induced natural technology is always likely to be readily modifiable, as external intervention is built into the acquisition mechanism. Social interpretations of gaze and the visual signs of emotion seem likely to develop spontaneously, rather than to require explicit teaching. Competence in these areas may therefore be susceptible to relatively limited intervention. However, all forms of natural technology are cognitive adaptations and are therefore, to some degree, modifiable. From the natural technology perspective, the cognitive pathology associated with autism is caused by deficient mindware, and in principle it should be possible to modify the mindware so as to eliminate the pathology. It is certainly worthwhile to continue to evaluate explicit teaching procedures, such as computer-based interventions, that are likely to facilitate the direct acquisition of appropriate natural technologies in people with autism. Perhaps, in the light of the data reported above, new interventions can be developed that focus on emotional dysfunction as the source of faulty mindware. Early intervention might be able to limit initial acquisition of dysfunctional natural technology. For example, control of aversive aspects of the autistic child's environment, such as novelty and complexity, might assist the development or re-engineering of social information-gathering and interaction strategies.
Additionally, materials that induce as little emotion as possible should be used in explicit teaching of the cognitive strategies underlying gaze and the interpretation of emotional cues. This is almost the converse of effective teaching strategies in most contexts.

Notes
1. Wing and Potter (2002) quote a prevalence rate of up to 60 per 10,000 for autism, and it is likely that the figure is even higher for the whole autistic spectrum. The rate of autism in males is approximately three times greater than in females.
2. Rimland (1966) was one of the first investigators to suggest that autism results from physical causes such as brain damage, though his specific proposal that relevant damage took the form of retrolental fibroplasia resulting from postnatal oxygen administration was rapidly discredited.


3. Mindblindness theory (also known as lack of a theory of mind) is perhaps the most widely known cognitive theory offered to explain autism (Premack and Woodruff, 1978; Baron-Cohen 2002; 2003). The central claim underlying the idea of mindblindness is that a special-purpose module in the brain is responsible for the human ability to understand the intentions and behaviour of others. In autistic individuals this module is absent or damaged. Some difficulty or abnormality in understanding other people's points of view is not seen as the only psychological feature of the autistic spectrum, but it is taken to be the core feature and appears to be universal among individuals with autism. The followers of this theory attribute complete mindblindness, or total lack of a theory of mind, only to extreme cases of autism. More commonly, a basic understanding of others' mental states is supposed to be available to autistic individuals, but not at the level that one would expect from measured ability in other areas. The higher prevalence of autism amongst males is explained by suggesting that male brains have evolved under selection pressure to systematise and categorise the environment, whilst female brains have evolved under selection pressure to support empathy functions (Baron-Cohen 2002; 2003). On this view, autism is the result of an extreme male brain.

4. Executive Dysfunction Theory (Rumsey and Hamburger, 1988; Hughes, Russell and Robbins, 1994) claims that the autistic deficit is associated with the Central Executive, a hypothetical cognitive control system widely believed to underlie problem solving, planning processes, the generation of novel responses, and the suppression of irrelevant or intrusive behaviours (Baddeley, 1986; 1990). Central Executive activity is believed to be associated with the prefrontal cortex, and a key motivation for Executive Dysfunction Theory is the observed similarity between autistic individuals and patients with frontal lobe injury. On this theory, difficulties in initiating and sustaining social interaction result from more fundamental cognitive failure in activities such as attention shifting, planning, flexible thinking, and disengaging from current stimulus control. There is also some evidence that autistic children do poorly on tests such as the Wisconsin Card Sorting Test and the Tower of London Task that are claimed to be sensitive to prefrontal injury (Prior and Hoffmann, 1990). The common claim that autistic children are incapable of, or incompetent at, deception may be explained as part of the executive dysfunction, because the Central Executive would be expected to play a major part in constructing and maintaining a cognitive model of the world as it is falsely represented to be. Inability to comprehend that other people have incorrect knowledge, or knowledge different from one's own personal beliefs, is similarly claimed to be due to difficulty in inhibiting reality-based responses.

5. Weak Central Coherence theory (Frith and Happé, 1994; Happé, 1999) is a cognitive theory of autism based on the suggestion that there is a deficit at the level of basic perceptual processes, resulting in a failure to construct or effectively use gestalt information. Autistic individuals are supposed to be unable to see the wood for the trees. The term Weak Central Coherence is used to explain the finding that autistic children often show an obsessive preoccupation with minute sensory detail; with parts of a stimulus rather than the whole. This theory is sometimes linked to the view that the left cerebral hemisphere analyses detail, whilst the right hemisphere constructs gestalts, suggesting that the failure of central coherence is a consequence of hemispheric imbalance or inter-hemispheric communication and control. Autistic children often show islets of normal or even superior ability in specific

Gaze aversion and emotional dysfunction in autism 293

areas such as mathematics and drawing, despite a generally retarded cognitive profile and low IQ. Weak Central Coherence explains these clinical findings in terms of the completely different way in which the autistic perceptual system processes the incoming stimuli.

6. The most recent theory of autistic dysfunction, the Imitation Deficit/Mirror Neuron Theory (IM/MNT), does not emphasize cognitive mechanisms but links behaviour directly to neurological structures. The basic idea was that failure to develop a theory of mind could be accounted for by the lack of the ability to imitate others (Rogers and Pennington, 1991). Other aspects of autistic spectrum disorders, such as general social deficits, echolalia in speech, and stereotyped and repetitive behaviour, can also be seen as the result of imitation when it is inappropriate, or imitation failure when it is helpful. This psychological theory has attracted much more interest, however, since the identification of mirror neurons in area F5 of the premotor cortex of monkeys (Gallese et al. 1996; Rizzolatti et al. 1996). Mirror neurons (tellingly labelled 'Monkey see, monkey do' cells (Carey, 1996)) are action-coding neurons that fire when a monkey performs a specific action, and, more remarkably, when the monkey sees the same action performed by another monkey or a human. Other related classes of neurons seem to code goal-directed actions and respond selectively to goal-driven body movements, such as reaching for or manipulating an object (Williams et al., in press). It has been speculated that the part of the monkey cortex containing the mirror neurons that deal with hand actions now subserves speech in humans, acting as a bridge between one's own speech and actions and the utterances and behaviour of others (Rizzolatti and Arbib, 1998). The connection between mirror neuron deficits and imitation failure as a theoretical basis for autism has been made by a number of researchers including Ramachandran (undated) and Williams et al. (in press). Whereas the more cognitive theories of autism have found themselves casting about retrospectively for some physical basis in brain processes, IM/MNT has the considerable advantage of a built-in neuropsychological mechanism.

7. The amygdala is made up of the medial nucleus, the lateral/basolateral nuclei, the central nucleus, and the basal nucleus. The amygdala is located in the temporal lobe and appears to act as some kind of way-station between the prefrontal cortex and the limbic system. There is little doubt that the central nucleus of the amygdala plays a crucial role in the process of determining which complex environmental stimuli elicit emotional responses such as fear. Remove the amygdala from a monkey, for example, and it loses its fear of snakes (Kolb and Whishaw, 2003; Damasio, 1994; 2000; Le Doux, 2000). From the amygdala, fibres project to and from the prefrontal cortex and the areas of the brain responsible for the expression of various emotional responses. Damage to the central nucleus (or the lateral/basolateral nuclei, which provide the central nucleus with sensory information) reduces or abolishes a wide range of physiological reactions and emotional behaviours (Carlson, 1998). Morphological anomalies have been reported in the amygdalas of autistic individuals (Bauman and Kemper, 1985), and structural Magnetic Resonance Imaging (sMRI) techniques have also found reduced amygdala volumes in high-functioning autistic individuals compared with normal participants (Abell et al. 1999, cited in Adolphs et al. 2001). Surgical lesions in the amygdala of monkeys have been claimed to produce behaviour similar to that of autistic children (Bachevalier, 1991), suggesting that abnormal functioning of this area may account for some of the symptoms commonly found in autistic individuals.


Support for the amygdala hypothesis has been sought by looking at participants with amygdala lesions and with functional imaging of the amygdala in normal individuals (Adolphs, Sears and Piven 2001; Baron-Cohen, Ring, Bullmore, Wheelwright, Ashwin and Williams, 2000; Adolphs, Tranel, Damasio and Damasio, 1994; Morris et al. 1996). Studies such as those cited have generally been taken to support the idea that the amygdala has an important function in the recognition of emotion from facial expressions and may be involved in complex social judgements such as judging the trustworthiness of a person. However, a more recent study by Amaral and Corbett (in press) has found that surgical lesions in monkeys more precisely confined to the amygdala than those described by Bachevalier (1991), and specific damage to the amygdala in humans (cases SM and HM), do not produce the abnormal social behaviour found in autistic individuals.

8. Weak Central Coherence Theory was briefly associated with a bold claim by Marshall and Fink that the right hemisphere of the brain operates in global processing mode, while the left hemisphere is specialised for local processing. This claim was apparently supported by functional imaging data generated during an object recognition task. Unfortunately a later attempt to reproduce these findings using letter recognition produced exactly the reverse pattern of results to those predicted by the theory (Fink, Marshall, Halligan, Frith, Frackowiak and Dolan, 1997). Subsequent work within this framework has generally taken the view that weak central coherence results from the nature of the computational processes underlying cognition, rather than being associable with particular cortical structures (de Carvalho, Ferreira and Fiszman, 1999; Thagard and Verbeurgt, 1998; O'Loughlin and Thagard, 2003). This immediately presents the difficulty of explaining why deficits are restricted to particular domains such as social cognition. One option is to claim that social cognition is particularly dependent upon global processing, but there is little theoretical justification for this view. At best, it might be expected that social information processing would be only selectively impaired, and that dysfunction in non-social cognition would also be present in tasks requiring global processing. However, this would predict a more distributed pattern of deficits than those so far reported in autistic individuals. Another option would be to argue that autism is associated with a dual deficit: a mindblindness problem and a central coherence problem. This is somewhat unparsimonious and requires strong evidence that neural computation operates differently in autistic individuals and in normals. Such evidence is not yet available.

9. Emotional disorder and vulnerability to sensory over-stimulation receive little emphasis in the recent academic literature on autism, which has focused almost exclusively on sociocognitive issues such as intentions and beliefs. This is in marked contrast to the emphasis found in reports by autism sufferers themselves or produced by practitioners and carer support networks. A web search carried out on 16th June 2003, using the search terms autism + over-stimulation on the Google search engine, generated 1003 items, few or none of which were from academic sources, and most of which were from individuals or agencies concerned with the home or clinical management of autism.


Appendix 1


References
Abell, F., M. Krams, J. Ashburner, K. Friston, R. Frackowiak, F. Happé, C. Frith & U. Frith (1999). The Neuroanatomy Of Autism: A Voxel-Based Whole Brain Analysis of Structural Scans. NeuroReport 10, 1647–1651.
Abelson, R. (1963). Computer simulation of hot cognition. In S. Tomkins & S. S. Messick (Eds.), Computer Simulation of Personality, pp. 277–98. New York: Wiley.
Adolphs, R., D. Tranel, H. Damasio & A. Damasio (1994). Impaired Recognition Of Emotion In Facial Expressions Following Bilateral Damage To The Human Amygdala. The Journal Of Neuroscience 15, 5879–5892.
Adolphs, R., L. Sears & J. Piven (2001). Abnormal Processing Of Social Information From Faces In Autism. Journal of Cognitive Neuroscience 13(2), 232–240.
Amaral, D. G. & B. A. Corbett (in press). The Amygdala, Autism and Anxiety. In M. Rutter (Chair), Autism: neural basis and treatment possibilities, Novartis Foundation Symposium 251. New York: Wiley. Web version available at: http://psych.colorado.edu/~munakata/csh/Novartis_paper_6-12-02.doc (accessed 14th June 2003).
Alcade, C., J. I. Navarro, E. Marchena & G. Ruiz (1998). Acquisition of basic concepts by children with intellectual disabilities using a computer-assisted learning approach. Psychology Reports 82(3 Pt 1), 1051–6.
American Psychiatric Association (1994). Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition, DSM-IV). Washington, D.C.: American Psychiatric Association.
Argyle, M. (1994). Bodily Communication (2nd edition). London: Routledge.
Argyle, M. & M. Cook (1976). Gaze and Mutual Gaze. London: Cambridge University Press.
Bachevalier, J. (1991). An Animal Model For Childhood Autism. In C. A. Taminga & S. C. Schulz (Eds.), Advances in Neuropsychiatry and Psychopharmacology, Volume 1: Schizophrenia Research. New York: Raven Press.
Baddeley, A. D. (1990). Human memory: Theory and practice. Oxford: Oxford University Press.
Baddeley, A. D. (1986). Working memory. Oxford: Clarendon Press.
Badner, J. & E. Gershon (2002). Regional meta-analysis of published data supports linkage of autism with markers on chromosome 7. Molecular Psychiatry 7, 56–66.
Bailey, A., A. Le Couteur, I. Gottesman, P. Bolton, E. Simonoff, E. Yuzda & M. Rutter (1995). Autism as a strongly genetic disorder: evidence from a British twin study. Psychological Medicine 25(1), 63–77.
Bailey, A., S. Palferman, L. Heavey & A. Le Couteur (1998). Autism: The phenotype in relatives. Journal of Autism and Developmental Disorders 2, 369–392.
Baron-Cohen, S. (1994). How To Build A Baby That Can Read Minds. Cahiers de Psychologie Cognitive 13, 513–552.
Baron-Cohen, S. (1995a). Mindblindness. Cambridge, MA: MIT Press.
Baron-Cohen, S. (2002). The extreme male brain theory of autism. Trends in Cognitive Sciences 6(6), 248–254.
Baron-Cohen, S. (2003). The Essential Difference: Men, Women and the Extreme Male Brain. London: Allen Lane, The Penguin Press.


Baron-Cohen, S., H. Tager-Flusberg & D. J. Cohen (Eds.) (2000). Understanding Other Minds: Perspectives From Developmental Cognitive Neuroscience. Oxford: Oxford University Press.
Baron-Cohen, S., H. A. Ring, E. T. Bullmore, S. Wheelwright, C. Ashwin & S. C. R. Williams (2000). The Amygdala Theory Of Autism. Neuroscience & Biobehavioural Reviews 24, 355–364.
Bauman, M. & T. L. Kemper (1985). Histoanatomic Observations of the Brain in Early Infantile Autism. Neurology 35, 866–874.
Bemporad, J. R., J. J. Ratey & G. O'Driscoll (1987). Autism and Emotion: An ethological theory. American Journal of Orthopsychiatry 57(4), 477–485.
Bettelheim, B. (1967). The Empty Fortress: Infantile autism and the birth of the self. New York: The Free Press.
Bishop, D. V. M. (1989). Autism, Asperger's syndrome and semantic-pragmatic disorder: Where are the boundaries? British Journal of Disorders of Communication 24, 107–121.
Buck, R. (1984). The Communication of Emotion. New York: Guildford Press.
Carey, D. P. (1996). 'Monkey see, monkey do' cells. Current Biology 6, 1087–88.
Carlson, N. R. (1998). Physiology of Behaviour (6th Edition). Boston, MA: Allyn & Bacon.
Celani, G., M. W. Battacchi & L. Arcidiacono (1999). The Understanding of the Emotional Meaning of Facial Expressions in People with Autism. Journal of Autism and Development Disorders 29(1), 57–65.
Chen, S. H. & V. Bernard-Opitz (1993). Comparison of personal and computer-assisted instruction for children with autism. Mental Retardation 31(6), 368–76.
Clark, A. (2000). Mindware: an introduction to the philosophy of cognitive science. Oxford: Oxford University Press.
Critchley, H. D., E. M. Daly, E. T. Bullmore, S. C. Williams, T. van Amelsvoort, D. M. Robertson, A. Rowe, M. Philips, G. McAlonan, P. Howlin & D. G. Murphy (2000). The functional neuroanatomy of social behaviour: Changes in cerebral blood flow when people with autistic disorder process facial expressions. Brain 123(Pt 11), 2203–12.
Damasio, A. R. (2000). A second chance for emotion. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion, pp. 12–23. New York: Oxford University Press.
Damasio, A. R. (1994). Descartes' Error: Emotion, Reason, and the Human Brain. New York: Putnam.
Darwin, C. (1872). The Expression of the Emotions in Man and Animals. London: Murray. (Cited in M. Dennis, L. Lockyer & A. L. Lazenby (2000). How High-Functioning Children With Autism Understand Real And Deceptive Emotion. Autism 4(4), 370–381.)
Dautenhahn, K., I. Werry, T. Salter & R. te Boekhorst (2003). Towards Adaptive Autonomous robots in Autism Therapy. IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA'03), Kobe, Japan, July 2003.
de Carvalho, L. A. V., N. de C. Ferreira & A. Fiszman (1999). A neurocomputational model for autism. Proceedings of the IV Brazilian Conference on Neural Networks, IV Congresso Brasileiro de Redes Neurais, July 20–22, 1999, ITA, São José dos Campos, SP, Brazil, pp. 344–349.
Decety, J., T. Chaminade, J. Grèzes & A. N. Meltzoff (2002). A PET exploration of the neural mechanism involved in reciprocal imitation. NeuroImage 15, 265–272.


Dennis, M., L. Lockyer & A. L. Lazenby (2000). How High-Functioning Children With Autism Understand Real And Deceptive Emotion. Autism 4(4), 370–381.
Dickerson, P., J. Rae, P. Stribling, K. Dautenhahn, B. Ogden & I. Werry (in press). Autistic children's co-ordination of gaze and talk: re-examining the asocial autist. In P. Seedhouse & K. Richards (Eds.), Applying Conversation Analysis. London: Palgrave Macmillan.
Duncan, J. (1986). Disorganization of behaviour after frontal lobe damage. Cognitive Neuropsychology 3, 271–290.
Ekman, P. & W. V. Friessen (1978). Pictures of Facial Affect. Palo Alto, CA: California Consulting Psychological Press.
El Ashegh, H. A. & R. Lindsay (this volume). Cognition and Body Image, pp. 175–223.
Fink, G., J. C. Marshall, P. W. Halligan, C. D. Frith, R. S. Frackowiak & R. J. Dolan (1997). Hemispheric specialization for global and local processing: the effect of stimulus category. Proceedings of the Royal Society of London, B Biological Sciences 264(1381), 487–94.
Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Frith, U. (1985). Recent Experiments On Autistic Children's Cognitive And Social Skills. Communication 19, 16–23.
Frith, U. (2001). Mindblindness and the Brain in Autism. Neuron 32, 969–979.
Frith, U. & F. Happé (1994). Autism: beyond theory of mind. Cognition 50, 115–132.
Frye, D., P. D. Zelazo & T. Palfai (1995). Theory of mind and rule-based reasoning. Cognitive Development 10, 483–527.
Frye, D., P. D. Zelazo, P. J. Brooks & M. C. Samuels (1996). Inference and action in early causal reasoning. Developmental Psychology 32, 120–31.
Fuster, J. M. (1989). The prefrontal cortex, anatomy, physiology and neuropsychology of the frontal lobe (2nd edition). New York: Raven Press.
Gallese, V., L. Fadiga, L. Fogassi & G. Rizzolatti (1996). Action recognition in the premotor cortex. Brain 119, 593–609.
Gallese, V. & A. Goldman (1998). Mirror neurons and the simulation theory of mindreading. Trends in Cognitive Science 2(12), 493–502.
Gillott, A., F. Furniss & A. Walter (2001). Anxiety in high-functioning children with autism. Autism 5, 277–286.
Gillingham, G. (2000). Autism: a new understanding! Solving the mystery of autism, Asperger's and PPP-NOS. Edmonton, Alberta: Tacit Publishing.
Happé, F. (1996). Studying weak central coherence at low levels: children with autism do not succumb to visual illusions. A research note. Journal of Child Psychology and Psychiatry 37, 873–877.
Happé, F. (1999). Autism: Cognitive Deficit or Cognitive Style? Trends in Cognitive Neurosciences 3(6), 216–222.
Hobson, R. P. (1986a). The Autistic Child's Appraisal of Expressions of Emotion. Journal of Child Psychology and Psychiatry 27, 671–680.
Hobson, R. P. (1986b). The Autistic Child's Appraisal of Expressions of Emotion: A Further Study. Journal of Autism and Development Disorders 17, 63–79.


Hobson, R. P. & A. Lee (1989). Emotion-Related and Abstract Concepts in Autistic People: Evidence From the British Picture Scale. Journal of Autism and Development Disorders 19(4), 601–623.
Hobson, R. P., J. Ouston & A. Lee (1988). What's in a Face? The Case of Autism. British Journal of Psychology 79, 441–453.
Hood, B. H., J. D. Willen & J. Driver (1998). Adult's Eyes Trigger Shifts of Visual Attention in Human Infants. Psychological Science 9(2), 131–134.
Hughes, C., J. Russell & T. W. Robbins (1994). Evidence for executive dysfunction in autism. Neuropsychologia 32, 477–92.
Huttinger, P. (1996). Computer applications in programs for young children with disabilities: Recurring themes. Focus on Autism and Other Developmental Disabilities 11, 105–124.
Kanner, L. (1943). Autistic Disturbances of Affective Contact. Nervous Child 2, 217–250. Reprinted in L. Kanner (Ed.), Childhood Psychosis: Initial Studies and New Insights. Washington, D.C.: V. H. Winston, 1973. Also reprinted in A. M. Donnellan (Ed.), Classic Readings in Autism. New York: Teachers College Press, 1985.
Kawashima, R., M. Sugiura, T. Kato, A. Nakamura, K. Hatano, K. Ito, H. Fukuda, S. Kojima & K. Nakamura (1999). The human Amygdala plays an important role in Gaze Monitoring: A PET Study. Brain 122, 217–250.
Keeley, B. L. (2002). Eye-gaze information processing theory: a case-study in primate cognitive neuroethology. In M. Bekoff, C. Allen & G. Burkhardt (Eds.), The Cognitive Animal: Empirical and Theoretical Perspectives on Animal Cognition. Cambridge, MA: MIT Press. Web document version found at: http://bernard.pitzer.edu/~bkeeley/WORK/PUBS/coganimal.pdf (accessed 15th June 2003).
Kendon, A. (1967). Some Functions of Gaze Direction in Social Interaction. Acta Psychologica 32, 1–25.
Kolb, B. & I. Q. Whishaw (2003). Fundamentals of Human Neuropsychology (Fifth Edition). New York: Worth Publishers.
Langdell, T. (1978). Recognition of Faces: An Approach to the Study of Autism. Journal of Child Psychology and Psychiatry 19, 225–268.
Le Doux, J. E. (2000). Cognitive-emotional interactions. In R. D. Lane & L. Nadel (Eds.), Cognitive neuroscience of emotion, pp. 129–55. New York: Oxford University Press.
Lenneberg, E. H. (1967). Biological foundation of language. New York: Wiley & Sons.
Lindsay, R. & B. Gorayska (2002). Relevance, goal management and cognitive technology. International Journal of Cognition and Technology 1(2), 187–232. Reprinted in this volume, pp. 63–107.
Loveland, K. A., B. Tunali-Kotoski, D. A. Pearson, K. A. Brelsfeld, J. Ortegon & R. Chen (1989). Imitation and Expression of Facial Affect in Autism. Development and Psychopathology 6, 433–44.
Luria, A. R. (1966). Higher cortical functions in man. New York: Basic Books.
Meenan, S. & R. Lindsay (2002). Planning and the neurotechnology of social Behaviour. International Journal of Cognition and Technology 1(2), 233–74.
Moore, M. & S. Calvert (2000). Brief report: vocabulary acquisition for children with autism: teacher or computer instruction. Journal of Autism and Developmental Disorders 30(4), 359–62.


Morris, J. S., C. D. Frith, D. I. Perrett, D. Rowland, A. W. Young & A. J. Calder (1996). A Differential Neural Response in the Human Amygdala to Fearful and Happy Facial Expressions. Nature 383, 812–815.
Muris, P., P. Steerneman, H. Merckelbach, I. Holdrinet & C. Meesters (1998). Comorbid anxiety symptoms in children with pervasive developmental disorders. Journal of Anxiety Disorders 12, 387–393.
O'Connor, N. & B. Hermelin (1967). The selective visual attention of autistic children. Journal of Child Psychology and Psychiatry 8, 167–79.
O'Loughlin, C. & P. Thagard (2003). Autism and Coherence: a computational model. http://cogsci.uwaterloo.ca/Articles/Pages/autism.pdf (accessed 14th June 2003).
Ozonoff, S., B. F. Pennington & S. J. Rogers (1990). Are There Emotion Perception Deficits in Young Autistic Children? Journal of Child Psychology and Psychiatry 31(3), 343–361.
Premack, D. & G. Woodruff (1978). Does the chimpanzee have a theory of mind? Behavioural and Brain Sciences 1, 515–526.
Prior, M. R. & W. Hoffmann (1990). Neuropsychological testing of autistic children through an Exploration with frontal lobe tests. Journal of Autism and Developmental Disorders 20, 581–590.
Ramachandran, V. S. (undated). Mirror neurons and imitation learning as the driving force behind the great leap forward in human evolution. Web document found at: http://www.edge.org/3rd_culture/ramachandran/ramachandran_p1.html (accessed 10th June 2003).
Raven, J. C. (1956). Standard Progressive Matrices. London: H. K. Lewis.
Richer, J. M. & R. G. Coss (1976). Gaze Aversion in Autistic And Normal Children. Acta Psychiatrica Scandinavia 53, 193–210.
Rimland, R. (1964). Infantile Autism: The Syndrome and Its Implications for a Neural Theory of Behavior. New York: Appleton-Century-Crofts.
Rizzolatti, G. & M. A. Arbib (1998). Language within our grasp. Trends in Neuroscience 21, 188–94.
Rizzolatti, G., L. Fadiga, L. Matelli, M. Bertinardi, E. Paulesu, D. Perani & F. Fazio (1996). Localisation of grasp representations in humans by PET: 1. Observation vs. execution. Experimental Brain Research 111, 246–52.
Robbins, T. W. (1996). Dissociating executive functions of the prefrontal cortex. [Review]. Philosophical Transactions of the Royal Society of London, Biological Sciences 351, 1463–71.
Rogers, S. J. & B. F. Pennington (1991). A theoretical approach to the deficits in Infantile Autism. Developmental Psychopathology 3, 137–62.
Rumsey, J. M. & S. D. Hamburger (1988). Neuropsychological findings in high-functioning autistic men with infantile autism, residual state. Journal of Clinical and Experimental Neuropsychology 10, 201–221.
Rutter, M. & E. Schopler (1987). Autism and pervasive developmental disorders: concepts and diagnostic issues. Journal of Autism and Developmental Disorders 17, 159–186.
Shallice, T. (1998). From Neuropsychology to Mental Structure. Cambridge: CUP.
Shallice, T. & P. Burgess (1996). The domain of supervisory processes and temporal organization of behaviour. Philosophical Transactions of the Royal Society of London, Biological Sciences 351, 1405–12.


Schopler, E. (1985). Editorial: Convergence of learning disability, higher-level autism, and Asperger's syndrome. Journal of Autism and Developmental Disorders 15, 359.
Tantam, D., L. Monahan, H. Nicolson & J. Stirling (1989). Autistic Children's Ability to Interpret Faces: A Research Note. Neuropsychologia 5, 757–768.
Thagard, P. & K. Verbeurgt (1998). Coherence as constraint satisfaction. Cognitive Science 22, 1–24.
Vertegaal, R., R. Slagter, G. C. Van der Veer & A. Nijholt (2000). Why Conversational Agents Should Catch the Eye. In Extended Abstracts of CHI 2000. The Hague, The Netherlands: ACM 2000, 257–258.
Volkmar, F. R. & L. C. Mayes (1990). Gaze behaviour in autism. Development and Psychopathology 2, 61–69.
Weekes, S. J. & R. P. Hobson (1987). The Salience of Facial Expression for Autistic Children. Journal of Child Psychology and Psychiatry 28, 137–152.
Williams, J. H. G., A. Whiten, T. Suddendorf & D. I. Perrett (in press). Imitation, mirror neurons and autism. Neuroscience and Biobehavioral Reviews.
Wing, L. (1988). The continuum of autistic characteristics. In E. Schopler & G. B. Mesibov (Eds.), Diagnosis and Assessment in Autism, pp. 91–110. New York: Plenum.
Wing, L. & J. Gould (1979). Severe impairments of social interaction and associated abnormalities in children: Epidemiology and classification. Journal of Autism and Developmental Disorders 9, 11–29.
Wing, L. & D. Potter (2002). Mental Retardation and Developmental Disabilities Research Review 8(3), 51–61.

Communicating sequential activities


An investigation into the modelling of collaborative action for system design
Marina Jirotka and Paul Luff
University of Oxford / King's College London

1. Introduction

Recently a number of critiques of Human-Computer Interaction (HCI) have emerged that are directed not only at its primary focus, the individual user of a computer system, but also at its conceptual underpinnings, principally drawn from cognitive science. Thus, researchers have begun to consider broader topics of interest, such as how technology can and does support collaborative and communicative activities, and have suggested developments of the conceptual framework, such as social cognition, distributed cognition and cognitive technology. To differing degrees these developments, like HCI itself, have both theoretical and practical concerns: on the one hand trying to develop systematic analyses of technologies in use, whilst on the other endeavouring to inform the practices, procedures and methods for system design. This interweaving of description, analysis and prescription has been a longstanding concern, whether in developing particular tools, techniques and guidelines for interface design, or in informing more general approaches to system development. It seems a perennial problem to develop prescriptions that are compatible with the underlying conceptions and concerns of the analytic orientation deployed.

In this chapter, we will report on one such investigation. We will draw from a study that utilises one of the orientations employed by those concerned with the analysis of technologies in use: the ethnomethodological orientation (Heath and Luff, 2000). Researchers seeking to draw from this orientation share many concerns with researchers in Cognitive Technology (Gorayska and Mey, 2002). For example, studies are principally concerned with naturalistic settings,


focussing on work activities in context, hence the number of workplace studies informed by this orientation (Lueg, 2002, reprinted in this volume, pp. 225–239; Kaushik et al., 2002). Studies are also concerned with analysing social interaction, communication and use of talk in context (Dautenhahn, 2002, reprinted in this volume, pp. 128–152; Dascal, 2002, reprinted in this volume, pp. 37–62). Ethnomethodological studies tend to explore the communicative and collaborative uses of technologies, not only within particular naturalistic settings but with respect to specific innovative collaborative technologies, hence their popularity within Computer-Supported Collaborative Work (CSCW). Researchers drawing from this orientation have undertaken ethnographic studies, drawing upon materials gathered in the field, such as field notes and audio and video recordings, to try to understand the complexities of the everyday actions through which technologies are made sense of, and how collaborative actions are coordinated through such systems (e.g., Heath and Luff, 2000). Drawing upon these understandings, there inevitably has also been a concern to develop consequences for the design and deployment of technologies, whether through simple implications for particular technologies or as general guidelines for system development (cf. Plowman et al., 1995). However, while being concerned that the critical features of the analytic orientation can be maintained, it is difficult to suggest methods that do not divert attention from the features that make that orientation distinctive. What seems to be characteristic of workplace studies are the details of everyday work and interaction that are revealed through them, particularly with respect to the ways in which these actions are situated, contingently produced, and shaped from moment to moment by the participants. However, it can be hard to direct attention to such details when they are cast as frameworks, guidelines and other techniques for the designer. Such methods can appear much like those suggested elsewhere for system design, methods that themselves have not met with undue success when deployed (cf. Bellotti, 1988). Researchers have thus sought to explore in more systematic fashion the relationship between studies of technologies in use and the derived implications for design. By considering the relationship in terms of a communication between ethnographers and designers, attention can be paid to the different interests of those concerned. So, proposals have been made for presenting data and materials gathered in fieldwork (Pycock et al., 1998), for representing and summarising the results of studies (Hughes et al., 2000) and for outlining patterns found in previous studies (Martin et al., 2001). In these endeavours there is a concern for the representation of data, but also for remaining sensitive to the


underlying analysis. One recurring resource for facilitating the communication of an analysis is through modelling. Designers may build models in order to increase understanding of a domain or a system that can then be read by others and later discarded. However, models can provide more than simply being a communication device; operations may be performed on models that provide resources for reasoning about the data being modelled. This ability to reason about a model suggests one way in which the work reported here is related to recent initiatives in Cognitive Technology. The models should form a resource for designers, supporting their activities. The models being artefacts and tools, in a similar way to which Dascal (2002, reprinted in this volume) considers language more generally, as a cognitive technology. More importantly perhaps, the work has common concerns to those of researchers in Cognitive Technology regarding the uses of everyday tools and objects by participants in their settings. These may be conventional artefacts such as paper documents, or more complex devices, like computer systems. In ethnographic studies in CSCW and Cognitive Technology, although there may be dierent ways of conceiving of the users activities, there is a common interest in investigating the skills and practices through which these objects are made sense of and produce recognisable actions in the domain (Gorayska and Mey 2002, reprinted in this volume). There is a third way in which this research has shared concerns with those in Cognitive Technology. Alongside studies of communicative and collaborative activities in a number of settings, researchers an CSCW have been involved (or studied) a range of technical interventions, where either systems have been deployed, prototype technologies have been experimented with or models, patterns or schema of system use have been formulated. 
These interventions have the eventual aim of improving either the kinds of technologies that could be used in everyday settings or the methods and approaches through which those technologies are designed. However, they frequently serve another purpose. By undertaking them, researchers can develop their own understandings of human conduct (cf. Pfeifer, 2002, reprinted in this volume). In the case at hand, in common with the concerns of Cognitive Technology, we are interested in exploring how, through attempts at modelling communication and collaboration, we might enhance our understanding of social conduct.

We begin by drawing upon our previous research (Jirotka, 2000; Jirotka and Luff, 2001; Luff and Jirotka, 1998) and model aspects of social interaction where the structure of the modelling notation resonates with the purpose to which it is being put; namely, developing models of complex collaborative activities. In order to do this we conduct an investigative exercise drawing upon a particular notation: Communicating Sequential Processes, or CSP (Hoare, 1985), designed initially for describing concurrent systems and reasoning about parallel and concurrent activities. CSP, as a process language, provides powerful resources to model many of the features identified in previous studies of collaborative work (Luff and Jirotka, 1998; Jirotka, 2000). We discuss the appropriateness of this notation for presenting an analysis of situated, sequential and collaborative activities by drawing on illustrative examples from a workplace study undertaken in a financial dealing room.

We then go on to investigate how such an approach could provide tools for reasoning about the consequences of deploying technologies in the workplace. We discuss a potential technological intervention in the trading floor that has consequences for the ways in which information is communicated and distributed around the dealing room. We draw upon the models of trading activities built up previously to speculate how they could be used to reason about the consequences of the deployment of this technology, and the possible different impacts on existing practices. We draw from this investigation to discuss the general implications for developing prescriptions for design from analysis of naturalistic materials.

2. Background: Developing ethnomethodological analyses and sequential analyses for system design

Workplace Studies, drawing upon a wide range of orientations, are concerned with detailing the practices through which everyday work and interaction are accomplished (e.g., Engeström and Middleton, 1996; Heath et al., 2000; Luff et al., 2000; Plowman et al., 1995). Due in part to their concerns with the situated use of technologies, both mundane and complex, and in part to influential critiques of HCI (e.g., Suchman, 1987), those workplace studies drawing on an ethnomethodological orientation have seemed particularly prominent in fields such as Computer-Supported Cooperative Work (CSCW), fields where there is a parallel interest in designing and developing new technologies (Harper, 2000; Button and Sharrock, 2000; Suchman, 2000). One recurring concern in these has been the collaborative resources that are utilised both to understand and to produce intelligible actions and activities through everyday artefacts in the workplace. Although the artefacts that feature in these studies are varied, including whiteboards (Goodwin and Goodwin, 1996), electromechanical displays (Heath and Luff, 1992) and CCTV screens (Luff et al., 2000), given the nature of many of the workplaces under investigation there has inevitably been a focus on the use of documents, whether these are held and presented on paper or electronically. Examples are the uses of paper flight strips in air traffic control (Harper and Hughes, 1993), of worksheets in printing presses (Bowers and Button, 1995), of paper tickets in financial trading rooms (Heath et al., 1994), of computer systems for telephone call takers (Whalen and Vinkhuyzen, 2000) and of scheduling systems to manage passenger transport (Luff and Heath, 2000).

Many of the concerns surrounding these practices share much with studies within Cognitive Technology, particularly those that focus on the communicative and interactional practices surrounding the uses of technology. For example, Goodwin and Goodwin (1996) explore the ways that whiteboards are used in control rooms as a shared communicative resource to make sense both of colleagues' utterances and of their other conduct. In a quite different domain, Greatbatch et al. (1993) reveal how the interaction between doctor and patient is co-ordinated with activities on a computer system. Whalen and Vinkhuyzen (2000) outline, in a number of call centre settings, the ways the information displayed on the computer screen is used as a resource within talk with the remote party on the phone, how text typed into the system is co-ordinated with the telephone contributions of the co-interactant, and how procedures specified by the system are transformed for the practical circumstances that the participants face.
What is distinctive about these studies is that they build upon a social scientific analytic framework that emerged within the ethnomethodological orientation, namely conversation and interaction analysis, and explore the details of how talk, bodily conduct and the use of material resources accomplish social activities (Sacks, 1992; Atkinson and Heritage, 1984). In particular, they develop a sequential analysis of the coordination of talk that emerged from the initial studies by Sacks et al. (1974). Such analyses draw on participants' own understandings as they emerge turn-by-turn, and provide for a powerful and detailed analysis of the moment-to-moment production and coordination of collaborative activities.

Conversation and interaction analyses draw extensively on the sequential analyses of talk and conduct. Through this sequential analysis, researchers show how one turn of talk (which could be several utterances, a single utterance or merely a word or vocalisation) displays a participant's understanding of a co-interactant's prior utterance whilst also providing for its intelligibility to be displayed in the next. Turns therefore provide a public resource for displaying and producing understandings, available to both participants and analysts. Indeed, the organisation of turns in a sequence requires participants (and therefore other analysts) to monitor their production at all points throughout their course (Sacks et al., 1974).1 As might be gathered, conversation and interaction analyses draw on naturalistic materials: audio recordings of talk, for example, or in the case of workplace studies, more typically video recordings of everyday conduct.

These workplace studies delineate a conception of conduct in everyday settings as collaborative, of collaboration as being principally interactional, and of interactions as being produced and recognised through a sequential organisation. As with other workplace studies drawing on the ethnomethodological orientation, there has been an interest in drawing upon these detailed analyses not only to develop understandings of technologies in use but also to consider the implications of the analyses for proposed technologies and systems. A critical issue for these efforts lies in considering ways of transforming the ethnographic material that preserve the richness of the ethnographic account whilst being sensitive to the practices of engineers and designers of proposed technologies. After all, if such analyses are to be useful within the design process, they too must be presented in ways that resonate with the practices of designers and engineers. In this regard, researchers have considered the questions that engineers and designers are likely to ask of ethnographers (Randall et al., 1994), 'quick and dirty' methods for design (Hughes et al., 1994) and notational and representational approaches for presenting analyses (Hughes et al., 2000).

One particular software engineering activity has been the focus of much interest: that of modelling. Throughout the development process, software engineers are encouraged to develop models of particular processes, mechanisms and solutions.
These models are perceived not only as aids for communication between designers and others involved in the process, but also as aids for clarification, as particular issues can be focused on, abstracted and unpacked. Though system developers may not explicitly use a formal or structural approach to system design, they frequently find the need to focus on aspects of a problem, develop solutions that are generic to more than one case and utilise notations to communicate issues and potential solutions to others (Booch, 1991). Given the complex details of social conduct revealed in workplace studies, it is therefore not surprising that some researchers have proposed modelling aspects of the social and organisational context in which technologies are to be deployed. In this way it is hoped that it may be possible to communicate the details of ethnographic studies in forms and representations that are familiar to system designers.

However, whilst models can be developed as such communication devices, they also have the potential for reasoning about the data being modelled. Previous investigations into modelling the details of work practices have tended to focus on issues of representation as communication (Viller and Sommerville, 1999). This may be due to the complexity of the models developed, but it would seem to neglect one of the critical uses of the models that have been delineated.

Nonetheless, the task of developing descriptive models of activities revealed by naturalistic analyses is fraught with difficulties. Recent attempts have sought to provide frameworks for describing, representing and communicating the social and organisational aspects of work, the contingent and the tacit, the collaborative and the situated. By their very nature these are hard to delineate, formalise and represent. The kinds of activities to be conveyed are produced through informal, interpretative and interactional practices. The challenge in modelling aspects of social conduct lies in representing the very details of collaboration, coordination and interaction that these workplace studies reveal. This may be particularly apparent if the principal purpose of the model is communication with designers.

It may be that modelling is not, in itself, incompatible with social scientific studies, even though previous attempts in HCI have been criticised for glossing the very phenomena revealed through the analysis (Button, 1990). However, if the analyses are to have a more significant practical purpose, it may be that the models have to be more useful to designers. Instead of attempting to simplify the ethnographic information into models as communicative devices for system designers, perhaps we might view the development of models as resources for reasoning about technological intervention in social and collaborative settings. This chapter investigates the use of a modelling notation that provides such capabilities.
In its structure, Communicating Sequential Processes (CSP) (Hoare, 1985) incorporates powerful techniques for expressing issues raised by ethnomethodological studies, particularly those that detail the interactional and sequential production of collaborative activities. Sequentiality is an integral part of CSP: inherent in its structure are powerful resources for describing sequential processes. This is not to say that other notations cannot describe or represent such aspects. However, when notational structures do not adequately support the description of relevant phenomena, it is often the case that additional structures must be artificially constructed, and these may obscure the phenomena that are being described. CSP seems to provide a more direct notation for the particular aspects of collaborative conduct of concern to us here.

CSP also allows the development of, and reasoning about, processes that interact with each other and the environment. The notation provides for various mathematical models and reasoning methods that offer analysts resources to reason about possible states and the consequences of various combinations of states occurring. It takes as critical the notion of a process, whose behaviour is described by a sequence of events in which it might engage. A process may be put in parallel or be interleaved with other processes so that various combinations of events can evolve. Thus, CSP is concerned with sequentiality, and with parallel and interleaved activities. It may be, then, that CSP could provide a way of elucidating particular aspects of social and organisational activities.

In the remainder of this chapter we wish to investigate this possibility. We will draw upon illustrative examples from prior analyses of collaborative work practice in financial trading rooms. We consider how aspects of activities-in-interaction may be characterised in CSP, what may be modelled, and what the consequences of doing so would be. Based on this preliminary investigation, we will discuss the potential for the development of such models as a resource for reasoning about technological intervention in workplace settings.

3. Modelling collaborative work practice

The development of CSP involved a movement away from considering computer processes as linear and distinct from the environment around them, to systems where events can happen in parallel and interact with other events (Hoare, 1985). Thus, in concurrent systems, 'whilst the various components are in (more or less) independent states (and) it is necessary to understand which combination of states can arise and the consequences of each' (Roscoe, 1998, p. 2). The relevance of this approach is perhaps most apparent when considering computer communications and distributed systems, where various processes can occur in different places and at different times. In these cases it is important to develop a model of concurrent processes in order to reason about the consistency between them and whether any potential communication breakdowns could occur.2

As CSP allows reasoning about processes that interact with one another and their environment, the most fundamental object in CSP is a communication event. Communication in CSP has been defined as 'a transaction or synchronisation between two or more processes rather than as necessarily being the transmission of data one way' (Roscoe, 1998, pp. 8-9). Thus, communication events are considered as events or actions that can describe patterns of behaviour regarding objects in the world. The set of events viewed as relevant in the description of an object is termed its alphabet. In CSP a process is the behaviour pattern of an object described by the set of events selected in its alphabet.


In the following, we consider the use of CSP notation for building models of collaborative and communicative activities. For this we will draw on our previous analyses of a particularly complex setting, financial dealing rooms (Heath et al., 1994-5; Jirotka et al., 1993). These are highly sophisticated settings, characterised as fast moving and as relying on complex collaborations between participants. Hence, they seem particularly challenging when considering the development of models of collaborative activities.3

The sequential analyses of different domains of financial trading reveal the highly complex and collaborative nature of dealing, characterised by very frequent (often brief) deals undertaken in parallel and interrelated with other activities. Traders can be engaged in more than one activity at a time with various co-participants, and can participate in these activities in various ways, from being a (counter-) party to a deal to simply overhearing that a deal has been struck by another dealer. It is also apparent that the ways activities are interleaved are complex. Rather than wait until a deal is agreed to record the necessary information (called 'deal capture'), dealers frequently record items of the deal as they emerge in the interaction.4

For the purposes of this investigation we do not attempt to build up a complete model of the setting in CSP. Rather, we wish to examine the issues that emerge when considering modelling socio-interactional resources using this notation for the purposes of informing design. In this exercise, we shall use simple illustrative examples of fundamental concepts in CSP such as parallelism.5

3.1 Making a deal

To begin, we consider a fragment of activity from the dealing room that reveals the critical features of a deal. Tom, a market maker, is doing a deal with Miles, a salesman, through an intercom system.
BEEP
Miles: Hello
Tom: Hello Miles
Miles: I want to sell five o seven five Shell, five o seven five Shell for (Kleinwort) whatever you can do Tom
Tom: Eh four eighty three and a half
Miles: That's very kind of you. Three one six. Thank you sir.
BEEP

Very briefly, in this fragment, Miles, a salesman, calls Tom to sell 5075 units of Shell stock on behalf of his client Kleinwort, and he asks Tom to quote a price ('whatever you can do Tom'). Tom provides a buying price (4.83 per share), which Miles accepts by giving Tom Kleinwort's customer number ('That's very kind of you. Three one six. Thank you sir'), an utterance that also ends the call. At this point we can see the key features of a deal: the buyer (Tom); the seller (Miles on behalf of Kleinwort); the price (4.83); the shares being traded (Shell); the number of shares (5075); and the customer number (316). These six components of a deal (the stock, the amount of stock, the price, the buyer, the seller and the customer number) are the basic features of interest when trading.

It is clear from the fragment that the dealers have agreed a deal in the course of the communication. However, it is unclear exactly when the deal, or the components of the deal, are agreed, and whether such distinctions in timing are of consequence to the participants. It is critical, however, that the components are conveyed during its course and that on termination these, and the deal, come to be agreed.

In CSP we could define agreeing a deal in terms of two independent processes: one for agreeing a price (PRICE) and one for agreeing the amount (AMOUNT) of stock to be dealt.
PRICE = agreePrice → agreeDeal → STOP
AMOUNT = agreeAmount → agreeDeal → STOP

Here agreePrice and agreeDeal are events in the alphabet of the PRICE process and agreeAmount and agreeDeal are in the alphabet of AMOUNT. Processes can participate in events in sequence (e.g., agreePrice and then agreeDeal) and can then be put together in parallel (denoted by the operator ||). When two processes are brought together to run concurrently, events that are in both their alphabets require the simultaneous participation of both processes. However, each process also can evolve its own events independently. So, if we now put PRICE and AMOUNT in parallel:
PRICE || AMOUNT

we can see that the two processes are synchronised around agreeDeal. As these processes have been put in parallel, we cannot agreeDeal until we agreePrice and agreeAmount. The notation allows us to preserve the indeterminacy of the ordering of these two activities. As detailed in the workplace study, it is important for dealers to be able to agree an amount of stock to trade and to agree the price, but the ordering of these two events may be indeterminate. CSP also provides traces, which are records of the sequence of events that occur, so we can keep a record of communication between the environment and the process. Thus, traces of the above might be


⟨agreePrice, agreeAmount, agreeDeal⟩
⟨agreeAmount, agreePrice, agreeDeal⟩
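
The synchronisation around agreeDeal can also be explored mechanically. The following Python sketch is not part of the original analysis; it is a minimal illustration under the assumption that each process is a plain sequence of events ending in STOP, and the function name parallel_traces is hypothetical. A process may perform an event alone unless that event is in the alphabet of several processes, in which case all of them must be ready for it. The sketch lists only the complete (maximal) traces; in CSP proper, every prefix of these sequences is also a trace.

```python
def parallel_traces(*procs):
    """Enumerate the maximal traces of linear processes run in parallel.

    Each process is a tuple of events, implicitly followed by STOP.
    Events that occur in several alphabets must be performed jointly.
    """
    alphabets = [set(p) for p in procs]

    def step(positions, trace):
        # Events currently offered by processes that have not reached STOP.
        offered = {p[i] for p, i in zip(procs, positions) if i < len(p)}
        progressed = False
        for event in sorted(offered):
            # Every process with this event in its alphabet must be ready.
            owners = [k for k, a in enumerate(alphabets) if event in a]
            ready = all(
                positions[k] < len(procs[k])
                and procs[k][positions[k]] == event
                for k in owners
            )
            if ready:
                nxt = list(positions)
                for k in owners:
                    nxt[k] += 1
                progressed = True
                yield from step(tuple(nxt), trace + (event,))
        if not progressed:
            yield trace  # nothing further can happen

    return list(step((0,) * len(procs), ()))


PRICE = ("agreePrice", "agreeDeal")
AMOUNT = ("agreeAmount", "agreeDeal")

for trace in parallel_traces(PRICE, AMOUNT):
    print(trace)
```

Under these assumptions the sketch yields the two orderings given above, each ending in agreeDeal: the price and the amount may be agreed in either order, but the deal cannot be agreed before both.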

We can further extend the complexity by defining a dealing process in terms of PRICE, AMOUNT and other processes as:
DEAL = PRICE || AMOUNT || STOCK || BUYORSELL || etc.

where each of the processes PRICE, AMOUNT, STOCK and BUYORSELL may be synchronised around agreeDeal; however, it is not specified in what order they are accomplished.

The deal itself may become even more complex when we also consider how the details of the transaction are recorded. Observations of deal production and real-time record-keeping reveal that dealers do not always wait until the deal has been agreed before recording items. Rather, the ways in which components are written down may be tied to the order in which they are produced in the talk. Dealers may record the components of a deal as each item emerges in the talk. However, dealers may also wait until three or four deals have been done before recording items, or they may record four or five deals in an interleaved manner with the deals at different stages of completion. The relationship between recording a deal and agreeing a deal can be further elaborated in the price process outlined above.
PRICE = agreePrice → (recordPrice → agreeDeal → STOP | agreeDeal → recordPrice → STOP)
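
The choice in this definition can be made concrete. The sketch below is an illustration only, written in Python rather than CSP; it assumes that a process can be represented as a nested dictionary mapping each event on offer to the process that follows it, with the empty dictionary standing for STOP. The helper names prefix, choice and maximal_traces are hypothetical.

```python
STOP = {}  # a process that offers no events


def prefix(event, then=STOP):
    """The process 'event -> then'."""
    return {event: then}


def choice(*branches):
    """Offer the first events of all branches to the environment.

    As with the | operator used here, the branches are assumed to
    begin with distinct events.
    """
    merged = {}
    for branch in branches:
        merged.update(branch)
    return merged


def maximal_traces(process, trace=()):
    """List every event sequence that runs the process through to STOP."""
    if not process:
        return [trace]
    found = []
    for event, following in process.items():
        found.extend(maximal_traces(following, trace + (event,)))
    return found


PRICE = prefix("agreePrice", choice(
    prefix("recordPrice", prefix("agreeDeal")),
    prefix("agreeDeal", prefix("recordPrice")),
))

for trace in maximal_traces(PRICE):
    print(trace)
```

The two traces produced differ only in whether recordPrice falls before or after agreeDeal, which is the indeterminacy the definition is meant to preserve.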

Processes may be described as a single stream of behaviour, but they may also be influenced by their interactions with the environment in which they are placed. Thus, CSP provides ways of describing processes that offer a choice of actions in their environment (denoted by the operator |). In the example above, we have further defined the price process as a choice between recording the price and then agreeing a deal, or agreeing the deal and then recording the price.

CSP provides a modular approach to the composition of each of the components of a deal. There is a constant concern with the sequence or ordering of events; for example, when processes are put in parallel, it is of interest what events happen when, as processes are synchronised around particular events. Critically, the order of these events is not determinate. The notation also provides a means of describing processes that offer a choice of actions in the environment. These choice operators may generate a number of traces and combinations of events that can be used as a resource for reasoning about the work setting.

From the materials at hand we can see that the participants agree the components that make up the deal and the overall deal itself, but there is no specific point where agreement happens. Rather, it is constituted by the various actions of the participants. Fundamentally, we are not attempting to model all the details of the ethnographic analysis. Rather, we wish to convey the critical issues of interaction and collaboration so often featured in workplace studies. In our study of the dealing room we wanted to demonstrate that various activities are accomplished, but that the order in which they are done is not necessarily important. Though certain actions can be seen to accomplish an activity like agreeing a deal, the components may be either too hard or impossible to separate out from the accomplishment itself. What appears to be critical from workplace studies is the open and flexible nature of certain activities: they can be accomplished in different orders depending on the local circumstances at hand. Furthermore, there may be many ways of producing various activities, through different utterances or indeed through other types of conduct. CSP allows us to represent in a parsimonious manner issues of non-determinism, choice and parallelism, as these are built into the notation itself. As we shall see later, our interest in this notation lies not only in its ability to represent and describe activities in the workplace, but also in the possibility of being able to reason about a range of possible outcomes.

3.2 Monitoring the local environment

In explicating the details of workplace activities researchers have frequently noted the importance of monitoring or awareness practices in the domains under investigation.
Whether these are in control rooms (Goodwin and Goodwin, 1996; Heath and Luff, 1992; Harper and Hughes, 1993), offices (Rouncefield et al., 1994) or factory floors (Button and Sharrock, 1997), these observations point to the importance of tacit, implicit and informal communicative practices in the achievement of everyday work. They also raise important issues regarding the interweaving of the private and the public, the individual and the shared, and the conceptions of each in CSCW and HCI, and regarding potential technological developments to support activities and interactions in work domains. Such studies have revealed how sensitive participants are to the conduct of others, how participants can design their own actions to be monitored and how even what appear to be crude communicative activities can be tuned to the activities in the local environment.


Some sense of these findings can be observed in the trading room. As might be expected, this is a noisy environment where each trader may be undertaking numerous interlinked activities at any one time. Whilst undertaking their own activities, traders also have to be aware of what their colleagues are doing. Frequently, traders will also shout out loud various offers, bids or other news for their colleagues. These 'outlouds' are usually simple, single utterances shouted into the general milieu, and do not appear to require any particular response from a colleague. They are part of the reason that trading rooms are characterised as noisy and rowdy places. A trader who is aware of these whilst dealing with his or her own activities, like the multiple deals carried out on several phones at once, is known as having a 'third ear', which is seen as critical for undertaking competent trading.

In the following fragment, activities in the trading room are particularly hectic. Tom and Richard are each concluding deals of their own on the phone. Stacey, another trader sitting some way distant, is also on the phone; she turns and shouts out information about the call concerning Hanson shares.
Stacey: Hanson twenty of an eighth (forty by fifteen) Shearson on the bid
Tom: Do you want to hit them?
Richard: Um Yes
Richard: Who's that? Rene?
Stacey: Yes
Richard: Do we want to sell forty
Tom: We want to sell forty five don't we? Give them the lot

Stacey's outloud that Shearson, another market maker in the City, is 'on the bid' announces that they are buying Hanson stock. As she is announcing this, Richard is closing up a conversation on the telephone and looking at the numerous screens in front of him. Tom has also been dealing, but as Stacey's shout is being completed, he turns momentarily and slightly towards Richard and asks whether they should 'hit them' (i.e., sell their Hanson shares to Shearson). After agreeing with Tom, Richard then turns in the direction of Stacey and shouts to find out more precisely who is at the other end of her phone conversation. Tom and Richard then discuss, briefly, how many shares they should sell.

What appears a crude broadcast of some emerging information by Stacey actually invokes a series of delicately produced activities and reveals the ways in which the participants are sensitive to the conduct of their colleagues. For example, Tom's question presupposes that Richard has both heard the utterance and may be prepared to collaborate in selling the stock. Despite Richard making no indication that he has heard Stacey's utterance, the utterance initiates collaboration with him concerning the selling of stock. Richard seems to make sense of the question unproblematically; indeed, in later talk with Tom, he displays that he has heard Stacey's utterance and made sense of it. A short time later a substantial amount of stock is sold.

Outlouds like Stacey's in this example, whilst at first seeming potentially disruptive to others in the room, are sensitively designed. Although the initial utterance is shouted, this does not mean that it is necessarily insensitive to the conduct of colleagues. Indeed, as has been mentioned elsewhere (Heath et al., 1993), the design of outlouds and other shouted utterances can be tailored to the demands of the local setting; they can be heard by a range of participants and by their very design they do not demand particular responses. They also provide for potential collaboration, as Tom can presuppose that colleagues for whom the information may be relevant will have heard the outloud.

So, whilst participants need to monitor the local environment of objects, such as screens, they also need to monitor the activities of their colleagues in relation to those objects. Outlouds, though critically considered as public in nature, can invoke other shared activities as well as those considered individual or private. They can also be sensitively produced for the prospective concerns of potential hearers, recipients and other overhearers, with different participant statuses (cf. Goffman, 1971) in the setting, the particular activities undertaken being contingent upon the circumstances at hand.

CSP can offer resources for reflecting these characteristics. CSP allows us to consider, for example by defining different processes, the other processes that can evolve if an outloud event occurs.
Events in common to the alphabets of more than one process can be shared and thus made public. The model can thus be used to consider not only what might happen after an activity has occurred, but also what can be seen to be made relevant by other participants, who may then undertake possibly different next actions. We shall not go into detail specifying particular processes and events, but rather give a general outline of how CSP can be used to model the different choices that may be made within the dealer processes. For example, one dealer (Dealer1) may initiate a phone call as a result of hearing the information in the outloud. A second dealer (Dealer2) may check his position in a particular stock. Diagrammatically, this might be represented as follows:


[Diagram: the outloud 'Hanson twenty of an eighth (forty by fifteen) Shearson on the bid' leads Dealer 1 to initiateTelephoneCall and Dealer 2 to checkAmount.]

The following is a gloss of how the notation might be used at an abstract level:
DEALER1 ( dealerOutloud, initiatePhoneCall, agreeDeal)
DEALER2 ( dealerOutloud, checkAmount | reduceAmount) increaseAmount))
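
One possible reading of this gloss can be sketched in Python. This is an assumption-laden illustration, not the authors' model: it takes DEALER1 to be dealerOutloud followed by initiatePhoneCall and agreeDeal, and DEALER2 to be dealerOutloud followed by checkAmount and then a choice of reduceAmount or increaseAmount. Processes are represented as nested dictionaries (each event mapped to the process that follows it, with the empty dictionary for STOP), and the function names are hypothetical. Because the outloud is the only event in both alphabets, it is the only point at which the two dealers must act together.

```python
STOP = {}  # a process that offers no events


def alphabet(proc):
    """Collect every event a nested-dictionary process can perform."""
    events = set()
    for event, following in proc.items():
        events |= {event} | alphabet(following)
    return events


def parallel_traces(procs):
    """Maximal traces of processes that synchronise on shared events."""
    alphas = [alphabet(p) for p in procs]

    def step(states, trace):
        moved = False
        for event in sorted({e for s in states for e in s}):
            owners = [k for k, a in enumerate(alphas) if event in a]
            # A shared event needs every owner to be offering it now.
            if all(event in states[k] for k in owners):
                following = list(states)
                for k in owners:
                    following[k] = states[k][event]
                moved = True
                yield from step(following, trace + (event,))
        if not moved:
            yield trace  # nothing further can happen

    return list(step(list(procs), ()))


# Assumed reading of the gloss (hypothetical structure).
DEALER1 = {"dealerOutloud": {"initiatePhoneCall": {"agreeDeal": STOP}}}
DEALER2 = {"dealerOutloud": {"checkAmount": {"reduceAmount": STOP,
                                             "increaseAmount": STOP}}}

traces = parallel_traces([DEALER1, DEALER2])

# Events in both alphabets are 'public'; all others are private.
public = alphabet(DEALER1) & alphabet(DEALER2)
public_views = {tuple(e for e in t if e in public) for t in traces}

print(len(traces), "possible traces")
print("publicly visible:", public_views)
```

In this reading every trace begins with the shared outloud, after which the dealers' private activities interleave freely. Restricting each trace to the shared events leaves only the outloud itself, a simplified stand-in for hiding that shows what is publicly available to both processes.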

All such deal processes have the possibility to evolve if an outloud occurs; all having the opportunity to recognise the relevance of an outloud for them and initiate a trading process in a particular stock, but not necessarily. This suggests ways of considering the public and private nature of activities in the workplace. Events, such as outlouds, in common to the alphabets of more than one process can be shared, and thus be considered publicly available. Where processes do not share events, those events can be hidden from other processes, and therefore be considered private. More importantly, the various means through which activities are made public can be reected in the ways in which dierent processes evolve. CSP provides the mechanisms in its structure to describe collaboration and coordination that allow us to consider not just that an activity has occurred, but that it has been seen to be made relevant by other participants who may then undertake possibly dierent next activities. There is therefore a possibility of distinguishing how activities are made public in collaborative settings and the dierent ways that those activities engender activities by co-participants. Note that the critical issue is not the notation or the representational quality of the notation, but the mechanisms that underpin it. CSP is a concise notation and given its provenance, the notation provides for parsimonious representations of sequential processes with non-determinant orderings concerns that resonate with issues that emerge from detailed analyses of everyday communicative activities in workplaces. However, by allowing for execution and reasoning about traces, CSP allows for the possibility of providing more than a tool for communicating the ndings of workplace studies. CSP has the potential for


Marina Jirotka and Paul Luff

reasoning about transformations to work practices, particularly with respect to those that may arise following the introduction of a technology. This offers a potentially powerful resource for designers. If models of activities can be combined with parallel models of proposed technological solutions, manipulating the model can result in generating a number of different traces of events. Such a possibility will be sketched out in the next section, where we outline the potential use of a notation like CSP in the light of an envisaged technical intervention, one that might have consequences for the outlouds just considered.

3.3 A proposed technical intervention

Often through discussions with organisational members, designers arrive at a set of technological options to support participants in the domain. For example, in this case, one could envisage the potential of novel traders to support communication and collaboration between two or more trading floors of the same bank (e.g., between London and New York offices). A proposal could consider supporting distributed outlouds in such a way that colleagues in offices could both produce them for others in the remote offices and hear them from other settings. Different technological solutions could support alternative ways of achieving this. For instance, as outlouds are produced from another setting, it may seem feasible to make announcements in one domain also public to the relevant participants in the other setting(s), say, through an automated announcement system. Such an intervention could transform outlouds so that they could be provided in audio, video, textual or graphical form, or some mixture of these, and could be automatically produced or initiated by others in a remote domain. Given such a scenario we could outline a set of issues that could be relevant for designers considering distributing outlouds.
So, for example, our previous analysis (see also Jirotka, 2000) raised a number of issues related to broadcasts in the co-present environment that would suggest that:
Outlouds are coordinated with the activities of those who receive them. They are produced in relation to the activities of others in the dealing room, not just broadcast irrespective of what is happening in the setting;
Recipients are sensitive to particular outlouds and not to others;
Dealers are seen to be able to relate their conduct to the conduct of others, so the relationship between one trader's activity and an outloud can be inferred by other members of the dealing room;

Communicating sequential activities

In their production, outlouds are coordinated with and related to other outlouds.

By interrogating the model, it may be possible to consider these different technological solutions in relation to aspects of producing and recognising outlouds and how these might be accomplished.

4. Technological interventions in the workplace

As we do not expect to capture all possible conduct and information about the setting for the analysis, nor expect to offer a complete model of the work of the setting under investigation, the model produced could be quite simple. It is developed in order to check the understanding of both designers and ethnographers of the phenomena, in this case outlouds. Designers could then build different models of the various technological options to support the solutions. For example, a new CSP process may be defined that models the technologically assisted outloud, as in,
DISTRIBUTEDOUTLOUD

where the process has an event pushButton in its alphabet, representing the pushing of a button that enables a dealer to distribute their outloud. To reason about the consequences of different combinations of processes, the designer should be able to evaluate, by combining the processes and examining the traces, what the consequences of different technological features might be for social conduct. An initial model of the trading room activities could be developed as a set of top level CSP processes in parallel, for example,
P = MARKETMAKER || DEAL || OUTLOUD || SALESMAN

Some of these processes reflect the general activities of individuals in the setting, e.g., market maker and sales person, or they reflect objects or events occurring in the domain, e.g., deal. Other processes reflect the socio-interactional resources, e.g., outlouds. We could then replace the existing outloud process with the new process for distributed outlouds:
Q = MARKETMAKER || DEAL || DISTRIBUTEDOUTLOUD || SALESMAN

If T1 is a trace of P, then it should be straightforward to check whether T1 is also a trace of Q. If it is, and the relevant events have been modelled, then it would


appear that the technology to distribute outlouds would have little impact on the scenario represented by T1. However, it may be necessary to modify the other processes in order to allow them to evolve in a way that yields a trace similar, if not identical, to T1. Thus, in our example of the restrictive automated technology for distributing outlouds, if the DISTRIBUTEDOUTLOUD process has an event pushButton in its alphabet, the designer could follow the trace of the process up to the point at which the process could not evolve in the desired direction without the button on the system being pressed. At least one of the processes modelling a dealer will also need to participate in that pushButton event. The designer will need to modify the behaviour of the dealer process to get the trace to achieve the same goal. Thus, it would seem that the technological features proposed for distributing outlouds in this case may have a specific impact on the domain. One example of this would be the simple inference that dealers may be required to do at least one extra activity, pressing a button, in order to do an outloud. Drawing on the details of how outlouds are achieved in the current setting, designers can consider how an individual could coordinate the production of the outloud with activities in the remote setting. When deliberating the technological features needed to support these broadcasts, designers will need to consider what is required at the recipient's site to make the outloud relevant and apparent. Furthermore, they may need to determine what other features are necessary and relevant if the outloud is not produced through audio, and how these features need to be supported.
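The trace comparison described above can be made concrete with a small sketch. It is not the authors' model and not a full CSP implementation: each process is reduced to a finite transition table, the parallel operator is approximated by requiring every process that has an event in its alphabet to take part in it, and the transition tables, the `MARKETMAKER_WITH_BUTTON` process and the helper names `step` and `first_refusal` are assumptions made for illustration. Only the process and event names MARKETMAKER, OUTLOUD, DISTRIBUTEDOUTLOUD, pushButton, initiatePhoneCall and agreeDeal come from the text.

```python
# Each process is (alphabet, transitions); every process starts in state 0.
# The transition tables below are illustrative assumptions.
MARKETMAKER = ({"outloud"}, {0: {"outloud": 0}})
OUTLOUD = ({"outloud"}, {0: {"outloud": 0}})
DEALER = (
    {"outloud", "initiatePhoneCall", "agreeDeal"},
    {0: {"outloud": 1}, 1: {"initiatePhoneCall": 2}, 2: {"agreeDeal": 0}},
)
# Hypothetical technology: an outloud is only distributed after pushButton.
DISTRIBUTEDOUTLOUD = (
    {"pushButton", "outloud"},
    {0: {"pushButton": 1}, 1: {"outloud": 0}},
)
# Hypothetical modified market maker who presses the button first.
MARKETMAKER_WITH_BUTTON = (
    {"pushButton", "outloud"},
    {0: {"pushButton": 1}, 1: {"outloud": 0}},
)


def step(processes, states, event):
    """Advance the composition by one event, or return None if refused.

    Every process whose alphabet contains the event must take part;
    processes that do not know the event keep their current state.
    """
    next_states = []
    involved = False
    for (alphabet, transitions), state in zip(processes, states):
        if event in alphabet:
            involved = True
            target = transitions.get(state, {}).get(event)
            if target is None:
                return None  # a required participant cannot do the event
            next_states.append(target)
        else:
            next_states.append(state)
    return tuple(next_states) if involved else None


def first_refusal(processes, trace):
    """Return None if `trace` is a trace of the composition,
    otherwise the index of the first event that is refused."""
    states = tuple(0 for _ in processes)
    for index, event in enumerate(trace):
        states = step(processes, states, event)
        if states is None:
            return index
    return None


P = [MARKETMAKER, OUTLOUD, DEALER]
Q = [MARKETMAKER, DISTRIBUTEDOUTLOUD, DEALER]
Q_MODIFIED = [MARKETMAKER_WITH_BUTTON, DISTRIBUTEDOUTLOUD, DEALER]

T1 = ["outloud", "initiatePhoneCall", "agreeDeal"]
T2 = ["pushButton"] + T1

t1_in_p = first_refusal(P, T1) is None           # T1 is a trace of P
t1_in_q = first_refusal(Q, T1) is None           # refused: no button press
t2_in_q_modified = first_refusal(Q_MODIFIED, T2) is None
```

Under these assumptions T1 is a trace of P but is refused by Q at its first event, because DISTRIBUTEDOUTLOUD will not engage in the outloud until pushButton has occurred. Once the market maker is modified to take part in pushButton, the longer trace T2 reaches the same deal, which mirrors the inference that the technology adds at least one extra activity.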
For example, if a technology was considered that required constraints on when a dealer can make a broadcast to a remote domain, it may be necessary to consider how such capabilities are provided, how they are accessed and coordinated with other activities (even when considering such simple methods as using a lighted button to coordinate when an outloud could be made). Thus, drawing upon the models, designers could consider what aspects of this outloud are permanent or transitory, and what other items the outloud might coordinate or overlap with, for example, noise or talk in the dealing room or other visual items on a screen. In this case it would be important to consider the different ways in which the outloud may be made public and how it may be produced in parallel with other activities. Features of the technological options may be modelled where designers consider what the CSP processes and corresponding traces might be. In the remote trading example, when combining models of technological features with social conduct, designers may consider, for instance, what might happen when


a remote trader is trying to operate a particular system so that a reasonable set of activities occur. Designers might examine the different ways information from outlouds can be made publicly accessible, and the different sets of resources that should be present. To do this, they might consider the model of outlouds as they are achieved at present, in relation to dealing and other activities in the dealing room. At this point, designers may need to reflect upon what components, if any, of outlouds are necessary to support, what their granularity is, and how they are voiced and parsed. In CSP this is done in relation to modelling the processes of different types of technological support and different media. The traces that are produced when manipulating a CSP model allow for exploration of the consequences of the different combinations of processes. The analysis in the above example just looks at one evolution path of the process. But there are also opportunities to analyse processes for deadlock that would indicate a problem with design that might also be resolved by modifying other processes. In addition, other traces may be produced that have not been drawn from the observed setting and may be interesting for analysis. Designers assessing these traces may decide that the set of events produced do not make sense and therefore the process model may be incorrect. Alternatively, a set of events may never have been observed in the setting during the study, but nonetheless suggest concerns for design and therefore need to be considered further. For this, video fragments collected in the study could support the more conventional use of scenarios for design. Of course, it may be that issues raised may not be addressed in the initial ethnographic investigation. Modelling could then suggest foci for further analysis or even further data collection. In this way the modelling and the social scientific study can be iterative and mutually informing.
Moreover, the traces generated through CSP models can be a resource for analysis, suggesting various orderings of activities that have not been considered, or more unusual sequences of events that need further investigation.
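The deadlock analysis mentioned above can be sketched in the same spirit. The following is a minimal illustration, assuming finite, non-terminating processes given as transition tables; it is a breadth-first search over the reachable states of the composition rather than a CSP tool. The processes SYSTEM, SYSTEM_FIXED and REMOTE, the events lightOn and ack, and the function names are all hypothetical, loosely based on the lighted button that regulates when an outloud can be made.

```python
from collections import deque

# Each process is (alphabet, transitions) and starts in state 0.
# All tables are illustrative assumptions, not taken from the study.
DEALER = ({"pushButton", "outloud"},
          {0: {"pushButton": 1}, 1: {"outloud": 0}})
# The system lights the button, accepts a press, then relays the outloud.
SYSTEM = ({"lightOn", "pushButton", "outloud", "ack"},
          {0: {"lightOn": 1}, 1: {"pushButton": 2}, 2: {"outloud": 0}})
# The remote site expects an acknowledgement after each outloud.
REMOTE = ({"lightOn", "outloud", "ack"},
          {0: {"lightOn": 1}, 1: {"outloud": 2}, 2: {"ack": 0}})
# A revised system that does take part in the acknowledgement.
SYSTEM_FIXED = ({"lightOn", "pushButton", "outloud", "ack"},
                {0: {"lightOn": 1}, 1: {"pushButton": 2},
                 2: {"outloud": 3}, 3: {"ack": 0}})


def enabled(processes, states):
    """Map each event the composition can perform to its successor state."""
    events = set().union(*(alphabet for alphabet, _ in processes))
    moves = {}
    for event in sorted(events):
        successor = []
        for (alphabet, transitions), state in zip(processes, states):
            if event not in alphabet:
                successor.append(state)  # not involved, state unchanged
                continue
            target = transitions.get(state, {}).get(event)
            if target is None:
                break  # a required participant refuses the event
            successor.append(target)
        else:
            moves[event] = tuple(successor)
    return moves


def find_deadlock(processes):
    """Return a shortest trace leading to a state where no event is
    possible, or None if no such state is reachable."""
    start = tuple(0 for _ in processes)
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        states, trace = queue.popleft()
        moves = enabled(processes, states)
        if not moves:
            return trace
        for event, successor in moves.items():
            if successor not in seen:
                seen.add(successor)
                queue.append((successor, trace + [event]))
    return None


stuck_trace = find_deadlock([DEALER, SYSTEM, REMOTE])
fixed_trace = find_deadlock([DEALER, SYSTEM_FIXED, REMOTE])
```

In this sketch the first design reaches a stuck state after one complete outloud, because the remote site waits for an acknowledgement that the system never offers, while the revised system removes the problem. The trace returned is the kind of artefact a designer could set against the ethnographic materials to decide whether the model or the proposed technology is at fault.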

5. Discussion

Though the examples given here are undoubtedly brief and merely illustrate some characteristics of CSP, they suggest how the notation provides resources for reflecting on the nature of communicative and collaborative activities. We have also considered how the notation could provide resources for conveying:


how the ordering of activities undertaken by individuals can be left open and flexible;
how particular actions serve to coordinate actions with those of others;
the ways in which the activities of co-participants are interrelated, for example: how visible actions produced by a participant, but not explicitly directed at another, can be utilised by others; how a single action can accomplish multiple interactional activities; and how activities emerge through collaboration.

CSP's consideration of parallel processes, private and public events, and the potential for outcomes to be non-deterministic provides powerful resources for reflecting the contingent, emergent and sequential nature of collaborative activities. This is not to suggest that other notations cannot represent certain kinds of social behaviour. It is certainly possible to characterise aspects of achievements such as mutual awareness in terms of objects and relationships between them (cf. Viller and Sommerville, 1999). It is also possible to provide frameworks for providing access to more detailed materials (Hughes et al., 2000) and patterns that suggest recurrent features found in a number of settings (Martin et al., 2001). However, such notations may not be the most appropriate for modelling the contingent nature of workplace conduct. Not only do the notations often imply a clear delineation of activities that is not suggested in the analysis, their structure and form tend to foreground an association of activities to individuals. Indeed the notations themselves seem to unduly focus attention on the stable features of a domain (or across domains). This seems in marked contrast to the workplace activities that these notations were trying to convey. These, drawing on an ethnomethodological orientation, bring to light how activities emerge, transform throughout their course and where certain features may be left ambiguous for practical purposes.
It seems important to convey these flexible, open, contingent and situated practices to designers if the technologies they develop are to be sensitive to the settings in which they are deployed. The notations adopted may indeed obscure the very aspects they are intending to convey. Attempting to model communicative action in terms of single channels or linear relationships between co-participants, as between speaker and hearer or writer and reader, for example, may hide from view the interactive and collaborative production of the interaction, from moment to moment. With its concern with concurrent activities, CSP does appear to offer resources for conveying critical features of collaborative and interactional workplace activities. Meanwhile, it is also possible to utilise CSP models that still, as in the ordering of particular activities, leave features indeterminate, ambiguous or ambivalent, but which are necessary aspects of many activities in everyday settings.


Of course, this is not to say that it is not possible to convey such features in other notations. Complex models of collaborative work could be developed using object-oriented and algebraic notations, but in doing so, it is important to consider the motivations behind the modelling exercise in the first place. If the modelling exercise is merely intended to convey information to system developers it is unclear why a formal (or semi-formal) notation is necessary, particularly as natural language is quite an effective way of revealing the ambiguous, complex and rich nature of everyday conduct. Even if there are pragmatic reasons for adopting a particular notation, or it results in parsimonious descriptions, its use may not ensure that the most appropriate aspects of hard problems are conveyed. Indeed, it may be that if the objective of the model is to be a communication tool then other resources may be appropriate for this. An analysis of requirements supported by instances from audio-visual recordings captured in the field may both highlight critical issues of relevance to design and provide ways of conveying the complexity and richness of the work practices under consideration (Jirotka et al., 1993). Hence, it would be hoped that through adopting a particular notation more will be possible than simply building a communication device; that, for example, it provides some resources for considering the implications of a particular domain or requirements analysis within the system development process. One shortcoming of using semi-formal notations is that they provide few resources for considering the consequences of the model developed. If the system analyst wishes to explore systematically the interrelationships between the components of the model, this either has to be accomplished through detailed inspection or by building appropriate tools for analysis. Nor is this necessarily straightforward with formal notations.
It is possible using a variety of formal notations to define data types and operators that represent certain aspects of naturally-occurring conduct, and these can be used to outline particular objects, activities and their interrelationships of interest. These could provide representations of fine details of interaction and collaboration (Jirotka, 2000). However, it is not necessarily the case that the structure of the modelling notation will resonate with the purposes to which it is put. In order to reason about a model, it is necessary to allow designers to manipulate what has been defined in the model, to explore the consequences of these definitions. CSP not only appears to provide an appropriate structure for reflecting concurrent and collaborative activities, but also a means of reasoning about them. As it is primarily concerned with the representation of concurrent activities, there are a range of resources embedded within CSP for describing


coordinated action. Moreover, the traces inherent in CSP are not merely records of the process, but resources for reasoning about the processes defined. The traces of a CSP model allow for the exploration of the consequences of different combinations of processes. The ability to reason with a model suggests a way in which the modelling activity could provide more than a means of communicating between ethnographers and designers. Undoubtedly there are problems with trying to manage complex descriptions of workplace activities for the purposes of design, particularly given the different analytic resources and practical motivations of the different parties (cf. Randall et al., 1994). It may be that tools and techniques to bridge the gap may indeed assist the communication of analyses of complex activities for the purposes of design, but it would be a shame if in assisting the conveyance these approaches obscured the very thing they were intended to convey. Considering the problem of the relationship between the work of social scientists and computer system developers as one of communication may also overlook the possibility for greater collaboration between the two disciplines. If a modelling technique can facilitate the consideration of particular aspects of the setting, without undue work being engaged in defining additional notation and tools, then it may be that the modelling exercise would not just be a resource for communication to others in the development process. Modelling might provide a way to explore the consequences of an emerging analysis early on in the system design process so that through it particular aspects could be considered in more detail in the fieldwork.
A preliminary analysis of the collaborative conduct of participants, how they are aware of each other's activities, how a particular tool or artefact figures in an activity or how an activity is initiated, left open or closed, as examples, could provide the resources for an initial model which, when analysed, could raise issues for further investigation and analysis. Thus, the approach might be one where, in the context of business and organisational concerns, modelling and ethnographic analysis are interwoven; the modelling suggesting areas for further analysis and the analysis refining the process models. This iteration could continue to the point at which designers in conjunction with analysts and organisational members discuss different technological design options, model features of these and then reason about the consequences in relation to the video analysis. The traces that are produced when manipulating a CSP model allow for exploration of the consequences of the different combinations of processes. There are opportunities to analyse processes


for deadlock that would indicate a problem with design that might also be resolved by modifying other processes. In addition, other traces may be produced that have not been drawn from the observed setting yet may be interesting for analysis. Though an analysis could go into very fine detail, this may not be necessary. Designers may be able to model quite complex features of social interaction, but this may not be necessary for considering the consequences of different technological choices. For certain cases more detailed analysis may be needed, where there is an interest in examining the sequencing of activities or how one activity relates to another, for example. In some ways, what is being suggested here has parallels in certain engineering fields. Models are not developed to represent a problem or domain in all its details but to focus attention on particular issues, to make apparent the questions that have to be addressed and as a resource for discussing possible solutions. In such cases, it is not just that the model can accurately represent the problem but that it can be used and manipulated, and that design inferences can be made from it.

6. Summary

Modelling may be one way of bridging the gap between those analysing the nature of workplace activities through social scientific studies and those trying to develop appropriate and relevant technologies for such domains. As the studies of social and organisational conduct tend to be rich and detailed, models of various kinds could provide a way of communicating these details to system developers in the early stages of system design. Unfortunately, social scientific studies have revealed the open, ambiguous, tacit and interactional nature of social and collaborative activities, which perhaps inevitably prove difficult to model in formal and semi-formal notations. Moreover, modelling organisational activities may be particularly problematic when the frameworks used to model them are too rigid and inflexible to account for the emergent and situated nature of social and organisational conduct. It may therefore be worth reconsidering the role of, and the resources for, modelling when seeking to utilise studies of social practice for system design. For example, there may be alternative ways of clarifying, communicating and presenting ethnographic analyses for requirements and design. The analyses themselves, particularly when supported by suitable materials, such as video recordings, may be the most suitable way of communicating what is necessary for different participants at different times in the development process.


It has been a recurrent concern in HCI, CSCW and Cognitive Technology not solely to communicate the findings of a study, but also to have a more proactive role in the design process, commonly expressed as moving from description to prescription. This may be in terms of trying to make inferences about how different technologies impact on different users, how requirements for new technologies can be derived, or how potential design solutions can be assessed. For other analytic orientations (Barnard, 1991) this has been a longstanding concern and raises fundamental problems regarding the nature of making predictions from one discipline with consequences for another. In CSCW, as in HCI and Cognitive Technology, it has been recognised that considering the contribution of social and cognitive sciences as producing just a corpus of findings for design may not be the most appropriate way in which these disciplines can be useful. Nevertheless, workplace studies, like a number of other studies of technology in use, are frequently undertaken within pragmatic development processes: as part of a requirements process, as precursors to design, or to assess possible technological options. It is therefore a recurring issue to draw out implications from the empirical studies for design. What seems to be required is a way of systematically drawing out implications whilst remaining sensitive to the analytic orientations of the studies. In this chapter, we have provided a sketch of how this process might be accomplished through modelling, in the particular case of ethnographic studies. It has been suggested that the value of these studies for the design process lies in drawing out the details of complexities of the setting, particularly the informal, social and collaborative nature of work practices. However, it may not be most appropriate to merely try and model these details. What seems critical is to reflect the sensitivities that underpin the analyses.
In the case where the analysis explores the coordination of collaborative and communicative activities, it may be most relevant to use a modelling notation that itself is concerned with the coordination and sequencing of events and processes. Modelling, however, may be more critical if, given appropriate tools and notations, it serves to support the analysis of requirements, raising critical issues not just for design but also for social scientific enquiry. Providing ways of reasoning with the models developed, although drawing on social scientific analysis, may serve as examples of, or contrasts to, attempts at developing cognitive technologies. In either case, the use of such models would require a more fine-grained collaboration between social and computer scientists, where the problems are not considered merely in terms of how best to convey information from the former to the latter, or indeed how an understanding of everyday work


and interaction can inform system design, but also how the requirements for developing technology can support the investigation of social interaction and work practice. Taking the concept of sequence as a primary resource may go some way to increasing our understanding of technology in action.

Notes
1. It should be re-iterated that conversation and interaction analysis have emerged in the social sciences and so researchers drawing from them are concerned with social actions and activities. Hence, researchers are interested in understanding, intelligibility, recognition and agreement as social actions, rather than, say, cognitive processes. They are interested in how they are made public, displayed and are socially witnessable.
2. Perhaps the most familiar example is when considering deadlock, where no particular component of a system can make progress because each is waiting for a next item of information from the other processes. Livelock is a complementary problem, where a process continues indefinitely because it cannot be interrupted by any external process. CSP provides a notation to represent such processes, examine them, and reason about them.
3. They are also interesting as they have been undergoing major technological transformation over the last decade and as such are rich environments to investigate the design of technology and work practice. Indeed, our study was conducted as part of research with a major telecommunications company concerned with developing new technologies such as voice recognition systems for trading rooms.
4. These recording practices have consequences for technologies developed to support deal capture. We have documented these elsewhere (Jirotka, 2000; Jirotka et al., 1993).
5. There have been a number of previous efforts to model and represent sequential and interactional resources for system design and specification (Cawsey, 1990; Finkelstein and Fuks, 1990; Gilbert et al., 1990; Frohlich and Luff, 1990). These have been particularly interested in modelling sequential features revealed by analyses of naturally occurring turns of talk. These efforts have typically used conventional declarative, and linear, formalisms. Further details of this and other modelling exercises can be found in Jirotka (2000).

References
Atkinson, J. M. & J. C. Heritage (Eds.) (1984). Structures of Social Action: Studies in Conversation Analysis. Cambridge: Cambridge University Press.
Barnard, P. (1991). Bridging Between Basic Theories and the Artifacts of Human-computer Interaction. In J. M. Carroll (Ed.), Designing Interaction: Psychology at the Human-Computer Interface, pp. 103-27. Cambridge: Cambridge University Press.


Bellotti, V. (1988). Implications of Current Design Practice for the Use of HCI Techniques. In D. M. Jones & R. Winder (Eds.), People and Computers IV: Proceedings of the Fourth Conference of the BCS Specialist Group, pp. 13-34. Cambridge: Cambridge University Press.
Booch, G. (1991). Software Engineering Economics. Englewood Cliffs, NJ: Prentice Hall.
Bowers, J. & G. Button (1995). Workflow from Within and Without: Technology and Cooperative Work on the Print Industry Shop Floor. In ECSCW 95, pp. 51-66. Stockholm, Sweden. London: Kluwer Academic Publishers.
Button, G. (1990). Going up a Blind Alley: Conflating Conversation Analysis and Computational Modelling. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 67-90. London and New York: Academic Press.
Button, G. & W. Sharrock (1997). The Production of Order and the Order of Production: Possibilities for Distributed Organisations, Work and Technology in the Print Industry. In ECSCW 97, Lancaster. Kluwer Academic Publishers.
Button, G. & W. Sharrock (2000). Design by Problem Solving. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 46-67. Cambridge: Cambridge University Press.
Cawsey, A. (1990). A Computational Model of Explanatory Discourse: local interactions in a plan-based explanation. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 223-236. London and New York: Academic Press.
Engeström, Y. & D. Middleton (Eds.) (1996). Cognition and Communication at Work. Cambridge: Cambridge University Press.
Finkelstein, A. & H. Fuks (1990). Conversation Analysis and Specification. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 175-185. London and New York: Academic Press.
Frohlich, D. M. & P. Luff (1990). Applying the Technology of Conversation to the Technology for Conversation. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 189-222. London and New York: Academic Press.
Gilbert, G. N., R. Wooffitt & N. Fraser (1990). Organising Computer Talk. In P. Luff, G. N. Gilbert & D. M. Frohlich (Eds.), Computers and Conversation, pp. 237-260. London and New York: Academic Press.
Goffman, E. (1971). Relations in Public. Harmondsworth: Penguin.
Goodwin, C. & M. H. Goodwin (1996). Seeing as a Situated Activity: Formulating Planes. In Y. Engeström & D. Middleton (Eds.), Cognition and Communication at Work, pp. 61-95. Cambridge: Cambridge University Press.
Gorayska, B. & J. L. Mey (2002). Introduction: Pragmatics of Technology. International Journal of Cognitive Technology 1(1), 1-20.
Greatbatch, D., P. Luff, C. Heath & P. Campion (1993). Interpersonal Communication and Human-Computer Interaction: an examination of the use of computers in medical consultations. Interacting With Computers 5(2), 193-216.
Harper, R. H. R. (2000). Analysing Work Practice and the Potential Role of New Technology at the International Monetary Fund: Some Remarks on the Role of Ethnomethodology. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 169-186. Cambridge: Cambridge University Press.


Harper, R. & J. Hughes (1993). What a f-ing system! Send 'em all to the same place and then expect us to stop 'em hitting: Making Technology Work in Air Traffic Control. In G. Button (Ed.), Technology in Working Order, pp. 127-144. London: Routledge.
Harper, R., J. Hughes & D. Shapiro (1991). Working in Harmony: An Examination of Computer Technology and Air Traffic Control. In J. Bowers & S. D. Benford (Eds.), Studies in Computer Supported Cooperative Work. Theory, Practice and Design, pp. 225-234. Amsterdam: North-Holland.
Heath, C. C., M. Jirotka, P. Luff & J. Hindmarsh (1993). Unpacking Collaboration: the Interactional Organisation of Trading in a City Dealing Room. In Proceedings of ECSCW 1993, Milan, September 13th-17th, pp. 155-170. London: Kluwer Academic Publishers.
Heath, C. C., M. Jirotka, P. Luff & J. Hindmarsh (1994-5). Unpacking Collaboration: the Interactional Organisation of Trading in a City Dealing Room. CSCW Journal 3(2), 147-165.
Heath, C. C., H. Knoblauch & P. Luff (2000). Technology and social interaction: the emergence of workplace studies. British Journal of Sociology 51(2), 299-320.
Heath, C. C. & P. Luff (1992). Collaboration and Control: Crisis Management and Multimedia Technology in London Underground Line Control Rooms. CSCW Journal 1(1-2), 69-94.
Heath, C. C. & P. Luff (2000). Technology in Action. Cambridge: Cambridge University Press.
Hoare, C. A. R. (1985). Communicating Sequential Processes. Englewood Cliffs, NJ: Prentice Hall.
Hughes, J. A., V. King, T. Rodden & H. Andersen (1994). Moving out of the Control Room: Ethnography in System Design. In Proceedings of CSCW 94, Chapel Hill, North Carolina, Oct. 22-26, pp. 429-40. New York: ACM Press.
Hughes, J., J. O'Brien, T. Rodden & M. Rouncefield (2000). Ethnography, Communication and Support for Design. In P. Luff, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 187-214. Cambridge: Cambridge University Press.
Jirotka, M. (2000). An Investigation into Contextual Approaches to Requirements Capture. Unpublished DPhil Thesis. University of Oxford.
Jirotka, M. & P. Luff (2001). Representing and modeling collaborative practices for systems development. In C. Floyd, Y. Dittrich & R. Klischewski (Eds.), Social Thinking Software Practice, pp. 111-140. Cambridge, MA: MIT Press.
Jirotka, M., P. Luff & C. Heath (1993). Requirements for Technology in Complex Environments: Tasks and Interaction in a City Dealing Room. SIGOIS Bulletin (Special Issue) Do users get what they want? (DUG 93), 14(2, December), 17-23.
Kaushik, R., S. Kline, P. David & J. O. D'Arcy (2002). Differences between computer-mediated and face-to-face communication in a collaborative fiction project. International Journal of Cognition and Technology 1(2), 303-26.
Luff, P., J. Hindmarsh & C. Heath (Eds.) (2000). Workplace Studies: Recovering Work Practice and Informing System Design. Cambridge: Cambridge University Press.
Luff, P. & C. C. Heath (2000). The Collaborative Production of Computer Commands in Command and Control. International Journal of Human-Computer Studies 52, 669-699.
Luff, P., C. Heath & M. Jirotka (2000). Surveying the Scene: Technologies for Everyday Awareness and Monitoring in Control Rooms. Interacting With Computers 13, 193-228.

330 Marina Jirotka and Paul Lu

Lu, P. & M. Jirotka (1998). Interactional Resources for the Support of Collaborative Activities: Common Problems for the Design of Technologies to Support Groups and Communities. In T. Ishida (Ed.), Community Computing and Support Systems: Social Interaction in Networked Communities, pp. 249266. Berlin: Springer Verlag. Martin, M., T. Rodden, M. Rounceled, I. Sommerville & S. Viller (2001). Finding patterns in the eldwork. In European Conference on Computer Supported Coopertaive Work, Germany. London: Kluwer Academic Publishers. Pfeier, R. (2002). Robots as Cognitive Tools. International Journal of Cognitive Technology 1(1), 125144. Plowman, L., Y. Rogers & M. Ramage (1995). What are Workplace Studies For? In Proceedings of ECSCW95, Stockholm, Sweden, 1014 September, pp. 309324. Pycock, J., K. Palfreyman, J. Allanson & G. Button (1998). Representing Fieldwork and Articulating Requirements through Virtual Reality. In Proceedings of CSCW 98, Seattle. New York: ACM Press. Randall, D., J. Hughes & D. Shapiro (1994). Steps towards a Partnership: Ethnography and System Design. In M. Jirotka & J. Goguen (Eds.), Requirements Engineering: Social and Technical Issues, pp. 241258. London: Academic Press. Roscoe, A. W. (1998). The Theory and Practice of Concurrency. Englewood Clis, NJ: Prentice Hall. Rounceeld, M., J.A. Hughes, T. Rodden & S. Viller (1994). Working with Constant Interruption: CSCW and the Small Oce. In Conference on Computer Supported Cooperative Work, Chapel Hill, North Carolina, USA, pp. 275286. New York: ACM Press. Sacks, H. (1992). Lectures in Conversation: Volumes I and II. Oxford: Blackwell. Sacks, H., E. A. Scheglo & G. Jeerson (1974). A simplest systematics for the organisation of turn-taking for conversation, Language 50(4), 696735. Sommerville, I., T. Rodden, P. Sawyer, R. Bentley & M. Twidale (1993). Incorporating Ethnography into Requirements Engineering. 
In Proceedings of RE93: International Symposium on Requirements Engineering, San Diego, Jan 46. Suchman, L. (1987). Plans and Situated Actions: The Problem of Human-Machie Communication. Cambridge: Cambridge University Press. Suchman, L. (2000). Making a Case: Knowledge and Routine Work in Document Production. In P. Lu, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 2945. Cambridge: Cambridge University Press. Viller, S. & I. Sommerville (1999). Coherence: an Approach to Representing Ethnographic Analyses in System Design. Human-Computer Interaction 14(12), 942. Whalen, J. & E. Vinkhuyzen (2000). Expert Systems in (Inter)Action: Diagnosing Document Machine Problems Over the Telephone In P. Lu, J. Hindmarsh & C. Heath (Eds.), Workplace Studies: Recovering Work Practice and Informing System Design, pp. 92140. Cambridge: Cambridge University Press.

Part III

Coda

The end of the Dreyfus affair


(Post)Heideggerian meditations on man, machine and meaning*
Syed Mustafa Ali
The Open University, England

1. Introduction

According to Janney, "an underlying assumption of Cognitive Technology [is] that computers can be regarded as tools for prosthetically extending the capacities of the human mind" (Janney, 1997, p. 1). On this view, Cognitive Technology is not concerned with the replication or replacement of human cognition, arguably the central goal of strong artificial intelligence, but with the construction of cyborgs, that is, cybernetic organisms or man-machine hybrids, in which possibilities for human cognition are enhanced (Haraway, 1985). However, it may be necessary to reconsider this position in order to address what might be referred to as the Schizophrenia Problem associated with human-computer interaction. Janney describes the essence of this problem as follows:

"As a partner, the computer tends to resemble a schizophrenic suffering from severe intrapsychic ataxia, the psychiatric term for a radical separation of cognition from emotion. Its frame of reference, like that of the schizophrenic, is detached, rigid, and self-reflexive. Interacting in accordance with the requirements of its programs, the computer, like the schizophrenic, forces us to empathize one-sidedly with it and communicate with it on its own terms. And the suspicion arises that the better we can do this, the more like it we become." (Janney, 1997, p. 1)

Crucially, on his view, intrapsychic ataxia is a built-in feature of computers (Janney, 1997, p. 4). Notwithstanding the intrinsic nature of the Schizophrenia Problem, Janney remains optimistic about the possibility of its (at least partial) solution within cognitive technologies, as is evidenced by his intent "to encourage discussion about what can be done in Cognitive Technology to address the problems pointed out [emphasis added]" (Janney, 1997, p. 1). As he goes on to state, "an important future goal of Cognitive Technology will have to be to encourage the development of computer technology that reduces our need for psychic self-amputation" (Janney, 1997, p. 5).

While concurring with Janney that a one-sided extension of the cognitive capacities of the human mind at the expense of the user's emotional and motivational capacities is technological madness (Janney, 1997, p. 1), it is maintained that if the Schizophrenia Problem is to be solved (by which is meant elimination and not mere reduction of the need for psychic self-amputation), it will be necessary for Cognitive Technology to reconsider its position on the issue of replication of human cognition and emotion. Although efforts are underway in this direction, it is suggested herein that they are unlikely to prove ultimately successful. This is because the Schizophrenia Problem can be shown to be intrinsically, if only partially, related to the hard problem of consciousness (Chalmers, 1996), that is, the problem of explaining how ontological subjectivity (or first-person experience) can arise from an ontologically objective (or non-experiential) substrate. For example, Picard (1997) has argued that the problem of synthesizing emotion can largely be bracketed from the problem of explaining (and possibly synthesizing) consciousness. However, as she is careful to point out, consciousness and emotion, while not identical, are closely intertwined. While current scientific (specifically, neurological) evidence lends support to the view that consciousness is not necessary for the occurrence of all emotions, Picard concedes that emotional experience appears to rely upon consciousness for its existence (Picard, 1997, p. 73). If consciousness is necessary for emotional experience, then in order to solve the Schizophrenia Problem, cognitive technologies must first solve the hard problem.
This would seem to suggest that, contrary to one of the underlying assumptions of Cognitive Technology, replication of mind (cognition and emotion), arguably the central goal of AI (or Artificial Intelligence), constitutes a necessary condition for IA (or Intelligence Augmentation). In this connection, it might be argued that the thought of the German phenomenologist Martin Heidegger (1889–1976), more specifically that aspect of his early thinking concerned with the being (or ontology) of human beings as interpreted by Hubert Dreyfus, is highly relevant to Cognitive Technology in that it appears to suggest how the Schizophrenia Problem can be solved. According to Dreyfus (1991), Heidegger holds subjective experience to be grounded in, and thereby emergent from, a more primitive existential experience, Dasein or being-in-the-world, that is ontologically prior to subjectivity and objectivity. If Dreyfus' Heidegger is correct, then the Schizophrenia Problem is solvable because the hard problem can be solved by constructing artificial Daseins capable of generating consciousness as an emergent phenomenon.

In this chapter, it will be argued that appealing to Heideggerian thought in the context of attempting to solve the Schizophrenia Problem associated with cognitive technologies is problematic on (at least) three counts. First, Dreyfus' interpretation of Heidegger, or rather, technologists' selective appropriation of Dreyfus' interpretation of Heidegger, while (possibly) illuminating from a technological standpoint, can be shown to be distorting when viewed from the perspective of Heidegger scholarship; crucially, this fact may be of more than merely academic significance. Second, Heidegger's commitment to an empirical-realist conception of nature as intrinsically non-experiential can be shown to undermine the possibility of a Heideggerian emergentist solution to the hard problem. Third, it is suggested that because the technical construction of artificial systems (in this instance, synthetic Daseins) occurs under an implicit subject-object (or artificer-artifact) orientation, the primitive components of such systems will necessarily stand in extrinsic (or external) relation to each other. This fact is of critical significance since Heidegger holds that beings are relationally constituted, thereby entailing a commitment to an ontology grounded in intrinsic (or internal) relationality.

In closing, it will be argued that, since Heidegger cannot solve the hard problem, it is necessary to look elsewhere for a solution to the Schizophrenia Problem. In this connection, Whiteheadian panexperientialism seems promising in that it appears to solve the hard problem. However, this is at the price of a commitment to an ontology grounded in intrinsic (or internal) relationality, which undermines the possibility of constructing artificial Daseins capable of consciousness, thereby rendering the Schizophrenia Problem unsolvable.

2. The Dreyfus affair

Determining the implications of Heidegger's thought for Cognitive Technology is arguably as difficult a task as determining his standing in Western academic philosophy: on the one hand, Heidegger is (generally) regarded as an intellectual charlatan of consummate proportion (and extremely dubious moral standing) by members of the Anglo-American philosophical establishment; on the other hand, he is (largely) revered as a genuinely original thinker who has contributed both profusely and profoundly to the enrichment of Continental philosophy. Similarly, on the one hand, Heidegger's later thought, in particular his assertion that "the essence of technology is by no means anything technological" (Heidegger, 1977, p. 4), has been regarded by anti-technologists as establishing grounds upon which to mount a universal critique of technology; on the other hand, certain Heideggerian insights have been embraced by technologists in an attempt at resolving intractable problems of long standing.

Although the claim that Heidegger has contributed significantly to the debate on the meaning and scope of technology is not, in itself, in question, determining the precise nature of his contribution(s) (in the present context, the implications of his thought for the development and critical evaluation of cognitive technologies) is problematic because there are many ways to interpret and appropriate his meditations on this issue by appealing to different aspects and phases of his phenomenological inquiry into being. In this connection, Dreyfus' (1972) seminal critique of GOFAI (Good-Old-Fashioned-Artificial-Intelligence), which makes extensive use of the existential analytic of Dasein (that is, the situated analysis of the onto-phenomenological structures of human being) presented in Heidegger's Being and Time (Heidegger, 1962) in order to contest the sufficiency of disembodied, a-contextual, symbolic computation as a means by which to instantiate real yet synthetic intelligence, has played an important, perhaps even decisive, role in motivating practitioners to consider engaged, embedded, and non-representational approaches to computing grounded (at least partly) in Heideggerian thought. It is crucial to appreciate at the outset that Dreyfus' approach to AI critique was philosophical and not technological, being driven by a desire to draw attention to the perceived failings of an extant technology.
Dreyfus' primary concern was not (and, arguably, could not be, given his lack of technical expertise) to develop technological solutions to the problems of AI; this task was left to the technologists among his later followers. Connectionist approaches to consciousness (Globus, 1995) and cognition (Clark, 1997), robotic approaches to artificial life (Wheeler, 1996; Prem, 1997), and the (re)conceptualisation of the information systems paradigm in terms of communication rather t