STEPHEN M. KOSSLYN AND STEVEN P. SHWARTZ
The Johns Hopkins University
Our simulation is a model of how people represent information in, and later retrieve information from, visual mental images. That is, when retrieving some fact, people sometimes report generating an image and inspecting it; for example, if asked what shape are a beagle's ears, one may recall this information by mentally picturing the dog's head, and "looking" at the ears. We wanted to understand how such processes operate, and felt that a simulation approach was useful for a number of reasons: First, it allows explicit modeling of a system of complex processes. Second, it provides a "sufficiency proof," a demonstration that one's model is in fact adequate to account for some range of data, if the program runs as claimed. Third, we felt that by attempting to motivate construction of the model with actual data about human cognition, we would be confronted by important empirical issues that may otherwise have been overlooked; we also felt that such an approach would lead us to collect a set of data that would build, that would accumulate to form a "big picture" (cf. Newell, 1973; see also Kosslyn, in press).
*The present work was supported by NSF Grant BNS 76-16987 awarded to the first author. Requests for reprints and/or copies of the running program should be addressed to Stephen M. Kosslyn, 1236 William James Hall, Harvard University, 33 Kirkland St., Cambridge, MA 02138. The present description of the program was written in January, 1977; it is likely to become obsolete as our work progresses. Our simulation also serves as a question-answerer, where information may be looked up directly in, or deduced from, propositional files (when possible); this paper will not discuss this aspect of the simulation (the deficiencies of which are glaring at present).
Plan of the Paper. The paper is divided into three major sections. The first is a description of the kinds of data that motivated the basic choices made in constructing our model; in this section two major topics are considered, the significance of the spatial/pictorial properties of experienced images and the origins of such images. The second section is a description of the model itself; we will first consider the basic data structures of the model, and then the basic processes used in constructing, inspecting, and transforming images. In this section we will illustrate the operation of our simulation by tracing through a series of examples. Finally, the third, concluding section is a discussion of the special characteristics of our model and the usefulness of the model as a guide for empirical research.
more quickly detected. And in fact, when subjects were asked to image an animal and attempt to find some named part, larger parts-even though they were less highly associated-were affirmed more quickly than smaller but more highly associated parts. It is interesting to note that when imagery was not required, and subjects were simply asked to respond as quickly as possible, then association strength-not size-determined evaluation time (replicating previous findings). Thus, images do not seem to be simply another case of general list processing of the sort that might occur in comprehending and processing semantic information (see Kosslyn, 1976a).
Measuring the Visual Angle of the Mind's Eye. One of the defining features of the experienced image, we claim, is that it seems to represent spatial extent in a way analogous to percepts. If so, then it makes sense to ask how large an image can be. That is, if the structure in which images occur is spatial, then it may be meaningful to speak of it as having "edges." Perhaps images can only be some certain size before overflowing. We tested this idea in the following way: Subjects were asked to image some object (either named or shown) and to pretend that they were walking toward it, which reportedly made it seem to loom larger. If at some point the object seemed to overflow, was not "visible" in its entirety in the image, the subject was to "stop" in his "mental walk." At this point he was to indicate how far away the object seemed in his image (i.e., how far away he would be from the actual object if it appeared at that subjective size). We did a series of experiments using this basic methodology, and found that the larger the imaged object, the further was the apparent distance at the point of overflow, just as one would expect if the "image space" was of limited extent. When subjects imaged large animals and simply estimated distance in feet and inches, we sometimes found that distance did not seem to increase linearly, however, but increased more with larger animals (approximately a log function). For stimuli that did not differ in size so greatly (but still varied considerably), we found that a constant angle was occluded by objects-regardless of their size-at the point of overflow. We also found that this point was not absolute: more stringent definitions of what it meant for an image to "overflow" resulted in appreciably smaller estimates of the maximal angle (Kosslyn, 1976b; in preparation). The important point here, however, is that mental images do have spatial "boundaries," and because of this, cannot always be "inspected" in their entirety.
These three findings, on scanning images, detecting parts of images, and the spatial extent of images, converge in demonstrating that the spatial/pictorial qualities of experienced images can enter into actual processing.
to consider this issue, but might have considered how inferences are made from sets of abstract propositions (cf. Pylyshyn, 1973).
There seemed to be two very basic alternatives regarding the present issue: images could be stored in toto, and simply retrieved, or some sort of constructive activity might underlie images. The first alternative was quickly eliminated: We found that subjects required more time to generate progressively larger images of objects (Kosslyn, 1975). If more parts are included in larger images (perhaps because it is easier to see where they belong), this result is not surprising. If images are simply "projected," however, this sort of finding seems counterintuitive. However, although unlikely, this result could just indicate that smaller images become brighter (more vivid) more quickly than larger ones. This latter account fails to explain, however, why it takes more time to image very detailed pictures than to image less elaborated ones (Kosslyn, in press; Kosslyn, Greenberg & Reiser, in preparation). The results of these experiments make no sense if images are simply replayed or retrieved as a single unit.
The above results raise yet another issue: Perhaps images are not really constructed, but are stored as integral units, which are simply activated a portion at a time. And the more complex the image, the longer it takes to activate it. In order to eliminate this possibility, we asked subjects to image scenes which included an array of letters. The interesting comparison was between alternative organizations of the same number of letters, which either formed 6 columns or were grouped to form 2 columns, each 3 letters wide. We found that when letters were grouped to form fewer perceptual units, people were able to generate an image of them more quickly-even though the same number of letters were present in both cases. This finding is inconsistent with the piecemeal retrieval hypothesis. These results do not, however, exclude another possibility: images could be stored integrally, but organized in the course of retrieval, during output. That is, the effects of the number of perceptual units could be due to a nonrandom piecemeal retrieval process. A common introspection seems to run counter to this claim, however: People do not forget arbitrary portions or segments, but lose entire interpreted units of an image (see Pylyshyn, 1973). This introspection is supported by the results of Bower and Glass' (1976) study of the organization of visual memories, and makes no sense if images are stored integrally.
The next issue, as we saw it, concerned what sorts of representations are recruited during the construction process. There seemed to be two basic possibilities: Images could be constructed solely on the basis of perceptual information, like fitting together the pieces of a puzzle. Or, images could be created by relating together perceptual and more abstract "propositional" representations. It seemed clear that the latter possibility could in fact occur; after all, most people claim to have no trouble imaging a novel scene that is described to them (e.g., Jimmy Carter putting a giant peanut to bed at night). In this case, one presumably retrieves memories of what the named objects look like, and uses the stated relational information to construct an image of these objects in the proper configuration.
SIMULATING IMAGERY 271
2. THE MODEL
Most of the basic concepts underlying our data structures and processes were motivated by empirical findings, but not all the details. In a way, the current version of our program is a collection of hypotheses resting on a relatively firm foundation of findings. That is, although we have implemented procedures in particular ways, we are not married to the details; in some cases, we are currently conducting research in order to discriminate among a number of plausible alternative schemes (e.g., for rotation, in particular). Nonetheless, we feel it is important to formulate hypotheses that are plausible and sufficient to account for known findings. Thus, we have filled in some details prior to obtaining empirical support for underlying assumptions. We will not simply incorporate these assumptions and build upon them, however, but will first attempt to discover whether such assumptions are in fact justified in the face of actual properties of human cognition.
This section is composed of two major parts. In the first part we will review the major data structures of our simulation; in the second part, we will discuss the processes involved in generating, inspecting, and transforming images. Before turning to the specifics of the model, however, let us first discuss some aspects of the computer program itself.
"problem solving" set of programs. As it now stands, the user provides the relevant information directly or by setting up files on a disk (as mentioned above).
There are two major data structures in our model, one representing the experienced, quasi-pictorial "surface" image¹ that embodies spatial information, and one representing the abstract information used to generate such images.
¹A surface image is quasi-pictorial in that: (1) it represents visual information by using a coordinate system to depict the appearance of the referent such that portions of the internal representation within the coordinate space bear the same spatial relations to each other as do the corresponding portions of the object(s) being represented; and, (2) the same sorts of information (e.g., about extent, brightness contrast, etc.) are available to interpretive procedures here as are available when percepts themselves are accessed. This follows if the same sorts of representations (accessible by the same sorts of interpretive procedures) that underlie the experience of seeing also underlie surface images.
[Figure 1 appears here: a partially legible schematic showing an IMAGE buffer (activated and inactivated partitions), TRANSFORMATIONS operating on it, and propositional files (car.prp, reartire.prp, frontire.prp, rearwheelbase.prp, frontwheelbase.prp) with their definitions.]
FIG. 1. Schematic representation of the basic data structures utilized in the simulation; large words with arrows symbolize processes that operate upon the structures.
ances (spatial patterns) consisted of instructions specifying where cells should be filled. We needed a format that would allow people to preset the size of their images, since we had data that people could readily image objects at different sizes (Kosslyn, 1975). In addition, our intuition was that people could place images at different "locations" with relative ease. These concerns, plus considerations of parsimony, led us to adopt a polar coordinate format. That is, we specified the relative location of each point by an R,θ pair, indicating distance and angular orientation relative to an origin. This format allows one to vary size easily (by multiplying the R values), allows easy shifting of location (by moving the origin), and is very economical.
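As a rough sketch of what this format buys (our code, not the authors' original program; the function name and coordinates are illustrative), a stored R,θ point can be mapped into a surface-matrix cell, resized by multiplying R, or relocated by moving the origin:

```python
import math

def to_cell(point, origin, scale=1.0):
    """Map an (r, theta) point to an integer cell in the surface matrix."""
    r, theta = point
    r *= scale                           # resizing = multiplying the R values
    x = origin[0] + r * math.cos(theta)
    y = origin[1] + r * math.sin(theta)
    return (round(x), round(y))

points = [(2.0, 0.0), (2.0, math.pi / 2)]   # a toy stored "appearance"

# Same shape at twice the size: just scale R.
small = [to_cell(p, origin=(10, 10)) for p in points]
large = [to_cell(p, origin=(10, 10), scale=2.0) for p in points]

# Same shape at a new location: just move the origin.
moved = [to_cell(p, origin=(20, 5)) for p in points]
```

Note that neither operation touches θ; size and location changes leave the shape's internal spatial relations intact, which is the parsimony the text appeals to.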
The other component of the underlying representation of an image was a list of "facts." This list included: (1) parts of the object ("HASA X"); (2) the location of that imaged object in a scene, or the location of a part of an object (as will be discussed). Locations were indicated by a relation and a "foundation part" (e.g., for "cushion," "LOCATION FLUSHON SEAT"); (3) the level of resolution needed to see the imaged part, indicated in terms of dot density in the surface matrix; and (4) a definition of the part. This definition was in terms of a set of procedures, which would be necessary to execute successfully in a specified sequence in order to find the object or part (as will be discussed). The purpose of each type of list entry will be developed as we discuss the operating characteristics of the program below.
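By way of illustration only (the entry names follow the text's "cushion" example, but the data shapes and the particular numeric values are our invention, not the original file format), such a propositional file might look like:

```python
# Hypothetical rendering of a propositional file for "cushion".
cushion_prp = {
    "HASA": [],                          # parts of the object (none here)
    "LOCATION": ("FLUSHON", "SEAT"),     # relation + "foundation part"
    "RESOLUTION": 5,                     # dot density needed to "see" it (made-up value)
    "DEFINITION": [4, 9, 6],             # ordered indices of test procedures (made-up)
}

def definition_tests(prp):
    """Return the ordered procedure indices that must all succeed to find the part."""
    return prp["DEFINITION"]
```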
In order to model the constructive process, we were forced to assume that any given object may be represented in memory by more than just a single image and propositional file. We posited that all encoded objects have a "skeletal file," which represents a global shape (or the "central shape" to which all else is attached). This file represents information encoded upon an initial look; in addition, there may be subsidiary files representing "second looks." In this case, information about details may be represented by files subordinate to the skeletal file. The propositional lists associated with these files must note where on the skeleton the part belongs, and how it is attached. This structure was motivated by our finding that images of objects with more parts required more time to generate, even if the parts were created simply by cutting up the same object into more segments, and presenting the segments sequentially prior to imaging. The structure of our program is schematized in Fig. 1.
Finally, in addition to representations of rote appearances and lists of propositional knowledge, we also posit that one has representations of the meanings of various relations (like "left of," "flush on," "under," and so on). We represent such knowledge by procedures that serve to integrate parts together. Thus, "FLUSHON," for example, means that an added part (e.g., a cushion of a chair) should fit on another part (the seat) such that none overlaps or fails to cover.
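In one dimension, a procedural FLUSHON might be sketched as follows (our illustration, not the program's code; the real procedure works over the surface matrix, and the interval numbers here are arbitrary):

```python
# FLUSHON as a procedure: place a part on a foundation "such that none
# overlaps or fails to cover" -- in one dimension, rescale the part so its
# extent exactly coincides with the foundation's extent.

def flushon(part_width, foundation):
    """Return (scale, offset) that makes a part of width `part_width`
    exactly cover the foundation interval (left, right)."""
    left, right = foundation
    scale = (right - left) / part_width   # stretch/shrink to match extent
    return scale, left                    # and start it at the foundation's edge

scale, offset = flushon(2.0, (3.0, 9.0))  # a cushion of width 2 on a seat
```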
This exhausts the sorts of representations in our model. Let us now turn to the
actual operation of the program.
276 S. M. KOSSLYN AND S. P. SHWARTZ
The best way to illustrate how the program works is to trace through some examples involving generation of an image of a car.
²All of the following figures are externalized representations of the "activated partition buffer," which is operated upon by FIND.
[Figure 2 appears here: two line-printer matrices of a car, with letters marking filled cells.]
FIG. 2. Externalized surface representation of a car. The top illustration is a skeletal image and the bottom is a fully elaborated image. Different letters in the bottom illustration index recency of being "refreshed."
store only the body of the car in the skeletal file for expository purposes; this allows us to illustrate more clearly how "second looks" are integrated into the image. We also have yet to determine when only a skeletal image is generated (as opposed to a more fully fleshed-out version). We know that when people are asked to image all of the details, more time is in fact required when more detailed drawings are mentally pictured. We also have some pilot data, however, that seem to indicate that if people are not given explicit instructions to include all parts in their image, the number of details does not influence construction time. If people typically only construct a skeletal image (and perhaps wait for task demands before filling in further information), then this result makes sense (if in fact this preliminary finding is correct). In addition, we do have some slightly more compelling reasons for suspecting that people often do not fill out all details: Kosslyn (1976a) found that people did not "see" parts of already constructed images any more quickly than they constructed parts of images on the spot. If people had only constructed a skeleton when asked to image an object, and then had to insert the part when asked to "see" it (as opposed to simply detecting it on a fully constructed image), these results would make sense: in either case, whether an image was constructed ahead of time or not, the person would still have to generate an image of the probed part. On balance, then, we decided to have the program construct only the skeletal image unless otherwise instructed.
with time, and must be regenerated if they are to be maintained. After regenerating the image, PUT calls FIND in order to locate the foundation part, "REARWHEELBASE," on the skeletal image. This is accomplished by finding the definition of REARWHEELBASE, and then executing these tests in the specified order. The definition is specified simply as an ordered list of numbers that index procedures, but could be stored as a list of concepts corresponding to what the procedures actually do. Our suspicion was, however, that natural language is too weak to easily draw the distinctions necessary to specify the different procedures humans actually use, and that such definitions are probably stored in some kind of abstract format. If all definitional procedures are satisfied, FIND has located the part, and the location is passed back to PUT.
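The control structure just described might be sketched like this (the two indexed tests and the toy image are our stand-ins; the real definitions index perceptual tests over the surface matrix, not dictionary lookups):

```python
# FIND: execute a definition's tests in the specified order; the part is
# located only if every test succeeds.

def make_tests(image):
    # Hypothetical indexed test procedures operating on the toy image.
    return {
        4: lambda: "wheelbase" in image,           # is the region present?
        9: lambda: image.get("wheelbase", 0) > 0,  # does it have extent?
    }

def find(definition, image):
    """Run the definition's tests in order; return the part's location
    on success, None on failure."""
    tests = make_tests(image)
    for index in definition:
        if not tests[index]():
            return None
    return image["wheelbase"]    # the location that is passed back to PUT

location = find([4, 9], {"wheelbase": 25})
```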
Following this, the representation of the relation "UNDER" is looked up. We represented relations directly as procedures, but instead we could have represented them in terms of declarative specifications with interpretive functions that work over them (as far as we are concerned, these alternatives are so difficult to distinguish between that they may simply be "notational variants"). The definition for a relation may be very particular; that is, there may be many distinguishable sorts of "under," depending on what sorts of objects are involved. Some of these distinctions can be expressed in English with phrases (e.g., "tucked slightly in and under"), but many cannot. Thus, we have simply used single words as labels, but realize that something more abstract may in fact be required in order to provide an accurate model of human representation and processing of relations.
In any case, the relation expresses exactly how parts fit together, which allows
PUT to construct a composite image. The first thing PUT does is to adjust the
size scale of the part relative to the skeleton. That is, the skeleton may have been
generated at any number of sizes, and the part must be matched to scale. This is
done by assessing the size of the foundation part (the wheelbase in this case) and
then adjusting to scale the R values in the image deep representation of the part
(the tire). That is, the extent of the foundation part along a single dimension
(horizontally) is assessed in the image, and then the R values are adjusted until
the sum of the maximum horizontal (within a tolerance) R values is the same
size. Once so adjusted, the part is printed out at the correct location in the image
(via PICTURE). This procedure may be repeated for any number of parts. On
the bottom of Fig. 2, a front tire was also imaged. In this case, the same tire was
simply placed in the front. We suspect that people often assume that all tires look
alike (or all trees!), and are quite happy to save effort by encoding only a single
exemplar, and using it in multiple contexts.
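Sketched in code (ours, not the program's; the real PUT measures extent in the surface matrix and applies a tolerance, both omitted here), the rescaling step amounts to:

```python
import math

# PUT's rescaling step: measure the foundation part's horizontal extent in
# the image, then multiply the part's R values so the part's own horizontal
# extent matches it.

def horizontal_extent(points):
    """Width spanned by a set of (r, theta) points along the x axis."""
    xs = [r * math.cos(theta) for r, theta in points]
    return max(xs) - min(xs)

def scale_part_to_foundation(part_points, foundation_points):
    factor = horizontal_extent(foundation_points) / horizontal_extent(part_points)
    return [(r * factor, theta) for r, theta in part_points]

wheelbase = [(4.0, 0.0), (4.0, math.pi)]   # toy foundation: extent 8
tire = [(1.0, 0.0), (1.0, math.pi)]        # toy part: extent 2 before scaling
scaled_tire = scale_part_to_foundation(tire, wheelbase)
```

After scaling, the part's horizontal extent equals the foundation's, whatever size the skeleton happened to be generated at.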
Once an image is constructed, we hypothesize that it begins to fade, and that work is required to maintain it. We have evidence that more complex images (e.g., of 16-cell vs. 4-cell matrices) are more difficult to maintain, and that parts of more complex imaged scenes require more time to detect-supporting the notion that the image is generally more degraded as it becomes more complex (see Kosslyn, 1975). In our program, there is a scheme to recycle parts of the surface image. That is, we print out each part with a different letter; the most recent part is printed with "a," and the letters reassigned such that more recently printed parts are printed with lower letters. The different letters in the car at the bottom of Fig. 2 reflect recency of being refreshed. Older parts are refreshed first; in our current implementation, we only regenerate when the program is asked to do something (e.g., find a part) with the image. So, the parts may fade altogether without being refreshed if the image is held too long before the program is asked to evaluate it or the like. This decision was motivated by the same sort of data that led us to consider a skeletal image as a default. That is, if people hold an image for a while, they may in fact let all details fade and only maintain the skeleton rather than retain a potentially useless load. We are currently testing this hypothesis, and thus pose it at present only as a reasonable possibility. Unless fading rates vary somewhat capriciously, however, our model commits us to a position that "image processing capacity" will be defined by the rate at which construction and fading proceed: If there are so many parts to be placed that the initially placed ones fade away by the time the last is generated, all parts will not be displayed simultaneously-and "processing capacity" will have been exceeded.
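The capacity claim can be made concrete with a toy loop (entirely our illustration; the fade window and the part names are arbitrary, not measured values):

```python
# Capacity as a race between construction and fading: each part survives
# only FADE_AFTER placement steps unless refreshed, so with enough parts
# the earliest ones are gone by the time the last is placed.

FADE_AFTER = 3   # hypothetical: steps a part survives without refreshing

def place_parts(parts):
    """Place parts one at a time; return those still visible at the end."""
    visible = []
    for step, part in enumerate(parts):
        # age out anything placed too long ago
        visible = [(p, t) for p, t in visible if step - t < FADE_AFTER]
        visible.append((part, step))
    return [p for p, _ in visible]

# With more parts than the fade window, early parts have faded by the end:
shown = place_parts(["body", "reartire", "fronttire", "bumper", "hood"])
```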
here, and will not allow PUT to place the tires; the program trace for the run that produced Fig. 4 is identical to that produced in generating Fig. 3, except for the size factor indicated. Thus, we would predict that this size image may be constructed faster than a medium sized one if enough foundation parts overflow; this prediction has not yet been tested.
[Figure 4 appears here: a line-printer matrix of a car imaged so large that it nearly fills the buffer.]
FIG. 4. Externalized surface representation of a car imaged at a very large subjective size. Small letters indicate that there is a direct mapping from the visual buffer to the activated partition (which is printed here); this indicates that lateral inhibition is no longer obscuring contours.
and works here just as it does in image generation. SCAN shifts the points that define a surface image across the visual buffer, such that different portions of the image fall in the center of the matrix (which is posited to be most highly activated and "in focus"). EXPAND and SHRINK shift the points defining a surface image such that the size scale is altered. (The details of these last three procedures will be described shortly.)
LOOKFOR accepts commands to locate some part; it does this by calling up FIND, which looks up a procedural definition of the part and searches for it. Before executing FIND, however, LOOKFOR checks to see whether the image size scale is correct; if not, EXPAND or SHRINK is called to make adjustments. If FIND is not successful in locating the part, SCAN is utilized in a further effort to locate it.
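That dispatch might be sketched as follows (the stub image and operations are toy stand-ins; the real EXPAND, SHRINK, SCAN, and FIND work over the surface matrix rather than a dictionary):

```python
# LOOKFOR's dispatch: fix the size scale first via EXPAND/SHRINK, then
# call FIND; if FIND fails, SCAN and try once more.

def lookfor(part, image):
    if image["scale"] < image["needed_scale"]:
        expand(image)
    elif image["scale"] > image["needed_scale"]:
        shrink(image)
    location = find(part, image)
    if location is None:             # part not in the activated region: scan
        scan(image)
        location = find(part, image)
    return location

def expand(img): img["scale"] = img["needed_scale"]
def shrink(img): img["scale"] = img["needed_scale"]
def scan(img):   img["visible"].add("FRONTTIRE")   # shift the part into view
def find(part, img):
    return img["locations"][part] if part in img["visible"] else None

image = {"scale": 1, "needed_scale": 2, "visible": set(),
         "locations": {"FRONTTIRE": (6, 0)}}
result = lookfor("FRONTTIRE", image)
```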
³In the current simulation, size and resolution covary; larger images are more resolved, because more details may emerge as more dots are available to represent contours in the activated partition buffer. This relationship between size and level of resolution is not necessary, however; we could have kept the dot density constant as size was changed. As it now stands, larger images also will appear "closer," and smaller ones "further away," because of the relationship between size and resolution.
attached to the door, which will allow one to adjust the image to the correct size to "see" the vent wings.
As the program now stands, the optimal level of resolution is stored directly. The resolution of an image is indexed in our model by dot density; more dense images are less resolved, as points run into one another and details are obscured (in our model this is indicated in part by number of capital letters, which indicate overprinting). We will eventually implement a scheme for calculating optimal resolution from information about the relative size of a part and the material of which an object is composed: If dots represent places where light is deflected by local perturbations on the surface, then mirrors will have fewer dots than cardboard; hence, knowledge of both size and material will be necessary to calculate how dense dots ought to be in order to be optimally able to see some part. In any case, once optimal dot density is obtained, the current level of density is assessed and compared with the optimal; if the current density is not within some specified range, the program "zooms in" or "pans out" as appropriate (via EXPAND or SHRINK, as will be described shortly) until the dots are at the correct level of density.
Once it has been determined that the image is within the range of resolution necessary to "see" the sought part, FIND is used as it is in generating fully elaborated images. In addition, however, once the tire is found, the program "scans" to it, which is equivalent to centering the part in the visual buffer. (Thus, the inspected part is in the most sharply focused region of the visual buffer.) We decided to have the program scan to the object because of results of experiments in which people were asked first to mentally focus on a portion of an image, and then were asked whether they could "see" another portion; the further the separation between the two portions, the longer the response times. People reported scanning directly to the object, whereupon it became sharply in focus. In order to scan directly to it, however, people seemed first to locate it in the whole image field; the strongly linear relationship between detection time and distance scanned (see Kosslyn, 1976b; Kosslyn, Ball & Reiser, in preparation) indicates that random searching probably was not required. Figure 5 illustrates the results of finding the rear tire.
[Figure 5 appears here: a line-printer matrix of the car with the rear tire centered.]
FIG. 5. Results of scanning to the rear tire; note that the front end has partially overflowed.
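Centering itself is simple to sketch (our code; the matrix size and coordinates are arbitrary): every point is shifted by the offset that brings the found part to the middle of the buffer, and points pushed past the edge "overflow" in just the way Fig. 5 shows for the front end:

```python
# SCAN as centering: shift all points of the surface image so the found
# part's location lands at the center of a (hypothetical) 17x17 matrix.

CENTER = (8, 8)

def scan_to(points, target):
    dx, dy = CENTER[0] - target[0], CENTER[1] - target[1]
    return [(x + dx, y + dy) for x, y in points]

points = [(2, 3), (14, 3)]           # e.g., rear- and front-tire locations
shifted = scan_to(points, (14, 3))   # center on the front tire
# The rear tire ends up at a negative column, i.e., off the matrix edge.
```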
TABLE 1
Attempting to Find a Front Tire on a Fully-Detailed, Medium-Sized, Off-Centered Image of a Car

*LOOKFOR FRONTIRE
REGENERATING IMAGE
SCANNING TO 25 0

TABLE 1, cont'd.

FRONTIRE.PRP OPENED
REGENERATING IMAGE
SCANNING TO 6 0

material previously at the appropriate edge (i.e., the right) of the activated region is shifted into the center, moving material previously inactivated into the activated regions (lines 42-45). Thus, since the front tire is defined as being at the front-which here is equated with "right"-the rightmost portion of the image is shifted to the center. (The process of scanning and inspecting should be continuous, but we presently perform the scans in leaps in order to save time.) Once the
image is shifted, FIND again inspects the image and now is successful, and LOOKFOR again calls SCAN to center the found part. The reader will note that parts (e.g., front tire) are currently defined in terms of left and right; this clearly is inadequate, as the car obviously could face either direction. We plan to overcome this deficit by having the program locate a "landmark" (e.g., the hood ornament) which establishes whether the front is to the left or right; once this is established, left and right will be assigned as appropriate in the execution of procedures, which will be written in terms of orientation-invariant relations like "front" and "back."
We presently execute SCAN only after FIND, but it may be that SCAN-like
EXPAND and SHRINK-ought to be called prior to attempting serious
scrutiny, especially if the optimal resolution entails having some portion over-
flowing (e.g., as would occur when searching for a cat's claws, or the door
handles of a car). We are about to conduct research attempting to discover
whether scanning occurs simultaneously with size adjustment or whether the two
operations occur serially (if the former is true, we expect that effects of size and
distance will not be additive). If size adjustment and scanning occur in parallel,
then our current simulation is in need of revision.
2.2.3 Transforming Images
The data on image transformations seem to suggest that such transformations
occur more or less continuously; at the minimum, the image seems to go through
intermediate states in the process of being transformed (e.g., Cooper & Shepard,
1973).⁴ In our program, all image transformations involve shifting the locations
of points that delineate the object in the surface matrix. Furthermore, all shifting
is done sequentially, as we did not wish to posit that points could be stored in a
temporary buffer before being repositioned. Given that portions of an image are
transformed sequentially, we were forced to posit that there was a limitation on
how far individual points could be shifted at any one time; this limit is a function
of how far a given portion (i.e., set of points) can be moved before a noticeable
gap or deformation occurs in the image (and hence depends upon the size of the
image, among other factors). Thus, only relatively small changes can be made at
a time, requiring a series of relatively small transformations. Our scheme, then,
conforms to the data and to our intuitions that images pass through intermediate
points when transformed; there is no reason why a nonspatial model, like a
"semantic network," would require passing through spatially intermediate
states.
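The core of the scheme, one bounded step at a time through every intermediate state, can be sketched in one dimension (our simplification; the program shifts sets of points in the matrix, and the step limit here is arbitrary):

```python
# Incremental transformation: move toward the target in steps no larger
# than max_step, so every intermediate state is actually passed through.

def transform_in_steps(value, target, max_step):
    """Move `value` toward `target` in bounded increments, recording
    every intermediate state along the way."""
    states = [value]
    while value != target:
        step = max(-max_step, min(max_step, target - value))
        value += step
        states.append(value)
    return states

# Rotating (say) 90 degrees with a 25-degree limit per cycle:
states = transform_in_steps(0, 90, 25)
```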
Although we have implemented our transformations using the portion-at-a-time sequential shifting scheme described above, we also could have simulated a mechanism that shifts all points simultaneously. In this scheme, there would be a distribution around the distance each point (or portion) is moved, such that the image becomes scrambled when transformed; further, the larger the transformation, the more the points would become jumbled. In order to avoid total disruption, a sequence of relatively small transformations would be used, each one followed by a "clean up" operation realigning the points correctly. This latter
scheme is much more difficult to implement than the one we have currently
programmed, however, and thus we have hesitated to embark upon it until we
can formulate some way of empirically distinguishing between the two types of
mechanisms.
⁴At first glance one may think that there is no problem here; images just rotate, expand, etc. However, an image is not like a picture, not a rigid physical object that can actually be moved and obeys the laws of physics. As is apparent when one examines our model, the spatial qualities of image representations do not necessarily entail continuous (or stepwise) transformations; this is a real problem to be solved that is not explained away simply by appeal to the existence of spatial images (cf. Pylyshyn, in press).
around the center of the surface matrix. For zooming in, the outer ring is moved
outward and then each ring toward the center is moved outward in succession.
For panning out, the rings are pulled toward the center, again one step at a time,
starting at the innermost ring. The maximal step size of each cycle is set by the
starting size of the image; thus, rates of expansion, for example, may increase as
the size increases; this prediction has yet to be tested. As the apparent size of the
image is altered, previously obscured details may be mapped into more than a
few cells of the activated partition, and hence become more sharply defined.
That is, if the image is too small, various details will be obscured, as only
activated cells (represented in the activated partition buffer) are available for
inspection by the FIND procedures. All transformations cause points to move in
the visual buffer, which then are mapped into the activated partition; hence, as
images are expanded, previously obscured details become evident (i.e., accessi-
ble to the FIND procedures).
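A rough sketch of this ring-at-a-time size change follows. For brevity each point is treated here as its own "ring," ordered by distance from the center; in the program itself rings are sets of cells in the surface matrix, and the names and scale parameter below are our illustrative assumptions.

```python
import math

def zoom(points, scale, center=(0.0, 0.0)):
    """Scale the image ring by ring: when zooming in (scale > 1) the
    outermost ring is displaced first and the rest follow inward in
    succession; when panning out (scale < 1) the innermost ring moves first."""
    cx, cy = center
    def radius(i):
        x, y = points[i]
        return math.hypot(x - cx, y - cy)
    order = sorted(range(len(points)), key=radius, reverse=(scale > 1.0))
    out = list(points)
    for i in order:                       # one ring displaced per cycle
        x, y = points[i]
        out[i] = (cx + (x - cx) * scale, cy + (y - cy) * scale)
    return out
```

In this simplified form the ordering does not change the final positions; in the model it matters because the step limit and the activated partition are consulted after each ring's move.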
scan indefinitely along an arc (as long as new portions of the image were con-
tinually being constructed at the leading edge).
3. CONCLUSIONS
ACKNOWLEDGMENTS
We wish to thank Robert Abelson, Al Collins, John Anderson, and Susan Williams for useful
comments on the manuscript.
REFERENCES
Cooper, L. A. Mental rotation of random two-dimensional shapes. Cognitive Psychology, 1975, 7, 20-43.
Cooper, L. A., & Shepard, R. N. Chronometric studies of the rotation of mental images. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press, 1973.
Farley, A. M. VIPS: A visual imagery and perception system; the result of protocol analysis. Ph.D. Dissertation, Carnegie-Mellon University, 1974.
Funt, B. V. WHISPER: A computer implementation using analogues in reasoning. Ph.D. Thesis, University of British Columbia, 1976.
Kosslyn, S. M. Constructing visual images. Ph.D. Dissertation, Stanford University, 1974.
Kosslyn, S. M. Information representation in visual images. Cognitive Psychology, 1975, 7, 341-370.
Kosslyn, S. M. Can imagery be distinguished from other forms of internal representation? Evidence from studies of information retrieval time. Memory and Cognition, 1976, 4, 291-297. (a)
Kosslyn, S. M. Visual images preserve metric spatial information. Paper presented at the Psychonomic Society Meetings, St. Louis, MO, 1976. (b)
Kosslyn, S. M. Imagery and internal representation. In E. Rosch & B. Lloyd (Eds.), Cognition and categorization. Hillsdale, New Jersey: Lawrence Erlbaum Associates, in press.
Kosslyn, S. M., Ball, T. M., & Reiser, B. J. Visual images preserve metric spatial information. Journal of Experimental Psychology: Human Perception and Performance, in press.
Kosslyn, S. M., Greenbarg, P. E., & Reiser, B. J. Generating visual images. In preparation.
Kosslyn, S. M., & Pomerantz, J. R. Imagery, propositions, and the form of internal representations. Cognitive Psychology, 1977, 9, 52-76.
Kosslyn, S. M., Murphy, G. L., Bemesderfer, M. E., & Feinstein, K. J. Category and continuum in mental comparisons. Journal of Experimental Psychology: General, in press.
Moran, T. P. The symbolic imagery hypothesis: A production system model. Ph.D. Dissertation, Carnegie-Mellon University, 1973.
Newell, A. You can't play 20 questions with nature and win. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press, 1973.
Pylyshyn, Z. W. What the mind's eye tells the mind's brain: A critique of mental imagery. Psychological Bulletin, 1973, 80, 1-24.
Pylyshyn, Z. W. The symbolic nature of mental representations. In S. Kaneff & J. F. O'Callaghan (Eds.), Objectives and methodologies in artificial intelligence. New York: Academic Press, in press.
Simon, H. A. What is visual imagery? An information-processing interpretation. In L. W. Gregg (Ed.), Cognition in learning and memory. New York: J. Wiley, 1972.
Smith, E. E., Shoben, E. J., & Rips, L. J. Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 1974, 81, 214-241.
Weber, R. J., Kelley, J., & Little, S. Is visual imagery sequencing under verbal control? Journal of Experimental Psychology, 1972, 96, 354-362.