
Visual Search: Finding an Image in a Haystack

Cognitive Psychology
Laboratory Project #2
Introduction

From the moment we wake up until we drop exhausted into bed at night, visual sensory
data floods our senses – bombarding our retinas with an unbroken flow of signals that must be
prioritized and processed by our brain’s neuronal pathways. Whether selecting the best breakfast
sandwich from Ross Dining Hall, finding our roommate’s red Toyota in a snowbound Midd
parking lot, or searching for friends in a Mead Chapel concert audience, our cognitive
mechanisms remain the same: of the large quantity of sensory input we process each second,
only small amounts survive the selective filtering and systematic assessment processes in our
short-term and working memory. The result: a life-sustaining ability to quickly and coherently
manage the information propelling our conscious actions and decision-making through a
goal-directed focus on the tasks at hand.

This seemingly straightforward activity of absorbing a visual scene and concentrating on
data deemed important by our attentional goals depends, in reality, on complex sequential tasks
underpinning our ability to sort through the dynamic arrays of objects in our environment.
Framed by a concept known as “visual search,” our brains take only milliseconds to locate the
objects of our attention from a field of distracting stimuli, transforming sensory data into
impulses that activate the diverse yet intricately interwoven cortical networks responsible for
focused attention. Like a spotlight that engages, scans, and then disengages from its targets
(Posner and Petersen, 1990), the gatekeeping function of our selective attention helps us break
down large quantities of visual stimuli into chunks that can be handled by our visual system.

To make sense of visual cognition, we must first understand its top-level architecture.
Treisman organized this activity into four components, each functioning in discrete, context-
dependent ways: attention, feature detection, recognition, and feature integration (Stephen and
Mirman, 2010). These separate functions take place in two sequential processes of visual search
(Treisman and Gelade, 1980): first, an automatic evaluative pass that quickly and efficiently
parses out an object’s basic descriptive elements, analyzing and encoding each feature
dimension in parallel across the visual field. As part of this initial preattentive assessment, we roughly
group and register information about the object in a number of specialized perceptual maps,
keeping track of the unique features and where they are spatially located. This survey step is
followed by a slower, less-efficient, serially-based evaluation of a smaller subset of selected
objects. This second stage involves recombining and binding the multiple previously identified
features of an object into a cohesive, recognizable holistic representation. Once solidified, we are
then able to take our mental object representation and compare it to what we already know about
our environment. Given working memory’s capacity limitations, this feature amalgamation and
matching process often draws on information stored earlier in our long-term memory,
information that not only helps us recognize known objects but also provides a framework for
evaluating new and changed objects. The theory also acknowledges the functional capability of
working memory – and its associated neural networks – to represent properties such as color and
spatial location in distinct cortical regions of the brain. Treisman and Gelade called their
paradigm “feature integration theory.”
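
To make the two stages concrete, the toy Python sketch below (our own illustration, not code from Treisman and Gelade) registers each item’s color and shape in separate location-indexed feature maps during a “preattentive” pass, then “binds” features by intersecting those maps at shared locations; the display contents and map structure are assumptions made purely for demonstration.

```python
from collections import defaultdict

# A toy display: location -> (color, shape), assumed for illustration.
display = {(0, 0): ("green", "X"), (0, 1): ("brown", "N"), (1, 0): ("green", "N")}

# Preattentive stage: register each feature dimension in its own map,
# independently of the other dimension.
color_map = defaultdict(set)
shape_map = defaultdict(set)
for location, (color, shape) in display.items():
    color_map[color].add(location)
    shape_map[shape].add(location)

# Focused-attention stage: "binding" a conjunction target (a green N)
# amounts to intersecting the two maps at shared locations.
print(color_map["green"] & shape_map["N"])   # {(1, 0)}
```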

Key to this feature integration theory is a division of labor that defines two discrete types
of searches: a simple “feature search” and a more complex “conjunction search.” When
performing a task that involves separating objects by only one identifying characteristic (such as
color or shape), feature search enables objects that contain that prominent discriminating element
to seemingly “pop out” of the display, with little need for directed attention. Their properties are
quickly organized via the maps created by feature integration processes. “Conjunction search,” on the
other hand, is much more complex, requiring focused resources to serially identify objects that
combine two or more unique attributes. Feature search does not require attentional resources;
conjunction search, by contrast, may draw on a wealth of short- and long-term memory
resources and, as a result, generally produces much slower response times.

To investigate the selective, focused attention processes of visual search, we explored
how subjects analyze visual fields for color and shape, both alone and in combination. The
present experiment used feature and conjunction search tasks, asking subjects to correctly note
the presence or absence of a target stimulus embedded in a display containing increasing
numbers of distractor characters. Response accuracy and response time were measured as the
dependent variables of subject performance; color and shape served as the independent
variables. Based on feature integration theory, we hypothesized that searches for a single object
feature would be processed rapidly, with subjects using feature searches to identify the
appropriate targets. On the other hand, we expected that detecting objects defined by multiple
features would require the focused attention of slower, conjunction searches. Tests were
untimed, which we thought would produce positive correlations between accuracy and response
time as search difficulty increased.

Methods

Experimental Subjects:
Twenty-five subjects participated in the experiment, all undergraduates at Middlebury
College volunteering as part of their course requirement. Subjects were directed to perform tasks
as rapidly as possible while making a minimum number of errors in their decision-making.

Experimental Protocol:
Three visual search tasks were used to assess selective attention processes in visual
search. As independent variables, color alone, shape alone, and a combination of color and shape
were tested. One feature search task – testing color – positioned a single target letter (either a
blue X or a blue N) among sets of green Xs and brown Ns. A second feature search task – testing
shape – placed a single oval-shaped target letter (either a brown O or a green O) among sets of
green Xs and brown Ns. A third conjunction search task – simultaneously testing both color and
shape – displayed a single target letter (a green N) among sets of green Xs and brown Ns, each
non-target letter sharing either color or shape with the target letter.

Using computer-generated displays, each subject was run under the three trial conditions
to determine how unitary and combined object features affected recognition response times
and accuracy rates. The color feature tasks required subjects to indicate whether a blue-
colored letter was present on the screen (a sample stimulus type is represented in Fig. 1).
Similarly, the shape feature tasks asked subjects to determine whether an oval-shaped character
was visible on the screen (a sample stimulus type is represented in Fig. 2). Tasks testing both
color and shape directed subjects to confirm whether they could find a non-blue, non-oval target
letter on the screen (a sample stimulus type is represented in Fig. 3). For each trial, stimuli
remained on the screen until the subject pressed either the Y or the N key to indicate a response. Once
a response was registered, the subject’s response time was displayed in the top left corner of the
screen; 1000 msec later, the next trial began. Accuracy and response time (measured from the
onset of the stimulus) were recorded for each trial.

Each subject was run individually for one session. There were 144 trials per session – 48
testing color, 48 shape, and 48 the combination of color and shape. The three visual search tasks
were presented with one of four levels of difficulty as measured by the number of distractors:
either 1, 5, 15, or 30 characters. The experiment required 72 unique stimuli, 24 each for color,
shape, and the color-shape combination – with each stimulus appearing twice throughout the
shape, and the color-shape combination – with each stimulus appearing twice throughout the
trials.
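
The laboratory’s presentation software is not reproduced here; the following is a minimal, hypothetical Python mock-up of a single trial under the conditions described above (console-based, so its timing is only approximate), sketching how accuracy and response time from stimulus onset might be recorded.

```python
import random
import time

def run_trial(target_present: bool, n_distractors: int) -> dict:
    """Run one mock visual search trial, recording accuracy and response time."""
    # Build a conjunction-search display: green Xs and brown Ns as distractors,
    # plus a green N target on target-present trials.
    display = random.choices(["green X", "brown N"], k=n_distractors)
    if target_present:
        display.append("green N")
    random.shuffle(display)

    onset = time.perf_counter()                      # stimulus onset
    answer = input(f"{display}\nTarget present? (Y/N): ").strip().lower()
    rt_msec = (time.perf_counter() - onset) * 1000.0

    correct = (answer == "y") == target_present
    print(f"Response time: {rt_msec:.0f} msec")      # echoed after each response
    time.sleep(1.0)                                  # 1000 msec inter-trial pause
    return {"correct": correct, "rt_msec": rt_msec}

# Example: one conjunction trial at set size 15 with the target present.
result = run_trial(target_present=True, n_distractors=15)
```

A full session would loop this over the 144 trials, varying task type and set size as described above.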

Results

Figures 4 and 5, along with Tables 1-3, summarize the results of this experiment.

Figure 4 reflects the subjects’ mean response times (measured in msec) for the two
individual feature search tasks (color and shape) and the combined conjunction color-shape
search task. Data includes results where the target letter was present and absent. Results are
grouped by level of difficulty as measured by the number of distractors: 1, 5, 15, or 30 (set size).
As the data in Figure 4 (and the associated Table 1) demonstrate, both the single feature color
and shape search conditions produced virtually constant response times across all four
difficulty levels (irrespective of target letter presence). Conversely, response times for the
conjunction color-shape searches slowed as task difficulty increased; target letter absence
showed markedly slower response times as difficulty levels rose. As shown in Table 2, these
pattern differences between single and compound feature letter recognition times are similarly
reflected in the sharp increases in response slope for conjunction searches (especially when the
target letter was absent).

Figure 5 reflects subjects’ accuracy rates (measured in percent correct) on the three
untimed visual search tasks – the two individual feature tasks for color and shape, and the
combined conjunction color-shape task. Data includes results where the target letter was present
and absent. Results are grouped by level of difficulty as measured by the number of distractors:
1, 5, 15, or 30 (set size). As Figure 5 (and the associated Table 3) indicate, minor variations in
accuracy rates can be seen as difficulty increased; however, all three visual search tasks
produced virtually constant accuracy rates as distractor levels rose (again irrespective of
target letter presence).

Discussion
Looking at both subject response times and accuracy rates – and noting that we performed
no statistical tests on the data – we find that the trends in our data suggest that searching
for objects with an easily distinguishable feature, such as color or shape, improves
visual search efficiency. These results are consistent with previous research on the visual search
process (Treisman and Gelade, 1980; Treisman and Sato, 1990) and with the feature integration
theory of attention, in which the prominent descriptive characteristics processed in the
preattentive stage lead to more rapid visual search response times than those processed in the
focused attention stage.
Our results linking the more rapid processing of visual information to the bottom-up nature
of preattentive analysis – observed when comparing the response times of the feature search task
with response times of the conjunction search task – support previous research (Treisman and
Gelade, 1980; Treisman and Sato, 1990). This difference in response time reflects Treisman and
Gelade’s feature integration theory of attention (1980), which claims that, without any prior
knowledge of a situation, objects are broken down and sorted by unique, separable features
during the preattentive stage, and are subsequently combined and reassembled in the focused
attention stage. Analyses of salient features such as color and shape are distributed
simultaneously to multiple parts of the brain through parallel processing mechanisms. Once
features are identified, they are appropriately grouped in ways that help us bind the correct
features to emerging representations of visual stimuli. Treisman and Gelade then postulate that feature maps
are created for use in later object recognition processes, helping us create a real-world
interpretation of what we are seeing. This adaptation inherently reduces the cognitive load
experienced by any specific brain region, and it supports rapid decision-making processes,
presumably through neural projections directly from the individual feature processing regions to
the motor cortex (Behrmann and Haimson, 1999).
The process breaks down, however, when the brain has to process multiple salient
features during analysis and decision-making, especially when increased numbers of distractors
are present. As reported by Treisman and Gelade (1980), focused attention acts like "glue" in
forming unitary objects from the available features. In order to register what features belong to
each object, however, the brain is thought to use serial processing in a mechanism that binds the
descriptive features into a cohesive entity. This collocation mechanism slows down processing
speeds because the brain must take into account data from multiple regions before reaching a
decision, coordinating information it has disseminated through various mapping procedures. In
the end, this entire process is reflected in our data by increases in response times during
conjunction search tasks. It is interesting to note that feature distinctiveness was more apparent
in the response time data than in the accuracy data – possibly because these two variables
operate in a complementary fashion, with accuracy increasing as response time increases.
Response slope measurements also reinforce the longer processing requirements of
conjunction searches. As documented by Treisman and Gelade (1980), conjunction searches are
characterized not only by increased response times but also by a typical 2:1 response slope ratio
between trials lacking targets and those including targets – a pattern found in our data’s more
difficult (and larger) displays. When target letters were present in these more difficult searches,
subjects consistently detected them very nearly twice as fast. Slope – which links response times
to the number of items displayed – provides an estimate of how much time is needed to
assimilate each additional object into memory. It also provides a marker that subjects are
probably applying what is called a serial self-terminating search in their processing. In this
technique, subjects systematically examine each item in the visual field individually until the
target is found, a process helpful when evaluating complex visual fields.
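
As an illustration (our own computation, not part of the original analysis), the slopes reported in Table 2 can be approximately recovered by fitting a least-squares line of mean response time against set size, here using the conjunction-condition means from Table 1; small discrepancies from the tabled values presumably reflect rounding in the reported means.

```python
import numpy as np

set_sizes = np.array([1, 5, 15, 30])
mean_rts = {  # mean response times in msec, taken from Table 1
    "Green N Present": [556, 648, 796, 962],
    "Green N Absent":  [674, 768, 1152, 1515],
}

for condition, rts in mean_rts.items():
    slope, intercept = np.polyfit(set_sizes, np.array(rts), 1)
    print(f"{condition}: slope = {slope:.2f} msec/item")
# Green N Present: ~13.6 msec/item; Green N Absent: ~29.7 msec/item --
# close to the 13.84 and 29.64 reported in Table 2.
```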
Lower slopes for target-present conjunction searches (compared to target-absent
tasks) can be accounted for, according to Treisman’s amendments to her initial theory (Treisman
and Sato, 1990), by taking into account object similarity. Targets that share qualities with their
distractors are believed to be processed slightly differently during the initial preattentive stage.
The belief is that shared – and therefore task-irrelevant – attributes between targets and
distractors are suppressed during map-making activities, minimizing distractor interference.
One question worth considering when looking at the results from our experiment is
whether self-terminating searches are actually used for all positive-match searches. Consistent
with Pashler (1987), our results show that the 2:1 response time ratios are not present when the
number of distractors is small; in fact, these ratios appear at best to be about 1:1. A possible explanation for
this phenomenon could be that subjects used different search mechanisms depending on whether
or not the target character was present. Perhaps self-terminating searches were used with positive
target character matches, while negative target character matches employed serial exhaustive
searches (where the search process continues exhaustively until every last item in the display has
been individually examined) (Treisman and Gelade, 1980).
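
The roughly 2:1 prediction follows from simple counting, as the simulation below illustrates (our own sketch, not a model from the cited papers): a self-terminating scan finds a present target after examining about half the display on average, while a target-absent (exhaustive) scan must examine every item.

```python
import random

def self_terminating_scan(display, target):
    """Examine items one at a time; stop as soon as the target is found."""
    for i, item in enumerate(display, start=1):
        if item == target:
            return i              # items examined before terminating
    return len(display)           # target absent: every item was examined

n, trials = 30, 10_000
examined = []
for _ in range(trials):
    display = ["distractor"] * (n - 1) + ["target"]
    random.shuffle(display)
    examined.append(self_terminating_scan(display, "target"))

avg_present = sum(examined) / trials      # ~ (n + 1) / 2 = 15.5 items
avg_absent = n                            # exhaustive scan: always 30 items
print(f"present: {avg_present:.1f} items, absent: {avg_absent} items, "
      f"ratio: {avg_absent / avg_present:.2f}")   # ratio ~ 2:1
```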
Conclusion

Clearly, we spend hours every day performing visual searches – shifting our attentional
resources from one item in a complex visual field to the next. Unfortunately, finding appropriate
laboratory tests whose results can both provide guidance on how these processes actually
work – and still support extrapolation of results to real-world experiences – is difficult, at best.
Although trends in the data from this study parallel the observations reported in the
groundbreaking research of Treisman and her colleagues (1980, 1990), we still find ourselves,
over thirty years later, facing deep unanswered questions about the replicability of these
findings, given the significant complexities that our brains handle every day. Yet the two-part
model underlying Treisman’s simplified research tests of complex visual search – quick
extraction of distinctive object features followed by more in-depth identification and analysis of
the parts and whole – appears solid.

As we read in recent news releases that Google has patented facial recognition
software for social networking sites (Storm, 2011), it would come as no surprise to
find that the underlying framework relies in part on the two-level paradigm tested in our
experiment. Will the next frontier of visual search involve not only our brain’s neural networks
but also those of supercomputers like IBM’s Watson? Will we begin to reassess what elements
are controlled by parallel processing and which are serial? As visual imaging applications
continue to help us picture the neural processes taking place in our brains, we will no doubt
encounter new sets of questions that challenge our current conceptualization of complex object
recognition – how processing occurs, how objects interrelate with each other, and what role they
play in our dynamic perceptual environment. In the meantime, I may spend tomorrow
determining whether I have sufficient funds to buy another share of Google stock!
References

Behrmann, M., and Haimson, C. (1999). The cognitive neuroscience of visual attention. Curr.
Opin. Neurobiol. 9:158–163.

Desimone, R., and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu.
Rev. Neurosci. 18:193–222.

Goldstein, E. (2011). Cognitive psychology: Connecting mind, research, and everyday
experience. Belmont, CA: Wadsworth.

Pashler, H. (1987). Detecting conjunctions of color and form: Reassessing the serial search
hypothesis. Percept. Psychophys. 41(3):191-201.

Posner, M., and Petersen, S.E. (1990). The attention system of the human brain. Annu. Rev.
Neurosci. 13:25-42.

Stephen, D.G., and Mirman, D. (2010). Interactions dominate the dynamics of visual cognition.
Cognition 115:154-165.

Storm, D. (2011). Google publishes facial recognition patent, could use social networking photos.
PC World
http://www.pcworld.com/article/220580/google_publishes_facial_recognition_patent_co
uld_use_social_network_photos.html (Last accessed February 28, 2011).

Treisman, A., and Gelade, G. (1980). A feature-integration theory of attention. Cogn. Psychol.
12:97–136.

Treisman, A., and Sato, S. (1990). Conjunction search revisited. J. Exp. Psychol. Hum. Percept.
Perform. 16(3):459-478.

Figures and Tables:

Figure 1. Stimuli examples for the feature search task testing color. Participants were asked to
determine if a blue letter was present. Left side: sample with 15 distractors and a blue letter
present. Right side: sample with 15 distractors and a blue letter absent.

Figure 2. Stimuli examples for the feature search task testing shape. Participants were asked to
determine if the round-shaped letter O was present. Left side: sample with 30 distractors and
letter O present. Right side: sample with 30 distractors and letter O absent.

Figure 3. Stimuli examples for the conjunction search task testing both color and shape. Participants
were asked to determine if a green letter N was present. Left side: sample with 5 distractors with
green letter N present. Right side: sample with 5 distractors and green letter N absent.
Figure 4. Mean response times for visual search trials separated into categories based on the
target defined and the target state.

Response Times (msec)    Set size 1    Set size 5    Set size 15    Set size 30
O Present                       464           475            475            496
O Absent                        530           509            583            631
Blue Present                    503           470            497            515
Blue Absent                     503           479            530            551
Green N Present                 556           648            796            962
Green N Absent                  674           768           1152           1515
Table 1. Mean response times (msec) for visual search trials, by target condition and set size.
Response Time Slope (msec/item)
O Present            0.98
O Absent             4.06
Blue Present         0.92
Blue Absent          2.14
Green N Present     13.84
Green N Absent      29.64
Table 2. Response time slopes (msec per additional display item) for visual search trials.
Figure 5. Accuracy rates for visual search trials separated into categories based on the target
defined and the target state.

Table 3. Mean accuracy rates for visual search trials, by target condition and set size.

Accuracy Rates (%)    Set size 1    Set size 5    Set size 15    Set size 30
O Present                     94            92             95             95
O Absent                      89            95             96             97
Blue Present                  93            94             95             93
Blue Absent                   93            94             93             97
Green N Present               91            88             86             90
Green N Absent                85            88             89             84
