
TECHNOLOGY by Massimo Piccardi and Tony Jan

Recent Advances in Computer Vision


The Industrial Physicist, February/March 2003, page 18. © American Institute of Physics

Computer vision is the branch of artificial intelligence that focuses on providing computers with the functions typical of human vision. To date, computer vision has produced important applications in fields such as industrial automation, robotics, biomedicine, and satellite observation of Earth. In the field of industrial automation alone, its applications include guidance for robots to correctly pick up and place manufactured parts, nondestructive quality and integrity inspection, and on-line measurements.

Until a few years ago, chronic problems affected computer-vision systems and prevented their widespread adoption. Since its start, computer vision has appeared as a computationally intensive and almost intractable field because its algorithms require a minimum of hundreds of MIPS (millions of instructions per second) to be executed in acceptable real time. Even the input–output of high-resolution images at video rate was traditionally a bottleneck for common computing platforms such as personal computers and workstations. To solve these problems, the research community has produced an impressive number of dedicated computer-vision systems. One such famous system was the Massively Parallel Processor (MPP), designed at the Goddard Space Flight Center in 1983 and operated there until 1991. The MPP used an array of 16,384 single-bit processors and was capable at peak performance of 250 million floating-point operations/s, an impressive feat at the time.

Dedicated computers such as the MPP have always received a cold reception from industry because they were expensive, cumbersome, and difficult to program. In recent years, however, increased performance at the system level (faster microprocessors, faster and larger memories, and faster and wider buses) has made computer vision affordable on a wide scale. Fast microprocessors and digital-signal processors are now available as off-the-shelf solutions, and some of them can execute calculations at rates of thousands of MIPS. The Texas Instruments C6414 processor, for example, runs at 600 MHz and can achieve a peak performance of 4,800 MIPS. High-speed serial buses such as IEEE 1394 and USB 2.0 are capable of transferring hundreds of megabits per second, a rate that greatly exceeds the requirements of any common high-resolution video camera. These buses are already integrated into the most recent personal computer chipsets or are available as inexpensive daughterboards. Moreover, video cameras have gone almost completely to digital, and they come in several price ranges and types. Consumer camcorders are based on standards such as Digital Video (DV), which provides video at 720 × 480 pixels/frame and a rate of 30 frames/s. Even Webcams can now provide images of satisfactory quality at prices starting as low as $25.

The availability of affordable hardware and software has opened the way for new, pervasive applications of computer vision. These applications have one factor in common. They tend to be human-centered; that is, either humans are the targets of the vision system or they wander about wearing small cameras, or sometimes both. Vision systems have become the central sensor in applications such as
• human-computer interfaces (HCIs), the links between computers and their users;
• augmented perception, tools that increase normal perception capabilities of humans;
• automatic media interpretation, which provides an understanding of the content of modern digital media, such as videos and movies, without the need for human intervention or annotation; and
• video surveillance and biometrics.

Human-computer interfaces

The basic idea behind the use of computer vision in HCIs is that in several applications, computers can be instructed more naturally by human gestures than by the use of a keyboard or mouse. In one interesting application, computer scientist James L. Crowley of the National Polytechnical Institute of Grenoble in France and his colleagues used human eye movements to scroll a computer screen up and down. A camera located on top of the screen tracked the eye movements. The French researchers reported that a trained operator could complete a given task 32% faster by using his eyes rather than a keyboard or mouse to direct screen scrolling. In general, using cameras to sense human gestures is much easier than making users wear cumbersome peripherals such as digital gloves.

Another interesting example of an HCI application can be downloaded from http://www.cv.iit.nrc.ca/research/Nouse/ for personal testing, provided a Webcam is plugged into your personal computer. This application, called Nouse (for nose as a mouse), tracks the movements of your nose and was developed by Dmitry Gorodnichy. You can play NosePong, a nose-driven version of the Pong video game (Figure 2), or test your ability to paint or write with your nose. Although this application is slanted toward fun, it is a convincing demonstration of the potential uses of cameras as natural interfaces. In industry, for example, an operator might quickly stop a conveyor belt with a specific gesture detected by a camera without needing to physically push a button, pull a lever, or carry a remote control.

Figure 2. A camera tracks the point of each player’s nose closest to the camera and links it to the red “bat” at the top (or bottom) of the table to return the computer ball across the “net.” (Credit: Institute for Information Technology, National Research Council Canada)

Cameras could also become powerful peripherals for the so-called intelligent home. A camera located in your living room would perform several tasks, starting with sensing a human presence and then turning the lights on and the heat up. Indeed, cameras could replace the many hard-to-find remote controls around today’s homes, provide environmental surveillance, and turn the TV off when you fall asleep in your favourite armchair.

The Voice

Another application is The vOICe, developed at Philips Research Laboratories (Eindhoven, The Netherlands) by Peter B. L. Meijer and available online for testing at http://www.seeingwithsound.com. The vOICe provides a simple yet effective means of augmented perception for people with partially impaired vision. In the virtual demonstration, the camera accompanies you in your wanderings. The camera periodically scans the scene in front of you and turns images into sounds, using different pitches and lengths to encode objects’ position and size.

Media interpretation

The use of computer vision for automatic media interpretation assists users in searching for specific scenes and shots otherwise not annotated in the video-scene indexes. For example, images containing faces can be automatically distinguished from other images, as the results of the Face Detection Project led by Henry Schneiderman and Takeo Kanade at Carnegie Mellon University (CMU) prove. The CMU face detector is considered the most accurate for frontal face detection and is also reliable for facial profiles and three-quarter images. Many examples are available at http://vasc.ri.cmu.edu/demos/faceindex (one is shown in Figure 1), and anyone can submit an image to http://www.vasc.ri.cmu.edu/cgi-bin/demos/findface.cgi, which will process the image overnight and depict all detected faces with a box around them.

Figure 1. The Face Detection Project can automatically distinguish images containing faces from other images and put a box around each detected face, whether frontal, profile, or three-quarter. (Credit: Henry Schneiderman and Takeo Kanade, Carnegie Mellon University)

However, computer vision can do much more for multimedia. For example, it is an invaluable support to recent multimedia standards aimed at compressing digital videos (reducing their size in bytes) while still retaining acceptable visual quality. One such standard is MPEG-4 from the Moving Picture Experts Group, which allows different objects in a scene to be compressed at different levels, so that the trade-off between space reduction and visual quality can be adjusted on a per-object basis. The basic idea is that important objects such as actors should retain the highest visual quality, while objects in the background can be encoded with lower quality to save bytes. Nonetheless, MPEG-4 is silent on how to separate a video into the objects of which it is composed. Here again, computer vision can help with a variety of techniques that perform the task automatically.

Video surveillance

Figure 3. This parking-lot surveillance system subtracts the static background image, distinguishes a person from moving vehicles, locates the head, and calculates the speed of the head in each frame. Panels (a) and (b). (Credit: University of Technology, Sydney, Australia)

Perhaps the most developed modern application of computer vision is video surveillance. Long gone are the days when video surveillance meant low-resolution, black-and-white, analog closed-circuit television. Nowadays, computer vision enables the integration of views from many cameras into a single, consistent “superimage.” Such a system automatically detects scenes with people and/or vehicles or other targets of interest, classifies them in categories such as people, cars, bicycles, or buses, extracts their trajectories, recognizes limb and arm positions, and provides some form of behavior analysis. The analysis relies on a list of previously specified behaviors or on statistical observations
such as frequent-versus-infrequent behaviors. The basic goal is not to completely replace security personnel but to assist them in supervising wider areas and focusing their attention on events of interest. Although the critical issue of privacy must be addressed before society widely adopts these video surveillance systems, the recent need for increased security has made them more likely to win general acceptance. In addition, several technical countermeasures can be taken to prevent privacy abuses, such as protecting access to video footage by way of passwords and encryption.

At the University of Technology in Sydney, Australia, we have developed and tested a system that can detect suspicious pedestrian behavior in parking lots. Our approach is based on the assumption that a suspicious behavior corresponds to an individual’s erratic walking trajectory. The rationale behind this assumption is that a potential offender will wander about and stop between different cars to inspect their contents, whereas normal users will maintain a more direct path of travel.

The first step consists of detecting all the moving objects in the scene by subtracting an estimated “background image” (one that represents only the static objects in the scene) from the current frame (Figures 3a and 3b). The next step is to distinguish people from moving vehicles on the basis of a form factor, such as the height:width ratio, and to locate their heads as the top region in their silhouette. In this way, the head’s speed at each frame is automatically determined. Then, a series of speed samples is repeatedly measured for each person in the scene. Each series covers an interval of about 10 s, which is enough to detect suspicious behavior patterns (Figure 4).

Figure 4. Examples of the speed of the head (in pixels per frame) of a person in the parking lot exhibiting normal behavior (a) and abnormal behavior (b). Such video surveillance might alert a security guard to a possible car thief. Axes in both panels: speed of head (0 to 20) versus video frames (0 to 50, at 5 frames per second). (Credit: University of Technology, Sydney, Australia)

Finally, a neural network classifier, trained to recognize the suspicious behaviors, provides the behavior classification. In the experiments we performed, the system achieved good accuracy, with a reasonably limited number of false dismissals and false alarms: 4% and 2%, respectively, among more than 100 test samples. Although manufacturers and operators of surveillance systems have often been reluctant to accept innovation, recent results from research laboratories of major companies prove that these systems are now reliable, economical, and ready for commercialization. One example is DETER from Honeywell Labs, a prototype urban-surveillance system.

For those who want to build their own surveillance systems, an enormous amount of equipment is available. Web sites of manufacturers such as Sony, Axis, Pelco, and many others offer a wide range of cameras. You can find network cameras starting at less than $500 that can simply be plugged into any network, such as a TCP/IP network, and that carry a full Web server, allowing camera frames to be downloaded and processed. Adjustable pan–tilt–zoom cameras can be used to point and focus on specific targets over wide survey areas. And if cabling poses a problem because of camera location, wireless versions are available off-the-shelf.

Computer vision, already a useful aid in several industrial processes, will find increasing uses as companies develop new applications in areas such as HCI, augmented perception, and automatic media interpretation. Its potential to improve plant and public safety is attracting increasing attention in today’s security-conscious world.

Further reading

Crowley, J. L.; Coutaz, J.; Bérard, F. Perceptual user interfaces: things that see. Commun. ACM 2000, 43 (3), 54–64.

Jan, T.; Piccardi, M.; Hintz, T. Automated Human Behaviour Classification using Modified Probabilistic Neural Network. In Proc. Int. Conf. Computational Intelligence for Modelling, Control and Automation; CIMCA 2003, Vienna, Austria, Feb. 12–14, 2003.

National Instruments Corp. (Austin, TX) markets a range of computer-vision products. Its LabView-based Vision line focuses on industrial and scientific uses. http://sine.ni.com/apps/we/nioc.vp?cid=1286&lang=US.

Pavlidis, I.; Morellas, V.; Tsiamyrtzis, P.; Harp, S. Urban surveillance systems: from the laboratory to the commercial world. Proc. IEEE 2001, 89 (10), 1478–1496.

Biography

Massimo Piccardi (massimo@it.uts.edu.au) is an associate professor of computer science and Tony Jan (jant@it.uts.edu.au) is a lecturer in the department of computer systems at the University of Technology in Sydney, Australia.
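
Worked example: video data rate versus bus bandwidth

The article's remark that IEEE 1394 and USB 2.0 comfortably carry camera video can be checked with simple arithmetic. The sketch below takes the frame size and frame rate quoted for DV and assumes an uncompressed 24-bit RGB stream; the nominal bus rates of 400 and 480 Mbit/s are also assumptions of this example, not figures from the article.

```python
# Back-of-the-envelope check: raw video data rate versus serial-bus bandwidth.
# Frame size and frame rate come from the article; everything else is assumed.

WIDTH, HEIGHT = 720, 480      # pixels per frame (DV figure quoted in the article)
FRAMES_PER_SECOND = 30        # frame rate quoted in the article
BITS_PER_PIXEL = 24           # assumption: uncompressed 8-bit RGB

# Nominal signalling rates in Mbit/s (assumed values, not from the article).
BUS_RATES_MBIT = {"IEEE 1394": 400.0, "USB 2.0": 480.0}


def raw_video_rate_mbit(width, height, fps, bits_per_pixel):
    """Return the uncompressed video data rate in megabits per second."""
    return width * height * fps * bits_per_pixel / 1e6


video_rate = raw_video_rate_mbit(WIDTH, HEIGHT, FRAMES_PER_SECOND, BITS_PER_PIXEL)

# How many times the raw stream fits into each bus at its nominal rate.
headroom = {bus: rate / video_rate for bus, rate in BUS_RATES_MBIT.items()}

if __name__ == "__main__":
    print(f"Raw video: {video_rate:.1f} Mbit/s")
    for bus, factor in headroom.items():
        print(f"{bus}: {factor:.2f}x the raw video rate")
```

Under these assumptions even an uncompressed stream fits within either bus. Real cameras usually compress their output, and real buses deliver less than their nominal rate, so the numbers are only a rough guide.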

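
Worked example: from background subtraction to head speed

The detection steps described in the Video surveillance section (background subtraction, form-factor test, head location, head speed) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the grey-level threshold, the height:width cutoff, the fraction of the silhouette treated as the head, and the toy frames are all hypothetical, and the background image is simply assumed to be known rather than estimated.

```python
# Sketch of the detection steps: background subtraction, form-factor test,
# head location, and head speed. Thresholds and helper names are hypothetical.
import math


def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose grey level differs from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, on in enumerate(row) if on]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def looks_like_person(box, min_ratio=1.5):
    """Form-factor test: a pedestrian's silhouette is taller than it is wide."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return height / width >= min_ratio


def head_position(mask, box, head_fraction=0.2):
    """Centroid (row, col) of the foreground pixels in the top part of the silhouette."""
    top, left, bottom, right = box
    head_rows = max(1, int((bottom - top + 1) * head_fraction))
    points = [
        (r, c)
        for r in range(top, top + head_rows)
        for c in range(left, right + 1)
        if mask[r][c]
    ]
    return (sum(r for r, _ in points) / len(points),
            sum(c for _, c in points) / len(points))


def head_speeds(positions):
    """Head displacement between consecutive frames, in pixels per frame."""
    return [math.hypot(b[0] - a[0], b[1] - a[1])
            for a, b in zip(positions, positions[1:])]


def synthetic_frame(left, height=12, width=20):
    """Toy frame: dark background with a bright 6x2 'pedestrian' starting at column left."""
    frame = [[0] * width for _ in range(height)]
    for r in range(3, 9):
        for c in range(left, left + 2):
            frame[r][c] = 200
    return frame


# Toy sequence: the pedestrian moves three columns to the right in each frame.
background = [[0] * 20 for _ in range(12)]
positions = []
for left in (2, 5, 8, 11):
    mask = foreground_mask(synthetic_frame(left), background)
    box = bounding_box(mask)
    if box is not None and looks_like_person(box):
        positions.append(head_position(mask, box))

speeds = head_speeds(positions)
```

A real scene contains several moving objects at once, so a practical version would first split the mask into connected regions and track each one separately; the single-object case is kept here only to make the four steps easy to follow.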

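
Worked example: classifying a head-speed series

The final step of the parking-lot system feeds each 10 s series of head-speed samples to a neural network classifier. As a rough illustration only, the sketch below uses a plain probabilistic neural network (a Gaussian Parzen-window classifier), which is not the modified network cited under Further reading. The window length follows from the article (about 10 s at the 5 frames per second shown in Figure 4); the two summary features, the kernel width `sigma`, and all of the speed data are invented for this example.

```python
# Sketch: classify a 10 s head-speed series with a plain probabilistic neural
# network (Gaussian Parzen windows). Features, sigma and data are hypothetical.
import math
from statistics import mean, pstdev

FRAMES_PER_SECOND = 5      # sampling rate shown on the Figure 4 axis
WINDOW_SECONDS = 10        # interval length quoted in the article
WINDOW_SIZE = FRAMES_PER_SECOND * WINDOW_SECONDS   # 50 speed samples


def features(speeds):
    """Summarise one window as (mean speed, speed spread); an assumed feature choice."""
    return (mean(speeds), pstdev(speeds))


def pnn_classify(sample, training, sigma=1.0):
    """Return the label whose training patterns give the highest average kernel response."""
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for pattern in patterns:
            sq_dist = sum((s - p) ** 2 for s, p in zip(sample, pattern))
            total += math.exp(-sq_dist / (2 * sigma ** 2))
        scores[label] = total / len(patterns)
    return max(scores, key=scores.get)


# Toy windows, invented for illustration only: steady walking versus stop-and-go.
steady = [4.0 + 0.5 * math.sin(i) for i in range(WINDOW_SIZE)]
stop_and_go = [0.0 if (i // 10) % 2 else 12.0 for i in range(WINDOW_SIZE)]

training = {
    "normal": [features(steady), features([s + 1.0 for s in steady])],
    "suspicious": [features(stop_and_go), features([s * 0.8 for s in stop_and_go])],
}

# A new, fairly steady window is compared against both classes.
query = [5.0 + 0.4 * math.cos(i) for i in range(WINDOW_SIZE)]
label = pnn_classify(features(query), training)

if __name__ == "__main__":
    print(f"{WINDOW_SIZE} samples per window, classified as: {label}")
```

The false-dismissal and false-alarm rates reported in the article come from the authors' own classifier and data; nothing in this toy example reproduces them.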