
Distributed Vision Networks

ACIVS-07
Hamid Aghajan, Stanford University, USA Richard Kleihorst, NXP Research, Netherlands Chen Wu, Stanford University, USA

August 27, 2007 Delft University, Netherlands


Course Website http://wsnl.stanford.edu/acivs07/index.php

Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Technology Cross-Roads


Image Sensors
Rich information Low power, low cost

Sensor Networks
Wireless communication Networking

Smart Camera Networks


Signal Processing
Embedded processing Collaboration methods

Vision Processing: Architecture? Algorithms? Applications?


Distributed Vision Networks

Scene understanding Context awareness


Potential impact on design methodologies in each discipline

Sensor Networks Perspective


Opportunities for novel applications:
Make complex interpretations of the environment and events
Learn phenomena and behavior, not just measure effects
Incorporate context awareness into the application
Allow the network to interact with the environment
Change of paradigm: high-bandwidth sensors (vision)


Vision Processing Perspective


Novel approach to vision processing:
Use the additional available dimension: space
Data fusion across views, time, and feature levels
Design based on effective use of all available information (opportunistic fusion)
Utilize multiple views to:
Overcome ambiguities
Achieve robustness
Allow for low-complexity algorithms

Use communication to exchange descriptions - not raw data


In-node processing

Change of paradigm: Networked vision sensors



Distributed Vision Networks


New Paradigm
High-bandwidth data
In-node processing
Low-bandwidth communication
Collaborative interpretation


Distributed Vision Networks


Rich design space utilizing concepts of:
Vision processing
Signal processing and optimization
Wireless communications
Networking
Sensor networks

Novel smart environment applications:


Interpretive
Context-aware
User-centric


Distributed Vision Networks


Processing at source allows:
Image transfer avoidance
Descriptive reports
Scalable networks

Design opportunities:
Processing architectures for real-time in-node processing
Algorithms based on opportunistic data fusion
Novel smart environment applications
Balance of in-node and collaborative processing:
Communication cost
Latency
Processing complexities
Levels of data fusion

Distributed Vision Networks


Vision sensing requires awareness of:
Privacy issues
Employ in-node processing
Avoid image transfer
Applications that provide services not based on monitoring / reporting

Bandwidth issues
Transmit processed information, not raw data
Transmit based on information value for fusion / query-based

Processing demand
Employ separate early-vision and interpretive processing mechanisms
Layered processing architecture: features, objects, relationships, models, decisions
Employ data exchange and collaboration across different layers

Distributed Vision Networks


Agents Response systems Smart environments

Robotics

Feedback (features, parameters, decisions, etc.)

Enabling technologies: vision processing, wireless sensor networks, embedded computing, signal processing

Distributed Vision Networks ( DVN )

Artificial Intelligence

Context Event interpretation Behavior modeling

Smart Environments

Assisted living Occupancy sensing Augmented reality

Scene construction Virtual reality Gaming

Multimedia

Human Computer Interaction

Immersive virtual reality Non-restrictive interface Robotics


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook
Examples by: Chen Wu, Chung-Ching Chang, Huang Lee, Joshua Goshporn, Itai Katz, Kevin Gabayan, Arezou Keshavarz, Ali Maleki-Tabar (Wireless Sensor Networks Lab, Stanford University)

Application Potentials: View Selection


Select best view of person of interest in real-time tracking
Data exchange between cameras determines which one streams visual data
(Floor plan: cameras CAM 1 through CAM 5 around a room with a door.)
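As a minimal sketch of this selection step (the scoring metric and the interface are assumptions; the deck does not specify a protocol), each camera computes a scalar view-quality score locally, the scores are exchanged, and the best-scoring node streams:

```python
def elect_streamer(scores):
    """Pick the camera that should stream, given exchanged view scores.

    scores: {camera_id: view_quality}, e.g. face size or frontalness in
    that camera's image (hypothetical metric, for illustration only).
    """
    return max(scores, key=scores.get)

# Example: CAM 2 wins and streams its video.
best = elect_streamer({"CAM 1": 0.2, "CAM 2": 0.7, "CAM 3": 0.4})
```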


Application Potentials: Assisted Living


Detect accidents at home
(Floor plan: cameras CAM 1 through CAM 5 around a room with a door.)


Application Potentials: Multi-Finger Gesture


Manipulate virtual world with free hand gesture

Pan

Rotate

Zoom out

Zoom in


Application Potentials: Face Profiling


Interpolate and reconstruct face model from a few snapshots

Camera 3 (Test set)

Camera 1 (Training set)


Application Potentials: 3D Model Reconstruction


(Figure: the 3D model is built incrementally; observations at t1, observations at t2, and the model when only observations at t2 are available.)


Application Potentials: Virtual Reality


Place people in virtual world
(Floor plan: cameras CAM 1 through CAM 5 around a room with a door.)


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Smart Camera Networks


Fusion dimensions:
Space (views)
Time
Feature levels

Space and time:
Overcome ambiguities, occlusions
Enhance estimate robustness
Increase confidence level of estimates
Detection of key frames

Feature levels:
Exchange of features with other nodes across algorithmic layers

Fusion Mechanisms
Feature fusion:
Use of multiple, complementary features within a camera node

Spatial fusion:
Localization, epipolar geometry, ROI and feature matching
Validation of estimates by checking consistency, outlier removal
3D reconstruction

Temporal fusion:
Local interpolation / smoothing of estimates
Exchange of updates via spatial fusion
Spatiotemporal estimate smoothing and prediction

Model-based fusion:
3D human body reconstruction, human gesture analysis
Feedback to in-node feature extraction

Key features and key frames:
Information assisting other nodes

Decision fusion:
Estimates based on soft decisions
Adequate features in own observations
Cost, latency of communication


Layered Spatial Collaboration


Case Study: Human Gesture Analysis

Description layers:
Layer 1: Images (regions R1, R2, R3)
Layer 2: Features (F1: f11, f12; F2: f21, f22; F3: f31, f32)
Layer 3: Gesture elements (E1, E2, E3)
Layer 4: Gestures (G), the final decision

Decision layers:
Layer 1 (within a single camera): self reasoning, in-node feature extraction
Layer 2 (collaboration between cameras): assisted reasoning, estimate validation, feature-based fusion, key feature exchange
Layer 3 (collaboration between cameras): mutual reasoning, soft decision fusion, joint estimation

Opportunistic data fusion:
Fusion of features within a single camera
Fusion based on collaboration among multiple cameras

Target applications: Security, Gaming, Smart Presentation, Accident Detection

Fusion Mechanisms

(Diagram relating the fusion mechanisms: self reasoning builds on feature fusion and temporal fusion; assisted reasoning adds the exchange of key features, spatial fusion, and spatiotemporal fusion; mutual reasoning adds model-based fusion, active vision, and feedback.)

The Big Picture


Model-based Fusion


Motivation to build a human model:
A concise reference for merging information from cameras
Offers flexibility for interpretation in different applications:
Various gesture interpretation applications
Allows recreation of body gesture in the virtual domain:
Viewing angles to the body not available from any of the cameras
Allows employment of active vision methods:
Focus on what is important
Develop more detail in time
Helps address privacy concerns in various applications


Model-based Fusion


Approach:
Exchange segments and attributes; combine to reconstruct a 3D model
The subject's information is mapped and maintained in the model:
Geometric configuration: dimensions, lengths, angles
Color / texture / motion of different segments
(Diagram: ellipses extracted in CAM 1, CAM 2, and CAM 3 are combined into the 3D body model.)

Advantages:
Employ a higher level of in-node processing
Exchange only descriptions relevant to the model
Affordable communication for multi-camera collaboration
Initialization for active vision in nodes:
Provides color (or other feature) distributions for rough segmentation
Helps with body-part tracking (motion flow)
Offers a hint on what is important to look for in images

Data Flow
The collaboration routine

(Diagram: the local processing routines of Cam 1, Cam 2, and Cam 3 exchange 2D attribute descriptions through interfaces with the collaboration routine, which returns 3D model parameters.)


Use of Feedback

(Diagram: CAM 1 ... CAM N produce local estimates (e1, e2, e3); their information is merged into a 3D description, gestures are extracted, and the description is projected / decomposed back to each image plane.)

Feedback:
Initialize in-node feature extraction
Active vision (focus on what is important)

Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Region-based Feature Fusion


Problems with pixel-based features:
Localized attributes need local thresholds, which are hard to set
(e.g., comparing the color of foreground / background pixels)
No information from an extended neighborhood is considered
Knowledge about the extent of the neighborhood is not available; obtaining it is the objective in many cases: segmentation

Objects often contain correlated attributes in a region. The idea is to grow regions based on correlated attributes and achieve segmentation, an intermediate step in many applications.

Feature Fusion: Optical Flow and Color


Use of complementary features
Edge and color
Color and motion

Combine pixel-based and region-based methods


Region-based Fusion: Optical Flow and Color


Joint Refinement of Color and Motion


Refinement loop: from the images, coarse estimates of the color segmentation and the motion flows are computed, then jointly refined into better color segmentation and better motion flows.

Optical flow assisting color segmentation:
Clustering close-by points with similar motion vectors allows for better segmentation of the leg.

Color segmentation assisting optical flow:
Searching for a fitted ellipse in the motion flow allows for effective detection of the arm's motion vector (compare the results without and after using the angles of the ellipses).
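One possible way to implement this complementary-cue idea (weights and cue choices are assumptions, not the deck's algorithm) is a pairwise affinity that groups close-by pixels with similar motion and color; any clustering on this affinity then yields the jointly refined segments:

```python
import numpy as np

def motion_color_affinity(pos, flow, color, w_pos=1.0, w_flow=1.0, w_col=1.0):
    """Pairwise affinity mixing position, optical-flow, and color cues.

    pos   : (N, 2) pixel coordinates
    flow  : (N, 2) optical-flow vectors
    color : (N, 3) pixel colors
    The weights w_* balance the cues (assumed free parameters).
    """
    def sq(a):
        # pairwise squared distances for one cue
        return np.sum((a[:, None] - a[None]) ** 2, axis=-1)

    return np.exp(-w_pos * sq(pos) - w_flow * sq(flow) - w_col * sq(color))
```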

Spatial Fusion
Geometric fusion:
Making correspondences, tracking
Reconstruction of 3D models
Camera network calibration
Use of epipolar geometry for feature matching, outlier removal, and ROI mapping between camera views

Mutual reasoning:
Joint estimation, joint refinement, decision fusion

Assisted reasoning:
Estimate validation, key frame exchange

Example: face orientation estimation with a color and geometry-based method and spatial / temporal validation (face features mapped to an ellipsoid).

Color and Geometry Fusion


Face orientation analysis:
In-node feature extraction by fusion of color and geometry
Apply position constraints for the eyes when thresholding Cb/Cr

(Pipeline: the compensated image is tested against a skin-color ellipse model, using the Cb/Cr mean and covariance of a Gaussian chrominance distribution, to produce a skin mask; an eye map with an eye Gaussian distribution produces eye candidates.)
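A hedged sketch of the skin-mask step in this pipeline (the threshold and the per-subject Gaussian fit are assumptions): pixels whose chrominance falls inside the skin-color ellipse, i.e. within a Mahalanobis distance of the learned Cb/Cr mean, are marked as skin:

```python
import numpy as np

def skin_mask(cb, cr, mean, cov, thresh=4.0):
    """Skin-color ellipse test in Cb/Cr space.

    cb, cr : (H, W) chrominance channels of the compensated image
    mean   : (2,) skin Cb/Cr mean;  cov : (2, 2) covariance
    thresh : squared Mahalanobis radius of the ellipse (assumed value)
    """
    x = np.stack([cb, cr], axis=-1) - mean              # centered chroma
    d2 = np.einsum('...i,ij,...j->...', x, np.linalg.inv(cov), x)
    return d2 < thresh                                  # boolean skin mask
```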


Color and Geometry Fusion


Face orientation analysis:
Feature matching with epipolar geometry
Use the geometry of the cameras to:
Match features
Remove false feature candidates

For a feature x1 in one view, the true match x2 in the other view must lie on the epipolar line of x1 (determined by the baseline between the cameras). False eye and face candidates are rejected because no consistent candidate lies near their epipolar lines.

An example of mutual reasoning.
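A minimal sketch of this pruning step, assuming the fundamental matrix F between the two views is known from calibration (function and parameter names are illustrative): a candidate in the second view survives only if it lies close to the epipolar line of some candidate in the first view.

```python
import numpy as np

def epipolar_prune(F, pts1, pts2, tol=2.0):
    """Reject view-2 candidates inconsistent with view-1 candidates.

    F    : 3x3 fundamental matrix (x2^T F x1 = 0 for true matches)
    pts1 : (N, 2) candidates in view 1;  pts2 : (M, 2) in view 2
    tol  : max point-to-epipolar-line distance in pixels (assumed)
    """
    h1 = np.c_[pts1, np.ones(len(pts1))]       # homogeneous coordinates
    h2 = np.c_[pts2, np.ones(len(pts2))]
    lines = h1 @ F.T                           # epipolar lines in view 2
    d = np.abs(h2 @ lines.T) / np.linalg.norm(lines[:, :2], axis=1)
    return np.where(d.min(axis=1) < tol)[0]    # surviving candidate indices
```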

Feature Fusion
Level of features for fusion between cameras?
Features are typically dense fields (edge points, motion vectors)
They are locally fused to derive descriptions (sparse)
Descriptions are exchanged
Valuable features may be exchanged as dense descriptors
Communication cost issues need to be considered

Processing within a single camera operates on features; collaboration between cameras operates on shared descriptions, ranging from low-level to high-level and from dense to sparse.

Key features and key frames allow selective sharing of dense features.

Key Frames
Frames with high-confidence estimates
The node with the key frame observation broadcasts the derived information
Other nodes use it to refine their local estimates


Temporal Fusion
Use key frames to re-initialize the local face angle estimate
Use angle estimates close to zero (frontal view)
Aims to limit error propagation in time

Use optical flow to locally track angle changes between frames
Interpolate between two key frames to limit optical-flow error propagation

Cameras initialize face angles at key frames; local optical flow tracks the face angle between key frames; cameras interpolate face angles between key frames using the local optical flow.
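A small sketch of this interpolation (interfaces are assumptions; the deck describes the idea, not code): integrate the per-frame optical-flow increments, then spread the drift observed at each key frame linearly over the frames in between:

```python
import numpy as np

def fuse_angles(flow_deltas, key_frames):
    """Re-anchor an optical-flow angle track at trusted key frames.

    flow_deltas : per-frame face-angle increments from local optical flow
    key_frames  : {frame_index: trusted_angle} from high-confidence frames
    """
    raw = np.cumsum(flow_deltas)                       # integrate optical flow
    keys = sorted(key_frames)
    corr_at_keys = [key_frames[k] - raw[k] for k in keys]
    # linearly interpolate the correction, limiting drift between key frames
    corr = np.interp(np.arange(len(raw)), keys, corr_at_keys)
    return raw + corr
```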


Spatial / Temporal Validation


Estimates between key frames are corrected by:
Temporal smoothing (one camera)
Spatial smoothing / outlier removal (multiple cameras)

Can this be done more effectively? Yes: spatiotemporal filtering.


Spatiotemporal Fusion
(Plots: the face-angle track over 40 frames is refined by a forward pass and a backward pass toward the true orientation, spanning left profile to right profile, roughly -140° to +130°; the per-frame estimates are mapped to an ellipsoid, opportunistically creating a face profile. Query result examples: side profiles.)
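The deck's formulation is LQR-based (see the references); as a simplified illustration of combining a forward and a backward pass, a two-direction exponential smoother over the fused angle estimates looks like this (alpha is an assumed parameter):

```python
import numpy as np

def forward_backward_smooth(z, alpha=0.5):
    """Average a forward and a backward exponential filter over estimates z."""
    z = np.asarray(z, dtype=float)
    fwd, bwd = z.copy(), z.copy()
    for t in range(1, len(z)):
        fwd[t] = alpha * z[t] + (1 - alpha) * fwd[t - 1]          # forward pass
        bwd[-1 - t] = alpha * z[-1 - t] + (1 - alpha) * bwd[-t]   # backward pass
    return 0.5 * (fwd + bwd)
```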


Spatiotemporal Fusion


Model-based Human Posture Reconstruction


Segmentation function (single camera):
Initial color models from the model, or via k-means
Refine color models (Perceptually Organized EM), or another morphological method with constraints
Produce a concise description of segments

Model fitting function (collaborative):
Goodness of ellipse fits to segments
Projection on image planes
E.g., parameters for the upper body (arms)

Feedback from the collaborative model fitting re-initializes the in-node segmentation.


In-Node Feature Fusion for Segmentation


Collaborative Model Fitting


Frame105


Collaborative Model Fitting


Virtual Placement


Virtual Placement
Collaborative Face Analysis

Feature Fusion

Ellipse fitting, model-based spatiotemporal fusion

In-node processing


Decision Fusion
Smart home care network for fall detection
States are combined as soft decisions to create a report.

Pipeline: an accelerometer signal classifier produces State 0 and triggers the image analysis; Camera 1, Camera 2, and Camera 3 analyses produce States 1-3, which feed the decision-making process. Depending on the states, the system issues no report (safe), reports all useful data (possible hazard), or reports a hazard.

(Plots: accelerometer signals, x/y/z axes, for falling versus sitting down; the amplitude change versus the event duration separates falling from sitting down.)

Joint work with: Arezou Keshavarz, Ali Maleki-Tabar
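A decision-flow sketch under assumed state encodings and thresholds (the slide gives only the qualitative branches): the accelerometer state gates the vision processing, and the camera soft decisions are averaged into a danger level:

```python
def fall_report(accel_state, camera_states):
    """accel_state: accelerometer classifier output (assume 1 = benign, 3 = fall);
    camera_states: soft decisions in [0, 1] from the triggered camera analyses."""
    if accel_state <= 1:
        return "No report (safe)"
    if not camera_states:
        return "Report all useful data (possible hazard)"
    danger = sum(camera_states) / len(camera_states)   # soft-decision fusion
    if accel_state >= 3 and danger > 0.8:              # assumed threshold
        return "Report (hazard)"
    return "Report all useful data (possible hazard)"
```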


Decision Fusion

Initialize search space for the 3D model
Validate the model
Wait for more observations

Decision Fusion


Decision Fusion


Decision Fusion

Alert level = 0.6598, confidence = 0
Alert level = 0.8370, confidence = 0.7389
Alert level = 0.8080, confidence = 0.7695

Combined: lying down, danger (0.6201)
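The exact combination rule is not given in the deck; one plausible scheme, shown purely as an illustration, is a confidence-weighted average in which a camera with zero confidence contributes nothing:

```python
def combine_alerts(reports):
    """reports: list of (alert_level, confidence) pairs from the cameras."""
    total = sum(conf for _, conf in reports)
    if total == 0:
        return None   # no confident observation yet: wait for more
    return sum(alert * conf for alert, conf in reports) / total

fused = combine_alerts([(0.6598, 0.0), (0.8370, 0.7389), (0.8080, 0.7695)])
```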


Event Interpretation
Human model: vision, kinematics, attributes, states, reasoning / interpretations

Interpretation levels: low-level features, instantaneous action, behavior analysis

Vision processing produces model parameters and posture / attributes; AI reasoning works with context, persistence, behavior attributes, and queries, and provides feedback to the vision processing.

Event Interpretation


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Why is it difficult to estimate posture?


Image cues are not obvious

Human model varies (body part sizes)

High-dimensional optimization


Image Cues
Edge:
Templates
Chamfer distance (distance, orientation)

Motion:
Structure
Object boundaries / edges

Color:
Skin color
Adaptively learned color

No single one is robust!
Point / line features vs. region features


Human Model
Measured: motion capture devices

Acquisition: initialize at the beginning, with not-too-difficult postures

If the model differs from the subject too much, fitting may easily be trapped in far-away local minima.


Optimization
Top-down approach:
3D model → 2D projections of edges and silhouettes
Validate 2D projections with image observations
Deutscher (particle filtering), Sminchisescu (scaled covariance sampling), etc.


Optimization
Bottom-up approach:
Look for body-part candidates in images
Assemble 2D / 3D models from the body-part candidates
Sigal (loose-limbed people), Ramanan (profile pose)


Optimization - Pros and Cons


Top-down approach:
+ Easy to handle occlusions
- Difficult to optimize: non-convex
- Time-consuming to calculate the projections and evaluate them

Bottom-up approach:
+ Distributes more computation in the images (i.e., body-part candidates, local assemblage)
- Difficult to handle occlusions without knowing the relative configurations of body parts
- Not direct to map from the 2D assemblage to the 3D model


Multiple Views?
Gains:
More image cues
Resolve ambiguity (sometimes helps a lot!)

Challenges:
Huge redundancy; how to reduce it?
Misleading correspondence?
Communication?

Review:
Score addition (Gavrila etc.)
3D voxels / stereo (Malik etc.)
3D templates from training (Sigal etc.)


Multi-View Camera Network


Basic assumption and constraint: a powerful local image processor, but limited communication
Reduce local information
Maximally utilize multiple views to compensate for partial observations and reduced descriptions

Ideas:
Combine bottom-up and top-down approaches
Concise and informative local deduction
Choose the best view for different purposes
Optimally combine; reduce redundancy

Challenge: can we learn adaptively?
Model (size, appearance)
Behaviors, for prediction and validation

Temporal-Spatial Fusion


Model-based Gesture Analysis


Why use a human model for gesture interpretation:
Offers flexibility for interpretation in different applications:
Various gesture interpretation applications

Allows recreation of body gesture in virtual domain


Viewing angles to body not available from any of the cameras

Helps address privacy concerns in various applications

Applications: gaming, security, assisted living, etc.


Model-based Gesture Analysis


But also for vision analysis: a concise reference allowing information from cameras to merge
Spatial: Model parameters are fused from multiple views Temporal: Model parameters are updated in time

Allows for active vision methods:


Focus on what is important (color / texture)
Develop more details in time


Spatial Fusion
Combining top-down and bottom-up approaches


Distributed Communication Flow


Fusion to update local knowledge of the subject (a vector of model parameters):
Camera 5 wants to update its knowledge of the subject
It broadcasts the request for collaboration
The other cameras send the requested descriptions (vectors of descriptions from their local processing)
The updated knowledge of the subject is fed back
The up-to-date knowledge of the subject is used in local processing
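In code form, one round of this flow might look as follows (a sketch with assumed method names, not the deck's protocol):

```python
def collaboration_round(requester, others):
    """One update round: broadcast, collect descriptions, fuse, feed back."""
    request = requester.make_request()                   # broadcast for collaboration
    replies = [cam.describe(request) for cam in others]  # vectors of descriptions
    model = requester.fuse(replies)                      # vector of model parameters
    for cam in (requester, *others):
        cam.update(model)                                # feedback of updated knowledge
    return model
```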


Algorithm
Segmentation function (single camera):
Initial color models from the model, or via k-means
Refine color models (Perceptually Organized EM), or another morphological method with constraints
Produce a concise description of segments

Model fitting function (collaborative):
Goodness of ellipse fits to segments
Projection on image planes
E.g., parameters for the upper body (arms)

Feedback from model fitting back to segmentation.


Segmentation
In-node function based on:
Feature fusion
Feedback from the model

Feedback allows incorporation of the spatiotemporal fusion outcome into the local analysis. A rough estimate of the segments is provided by:
Local initialization
Adoption of the spatiotemporal model

Expectation Maximization (EM) methods use the new observation to refine local color distributions:
EM produces markers (collections of high-confidence segment islands) for watershed
This also helps with varying color distributions between cameras

Watershed enforces spatial proximity information to link the segments.


EM Segmentation
Initialization:
It is not a good idea to specify an arbitrary initial estimate; EM may be trapped in local optima.

Ways to obtain initial estimates (a k-means sketch follows below):
K-means: the centers of the clusters are taken as the initial estimates for EM
Segment parameters from the 3D body model (assumes the appearance doesn't change very quickly)

Segmentation function: single camera
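A minimal k-means sketch for the first option (any standard implementation would do; the parameters are illustrative), whose cluster centers seed the EM color models:

```python
import numpy as np

def kmeans_init(pixels, k, iters=10, seed=0):
    """Return k cluster centers of the pixel colors to initialize EM."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((pixels[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([pixels[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers
```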


Perceptually Organized EM (POEM)


Regular EM method:
A pixel-based method
Doesn't use the spatial relationship between pixels / segment islands
May also leave some pixels unclassified

POEM:
Segments are continuous, so consider a pixel's neighborhood
Use a measure of expected grouping:

w(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{\sigma_1^2} - \frac{\|\mathrm{coord}(x_i) - \mathrm{coord}(x_j)\|^2}{\sigma_2^2} \right)

The neighborhood votes for x_i belonging to segment l:

V_l(x_i) = \sum_{x_j} \pi_l(x_j)\, w(x_i, x_j), \qquad \text{where } \pi_l(x_j) = p(y_j = l \mid x_j)
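A direct, O(N²), illustration-only sketch of this vote, with assumed parameter names for the two scale factors:

```python
import numpy as np

def poem_votes(features, coords, posteriors, sigma1=1.0, sigma2=1.0):
    """Neighborhood votes V_l(x_i) for every pixel and segment.

    features  : (N, d) pixel feature vectors x_i (e.g., color)
    coords    : (N, 2) pixel coordinates
    posteriors: (N, L) responsibilities pi_l(x_j) = p(y_j = l | x_j)
    """
    df = np.sum((features[:, None] - features[None]) ** 2, axis=-1)
    dc = np.sum((coords[:, None] - coords[None]) ** 2, axis=-1)
    w = np.exp(-df / sigma1**2 - dc / sigma2**2)   # grouping weights w(x_i, x_j)
    return w @ posteriors                          # V_l(x_i) = sum_j pi_l(x_j) w(x_i, x_j)
```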


Watershed Segmentation
Removing vague pixels is important for watershed, since wrong seeds / markers would compete with correct ones and cause false segments.

(Figure: red marks the undecided pixels; watershed then assigns labels to the undecided (dark blue) pixels.)

Segmentation function: single camera
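A sketch of the marker generation (the confidence threshold is an assumption, and OpenCV's standard watershed stands in for whatever implementation the authors used):

```python
import numpy as np
import cv2

def marker_watershed(image_bgr, posteriors, conf=0.9):
    """Seed watershed with high-confidence EM pixels; vague pixels stay 0.

    image_bgr : (H, W, 3) uint8 image
    posteriors: (H, W, L) per-pixel segment probabilities from EM/POEM
    """
    best = posteriors.argmax(axis=2)
    confident = posteriors.max(axis=2) > conf
    markers = np.where(confident, best + 1, 0).astype(np.int32)
    return cv2.watershed(image_bgr, markers)   # assigns labels to vague pixels
```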



Ellipse Fitting
Motivation:
Concise descriptions of segments
Each ellipse should represent a segment with a similar shape
The ellipses do not necessarily correspond to body parts

Goodness-of-fit measures control the ellipse fitting:
Occupancy of the ellipse
Coverage of the segment
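The two measures can be read as set overlaps between the ellipse and the segment; a sketch under that (assumed) definition:

```python
import numpy as np

def fit_quality(mask, center, axes, angle):
    """Occupancy and coverage of an ellipse fitted to a segment mask.

    mask  : (H, W) boolean segment image
    center, axes, angle : ellipse parameters (cx, cy), (a, b), radians
    occupancy = |ellipse & segment| / |ellipse|
    coverage  = |ellipse & segment| / |segment|
    """
    ys, xs = np.indices(mask.shape)
    c, s = np.cos(angle), np.sin(angle)
    u = ((xs - center[0]) * c + (ys - center[1]) * s) / axes[0]
    v = (-(xs - center[0]) * s + (ys - center[1]) * c) / axes[1]
    inside = u**2 + v**2 <= 1.0
    overlap = (inside & mask).sum()
    return overlap / max(inside.sum(), 1), overlap / max(mask.sum(), 1)
```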


Model-based Human Posture Reconstruction


Ellipse parameters are exchanged between cameras:
Reduced data communication load for collaboration
A server node, or each of the cameras, collects the data and creates a virtual skeleton

(Diagram: ellipse segments 1-4 from the camera views are fused into the skeleton, using the goodness of the ellipse fits to segments and their projection on the image planes; e.g., parameters for the upper body (arms).)


Skeleton Fitting
(Skeleton fits shown for frames 1, 28, 70, 81, 105, and 148.)


Summary
The main difference of a camera network: spatially distributed vision information, with huge potential for 3D interpretations, especially of human activities.

How to achieve this systematically and efficiently:
Model-based
In-node processing
Collaboration between views


Open Questions
How much of an advantage over monocular vision? In what ways? How to use the views in the correct way?
Capability limit of the camera network (how well can it understand the scene; how many views are needed)?
Balance and trade-off: in-node vs. collaborative processing
Networking: data exchange vs. latency


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Aims of This Section

To introduce smart camera architectures
To motivate smart wireless cameras from a power-consumption point of view
To discuss the research and industrial approaches


Vision Systems
Vision systems analyse images and video and report events / objects / properties
Examples: DVD recorders, set-top boxes, smart cameras

(Diagram: video / image → VISION SYSTEM → data)


Smart Camera Vision System


A definition (source: Wikipedia):
A smart camera is an integrated machine vision system which, in addition to image capture circuitry, includes a processor that can extract information from images without need for an external processing unit, and interface devices used to make results available to other devices.

François Berry, Univ. Blaise Pascal, France



Granularity of Integration
A smart camera is a vision system with an intermediate hardware granularity:
machine vision system, smart camera, vision chips, in increasing level of processing integration.

François Berry, Univ. Blaise Pascal, France



Smart Cameras
= Camera + intelligence
= The basis for new applications, such as detection, tracking, and scene analysis

Markets: automotive, mobile communications, surveillance, consumer


Challenges in Wireless Smart Cameras


Performance
Power consumption
Programmability


Actually, Why a Smart Wireless Camera?

Why not a radio connection to a PC that does the processing?



Power Consumption of a Wireless VGA Camera

A typical wireless digital VGA camera needs 200 mW to transmit live video over a short distance. One year of daily use means about 1750 Wh (0.2 W × 24 h × 365 days ≈ 1752 Wh). Using a good fuel cell, it would consume about 2.2 liters of methanol fuel per year, roughly 400× the total camera volume.

Consumption is in the Radio Path


(Diagram: transmitter chain from DSP through the DA converter to the RF stage and power amplifier (PA). E_T is the transmit signal-processing energy per bit; E_L is the link energy per bit.)

Bluetooth: E_T = 150 nJ/bit, E_L = 1 nJ/bit
GSM (0.2 Watt): E_T = 500-1000 nJ/bit, E_L = 2500 nJ/bit


Power Consumption for Radio


(Plot: required radio power versus data rate, 100 b/s to 100 Mb/s, covering sensor data, speech / audio / hi-fi, and moving pictures; technologies from PicoRadio and Zigbee through Bluetooth to 802.11a and UTP, spanning autonomous, battery, and mains power at ranges of 1 m to 100 m. Power per bit does not scale with Moore's Law. Source: Raf Roovers.)


Computational Efficiency is Growing (Moore)


(Plot: computational efficiency in GOPS per Watt versus feature size, 0.5 µm down to 0.07 µm: SIMD processors and ASICs reach orders of magnitude higher efficiency than CPUs such as the Pentium 4; the human brain is shown for comparison. SIMD = Single Instruction Multiple Data.)



Strategy for Wireless Cameras?


For short-range transmission, a physical limit on the DA converter dictates the power consumption; it will not get lower.

Programmable processors continue to become more energy efficient in GOPS/Watt, especially SIMD processors, which are suitable for a large part of an image processing chain.

So, go towards smart cameras with SIMDs, where the video analysis is done in the camera itself and only events are forwarded.


Some Low-Cost Smart Cameras

CMUcam3 (ARM7) 60 MIPS @ 650mW

Stanford MeshEye Mote (ARM7)


Smart Wireless Camera Platform

WiCa (Xetal SIMD): 50 GOPS @ 600 mW
Cyclops (AVR RISC): 8 MIPS @ 50 mW?

Architecture
A smart camera is an embedded system:
Operates under fixed run-time constraints (real time)
Not end-user programmable
Differentiating features: power, cost, performance
Runs a few applications, often known at design time

François Berry, Univ. Blaise Pascal, France


Smart Camera Designs for Research


Technology

Processor approach: DSP, Media Processor, GPU

Programmable Logic approach: CPLD, FPGA

François Berry, Univ. Blaise Pascal, France



Processor Approach
GPU and DSP
DSP (Digital Signal Processor):
A microprocessor designed specifically for digital signal processing, generally in real-time computing
Features: a highly parallel accumulator and multiplier (MAC operation); fixed-point arithmetic is often used to speed up arithmetic processing
DSPs are specified around four algorithms: Infinite Impulse Response (IIR) filters, Finite Impulse Response (FIR) filters, FFT, and convolvers

François Berry, Univ. Blaise Pascal, France




Programmable Logic Approach


FPGA (Field Programmable Gate Array):
An electronic component used to build dedicated digital circuits
An integrated circuit able to change the interconnectivity of a large number of fundamental computing components via configuration information stored in onboard static RAM
A hardware description (VHDL, Verilog) turns the FPGA into a customized IC: a processor, specific glue logic, specific applications

François Berry, Univ. Blaise Pascal, France


Programmable Logic Approach


FPGA (Field Programmable Gate Array):
Increasing speed and density
Increased I/O pin count and bandwidth
Lower power
Integration of hard IP (e.g., multipliers, processor soft cores, ...)

(Plots: maximum sustained single-precision floating-point operations (GFLOPS) and maximum floating-point multiply-accumulates grow steeply from 1998 to 2005, while the FPGA cost per million gates falls from about $350 toward $0.)

François Berry, Univ. Blaise Pascal, France



Smart Camera for Research


A very efficient smart camera could be based on an efficient mix of FPGA, DSP, GPU, ...!

François Berry, Univ. Blaise Pascal, France



Smart Camera Design for Consumer Use


Challenges: performance, energy consumption, cost. Let us look at an example.


Example Event Casting: Face Detection


Face Detection Application Mapping


Video flows from low-level through intermediate-level to high-level processing, producing data:
Low level: pixel processing (Haar filters), similar operations for every pixel; SIMD, 10++ GOPS
Intermediate level: image processing (image pyramid), similar operations for every image; FPGA/DSP, 100 MOPS
High level: application (draw box, event), different operations for every event; CPU, 1 MOPS

SIMD = Single Instruction Multiple Data

Why SIMD for Low-Level?


High performance (need > 10 GOPS)
High internal bandwidth (need > 500 Gb/s)

(Diagram: a processing element (PE) computing C from inputs A and B under a shared instruction.)

Bandwidth = 10 GOPS × 3 accesses × 16 bits = 480 Gb/s

SIMD = Single Instruction Multiple Data



Uniprocessor to SIMD: 1 PE
(Layout at 100 MHz: 4.6 mm² data memory, 0.6 mm² program memory, 1 mm² control, 0.02 mm² PE.)

DSP, 1 PE:
Performance: 100 MOPS
Size: 5.22 mm²
Performance / area: 19 MOPS/mm²
Overhead: 26%
Bandwidth: 4.8 Gb/s

Uniprocessor to SIMD: 2PEs


(Same layout with two 0.02 mm² PEs.)

Performance: 200 MOPS
Size: 5.24 mm²
Performance / area: 38 MOPS/mm²
Overhead: 25%
Bandwidth: 9.6 Gb/s


Uniprocessor to SIMD: 100PEs


(Same layout with 100 PEs.)

Performance: 10 GOPS
Size: 8.2 mm²
Performance / area: 1.2 GOPS/mm²
Overhead: 20%
Bandwidth: 480 Gb/s



Uniprocessor to SIMD
RISC, 1 PE @ 50 MHz: peak performance 0.05 GOPS; size 6.4 mm²; performance / area 0.008 GOPS/mm² (0.18 µm); overhead 26%; bandwidth 2 Gb/s
Pentium 4 @ 2.4 GHz: 6 GOPS; 131 mm²; 0.045 GOPS/mm²; overhead ??%; 58 Gb/s; peak power 59 W
Xetal-II SIMD, 320 PEs @ 150 MHz: 100 GOPS; 44.4 mm² (0.18 µm) or 11.1 mm² (0.09 µm); 2.25 GOPS/mm²; overhead 12%; 1.5 Tb/s; peak power 1.0 W


Why is SIMD Low-Power?


Typical DSP instructions need 4 accesses to memory.

(Diagram: a PE executing, e.g., C = A + B or C = A > B ? A : B under one instruction.)

Why is SIMD Low-Power?


SIMDs have multiple PEs in parallel.
Arithmetic always has to be done, but:
The instruction fetch is shared multiple times
Data (A, B, C) access is shared in multiple-word-wide memories
Accessing an 8-times-wider memory takes half the amount of energy per data entity


SIMD Energy Consumption


(Plot: energy consumption in nJ/pixel for a convolution versus the number of PEs, 100 to 600, split into computation, communication, and memory access, for kernel widths W = 3 to W = 13. Without voltage scaling, the energy saving levels off.)

Takeaways: parallelism, memory localization.

SIMD Power/Energy Scaling


(Plot: energy scaling factor versus number of PEs, 100 to 700, for scaling energy consumption in SIMD architectures, from [Vdd_max, f_max] down to [Vdd_min, f_min], with contours at 0.1, 10, 20, and 30 GOPS. Parallelism enables voltage scaling; V_th limits the degree of scaling; f_max = 50 MHz sets P_min, and N = 640 pixels sets P_max = 640 PEs.)


Consumer Smart Wireless Camera Architecture

(Architecture: SIMD front-end, then DSP, then CPU, then event reporting. SIMD = Single Instruction Multiple Data.)



Smart Wireless Camera Platform

WiCa:
IC3D / Xetal-3 based
Stereo sensor input
50 GOPS performance
Typically 100 milliwatts
ZigBee node
Battery powered
C++ programmed

57

Smart Wireless Camera PCB


ZigBee module

Battery module
Ben Schueler, NXP

Picture of WiCa Setup

Alexander Danilin, NXP



Which Algorithms Run Easily on WiCa?


Those where much of the application runs on the SIMD
Those where the DSP/CPU is used only for limited or occasional tasks
Choose an appropriate algorithmic basis for scene analysis, for example feature-based


What Have We Mapped to WiCa?


Face detection: soft edge features

Horizontal soft edges

Vertical soft edges


What Have We Mapped to WiCa?


Wireless intruder detection system


What Have We Mapped to WiCa?


Object recognition applications


What Have We Mapped to WiCa?


Depth estimation from stereo


Some Power Consumption Results


Object recognition: 25 mW
Face detection: 40 mW
Stereo depth estimation: 50 mW
Gesture recognition: 15 mW


Conclusions
Low-power smart wireless cameras can be designed with SIMD front-ends
Real-time applications
The ZigBee node opens research challenges for distributed camera networks
WiCa is in use at a number of sites


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Summary
Smart camera networks:
Enable novel user-centric applications: interpretive, context-aware, user-centric

Processing at source allows:
Image transfer avoidance
Scalable networks
Descriptive reports

Privacy issues:
Awareness of user choices
In-node processing and image transfer avoidance
Model-based or silhouetted images

Summary
Smart camera networks:
Algorithm design is key to efficient use of computing resources:
In-node feature extraction and opportunistic fusion
Use of key features in the data exchange mechanism
Model-based approach provides feedback / initial points for in-node processing

Balance between in-node and collaborative processing:
Communication cost
Latency
Processing complexities
Levels of data fusion


Towards Active Vision


Active vision in feature extraction:
Use of key features instead of generic features (edges, motion, etc.)
Detection of prominent color / texture attributes
Use of spatiotemporal fusion results to learn key features

Active vision in modules with processing load:


Instead of avoiding methods with high processing cost / latency:
Define what they should look for
Perform initialization to restrict searches

Active vision in gesture analysis:


Use history of subject and semantic meanings of gestures to feedback what is important to detect


Outlook
Applications:
Select best view of person of interest in real-time tracking
Adjust presentation based on the speaker's gestures
Manipulate virtual world with free hand / finger gestures
Detect accidental falls at home / elderly care
Reconstruct face model from a few snapshots
Build 3D models of objects
Place people and their actions in virtual world


Outlook
Agents Response systems Smart environments

Robotics

Feedback (features, parameters, decisions, etc.)

Enabling technologies: vision processing, wireless sensor networks, embedded computing, signal processing

Distributed Vision Networks ( DVN )

Artificial Intelligence

Context Event interpretation Behavior modeling

Smart Environments

Assisted living Occupancy sensing Augmented reality

Scene construction Virtual reality Gaming

Multimedia

Human Computer Interaction

Immersive virtual reality Non-restrictive interface Robotics


Outlook
Enabling technologies: vision processing, wireless sensor networks, embedded computing, signal processing

Distributed Vision Networks ( DVN )

Artificial Intelligence (AI)

Context Event interpretation Behavior models

Feedback (features, parameters, decisions, etc.)

Quantitative Knowledge

Qualitative Knowledge
Immersive virtual reality, non-restrictive interface, interactive robotics; scene construction, virtual reality, gaming; agents, response systems, user interactions

Human Computer Interaction

Multimedia

Smart Environments

Robotics


References
H. Aghajan and C. Wu, "From Distributed Vision Environment to Human Behavior Interpretations," Behaviour Monitoring and Interpretation Workshop at the 30th German Conference on Artificial Intelligence, Sept. 2007.
C. Wu and H. Aghajan, "Model-based Human Posture Estimation for Gesture Analysis in an Opportunistic Fusion Smart Camera Network," Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.
C. Chang and H. Aghajan, "A LQR Spatiotemporal Fusion Technique for Face Profile Collection in Smart Camera Surveillance," Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.
C. Chang and H. Aghajan, "Spatiotemporal Fusion Framework for Multi-Camera Face Orientation Analysis," Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.
C. Wu and H. Aghajan, "Model-based Image Segmentation for Multi-View Human Gesture Analysis," Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.
H. Aghajan and C. Wu, "Layered and Collaborative Gesture Analysis in Multi-Camera Networks," Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), April 2007.
C. Wu and H. Aghajan, "Opportunistic Feature Fusion-based Segmentation for Human Gesture Analysis in Vision Networks," IEEE SPS-DARTS, March 2007.
C. Wu and H. Aghajan, "Collaborative Gesture Analysis in Multi-Camera Networks," ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.
C. Chang and H. Aghajan, "Collaborative Face Orientation Detection in Wireless Image Sensor Networks," ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.
A. Maleki-Tabar, A. Keshavarz, and H. Aghajan, "Smart Home Care Network using Sensor Fusion and Distributed Vision-Based Reasoning," ACM Multimedia Workshop on Video Surveillance and Sensor Networks (VSSN), Oct. 2006.
A. Keshavarz, A. Maleki-Tabar, and H. Aghajan, "Distributed Vision-Based Reasoning for Smart Home Care," ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.

http://wsnl.stanford.edu/publications.php

www.ICDSC.org
Tutorials:
Andrea Cavallaro, Queen Mary University of London, UK: Smart Cameras: Algorithms, Evaluation and Applications
Bjoern Gottfried, U. of Bremen, Germany: Ambient Intelligence and the Role of Spatial Reasoning: Smart Environments with Smart Cameras
Richard Radke, Rensselaer Polytechnic Institute, USA: Multiview Geometry for Camera Networks
Wilfried Elmenreich, Vienna Univ.: Realtime Sensor Networks for Smart Cameras: Communication, Data Processing and Applications

PhD Forum: students present their PhD research in spot posters and short talks; tutorial lecturers will discuss potential research topics with students

First ACM / IEEE International Conference on Distributed Smart Cameras (ICDSC-07)


Topics:
Smart camera architectures
Image sensing techniques for smart cameras
Embedded vision programming
Fusion of vision and other sensors
Distributed vision processing algorithms
Distributed appearance modeling
Collaborative feature extraction, data and decision fusion
Architectures and protocols for camera networks
Wireless and mobile image sensor networks
Position discovery and middleware applications
Vision-based smart environments
Surveillance and tracking applications
Multi-view vision for human-computer interaction
3D scene analysis
Distributed multimedia and gaming applications

September 25-28, 2007 Vienna, Austria
