
Distributed Vision Networks

ACIVS-07
Hamid Aghajan, Stanford University, USA Richard Kleihorst, NXP Research, Netherlands Chen Wu, Stanford University, USA

August 27, 2007 Delft University, Netherlands


Course Website http://wsnl.stanford.edu/acivs07/index.php

Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Technology Cross-Roads


Image Sensors
Rich information Low power, low cost

Sensor Networks
Wireless communication Networking

Smart Camera Networks


Signal Processing
Embedded processing Collaboration methods

Vision Processing: Architecture? Algorithms? Applications?


Distributed Vision Networks

Scene understanding Context awareness


Potential impact on design methodologies in each discipline

Sensor Networks Perspective


Opportunities for novel applications:
Make complex interpretations of the environment and events
Learn phenomena and behavior, not just measure effects
Incorporate context awareness into the application
Allow the network to interact with the environment
Change of paradigm: high-bandwidth sensors (vision)


Vision Processing Perspective


Novel approach to vision processing:
Use the additional available dimension: space
Data fusion across views, time, and feature levels
Design based on effective use of all available information (opportunistic fusion)
Utilize multiple views to:
Overcome ambiguities
Achieve robustness
Allow for low-complexity algorithms

Use communication to exchange descriptions - not raw data


In-node processing

Change of paradigm: Networked vision sensors



Distributed Vision Networks


New Paradigm
High-bandwidth data
In-node processing
Low-bandwidth communication
Collaborative interpretation


Distributed Vision Networks


Rich design space utilizing concepts of:
Vision processing
Signal processing and optimization
Wireless communications
Networking
Sensor networks

Novel smart environment applications:


Interpretive
Context-aware
User-centric


Distributed Vision Networks


Processing at source allows:
Image transfer avoidance
Descriptive reports
Scalable networks

Design opportunities:
Processing architectures for real-time in-node processing
Algorithms based on opportunistic data fusion
Novel smart environment applications
Balance of in-node and collaborative processing:
Communication cost
Latency
Processing complexities
Levels of data fusion

Distributed Vision Networks


Vision sensing requires awareness of:
Privacy issues
Employ in-node processing
Avoid image transfer
Applications that provide services not based on monitoring / reporting

Bandwidth issues
Transmit processed information, not raw data
Transmit based on information value for fusion / query-based

Processing demand
Employ separate early-vision and interpretive processing mechanisms
Layered processing architecture: features, objects, relationships, models, decisions
Employ data exchange and collaboration across different layers

Distributed Vision Networks


Agents Response systems Smart environments

Robotics

Feedback (features, parameters, decisions, etc.)

Enabling technologies: vision processing, wireless sensor networks, embedded computing, signal processing

Distributed Vision Networks ( DVN )

Artificial Intelligence

Context Event interpretation Behavior modeling

Smart Environments

Assisted living Occupancy sensing Augmented reality

Scene construction Virtual reality Gaming

Multimedia

Human Computer Interaction

Immersive virtual reality Non-restrictive interface Robotics


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook
Examples by: Chen Wu, Chung-Ching Chang, Huang Lee, Joshua Goshporn, Itai Katz, Kevin Gabayan, Arezou Keshavarz, Ali Maleki-Tabar (Wireless Sensor Networks Lab, Stanford University)

Application Potentials: View Selection


Select best view of person of interest in real-time tracking
Data exchange between cameras determines which one streams visual data
(Floor plan: cameras CAM 1 through CAM 5 around a room with a door.)
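As a minimal sketch of this selection step (the scoring metric and the interface are assumptions; the deck does not specify a protocol), each camera computes a scalar view-quality score locally, the scores are exchanged, and the best-scoring node streams:

```python
def elect_streamer(scores):
    """Pick the camera that should stream, given exchanged view scores.

    scores: {camera_id: view_quality}, e.g. face size or frontalness in
    that camera's image (hypothetical metric, for illustration only).
    """
    return max(scores, key=scores.get)

# Example: CAM 2 wins and streams its video.
best = elect_streamer({"CAM 1": 0.2, "CAM 2": 0.7, "CAM 3": 0.4})
```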


Application Potentials: Assisted Living


Detect accidents at home
(Floor plan: cameras CAM 1 through CAM 5 around a room with a door.)


Application Potentials: Multi-Finger Gesture


Manipulate virtual world with free hand gesture

Pan

Rotate

Zoom out

Zoom in


Application Potentials: Face Profiling


Interpolate and reconstruct face model from a few snapshots

Camera 3 (Test set)

Camera 1 (Training set)


Application Potentials: 3D Model Reconstruction


(Figure: the 3D model is built incrementally; observations at t1, observations at t2, and the model when only observations at t2 are available.)


Application Potentials: Virtual Reality


Place people in virtual world
(Floor plan: cameras CAM 1 through CAM 5 around a room with a door.)


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Smart Camera Networks


Fusion dimensions:
Space (views)
Time
Feature levels

Space and time:
Overcome ambiguities, occlusions
Enhance estimate robustness
Increase confidence level of estimates
Detection of key frames

Feature levels:
Exchange of features with other nodes across algorithmic layers

Fusion Mechanisms
Feature fusion:
Use of multiple, complementary features within a camera node

Spatial fusion:
Localization, epipolar geometry, ROI and feature matching
Validation of estimates by checking consistency, outlier removal
3D reconstruction

Temporal fusion:
Local interpolation / smoothing of estimates
Exchange of updates via spatial fusion
Spatiotemporal estimate smoothing and prediction

Model-based fusion:
3D human body reconstruction, human gesture analysis
Feedback to in-node feature extraction

Key features and key frames:
Information assisting other nodes

Decision fusion:
Estimates based on soft decisions
Adequate features in own observations
Cost, latency of communication


Layered Spatial Collaboration


Case Study: Human Gesture Analysis

Description layers:
Layer 1: Images (regions R1, R2, R3)
Layer 2: Features (F1: f11, f12; F2: f21, f22; F3: f31, f32)
Layer 3: Gesture elements (E1, E2, E3)
Layer 4: Gestures (G), the final decision

Decision layers:
Layer 1 (within a single camera): self reasoning, in-node feature extraction
Layer 2 (collaboration between cameras): assisted reasoning, estimate validation, feature-based fusion, key feature exchange
Layer 3 (collaboration between cameras): mutual reasoning, soft decision fusion, joint estimation

Opportunistic data fusion:
Fusion of features within a single camera
Fusion based on collaboration among multiple cameras

Target applications: Security, Gaming, Smart Presentation, Accident Detection

Fusion Mechanisms

(Diagram relating the fusion mechanisms: self reasoning builds on feature fusion and temporal fusion; assisted reasoning adds the exchange of key features, spatial fusion, and spatiotemporal fusion; mutual reasoning adds model-based fusion, active vision, and feedback.)

The Big Picture


Model-based Fusion


Motivation to build a human model:
A concise reference for merging information from cameras
Offers flexibility for interpretation in different applications:
Various gesture interpretation applications
Allows recreation of body gesture in the virtual domain:
Viewing angles to the body not available from any of the cameras
Allows employment of active vision methods:
Focus on what is important
Develop more detail in time
Helps address privacy concerns in various applications


Model-based Fusion


Approach:
Exchange segments and attributes; combine to reconstruct a 3D model
The subject's information is mapped and maintained in the model:
Geometric configuration: dimensions, lengths, angles
Color / texture / motion of different segments
(Diagram: ellipses extracted in CAM 1, CAM 2, and CAM 3 are combined into the 3D body model.)

Advantages:
Employ a higher level of in-node processing
Exchange only descriptions relevant to the model
Affordable communication for multi-camera collaboration
Initialization for active vision in nodes:
Provides color (or other feature) distributions for rough segmentation
Helps with body-part tracking (motion flow)
Offers a hint on what is important to look for in images

Data Flow
The collaboration routine

(Diagram: the local processing routines of Cam 1, Cam 2, and Cam 3 exchange 2D attribute descriptions through interfaces with the collaboration routine, which returns 3D model parameters.)


Use of Feedback

(Diagram: CAM 1 ... CAM N produce local estimates (e1, e2, e3); their information is merged into a 3D description, gestures are extracted, and the description is projected / decomposed back to each image plane.)

Feedback:
Initialize in-node feature extraction
Active vision (focus on what is important)

Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Region-based Feature Fusion


Problems with pixel-based features:
Localized attributes need local thresholds, which are hard to set
(e.g., comparing the color of foreground / background pixels)
No information from an extended neighborhood is considered
Knowledge about the extent of the neighborhood is not available; obtaining it is the objective in many cases: segmentation

Objects often contain correlated attributes in a region. The idea is to grow regions based on correlated attributes and achieve segmentation, an intermediate step in many applications.

Feature Fusion: Optical Flow and Color


Use of complementary features
Edge and color
Color and motion

Combine pixel-based and region-based methods


Region-based Fusion: Optical Flow and Color


Joint Refinement of Color and Motion


Refinement loop: from the images, coarse estimates of the color segmentation and the motion flows are computed, then jointly refined into better color segmentation and better motion flows.

Optical flow assisting color segmentation:
Clustering close-by points with similar motion vectors allows for better segmentation of the leg.

Color segmentation assisting optical flow:
Searching for a fitted ellipse in the motion flow allows for effective detection of the arm's motion vector (compare the results without and after using the angles of the ellipses).
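One possible way to implement this complementary-cue idea (weights and cue choices are assumptions, not the deck's algorithm) is a pairwise affinity that groups close-by pixels with similar motion and color; any clustering on this affinity then yields the jointly refined segments:

```python
import numpy as np

def motion_color_affinity(pos, flow, color, w_pos=1.0, w_flow=1.0, w_col=1.0):
    """Pairwise affinity mixing position, optical-flow, and color cues.

    pos   : (N, 2) pixel coordinates
    flow  : (N, 2) optical-flow vectors
    color : (N, 3) pixel colors
    The weights w_* balance the cues (assumed free parameters).
    """
    def sq(a):
        # pairwise squared distances for one cue
        return np.sum((a[:, None] - a[None]) ** 2, axis=-1)

    return np.exp(-w_pos * sq(pos) - w_flow * sq(flow) - w_col * sq(color))
```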

Spatial Fusion
Geometric fusion:
Making correspondences, tracking
Reconstruction of 3D models
Camera network calibration
Use of epipolar geometry for feature matching, outlier removal, and ROI mapping between camera views

Mutual reasoning:
Joint estimation, joint refinement, decision fusion

Assisted reasoning:
Estimate validation, key frame exchange

Example: face orientation estimation with a color and geometry-based method and spatial / temporal validation (face features mapped to an ellipsoid).

Color and Geometry Fusion


Face orientation analysis:
In-node feature extraction by fusion of color and geometry
Apply position constraints for the eyes when thresholding Cb/Cr

(Pipeline: the compensated image is tested against a skin-color ellipse model, using the Cb/Cr mean and covariance of a Gaussian chrominance distribution, to produce a skin mask; an eye map with an eye Gaussian distribution produces eye candidates.)
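A hedged sketch of the skin-mask step in this pipeline (the threshold and the per-subject Gaussian fit are assumptions): pixels whose chrominance falls inside the skin-color ellipse, i.e. within a Mahalanobis distance of the learned Cb/Cr mean, are marked as skin:

```python
import numpy as np

def skin_mask(cb, cr, mean, cov, thresh=4.0):
    """Skin-color ellipse test in Cb/Cr space.

    cb, cr : (H, W) chrominance channels of the compensated image
    mean   : (2,) skin Cb/Cr mean;  cov : (2, 2) covariance
    thresh : squared Mahalanobis radius of the ellipse (assumed value)
    """
    x = np.stack([cb, cr], axis=-1) - mean              # centered chroma
    d2 = np.einsum('...i,ij,...j->...', x, np.linalg.inv(cov), x)
    return d2 < thresh                                  # boolean skin mask
```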


Color and Geometry Fusion


Face orientation analysis:
Feature matching with epipolar geometry
Use the geometry of the cameras to:
Match features
Remove false feature candidates

For a feature x1 in one view, the true match x2 in the other view must lie on the epipolar line of x1 (determined by the baseline between the cameras). False eye and face candidates are rejected because no consistent candidate lies near their epipolar lines.

An example of mutual reasoning.
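A minimal sketch of this pruning step, assuming the fundamental matrix F between the two views is known from calibration (function and parameter names are illustrative): a candidate in the second view survives only if it lies close to the epipolar line of some candidate in the first view.

```python
import numpy as np

def epipolar_prune(F, pts1, pts2, tol=2.0):
    """Reject view-2 candidates inconsistent with view-1 candidates.

    F    : 3x3 fundamental matrix (x2^T F x1 = 0 for true matches)
    pts1 : (N, 2) candidates in view 1;  pts2 : (M, 2) in view 2
    tol  : max point-to-epipolar-line distance in pixels (assumed)
    """
    h1 = np.c_[pts1, np.ones(len(pts1))]       # homogeneous coordinates
    h2 = np.c_[pts2, np.ones(len(pts2))]
    lines = h1 @ F.T                           # epipolar lines in view 2
    d = np.abs(h2 @ lines.T) / np.linalg.norm(lines[:, :2], axis=1)
    return np.where(d.min(axis=1) < tol)[0]    # surviving candidate indices
```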

Feature Fusion
Level of features for fusion between cameras?
Features are typically dense fields (edge points, motion vectors)
They are locally fused to derive descriptions (sparse)
Descriptions are exchanged
Valuable features may be exchanged as dense descriptors
Communication cost issues need to be considered

Processing within a single camera operates on features; collaboration between cameras operates on shared descriptions, ranging from low-level to high-level and from dense to sparse.

Key features and key frames allow selective sharing of dense features.

Key Frames
Frames with high-confidence estimates
The node with the key frame observation broadcasts the derived information
Other nodes use it to refine their local estimates


Temporal Fusion
Use key frames to re-initialize the local face angle estimate
Use angle estimates close to zero (frontal view)
Aims to limit error propagation in time

Use optical flow to locally track angle changes between frames
Interpolate between two key frames to limit optical-flow error propagation

Cameras initialize face angles at key frames; local optical flow tracks the face angle between key frames; cameras interpolate face angles between key frames using the local optical flow.
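A small sketch of this interpolation (interfaces are assumptions; the deck describes the idea, not code): integrate the per-frame optical-flow increments, then spread the drift observed at each key frame linearly over the frames in between:

```python
import numpy as np

def fuse_angles(flow_deltas, key_frames):
    """Re-anchor an optical-flow angle track at trusted key frames.

    flow_deltas : per-frame face-angle increments from local optical flow
    key_frames  : {frame_index: trusted_angle} from high-confidence frames
    """
    raw = np.cumsum(flow_deltas)                       # integrate optical flow
    keys = sorted(key_frames)
    corr_at_keys = [key_frames[k] - raw[k] for k in keys]
    # linearly interpolate the correction, limiting drift between key frames
    corr = np.interp(np.arange(len(raw)), keys, corr_at_keys)
    return raw + corr
```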


Spatial / Temporal Validation


Estimates between key frames are corrected by:
Temporal smoothing (one camera)
Spatial smoothing / outlier removal (multiple cameras)

Can this be done more effectively? Yes: spatiotemporal filtering.


Spatiotemporal Fusion
(Plots: the face-angle track over 40 frames is refined by a forward pass and a backward pass toward the true orientation, spanning left profile to right profile, roughly -140° to +130°; the per-frame estimates are mapped to an ellipsoid, opportunistically creating a face profile. Query result examples: side profiles.)
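The deck's formulation is LQR-based (see the references); as a simplified illustration of combining a forward and a backward pass, a two-direction exponential smoother over the fused angle estimates looks like this (alpha is an assumed parameter):

```python
import numpy as np

def forward_backward_smooth(z, alpha=0.5):
    """Average a forward and a backward exponential filter over estimates z."""
    z = np.asarray(z, dtype=float)
    fwd, bwd = z.copy(), z.copy()
    for t in range(1, len(z)):
        fwd[t] = alpha * z[t] + (1 - alpha) * fwd[t - 1]          # forward pass
        bwd[-1 - t] = alpha * z[-1 - t] + (1 - alpha) * bwd[-t]   # backward pass
    return 0.5 * (fwd + bwd)
```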


Spatiotemporal Fusion


Model-based Human Posture Reconstruction


Segmentation function (single camera):
Initial color models from the model, or via k-means
Refine color models (Perceptually Organized EM), or another morphological method with constraints
Produce a concise description of segments

Model fitting function (collaborative):
Goodness of ellipse fits to segments
Projection on image planes
E.g., parameters for the upper body (arms)

Feedback from the collaborative model fitting re-initializes the in-node segmentation.


In-Node Feature Fusion for Segmentation


Collaborative Model Fitting


Frame105


Collaborative Model Fitting


Virtual Placement


Virtual Placement
Collaborative Face Analysis

Feature Fusion

Ellipse fitting, model-based spatiotemporal fusion

In-node processing


Decision Fusion
Smart home care network for fall detection
States are combined as soft decisions to create a report.

Pipeline: an accelerometer signal classifier produces State 0 and triggers the image analysis; Camera 1, Camera 2, and Camera 3 analyses produce States 1-3, which feed the decision-making process. Depending on the states, the system issues no report (safe), reports all useful data (possible hazard), or reports a hazard.

(Plots: accelerometer signals, x/y/z axes, for falling versus sitting down; the amplitude change versus the event duration separates falling from sitting down.)

Joint work with: Arezou Keshavarz, Ali Maleki-Tabar
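A decision-flow sketch under assumed state encodings and thresholds (the slide gives only the qualitative branches): the accelerometer state gates the vision processing, and the camera soft decisions are averaged into a danger level:

```python
def fall_report(accel_state, camera_states):
    """accel_state: accelerometer classifier output (assume 1 = benign, 3 = fall);
    camera_states: soft decisions in [0, 1] from the triggered camera analyses."""
    if accel_state <= 1:
        return "No report (safe)"
    if not camera_states:
        return "Report all useful data (possible hazard)"
    danger = sum(camera_states) / len(camera_states)   # soft-decision fusion
    if accel_state >= 3 and danger > 0.8:              # assumed threshold
        return "Report (hazard)"
    return "Report all useful data (possible hazard)"
```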


Decision Fusion

Initialize search space for the 3D model
Validate the model
Wait for more observations

Decision Fusion


Decision Fusion


Decision Fusion

Alert level = 0.6598, confidence = 0
Alert level = 0.8370, confidence = 0.7389
Alert level = 0.8080, confidence = 0.7695

Combined: lying down, danger (0.6201)
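The exact combination rule is not given in the deck; one plausible scheme, shown purely as an illustration, is a confidence-weighted average in which a camera with zero confidence contributes nothing:

```python
def combine_alerts(reports):
    """reports: list of (alert_level, confidence) pairs from the cameras."""
    total = sum(conf for _, conf in reports)
    if total == 0:
        return None   # no confident observation yet: wait for more
    return sum(alert * conf for alert, conf in reports) / total

fused = combine_alerts([(0.6598, 0.0), (0.8370, 0.7389), (0.8080, 0.7695)])
```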


Event Interpretation
Human model: vision, kinematics, attributes, states, reasoning / interpretations

Interpretation levels: low-level features, instantaneous action, behavior analysis

Vision processing produces model parameters and posture / attributes; AI reasoning works with context, persistence, behavior attributes, and queries, and provides feedback to the vision processing.

Event Interpretation


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Why is it difficult to estimate posture?


Image cues are not obvious

Human model varies (body part sizes)

High-dimensional optimization


Image Cues
Edge:
Templates
Chamfer distance (distance, orientation)

Motion:
Structure
Object boundaries / edges

Color:
Skin color
Adaptively learned color

No single one is robust!
Point / line features vs. region features


Human Model
Measured: motion capture devices

Acquisition: initialize at the beginning, with not-too-difficult postures

If the model differs from the subject too much, fitting may easily be trapped in far-away local minima.


Optimization
Top-down approach:
3D model → 2D projections of edges and silhouettes
Validate 2D projections with image observations
Deutscher (particle filtering), Sminchisescu (scaled covariance sampling), etc.


Optimization
Bottom-up approach:
Look for body-part candidates in images
Assemble 2D / 3D models from the body-part candidates
Sigal (loose-limbed people), Ramanan (profile pose)


Optimization - Pros and Cons


Top-down approach:
+ Easy to handle occlusions
- Difficult to optimize: non-convex
- Time-consuming to calculate the projections and evaluate them

Bottom-up approach:
+ Distributes more computation in the images (i.e., body-part candidates, local assemblage)
- Difficult to handle occlusions without knowing the relative configurations of body parts
- Not direct to map from the 2D assemblage to the 3D model


Multiple Views?
Gains:
More image cues
Resolve ambiguity (sometimes helps a lot!)

Challenges:
Huge redundancy; how to reduce it?
Misleading correspondence?
Communication?

Review:
Score addition (Gavrila etc.)
3D voxels / stereo (Malik etc.)
3D templates from training (Sigal etc.)


Multi-View Camera Network


Basic assumption and constraint: a powerful local image processor, but limited communication
Reduce local information
Maximally utilize multiple views to compensate for partial observations and reduced descriptions

Ideas:
Combine bottom-up and top-down approaches
Concise and informative local deduction
Choose the best view for different purposes
Optimally combine; reduce redundancy

Challenge: can we learn adaptively?
Model (size, appearance)
Behaviors, for prediction and validation

Temporal-Spatial Fusion


Model-based Gesture Analysis


Why use a human model for gesture interpretation:
Offers flexibility for interpretation in different applications:
Various gesture interpretation applications

Allows recreation of body gesture in virtual domain


Viewing angles to body not available from any of the cameras

Helps address privacy concerns in various applications

Applications: gaming, security, assisted living, etc.


Model-based Gesture Analysis


But also for vision analysis: a concise reference allowing information from cameras to merge
Spatial: Model parameters are fused from multiple views Temporal: Model parameters are updated in time

Allows for active vision methods:


Focus on what is important (color / texture)
Develop more details in time


Spatial Fusion
Combining top-down and bottom-up approaches


Distributed Communication Flow


Fusion to update local knowledge of the subject (a vector of model parameters):
Camera 5 wants to update its knowledge of the subject
It broadcasts the request for collaboration
The other cameras send the requested descriptions (vectors of descriptions from their local processing)
The updated knowledge of the subject is fed back
The up-to-date knowledge of the subject is used in local processing
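In code form, one round of this flow might look as follows (a sketch with assumed method names, not the deck's protocol):

```python
def collaboration_round(requester, others):
    """One update round: broadcast, collect descriptions, fuse, feed back."""
    request = requester.make_request()                   # broadcast for collaboration
    replies = [cam.describe(request) for cam in others]  # vectors of descriptions
    model = requester.fuse(replies)                      # vector of model parameters
    for cam in (requester, *others):
        cam.update(model)                                # feedback of updated knowledge
    return model
```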


Algorithm
Segmentation function (single camera):
Initial color models from the model, or via k-means
Refine color models (Perceptually Organized EM), or another morphological method with constraints
Produce a concise description of segments

Model fitting function (collaborative):
Goodness of ellipse fits to segments
Projection on image planes
E.g., parameters for the upper body (arms)

Feedback from model fitting back to segmentation.


Segmentation
In-node function based on:
Feature fusion
Feedback from the model

Feedback allows incorporation of the spatiotemporal fusion outcome into the local analysis. A rough estimate of the segments is provided by:
Local initialization
Adoption of the spatiotemporal model

Expectation Maximization (EM) methods use the new observation to refine local color distributions:
EM produces markers (collections of high-confidence segment islands) for watershed
This also helps with varying color distributions between cameras

Watershed enforces spatial proximity information to link the segments.


EM Segmentation
Initialization:
It is not a good idea to specify an arbitrary initial estimate; EM may be trapped in local optima.

Ways to obtain initial estimates (a k-means sketch follows below):
K-means: the centers of the clusters are taken as the initial estimates for EM
Segment parameters from the 3D body model (assumes the appearance doesn't change very quickly)

Segmentation function: single camera
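A minimal k-means sketch for the first option (any standard implementation would do; the parameters are illustrative), whose cluster centers seed the EM color models:

```python
import numpy as np

def kmeans_init(pixels, k, iters=10, seed=0):
    """Return k cluster centers of the pixel colors to initialize EM."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((pixels[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([pixels[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers
```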


Perceptually Organized EM (POEM)


Regular EM method:
A pixel-based method
Doesn't use the spatial relationship between pixels / segment islands
May also leave some pixels unclassified

POEM:
Segments are continuous, so consider a pixel's neighborhood
Use a measure of expected grouping:

w(x_i, x_j) = \exp\!\left( -\frac{\|x_i - x_j\|^2}{\sigma_1^2} - \frac{\|\mathrm{coord}(x_i) - \mathrm{coord}(x_j)\|^2}{\sigma_2^2} \right)

The neighborhood votes for x_i belonging to segment l:

V_l(x_i) = \sum_{x_j} \pi_l(x_j)\, w(x_i, x_j), \qquad \text{where } \pi_l(x_j) = p(y_j = l \mid x_j)
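A direct, O(N²), illustration-only sketch of this vote, with assumed parameter names for the two scale factors:

```python
import numpy as np

def poem_votes(features, coords, posteriors, sigma1=1.0, sigma2=1.0):
    """Neighborhood votes V_l(x_i) for every pixel and segment.

    features  : (N, d) pixel feature vectors x_i (e.g., color)
    coords    : (N, 2) pixel coordinates
    posteriors: (N, L) responsibilities pi_l(x_j) = p(y_j = l | x_j)
    """
    df = np.sum((features[:, None] - features[None]) ** 2, axis=-1)
    dc = np.sum((coords[:, None] - coords[None]) ** 2, axis=-1)
    w = np.exp(-df / sigma1**2 - dc / sigma2**2)   # grouping weights w(x_i, x_j)
    return w @ posteriors                          # V_l(x_i) = sum_j pi_l(x_j) w(x_i, x_j)
```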


Watershed Segmentation
Removing vague pixels is important for watershed, since wrong seeds / markers would compete with correct ones and cause false segments.

(Figure: red marks the undecided pixels; watershed then assigns labels to the undecided (dark blue) pixels.)

Segmentation function: single camera
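A sketch of the marker generation (the confidence threshold is an assumption, and OpenCV's standard watershed stands in for whatever implementation the authors used):

```python
import numpy as np
import cv2

def marker_watershed(image_bgr, posteriors, conf=0.9):
    """Seed watershed with high-confidence EM pixels; vague pixels stay 0.

    image_bgr : (H, W, 3) uint8 image
    posteriors: (H, W, L) per-pixel segment probabilities from EM/POEM
    """
    best = posteriors.argmax(axis=2)
    confident = posteriors.max(axis=2) > conf
    markers = np.where(confident, best + 1, 0).astype(np.int32)
    return cv2.watershed(image_bgr, markers)   # assigns labels to vague pixels
```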



Ellipse Fitting
Motivation:
Concise descriptions of segments
Each ellipse should represent a segment with a similar shape
The ellipses do not necessarily correspond to body parts

Goodness-of-fit measures control the ellipse fitting:
Occupancy of the ellipse
Coverage of the segment
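The two measures can be read as set overlaps between the ellipse and the segment; a sketch under that (assumed) definition:

```python
import numpy as np

def fit_quality(mask, center, axes, angle):
    """Occupancy and coverage of an ellipse fitted to a segment mask.

    mask  : (H, W) boolean segment image
    center, axes, angle : ellipse parameters (cx, cy), (a, b), radians
    occupancy = |ellipse & segment| / |ellipse|
    coverage  = |ellipse & segment| / |segment|
    """
    ys, xs = np.indices(mask.shape)
    c, s = np.cos(angle), np.sin(angle)
    u = ((xs - center[0]) * c + (ys - center[1]) * s) / axes[0]
    v = (-(xs - center[0]) * s + (ys - center[1]) * c) / axes[1]
    inside = u**2 + v**2 <= 1.0
    overlap = (inside & mask).sum()
    return overlap / max(inside.sum(), 1), overlap / max(mask.sum(), 1)
```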


Model-based Human Posture Reconstruction


Ellipse parameters are exchanged between cameras:
Reduced data communication load for collaboration
A server node, or each of the cameras, collects the data and creates a virtual skeleton

(Diagram: ellipse segments 1-4 from the camera views are fused into the skeleton, using the goodness of the ellipse fits to segments and their projection on the image planes; e.g., parameters for the upper body (arms).)


Skeleton Fitting
(Skeleton fits shown for frames 1, 28, 70, 81, 105, and 148.)


Summary
The main difference of a camera network: spatially distributed vision information, with huge potential for 3D interpretations, especially of human activities.

How to achieve this systematically and efficiently:
Model-based
In-node processing
Collaboration between views


Open Questions
How much of an advantage over monocular vision? In what ways? How to use the views in the correct way?
Capability limit of the camera network (how well can it understand the scene; how many views are needed)?
Balance and trade-off: in-node vs. collaborative processing
Networking: data exchange vs. latency


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Aims of This Section

To introduce smart camera architectures
To motivate smart wireless cameras from a power-consumption point of view
To discuss the research and industrial approaches


Vision Systems
Vision systems analyse images and video and report events / objects / properties
Examples: DVD recorders, set-top boxes, smart cameras

(Diagram: video / image → VISION SYSTEM → data)


Smart Camera Vision System


A definition (source: Wikipedia):
A smart camera is an integrated machine vision system which, in addition to image capture circuitry, includes a processor that can extract information from images without need for an external processing unit, and interface devices used to make results available to other devices.

François Berry, Univ. Blaise Pascal, France



Granularity of Integration
A smart camera is a vision system with an intermediate hardware granularity:
machine vision system, smart camera, vision chips, in increasing level of processing integration.

François Berry, Univ. Blaise Pascal, France



Smart Cameras
= Camera + intelligence
= The basis for new applications, such as detection, tracking, and scene analysis

Markets: automotive, mobile communications, surveillance, consumer


Challenges in Wireless Smart Cameras


Performance
Power consumption
Programmability


Actually, Why a Smart Wireless Camera?

Why not a radio connection to a PC that does the processing?



Power Consumption of a Wireless VGA Camera

A typical wireless digital VGA camera needs 200 mW to transmit live video over a short distance. One year of daily use means about 1750 Wh (0.2 W × 24 h × 365 days ≈ 1752 Wh). Using a good fuel cell, it would consume about 2.2 liters of methanol fuel per year, roughly 400× the total camera volume.

Consumption is in the Radio Path


(Diagram: transmitter chain from DSP through the DA converter to the RF stage and power amplifier (PA). E_T is the transmit signal-processing energy per bit; E_L is the link energy per bit.)

Bluetooth: E_T = 150 nJ/bit, E_L = 1 nJ/bit
GSM (0.2 Watt): E_T = 500-1000 nJ/bit, E_L = 2500 nJ/bit


Power Consumption for Radio


(Plot: required radio power versus data rate, 100 b/s to 100 Mb/s, covering sensor data, speech / audio / hi-fi, and moving pictures; technologies from PicoRadio and Zigbee through Bluetooth to 802.11a and UTP, spanning autonomous, battery, and mains power at ranges of 1 m to 100 m. Power per bit does not scale with Moore's Law. Source: Raf Roovers.)


Computational Efficiency is Growing (Moore)


(Plot: computational efficiency in GOPS per Watt versus feature size, 0.5 µm down to 0.07 µm: SIMD processors and ASICs reach orders of magnitude higher efficiency than CPUs such as the Pentium 4; the human brain is shown for comparison. SIMD = Single Instruction Multiple Data.)



Strategy for Wireless Cameras?


For short-range transmission, a physical limit on the DA converter dictates the power consumption; it will not get lower.

Programmable processors continue to become more energy efficient in GOPS/Watt, especially SIMD processors, which are suitable for a large part of an image processing chain.

So, go towards smart cameras with SIMDs, where the video analysis is done in the camera itself and only events are forwarded.


Some Low-Cost Smart Cameras

CMUcam3 (ARM7) 60 MIPS @ 650mW

Stanford MeshEye Mote (ARM7)


Smart Wireless Camera Platform

WiCa (Xetal SIMD): 50 GOPS @ 600 mW
Cyclops (AVR RISC): 8 MIPS @ 50 mW?

Architecture
A smart camera is an embedded system:
Operates under fixed run-time constraints (real time)
Not end-user programmable
Differentiating features: power, cost, performance
Runs a few applications, often known at design time

François Berry, Univ. Blaise Pascal, France


Smart Camera Designs for Research


Technology

Processor approach: DSP, Media Processor, GPU

Programmable Logic approach: CPLD, FPGA

François Berry, Univ. Blaise Pascal, France



Processor Approach
GPU and DSP
DSP (Digital Signal Processor):
A microprocessor designed specifically for digital signal processing, generally in real-time computing
Features: a highly parallel accumulator and multiplier (MAC operation); fixed-point arithmetic is often used to speed up arithmetic processing
DSPs are specified around four algorithms: Infinite Impulse Response (IIR) filters, Finite Impulse Response (FIR) filters, FFT, and convolvers

François Berry, Univ. Blaise Pascal, France




Programmable Logic Approach


FPGA (Field Programmable Gate Array):
An electronic component used to build dedicated digital circuits
An integrated circuit able to change the interconnectivity of a large number of fundamental computing components via configuration information stored in onboard static RAM
A hardware description (VHDL, Verilog) turns the FPGA into a customized IC: a processor, specific glue logic, specific applications

François Berry, Univ. Blaise Pascal, France


Programmable Logic Approach


FPGA (Field Programmable Gate Array):
Increasing speed and density
Increased I/O pin count and bandwidth
Lower power
Integration of hard IP (e.g., multipliers, processor soft cores, ...)

(Plots: maximum sustained single-precision floating-point operations (GFLOPS) and maximum floating-point multiply-accumulates grow steeply from 1998 to 2005, while the FPGA cost per million gates falls from about $350 toward $0.)

François Berry, Univ. Blaise Pascal, France



Smart Camera for Research


A very efficient smart camera could be based on an efficient mix of FPGA, DSP, GPU, ...!

François Berry, Univ. Blaise Pascal, France



Smart Camera Design for Consumer Use


Challenges: performance, energy consumption, cost. Let us look at an example.


Example Event Casting: Face Detection


Face Detection Application Mapping


Video flows from low-level through intermediate-level to high-level processing, producing data:
Low level: pixel processing (Haar filters), similar operations for every pixel; SIMD, 10++ GOPS
Intermediate level: image processing (image pyramid), similar operations for every image; FPGA/DSP, 100 MOPS
High level: application (draw box, event), different operations for every event; CPU, 1 MOPS

SIMD = Single Instruction Multiple Data

Why SIMD for Low-Level?


High performance (need > 10 GOPS)
High internal bandwidth (need > 500 Gb/s)

(Diagram: a processing element (PE) computing C from inputs A and B under a shared instruction.)

Bandwidth = 10 GOPS × 3 accesses × 16 bits = 480 Gb/s

SIMD = Single Instruction Multiple Data



Uniprocessor to SIMD: 1 PE
(Layout at 100 MHz: 4.6 mm² data memory, 0.6 mm² program memory, 1 mm² control, 0.02 mm² PE.)

DSP, 1 PE:
Performance: 100 MOPS
Size: 5.22 mm²
Performance / area: 19 MOPS/mm²
Overhead: 26%
Bandwidth: 4.8 Gb/s

Uniprocessor to SIMD: 2PEs


(Same layout with two 0.02 mm² PEs.)

Performance: 200 MOPS
Size: 5.24 mm²
Performance / area: 38 MOPS/mm²
Overhead: 25%
Bandwidth: 9.6 Gb/s


Uniprocessor to SIMD: 100PEs


(Same layout with 100 PEs.)

Performance: 10 GOPS
Size: 8.2 mm²
Performance / area: 1.2 GOPS/mm²
Overhead: 20%
Bandwidth: 480 Gb/s



Uniprocessor to SIMD
RISC, 1 PE @ 50 MHz: peak performance 0.05 GOPS; size 6.4 mm²; performance / area 0.008 GOPS/mm² (0.18 µm); overhead 26%; bandwidth 2 Gb/s
Pentium 4 @ 2.4 GHz: 6 GOPS; 131 mm²; 0.045 GOPS/mm²; overhead ??%; 58 Gb/s; peak power 59 W
Xetal-II SIMD, 320 PEs @ 150 MHz: 100 GOPS; 44.4 mm² (0.18 µm) or 11.1 mm² (0.09 µm); 2.25 GOPS/mm²; overhead 12%; 1.5 Tb/s; peak power 1.0 W


Why is SIMD Low-Power?


Typical DSP instructions need 4 accesses to memory.

(Diagram: a PE executing, e.g., C = A + B or C = A > B ? A : B under one instruction.)

Why is SIMD Low-Power?


SIMDs have multiple PEs in parallel.
Arithmetic always has to be done, but:
The instruction fetch is shared multiple times
Data (A, B, C) access is shared in multiple-word-wide memories
Accessing an 8-times-wider memory takes half the amount of energy per data entity


SIMD Energy Consumption


(Plot: energy consumption in nJ/pixel for a convolution versus the number of PEs, 100 to 600, split into computation, communication, and memory access, for kernel widths W = 3 to W = 13. Without voltage scaling, the energy saving levels off.)

Takeaways: parallelism, memory localization.

SIMD Power/Energy Scaling


(Plot: energy scaling factor versus number of PEs, 100 to 700, for scaling energy consumption in SIMD architectures, from [Vdd_max, f_max] down to [Vdd_min, f_min], with contours at 0.1, 10, 20, and 30 GOPS. Parallelism enables voltage scaling; V_th limits the degree of scaling; f_max = 50 MHz sets P_min, and N = 640 pixels sets P_max = 640 PEs.)


Consumer Smart Wireless Camera Architecture

(Architecture: SIMD front-end, then DSP, then CPU, then event reporting. SIMD = Single Instruction Multiple Data.)



Smart Wireless Camera Platform

WiCa:
IC3D / Xetal-3 based
Stereo sensor input
50 GOPS performance
Typically 100 milliwatts
ZigBee node
Battery powered
C++ programmed

57

Smart Wireless Camera PCB


ZigBee module

Battery module
Ben Schueler, NXP

Picture of WiCa Setup

Alexander Danilin, NXP



Which Algorithms Run Easily on WiCa?


Those where much of the application runs on the SIMD
Those where the DSP/CPU is used only for limited or occasional tasks
Choose an appropriate algorithmic basis for scene analysis, for example feature-based


What Have We Mapped to WiCa?


Face detection: soft edge features

Horizontal soft edges

Vertical soft edges


What Have We Mapped to WiCa?


Wireless intruder detection system


What Have We Mapped to WiCa?


Object recognition applications


What Have We Mapped to WiCa?


Depth estimation from stereo


Some Power Consumption Results


Object recognition: 25 mW
Face detection: 40 mW
Stereo depth estimation: 50 mW
Gesture recognition: 15 mW


Conclusions
Low-power smart wireless cameras can be designed with SIMD front-ends
Real-time applications
The ZigBee node opens research challenges for distributed camera networks
WiCa is in use at a number of sites


Outline
Introduction
Application potentials
Data fusion mechanisms
Human pose analysis
Multi-view gesture
Smart cameras
Outlook


Summary
Smart camera networks:
Enable novel user-centric applications: interpretive, context-aware, user-centric

Processing at source allows:
Image transfer avoidance
Scalable networks
Descriptive reports

Privacy issues:
Awareness of user choices
In-node processing and image transfer avoidance
Model-based or silhouetted images

Summary
Smart camera networks:
Algorithm design is key to efficient use of computing resources:
In-node feature extraction and opportunistic fusion
Use of key features in the data exchange mechanism
Model-based approach provides feedback / initial points for in-node processing

Balance between in-node and collaborative processing:
Communication cost
Latency
Processing complexities
Levels of data fusion


Towards Active Vision


Active vision in feature extraction:
Use of key features instead of generic features (edges, motion, etc.)
Detection of prominent color / texture attributes
Use of spatiotemporal fusion results to learn key features

Active vision in modules with processing load:


Instead of avoiding methods with high processing cost / latency:
Define what they should look for
Perform initialization to restrict searches

Active vision in gesture analysis:


Use history of subject and semantic meanings of gestures to feedback what is important to detect


Outlook
Applications:
Select best view of person of interest in real-time tracking
Adjust presentation based on the speaker's gestures
Manipulate virtual world with free hand / finger gestures
Detect accidental falls at home / elderly care
Reconstruct face model from a few snapshots
Build 3D models of objects
Place people and their actions in virtual world


Outlook
Agents Response systems Smart environments

Robotics

Feedback (features, parameters, decisions, etc.)

Enabling technologies: vision processing, wireless sensor networks, embedded computing, signal processing

Distributed Vision Networks ( DVN )

Artificial Intelligence

Context Event interpretation Behavior modeling

Smart Environments

Assisted living Occupancy sensing Augmented reality

Scene construction Virtual reality Gaming

Multimedia

Human Computer Interaction

Immersive virtual reality Non-restrictive interface Robotics


Outlook
Enabling technologies: vision processing, wireless sensor networks, embedded computing, signal processing

Distributed Vision Networks ( DVN )

Artificial Intelligence (AI)

Context Event interpretation Behavior models

Feedback (features, parameters, decisions, etc.)

Quantitative Knowledge

Qualitative Knowledge
Immersive virtual reality, non-restrictive interface, interactive robotics; scene construction, virtual reality, gaming; agents, response systems, user interactions

Human Computer Interaction

Multimedia

Smart Environments

Robotics


References
H. Aghajan and C. Wu, "From Distributed Vision Environment to Human Behavior Interpretations," Behaviour Monitoring and Interpretation Workshop at the 30th German Conference on Artificial Intelligence, Sept. 2007.
C. Wu and H. Aghajan, "Model-based Human Posture Estimation for Gesture Analysis in an Opportunistic Fusion Smart Camera Network," Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.
C. Chang and H. Aghajan, "A LQR Spatiotemporal Fusion Technique for Face Profile Collection in Smart Camera Surveillance," Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept. 2007.
C. Chang and H. Aghajan, "Spatiotemporal Fusion Framework for Multi-Camera Face Orientation Analysis," Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.
C. Wu and H. Aghajan, "Model-based Image Segmentation for Multi-View Human Gesture Analysis," Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.
H. Aghajan and C. Wu, "Layered and Collaborative Gesture Analysis in Multi-Camera Networks," Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), April 2007.
C. Wu and H. Aghajan, "Opportunistic Feature Fusion-based Segmentation for Human Gesture Analysis in Vision Networks," IEEE SPS-DARTS, March 2007.
C. Wu and H. Aghajan, "Collaborative Gesture Analysis in Multi-Camera Networks," ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.
C. Chang and H. Aghajan, "Collaborative Face Orientation Detection in Wireless Image Sensor Networks," ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.
A. Maleki-Tabar, A. Keshavarz, and H. Aghajan, "Smart Home Care Network using Sensor Fusion and Distributed Vision-Based Reasoning," ACM Multimedia Workshop on Video Surveillance and Sensor Networks (VSSN), Oct. 2006.
A. Keshavarz, A. Maleki-Tabar, and H. Aghajan, "Distributed Vision-Based Reasoning for Smart Home Care," ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.

http://wsnl.stanford.edu/publications.php

www.ICDSC.org
Tutorials:
Andrea Cavallaro, Queen Mary University of London, UK: Smart Cameras: Algorithms, Evaluation and Applications
Bjoern Gottfried, U. of Bremen, Germany: Ambient Intelligence and the Role of Spatial Reasoning: Smart Environments with Smart Cameras
Richard Radke, Rensselaer Polytechnic Institute, USA: Multiview Geometry for Camera Networks
Wilfried Elmenreich, Vienna Univ.: Realtime Sensor Networks for Smart Cameras: Communication, Data Processing and Applications

PhD Forum: students present their PhD research in spot posters and short talks; tutorial lecturers will discuss potential research topics with students

First ACM / IEEE International Conference on Distributed Smart Cameras (ICDSC-07)


Topics:
Smart camera architectures
Image sensing techniques for smart cameras
Embedded vision programming
Fusion of vision and other sensors
Distributed vision processing algorithms
Distributed appearance modeling
Collaborative feature extraction, data and decision fusion
Architectures and protocols for camera networks
Wireless and mobile image sensor networks
Position discovery and middleware applications
Vision-based smart environments
Surveillance and tracking applications
Multi-view vision for human-computer interaction
3D scene analysis
Distributed multimedia and gaming applications

September 25-28, 2007 Vienna, Austria
