
VIDEO SURVEILLANCE

BIR BHANU
bhanu@cris.ucr.edu
http://vislab.ucr.edu

Center for Research in Intelligent Systems


University of California at Riverside
Riverside, CA 92521, USA
July 30, 2018

www.vislab.ucr.edu
SURVEILLANCE
- Dictionary Meaning -

- A watch kept over a person, group, etc., especially over a suspect, or the like
- Continuous observation of a place, person, or ongoing activity in order to gather information
- Attentive observation, as to oversee and direct someone or something
- Covert/overt monitoring

Video Surveillance
Sensors, Objects, Actions/Activities

- overlapped/non-overlapped
- at a distance, local, wide area
- distributed/centralized
- implementation
- real-time/forensic
- applications

Reference
B. Bhanu, C. Ravishankar, A. Roy Chowdhury, H. Aghajan and D. Terzopoulos (Eds.),
“Distributed Video Sensor Networks,” ISBN 978-0-85729-126-4, 503 pages, Springer,
2011.
Video Surveillance
• Video cameras
• Other imaging/non-imaging sensors
• Fusion of sensors
• Algorithms

• TOPICS/Example Applications
• Humans, Re-identification
• At a distance - Gait, side face, ear
• Pain of newborn babies
• Driver emotions
• Activities (sports)
• Vehicles
Things to Come (1)
• Overview and sample applications
• Surveillance of Human Emotions in Video
  - Image- vs. continuous-video-based natural emotion recognition
  - Three Applications
    - Driving experience of sports cars (stress & inattention)
    - Pain of neonates (health)
    - Zapping Index: Interestingness of commercials
Things to Come (2)
• Human Re-Identification
  - Reference-based framework for re-identification in images
  - Online learning for re-identification in video
    - Attributes co-occurrence pattern mining
    - Unbiased temporal representation for re-identification
    - Deep agent with reinforcement learning for re-identification
• Vehicle Surveillance in Video
  - Structural signatures for passenger vehicle classification
  - Ensemble of deep networks for vehicle classification
  - Robust visual rear ground clearance estimation and classification of a passenger vehicle
  - Vehicle make and model recognition
  - Vehicle logo recognition for make recognition
  - 3D vehicle model building from video
Things to Come (3)
• Video Analytics in Camera Networks
  - Camera selection, camera hand-off, camera control
• Human Identification in a Network of Video Cameras
  - Gait (2D/3D), face, ear, fusion, performance
• Dynamic Sensor Fusion
  - Physics-based evolutionary strategies and dynamic sensor fusion for moving object detection
  - DyFusion: Dynamic IR/RGB fusion for vessel recognition
  - Face and gait based human recognition at a distance
  - Segmentation of the growth of pollen tubes
Things to Come (4)
• Tracking
  - Online learned grouping model with non-linear motion context
  - Multi-target tracking in a network of non-overlapping cameras
  - A CRF model with social grouping information
  - A reference set based appearance model
  - Crowd simulation integrated pedestrian tracking
  - Group structure preserving pedestrian tracking
  - Context-aware crowd simulation in pedestrian tracking
  - Tracking multiple people using social groups
  - Video analytics for sports
Cameras are Everywhere
• Cities, Public Places, Casinos, Sports Events: Thousands of cameras
are installed in public places like city squares, subway stations, and
airports. Multiple sensors such as video cameras, infrared or range
sensors gather data from various view points, which are then sent to
central servers.
• Super Bowl/Olympics: Surveillance systems with large numbers of cameras have been used to keep track of activities.
• Little Intelligent Processing of the Data: The central facility usually consists of a monitoring station with human observers.
• Not Enough People: There are not enough people to watch a large number of video streams 24 hours a day; the task is also tedious.
• Forensics vs. Real-time: Detection of suspicious activities.
• Security and Privacy: Aspects of individual recognition.
• Focus of this Talk: Automated Data Processing.
The Challenge – 2004/5
• Build a scalable wireless camera network that
  – supports a large number (~100) of video cameras (fixed and mobile pan/tilt/zoom).
  – allows arbitrary selection and changing of processing algorithms (centralized vs. distributed) on the fly.
  – outputs high-resolution video (≥ 640×480).
  – operates year-round, 24 hours/day.
  – serves as a community resource for data collection from multiple cameras (outdoor/indoor) for a multitude of experiments.
• …and an infrastructure capable of processing 100s of video streams in real time.

VideoWeb Laboratory
• An Outdoor/Indoor Laboratory for Research & Education
• A wireless video network of 80 pan, tilt and zoom (PTZ) cameras for large-scale experiments. All entry/exit locations have cameras. (EURASIP 2010)



UCR Wireless Network of Fixed and Mobile Cameras
[Figure: robots Tom, Harry, and Cliff with onboard cameras, plus fixed cameras 1 and 2, spanning VISLab and the UCR campus]
Wall of Video Streams

Architecture and Servers
• Three levels of computation
  – Low-level: detect/track moving objects
  – Mid-level: feature extraction, multi-camera tracking
  – High-level: face/gait and activity recognition, scene analysis, biometrics (soft/hard)
• Requirements:
  – Process over 98 MB/s of video data in real-time
• Solution:
– Rack of 32 custom-built servers
• Intel Core 2 Quad Q6600 processors,
expandable to 8 cores/server
• 2GB of Memory, expandable to 24GB
• 250 GB of disk space, expandable to 4TB
– Interface server for user control of entire network
Optimizing Network
• Measure the performance of a wireless camera network. Depending on the task, there are numerous “optimal” ways to configure it:
  – Biometrics - maximize resolution and video quality!
  – Surveillance recording - maximize reliability!
  – Tracking - maximize frame rate and response!
• Configure the network to optimize its performance for specific tasks. “Configuring the network” may consist of changing camera settings, bandwidth controls, network infrastructure, etc.
• Performance metrics to optimize:
– Maximize frame rate, Minimize standard deviation of frame rate
– Minimize video compression, Maximize video resolution
– Minimize maximum lag time
• Managing the tradeoff between parameters:
– Higher resolution may lead to lower frame rates.
– Lower compression may lead to longer lag.
– Higher number of users reduces throughput.

Pareto Set %s for All Cameras
Resolution \ Compression   100    60     30     20     0
176×120                    46%    66%    46%    51%    74%
352×240                    34%    46%    26%    34%    91%
704×240                    51%    29%    17%    54%    97%
704×480                    34%    31%    63%    94%    100%

• Surprisingly, every configuration was Pareto efficient for at least one camera
• However, only 4 rank into the top 10th percentile and 10 in the 50th percentile
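A configuration is Pareto efficient when no other configuration is at least as good on every metric and strictly better on one. The filtering can be sketched as follows; the configuration names and scores here are illustrative placeholders, not measurements from the VideoWeb study:

```python
# Minimal Pareto-set filter over camera configurations.
# Each entry: (name, (frame_rate_fps, vertical_resolution)); higher is
# better on both axes. The values below are made up for illustration.

def pareto_set(configs):
    """Return the configurations not dominated by any other one."""
    def dominates(a, b):
        # a dominates b: >= on every metric, > on at least one
        return (all(x >= y for x, y in zip(a[1], b[1]))
                and any(x > y for x, y in zip(a[1], b[1])))
    return [c for c in configs
            if not any(dominates(d, c) for d in configs if d is not c)]

configs = [
    ("704x480 @ 0% compression", (12, 480)),
    ("704x240 @ 0% compression", (20, 240)),
    ("176x120 @ 60% compression", (30, 120)),
    ("352x240 @ 30% compression", (18, 240)),  # dominated by 704x240 above
]
front = pareto_set(configs)  # keeps the first three entries
```

With real per-camera metrics (frame rate, lag, quality) substituted for the toy scores, the same filter yields the per-camera Pareto sets tabulated above.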
Top Four Configurations
(normalized and averaged)
• 704×480, 0 compression
• 704×240, 0 compression
• 704×480, 20 compression
• 352×240, 0 compression

Data Collection: 8 Cameras
• Sample Actions:
– Running, Sit on bench, Wave off (Ignore), Observe from
afar (more than 10 feet), Sit cross legged, Walking with
crutches, Walking close (within One foot)
– The CameraClient is also capable of recording raw or processed data. (Denina et al., DVSN 2011)

Intersection: 7 Camera Views

• Sample Actions:
– People: Getting out of a vehicle, Walking, Close door, Stand
by the vehicle, Carry object
– Cars: Right turn, Stop at intersection

Camera Selection, Hand-off and Control
• How to track multiple persons using multiple cameras?
• Which camera is the ‘best’ for each person? Is the ‘best’ camera unique?
• How to get a stable solution?
• No detailed calibration is required.
• Some related work:
  – Needs topology information (not flexible for active cameras)
  – Master-slave scheme (requires overlapping FOVs)
  – No “best camera” is selected

Re-Identification
Recognizing individuals in non-overlapping cameras at different times and locations.
[Figure: suspects 1 and 2 viewed by cameras A and B - same person?]
Challenges: different pose, changing illumination, occlusion, plus low resolution, image noise, blurriness …

Best Paper Award, IEEE AVSS Conference, September 2013
Drawbacks of Traditional Approaches

“The variations in feature space between the images of the same person due to pose and illumination change are almost always larger than image variations due to change in person identity.”

[Figure: different people look similar; same people look different]

Direct comparison is not that reliable…


A Reference Set Helps!
Intuition: A reference set can be used to indirectly compare two sets of data (images from different camera views).
[Figure: two camera-view images of Jennifer Lopez matched indirectly through a reference set that includes Angelina Jolie]
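The intuition can be sketched in a few lines: instead of comparing two descriptors directly, each is described by its similarity to a shared reference set, and those similarity vectors are compared. This is an illustrative toy (2-D feature points, RBF similarity), not the paper's actual descriptors or matching function:

```python
import math

# Toy sketch of reference-based matching: describe each image by its
# similarity to a fixed reference set, then compare those descriptors.

def rbf_sim(u, v, gamma=0.5):
    """Similarity in (0, 1] that decays with squared distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)

def ref_descriptor(x, refs):
    """Indirect descriptor: similarity of x to every reference exemplar."""
    return [rbf_sim(x, r) for r in refs]

def reid_score(probe, gallery_item, refs):
    """Cosine similarity of the two reference-based descriptors."""
    p, g = ref_descriptor(probe, refs), ref_descriptor(gallery_item, refs)
    num = sum(a * b for a, b in zip(p, g))
    den = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in g))
    return num / den

refs = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]      # shared reference set
same_person = reid_score((1.0, 1.0), (1.5, 0.8), refs)
other_person = reid_score((1.0, 1.0), (9.0, 0.5), refs)
```

Even though the two camera views are never compared directly, `same_person` scores far higher than `other_person`, because both views of the same person resemble the same reference exemplars.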


Anomalous Activities in a Camera Network

Network Topology and Semi-supervised Learning

• Normal (Class 1)
• Break-in (Class 2) - suspicious
• Stay (Class 3) - suspicious
• Sudden appearance and disappearance (Class 4) - uncertain

IEEE ICIP 08, 09; IEEE AVSS 08; EURASIP 2009
Bio-Inspired Algorithms for Tracking

• Adapting swarm intelligence approaches to multi-object tracking with non-stationary backgrounds
• Particle Swarm Intelligence
• Bacterial Foraging Optimization

[Figure: track accuracy comparison]
IEEE CEC 2011, ACM GECCO 09
Multi-target Tracking
Problem: Multi-target tracking in non-overlapping cameras
Goal: Associate tracks that contain the same target

Pedestrian Tracking with Crowd Simulation

• Difficulties in pedestrian tracking
  – Occlusion, illumination change, similar appearance, etc.
  – More severe when the density of pedestrians becomes higher
• Basic idea
  – Use multiple cameras with overlapping fields of view
  – Add crowd simulation to provide predictions
• Novelty
  – Crowd simulation is used as additional information in tracking
  – A new framework is proposed to integrate crowd simulation into filtering-based tracking
  – Two different crowd simulators are investigated
CVIU 2014
Recognition of Individuals
Identification: Automatically searching for a facial image (probe) in a database (gallery), resulting in a set of facial images ranked by similarity.
[Figure: gallery, probe, and ranked results 1-3]

Verification: Given a pair of face images (Image 1, Image 2), determine whether or not they belong to the same person (Yes/No).
Face Recognition Research
Goal: Perform face recognition in real-
world scenarios
– Pose variation
– Expression
– Misalignment
– Scale variation
– Varying illumination conditions
– Aging
RFG for Unconstrained Face Recognition
Novelties
- A novel reference face graph (RFG) - a graph-theory-based method
- Graph centrality measures are utilized to identify distinctive faces
- DCT locality-sensitive hashing for fast similarity computation, efficient retrieval, and scalability
- Any feature descriptor can be used; even with LBP features the proposed method achieved very competitive results compared to other recent methods
- Alignment-free and robust to real-world distortions

References: IEEE TIFS Dec. 2014; IEEE TMM June 2014


Gait-based Human Recognition

• Gait - manner of human walking
• Advantages over traditional biometrics (fingerprint, iris, face)
  – Does not require a cooperative subject
  – Does not require physical contact or close proximity
• Challenges
  – Difficulties associated with reliable moving human detection
  – Large intra-person variation of gait appearance in various situations
  – Lack of discrimination analysis for gait-based human recognition
Gait Templates Construction
• Observation
  – View-angle ranges of human walking along different directions may be overlapping
  – GEIs extracted from different sequences of the same person are similar if they have similar view-angle ranges
• Template construction
  – Generate GEIs from the original silhouette sequence at the interval of 1/4 walking cycle (two adjacent GEIs have an overlap of 3/4 cycle)

[Figure: GEI templates for walking directions α = 0, ±π/6, ±π/3]

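The GEI itself is simple to state: a pixel-wise average of size-normalized, aligned binary silhouettes over a walking cycle. A minimal sketch, with tiny hand-made frames in place of real silhouettes:

```python
# Minimal GEI sketch: the gait energy image is the pixel-wise average of
# aligned 0/1 silhouette frames over a walking cycle. The 2x2 "frames"
# below are toy stand-ins for real silhouettes.

def gait_energy_image(silhouettes):
    """Average a list of equally sized 0/1 frames into one float image."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[i][j] for s in silhouettes) / n for j in range(cols)]
            for i in range(rows)]

frames = [[[1, 0],
           [1, 1]],
          [[1, 1],
           [0, 1]]]
gei = gait_energy_image(frames)  # [[1.0, 0.5], [0.5, 1.0]]
```

Pixels near 1 are body regions that stay constant over the cycle; intermediate values capture limb motion.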
USF HumanID Database
• Version 1.7

  Dataset    Size    Recording Conditions
  Gallery    71      Grass (surface), A (shoe), Left (camera view)
  Probe A    71      Grass, A, Right
  Probe B    41      Grass, B, Left
  Probe C    41      Grass, B, Right
  Probe D    70      Concrete, A, Left
  Probe E    44      Concrete, B, Left
  Probe F    70      Concrete, A, Right
  Probe G    44      Concrete, B, Right

• Evaluate the performance of gait recognition approaches
• Evaluate the effect of environmental condition changes

IEEE TPAMI 2006


Rank1 Performance
[Chart: Rank-1 recognition rate (0-100%) on probes A-G for the USF, CMU, UMD, and UCR approaches]
• Legends
– Rank1: only the first candidate in the rank list is considered
– USF: direct frame shape matching approach
– CMU: key frame shape matching approach
– UMD: Hidden Markov Model approach
– UCR: our approach
• Observations
  – Our approach is better than the other approaches
  – The surface change in D-G results in a dramatic performance drop (it introduces more distortion in the silhouette)
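Rank-1 accuracy, as used above, counts a probe as correct only when its single best-scoring gallery match has the right identity. A minimal sketch with a made-up similarity matrix:

```python
# Rank-1 accuracy sketch: a probe counts as correct only when its single
# best-scoring gallery entry carries the matching identity. The
# similarity matrix below is made up for illustration.

def rank1_accuracy(score_matrix, gallery_labels, probe_labels):
    """score_matrix[i][j] = similarity of probe i to gallery entry j."""
    correct = 0
    for i, row in enumerate(score_matrix):
        best = max(range(len(row)), key=lambda j: row[j])
        correct += gallery_labels[best] == probe_labels[i]
    return correct / len(probe_labels)

scores = [[0.9, 0.1],   # probe 0's best match is gallery entry "A"
          [0.2, 0.8]]   # probe 1's best match is gallery entry "B"
acc = rank1_accuracy(scores, ["A", "B"], ["A", "A"])  # 0.5
```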
Human in Thermal Infrared Imagery

[Figure: thermal infrared imagery of humans at noon, late afternoon, and night]

Gait Template Construction

[Figure: normalized binary silhouette sequences and corresponding GEIs for slow walking, fast walking, and running]

Performance Prediction
• Question
– How many people can be recognized by gait?
• Required information
– Feature distribution over population
– Uncertainty in feature estimation (estimation error distribution)
• Technical approach
– Define feature distribution space based on anthropometric data
– Construct feature uncertainty model in ideal situation
– Predict human recognition performance on static gait features
[Figure: feature distribution space showing the real feature distribution and the feature uncertainty]

Fusion of Color and Infrared
• Color Video • Infrared Video (Long-wave)

• Motivation
– Unreliably extracted body parts from one sensor might be reliably
extracted from the other sensor
– Moving object in a scene provides correspondence for automatic image
registration between synchronized color and infrared videos
• Contributions
– A hierarchical evolutionary computation approach for automatic image
registration
– Fusion of color and infrared videos to improve human detection
performance PR 2007
Automatic Image Registration Result

[Figure rows: original color images, registered color images, and original infrared images]
Sensor Fusion Results
[Figure columns: original color images, registered color images, original infrared images, detected color silhouettes, detected infrared silhouettes, sensor fusion results]

3D Gait - Body Data
• Human body measurement system captures 4 representative
poses of walking humans

Pose 1 Pose 2 Pose 3 Pose 4


Body Model
• Kinematic model of hierarchical structure
– 12 body parts approximated by 3D tapered cylinders
– 40 DOFs (6 whole body, 22 joint angles, 12 part lengths)
Model Fitting
• Principal Component Analysis (PCA) determines body
axes and centroid.
• Iterated Closest Point (ICP) algorithm fits tapered
cylinders to body parts.
• Fitted models include joint angles and part lengths.
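The PCA step can be sketched in 2-D: the body axis is the leading eigenvector of the point cloud's covariance matrix, anchored at the centroid. This is an illustrative reduction of the 3-D procedure, not the system's actual fitting code:

```python
import math

# 2-D sketch of the PCA step for body axes: leading eigenvector of the
# covariance matrix of the silhouette point cloud, through its centroid.

def principal_axis_2d(points):
    """Return (centroid, angle of the principal axis in radians)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]]
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), theta

# Points along a vertical line: the axis comes out vertical (pi/2).
centroid, theta = principal_axis_2d([(0, 0), (0, 1), (0, 2), (0, 3)])
```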
Gait Reconstruction
• Four representative poses recover an arbitrary pose by interpolation of joint angles and part lengths
  [Plot: joint angle vs. time over the gait cycle, with Poses 1-4 marked]
• The gait cycle is composed of 20 frames
• There are major differences among human gaits
  [Figure: reconstructed gaits of Subjects A-F]


Human Ear Recognition in 3D

• 155 subjects, 902 shots (no time-lapse)
• 200×200 range and registered color images
• Captured by a Minolta Vivid 300

[Figure: (a) range images, (b) color images] IEEE TPAMI 07


Examples of Ear Detection

• Examples of ear localization: 99.3% detection rate (896 out of 902)


3D Ear Recognition

• Gallery ears
• Probe ear

B. Bhanu and H. Chen, “Human Ear Recognition by Computer,” Springer, 2008.
Super-resolution

Real Videos

35×35 → 80×80

Integrating Face and Gait

• Problem: Human recognition at a distance in outdoor conditions
• Solution: A fusion system that combines face and gait cues from video sequences (IEEE TSMCB 07, PR 08)
Examples of Improved Resolution
Experiments
SUMMARY OF THREE EXPERIMENTS

  Data                                       Exp. 1    Exp. 2    Exp. 3
  Number of subjects                         45        45        45
  Number of subjects with changed clothes    0         10        10
  Number of GEIs for testing per video       2         2         1 or 2
  Number of ESFIs for testing per video      2         2         1 or 2
Video Data
The FERA2011 Challenge
The 2011 IEEE Conference on Automatic Face and
Gesture Recognition (FG2011), Facial Expression
Recognition and Analysis challenge (FERA2011)
• Motivation
  – Lack of a standard evaluation procedure for current systems
• Mission
– Classify five categories of facial expressions from videos

anger fear joy sadness relief


The Competitors
Affiliation Country
UIUC / U. of Missouri USA
UC San Diego USA
MIT / Cambridge USA/UK
CMU / Queensland U. of Technology USA/Australia
Imperial College London UK
Australian National U. Australia
University College London UK
University of Montreal Canada
National University of Singapore Singapore
Vrije Universiteit Brussels Belgium
U. of Oulu Finland
Karlsruhe Institute of Technology Germany
Test Results Comparison
[Bar chart comparing competitors; the legend distinguishes overall, person-specific, and person-independent scores. Overall scores:]

Ours: EAI+LPQ    0.84
UIUC-UMC         0.80
Ours: EAI+LBP    0.77
UCSD-CERT        0.76
ANU              0.73
UCL              0.70
UMont.           0.70
NUS              0.67
QUT              0.60
Baseline         0.56
VUB              0.56
MIT-Cambridge    0.44
U. of Oulu       0.24
Challenge official website: http://sspnet.eu/fera2011/
Recognition of Research
• Best Entry Award at IEEE FG Conference 2011
• NSF IGERT Judges Award, Video and Poster Competition 2012
• AVEC 2011 Video Competition (second place in video category)
• AVEC 2012
• Motor Trend Magazine (2014)
• Evaluation of commercials
• IEEE TSMCB (2012), IEEE TAFF (2014a, 2014b), Neurocomputing 2014
• Fontaine emotional model - valence, arousal, power, expectancy
Efficient Smile Detection by ELM
Facial expression is crucial in understanding human emotions and behaviors.
Among various facial expressions, the smile is important: it conveys joy, happiness, satisfaction, etc.
Zapping Index based on Smile
Correlation of Facial Expression With Race Track

[Figure: race track map, Corkscrew Turn 8-8A - detected emotion: tense]
Pain of Neonates
• It is estimated that 15 million premature babies are born every year worldwide; 1 million of these babies die each year due to complications of preterm birth.
• These neonates are nonverbal and cannot communicate their feelings.
• Although a number of pain instruments have been developed to assist health professionals, these tools are subjective and skewed and may under- or over-estimate the pain response.
• This can further lead to misdiagnosis and under- or over-treatment, which has a lifelong impact on society.
Vehicle Recognition (2017)
– The MIO-TCD dataset consists of 521,451 images with 11 different
classes namely:
• articulated truck (1.98%), background (30.68%), bus
(1.98%), bicycle (0.44%), car (49.96%), motorcycle
(0.38%), non-motorized vehicle (0.34%), pedestrian
(1.2%), pickup truck (9.76%), single unit truck (0.98%)
and work van (1.86%).
– We added extra images for the 8 classes that have less than 5% of
total images.
  External Dataset                               Images added
  ImageNet                                       2,247
  MIO-TCD Localization dataset                   101,234
  Pedestrian Re-identification dataset (PETA)    18,000

– All the images in our dataset were resized to maintain aspect ratio
such that the shorter side has length of 256 pixels.
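The resize rule (scale each image so its shorter side becomes 256 pixels, preserving aspect ratio) is just a ratio computation. A small sketch, where the helper name and the `round()` choice are ours rather than from the paper:

```python
# Sketch of the preprocessing rule stated above: scale an image so its
# shorter side becomes 256 pixels while preserving the aspect ratio.

def resize_shorter_side(width, height, target=256):
    """Return (width, height) after scaling the shorter side to target."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)

landscape = resize_shorter_side(512, 384)   # height is the shorter side
portrait = resize_shorter_side(256, 512)    # already at target: unchanged
```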
Vehicle Recognition

[Figure: grid of sample vehicle classifications with predicted and ground-truth labels (single unit truck, work van, motorcycle, bus, bicycle, articulated truck, pedestrian, non-motorized vehicle, pickup truck, car)]

Green boxes indicate correct classification, red boxes indicate wrong classification, and yellow boxes indicate that the ground truth was wrong and our classification was correct.
Build 3D Models of Vehicles

Sedan Type 1 Sedan Type 2 SUV

Jeep Pickup Van

UCR, VISLab, Bir Bhanu, Nirmalya Ghosh

[Figure: 3-D model-building result for the SUV]

IEEE T-ITS 2010, 2013


Video Bioinformatics

• Video bioinformatics is at the intersection of video imaging methods, computational techniques, and biology.
• Focuses on providing fundamental understanding of continuous and dynamic processes involving cells, organisms, and disease.
• Develops computational techniques for analyzing 5-D (3-D space, time, and wavelength) videos obtained by performing biological experiments with live-cell video imaging techniques.

Detection of mTBI
• Mild traumatic brain injury (mTBI) occurs in 1.3 million cases per year in the US.
• There is a wide range of causes: vehicle accidents, assault, sports, falls, etc.
• Many of the mechanisms and direct outcomes of mTBI remain unknown.
• This is due to the lack of quantitative measures in the analysis of mTBI.
  [Figure captions: TBI caused by a car accident; mTBI is on the rise in the military]
• Our goal is to provide an automated, accurate quantitative measure.
Problem Formulation
• MRI is used to noninvasively measure physical properties of the brain.
• T2-weighted MRI shows both edema and extravascular blood after an mTBI.
  [Figure: position and size of the lesion in the rat brain]
• A rat controlled cortical impact model is used to simulate mTBI.
• The problem is that the lesions are small and very low contrast, which causes current lesion detection approaches to fail.
• Texture measures and context are introduced to make detection possible.
[Figure: T2 histogram of normal brain (blue) and lesion (red), illustrating the low-contrast nature of mTBI. Left: T2W MRI of the rat brain 1 day after injury. Right: manually segmented T2 lesion (white) and brain (green and blue). The lesion cannot be separated based only on T2 value.]
Papers: IEEE TMI 2014, IEEE TBME 2014, MEDIA 2014
Human Embryonic Stem Cells
• Promising in the treatment of many diseases
and for toxicological testing.
• Determining the number of various types of cells automatically in a population of mixed morphologies requires good detection, classification, and tracking techniques to understand the behavior of cells over time.
• Modeling of dynamic blebbing phenomenon.
• Capacity to differentiate into diverse human
cell types (process of differentiation)
Segmentation of Stem Cells
Sample video: general challenges
1. Low SNR (signal-to-noise ratio) of phase-contrast images.
2. Poor NDBSU-hESC recognition when neighboring cells are undergoing chemical reactions. (An NDBSU-hESC is shown in Figure (a).)
Our contribution is to reduce the effects of the above problems on NDBSU-hESC detection. We use multiple classifiers to improve the detection accuracy.
General cell types in the videos:

(a) Non-dynamic blebbing, (b) Dynamic blebbing, (c) Apoptotic dynamic unattached cell, (d) Dead cell blebbing
Sample Results on Videos

[Videos 1a-1d: stem cell colony detection and analysis]

IEEE/ACM TCBB 2014; Best Paper, IEEE Healthcare Informatics, Imaging, and Systems Biology Conference, Sept. 27-28, 2012.
Pollen Tube Growth

Integrated Model for Understanding Pollen Tube Growth in Video
Asongu L. Tambo*, Bir Bhanu*, Nan Luo+, Geoffrey Harlow+ and Zhenbiao Yang+
*Center for Research in Intelligent Systems, Dept. of Elect. & Comp. Engineering,
+Center for Plant Cell Biology, Dept. of Botany and Plant Sciences,
University of California at Riverside, Riverside, CA 92521, USA

IBM Best Student Paper Award, Biomedical Track, August 2014.
Pollen Tube Internal Dynamics

• The pollen tube is a single cell
• Growth is by stretching at the tip/apex
• Pollen tube facts for the plant Arabidopsis thaliana:
  – Average pollen tube diameter = 5-10 μm
  – Average vesicle diameter = 0.182 μm
  – Average cell wall thickness = 0.2 μm
  – Average growth speed of 4-5 μm/min
  – In maize, tube length can reach 12 inches
• [Figure: artist's rendition of a pollen tube and cytosolic organelles in Arabidopsis thaliana; green dots represent vesicles travelling to the apex/tip]

Pollen tube growth is a complex interplay between internal cell dynamics and cell wall weakening/reconstruction.
Image: Y. Qin and Z. Yang, “Rapid tip growth: Insights from pollen tubes.,” Semin. Cell Dev. Biol., pp. 1–9, Jun. 2011.
Experiments
• 5 videos of growing pollen tubes taken with a Leica confocal microscope, 20× objective lens
• Image acquisition rates between 1 s and 2 s per image
• Average pixel resolution of 0.058 μm/pixel
• Image size of 512×512, resized to 256×256 for speed
• Analysis performed on every 5th image to allow for noticeable tube growth
  – Expected average growth distance is 1.4 pixels/sec
  – Using a metric minimum distance of 3 pixels
• Tip accuracy metric:
  – C = predicted tip contour
  – G = observed tip contour (obtained using GraphCut segmentation for automated analysis)
  – d = locations in C that are within 3 pixels of G
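The tip-accuracy metric above (the fraction of predicted contour points C lying within 3 pixels of the observed contour G) can be sketched directly; the contour points here are toy coordinates, and the paper's exact normalization may differ:

```python
import math

# Sketch of the tip-accuracy metric: fraction of predicted contour
# points (C) within `tol` pixels of some point of the observed contour
# (G). Contours here are toy coordinate lists.

def tip_accuracy(predicted, observed, tol=3.0):
    """Fraction of points in `predicted` within `tol` px of `observed`."""
    def near(p):
        return any(math.dist(p, g) <= tol for g in observed)
    return sum(near(p) for p in predicted) / len(predicted)

# One of the two predicted points falls within 3 px of the observed contour.
acc = tip_accuracy([(0, 0), (10, 0)], [(0, 1), (1, 1)])  # 0.5
```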
Tracking Cell Tip

• Pauses indicate the start of a new growth cycle
• Shape accuracy vs. image number for the accompanying experimental video
• Microscope acquisition rate = 1.6 sec/image; image sampling rate = 8.1 sec/image
• 61 predictions, 19 corrections (black circles)
• Blue curve shows method accuracy before shape correction; green curve shows accuracy after shape correction
• Average analysis time = 1.88 sec/image (on an Intel Core i7 processor)
Key Results

• Development of a model that allows for quantitative measurements of growth dynamics in video
• Model parameters can adapt to video evidence, and comparisons with AAM are provided
• The model can lead to mechanistic understanding of pollen tube growth and validation of growth hypotheses
• Understanding pollen tube growth can lead to
  – Control/regulation of plant fertilization, which will affect quality of seeds and food production
Bimodal Fusion for Segmentation

[IEEE Transactions on Image Processing covers: May 2016 (vol. 25, no. 5) and June 2016 (vol. 25, no. 6)]
Cover art from: “Segmentation of Pollen Tube Growth Videos Using Dynamic Bi-Modal Fusion and Seam Carving” by Asongu L. Tambo and Bir Bhanu (pp. 1993-2004, Fig. 1).

Biological Significance of This Research

• The proposed model has the ability to make accurate shape measurements and ion concentration measurements when said ions are photo-tagged with dyes. These measurements can lead to:
  – Validation of hypotheses on key growth contributors (e.g., wall mechanics)
  – Understanding of the turning mechanism in response to external stimuli (turning towards/away)
  – Discovery of new intracellular dynamics