PII: S0031-3203(96)00107-0
Abstract-To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion. Moving objects are detected in the video sequence using motion segmentation methods. By tracking individual objects through the segmented data, a symbolic representation of the video is generated in the form of a directed graph describing the objects and their movement. This graph is then annotated using a rule-based classification scheme to identify events of interest, e.g., appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of objects. One may then use an index into the motion graph instead of the raw data to analyse the semantic content of the video. Application of this technique to surveillance video analysis is discussed. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.
Video indexing    Object tracking    Motion analysis    Content-based retrieval
1. INTRODUCTION
Advances in multimedia technology, including commercial prospects for video-on-demand and digital library
systems, have generated recent interest in content-based
video analysis. Video data offers users of multimedia
systems a wealth of information; however, it is not as
readily manipulated as other data such as text. Raw video
data has no immediate "handles" by which the multimedia system user may analyse its contents. By annotating video data with symbolic information describing the
semantic content, one may facilitate analysis beyond
simple serial playback.
To assist human analysis of video data, a technique has
been developed to perform automatic, content-based
video indexing from object motion. Moving objects
are detected in the video sequence using motion segmentation methods. By tracking individual objects
through the segmented data, a symbolic representation
of the video is generated in the form of a directed graph
describing the objects and their movement. This graph is
then annotated using a rule-based classification scheme
to identify events of interest, e.g., appearance/disappearance, deposit/removal, entrance/exit, and motion/rest of
objects. One may then use an index into the motion graph
instead of the raw data to analyse the semantic content of
the video.
We have developed a system that demonstrates this
indexing technique in assisted analysis of surveillance
video data. The Automatic Video Indexing (AVI) system
allows the user to select a video sequence of interest, play
it forward or backward and stop at individual frames.
Furthermore, the user may specify queries on video
sequences and "jump" to events of interest to avoid
tedious serial playback. For example, the user may select
Fig. 1. Relation between video data, motion segmentation information, and the symbolic motion graph.
For each frame $F_n = (I_n, t_n)$, with image $I_n$ and timestamp $t_n$, the segmentation is computed as
$$C_n = \mathrm{ccomps}(T_h \bullet k),$$
where $T_h$ is the binary image resulting from thresholding the absolute difference of images $I_n$ and $I_0$ at $h$, $T_h \bullet k$ the morphological close operation on $T_h$ with structuring element $k$, and the function ccomps() performs connected components analysis, resulting in a unique label for each connected region in image $T_h \bullet k$. The image $T_h$ is defined as
$$T_h(i, j) = \begin{cases} 1 & \text{if } |I_n(i, j) - I_0(i, j)| \ge h,\\ 0 & \text{otherwise.} \end{cases}$$
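To make this step concrete, the following is a minimal sketch of the segmentation computation described above, assuming greyscale images, OpenCV, and NumPy; the threshold h, the size of the structuring element k, and the function name are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def segment_motion(ref_image, image, h=30, k_size=5):
    """Threshold the absolute difference |I_n - I_0| at h, close the result
    with a k_size x k_size structuring element, and label connected regions.
    Both inputs are assumed to be greyscale uint8 images of equal size."""
    diff = cv2.absdiff(image, ref_image)                       # |I_n - I_0|
    _, t_h = cv2.threshold(diff, h, 255, cv2.THRESH_BINARY)    # binary image T_h
    k = cv2.getStructuringElement(cv2.MORPH_RECT, (k_size, k_size))
    closed = cv2.morphologyEx(t_h, cv2.MORPH_CLOSE, k)         # T_h closed with k
    num_labels, labels = cv2.connectedComponents(closed)       # C_n = ccomps(...)
    return labels, num_labels - 1                              # label 0 is background
```

The returned label image plays the role of $C_n$; each non-zero label corresponds to one detected moving region.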
Fig. 2. Motion segmentation example. (a) Reference image $I_0$. (b) Image $I_n$. (c) Absolute difference $|I_n - I_0|$. (d) Thresholded image $T_h$. (e) Result of morphological close operation. (f) Result of connected components analysis.
1. For each V-object $V^r_n$ in $V_n$, compute $\hat{\mu}^r_n$, the predicted location of $V^r_n$ in frame $F_{n+1}$, as $\hat{\mu}^r_n = \mu^r_n + v^r_n \, (t_{n+1} - t_n)$.
2. Find $W^r_n$, the nearest neighbor of $V^r_n$ among the V-objects in $V_{n+1}$, i.e. the V-object $V^p_{n+1}$ for which $\|\hat{\mu}^r_n - \mu^p_{n+1}\| \le \|\hat{\mu}^r_n - \mu^q_{n+1}\|$ for all $q \ne p$.
3. For every pair $(V^r_n, W^r_n = V^p_{n+1})$ for which no other V-objects in $V_n$ have $V^p_{n+1}$ as a nearest neighbor, estimate $v^p_{n+1}$, the (forward) velocity of $V^p_{n+1}$, as
$$v^p_{n+1} = \frac{\mu^p_{n+1} - \mu^r_n}{t_{n+1} - t_n}. \qquad (1)$$
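The following sketch illustrates this linking step as reconstructed above; centroids, velocities, and timestamps are assumed to be NumPy arrays, and the function and variable names are hypothetical rather than taken from the AVI implementation.

```python
import numpy as np

def link_frames(centroids_n, velocities_n, t_n, centroids_n1, t_n1):
    """Match V-objects in frame n to frame n+1 by nearest neighbor of the
    predicted centroid, keep only unambiguous (one-to-one) matches, and
    estimate forward velocities for the matched objects (equation (1))."""
    dt = t_n1 - t_n
    predicted = centroids_n + velocities_n * dt             # mu_hat = mu + v * dt
    # distance from each predicted centroid to every centroid in frame n+1
    dists = np.linalg.norm(predicted[:, None, :] - centroids_n1[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    links, velocities_n1 = [], {}
    for r, p in enumerate(nearest):
        if np.sum(nearest == p) == 1:                        # no other object claims p
            links.append((r, p))
            velocities_n1[p] = (centroids_n1[p] - centroids_n[r]) / dt   # equation (1)
    return links, velocities_n1
```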
Fig. 3. Exposed background detection. (a) Reference image $I_0$. (b) Image $I_n$. (c) Region to be tested. (d) Edge image of (a), found using the Sobel operator. (e) Edge image of (b). (f) Edge image of (c), showing boundary pixels. (g) Pixels coincident in (d) and (f). (h) Pixels coincident in (e) and (f). The greater number of coincident pixels in (g) versus (h) supports the hypothesis that the region in question is due to exposed background.
Fig. 4. Reference image modified to account for the exposed background region detected in Fig. 3.
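A minimal sketch of the edge-coincidence test described in Fig. 3 is given below, assuming OpenCV and greyscale inputs; the Sobel edge threshold and the simple majority comparison are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def is_exposed_background(ref_image, image, region_mask, edge_thresh=100):
    """Compare the boundary of a changed region against Sobel edges of the
    reference and current images; more coincident edge pixels in the
    reference suggests the region is exposed background, not a new object.
    region_mask is a uint8 mask (non-zero inside the region under test)."""
    def sobel_edges(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
        return cv2.magnitude(gx, gy) > edge_thresh
    # boundary pixels of the region under test (morphological gradient of the mask)
    boundary = cv2.morphologyEx(region_mask, cv2.MORPH_GRADIENT,
                                np.ones((3, 3), np.uint8)) > 0
    ref_hits = np.logical_and(sobel_edges(ref_image), boundary).sum()
    cur_hits = np.logical_and(sobel_edges(image), boundary).sum()
    return ref_hits > cur_hits
```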
Fig. 5. The output of the object tracking stage for a hypothetical sequence of 1-D frames. The vertical lines labeled "Fn" represent frame number n. Primary links are shown as solid arcs; secondary links are shown as dashed arcs.
stationary. If equation (2) is true, then the stem is classified as stationary; if equation (3) is true, then the stem is
classified as moving. Figure 7 highlights stationary stems
B, C, F, and H; the remainder are moving.
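Equations (2) and (3) are not reproduced in this excerpt, so the sketch below substitutes a simple, assumed displacement-based criterion for illustration: a stem is treated as stationary when the net displacement of its centroid is small relative to the typical object size.

```python
import numpy as np

def classify_stem(centroids, sizes, ratio=0.5):
    """Classify a stem (a sequence of V-object centroids) as 'stationary' or
    'moving'. Assumed criterion: the net centroid displacement over the stem
    is small compared with the mean object diameter (given in `sizes`)."""
    centroids = np.asarray(centroids, dtype=float)
    displacement = np.linalg.norm(centroids[-1] - centroids[0])
    return "stationary" if displacement <= ratio * np.mean(sizes) else "moving"
```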
A branch $B = \{V_i : i = 1, 2, \ldots, N_B\}$ is a maximal-size dipath of two or more V-objects containing no secondary links, for which $\mathrm{outdegree}(V_i) = 1$ for $1 \le i < N_B$ and $\mathrm{indegree}(V_i) = 1$ for $1 < i \le N_B$. Figure 8 labels V-objects belonging to branches with the letters "L" through "T". A branch represents a highly reliable trajectory estimate of an object through a series of frames.
If a branch consists entirely of a single stationary stem,
then it is classified as stationary; otherwise, it is classified as moving. Branches "N" and "Q" in Fig. 8 (highlighted) are stationary; the remainder are moving.
A trail $L$ is a maximal-size dipath of two or more V-objects that contains no secondary links. This grouping represents the object tracking stage's best estimate of an object trajectory using the mutual nearest neighbor criterion. Figure 9 labels V-objects belonging to trails with the letters "U" through "Z".
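To make these graph groupings concrete, the sketch below extracts branches from a set of primary links represented as directed edges between V-object identifiers; the representation and function name are assumptions, not the AVI data structures.

```python
from collections import defaultdict

def find_branches(primary_links):
    """Given primary links as directed edges (u, v) between V-objects, return
    all branches: maximal chains of two or more V-objects in which every link
    leaves a node of outdegree 1 and enters a node of indegree 1. The link
    graph is assumed acyclic (links always point forward in time)."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in primary_links:
        succ[u].append(v)
        pred[v].append(u)

    def chain_link(u):
        # the unique branch continuation from u, if any
        if len(succ[u]) == 1 and len(pred[succ[u][0]]) == 1:
            return succ[u][0]
        return None

    branches = []
    for u in set(succ) | set(pred):
        # u starts a branch if it has a chain link forward but none backward
        starts = chain_link(u) is not None and not (
            len(pred[u]) == 1 and chain_link(pred[u][0]) == u)
        if starts:
            branch, node = [u], u
            while chain_link(node) is not None:
                node = chain_link(node)
                branch.append(node)
            branches.append(branch)
    return branches
```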
A trail and the V-objects it contains are classified as
stationary if all the branches it contains are stationary,
or
Fig. 9. Trails.
$H = \{V^f_i, G_i, V^1_{i+1}\}$ (where $V^f_i$ is the last V-object in $L_i$, and $V^1_{i+1}$ is the first V-object in $L_{i+1}$), such that every $V_j \in H$ meets the requirement
(4)
where $\mu^f_i$ is the centroid of $V^f_i$, $v^f_i$ the forward velocity of $V^f_i$, $(t_j - t_i)$ the time difference between the frames containing $V_j$ and $V^f_i$, and $\mu_j$ is the centroid of $V_j$. Thus, equation (4) specifies that the object must maintain a constant velocity through path $H$.
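Since the body of equation (4) is not reproduced in this excerpt, the sketch below assumes a tolerance-based reading of the constant-velocity requirement: every V-object on the connecting path must lie close to the position extrapolated from the last V-object of the preceding trail. The tolerance and the function name are assumptions.

```python
import numpy as np

def maintains_constant_velocity(mu_last, v_last, t_last, path, tolerance):
    """Check an assumed form of the constant-velocity requirement: every
    V-object on the connecting path, given as (centroid, timestamp) pairs,
    must lie within `tolerance` of the position extrapolated from the last
    V-object of the preceding trail (centroid mu_last, velocity v_last,
    timestamp t_last)."""
    mu_last = np.asarray(mu_last, dtype=float)
    v_last = np.asarray(v_last, dtype=float)
    for mu_j, t_j in path:
        predicted = mu_last + v_last * (t_j - t_last)
        if np.linalg.norm(predicted - np.asarray(mu_j, dtype=float)) > tolerance:
            return False
    return True
```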
A track represents the trajectory estimate of an object
that may cause or undergo occlusion one or more times in
a sequence. The motion analysis stage uses equation (4)
to attempt to follow an object through frames where an
Fig. 10. Tracks. The dipath connecting trails X and Y from Fig. 9 is highlighted.
A system has been developed that performs content-based video indexing for assisted analysis of surveillance video data (see Fig. 13). The AVI system processes video sequences using the indexing technique described in Section 3, then stores the output (the video data, motion segmentation information, and indexed motion graph) in a database. A graphical user interface (GUI) allows the user to retrieve a video sequence from the database, play
Table 1. Conditions for annotating V-objects with each of the object-motion events. The applicable condition depends on the V-object motion state (unknown, moving, or stationary).

Appearance:     1. Head of track; 2. indegree(V) > 0    or    1. Head of track; 2. indegree(V) = 0
Disappearance:  1. Tail of track; 2. outdegree(V) > 0   or    1. Tail of track; 2. outdegree(V) = 0
Entrance:       1. Head of track; 2. indegree(V) = 0
Exit:           1. Tail of track; 2. outdegree(V) = 0
Deposit:        1. Head of track; 2. indegree(V) = 1    (Depositor)
Removal:        1. Tail of track; 2. outdegree(V) = 1   (Remover)
Motion
Rest
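As an illustration of how rules of this form can be applied during annotation, the sketch below labels a V-object at the head or tail of a track from its indegree, outdegree, and motion state; the pairing of particular events with the moving and stationary states is an assumption made for illustration, not a transcription of the full table.

```python
def annotate_v_object(is_head, is_tail, indegree, outdegree, state):
    """Assign object-motion event labels to a V-object at the head or tail of
    a track, in the style of the Table 1 rules. The association of events
    with 'moving' and 'stationary' states is assumed for illustration."""
    events = []
    if is_head:
        if indegree == 1 and state == "stationary":
            events.append("deposit")       # placed into the scene by another object
        elif indegree == 0 and state == "moving":
            events.append("entrance")      # trajectory begins with no prior link
        elif indegree == 0:
            events.append("appearance")
    if is_tail:
        if outdegree == 1 and state == "stationary":
            events.append("removal")       # taken from the scene by another object
        elif outdegree == 0 and state == "moving":
            events.append("exit")
        elif outdegree == 0:
            events.append("disappearance")
    return events
```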
The user may select one of these clips, play it forward and
back, and pose a new query using it. The clip(s) resulting
from the new query are then pushed onto the top of the
clipboard stack. The user may also peruse the clipboard
Y = (C, T, V, R, E)

Fig. 16. Graphical depiction of the query Y (E = Exit).
Fig. 21. Frames from an example video sequence. Frame numbers are shown below each image.
Fig. 22. Clips from the video sequence of Fig. 21 satisfying the query "find all deposit events". Boxes highlight the objects contributing to the event.
Fig. 23. Advanced video analysis example. Clips show: (a) the briefcase being deposited, (b) the entrance of
the person who deposits the briefcase, (c) the briefcase being removed, (d) the exit of the person who
removes the briefcase.
5. EXPERIMENTAL RESULTS
For each test sequence, the number of actual events in each class (appearance, disappearance, entrance, exit, deposit, removal, motion, and rest) was compared with the number detected, and Type I and Type II errors were tabulated.
Automatic indexing techniques enable intelligent analysis of video data by creating symbolic "handles" by
Fig. 26. Appearance and exit of an individual pedestrian from Test Sequence 3. Frame F217 shows the pedestrian emerging from a car; frame F248 shows the pedestrian walking out of the field of view.
REFERENCES

1. HongJiang Zhang, Atreyi Kankanhalli and Stephen W. Smoliar, Automatic partitioning of full-motion video, Multimedia Systems 1(1), 10-28 (1993).
2. Akihito Akutsu, Yoshinobu Tonomura, Hideo Hashimoto and Yuji Ohba, Video indexing using motion vectors, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1522-1530, Boston, Massachusetts (November 1992).
3. Mikihiro Ioka and Masato Kurokawa, A method for retrieving sequences of images on the basis of motion analysis, in Image Storage and Retrieval Systems, Proc. SPIE 1662, pp. 35-46 (1992).
4. Suh-Yin Lee and Huan-Ming Kao, Video indexing - an approach based on moving object and track, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, Wayne Niblack, ed., pp. 25-36, San Jose, California (February 1993).
5. Glorianna Davenport, Thomas Aguierre Smith and Natalio Pincever, Cinematic primitives for multimedia, IEEE Comput. Graphics Appl., 67-74 (July 1991).
6. Masahiro Shibata, A temporal segmentation method for video sequences, in Visual Communications and Image Processing, Proc. SPIE 1818, Petros Maragos, ed., pp. 1194-1205, Boston, Massachusetts (November 1992).
7. Deborah Swanberg, Chiao-Fe Shu and Ramesh Jain, Knowledge guided parsing in video databases, in Storage and Retrieval for Image and Video Databases, Proc. SPIE 1908, San Jose, California (February 1993).
About the Author-JONATHAN D. COURTNEY received the M.S. degree in Computer Science and the B.S.
degree in Computer Engineering and Computer Science from Michigan State University. Mr Courtney is a
Member of the Technical Staff in the Multimedia Systems Branch of Corporate Research and Development at
Texas Instruments. His Master's thesis research, under the direction of Professor Anil K. Jain, concerned mobile
robot localization using multisensor maps. His current research interests include multimedia information
systems and virtual environments for cooperative work. Mr Courtney is a member of the IEEE.