
7. Video databases

Video data representations


 Video = time-ordered sequence of correlated images (frames)
 Video signal representations originate from TV technology;
different standards in the USA (NTSC, National Television System
Committee) and in Europe (PAL, Phase Alternating Line; SECAM,
Sequential Color with Memory)
 25-30 frames/sec
 Interlaced presentation of even/odd rows to avoid flickering.
 Frame size levels: 352 x 240, 768 x 576 (PAL), 720 x 576 (CCIR
601), 720 x 480 (NTSC), 1440 x 1152, 1920 x 1080 (HDTV)
 Aspect ratios: 4:3, 16:9 (widescreen)
 Color videos: Decomposition into luminance and chrominance.
 Typical sampling rates for SD video:
720 samples per line for luminance,
360 samples per line for each chrominance signal.
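As a rough worked example (a sketch using the sampling figures above; 8 bits per sample is assumed), the raw data rate of uncompressed SD video already shows why compression is essential:

```python
# Raw data rate of uncompressed SD video with CCIR 601 style sampling:
# 720 luminance + 2 x 360 chrominance samples per line, 576 lines,
# 25 frames/s, 8 bits per sample (assumed).
samples_per_line = 720 + 2 * 360        # Y + Cb + Cr
lines_per_frame = 576
frames_per_second = 25
bits_per_sample = 8

bits_per_second = (samples_per_line * lines_per_frame
                   * frames_per_second * bits_per_sample)
print(f"{bits_per_second / 1e6:.0f} Mbit/s")   # ~166 Mbit/s for raw SD video
```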



Video compression
 Not just independent coding of a sequence of still images (as in
Motion-JPEG), because successive frames are correlated (temporal redundancy).
 Motion compensation: blocks (e.g. 8 x 8 pixels) in a frame are
predicted by blocks in a previously reconstructed frame (a block-matching
sketch follows after this list).
 Compression artifacts disturbing the human eye may be different from
those in still images.
 Different techniques for different application areas (TV, DVD/BD,
Internet, videoconferencing)
 Important issues:
 Speed of compression/decompression
 Robustness (error sensitivity)
 Most of the standards are based on DCT (Discrete Cosine Transform)
 Typical compression ratios from 50:1 to 100:1;
the decompressed video is almost indistinguishable from the original.
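To make the motion-compensation idea above concrete, here is a minimal, illustrative sketch of exhaustive block matching with the sum of absolute differences (SAD) criterion; it is not any particular standard's algorithm, and the block size and search range are assumed parameters.

```python
import numpy as np

def best_match(prev, cur, by, bx, block=8, search=7):
    """Exhaustive block-matching motion estimation (illustrative sketch).

    Finds the (dy, dx) displacement in the previously reconstructed
    frame `prev` that best predicts the block x block region of the
    current frame `cur` at (by, bx), using SAD as the matching criterion.
    Both frames are 2-D (luminance) arrays.
    """
    target = cur[by:by + block, bx:bx + block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue                       # candidate block outside the frame
            cand = prev[y:y + block, x:x + block].astype(np.int32)
            sad = np.abs(target - cand).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad                      # motion vector and residual cost
```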



Standardization of video compression
ISO/IEC MPEG (Moving Picture Experts Group)
 Standard includes both video and audio compression.
 Started 1988; steps:
 MPEG-1: Rates up to 1.5 Mbit/s (VHS quality)
 MPEG-2: Rates up to 10 Mbit/s (digital TV, DVD, HDTV)
 MPEG-3: Planned but dropped (found to be unnecessary)
 MPEG-4: Object-based (separation of objects from the scene, animation,
3D, face modelling, interactivity, etc.)

ITU-T (International Telecommunication Union):
 H.261: Low bit-rates (e.g. videoconferencing)
 H.262 = MPEG-2
 H.263: Low bit-rates (improved)
 H.264 = MPEG-4 Part 10 (AVC), high compression efficiency



Random access from compressed video
 Broadcasting or accessing video from storage:
It should be possible to start from (almost) any frame.
 MPEG solution: Three kinds of frames:
 I-frame: Coded without temporal correlation (prediction);
 gives lowest compression gain.

 P-frame: Motion-compensated prediction from the last
(closest) I- or P-frame.
 B-frame: Bidirectional prediction from the previous and/or
the next I- or P-frame;
 highest compression gain
 handles sudden changes (prediction from the future reference is possible)
 errors do not propagate (B-frames are not used as reference frames).

 GOP = Group Of Pictures = the smallest random-access unit; must
be decodable independently (usually starts with an I-frame).
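As a small illustration of GOP-based random access (a sketch assuming closed GOPs, i.e. every GOP starts with an I-frame and its B-frames do not reference the previous GOP), decoding must begin from the I-frame that opens the GOP containing the requested frame; the helper name below is hypothetical.

```python
def random_access_start(frame_types, target):
    """Index of the I-frame from which decoding must start in order to
    display frame `target` (closed-GOP assumption)."""
    for i in range(target, -1, -1):
        if frame_types[i] == "I":
            return i
    raise ValueError("no I-frame precedes the requested frame")

frame_types = list("IBBBPBBBPBBBI")          # display order, as on the next slide
print(random_access_start(frame_types, 6))   # -> 0: decode forward from the GOP's I-frame
```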



Example of frame order in MPEG

Display order: I B B B P B B B P B B B I
(forward prediction: each P-frame from the previous I-/P-frame;
bidirectional prediction: each B-frame from the surrounding I-/P-frames)

 Two orders of frames:
 Display order
 Bitstream order
 Buffering is needed to convert from bitstream order into display
order; a small delay is involved (a reordering sketch follows below).
 The predictor and predicted frame need not be adjacent.
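A minimal sketch of the reordering for the pattern shown above (a simple one-reference-ahead scheme; real encoders may interleave references differently): each I-/P-frame is transmitted before the B-frames that precede it in display order, since those B-frames need it as their future reference.

```python
def display_to_bitstream(frame_types):
    """Reorder frame indices from display order to bitstream (decoding) order."""
    bitstream = []
    pending_b = []                        # B-frames waiting for their future reference
    for i, t in enumerate(frame_types):
        if t in ("I", "P"):
            bitstream.append(i)           # future reference first ...
            bitstream.extend(pending_b)   # ... then the B-frames that use it
            pending_b = []
        else:                             # "B"
            pending_b.append(i)
    bitstream.extend(pending_b)           # trailing B-frames, if any
    return bitstream

types = list("IBBBPBBBPBBBI")             # display order from the slide
print(display_to_bitstream(types))
# -> [0, 4, 1, 2, 3, 8, 5, 6, 7, 12, 9, 10, 11]
```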



Organizing and querying content of a video database
Questions to be answered:
 Which aspects of videos are likely to be of interest?
 How should these aspects be represented and stored?
 What kind of query languages are suitable?
 Is the content extraction process manual or automatic?

Possible aspects of interest:
 Animate objects (people, etc.)
 Inanimate objects (houses, cars, etc.)
 Activities and events (walking, driving, etc.)

Properties of objects:
 Frame-dependent: valid in a subset of frames.
 Frame-independent: valid for the video as a whole.
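A small, purely illustrative sketch of how frame-dependent and frame-independent properties might be stored; the class and field names are assumptions, not part of any standard model.

```python
from dataclasses import dataclass, field

@dataclass
class VideoObject:
    name: str                                           # e.g. "car-3" (hypothetical)
    # Frame-independent properties: valid for the video as a whole.
    global_props: dict = field(default_factory=dict)    # e.g. {"type": "car"}
    # Frame-dependent properties: valid only in a half-open frame interval,
    # keyed by (start_frame, end_frame), e.g. {(1000, 2000): {"color": "red"}}.
    frame_props: dict = field(default_factory=dict)
```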



Query types from a video database

(a) Retrieve a complete video by name.

(b) Find frame sequences (‘clips’, ‘shots’) containing certain objects or
activities.

(c) Find all videos/sequences containing objects/activities with certain
properties.

(d) Given a frame sequence, find all objects (of a certain type) occurring in
some or all of the frames of the segment.

(e) Given a frame sequence, find all activities (of a certain type) occurring in
it.

NOTE: Video is a multimedia tool: images + audio + possible text.
The audio channel can be extremely important in detecting events.
Textual components (e.g. subtitles) are invaluable keyword sources.
Indexing of video content

 Content descriptions are not usually built on a frame-by-frame
basis, due to the high number of frames.
 Compact representations are needed.
 Concepts:
 Frame sequence:
A contiguous subset of frames (e.g. a ‘shot’)
 Well-ordered set of frame sequences:
Temporal order, no overlaps
 Solid set of frame sequences:
Well-ordered, non-empty gaps between sequences (‘scene’)
 Frame sequence association map:
For each object and activity, a solid set of frame sequences
is attached, showing the frames in which it appears.
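A minimal sketch of a frame sequence association map as a plain dictionary; the key names are illustrative, and the intervals are the ones that can be read off the frame segment tree example on the following slides.

```python
# object/activity -> solid set of half-open frame sequences [start, end)
association_map = {
    "obj1": [(500, 2000), (3000, 4000), (4500, 5000)],
    "obj2": [(0, 2500), (3500, 4500)],
    "act1": [(500, 2500), (3000, 3500), (4000, 5000)],
}

def occurs_in(entity, frame, amap=association_map):
    """True if the object/activity appears in the given frame."""
    return any(start <= frame < end for start, end in amap[entity])

print(occurs_in("obj1", 1500))   # True: 500 <= 1500 < 2000
```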



Frame segment tree
 Binary tree
 Special (1-dimensional) case of the spatial clipping approach.
 Leaves represent basic intervals of the frame sequence:
 Leaves are well ordered, and they cover the whole video.
 Their endpoints include all endpoints of the sequences.
 An internal node represents the concatenation of its children
 The root represents the whole video.
 Example of objects and activities:

[Figure: occurrence intervals of obj. 1, obj. 2 and act. 1 on a frame-number
axis from 0 to 5000; see the frame segment tree example on the next slide.]


Frame segment tree: example

The frame segment tree of the example, written out as an indented list
(node number: half-open frame interval, attached objects/activities):

node 1: [0, 5000)
  node 2: [0, 3000)
    node 4: [0, 2000)        obj. 2
      node 8: [0, 500)
      node 9: [500, 2000)    obj. 1, act. 1
    node 5: [2000, 3000)
      node 10: [2000, 2500)  obj. 2, act. 1
      node 11: [2500, 3000)
  node 3: [3000, 5000)
    node 6: [3000, 4000)     obj. 1
      node 12: [3000, 3500)  act. 1
      node 13: [3500, 4000)  obj. 2
    node 7: [4000, 5000)     act. 1
      node 14: [4000, 4500)  obj. 2
      node 15: [4500, 5000)  obj. 1
Indexing:
 Obj. 1 → nodes 6, 9, 15
 Obj. 2 → nodes 4, 10, 13, 14
 Act. 1 → nodes 7, 9, 10, 12
Note: the intervals are actually half-open, e.g. [0, 500) = frames 0..499.



Indexing in the frame segment tree
 For each object and activity record, there is a list of pointers to
the nodes of the frame segment tree (see the sketch after this list).
 Objects and activities themselves may be indexed in traditional
ways.
 Each node of the frame segment tree points to a linked list of
pointers to the objects and activities that appear throughout the
whole segment that this node represents (but only partially in the
parent segment). In the previous example:
node 4 → obj. 2            node 6 → obj. 1
node 7 → act. 1            node 9 → obj. 1, act. 1
node 10 → obj. 2, act. 1   node 12 → act. 1
node 13 → obj. 2           node 14 → obj. 2
node 15 → obj. 1
 This can be generalized to a set of videos (common frame
segment tree, combined object/activity set, extended pointers).
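The two-way linkage can be sketched as follows; the class and variable names are illustrative, not from the course material. Each tree node keeps the objects/activities that cover its whole segment, and a separate dictionary plays the role of the per-object pointer lists.

```python
from dataclasses import dataclass, field

@dataclass
class SegmentNode:
    start: int                                  # half-open frame interval [start, end)
    end: int
    children: list = field(default_factory=list)
    labels: list = field(default_factory=list)  # objects/activities covering this
                                                # whole segment (but not the parent's)

# Per-object/activity pointer lists into the tree,
# e.g. "obj1" -> [node6, node9, node15] in the previous example.
index = {}

def attach(label, node):
    """Record an occurrence: `label` covers the whole segment of `node`."""
    node.labels.append(label)
    index.setdefault(label, []).append(node)
```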



Queries using a frame segment tree

(a) Find segments where a given object/activity occurs
(trivial: just follow the pointers).

(b) Find objects occurring between frames s and e (a sketch follows after this list):
Walk the tree in preorder; denote the current node's interval by I.
 If I ∩ [s, e) = ∅, then this subtree can be skipped.
 If I ⊆ [s, e), then walk through the whole subtree (including
the current node) and report all its objects.
 Otherwise report the objects and activities of the current node,
and continue the search into both subtrees.

(c) Find objects/activities occurring together with object x:
Scan the segments where x occurs, and report the
objects/activities occurring in these segments and their ancestors.
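Query (b) can be sketched as the preorder walk described above; the function assumes the SegmentNode structure from the previous sketch (attributes start, end, children, labels) and is only an illustration.

```python
def objects_between(node, s, e, out=None):
    """Collect objects/activities occurring between frames s and e
    (half-open [s, e)) by a preorder walk of the frame segment tree."""
    if out is None:
        out = set()
    # Case 1: node interval disjoint from [s, e) -> skip the whole subtree.
    if node.end <= s or e <= node.start:
        return out
    # Case 2: node interval contained in [s, e) -> report the whole subtree.
    if s <= node.start and node.end <= e:
        stack = [node]
        while stack:
            n = stack.pop()
            out.update(n.labels)
            stack.extend(n.children)
        return out
    # Case 3: partial overlap -> report this node, recurse into both subtrees.
    out.update(node.labels)
    for child in node.children:
        objects_between(child, s, e, out)
    return out
```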



R-segment tree (RS-tree)

 Special case of the R-tree
 Two possible implementations:
(a) 1-dimensional space (dimension = time)
(b) 2-dimensional space, where the other dimension is just an
enumeration of objects/activities (not a true spatial
dimension):

[Figure: the occurrence intervals of obj. 1, obj. 2 and act. 1 plotted against
frame numbers (0 to 5000), grouped into bounding rectangles R1, R2 and R3.]



Computer-assisted video analysis
Video segmentation:
 Division of videos into homogeneous sequences.
 Typical segments are often so-called shots, i.e. sequences filmed without interruption
 Segmentation = detection of shot boundaries
 Sharp cuts are easier to detect than gradual transitions (e.g. crossfades)
 Features for automatic segmentation:
 Similarity of color histograms of successive frames:
simple and effective, but sensitive to varying illumination.
 Edge features: similarity of shapes
 Motion vectors: restricted vector lengths within a shot.
 Corner points: similarity of landmark points in frames
 The actual segmentation can be based on thresholds for similarity
(a sketch follows below), but machine-learning techniques have also
been used widely.
 Higher-level segmentation into scenes, also called story units.
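Below is a rough sketch of threshold-based cut detection from colour-histogram similarity; the bin count and threshold value are assumptions chosen for illustration, not values from the course material.

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.4):
    """Minimal sketch of cut detection from colour-histogram differences.

    `frames` is an iterable of HxWx3 uint8 RGB arrays; a boundary is reported
    between two frames whose normalised histograms differ by more than
    `threshold` (an assumed value in [0, 1]).
    """
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogramdd(
            frame.reshape(-1, 3), bins=(bins, bins, bins),
            range=((0, 256), (0, 256), (0, 256)))
        hist = hist / hist.sum()                         # normalise to a distribution
        if prev_hist is not None:
            diff = 0.5 * np.abs(hist - prev_hist).sum()  # half L1 distance, in [0, 1]
            if diff > threshold:
                boundaries.append(i)                     # cut between frames i-1 and i
        prev_hist = hist
    return boundaries
```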



Computer-assisted video analysis (cont.)

Keyframes:
 Representative frames within shots, containing the essential
elements for retrieval
 Scene-level segmentation often uses keyframe features, and
operates e.g. in a top-down or bottom-up manner.

Choosing keyframes:
 Fuzzy task – no definite optimum
 Can be based on the same features as segmentation
 Various algorithmic approaches (a sequential-comparison sketch follows below):
 Sequential comparison
 Clustering
 Trajectory-based
 Decision in the context of object/event detection
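As one example, the sequential-comparison approach can be sketched as follows; the feature representation and the threshold are assumptions, and any per-frame feature vector (e.g. a colour histogram) would do.

```python
import numpy as np

def keyframes_sequential(features, threshold=0.3):
    """Sequential-comparison keyframe selection (rough sketch).

    `features` holds one feature vector per frame of a shot. The first frame
    becomes a keyframe; a new keyframe is declared whenever the current frame
    differs from the last keyframe by more than `threshold` (assumed value).
    """
    if not features:
        return []
    keyframes = [0]
    ref = np.asarray(features[0], dtype=float)
    for i, f in enumerate(features[1:], start=1):
        f = np.asarray(f, dtype=float)
        if 0.5 * np.abs(f - ref).sum() > threshold:   # same half-L1 measure as above
            keyframes.append(i)
            ref = f
    return keyframes
```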



Computer-assisted video analysis (cont.)
Object recognition:
 Keyframe-based recognition extracts the same features as for still
images (color, texture, shape), but also object and motion features.
 Motion compensation techniques can be used to determine the frame
interval in which the object occurs.

Annotations:
 Allocation of semantic concepts to video segments
 Means roughly the same as segment classification
 Machine-learning tools have been attempted
 Human assistance is usually needed in the final recognition, naming
and classification of segments and detected objects within them.
Ref: W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank: "A Survey on Visual Content-Based
Video Indexing and Retrieval", IEEE Trans. on Systems, Man, and Cybernetics, Part C,
41(6), Nov. 2011.

