
Multimedia Decomposition.

Representation of different levels of granularity and abstraction of video, speech and text for fast navigation.

A. Barletta, B. Moser and M. Mayer


Advanced Software Lab
Sony Corporate Lab Europe, Stuttgart, Germany
{barletta,moser,mayer}@sony.de

ABSTRACT
Keywords: Video Browsing, Multimedia Abstraction, Fast Data Browsing.

1. INTRODUCTION
In recent years the use of digital information (video, audio, hypertext) has increased dramatically. In the ASL (Advanced Software Lab) we are currently investigating several mechanisms for quickly manipulating large amounts of digital information, specifically video content.
Everybody is familiar with the concept of leafing through the pages of a magazine to preview the contents and to search at a coarse-grained level. Mentally, the magazine contents are divided into different levels of detail: pictures, titles, subtitles, paragraphs, box text, etc. While leafing, the brain uses these different degrees of detail to approximate and navigate through the content.
The same general concept can be applied to the “leafing” of other media: video, speech and plain text.
In the following we present the basic general concepts (section 2) and apply them to speech/text (section 3) and to video content (section 4).

2. MEDIA DECOMPOSITION AND ABSTRACTION


In the analog era, media were produced and consumed in a monolithic way (TV, radio, books). Historically and technically, media such as newspapers and television were each designed for a single purpose, with a format that was technically very efficient for that purpose. With the advent of digital technology, all media can be represented in a neutral and extensible way as bytes of information. Digital content can be structured very flexibly to add new meta-information or to facilitate data extraction.
In this scenario, traditional content such as speech, video and text can be formatted or analyzed in a more advanced manner. One of the aims of our research is to find new ways of consuming information using the flexibility added by digital media.
The starting point of our analysis is a physical metaphor: “leafing through the pages of a magazine”. This mental process is used to go quickly through the pages of a magazine in order to search, preview or check its contents. In this process the information contained in the magazine is scanned unconsciously at different levels of detail (coarse to fine grained) and at different speeds: Speed vs. Degree of Details (DoD).
[Flow chart elements]
Start
Decision: Know enough? (Exit)
Step 1 - Coarse Grained Leafing: "The magazine is leafed at fast speed and only the coarse contents are extracted (pictures and titles)"
Decision: A page is selected?
Step 2 - Finer Grained Scanning: "The content around the picture or title is scanned for getting more details"
Decision: details?
Step 3 - Fine Grained Reading: "The content is read in detail"

Figure 1 - Leafing through a magazine: a flow chart.
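
The flow chart of Figure 1 can be read as a simple loop. The following is a minimal sketch of that reading, not an implementation from this work: the `Page` model, the function name `leaf` and the three callbacks are hypothetical names introduced only for illustration, and the exact placement of the "Know enough?" check is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Page:
    """Hypothetical page model with one view per step of Figure 1."""
    titles: List[str]   # step 1: pictures and titles
    surrounding: str    # step 2: text around a picture or title
    full_text: str      # step 3: the complete content


def leaf(
    pages: List[Page],
    is_selected: Callable[[Page], bool],
    wants_details: Callable[[Page], bool],
    knows_enough: Callable[[List[Tuple[str, object]]], bool],
) -> List[Tuple[str, object]]:
    """Return the (step, content) pairs a reader goes through."""
    seen: List[Tuple[str, object]] = []
    for page in pages:
        if knows_enough(seen):      # "Know enough?" -> exit (assumed placement)
            break
        seen.append(("coarse", page.titles))            # step 1
        if not is_selected(page):   # "A page is selected?"
            continue
        seen.append(("finer", page.surrounding))        # step 2
        if wants_details(page):     # "details?"
            seen.append(("fine", page.full_text))       # step 3
    return seen
```

Only the selected pages cost the reader more than a glance at titles, which is the trade-off between speed and degree of detail that the metaphor describes.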

This basic mechanism can be applied to other kinds of content. We can distinguish the following steps:
- Media Decomposition: the media (video, speech, text) is decomposed into more elementary, coarse-grained information components (images, words, keywords);
- Degree of details (or mental focus): each of these elementary information channels can be abstracted further to reduce its degree of redundancy; for example, text can be structured into logical parts (titles, subtitles) or semantic parts (key words);
- Abstract representation: the information can often be represented using simpler abstractions or an alternative medium; for example, music, speech and silence [1][2] inside a movie can be mapped to different colors in a time bar, or the speech can be represented by text.
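
As an illustration of the abstract representation step, the sketch below maps classified audio segments to colors in a time bar. It assumes that the segmentation into music, speech and silence is already available (for example from techniques such as those in [1][2]). The color choices, the function name `time_bar` and the rule that uncovered time counts as silence are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed color per audio class; no specific colors are prescribed in the text.
COLORS = {"music": "blue", "speech": "green", "silence": "grey"}


def time_bar(
    segments: List[Tuple[float, float, str]],
    duration: float,
    width: int = 20,
) -> List[str]:
    """Return one color name per cell of a time bar.

    segments: (start_sec, end_sec, audio_class) triples.
    Each cell takes the class found at its mid-point; time not covered
    by any segment is treated as silence (an assumption).
    """
    bar = []
    for cell in range(width):
        t = (cell + 0.5) * duration / width   # mid-point of this cell
        label = "silence"
        for start, end, audio_class in segments:
            if start <= t < end:
                label = audio_class
                break
        bar.append(COLORS[label])
    return bar
```

For a 40 second clip with music in the first 10 seconds and speech in the next 10, `time_bar([(0, 10, "music"), (10, 20, "speech")], 40, width=4)` yields one cell each of music and speech followed by two cells of silence.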
DEGREE OF DETAIL        LOW             ---->           HIGH
SPEED OF PRESENTATION   HIGH            ---->           LOW

ABSTRACTION
TIME                    discrete        discrete        continuous

VIDEO
  IMAGES                key frame       more key frames play
  SOUND
    NOISE               no noise        symbolic        play
    MUSIC               symbolic        play            play
    SPEECH              symbolic        segmented       play
  TEXT/DESCRIPTION      key words       dialogue        play
  ...

Figure 2 - Video Decomposition (Example)

Media decomposition can take different forms depending on the nature of the content: the more sophisticated the content, the more ways there are to decompose it.
Different degrees of detail can also be reached with different techniques; for example, the video information can be shown at different levels of abstraction. The figure below gives a few examples of possible picture abstraction, from the simplest ones (B/W, scaling) to the more complex (MPEG4).

Figure 5 – Some examples of different degrees of detail for picture representation.

3. SPEECH/TEXT ABSTRACTION AND NAVIGATION

Speech and text can be seen as two alternative information channels. While speech has a temporal, auditory dimension, a text page can be browsed spatially and visually.
Traditionally, a speech recording has had very few uses other than being listened to. One possibility might be to navigate through the phonemes of the speech sample; of course, this method has some limitations due to the nature of playing audio samples quickly. Alternatively, speech can be mapped to text using extraction algorithms or using a digital format that transports both audio and text (metadata). The text can be abstracted further using key words, and so on.
This simple mechanism can be applied to fast speech browsing and navigation, especially for speech recordings such as audio books, educational material, etc. Figure 3 shows a basic text bar that can be used for fast positioning in a speech audio sample. In a very simple, small radio player, the key words might be shown on a one-line LCD screen and the user could advance by pressing a fast forward button.

[Time bar with marks at 10 sec and 20 sec and keyword labels: "Theory of Quantum Properties of Heat", "Dynamic Theoretical History", "Wide Applications Law of the Emotion"]

Figure 3 – An example of a text based navigation bar for speech on a small device: the keywords are scrolled in the one-line LCD for fast positioning and search.
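
The keyword bar of Figure 3 can be modelled as a sorted list of time-stamped keywords. The sketch below is one possible reading under stated assumptions: the keyword time stamps are taken to be available as metadata, and the class `KeywordBar`, its methods and the 16-character display width are hypothetical choices for this example.

```python
import bisect
from typing import List, Optional, Tuple


class KeywordBar:
    """Time-stamped keywords for positioning inside a speech recording."""

    def __init__(self, marks: List[Tuple[float, str]]):
        # marks: (time_in_seconds, keyword) pairs, kept sorted by time
        self.marks = sorted(marks)
        self.times = [t for t, _ in self.marks]

    def fast_forward(self, position: float) -> Optional[Tuple[float, str]]:
        """Return the first (time, keyword) strictly after position,
        or None when no later keyword exists."""
        i = bisect.bisect_right(self.times, position)
        return self.marks[i] if i < len(self.marks) else None

    def display(self, position: float, width: int = 16) -> str:
        """Text for a one-line LCD: the keyword at or before position,
        cut to the assumed display width."""
        i = bisect.bisect_right(self.times, position) - 1
        return self.marks[i][1][:width] if i >= 0 else ""
```

Each press of the fast forward button would call `fast_forward` with the current playback position and seek to the returned time, while `display` supplies the text for the LCD line.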
Another example of text navigation can be given for a normal text page, for example a web page (see the figure below). This technique might be useful for e-books: the e-book text can have embedded “key words” metadata or structure, and the user could operate it in different modes, one for each degree of detail: keywords, only pictures and titles, etc.

Figure 5 – A text can be reduced to different levels of detail (keywords) to speed up navigation.

This technique has very interesting applications for speeding up navigation and browsing through a large number of pages. For example, imagine browsing through 10 web pages: if it were possible to set the level of granularity or detail, you could get a fast overview of the pages with keywords (see figure 5) and occasionally, when interested, “zoom” into a more detailed level. Fast forward and rewind are already available as a feature in a web browser [3], but there the fast navigation moves between full web pages that a heuristic model considers more important.
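
A minimal sketch of such a level-of-detail setting follows. It assumes that each section of a page carries embedded keyword metadata, as suggested above for e-books. The `Section` structure, the function `render` and the three level names are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    """Hypothetical unit of a page with embedded keyword metadata."""
    title: str
    keywords: List[str]
    body: str


def render(sections: List[Section], level: str) -> str:
    """Render a page at one of three assumed levels of detail:
    'keywords', 'titles' or 'full'."""
    lines = []
    for section in sections:
        if level == "keywords":
            lines.append(", ".join(section.keywords))
        elif level == "titles":
            lines.append(section.title)
        elif level == "full":
            lines.append(section.title)
            lines.append(section.body)
        else:
            raise ValueError(f"unknown level of detail: {level}")
    return "\n".join(lines)
```

Browsing several pages at the 'keywords' level and re-rendering a single page at 'full' corresponds to the “zoom” described above.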

4. VIDEO ABSTRACTION AND NAVIGATION


Video is the most sophisticated medium available for entertainment. Although a lot of work has been done to increase the effectiveness of this medium (MPEG7, interactive TV, etc.), the way video content is consumed has remained almost unchanged. Much research has been done in the areas of automatic summarization [1] [5] and video segmentation techniques [4] [6].
The focus of this paper is more on fast “leafing” through video, following the pattern described above.
If we follow the decomposition steps (Media Decomposition, Degree of Details, Abstract Representation) we can “leaf” through the content of a video.
A video is a composite medium; we can divide it into the following information channels: images, sound, and extra textual data (metadata, scene description, etc.). Each of these information channels can be abstracted to different levels of detail (text – keywords, sound – speech – phonemes – text, etc.) or represented in an alternative form (music – color, scene dynamics – color, etc.).
After this decomposition phase we can provide different levels of surrogates of a video for different browsing speeds. Figure 5 shows a possible layout of a video for fast browsing at different quality levels.
The video decomposition scheme shown in figure 5 has 3 levels of detail:
a) Key frames are shown plus key words; no sound is played;
b) More frames around a chosen key frame are shown; text dialogue is shown; additionally, background music can be played;
c) The video is played with audio.
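
The choice among levels a), b) and c) can be driven by the browsing speed. The sketch below shows one way to do this; the speed thresholds, the function name `surrogate_for_speed` and the dictionary layout are assumptions, since the text fixes only what each level presents. The text does not say what textual data accompanies level c), so that entry is left empty here.

```python
from typing import Dict, Optional

# What each level presents, following the list a) to c) above.
LEVELS: Dict[str, Dict[str, Optional[str]]] = {
    "a": {"images": "key frames", "text": "key words", "sound": None},
    "b": {"images": "frames around the chosen key frame",
          "text": "dialogue", "sound": "background music"},
    # Text at level c) is not specified in the source; None is an assumption.
    "c": {"images": "full video", "text": None, "sound": "full audio"},
}


def surrogate_for_speed(speed: float, fast: float = 8.0) -> Dict[str, Optional[str]]:
    """Pick a surrogate level for a browsing speed.

    speed: multiple of normal playback speed (1.0 = normal).
    fast:  assumed threshold above which only level a) is usable.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if speed >= fast:
        return LEVELS["a"]
    if speed > 1.0:
        return LEVELS["b"]
    return LEVELS["c"]
```

With these assumed thresholds, slowing down from 16x to 4x to normal speed moves the presentation from level a) through b) to c).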

This simple example shows how the mechanism of decomposing media content into elementary information channels works. This information can be further abstracted and presented to the user depending on the speed of browsing.

[Video leafing layout; labels in the figure: "NY Tribune", "I catched the Spider", "Save the baby !!", "Tokio New York", "Bomb", "Peter arrest"]

Figure 5 – “Video leafing”: speed vs. details

5. CONCLUSION
Some basic ideas about new ways of consuming multimedia content were presented. A basic concept for finding a trade-off between speed of presentation and degree of detail was explained: “leafing through multimedia content”. In detail, we described fast speech/text browsing and the video “leafing” concept.

ACKNOWLEDGMENTS
Many thanks go to all the members of Sony Corporate Lab Europe (SCLE) for their continuous feedback across the different areas of engineering and science.
REFERENCES
1. C.G.M. Snoek, M. Worring, MULTIMODAL VIDEO INDEXING: A REVIEW OF THE STATE-OF-THE-ART, Department of Computer Science, University of Amsterdam, 2003.
2. Ying Li, Wei Ming, Jay Kuo, SEMANTIC VIDEO CONTENT ABSTRACTION BASED ON MULTIPLE CUES, Department of Electrical Engineering, University of Southern California, 2001.
3. Opera Browser 7.10, Fast Forward and Rewind.
4. S. Pfeiffer, R. Lienhart, S. Fischer, W. Effelsberg, ABSTRACTING DIGITAL MOVIES AUTOMATICALLY, University of Mannheim, 1996.
5. S. X. Ju, M. J. Black, S. Minneman, D. Kimber, SUMMARIZATION OF VIDEOTAPED PRESENTATIONS: AUTOMATIC ANALYSIS OF MOTION AND GESTURE, IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 5, 1998.
6. A. Barletta, B. Moser, M. Mayer, TECHNOLOGY INVESTIGATION REPORT (Draft – Internal Use), Advanced Software Lab, Sony Corporate Lab Stuttgart, 2003.
7. A. Barletta, B. Moser, M. Mayer, Presentation: VIDEO LEAFING (Draft – Internal Use), Advanced Software Lab, Sony Corporate Lab Stuttgart, 2003.
