
Working Paper

‘An Introduction to MPEG-1, MPEG-2, MPEG-4, MPEG-7 & MPEG-21’


Aoife Ní Chionnaith

June 2003

ISH www.ish-lyon.cnrs.fr

CNRS www.cnrs.fr

Contents
Introduction
MPEG-1
  Introduction
  Aims & Features
    Part 1 Systems
    Part 2 Video
    Part 3 Audio
    Part 4 Compliance Testing
    Part 5 Software Simulation
  End User Applications & Products
  Future of MPEG-1
  References
MPEG-2
  References
MPEG-4
  Introduction
  Aims & Features
  Part 1 - ‘Systems’
    Elementary Streams
    Scene description
      MPEG-4/BiFS
      XMT
      Profiles & Levels
  Part 2 - ‘Visual’
  Part 3 - ‘Audio’
  Part 4 - ‘Conformance Testing’
  Part 5 - ‘Reference Software’
  Part 6 - ‘Delivery Multimedia Integration Framework (DMIF)’
  Part 7 - ‘Optimised Reference Software for Coding of Audio-visual Objects’
  Part 8 - ‘Carriage of MPEG-4 contents over IP networks’
  Part 9 - ‘Reference Hardware Description’
  Part 10 - ‘Advanced Video Coding’
  Part 11 ‘Scene Description and Application Engine’
  Part 12 ‘ISO Base Media File Format’
  Part 13 ‘IPMP Extensions’
  Part 14 ‘MP4 File Format’
  Part 15 ‘AVC File Format’
  MPEG-J
    MPEG-J Profiles
  End User Applications & Products
    Authoring Tools
    Encoders
    Decoders
    Codecs
    Players
    SDKs
    Streaming Servers
    Others
  The Future of MPEG-4
  References
  Bibliography
MPEG-7
  Introduction
  Aims & Features
    Part 1 ISO 15938-1 Systems
    Part 2 ISO 15938-2 Description Definition Language
    Part 3 ISO 15938-3 Visual
    Part 4 ISO 15938-4 Audio
    Part 5 ISO 15938-5 Multimedia Description Schemes
      Content management & Content Description
    Part 6 ISO 15938-6 Reference Software
    Part 7 ISO 15938-7 Extraction and use of MPEG-7 descriptions/Conformance
  End User Applications & Products
  References
  Bibliography
MPEG 21
  Introduction
  Part 1 - Vision, Technologies & Strategy
  Part 2 - Digital Item Declaration
  Part 3 - Digital Item Identification & Description
  Part 4 - Intellectual Property Management & Protection
  Part 5 - Rights Expression Language
  Part 6 - Rights Data Dictionary
  References
  Bibliography
General Products
References
Appendix 1 MPEG-4 Profiles
  Visual Profiles
    Natural video content
    Synthetic and synthetic/natural hybrid visual content
    Natural video content (Version 2)
    Synthetic and synthetic/natural hybrid visual content (Version 2)
  Audio Profiles
    MPEG-4 Version 1
    MPEG-4 Version 2
  Graphic Profiles
  Scene Graph Profiles (Scene Description Profiles)
  MPEG-J Profiles
Comparison of Parameters of each MPEG standard


Introduction
The Moving Picture Experts Group (MPEG) was established by the International Organization for
Standardization (ISO) and the International Electrotechnical Commission (IEC). MPEG is
responsible for the development of a range of standards and technical reports relating to video,
audio and multimedia content. This working paper begins by discussing MPEG-1, which relates to
the coding of moving pictures and audio, followed by MPEG-2, which evolved from MPEG-1 but
enables the encoding of video at higher bit rates. MPEG-4, which deals with the coding of
multimedia content, and MPEG-7, which standardises a method for describing multimedia content,
are then discussed, followed by MPEG-21, which is a technical report aimed at providing a
multimedia framework.


MPEG-1
Introduction
The ‘Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1,5
Mbit/s’ standard (ISO/IEC 11172), more commonly known as MPEG-1, standardises the storage and
retrieval of moving pictures and audio on digital storage media and forms the basis for the
Video CD and MP3 formats. The specification is divided into the following parts:

Part     Reference                   Title

Part 1   ISO/IEC 11172-1:1993        ‘Systems’
Part 2   ISO/IEC 11172-2:1993        ‘Video’
Part 3   ISO/IEC 11172-3:1993        ‘Audio’
Part 4   ISO/IEC 11172-4:1995        ‘Compliance Testing’
Part 5   ISO/IEC TR 11172-5:1998     ‘Software Simulation’

Aims & Features


MPEG-1 aims to reproduce an image with:

Pixels     Lines     Frames per second (fps)

352     x  288    @  25
352     x  240    @  30

Part 1 Systems
‘Systems’ deals with the combination of one or more audio, video and timing information data
streams to form one single stream suitable for digital storage or transmission.

Part 2 Video
This part of the specification describes the coded representation for the compression of video
sequences.

The basic idea of MPEG video compression is to discard any unnecessary information. To do so, an
MPEG-1 encoder analyses:

• how much movement there is in the current frame compared to the previous frame
• what changes of colour have taken place since the last frame
• what changes in light or contrast have taken place since the last frame
• what elements of the picture have remained static since the last frame

The encoder then examines each block of pixels to see whether movement has taken place. If there
has been no movement, the encoder stores an instruction to repeat the same picture content; if the
content has only moved, it stores an instruction to repeat it at a different position.
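
The comparison described above can be sketched in a few lines of Python. This is only a toy illustration, not the MPEG-1 algorithm: the function name `changed_blocks`, the 8x8 block size used for the comparison and the `threshold` parameter are assumptions made for the example. It flags the blocks of the current frame that differ from the previous frame; all other blocks could simply be repeated.

```python
# Toy illustration only: a real MPEG-1 encoder performs a motion search;
# here we merely flag blocks that changed between two frames.
BLOCK = 8  # assumed block size for the comparison


def changed_blocks(prev, curr, block=BLOCK, threshold=0):
    """Return (block_row, block_col) indices of blocks that differ between frames."""
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            # Sum of absolute pixel differences inside this block.
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            )
            if diff > threshold:
                changed.append((by // block, bx // block))
    return changed


# Two 16x16 grey frames; a single pixel in the top-right block changes.
prev_frame = [[128] * 16 for _ in range(16)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[2][10] = 200
print(changed_blocks(prev_frame, curr_frame))  # [(0, 1)]
```

Only one of the four blocks needs new data here; the other three can be coded as "repeat the previous content".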

MPEG-1 video uses three types of frame:

I Intra frames
B Bidirectional frames
P Predicted frames

Audio, video and time code are combined into one single stream. MPEG-1 handles 625-line and
525-line video at bit rates from 1 to 1.5 Mbit/s and at 24 to 30 frames per second.


MPEG-1 compression treats video as a sequence of separate images. ‘Picture elements’, often
referred to as ‘pixels’, are the elements of an image. Each pixel consists of three components:
one for luminance (Y) and two for chrominance (Cb and Cr). MPEG-1 encodes the Y component at full
resolution, as the Human Visual System (HVS) is most sensitive to luminance.

MPEG-1 video compression uses the following techniques:

• Subsampling of the chrominance components, as the HVS is less sensitive to these.
• Quantisation
• Predictive coding: the difference between the predicted pixel value and the real value is
coded.
• Motion compensation (MC) predicts the values of a block of pixels (1 block = 8x8 pixels) in an
image from those of a known block of pixels. A vector describes the two-dimensional movement.
If no movement takes place, the vector is 0.
• Interframe coding
• Sequential coding
• VLC (Variable Length Coding)
• Image interpolation
• Intra frames (I frames) are coded independently of other images.
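
The motion compensation item in the list above can be illustrated with a minimal block-matching sketch. It assumes an exhaustive search over a small window and the sum of absolute differences (SAD) as the matching cost; the function names `sad` and `best_vector`, the search range and the test frames are hypothetical choices for this example and are not taken from the standard.

```python
def sad(ref, cur, ry, rx, cy, cx, n):
    """Sum of absolute differences between the n x n block of `cur` at (cy, cx)
    and the n x n block of `ref` at (ry, rx)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def best_vector(ref, cur, cy, cx, n=8, search=4):
    """Exhaustive search: return the (dy, dx) offset into `ref` that best
    predicts the block of `cur` at (cy, cx). Ties keep the zero vector."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(ref, cur, cy, cx, cy, cx, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:
                cost = sad(ref, cur, ry, rx, cy, cx, n)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best


def frame_with_square(top, left, size=24, n=8):
    """Black frame containing one textured n x n square (test data only)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(n):
        for i in range(n):
            frame[top + j][left + i] = 10 + j * n + i
    return frame


ref = frame_with_square(8, 8)    # known (previous) picture
cur = frame_with_square(8, 10)   # same square, moved 2 pixels to the right
print(best_vector(ref, cur, 8, 10))  # (0, -2): the match lies 2 pixels to the left
print(best_vector(ref, ref, 8, 8))   # (0, 0): no movement
```

Only the vector and the (here zero) prediction error would need to be coded for the moved block.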

MPEG-1 codes images progressively, so interlaced material is handled as follows:

• Interlaced images are converted into a de-interlaced (progressive) format before encoding.
• The video is encoded.
• After decoding, the video is converted back into an interlaced form.

To achieve a high compression ratio:


• An appropriate spatial resolution is chosen for the signal, i.e. the image is broken down into
pixels.
• Block-based motion compensation is used to reduce the temporal redundancy.
• Motion compensation is used for causal prediction of the current picture from a previous
picture, for non-causal prediction of the current picture from a future picture, or for
interpolative prediction from past and future pictures
• The difference signal, the prediction error, is further compressed using the discrete cosine
transform (DCT) to remove spatial correlation and is then quantised. Finally, the motion
vectors are combined with the DCT information, and coded using variable length codes.
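
The transform and quantisation step in the last item can be sketched as follows. This is a deliberately slow, pure-Python 2-D DCT applied to one 8x8 block of prediction error, followed by a single uniform quantiser step. The uniform step size of 16 and the function names are assumptions made for illustration; MPEG-1 itself uses quantisation matrices and a quantiser scale, which are not reproduced here.

```python
import math

N = 8  # block size used by the DCT


def dct_2d(block):
    """Plain 2-D DCT-II of an N x N block with orthonormal scaling."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantise(coeffs, step=16):
    """Uniform quantiser (illustrative assumption, not the MPEG-1 matrices)."""
    return [[round(value / step) for value in row] for row in coeffs]


# A flat prediction error of 4 everywhere: all energy ends up in one coefficient.
error_block = [[4] * N for _ in range(N)]
levels = quantise(dct_2d(error_block))
nonzero = sum(1 for row in levels for value in row if value != 0)
print(levels[0][0], nonzero)  # 2 1
```

The 64 input samples collapse to a single non-zero level, which is what makes the subsequent variable length coding effective.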

Block: 8x8 pixels
Macroblock: 4:2:0 = 6 blocks (4 Y + 1 Cb + 1 Cr)
Slice: group of consecutive macroblocks
Picture: group of slices
GOP: Group of Pictures (I, B, P)
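
A short worked calculation shows what these units mean in practice. It assumes the usual MPEG-1 picture sizes of 352 x 288 at 25 fps and 352 x 240 at 30 fps, a macroblock covering 16 x 16 luminance pixels, and 4:2:0 sampling with 8-bit samples (12 bits per pixel on average). The function names and the 1.5 Mbit/s target used for the ratio are choices made for this example.

```python
def macroblocks(width, height, mb=16):
    """Number of 16x16 macroblocks per picture, rounding partial edges up."""
    return ((width + mb - 1) // mb) * ((height + mb - 1) // mb)


def raw_bitrate(width, height, fps, bits_per_pixel=12):
    """Uncompressed bit rate in bit/s.
    12 bits/pixel assumes 4:2:0: 8 bits of Y plus, on average, 4 bits of chroma."""
    return width * height * fps * bits_per_pixel


TARGET = 1_500_000  # about 1.5 Mbit/s

for w, h, fps in [(352, 288, 25), (352, 240, 30)]:
    raw = raw_bitrate(w, h, fps)
    print(w, h, fps, macroblocks(w, h), raw, round(raw / TARGET, 1))
# 352 288 25 396 30412800 20.3
# 352 240 30 330 30412800 20.3
```

Under these assumptions both picture formats produce the same raw rate of about 30.4 Mbit/s, so reaching 1.5 Mbit/s requires a compression ratio of roughly 20:1.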

Part 3 Audio
Part 3 of the specification describes the coded representation for the compression of audio
sequences.

• Audio input is fed into an encoder


• The mapping creates a filtered and subsampled representation of the input audio stream. A
psychoacoustic model creates a set of data to control the quantiser and coding.
• The quantiser and coding block creates a set of coding symbols from the mapped input
samples.
• The block 'frame packing' assembles the actual bitstream from the output data of the other
blocks, and adds other information (e.g. error correction) if necessary

Audio (see page 76 of standards book)

Audio Codecs
MPEG-1 Layer 1


MPEG-1 Layer 2: near CD quality at 128 kbit/s per channel, for use in digital video broadcasts
MPEG-1 Layer 3 (MP3)

Part 4 Compliance Testing


Part 4 specifies how tests can be designed to verify whether bitstreams and decoders meet the
requirements as specified in parts 1, 2 and 3 of the MPEG-1 standard. These tests can be used by:

• manufacturers of encoders, and their customers, to verify whether the encoder produces
valid bitstreams.
• manufacturers of decoders, and their customers, to verify whether the decoder meets the
requirements specified in parts 1, 2 and 3 of the standard for the claimed decoder
capabilities.
• applications to verify whether the characteristics of a given bitstream meet the application
requirements, for example whether the size of the coded picture does not exceed the
maximum value allowed for the application.

Part 5 Software Simulation


Part 5 is a technical report on the software implementation of parts 1 to 3 of MPEG-1.

End User Applications & Products


Future of MPEG-1

References


MPEG-2
MPEG-2 is used for DVD and digital satellite broadcasting. It supports:

• Bit rates up to 20 Mbit/s
• Resolutions up to 1920 pixels x 1080 lines
• Up to 60 frames per second
• 4:3 or 16:9 aspect ratio

Profile and level                   Pixels     Lines     Frames per second (fps)

MP@ML (Main Profile, Main Level)    720     x  480    @  30
MP@ML (Main Profile, Main Level)    720     x  576    @  25

See also DVD Technology pages.

References


MPEG-4
Introduction
‘Information technology - Coding of audio-visual objects’ (ISO/IEC 14496), otherwise referred to
as MPEG-4, is a fifteen-part publication. Six of the parts are currently International Standards;
the remainder are still under development.

Part      Reference                  Title

Part 1    ISO/IEC 14496-1            ‘Systems’
Part 2    ISO/IEC 14496-2            ‘Visual’
Part 3    ISO/IEC 14496-3            ‘Audio’
Part 4    ISO/IEC 14496-4            ‘Conformance Testing’
Part 5    ISO/IEC 14496-5            ‘Reference Software’
Part 6    ISO/IEC 14496-6            ‘Delivery Multimedia Integration Framework (DMIF)’
Part 7    ISO/IEC TR 14496-7         ‘Optimised Reference Software for Coding of Audio-visual Objects’
Part 8    ISO/IEC FCD 14496-8        ‘Carriage of MPEG-4 contents over IP networks’
Part 9    ISO/IEC CD TR 14496-9      ‘Reference Hardware Description’
Part 10   ISO/IEC FCD 14496-10       ‘Advanced Video Coding’ (also to be published as ITU-T H.264/AVC)
Part 11   ISO/IEC 14496-11           ‘Scene Description and Application Engine’
Part 12   ISO/IEC FDIS 14496-12      ‘ISO Base Media File Format’
Part 13   ISO/IEC FDIS 14496-13      ‘IPMP Extensions’
Part 14   ISO/IEC FDIS 14496-14      ‘MP4 File Format’
Part 15   ISO/IEC FCD 14496-15       ‘AVC File Format’

The MPEG-4 specification arose because experts wanted higher compression than MPEG-2 offered,
together with good performance at low bit rates. Discussions began at the end of 1992 and work
on the standards started in July 1993.

MPEG-4 provides a standardised method of:

• Audio-visual coding at very low bit rates


• Representing audio-visual objects (the objects can be natural and/or synthetic).
• Describing audio-visual objects in a scene.
• Multiplexing and synchronising the information associated with the objects (enabling them
to be transported via a network channel)
• Interacting with the audio-visual scene that is received by the end user

MPEG-4 is being developed by Working Group 11 of ISO/IEC Joint Technical Committee 1 (ISO/IEC
JTC 1/SC 29/WG 11). The MPEG-4 Industry Forum is the organisation responsible for furthering the
adoption of the MPEG-4 standard among relevant users and authors of multimedia content.

Aims & Features


MPEG-4 aims to enable:

• Interoperability of products from different vendors.


• Authors to have greater re-usability and flexibility with multimedia content produced.
• Improvement in the management of Intellectual Property Rights.
• Transparent information for network service providers.
• Greater interactivity for end users. Users and content authors can manipulate rich media
content (both natural and synthetic). The limits of interactivity are set by the author. Users
can have the ability to:

o Change their viewing or listening point in the scene, e.g. by navigation through a
scene
o Drag objects in the scene to a different position

o Trigger a range of events by clicking on a specific object, e.g. starting or stopping a
video stream
o Select a language (if the option is provided by the author)
• Seamless flow and delivery of audio-visual content at various bit rates over a wide range of
networks, e.g. 3G mobile networks, broadcast, broadband networks, satellite and cable
modems.
• Scalability
• Advanced compression that provides higher capacity on CDs and DVDs and makes better use of
available bandwidth (therefore leaving more room for digital channels).

Part 1 - ‘Systems’
ISO/IEC 14496-1 ‘Systems’ is an international standard that addresses the following:

• Elementary Streams (ES)


• Scene description

Elementary Streams
Each encoded media object has its own Elementary Stream (ES), which is sent to the decoder and
decoded individually, before composition. The following streams are created in MPEG-4:

• Scene Description Stream


• Object Description Stream
• Visual Stream
• Audio Stream

When data has been encoded, the data streams can be transmitted or stored separately and need to
be composed at the receiving end.

Scene description
A scene consists of a set of objects and a scene description consists of:

• The spatial and temporal relationship between multimedia objects (the objects can be 2D or
3D). A ‘Spatial relationship’ refers to ‘where’ the object appears in a scene and a ‘temporal
relationship’ refers to ‘when’ an object appears in a scene.
• The behavior of audio-visual objects.
• The interactive features of audio-visual objects that are made available to the user.
• The timing information so that the scene can be updated as it changes over time.

Media objects are organised in a hierarchical manner to form audio-visual scenes. Because of this
hierarchical organisation, each media object can be described or encoded independently of the
other objects in the scene, e.g. the background.

Describing audio-visual objects in a scene allows:

• A media object to be positioned anywhere in a coordinate system


• Transformations to take place on the geometry and acoustical appearance of the object
• The user to change their viewing or listening point in the scene
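
The hierarchical organisation and the spatial and temporal relationships described above can be sketched as a small tree structure. This is only an illustration of the idea, not the MPEG-4 scene description format: the class `SceneNode`, its fields and the example object names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneNode:
    """One media object in a scene tree (illustrative, not the MPEG-4 syntax)."""
    name: str
    position: Tuple[float, float] = (0.0, 0.0)  # 'where', relative to the parent
    start: float = 0.0                          # 'when': first second of visibility
    end: float = float("inf")                   # 'when': end of visibility
    children: List["SceneNode"] = field(default_factory=list)

    def visible_at(self, t, origin=(0.0, 0.0)):
        """Yield (name, absolute position) for every object active at time t."""
        if not (self.start <= t < self.end):
            return  # an inactive parent hides its whole subtree
        pos = (origin[0] + self.position[0], origin[1] + self.position[1])
        yield self.name, pos
        for child in self.children:
            yield from child.visible_at(t, pos)


scene = SceneNode("scene", children=[
    SceneNode("background"),
    SceneNode("presenter", position=(100.0, 50.0), children=[
        SceneNode("caption", position=(0.0, 120.0), start=5.0, end=10.0),
    ]),
])

print([name for name, _ in scene.visible_at(2.0)])  # caption not yet shown
print(list(scene.visible_at(6.0))[-1])              # ('caption', (100.0, 170.0))

# Moving the presenter moves the caption with it; the background is untouched.
scene.children[1].position = (200.0, 50.0)
print(list(scene.visible_at(6.0))[-1])              # ('caption', (200.0, 170.0))
```

Because each object carries its own position and timing, one object can be changed or re-encoded without touching the others.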

There are two main levels of scene description: ‘MPEG-4/BiFS’ and ‘XMT’.

MPEG-4/BiFS
MPEG-4 Binary Format for Scenes (MPEG-4/BiFS) is a method of encoding a scene description in
binary form. It is based on the Virtual Reality Modelling Language (version ‘VRML97’), which uses
hierarchies and nodes. VRML97 was designed for the web; MPEG-4/BiFS extends it for other uses,
e.g. broadcasting.

Some of the extensions provided by MPEG-4/BiFS include:


• Binary Compression – BiFS files are usually 10-20 times smaller than the VRML equivalent.
• Media Mixing – media integration. BiFS integrates well with other media types.
• Audio Composition – BiFS allows the mixing of sound sources, synthesized sounds, sound
effects etc.
• Streaming of 3D content

MPEG-4/BiFS:
• Allows users to change their view point in a 3D scene or to interact with media objects.
• Allows different objects in the same scene to be coded at different levels of quality.

For tutorials on MPEG-4/BiFS see:


http://www.comelec.enst.fr/~dufourd/mpeg-4/Bifs_Primer/primer.html
http://www.comelec.enst.fr/~concolat/mpeg-4/tutorial.html (french)

VRML
Virtual Reality Modeling Language (VRML) is an international standard (ISO/IEC 14772-1:1997)
developed by the Web3D Consortium. It is an open 3D language for use on the Internet and is used
to describe a scene. A VRML file is a plain text file (usually with the .wrl extension) that
describes the composition of a 3D scene, often called a ‘world’.
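
To make the plain-text nature of a VRML file concrete, the sketch below holds a minimal VRML97 world (a single red box) in a Python string and provides a helper to save it with the .wrl extension. The file name `example.wrl` and the helper `write_world` are hypothetical; the world itself uses only the standard VRML97 header and the Shape, Appearance, Material and Box nodes.

```python
# A minimal VRML97 world held as plain text: one red box.
VRML_WORLD = """#VRML V2.0 utf8
# A minimal world: one red box.
Shape {
  appearance Appearance {
    material Material { diffuseColor 1 0 0 }
  }
  geometry Box { size 2 2 2 }
}
"""


def write_world(path="example.wrl"):
    """Save the world as a .wrl text file and return the path used."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(VRML_WORLD)
    return path


print(VRML_WORLD.splitlines()[0])  # the header line every VRML97 file starts with
```

Calling `write_world()` produces a file that a VRML97 browser should be able to open; the nesting of nodes in the text is the scene hierarchy that BiFS encodes in binary form.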

XMT
For a tutorial on XMT (in French), see http://perso.enst.fr/~concolat/mpeg-4/tutorial.html.

MPEG-4 ‘Systems’ also addresses:


• A standard file format to enable the exchange and authoring of MPEG-4 content (MP4).
• Interactivity (both client-side and server-side)
• MPEG-J (MPEG-4 & Java)
• FlexMux tool which allows for the interleaving of multiple streams into a single stream.
• Intellectual property rights identification: See ‘Intellectual Property Management and
Protection in MPEG Standards’
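
The idea of interleaving several streams into one, as the FlexMux tool does, can be illustrated with a toy round-robin multiplexer. This sketch does not reproduce the FlexMux syntax: the packet representation (a channel identifier paired with a payload) and the function names `interleave` and `deinterleave` are assumptions made for the example.

```python
from itertools import zip_longest


def interleave(streams):
    """Round-robin interleave.
    `streams` maps a channel id to a list of payloads; the result is one list
    of (channel_id, payload) packets."""
    tagged = ([(cid, payload) for payload in payloads]
              for cid, payloads in streams.items())
    muxed = []
    for packets in zip_longest(*tagged):
        # Shorter streams run out first; skip their empty slots.
        muxed.extend(packet for packet in packets if packet is not None)
    return muxed


def deinterleave(muxed):
    """Recover the individual streams from the single interleaved stream."""
    streams = {}
    for cid, payload in muxed:
        streams.setdefault(cid, []).append(payload)
    return streams


streams = {"video": [b"v0", b"v1", b"v2"], "audio": [b"a0", b"a1"]}
muxed = interleave(streams)
print([cid for cid, _ in muxed])       # ['video', 'audio', 'video', 'audio', 'video']
print(deinterleave(muxed) == streams)  # True
```

Tagging each packet with its channel is what allows the receiver to separate the streams again before decoding them individually.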

Profiles & Levels


Profiles have been developed to create conformance points for MPEG-4 tools and toolsets, therefore
interoperability of MPEG-4 products with the same Profiles and Levels can be assured.

A Profile is a subset of the MPEG-4 Systems, Visual or Audio tool set and is used for specific
applications. Since many applications need only a portion of the MPEG-4 tool set, a Profile limits
the tools that a decoder has to implement. Profiles specified in the MPEG-4 standard include:

• Visual Profile
• Natural Profile
• Synthetic & Natural/Synthetic Hybrid Profiles
• Audio Profile
• Graphic Profile
• SceneGraph Profile

For each profile, levels have been set.

A ‘Level’ sets the complexity of a profile. Profiles and levels are written in the format
‘Profile@Level’.
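
The profile and level notation can be handled with a small parser, sketched below. The names `ConformancePoint`, `parse_conformance_point` and `can_decode` are hypothetical, as are the ‘Simple@L1’ and ‘Simple@L3’ strings and the level ordering passed in by the caller. The rule that a decoder at a higher level can also handle lower levels of the same profile is an assumption made for the example.

```python
from typing import NamedTuple


class ConformancePoint(NamedTuple):
    profile: str
    level: str


def parse_conformance_point(text):
    """Split a 'Profile@Level' string into its two parts."""
    profile, sep, level = text.partition("@")
    if not sep or not profile.strip() or not level.strip():
        raise ValueError(f"expected 'Profile@Level', got {text!r}")
    return ConformancePoint(profile.strip(), level.strip())


def can_decode(decoder, content, level_order):
    """True if the profiles match and the content level does not exceed the
    decoder level. `level_order` lists levels from lowest to highest and is
    supplied by the caller (an assumption of this sketch)."""
    return (decoder.profile == content.profile
            and level_order.index(content.level) <= level_order.index(decoder.level))


print(parse_conformance_point("MP@ML"))  # ConformancePoint(profile='MP', level='ML')

levels = ["L1", "L2", "L3"]  # illustrative ordering only
decoder = parse_conformance_point("Simple@L3")
print(can_decode(decoder, parse_conformance_point("Simple@L1"), levels))  # True
print(can_decode(parse_conformance_point("Simple@L1"), decoder, levels))  # False
```

Two products can be expected to interoperate only when such a check succeeds, which is the purpose of defining conformance points.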


Further details on each profile can be found in Appendix 1.

Part 2 - ‘Visual’
Natural media objects include sound recorded from a microphone or video recorded by a camera.
Synthetic media objects include text, graphics, synthetic music etc. Media objects can be either
2D or 3D and can be:
o aural (e.g. soundtrack)
o visual (e.g. images, text)
o audio-visual

Part 3 - ‘Audio’
Part 3 of the MPEG-4 specification deals with the representation of audio objects.

MPEG-4 AAC (audio codec):
• Capable of coding 5.1 channel surround sound
• Scalable
• Used by satellite-based ‘XM radio’ and Digital Radio Mondiale

The following are supported:
• General Audio Signals: Encoding is supported from low bit rates to high quality, for both mono
and multi-channel signals.
• Speech Signals: Speech coding tools enable coding at bit rates from 2 kbit/s to 24 kbit/s. Bit
rates as low as 1.2 kbit/s are possible when variable rate coding is used.
• Speed and pitch can be controlled during playback
• Synthetic Audio
• Synthesised Speech

Part 4 - ‘Conformance Testing’


‘describes how compliance to the various parts of the standard can be tested.’ 1

Part 5 - ‘Reference Software’


‘contains a complete software implementation of the MPEG-4 specification.’ 2

Part 6 - ‘Delivery Multimedia Integration Framework (DMIF)’
MPEG-4 data is authored once and delivered anywhere.
A set of interfaces for accessing multimedia content.
“An interface between the application and the transport.” 3
‘As MPEG-4 is likely to be used in a variety of common environments, DMIF is an adaptation of the
traditional audio-visual stream management to the new environment’. 4
Network abstraction 5

1 MPEG-4 Jump Start, pg 10
2 MPEG-4 Jump Start
3 http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm
4 MPEG-4 Jump Start, pg xvi
5 MPEG-4 Jump Start, pg 5


Part 7 - ‘Optimised Reference Software for Coding of Audio-visual Objects’
Part 8 - ‘Carriage of MPEG-4 contents over IP networks’
Part 9 - ‘Reference Hardware Description’
Part 10 - ‘Advanced Video Coding’
Part 10 is a joint ITU-T and ISO/IEC standard and will be known as ITU-T Rec. H.264 and ISO/IEC
14496-10 "Advanced Video Coding".

http://www.eetimes.com/printableArticle?doc_id=OEG20030106S0035
MPEG-4 High-Efficiency AAC
MPEG-4 High-Efficiency Advanced Audio Coding (HE-AAC) has been elevated to its final ballot stage
leading up to becoming an International Standard. The addition of a new profile significantly
enhances the existing AAC LC (low complexity) standard with Spectral Band Replication (SBR). This
provides industry with one of the most remarkable advancements in audio compression in many
years.

H.264 profiles intended for mobile telephony and video conferencing:
"To this date, three major profiles have been agreed upon: Baseline, mainly for video
conferencing and telephony/mobile applications, Main, primarily for broadcast video applications,
and X, mainly for streaming and mobile video applications."

Part 11 ‘Scene Description and Application Engine’


Part 12 ‘ISO Base Media File Format’
Part 13 ‘IPMP Extensions’
IPMP stands for Intellectual Property Management and Protection. MPEG-4 provides hooks to protect
audio-visual content. These hooks ‘allow the identification of the IPMP system’ 6 . The IPMP
system itself is not specified by MPEG-4.

Part 14 ‘MP4 File Format’


.mp4 or .mpeg4 is a storage format for MPEG-4 content. It is based on the QuickTime format.

MPEG is enhancing its MP4 file format so that it can contain AVC data in a well-specified way.
MP4 has spawned the more generic ISO file format, the basis of a growing family of compatible
formats. In addition to the ISO/IEC MP4 and Motion JPEG 2000 file formats, it has also been
adopted by 3GPP and 3GPP2 for multimedia in mobile, as well as in other industry associations.
The file format is also being enhanced to better support un-timed (static) meta-data, and to
support MPEG-21. MPEG-21 support is targeted to enable the storage of a 'Digital Item
Declaration' with some or all of its resources in a single file. This allows MPEG-21 files to be
compatible with other files in the family.

6 MPEG-4 Jump Start

Part 15 ‘AVC File Format’


MPEG-J
MPEG-J enables the ‘programmatic description of scenes’.
It seeks to ‘extend the content creator’s ability to incorporate complex controls and data processing
mechanisms along with the BiFS scene representations and elementary media data….MPEG-J intends
to enhance the user’s ability to interact with the content.’ Pg 12

MPEG-4 defines a set of Java programming language APIs (MPEG-J) that gives Java applets
(MPEG-lets) access to an underlying MPEG-4 engine. This forms the basis for very sophisticated
applications. (Pg 21)

Greater interactivity between the user and the content.

“MPEG-J profiles allows usage of five API packages:
• Scene
• Resource
• Net
• Decoder
• Section Information and Service Filtering (SI/SF)” 7

MPEG-J is MPEG-4 and Java


The end user has more interactivity with the content.

“a presentation engine for programmatic control of the player” 8 . Inserted in Version 2 of Systems.

it “defines the format and the delivery of downloadable Java byte code as well as its execution
lifecycle and behavior through standardized APIs.” 9

“MPEG-J is a programmatic system which specifies an API for interaction of Java code present as
part of the media content with MPEG-4 media players.” 10

The programmatic environment of MPEG-4.


“… seeks to extend the content creators’ ability to incorporate complex controls and data
processing mechanisms along with the BIFS scene representations and elementary media data” 11

"MPEG-J is a set of Java application program interfaces… It also sets the rules for delivering
Java into a bitstream and it specifies what happens at the receiving end." Practically, MPEG-J
will permit a television viewer or a Web surfer to control the image that he or she sees. 12

“It is an extension of MPEG-4. It allows the use of Java classes within MPEG-4 content

capability to allow graceful degradation under limited or time varying resources and the ability to
respond to user interaction to allow programmatic control of the terminal to facilitate the
integration of features for applications such as set top box, interactive games and mobile AV
terminals in MPEG-4

enable a high level of interaction for both local and remote terminal control

7
MPEG-4 Jump start pg 451
8
http://mpeg.telecomitalialab.com/faq/mp4-sys/sys-faq-sys4gen.htm
9
http://mpeg.telecomitalialab.com/faq/mp4-sys/sys-faq-sys4gen.htm
10
http://java.sun.com/pr/2000/09/spotnews/sn000912.html
11
MPEG-4 Jump start pg 12
12
http://www.spie.org/web/oer/october/oct00/cover2.html


The Java code will be able to:


(scene graph control?)
create and modify scenes
participate fully in the scene interaction model and control media decoders
generate GUI components to directly implement application functionality.
Resource management 13
Graceful degradation 14

will not
participate in the data flow of real-time media, e.g., implementing a video decoder, in order to
ensure high quality media decoding

It will be received by an MPEG-4 terminal in its own elementary stream (ES), and will be associated
with the scene using a special node and the regular object descriptor facilities” 15

Supported by Version XXXX of MPEG-4

“MPEG-j defines the format and the delivery of downloadable Java byte code as well as its
execution lifecycle and behaviour through standardized APIs.” 16

MPEG-J is a programmatic system (as opposed to the parametric system offered by MPEG-4 Version 1)
that specifies an API for interoperation of MPEG-4 media players with Java code.
By combining MPEG-4 media and safe executable code, content creators may embed complex
control and data processing mechanisms with their media data to intelligently manage the operation
of the audio-visual session.
The Java application is delivered as a separate elementary stream to the MPEG-4 terminal.
There it will be directed to the MPEG-J run time environment, from where the MPEG-J program will
have access to the various components and data of the MPEG-4 player, in addition to the basic
packages of the language (java.lang, java.io, java.util).
MPEG-J specifically does not support downloadable decoders.
For the above-mentioned reason, the group has defined a set of APIs with different scopes.
For the Scene Graph API, the objective is to provide access to the scene graph: to inspect the graph,
to alter nodes and their fields, and to add and remove nodes within the graph.
The Resource Manager API is used for regulation of performance: it provides a centralized facility
for managing resources.
The Terminal Capability API is used when program execution is contingent upon the terminal
configuration and its capabilities, both static (that do not change during execution) and dynamic.
The Media Decoders API allows control of the decoders that are present in the terminal.
The Network API provides a way to interact with the network, being compliant to the MPEG-4 DMIF
Application Interface. Complex applications and enhanced interactivity are possible with these basic
packages.
MPEGlets - remote applications that are streamed to the client in an MPEG-J elementary stream 17
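The scene graph operations described above (inspecting the graph, altering nodes and their fields, adding and removing nodes) can be illustrated with a minimal sketch. It is written in Python rather than Java, and the class SceneNode and its methods are hypothetical names chosen for illustration; they are not part of the MPEG-J API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SceneNode:
    """Hypothetical stand-in for a scene graph node (not an MPEG-J class)."""
    name: str
    fields: Dict[str, object] = field(default_factory=dict)
    children: List["SceneNode"] = field(default_factory=list)

    def find(self, name: str) -> Optional["SceneNode"]:
        """Inspect the graph: depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def add(self, child: "SceneNode") -> None:
        """Add a node below this one."""
        self.children.append(child)

    def remove(self, name: str) -> bool:
        """Remove the first descendant with the given name, if any."""
        for index, child in enumerate(self.children):
            if child.name == name:
                del self.children[index]
                return True
            if child.remove(name):
                return True
        return False


# Build a small scene, alter a field, then remove a node.
root = SceneNode("root")
root.add(SceneNode("video", {"visible": True}))
root.add(SceneNode("logo", {"x": 0, "y": 0}))
root.find("logo").fields["x"] = 40
root.remove("video")
```

In a real terminal the equivalent calls would go through the Scene package of the MPEG-J APIs; the sketch only shows the kind of control the text describes.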

MPEG-J Profiles
Two MPEG-J Profiles exist: Personal and Main:
Personal - a lightweight package for personal devices.

The personal profile addresses a range of constrained devices including mobile and portable devices.
Examples of such devices are cell video phones, PDAs, personal gaming devices. This profile includes
the following packages of MPEG-J APIs:

Network
Scene
Resource

13
http://www.web3d.org/WorkingGroups/vrml-mpeg4/differences.html
14
15
http://mpeg.telecomitalialab.com/faq/mp4-sys/sys-faq-mpegj.htm
16
http://mpeg.telecomitalialab.com/faq/mp4-sys/sys-faq-sys4gen.htm
17
http://mpeg.telecomitalialab.com/faq/mp4-sys/sys-faq-mpegj.htm


Main - includes all the MPEG-J APIs.

The Main profile addresses a range of consumer devices including entertainment devices. Examples of
such devices are set top boxes, computer based multimedia systems etc. It is a superset of the Personal
profile. Apart from the packages in the Personal profile, this profile includes the following packages of
the MPEG-J APIs:

Decoder
Decoder Functionality
Section Filter and Service Information
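The relationship between the two profiles can be sketched as sets of package names, with Main built as a superset of Personal. The package names are taken from the lists above; the function minimal_profile and the constant names are illustrative assumptions, not part of any MPEG-J specification.

```python
from typing import Iterable, Optional

# Packages listed above for the Personal profile.
PERSONAL = frozenset({"Network", "Scene", "Resource"})

# Main is described as a superset of Personal plus three further packages.
MAIN = PERSONAL | frozenset({
    "Decoder",
    "Decoder Functionality",
    "Section Filter and Service Information",
})

# Ordered from the most constrained profile to the least constrained.
PROFILES = (("Personal", PERSONAL), ("Main", MAIN))


def minimal_profile(required: Iterable[str]) -> Optional[str]:
    """Return the smallest profile offering every required package, or None."""
    needed = set(required)
    for name, packages in PROFILES:
        if needed <= packages:
            return name
    return None
```

For example, an application that only needs the Scene and Network packages fits the Personal profile, whereas one that also controls decoders needs Main.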

Look up
http://www.cordis.lu/infowin/acts/analysys/products/thematic/mpeg4/sommit/sommit.htm

End User Applications & Products


Authoring Tools
• Broadcast Studio by Envivio (Windows): MPEG-4 composition and post-production
environment for the broadcast industry.
• Authoring Tool , GSRT.
• IST MPEG-4
• Studio Author by Ivast: Visual, object-oriented authoring environment for the creation of
interactive MPEG-4 content.
• MPEG-4 Tools by ENST
• Nexauthor by Nextreaming
• BS Contact MPEG-4: "enables grouped compositions of digital media, such as photographs,
video and/or audio streams and interactive information textures in a virtual environment of
your e-commerce applications for your customer. An integrated 2D/3D Player that allows
stable and performing visualisation in the MPEG 4 standard. We have linked the Player to
ERP-Systems for E-Commerce applications."
• ?, Song: An R&D project that investigates and develops new building blocks for rich media
communication and delivery for business purposes. The final prototype will show a 2D and 3D
studio application for streaming live conferences interactively to a multitude of participants.
Online consultants can thus be enabled to support clients directly over the Internet. Particular
emphasis in the project is put on influencing standards and active participation in
standardisation consortia, especially for the emerging MPEG-4 ISO standard. The project has
launched an Open Platform Initiative for producing a complete end-to-end chain of MPEG-4
related products. This technology reveals great potential for interactive TV and mobile
computing applications and might strongly change the way we interact with these systems
today.
• WonderStream, TDK: A streaming system that handles a single video and audio stream and also
transmits multiple video and audio streams, still images and text in real time, compliant with the
MPEG-4 Systems specification. TDK also offers a remote surveillance system, Web Station / for
Security, built on WonderStream technology, and has released a newly developed MPEG-4
authoring system, WonderCreator, for creating MPEG-4 content.
• A Template Guided Authoring Environment to Produce MPEG-4 Content for the Web:
http://www.comelec.enst.fr/~dufourd/mpeg-4/mediafutures01.pdf
• TDK and Optibase Introduce MPEG-4 Based Streaming Server Software
http://www.tdk.co.jp/teaah01/aah04300.htm

Encoders
• Dicas
• Encoding Station , Envivio
• Studio Encode, Ivast
• Nexencoder Standard, Nexencoder Enterprise, Nexencoder Component by Nextreaming
• Packetvideo
• WebCast


Decoders
• EM8610 (Sigma Designs): For HDTV Decoder
• EM8611 (Sigma Designs): For HDTV Decoder
• EM8605 (Sigma Designs): For set-top appliances & media gateways
• RealMagic: Used with Adobe Premiere DV Edition, Pinnacle etc. Simple Advanced Profile, VBR,
CBR up to 720 x 576, De-interlacing

Codecs
• DivX Pro 5.02 (DivX Networks): Used with Adobe Premiere, DVD Edition, Pinnacle etc. Exports
files in AVI format. Uses Simple Profile and Advanced Profile.
• Mpegable AVI (Dicas) : Windows. Used with Adobe Premiere, DV Edition, Pinnacle, etc. AVI.
Creates DivX files
• LSX-Mpeg Player for Adobe Premiere (Ligos): For use with Adobe Premiere, DV Edition, Pinnacle
etc. Export MPEG-1, 2 and 4 Up to 100 Mb/s, CBR & VBR, Simple profile.
• MPEG-4aacPlus (Ahead Software): Audio codec

Players
• mpegable X4 live (Dicas): Windows. Includes live capturing.
• mpegable S4 (Dicas): Windows. More advanced features.
• mpegable SDK 1.4 (Dicas): Windows, Linux, Sun Solaris and Mac OS X. Video software
development kit
• mpegable Player: MP4 files player
• Live Broadcaster (Envivio):
• Studio Encode (iVast):
• Corona 9
• Nexplayer for PC
• Nexplayer for PDA
• WebCine (Philips Digital Networks)
• Osmose
• SoNG3D

www.mpeg-4.philips.com
? mpegable DS MPEG-4 DirectShow® Filter for Microsoft Media

SDKs
• mpegable SDK 1.4 (Dicas): Windows (9x, NT 4.0, 2000, XP), Linux, Apple OS X and Sun Solaris.
• MPEG-4 SDK (Osoon):

Streaming Servers
• Streaming Server (Envivio)

Others
• face2face™: uses MPEG-4 technology to create models of faces for television animation,
computer games, and streaming over the Internet.
• StorEdge Media Central platform, an open-standards-based architecture for the broadcast and
Internet streaming-media markets. StorEdge Media Central supports audio, video, and other
time-based media on the Java platform.

http://www.siliconstrategies.com/printableArticle?doc_id=OEG20030324S0030

http://www.europeanstreaming.com/mpeg4.htm
http://www.seromemobile.com/products/prod_author.html
http://www.huntek.com/english/product12.php
http://www.mpegable.com/showPage.php?SPRACHE=UK&PAGE=news16
P800: with this device, Sony, allied with Ericsson, has an exceptional product that technology
enthusiasts should rush to buy, provided they have a sufficient budget: this phone-PDA-camera
End User Applications


mobile communications
conferencing
interactive multimedia broadcast
video screens
wireless products
speech recognition?
Television – logos and images can be overlaid easily onto broadcasts received from other television
companies. MPEG-4 image overlay is better than MPEG-2.

For more details on products, view the following page created by Olivier Amato
http://81.1.61.164/index.php?id=19

The Future of MPEG-4


"The cable industry is looking into MPEG-4 because of rumours that it has a substantially better bit
rate than MPEG-2," Ostermann said. "That would permit extra channels." However, no hardware is
yet available for this high-end application. 18

Specifications to be finalised.

HDTV
Digital film (in cinemas)

Look up
http://leonardo.telecomitalialab.com/icjfiles/mpeg-4_si/11-Profiles_paper/11-Profiles_paper.htm
http://www.optibase.com/html/mpeg-4/mpeg-4_whitepaper.html white paper
http://wwwalt.ldv.ei.tum.de/conferences/siggraph/MP4_Profiles.pdf

MPEG-4 allows scene interaction:

Transport
“MPEG-4 does not define transport layers. However, in some cases, adaptation to an existing
transport layer was defined:

• Transport over MPEG-2 Transport Stream (this is an amendment to MPEG-2 Systems)


• Transport over IP (In cooperation with IETF, the Internet Engineering Task Force)” 19

References
Web3D.org (1998). ‘Main differences between MPEG-4 and VRML97’ [online]. Available from:
http://www.web3d.org/WorkingGroups/vrml-mpeg4/differences.html [Accessed 5 March 2003]

(2000). ‘Sun Joins MPEG-4 Industry Forum to Help Drive Adoption of MPEG-4 Standard in
Media Applications’ [online]. Sun Software Systems. Available from:
http://java.sun.com/pr/2000/09/spotnews/sn000912.html [Accessed 5 March 2003]

Overview of the MPEG-4 Standard [online]. Available from:


http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm. [Accessed DATE]
http://mpeg.telecomitalialab.com/faq/mp4-sys/sys-faq-mpegj.htm
MPEG-4 Jump start
Barda, Jean, Cohen, Daniel, DeBellefonds, Phillipe, Lecomte, Daniel, (2000) ? ‘Les Normes & Les
Standards du Multimédia’ (Dunod, Paris)

http://www.comelec.enst.fr/~dufourd/mpeg-4/pv349.pdf MPEG-4 authoring

Links from favorites


http://www.cms.livjm.ac.uk/library/EMMSEC/Part-04/085-Bauer.pdf

18
http://www.spie.org/web/oer/october/oct00/cover2.html
19
http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm


http://www.vialicensing.com/products/mpeg4aac/standard.html
http://perso.enst.fr/~dufourd/mpeg-4/tools.html#1
http://www.fing.org/index.php?num=1864,3,1066,8

Look up
http://www.comelec.enst.fr/~dufourd/mpeg-4/iscas00_1312.pdf
MPEG-4 PC http://www.q-team.de/mpeg4/mpeg4pc.htm by Esprit

Bibliography
http://www.vcodex.fsnet.co.uk/h264.html
http://sourceforge.net/mail/?group_id=62855


MPEG-7
Introduction
MPEG-7 is a seven-part specification, formally entitled ‘Multimedia Content Description Interface’.
It provides standardised tools for describing multimedia content, which will enable searching,
filtering and browsing of multimedia content.

Part Reference Title


Part 1 ISO 15938-1 ‘Systems’
Part 2 ISO 15938-2 ‘Description Definition Language’
Part 3 ISO 15938-3 ‘Visual’
Part 4 ISO 15938-4 ‘Audio’
Part 5 ISO 15938-5 ‘Multimedia Description Schemes’
Part 6 ISO 15938-6 ‘Reference Software’
Part 7 ISO 15938-7 ‘Extraction and use of MPEG-7 descriptions/Conformance’

MPEG-7 tools consist of:


• Descriptors (Ds)
• Description Schemes (DSs)
• Description Definition Language (DDL).

Descriptors describe the syntax and semantics of audio, video and multimedia content. MPEG-7
descriptors use XML (eXtensible Markup Language) with MPEG-7 extensions, which provides a textual
representation of the content. The Descriptors (located in an .xml file) can either be physically
located with the multimedia content they describe (in the same data stream or storage system) or
be stored externally, with a link between the description file and the multimedia content.
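As a rough illustration of an externally stored description that links back to its content, the sketch below builds a small XML document and reads the link back out. The element names (Mpeg7, Description, MultimediaContent, MediaLocator, MediaUri, Title) are used loosely for illustration and the output is not claimed to validate against the MPEG-7 schema; the URL is a placeholder.

```python
import xml.etree.ElementTree as ET


def build_description(media_uri: str, title: str) -> str:
    """Return a small XML description that points at external content."""
    root = ET.Element("Mpeg7")
    description = ET.SubElement(root, "Description")
    content = ET.SubElement(description, "MultimediaContent")
    locator = ET.SubElement(content, "MediaLocator")
    # The link between the description file and the multimedia content.
    ET.SubElement(locator, "MediaUri").text = media_uri
    ET.SubElement(content, "Title").text = title
    return ET.tostring(root, encoding="unicode")


# Placeholder URL; any reachable location of the described content would do.
xml_text = build_description("http://example.com/clip.mp4", "Example clip")

# A consumer of the description can recover the link to the content.
parsed = ET.fromstring(xml_text)
linked_uri = parsed.findtext("./Description/MultimediaContent/MediaLocator/MediaUri")
```

Because the description is plain XML, it can be stored, searched and edited independently of the media file it refers to.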

Descriptors are used to describe:

• What is in the content


• The form of the content (e.g. file format, file size)
• Conditions for accessing the multimedia content (e.g. the cost, intellectual property rights)
• Content classification (e.g. classification into pre-defined categories)
• Links to other material that may be relevant to the search being carried out by the user
• Context of the material (e.g. a particular event that is depicted in a video)

Different levels of abstraction exist e.g.:

• low abstraction level includes the description of shape, motion, size, texture, colour, camera
movement and position for video, and of mood, tempo changes, energy, harmonicity and
timbre for audio. Many of these features can be automatically extracted.
• high abstraction level provides semantic information on the content e.g. abstract concepts
or content genres. These features require human interaction in describing the content.
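To make the idea of automatic extraction of a low-level feature concrete, the sketch below estimates a dominant colour from a list of RGB pixels by coarse quantisation. This is a simplified illustration only: it is not the extraction method defined for the MPEG-7 Dominant Color descriptor, and the function name and the choice of four levels per channel are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]


def dominant_colour(pixels: Iterable[Pixel], levels: int = 4) -> Pixel:
    """Return the centre of the most populated coarse RGB bin."""
    step = 256 // levels
    # Count how many pixels fall into each coarse (r, g, b) bin.
    bins = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    if not bins:
        raise ValueError("at least one pixel is required")
    (r_bin, g_bin, b_bin), _count = bins.most_common(1)[0]
    half = step // 2
    return (r_bin * step + half, g_bin * step + half, b_bin * step + half)


# Three reddish pixels and one bluish pixel: the red bin wins.
sample = [(250, 10, 10), (240, 20, 5), (255, 0, 30), (10, 10, 250)]
result = dominant_colour(sample)
```

No human annotation is needed for a feature of this kind, which is the contrast the text draws with high abstraction level descriptions.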

Description Schemes (DSs), which are defined using the MPEG-7 Description Definition Language (DDL),
specify the structure and semantics of the relationships between Descriptors (XML elements)
and Description Schemes. DSs are mainly used to describe high-level audio-visual features, e.g.
regions, segments, objects, events, creation and production information and content
usage.
• Multimedia DSs describe multimedia content (audio, visual, textual etc.)
• Audio DSs describe audio content
• Visual DSs describe visual content

Description Definition Language (DDL) allows new ‘Descriptors’ and ‘Description Schemes’ to be
defined and existing DSs to be modified. XML Schema is the basis for the DDL. The DDL consists of:

• XML Schema structural language components


• XML Schema datatype language components


• MPEG-7 specific extension

MPEG-7 can be used to describe the following:


• Content
• Form
• Conditions for accessing the material
o Links to registry
o Intellectual property rights
o Price
• Classification
o Parental rating
o Content classification (into pre-defined categories)
• Links to other relevant material (enables quicker searching)
• Context e.g. know the occasion of the recording
o Often textual information
• Creation – director, title, short feature movie
• Production process
• Content usage information
o Copyright pointers
o Usage history
o Broadcast schedule
• Storage features on the content
o Storage format
o Encoding
• Structural information (spatial, temporal or spatio-temporal components)
o Scene cuts
o Segmentation in regions
o Region motion tracking
• Low level features of content
o Colours
o Textures
o Sound timbres
o Melody description
• Conceptual information of the reality captured by the content
o Objects
o Events
o Interactions among objects
• How to browse the content in an efficient way
o Summaries
o Variations
o Spatial & frequency subbands
• Collection of objects
• User interaction (with the content)
o User preferences
o Usage history

Aims & Features


Through detailed descriptions of multimedia content, the indexing of the content will allow:

• Fast and efficient searching


• Filtering of audio-visual content
• Access to audio-visual content from a wide range of devices e.g. mobile phones, PDAs or set top
boxes.
• Identification
• Retrieval of content
• Interoperability between different systems used to create, manage, distribute and consume
multimedia content descriptions.


“Most areas of the standard are platform independent”


It aims to be complementary to existing standards, including those created by MPEG as well as
non-compressed formats from other organisations.

Part 1 ISO 15938-1 Systems


MPEG-7 descriptions exist in two formats:
• Textual - XML which allows editing, searching and filtering of a multimedia description. The
description can be located anywhere, not necessarily with the content.
• Binary format - suitable for storing, transmitting and streaming delivery of the multimedia
description.

The MPEG-7 Systems provides the tools for:


• The preparation of binary coded representation of the MPEG-7 descriptions, for efficient
storage and transmission.
• Transmission techniques (both textual and binary formats)
• Multiplexing of descriptions
• Synchronisation of descriptions with content
• Intellectual property management and protection

Terminal architecture
Normative interface

Descriptions may be represented in two forms:


• textual (XML)
• binary form (BiM – Binary format for Metadata). Binary coded representation is useful for
efficient storage and transmission of content.

• MPEG-7 data is obtained from transport or storage.
• It is handed to the delivery layer.
• The delivery layer extracts elementary streams (consisting of individually accessible chunks
called access units) by undoing the transport/storage-specific framing and multiplexing, and
retains the timing information needed for synchronisation.
• Elementary streams are forwarded to the compression layer, where the schema streams
(schemas describing the structure of MPEG-7 data) and partial or full description streams
(streams describing the content) are decoded.

The delivery layer sends user request streams to transmission/storage.
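The flow through the terminal described above can be sketched as two functions, one per layer. All names here (Packet, AccessUnit, delivery_layer, compression_layer) are hypothetical, and the "decoding" step simply treats payloads as UTF-8 text as a stand-in for real BiM or XML decoding.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """A multiplexed unit as it arrives from transport or storage."""
    stream_id: str
    timestamp: int
    payload: bytes


class AccessUnit(NamedTuple):
    """An individually accessible chunk of one elementary stream."""
    timestamp: int
    data: bytes


def delivery_layer(packets: Iterable[Packet]) -> Dict[str, List[AccessUnit]]:
    """Undo the multiplexing: group packets into elementary streams, keeping timing."""
    streams: Dict[str, List[AccessUnit]] = defaultdict(list)
    for packet in packets:
        streams[packet.stream_id].append(AccessUnit(packet.timestamp, packet.payload))
    for units in streams.values():
        units.sort(key=lambda unit: unit.timestamp)
    return dict(streams)


def compression_layer(
    streams: Dict[str, List[AccessUnit]]
) -> Dict[str, List[Tuple[int, str]]]:
    """Stand-in decoder: interpret each access unit as UTF-8 text."""
    return {
        stream_id: [(unit.timestamp, unit.data.decode("utf-8")) for unit in units]
        for stream_id, units in streams.items()
    }


packets = [
    Packet("schema", 0, b"<schema/>"),
    Packet("description", 20, b"second"),
    Packet("description", 10, b"first"),
]
decoded = compression_layer(delivery_layer(packets))
```

Keeping the timestamp on every access unit is what allows the decoded descriptions to be synchronised with the content later on.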

BiM: Binary format for Metadata


Transport e.g. MPEG-2; IP
Storage MP4 file

see “Overview of the MPEG-7 Standard” by S F Chang, T Sikora, A Puri.


Goal, context, open issues of the standard.

Part 2 ISO 15938-2 Description Definition Language


“MPEG-7 requirements
• Datatype definition
• D and description scheme declaration
• Attribute declaration
• Typed reference
• Content model
• Inheritance/subclassing mechanism
• Abstract D and description scheme
• Description scheme inclusion”


Part 3 ISO 15938-3 Visual


Part 3 specifies a set of standardised visual Ds and DSs including:
• Colour
o Color Space
o Color Quantization
o Dominant Color
o Scalable Color
o Color Layout
o Color Structure
o Group of Picture Color
• Texture
o Homogeneous Texture
o Texture Browsing
o Edge histogram
• Shape
o Region Shape
o Contour Shape
o Shape 3D
• Motion
o Camera Motion
o Motion Trajectory
o Parametric Motion
o Motion Activity
• Face recognition
• Localisation
Visual Ds often require other low-level Ds or support elements.

Low-level Ds/Support elements:
• structure (grid layout, spatial coordinates)
• viewpoint (multiple view)
• localisation (region locator)
• temporal (time series, temporal interpolation)

SpatioTemporal locator - Description Scheme for localisation of information. It is composed of
other DSs e.g. FigureTrajectory, ParameterTrajectory.
see “The MPEG-7 Visual Standard for Content Description – An Overview” by T Sikora.
“High-level overview of the organisation and components of the MPEG-7 Visual Standard.” 20

Basic Elements
‘Content Entity’:

Time

Media Time
Time measured or stored within the media.
Datatypes:

Datatype: <mediaTimePoint>
Represents: start time point
Syntax: -YYYY-MM-DDThh:mm:ss:nFN
Y = Year (use negative for BC dates)
M = Months
D = Days
T = separator
h = hours
m = minutes
s = seconds
n = number of fractions (e.g. ½ = 01F02)

Datatype: <mediaDuration>
Represents: duration
Syntax: (-)PnDTnHnMnSnNnF
P = separator, indicates start of a duration
D = days
T = separates time from days
H = hours
M = minutes
S = seconds
N = the counted fractions
F = the fractions of 1 second
Example: 2 minutes, 10 seconds and 25/30 of a second =
<MediaDuration>PT2M10S25N30F</MediaDuration>

20
Introduction to the Special Issue on MPEG-7

<MediaRelIncrTimePoint>

The data types represent ‘time periods’ using a ‘start time’

Generic/World Time
Time measured in the world.
Same as Media Time, but also contains Time Zone (TZ) information.
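The mediaDuration syntax given under Media Time, (-)PnDTnHnMnSnNnF, can be parsed with a short regular expression. The sketch below is an illustration based only on the syntax and example in this section, so it may not cover every form the standard allows; the function name parse_media_duration is an assumption. Following the example, nN is read as the number of counted fractions and nF as the number of fractions in one second.

```python
import re
from fractions import Fraction

# (-)PnDTnHnMnSnNnF, with every component optional.
_DURATION = re.compile(
    r"^(?P<sign>-)?P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?"
    r"(?:(?P<count>\d+)N)?(?:(?P<per_second>\d+)F)?)?$"
)


def parse_media_duration(text: str) -> Fraction:
    """Return the duration in seconds as an exact fraction."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"not a media duration: {text!r}")

    def number(name: str) -> int:
        value = match.group(name)
        return int(value) if value else 0

    total = Fraction(
        number("days") * 86400
        + number("hours") * 3600
        + number("minutes") * 60
        + number("seconds")
    )
    if match.group("count"):
        per_second = number("per_second")
        if per_second == 0:
            raise ValueError("a fraction count (N) needs fractions per second (F)")
        total += Fraction(number("count"), per_second)
    return -total if match.group("sign") else total


# The example from the text: 2 minutes, 10 seconds and 25/30 of a second.
example = parse_media_duration("PT2M10S25N30F")
```

Using Fraction keeps the 25/30 of a second exact, which matters when durations are later converted to frame counts.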

Graph

Relations
Internal
External

Text Annotation
Free text
Keyword
Structured

Dependency structure
Governor
Set of dependents

Classification Schemes and Terms


Term reference use
Inline term use
Free term use

Graphical classification schemes

People and places


Agents

Part 4 ISO 15938-4 Audio


Part 4 of the standard outlines a set of standardised audio Ds and DSs.
The audio Ds address four different types of audio signals:
• Pure music
• Pure speech
• Pure sound effects
• Arbitrary soundtracks

Examples of Audio features include:


• Silence
o SilenceType
• Spoken content (“representation of output of automatic speech recognition”)
o SpokenContentSpeakerType
o SpokenContentExtractionInfoType
o SpokenContentConfusionInfoType
o SpokenContentLinkType
• Timbre (“perceptual features of instrument sounds”)
o InstrumentTimbreType
o HarmonicInstrumentTimbreType
o PercussiveInstrumentTimbreType


• Sound effects
o AudioSpectrumBasisType
o SoundEffectFeaturesType

• Melody contour
o ContourType
o MeterType
o BeatType

Low-level D categories
ScalableSeries
SeriesofScalarType

Audio Description framework


AudioSampledType
AudioWaveformEnvelopeType

Part 5 ISO 15938-5 Multimedia Description Schemes


Multimedia Description Schemes (MDS) are “metadata structures for describing and annotating
audio-visual (AV) content” 21 . MPEG-7 descriptions can be created in two forms: textual (i.e.
XML) or compressed binary.

Both generic and more complex description tools have been standardised. Complex description
tools are used e.g. to describe audio and video at the same time.
MDS covers the following areas:

• Basic elements
• Content description
• Content management
• Content organisation
• Navigation & access
• User interaction e.g. user preferences and usage history.

Basic elements (lowest level)


MPEG-7 content description starts with a ‘root element’. Root elements can signify a partial or
complete description of the content. In a complete description, a top-level element follows the
root element.

Content management & Content Description


Builds on the Basic elements level.
Describes the content from several viewpoints:
• Content management (addresses primarily information related to the management of the content)
o creation and production
o media
o usage
• Content description (devoted to the description of perceivable information)
o structural aspects
o conceptual aspects
Navigation and Access tools are defined:
• Summaries
• Partitions and Decompositions
• Variations

User Interaction
User Preferences
Usage History

21
Overview of the MPEG-7 standard


Part 6 ISO 15938-6 Reference Software


The aim of Part 6 of the specification is to provide a reference implementation of the relevant parts
of the standard. It is known as the XM (experimentation software).

Part 7 ISO 15938-7 Extraction and use of MPEG-7 descriptions/Conformance

Guidelines and procedures for testing the conformance of MPEG-7 implementations.

End User Applications & Products


MPEG-7 applications can be:
• stored (on-line or off-line)
• streamed (broadcast, push models on the internet)
• Operated in real-time environments (description generated while content is captured)
• Operated in non real-time environments
MPEG-7 can be applied to a wide range of areas. Users can search for multimedia content on the
Internet or in any database containing such content, or find a piece of music by keying in a few notes.
Speeches by a particular person could be retrieved by inputting a few seconds of their
voice. Searches could also be carried out using an image to find similar images.
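Searching for a piece of music by keying in a few notes can be sketched with a simple melody contour. The version below reduces each tune to a string of up/down/same steps and looks for the query contour inside each catalogue entry. This is a simplification for illustration: it is not the contour representation defined in MPEG-7 Audio, and the tune names and pitch numbers are made up.

```python
from typing import Dict, List, Sequence


def contour(notes: Sequence[int]) -> str:
    """Reduce a sequence of pitch numbers to U (up), D (down), S (same) steps."""
    steps = []
    for previous, current in zip(notes, notes[1:]):
        if current > previous:
            steps.append("U")
        elif current < previous:
            steps.append("D")
        else:
            steps.append("S")
    return "".join(steps)


def search(query_notes: Sequence[int], catalogue: Dict[str, List[int]]) -> List[str]:
    """Return the titles whose contour contains the contour of the query."""
    wanted = contour(query_notes)
    return [title for title, notes in catalogue.items() if wanted in contour(notes)]


# Hypothetical catalogue of tunes given as MIDI-style pitch numbers.
catalogue = {
    "Tune A": [60, 62, 64, 62, 60],
    "Tune B": [60, 60, 67, 67, 69, 69, 67],
}

# The query is played an octave higher than Tune A but has the same shape.
matches = search([72, 74, 76, 74], catalogue)
```

Matching on contour rather than exact pitches means the user does not have to key in the notes in the original key.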
look up Avanti
ACTS
Yves Rocher database
An experimental photo and annotation retrieval tool. http://www.know-
center.at/en/divisions/div3demos.htm

References
‘Introduction to the Special Issue on MPEG-7’, IEEE Transactions on Circuits and Systems for Video
Technology, Vol. 11, No. 6, June 2001.

ISO/IEC JTC1/SC29/WG11 N, March 2000 ‘MPEG-7 Frequently Asked Questions’.


ISO/IEC JTC1/SC29/WG11 N3934, January 2001, ‘MPEG-7 Applications Document Version 10’

Bibliography
B S Manjunath, J R Ohm, V V Vasudevan, A Yamada. ‘Color and Texture Descriptors’. Discusses
specific Descriptors to describe colour and texture in visual scenes.

M Bober. ‘MPEG-7 Visual Shape Descriptors’. Discusses the Descriptors available to describe
shapes in visual scenes.

S Jeannin, A Divakaran. ‘MPEG-7 Visual Motion Descriptors’. Discusses the representation of
motion in visual scenes.

S Quackenbush, A Lindsay . ‘Overview of MPEG-7 Audio’. Presents a high level overview of the
organisation and components of the MPEG-7 Audio Standard.

JPA Charlesworth, PN Garner . ‘Spoken Content Representation in MPEG-7’. Introduces tools for
recognition of spoken content in MPEG-7 Audio.

M Casey . ‘MPEG-7 Sound Recognition Tools’. Tools for recognition of sounds included in MPEG-7
Audio.

P Salembier, JR Smith. ‘MPEG-7 Multimedia Description Schemes’. The description schemes for
multimedia content of the MPEG-7 standard.

O Avaro, P Salembier. ‘MPEG-7 Systems: Overview’. An overview of progress in MPEG-7 systems.


J Hunter. ‘An Overview of the MPEG-7 Description Definition Language (DDL)’. Introduces DDL

http://perso.enst.fr/~dufourd/mpeg-4/tools.html#1
http://www.iath.virginia.edu/inote/
http://www.ricoh.co.jp/src/multimedia/MovieTool/
http://www.mpeg-industry.com/
http://archive.dstc.edu.au/mpeg7-ddl/


MPEG-21
Introduction
ISO/IEC TR21000 ‘Information Technology – Multimedia Framework (MPEG-21)’ is not a standard but
a type 3 ‘Technical Report’. ‘Type 3’ means that the Joint Technical Committee has collected
information of a different kind from that normally published as an international standard. It is a
six-part document; however, additional parts may be added in the future.

Part 1 Vision, Technologies & Strategy


Part 2 Digital Item Declaration
Part 3 Digital Item Identification & Description
Part 4 Intellectual Property Management & Protection
Part 5 Rights Expression Language
Part 6 Rights Data Dictionary

The aim of MPEG-21 is:

• to describe how the elements that support the multimedia chain (e.g. protocols for interfaces),
whether existing or under development, fit together to form the infrastructure for the delivery
and consumption of multimedia content, and therefore create an ‘open framework’
for multimedia delivery and consumption.

• to recommend which new standards are required. The new standards will be developed by MPEG
and other standards bodies, who will collaborate with each other to integrate the standards into
the multimedia framework.

• to provide interfaces and protocols that will enable the creation, manipulation, search, access,
delivery and (re)use of content anywhere in the multimedia chain.

It will ensure equal opportunities for users and content creators as content will be interoperable.
It will enable the ‘transparent and augmented use of multimedia resources across a wide range of
networks and devices used by different communities’.
Ease of use.
Simplified and interoperable (perhaps automatic) transactions.
Efficient interaction.
‘The integration of critical technologies enabling…..’

Multimedia Framework
A multimedia framework will facilitate co-operation between different terminals and networks
and different communities. Communities consist of content creators, financial services,
communications, computer and consumer electronics sectors, customers etc.

Interoperability
Identification, management, protection of content.
Content can be transported over various terminals or devices.
Accurate and efficient event reporting and management of personal information, preferences and
privacy.
Automation, if possible.

The multimedia chain


Content creation, production, delivery, consumption. Content needs to be identified, described,
managed and protected. User privacy is also required.

Elements consist of:


Digital item declaration
Content handling & usage


The availability of and access to content is increasing. Standards are required to facilitate
searching, locating, caching, archiving, routing, distribution and use of content. Personalisation
and user profile management is also required so that the user enjoys a better experience and
businesses receive a better return.

Terminals and networks


The aim of MPEG-21 is to enable content to be used on all devices in a transparent manner. Devices
include set top boxes, mobile phones, television etc.

MPEG-21 aims to provide interoperable and transparent access to multimedia content ‘by shielding
users from network and terminal installation, management and implementation issues’. This will
improve the user experience.

Content representation
The content is:
Coded – the content is encoded into digital format. Currently, providers need to create different
versions of their content to be viewed on different media and at different bandwidths.
Identified, described, stored, delivered, protected, consumed etc.
Content representation means the content will be efficient, scalable, error resilient, interactive,
able to be synchronised and multiplexed.
Requirements: to provide technology so that all types of content can be efficiently represented.

Event reporting
The document should provide metrics and interfaces.
An event is an interaction. A report describes what occurred. Different observers may provide
different reports as they may have different views or opinions. Therefore, there is no standard way
of reporting an event. Types of event reporting required include effectiveness of financial
reporting, network service delivery, advertising. Standardise ‘metrics and interfaces’ for
performance of all reportable events in MPEG-21. A way of storing metrics and interfaces.

User Requirements
e.g. personalisation of content
tracking content and transactions
privacy
scalability

Work Plan
Part 2 – 2002
Part 3 – 2002
Part 4 – 2003-06-13
Part 5 – started in 2001, due May 2002.
Part 6 - ?
Part 7 – May 2002

Part 1- Vision, Technologies & Strategy


This part of the document discusses the vision for a multimedia framework. It aids ‘the integration
of components and standards’ which will in turn aid the harmonisation of technologies involved in
‘creation, management, manipulation, transport, distribution and consumption of content’. It also
provides a strategy for achieving a multimedia framework and discusses how, via collaboration with
various standards bodies, specifications and standards ‘based on well-defined functional
requirements can be developed’.

Part 2 - Digital Item Declaration


A ‘Digital Item’ is ‘a structured digital object with a standard representation, identification and
metadata’ 22 within the MPEG-21 framework. It is also ‘the fundamental unit of distribution and
transaction’ in the framework.

22 ‘Delivery Context in MPEG-21’ (Vetro and Devillers, 2002).

Although there are many different types of content, there is currently no standard way to represent
a digital item, so a definition of a digital item is required. For example, it is difficult to
determine whether a web (HTML) page with images and some JavaScript should be considered as one
digital item or as a number of digital items.
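
To make the question concrete, the Python sketch below declares the same web page in two ways. The class and field names (`Item`, `Resource`, `identifier`, `descriptors`, `sub_items`) and the `urn:example:` identifiers are illustrative assumptions, not the schema defined by MPEG-21. The point is only that an explicit declaration settles whether the page counts as one digital item or several.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """A single asset, e.g. an HTML file, an image or a script."""
    uri: str
    mime_type: str


@dataclass
class Item:
    """Hypothetical digital item: identifier, metadata, resources, sub-items."""
    identifier: str
    descriptors: Dict[str, str] = field(default_factory=dict)
    resources: List[Resource] = field(default_factory=list)
    sub_items: List["Item"] = field(default_factory=list)

    def count_items(self) -> int:
        """Number of items declared, including this one."""
        return 1 + sum(child.count_items() for child in self.sub_items)

    def all_resources(self) -> List[Resource]:
        """Every resource reachable from this item, depth first."""
        found = list(self.resources)
        for child in self.sub_items:
            found.extend(child.all_resources())
        return found


# Declaration A: the whole page is one digital item with three resources.
page_as_one = Item(
    identifier="urn:example:page",
    descriptors={"title": "Example page"},
    resources=[
        Resource("index.html", "text/html"),
        Resource("logo.png", "image/png"),
        Resource("menu.js", "application/javascript"),
    ],
)

# Declaration B: the page is a container whose image is an item in its own right.
page_as_several = Item(
    identifier="urn:example:page",
    descriptors={"title": "Example page"},
    resources=[
        Resource("index.html", "text/html"),
        Resource("menu.js", "application/javascript"),
    ],
    sub_items=[
        Item("urn:example:logo", {"title": "Logo"}, [Resource("logo.png", "image/png")]),
    ],
)
```

Both declarations reach the same three resources; they differ only in how many separately identifiable items they expose.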

MPEG-21 will ‘design a method for identification and description that is interoperable to provide for
support for’:
Accuracy, reliability and uniqueness of identification
Identification of all entities, of any type
The association of identifiers with Digital Items in a persistent and efficient manner
Security and integrity of identifiers and descriptions
Automated processing of rights, and automated location, retrieval and acquisition of content

Model
Representation
Schema

Part 3 - Digital Item Identification & Description


Currently, most multimedia content is not identified or described. Some identification systems
exist, e.g. ISBN (International Standard Book Number), but they are often specific to certain
media. There is also no way to ensure that an identification and description will remain associated
with the content, so efficient usage is not possible. The idea of digital item identification and
description is therefore to find a standard way to identify and describe content, making it easier
to manage and raising the value of the digital item within the multimedia framework. The
descriptive information needs to be organised and structured so that different versions of the
same media can be distinguished from each other. Examples of description systems currently under
development include MPEG-7 and ONIX International. Standard identification and description will
also enable IPMP, searching, cataloguing etc.

Part 4 - Intellectual Property Management & Protection


Different IPMP systems exist, but they are not interoperable, and neither are the associated
monitoring and detection systems. MPEG-21 aims to create a ‘uniform framework’ that enables
reliable management and protection of rights across different networks and devices. It will enable
access to and interaction with digital items, while also protecting the privacy of the user.

Part 5 - Rights Expression Language


Rights Expression Language (REL) is “a machine-readable language that can declare rights and
permissions using the terms as defined in the Rights Data Dictionary” 23.
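
As a rough illustration of that definition, the Python sketch below treats a rights expression as a grant whose verb must come from a shared dictionary of terms. The dictionary entries (`play`, `print`, `copy`), the `Grant` fields and the `is_permitted` helper are all hypothetical. They are not the syntax of the MPEG-21 REL or the contents of the Rights Data Dictionary.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in for a rights data dictionary: term -> agreed meaning.
RIGHTS_DICTIONARY = {
    "play": "render the item in a transient form",
    "print": "produce a fixed, human-readable copy",
    "copy": "make a digital duplicate of the item",
}


@dataclass(frozen=True)
class Grant:
    """One statement: this user may exercise this right on this item."""
    user: str
    right: str
    item_id: str

    def __post_init__(self) -> None:
        # Only terms defined in the dictionary may appear in an expression.
        if self.right not in RIGHTS_DICTIONARY:
            raise ValueError(f"undefined right: {self.right!r}")


def is_permitted(grants: List[Grant], user: str, right: str, item_id: str) -> bool:
    """True if some grant gives `user` the `right` on `item_id`."""
    return Grant(user, right, item_id) in grants


# A one-line "licence": alice may play the song, and nothing else.
licence = [Grant("alice", "play", "urn:example:song")]
```

Because every grant is validated against the dictionary, two systems that share the dictionary interpret `play` in the same way, which is the interoperability that the definition points to.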

Part 6 - Rights Data Dictionary

References
ISO/IEC TR 21000-1 (2001). ‘Information Technology – Multimedia Framework (MPEG-21) Part 1: Vision,
Technologies & Strategy’. ISO/IEC, Geneva.

Vetro, Anthony and Devillers, Sylvain (2002). ‘Delivery Context in MPEG-21’. Available from

Bormans, Jan and Hill, Keith (eds.) (2002). ‘MPEG-21 Overview’, Version 4. N4801. Fairfax.

23 N4801, ‘MPEG-21 Overview’.

Bibliography
http://www.dmdsecure.com/pdfs/DMDsecure_Solution_Overview.pdf

General Products
VideoLAN: an open source product for streaming MPEG-1, MPEG-2, MPEG-4 and other formats.
http://www.videolan.org/

References
Gwynne, Peter (2000). ‘MPEG standards stimulate fresh multimedia technologies’ [online].
Available from: http://www.spie.org/web/oer/october/oct00/cover2.html [Accessed 6 March 2003]

ISO/IEC. ‘Programme of Work’ [online]. Available from:
http://www.itscj.ipsj.or.jp/sc29/29w42911.htm [Accessed 2003]

[online]. Available from:
http://www.digitalnetworks.philips.com/InformationCenter/Global/FArticleSummary.asp?lNodeId=760&channel=760&channelId=N760A2171 webcine encoder [Accessed 2003]

[online]. Available from: http://www.europeanstreaming.com/mpeg4.htm [Accessed 2003]

[online]. Available from:
http://www.m4if.org/public/documents/vault/m4-out-20027.pdf?PHPSESSID=14ef2822738e70565e82511165b0ca14 [Accessed: 28 March 2003]

Appendix 1 MPEG-4 Profiles


Visual Profiles
Natural video content
1 Simple Visual Profile
Efficient and error resilient coding of rectangular objects. It forms the basis of all visual
profiles and is derived from the ITU H.263 specification. Applications on mobile networks e.g. PCS
and IMT2000.

2 Simple Scalable Visual Profile


Adds support to the Simple Visual Profile for the coding of temporally and spatially scalable
objects. Applications that provide services at different levels of quality e.g. Internet and
software decoding.

3 Core Visual Profile


Adds support to the Simple Visual Profile for the coding of arbitrary-shaped and temporally
scalable objects. Applications that provide relatively simple content interactivity e.g. Internet
multimedia applications.

4 Main Visual Profile


Adds support to the Core Visual Profile for coding of interlaced, semi-transparent and sprite
objects. Interactive and entertainment quality broadcast and DVD applications.

5 N-Bit Visual Profile


Adds support to the Core Visual Profile for the coding of video objects having pixel-depths
ranging from 4 to 12 bits. Surveillance applications.
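
The ‘adds support to’ relationships among these five natural video profiles can be read as set inclusion. The Python sketch below encodes them that way. The tool labels are shorthand taken from the descriptions in this appendix, not the normative object types of the standard, and `can_decode` is a hypothetical helper that ignores levels and other conformance details.

```python
# Tools each profile supports, built up from the "adds support to" wording above.
SIMPLE = frozenset({"rectangular", "error_resilience"})
SIMPLE_SCALABLE = SIMPLE | {"temporal_scalability", "spatial_scalability"}
CORE = SIMPLE | {"arbitrary_shape", "temporal_scalability"}
MAIN = CORE | {"interlaced", "semi_transparent", "sprite"}
N_BIT = CORE | {"n_bit_pixel_depth"}

PROFILES = {
    "Simple": SIMPLE,
    "Simple Scalable": SIMPLE_SCALABLE,
    "Core": CORE,
    "Main": MAIN,
    "N-Bit": N_BIT,
}


def can_decode(decoder_profile: str, content_profile: str) -> bool:
    """A decoder handles content whose tools are a subset of its own."""
    return PROFILES[content_profile] <= PROFILES[decoder_profile]
```

Under these assumptions a Main decoder handles Simple and Core content, while a Core decoder does not handle Simple Scalable content, because spatial scalability is missing from its set.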

Synthetic and synthetic/natural hybrid visual content


6 Simple Facial Animation Visual Profile
A simple means to animate a face model. Applications such as audio/video presentation for the
hearing impaired.

7 Scalable Texture Visual Profile


Spatial scalable coding of still image (texture) objects. Applications needing multiple scalability
levels, such as mapping texture onto objects in games, and high-resolution digital still cameras.

8 Basic Animated 2-D Texture Visual Profile


Spatial scalability, SNR scalability, mesh-based animation for still image (texture) objects and
also simple face object animation.

9 Hybrid Visual Profile


Combines the ability to decode arbitrary-shaped and temporally scalable natural video objects
(as in the Core Visual Profile) with the ability to decode several synthetic and hybrid objects.
Content-rich multimedia applications.

Natural video content (Version 2)


10 Advanced Real-Time Simple Profile (ARTS)
Advanced error resilient coding of rectangular video objects using a back channel, and improved
temporal resolution stability with low buffering delay. Real time coding applications e.g.
videophone, tele-conferencing and remote observation.

11 Core Scalable Profile

Adds support to the Core Visual Profile for the coding of temporally and spatially scalable
arbitrarily shaped objects. Internet, mobile and broadcast applications.

12 Advanced Coding Efficiency Profile (ACE)
Improves the coding efficiency for both rectangular and arbitrary-shaped objects. Mobile
broadcast reception, acquisition of image sequences (camcorders) and other applications where high
coding efficiency is required and a small footprint is not the prime concern.

Synthetic and synthetic/natural hybrid visual content (Version 2)


13 Advanced Scaleable Texture Profile
Supports decoding of arbitrary-shaped texture and still images. Fast random access.

14 Advanced Core Profile


Combines Core Visual Profile with Advanced Scaleable Texture Profile. Content-rich multimedia
applications.

15 Simple Face and Body Animation Profile


Superset of the Simple Facial Animation Visual Profile, adding body animation.

16 Advanced Simple Profile


Similar to Simple Visual Profile, but with B-frames, ¼ pel motion compensation, extra
quantisation table and global motion compensation.

17 Fine Granularity Scalability Profile


Allows truncation of the enhancement layer bitstream at any bit position so that delivery
quality can easily adapt to transmission and decoding circumstances. Can be used with Simple
or Advanced Simple as a base layer.
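
The truncation property can be illustrated with bit-plane coding, the technique usually associated with fine granularity scalability. The Python sketch below is a toy under stated assumptions: it works on a handful of non-negative integer residuals rather than on transform coefficients, it applies no entropy coding, and the function names and sample values are invented for illustration.

```python
from typing import List


def encode_bitplanes(residuals: List[int], planes: int) -> List[int]:
    """Emit the residuals plane by plane, most significant bits first."""
    bits = []
    for plane in range(planes - 1, -1, -1):
        for value in residuals:
            bits.append((value >> plane) & 1)
    return bits


def decode_bitplanes(bits: List[int], count: int, planes: int) -> List[int]:
    """Rebuild `count` residuals from however many bits arrived."""
    values = [0] * count
    for index, bit in enumerate(bits):
        plane = planes - 1 - index // count
        values[index % count] |= bit << plane
    return values


def reconstruct(base: List[int], residuals: List[int],
                planes: int, received_bits: int) -> List[int]:
    """Base layer plus whatever part of the enhancement layer was received."""
    stream = encode_bitplanes(residuals, planes)[:received_bits]
    partial = decode_bitplanes(stream, len(residuals), planes)
    return [b + r for b, r in zip(base, partial)]


original = [13, 7, 2, 10]
base = [8, 0, 0, 8]  # coarse base layer: each sample rounded down to a multiple of 8
residuals = [o - b for o, b in zip(original, base)]  # each fits in 3 bit-planes

# The enhancement stream can be cut after any number of bits, from 0 to 12.
for received in (0, 5, 12):
    print(received, reconstruct(base, residuals, 3, received))
```

Because the most significant bits are sent first, every extra bit received either leaves the reconstruction unchanged or moves it closer to the original, so quality degrades gradually as the stream is cut shorter.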

18 Simple Studio Profile


Has only I frames. Supports arbitrary shape and multiple alpha channels. Bitrates of up to nearly
2 Gbit/s. Studio editing applications.

19 Core Studio Profile


More efficient than Simple Studio Profile, as it adds P frames, but requires more complex
implementations.

Audio Profiles
MPEG-4 Version 1
1 Speech Profile
Provides HVXC (a very-low bit-rate parametric speech coder), a CELP narrowband/wideband
speech coder and a Text-To-Speech interface.

2 Synthesis Profile
Provides score driven synthesis using SAOL and wavetables and a Text-to-Speech interface.

3 Scalable Profile
Superset of the Speech Profile. Scalable coding of speech and music. Bitrates range from 6 to 24
kbit/s, with bandwidths between 3.5 and 9 kHz. Networks e.g. Internet and Narrow band Audio
Digital Broadcasting (NADIB).

4 Main Profile
Rich superset of all the other Profiles. Contains tools for natural and synthetic audio.

MPEG-4 Version 2
5 High Quality Profile
Contains the CELP speech coder and the Low Complexity AAC coder, including Long Term Prediction.
Scalable coding can be performed by the AAC Scalable object type. The Error Resilient (ER)
bitstream syntax may optionally be used.

6 Low Delay Profile


Contains the HVXC and CELP speech coders, the Low-delay AAC coder and the Text-to-Speech interface
(TTSI). Use of the ER bitstream syntax is optional.

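7 Natural Audio Profile
Contains all natural audio coding tools available in MPEG-4, but not the synthetic ones.

8 Mobile Audio Internetworking (MAUI) Profile
Contains the low-delay and scalable AAC object types, including TwinVQ and BSAC.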

Graphic Profiles
1 Simple 2D Graphics Profile
2 Complete 2D Graphics Profile
3 Complete Graphics Profile
4 3D Audio Graphics Profile

Profiles Under Definition or Consideration:


5 Simple 2D+Text Profile
6 Core 2D Profile
7 Advanced 2D Profile
8 X3D Core Profile

Scene Graph Profiles (Scene Description Profiles)


1 Audio
2 Simple 2D
3 Complete 2D
4 Complete
5 3D Audio

Profiles under definition


6 Basic 2D Profile
7 Core 2D Profile
8 Advanced 2D Profile
9 Main 2D Profile
10 X3D Core Profile

MPEG-J Profiles
1 Personal
2 Main
