Beruflich Dokumente
Kultur Dokumente
3GPP2 C.S0050-0
Version 1.0
Date: 12 December 2003
COPYRIGHT
3GPP2 and its Organizational Partners claim copyright in this document and individual
Organizational Partners may copyright and issue documents or standards publications
in individual Organizational Partner's name based on this document. Requests for
reproduction of this document should be directed to the 3GPP2 Secretariat at
secretariat@3gpp2.org. Requests to reproduce individual Organizational Partner's
documents should be directed to that Organizational Partner. See www.3gpp2.org for
more information.
5
6
C.S0050-0 v1.0
1
2
1 Contents
Contents ............................................................................................................................................2
Scope.................................................................................................................................................5
References.........................................................................................................................................5
Abbreviations....................................................................................................................................6
Introduction.......................................................................................................................................7
10
11
12
13
14
15
8.1
Conformance ......................................................................................................................8
File identification
8
8.1.1
Registration of codecs
9
8.1.2
Interpretation of 3GPP2 file format
9
8.1.3
Limitation of the ISO base media file format
9
8.1.4
16
17
18
8.2
19
20
21
8.3
Video ................................................................................................................................11
MPEG-4 Video
11
8.3.1
H.263
12
8.3.2
22
23
24
25
26
27
8.4
28
29
30
8.5
31
32
33
34
35
36
C.S0050-0 v1.0
1
10
11
3
4
11.1
11.2
11.3
Definition of a cmf-header................................................................................................32
11.4
8
9
10
11
11.5
Tables ...............................................................................................................................40
40
11.5.1 TimeBase Values
41
11.5.2 Pitch Bend Range Values
41
11.5.3 Fine Pitch Bend Values
12
13
14
15
11.6
16
17
18
19
20
21
22
11.8
23
24
25
26
27
28
29
30
31
32
33
34
35
2 List of Figures
36
37
38
39
C.S0050-0 v1.0
1
Figure B 1: Hinted Presentation for Streaming (Reprint from ISO/IEC 14496-12) ....... 46
7
8
9
3 List of Tables
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Table 9-1: Content model for the 3GPP2 SMIL profile ................................................. 31
32
33
34
35
C.S0050-0 v1.0
4 Scope
2
3
4
5
6
7
The objective is to define and standardize a set of common file formats to be used in
multimedia services such as Multimedia Streaming Service (MSS) and Multimedia
Messaging Service (MMS) and to provide interoperability with existing 3G and the
Internet multimedia services to the greatest extent possible. The specific media types
and descriptions to be covered include: video, audio, images, graphics, high fidelity
audio as well as presentation layout and synchronization.
5 References
10
11
12
3. ISO/IEC 14496-12:2003 Information technology Coding of audiovisual-objects Part 12: ISO base media file format.
13
14
4. ISO/IEC 14496-14:2003, Information technology Coding of audiovisual-objects Part 14: MP4 file format.
15
16
17
18
19
20
21
22
23
24
25
26
27
28
12. IETF RFC 2658: RTP Payload Format for PureVoice(tm) Audio.
29
30
13. IETF RFC 3558: RTP Payload Format for Enhanced Variable Rate
Codecs (EVRC) and Selectable Mode Vocoders (SMV), June, 2003.b
31
32
33
14. IETF RFC 3267: RTP payload format and file storage format for the
Adaptive Multi-Rate (AMR) Adaptive Multi-Rate Wideband (AMR-WB)
audio codecs, March 2002.
34
35
15. C.S0020-0: High Rate Speech Service Option 17 for Wideband Spread
Spectrum Communications Systems
36
37
16. C.S0030-0 Version 2.0 Selectable Mode Vocoder Service Option for
Wideband Spread Spectrum Communication Systems.
38
39
40
C.S0050-0 v1.0
1
2
3
4
5
20. ITU-T Recommendation H.263 (annex X): "Annex X: Profiles and levels
definition".
21. IETF RFC 2234: Augmented BNF for Syntax Specification: ABNF.
7
8
22. IETF RFC 3625: The QCP File Format and Media Types for Speech
Data.
9
10
11
12
13
24. "Standard MIDI Files 1.0", RP-001, "The Complete MIDI 1.0 Detailed
Specification, Document Version 96.1" The MIDI Manufacturers
Association, Los Angeles, CA, USA, February 1996.
14
15
16
17
18
26. IETF RFC 2083: "PNG (Portable Networks Graphics) Specification version
1.0 ".
19
20
21
22
23
24
25
26
27
28
29
30
31
6 Abbreviations
32
33
3G
3GP
3GPP
3GPP2
AAC
AMR
Adaptive Multi-Rate
AMR-WB
BMP
C.S0050-0 v1.0
CMF
EVRC
IETF
IP
Internet Protocol
ISO
ITU-T
JPEG
MIDI
MIME
MMA
MPEG
QCELP
PNG
RFC
RTP
SMIL
SMV
1
2
7 Introduction
3
4
5
6
7
8
The purpose of this standard is to define a set of file formats to be used with 3GPP2
multimedia services. Among these file formats is a new format designated as the 3GPP2
file format or .3g2 file format. It is the recommended format to use and can contain
multiple media types (such as, video, audio, and timed text). Also included in this
release are a file format for 13K [15], .qcp, a presentation and layout description
language and file format, .smi, and a compact multimedia file format, .cmf.
10
11
12
13
14
15
16
17
This document does not specify how this file format is used in specific services. In other
words, it should be expressly mandated, if necessary, in other service specifications
whether to use this specification.
C.S0050-0 v1.0
2
3
4
5
6
The purpose of this section is to define the 3GPP2 file format for multimedia services.
This file format is based on the ISO base media file format [3]. Also, it adopts the
methodology defined in [5] to integrate necessary structures for inclusion of non-ISO
codecs such as H.263 [8], AMR [9], AMR-WB [10], and extends this approach to include
3GPP2 specific codecs: EVRC [11], SMV [16] and 13K Speech (QCELP) [15].
8.1 Conformance
8
9
10
11
12
13
14
15
16
The 3GPP2 file format, used in the specification for timed media (such as video, audio,
and timed-text), is structurally based on the ISO base media file format defined in [3].
However, the conformance statement for 3GPP2 files is defined in the present document
by addressing file identification (file extension, brand identifier and MIME type
definition) and registration of codecs.
NOTE:
Future releases may expand the conformance statement for 3GPP2 files to include more codecs
or functionalities by defining new boxes. Boxes of unknown type in the 3GPP2 file shall be
ignored.
17
18
19
20
21
22
23
3GPP2 multimedia files can be identified using several mechanisms. When stored in
traditional computer file systems, these files should be given the file extension .3g2
(readers should allow mixed case for the alphabetic characters). The following MIME
types are expected to be registered and should be used: video/3gpp2 (for visual or
audio/visual content, where visual includes both video and timed text) and
audio/3gpp2 (for purely audio content).
24
25
26
27
A file-type box Table 8-1, as defined in the ISO Base Media File Format [3] shall be
present in conforming files. The file type box ftyp shall occur before any variable-length
box (e.g. movie, free space, media data). Only a fixed-size box such as a file signature,
if required, may precede it.
28
29
30
31
32
33
34
The brand identifier for this specification is '3g2a'. This brand identifier shall occur in
the compatible brands list, and may also be the primary brand. Readers should check
the compatible brands list for the identifiers they recognize, and not rely on the file
having a particular primary brand, for maximum compatibility. Files may be
compatible with more than one brand, and have a 'best use' other than this
specification, yet still be compatible with this specification.
Field
BoxHeader.Size
BoxHeader.Type
Brand
MinorVersion
CompatibleBrands
35
36
37
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(32)
Unsigned int(32)
Unsigned int(32)
Details
Value
'ftyp'
C.S0050-0 v1.0
1
be 3g2a.
2
3
4
5
6
MinorVersion: This identifies the minor version of the brand. Files with brand '3g2X',
where X is an alphabet, shall have a corresponding releaseX.Y.Z such that X = 1
when the brand equals A; X = 2 when the brand equals B; and so on. A conforming
minor version value for releaseX.y.z uses the byte aligned and right adjusted value
of releaseX*2562 + y*256+z.
7
8
9
10
CompatibleBrands: a list of brand identifiers (to the end of the Box). 3g2a shall be a
member of this list. The brand compatibility list shall include major brands 3gp4
and/or 3gp5 as described in [5] and [6] when the file content meets the conditions
defined in Annex A section A.3.
11
12
13
14
15
In 3GPP2 files, MPEG-4 video and MPEG-4 AAC audio streams, and other ISO codec
streams as well as the non-ISO media streams such as AMR narrow-band speech, AMR
wide-band speech, EVRC speech, H.263 video, 13K speech, SMV speech and timed text
can be included as described in this specification.
16
17
18
19
20
All index numbers used in the 3GPP2 file format start with the value one rather than
zero, in particular first-chunk in Sample to chunk box, sample-number in Sync
sample box and shadowed-sample-number, sync-sample-number in Shadow sync
sample box.
21
22
The following limitation to the ISO base media file format [3] shall apply to a 3GPP2 file:
23
24
A 3GPP2 file shall be self-contained, i.e., there shall not be references to external media
data from inside the 3GPP2 file.
25
26
27
8.2.1 Overview
28
29
30
31
32
33
34
The purpose of this section is to give some background information about the Sample
Description Box in the ISO base media file format [3]. The following sections define the
necessary structure for integration of the MPEG-4 video, H.263 video, AMR speech,
AMR-WB speech, MPEG-4 AAC audio, 13K speech, EVRC speech, and SMV speech
media specific information in a 3GPP2 file. Section 8.3 and 8.4 describe Sample Entries
for video and audio media respectively. Section 8.5 describes TextSampleEntry for the
timed text.
35
36
37
38
In an ISO file, Sample Description Box gives detailed information about the coding type
used, and any initialization information needed for that coding. The Sample Description
Box can be found in the ISO file format Box Structure Hierarchy shown in Figure 8-1.
C.S0050-0 v1.0
Movie Box
Track Box
Media Box
1
2
3
4
5
The Sample Description Box can have one or more Sample Entry Boxes. Valid Sample
Entry Boxes already defined for ISO [5] and MP4 [5] include MP4AudioSampleEntry,
MP4VisualSampleEntry, and HintSampleEntry.
In addition, the Sample Entry Box for H.263 video shall be H263SampleEntry.
The Sample Entry Box for AMR and AMR-WB speech shall be AMRSampleEntry.
9
10
The Sample Entry Box for 13K (QCELP) speech shall be QCELPSampleEntry or
MP4AudioSampleEntry.
11
12
13
14
15
10
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
SampleEntry ::=
MP4VisualSampleEntry |
MP4AudioSampleEntry |
H263SampleEntry |
AMRSampleEntry |
EVRCSampleEntry |
QCELPSampleEntry |
SMVSampleEntry |
TextSampleEntry |
HintSampleEntry
Field
MP4VisualSampleEntry
Type
MP4AudioSampleEntry
H263SampleEntry
AMRSampleEntry
EVRCSampleEntry
QCELPSampleEntry
SMVSampleEntry
TextSampleEntry
HintSampleEntry
12
Details
Entry type for visual samples defined in
the MP4 specification [5].
Entry type for audio samples defined in
the MP4 specification [5].
Entry type for H.263 visual samples
defined in section 8.3.2 of the present
document.
Entry type for AMR and AMR-WB
speech samples defined in section 8.4
of the present document.
Entry type for EVRC speech samples
defined in section 8.4 of the present
document.
Entry type for 13k (QCELP) speech
samples defined in section 8.4 of the
present document.
Entry type for SMV speech samples
defined in section 8.4 of the present
document.
Entry type for timed text samples
defined in section 8.5 of the present
document.
Entry type for hint track samples
defined in the ISO specification [5].
Value
13
8.3 Video
14
15
16
17
NOTE:
18
19
The MP4VisualSampleEntry Box in the MPEG-4 file format [4] is defined as follows:
20
21
22
Throughout this document MPEG-4 Visual is referred to as MPEG-4 video, which should be
taken to mean encoding of natural (pixel based) video using MPEG-4 Visual methods.
11
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
Data-reference-index
Reserved_16
Width
Height
Reserved_4
Reserved_4
Reserved_4
Reserved_2
Reserved_32
Reserved_2
Reserved_2
ESDBox
Field
BoxHeader.Size
BoxHeader.Type
Reserved_6
Data-reference-index
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(8) [6]
Unsigned int(16)
Reserved_16
Width
Height
Unsigned int(16)
Reserved_4
Reserved_4
Reserved_4
Reserved_2
Reserved_32
Reserved_2
Reserved_2
ESDBox
Details
Value
'mp4v'
0
14
15
16
17
This version of the MP4VisualSampleEntry, with explicit width and height, shall be
used for MPEG-4 video streams conformant to this specification.
18
19
20
NOTE:
21
8.3.2 H.263
22
23
24
Width and height parameters together may be used to allocate the necessary memory in the
playback device without need to analyse the video stream.
25
12
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Field
BoxHeader.Size
BoxHeader.Type
Reserved_6
Data-reference-index
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(8) [6]
Unsigned int(16)
Reserved_16
Width
Height
Unsigned int(16)
Reserved_4
Reserved_4
Reserved_4
Reserved_2
Reserved_32
Reserved_2
Reserved_2
H263SpecificBox
Details
Value
's263'
0
16
17
18
19
20
If one compares the MP4VisualSampleEntry box to the H263SampleEntry box the main
difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems,
with a Box suitable for H.263: H263SpecificBox. The H263SpecificBox field structure
for H.263 is described in Section 8.3.2.2.
21
22
23
24
The H263SpecificBox fields for H.263 shall be as defined in Table 8-5. The
H263SpecificBox for the H263SampleEntry Box shall always be included if the 3GPP2
file contains H.263 media.
13
C.S0050-0 v1.0
1
Type
Unsigned int(32)
Unsigned int(32)
H263DecSpecStruc
Details
Value
d263
3
4
BoxHeader Size and Type: indicate the size and type of the H.263 decoder-specific
Box. The type must be d263.
5
6
DecSpecificInfo: This is the structure where the H263 stream specific information
resides.
8
9
10
11
12
13
14
struct H263DecSpecStruc{
Unsigned int (32)
Unsigned int (8)
Unsigned int (8)
Unsigned int (8)
}
15
16
17
18
19
20
21
vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor
field gives information about the vendor whose codec is used to create the encoded
data. It is an informative field which may be used by the decoding end. If a vendor
already has a four character code, that code should be used in this field. Otherwise, a
vendor may create a four character code which best expresses the vendors name. This
field may be ignored.
22
23
24
25
26
decoder_version: version of the vendors decoder which can decode the encoded
stream in the best (i.e. optimal) way. This field is closely associated with the vendor
field. It may be used advantageously by vendors which have optimal encoder-decoder
version pairs. The value shall be set to 0 if the decoder version has no importance for
the vendor. This field may be ignored.
27
28
29
H263_Level and H263_Profile: These two parameters define which H263 profile and
level is used. These parameters are based on the MIME media type, video/H263-2000.
The profile and level specifications can be found in [20].
30
31
EXAMPLE 1:
32
EXAMPLE 2:
33
34
NOTE: The "hinter", for the creation of the hint tracks, can use the information given by
the H263DecSpecStruc members.
35
36
vendor
decoder_version
H263_Level
H263_Profile
14
C.S0050-0 v1.0
2
3
5
6
7
8
9
10
11
12
13
14
15
16
MP4AudioSampleEntry::= BoxHeader
Reserved_6
Data-reference-index
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
ESDBox
Field
BoxHeader.Size
BoxHeader.Type
Reserved_6
Data-reference-index
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(8) [6]
Unsigned int(16)
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
ESDBox
17
Details
Value
'mp4a'
0
18
19
20
21
8.4.2 AMR
22
23
24
AMR and AMR-WB speech data are stored in the stream according to the AMR and
AMR-WB storage format for single channel header [14], without the AMR magic
numbers.
25
26
27
For narrow-band AMR, the box type of the AMRSampleEntry Box shall be 'samr'. For
AMR wideband (AMR-WB), the box type of the AMRSampleEntry Box shall be 'sawb'.
15
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(8) [6]
Unsigned int(16)
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
AMRSpecificBox
Details
Value
'samr' or sawb
0
14
15
16
17
18
If one compares the MP4AudioSampleEntry Box to the AMRSampleEntry Box the main
difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems,
with a box suitable for AMR and AMR-WB. The AMRSpecificBox field structure is
described in section 8.4.2.2.
19
20
21
22
23
The AMRSpecificBox fields for AMR and AMR-WB speech shall be as defined in Table
8-8. The AMRSpecificBox for the AMRSampleEntry Box shall always be included if the
3GPP2 file contains AMR or AMR-WB media.
Field
BoxHeader.Size
BoxHeader.Type
DecSpecificInfo
Type
Unsigned int(32)
Unsigned int(32)
AMRDecSpecStruc
Details
Value
damr
24
25
26
BoxHeader Size and Type: indicate the size and type of the AMR decoder-specific Box.
The type must be damr.
16
C.S0050-0 v1.0
1
2
DecSpecificInfo: the structure where the AMR and AMR-WB stream specific
information resides.
4
Field
vendor
decoder_version
mode_set
mode_change_period
frames_per_sample
5
6
Type
Unsigned int(32)
Unsigned int(8)
Unsigned int(16)
Unsigned int(8)
unsigned int(8)
Details
Value
7
8
9
10
11
12
vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor
field gives information about the vendor whose codec is used to create the encoded
data. It is an informative field which may be used by the decoding end. If a
manufacturer already has a four character code it should be used in this field.
Otherwise, a vendor may create a four character code which best expresses the
vendors name. This field may be ignored.
13
14
15
16
17
decoder_version: version of the vendors decoder which can decode the encoded
stream in the best (i.e. optimal) way. This field is closely associated with the vendor
field. It may be used advantageously by vendors, which have optimal encoder-decoder
version pairs. The value shall be set to 0 if the decoder version has no importance for
the vendor. This field may be ignored.
18
19
20
21
22
mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to
one mode. The bit index of the mode is calculated according to the 4 bit FT field of the
AMR or AMR-WB frame structure. The mode_set bit structure is as follows:
(B15xxxxxxB8B7xxxxxxB0) where B0 (Least Significant Bit) corresponds to Mode 0, and
B8 corresponds to Mode 8.
23
24
25
The mapping of existing AMR modes to FT is given in table 1.a in [17]. A value of
0x81FF means all modes and comfort noise frames are possibly present in an AMR
stream.
26
27
28
The mapping of existing AMR-WB modes to FT is given in Table 1.a in [18]. A value of
0x83FF means all modes and comfort noise frames are possibly present in an AMR-WB
stream.
29
30
31
32
33
34
35
36
frames_per_sample = k x (mode_change_period)
37
38
mode_change_period = k x (frames_per_sample)
39
17
C.S0050-0 v1.0
1
2
If mode_change_period is equal to frames_per_sample, then the mode is the same for all
frames inside one sample.
3
4
5
6
7
8
9
10
11
12
13
NOTE: The "hinter", for the creation of the hint tracks, can use the information given by the
AMRDecSpecStruc members.
14
8.4.3 EVRC
15
16
17
18
19
EVRC speech data shall be stored inside of a media track in such a way that is
described in Section 11 of [13]. The magic number shall not be included. The codec
data frames are stored in a consecutive order with a single TOC entry field as a prefix
per each of data frame, where the TOC field is extended to one octet by setting the four
most significant bits of the octet to zero, as illustrated in the following figure.
20
21
22
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+
23
|0|0|0|0|FR Type|
24
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+
25
26
27
For EVRC, the Box type of the EVRCSampleEntry Box shall be 'sevc'.
28
29
30
31
32
33
34
35
36
37
38
18
C.S0050-0 v1.0
1
EVRCSpecificBox
Field
BoxHeader.Size
BoxHeader.Type
Reserved_6
Data-reference-index
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(8) [6]
Unsigned int(16)
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
EVRCSpecificBox
Details
Value
'sevc
0
3
4
5
6
If one compares the MP4AudioSampleEntry Box to the EVRCSampleEntry Box the main
difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems,
with a box suitable for EVRC speech. The EVRCSpecificBox field structure is described
in section 8.4.3.2.
8
9
10
11
The EVRCSpecificBox fields for EVRC shall be as defined in Table 8-11. The
EVRCSpecificBox for the EVRCSampleEntry Box shall always be included if the 3GPP2
file contains EVRC media.
Field
BoxHeader.Size
BoxHeader.Type
DecSpecificInfo
12
Type
Unsigned int(32)
Unsigned int(32)
EVRCDecSpecStruc
Details
Value
devc
13
14
BoxHeader Size and Type: indicates the size and type of the EVRC decoder-specific
Box. The type shall be devc.
15
16
DecSpecificInfo: the structure where the EVRC stream specific information resides.
The EVRCDecSpecStruc is defined as follows:
17
Field
vendor
decoder_version
frames_per_sample
18
Type
Unsigned int(32)
Unsigned int(8)
Unsigned int(8)
Details
Value
19
20
21
22
23
vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor
field gives information about the vendor whose codec is used to create the encoded
data. It is an informative field which may be used by the decoding end. If a
manufacturer already has a four character code it should be used in this field.
19
C.S0050-0 v1.0
1
2
Otherwise, a vendor may create a four character code which best expresses the
vendors name. This field may be ignored.
3
4
5
6
7
decoder_version: version of the vendors decoder which can decode the encoded
stream in the best (i.e. optimal) way. This field is closely associated with the vendor
field. It may be used advantageously by vendors, which have optimal encoder-decoder
version pairs. The value shall be set to 0 if the decoder version has no importance for
the vendor. This field may be ignored.
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
13K speech data shall be stored inside of a media track in the same way for codec data
frame format as described in Section 3.2 of [12]. Each codec data frame is zero-padded
to become of multiple of octets and the frames are stored in a consecutive order, as
illustrated in the following figure. Here z is the stuffing bit used to keep byte
alignment; its value is 0.
25
26
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+
27
28
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+
Rate
|z|z|
29
30
31
32
For 13K, the box type of the QCELPSampleEntry Box shall be 'sqcp'.
33
34
35
36
37
38
39
40
41
42
43
QCELPSampleEntry::= BoxHeader
Reserved_6
Data-reference-index
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
20
C.S0050-0 v1.0
1
2
QCELPSpecificBox
Field
BoxHeader.Size
BoxHeader.Type
Reserved_6
Data-reference-index
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(8) [6]
Unsigned int(16)
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
QCELPSpecificBox
Details
Value
'sqcp
0
4
5
6
7
9
10
11
12
The QCELPSpecificBox fields for 13K speech shall be as defined in Table 8-14. The
QCELPSpecificBox for the QCELPSampleEntry Box shall always be included if the
3GPP2 file contains 13K speech media.
Field
BoxHeader.Size
BoxHeader.Type
DecSpecificInfo
13
Type
Unsigned int(32)
Unsigned int(32)
QCELPDecSpecStruc
Details
Value
dqcp
14
15
BoxHeader Size and Type: indicate the size and type of the 13k decoder-specific Box.
The type must be dqcp.
16
17
DecSpecificInfo: the structure where the 13K speech stream specific information
resides. The QCELPDecSpecStruc is defined as follows:
18
Field
vendor
decoder_version
frames_per_sample
19
20
Type
Unsigned int(32)
Unsigned int(8)
Unsigned int(8)
Details
Value
21
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor
field gives information about the vendor whose codec is used to create the encoded
data. It is an informative field, which may be used by the decoding end. If a
manufacturer already has a four character code, it is recommended that it uses the
same code should be used in this field. Otherwise, a vendor may create a four character
code which best expresses the vendors name. Else, it is recommended that the
manufacturer creates a four character code which best addresses the manufacturers
name. This field may be safely ignored.
9
10
11
12
13
decoder_version: version of the vendors decoder which can decode the encoded
stream in the best (i.e. optimal) way. This field is closely associated with the vendor
field. It may be used advantageously by the vendors, which have optimal encoderdecoder version pairs. The value shall be set to 0 if the decoder version has no
importance for the vendor. This field may be safely ignored.
14
15
16
17
18
19
20
21
22
23
24
25
For storage of 13K speech media, MP4AudioSampleEntry also can be used. 13K speech
data shall be stored inside of a media track in the same way as described in [22]
26
27
28
29
When storing 13K speech bitstreams in a 3GPP2 file, the handler-type field within the
HandlerAtom shall be set to soun to indicate media of type AudioStream, and the
SampleEntry Box type shall be 'mp4a' and the same Box described in Section 8.4.1.1 is
used.
30
31
32
33
For inclusion of 13K speech media in MP4AudioSampleEntry, the stream type specific
information is in the ESDBox structure. The 13K speech codec is to be signaled by new
value from the User Public area of objectTypeIndication within the
DecoderConfigDescriptor structure.
34
objectTypeIndication = 0xE1
35
36
37
38
39
40
41
The above ABNF rule indicates that QCELPDecoderSpecificInfo is the same as the
header for 13K vocoder in .qcp file format as described in [22], but without RIFF, riffsize, or anything after fmt. In addition, if the size of packets is completely constant i.e.
fixed rate encoding, the following rules apply to the definition of fmt (see variable-rate
referenced by codec-info).
42
num-rates = 0
43
rate-map = 0
44
major
=1
22
C.S0050-0 v1.0
1
2
3
4
5
QCELPSampleEntry
MP4AudioSampleEntry
Vendor
decoder_version
framesPerSample (fPS)
N.A.
AvgBitsPerSec (aBPS)
Calculated based on duration field in Track Header Box and data
size in Sample Size Box.
N.A.
bytesPerBlock (bPB)
Calculated according to the equation: aBPS = bPB * 8(bits/Byte) /
(0.02 * fPS)
N.A.
samplePerBlock (sPB)
sPB = sPS * 0.02 * fPS
N.A.
6
7
8
9
10
11
12
13
8.4.5 SMV
SMV speech data shall be stored inside of a media track in such a way that is described
in Section 11 of [13]. The magic number is not included. The codec data frames are
stored in a consecutive order with a single TOC entry field as a prefix per each of data
frame, where the TOC field is extended to one octet by setting the four most significant
bits of the octet to zero, as illustrated in the following figure.
14
15
16
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+
17
|0|0|0|0|FR Type|
18
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+
19
20
23
C.S0050-0 v1.0
1
For SMV speech, the box type of the SMVSampleEntry Box shall be 'ssmv'.
4
5
6
7
8
9
10
11
12
13
14
15
Type
Unsigned int(32)
Unsigned int(32)
Unsigned int(8) [6]
Unsigned int(16)
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
SMVSpecificBox
16
Details
Value
'ssmv
0
17
18
19
20
If one compares the AudioSampleEntry Box for the SMVSampleEntry Box the main
difference is in the replacement of the ESDBox, which is specific to MPEG-4 systems,
with a box suitable for SMV. The SMVSpecificBox field structure is described in
section 8.4.5.2.
21
22
23
24
The SMVSpecificBox fields for SMV shall be as defined in Table 8-18. The
SMVSpecificBox for the SMVSampleEntry Box shall always be included if the MP4 file
contains SMV media.
Field
BoxHeader.Size
BoxHeader.Type
DecSpecificInfo
Type
Unsigned int(32)
Unsigned int(32)
SMVDecSpecStruc
Details
Value
dsmv
25
26
27
BoxHeader Size and Type: indicate the size and type of the SMV decoder-specific Box.
The type must be dsmv.
24
C.S0050-0 v1.0
1
DecSpecificInfo: the structure where the SMV stream specific information resides.
3
Field
Vendor
decoder_version
Frames_per_sample
Type
Unsigned int(32)
Unsigned int(8)
Unsigned int(8)
Details
Value
6
7
8
9
10
11
12
13
vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. The vendor
field gives information about the vendor whose codec is used to create the encoded
data. It is an informative field, which may be used by the decoding end. If a
manufacturer already has a four character code, it is recommended that it uses the
same code should be used in this field. Otherwise, a vendor may create a four character
code which best expresses the vendors name. Else, it is recommended that the
manufacturer creates a four character code which best addresses the manufacturers
name. This field may be safely ignored.
14
15
16
17
18
decoder_version: version of the vendors decoder which can decode the encoded
stream in the best (i.e. optimal) way. This field is closely associated with the vendor
field. It may be used advantageously by the vendors, which have optimal encoderdecoder version pairs. The value shall be set to 0 if the decoder version has no
importance for the vendor. This field may be safely ignored.
19
20
21
22
23
24
25
26
27
28
29
30
31
For timed text in downloaded files, 3GPP Timed Text shall be used [6]. This is for
download use, not for streaming.
32
33
34
35
Note that operators may deploy terminals specified with rules and restrictions in
addition to those in 3GPP timed text. Behavior that is optional for 3GPP Timed Text
may be mandatory for particular deployments. In particular, the required character set
is almost certainly dependent on the geography of the deployment.
36
37
38
39
40
Automatic wrapping of text from line to line is complex, and can require hyphenation
rules and other complex, language-specific criteria. For these reasons, text wrap is
25
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
There may be multiple lines of text in a sample (hard wrap). Terminals shall start a
new line for the Unicode characters line separator (\u2028), paragraph separator
(\u2029) and line feed (\u000A). Terminals should follow Unicode Technical Report 13
[23]. Terminals should treat carriage return (\u000D), next line (\u0085) and CR+LF
(\u000D\u000A) as new line.
10
8.5.2 TextWrapBox
11
'twrp' - Specifies text wrap: the Box contains one 8-bit integer as a wrap mode flag.
12
13
14
15
16
17
18
Description
No wrap
Automatic wrap
Reserved
19
20
21
22
23
24
3GPP2 SMIL is a markup language based on SMIL Basic [19] and SMIL Scalability
Framework.
25
26
27
28
3GPP2 SMIL consists of the modules required by SMIL Basic Profile (and SMIL 2.0 Host
Language Conformance) and additional MediaAccessibility, MediaDescription,
MediaClipping, MetaInformation, PrefetchControl, EventTiming and BasicTransitions
modules. All of the following modules are included:
29
30
31
32
33
34
35
36
26
C.S0050-0 v1.0
1
2
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
9
10
11
12
EXAMPLE1:
<smil xmlns="http://www.w3.org/2001/SMIL20/Language"
xmlns:EventTiming="http://www.w3.org/2001/SMIL20/EventTiming "
systemRequired="EventTiming">
13
14
15
16
17
18
19
20
EXAMPLE2:
<smil xmlns="http://www.w3.org/2001/SMIL20/Language"
xmlns:ffms10="http://www.3gpp2.org/SMIL20/FFMS10/"
systemRequired="ffms10">
21
22
23
24
25
The content authors should generally not include the FFMS requirement in the
document unless the SMIL document relies on FFMS specific semantics that are not
part of the W3C SMIL. The reason for this is that SMIL players that are not conforming
3GPP2 FFMS10 user agents may not recognize the FFMS10 URI and thus refuse to play
the document.
26
27
A conforming 3GPP2 SMIL user agent shall be a conforming SMIL Basic User Agent.
28
29
A conforming user agent shall implement the semantics of 3GPP2 SMIL as described in
Sections 9.1.3 and 9.1.4.
30
31
32
33
34
35
36
37
38
3GPP2 SMIL is based on SMIL 2.0 Basic language profile [19]. This section defines the
content model and integration semantics of the included modules where they differ
from those defined by SMIL Basic.
27
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
The PrefetchControl module adds the prefetch1 element to the content model of SMIL
Basic body, switch, par and seq elements. The prefetch element has the attributes
defined by the PrefetchControl module (mediaSize, mediaTime and bandwidth), the
src attribute, the BasicContentControl attributes and the skip-content attribute.
14
15
16
3GPP2 SMIL shall use the BasicLayout module of SMIL 2.0 for spatial layout. The
module is part of SMIL Basic.
17
18
Default values of the width and height attributes for root-layout shall be the
dimensions of the device display area.
19
20
21
22
3GPP2 SMIL shall use the SMIL 2.0 BasicLinking module for providing hyperlinks
between documents and document fragments. The BasicLinking module is from SMIL
Basic.
23
24
25
26
27
28
29
30
31
32
3GPP2 SMIL includes the media elements from the SMIL 2.0 BasicMedia module and
attributes from the MediaAccessibility, MediaDescription and MediaClipping modules.
MediaAccessibility, MediaDescription and MediaClipping modules are additions in this
profile to the SMIL Basic.
33
34
35
MediaClipping module adds to the profile the ability to address sub-clips of continuous
media. MediaClipping module adds 'clipBegin' and 'clipEnd (and for compatibility
'clip-begin' and 'clip-end') attributes to all media elements.
36
37
38
39
Bold indicates elements that are not part of SMIL 2.0 Basic
28
C.S0050-0 v1.0
1
2
3
4
The MetaInformation module of SMIL 2.0 is included in the profile. This module is an
addition, in this profile, to the SMIL Basic and provides a way to include descriptive
information about the document content into the document.
5
6
This module adds meta and metadata elements to the content model of SMIL Basic
head element.
8
9
The Structure module defines the top-level structure of the document. It is included in
SMIL Basic.
10
11
12
13
The timing modules included in the 3GPP2 SMIL are BasicInlineTiming, MinMaxTiming,
BasicTimeContainers, RepeatTiming and EventTiming. The EventTiming module is an
addition in this profile to the SMIL Basic.
14
15
For 'begin' and 'end' attributes either single offset-value or single event-value shall be
allowed. Offsets shall not be supported with event-values.
16
17
Event timing attributes that reference invalid IDs (for example elements that have been
removed by the content control) shall be treated as being indefinite.
18
19
20
Supported event names and semantics shall be as defined by the SMIL 2.0 Language
Profile. All user agents shall be able to raise the following event types:
- activateEvent;
21
beginEvent;
22
endEvent.
23
24
focusInEvent;
25
focusOutEvent;
26
inBoundsEvent;
27
outBoundsEvent;
28
repeatEvent.
29
User agents shall ignore unknown event types and not treat them as errors.
30
31
Events do not bubble and shall be delivered to the associated media or timed elements
only.
32
33
34
3GPP2 SMIL profile includes the SMIL 2.0 BasicTransitions module to provide a
framework for describing transitions between media elements.
35
36
37
38
NOTE: The SMIL specification [19] defines that all functionality of BasicTransitions
module is optional: "Transitions are hints to the presentation. Implementations must be
able to ignore transitions if they so desire and still play the media of the presentation".
This means that even although the BasicTransitions module is mandatory user agents
29
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
User agents that implement the semantics of this module should implement at least the
following transition effects described in SMIL 2.0 specification [19]:
barWipe;
10
irisWipe;
11
clockWipe;
12
snakeWipe;
13
pushWipe;
14
slideWipe;
15
fade.
16
A user agent should implement the default subtype of these transition effects.
17
18
19
20
A user agent that implements the semantics of this module shall at least support
transition effects for non-animated image media elements. For purposes of the
Transition Effects modules, two media elements are considered overlapping when they
occupy the same region.
21
22
23
24
BasicTransitions module adds attributes 'transIn' and 'transOut' to the media elements
of the Media Objects modules, and value "transition" to the set of legal values for the
'fill' attribute of the media elements. It also adds transition element to the content
model of the head element.
25
26
27
28
Table 9-1shows the full content model and attributes of the 3GPP2 SMIL profile. The
attribute collections used are defined by SMIL Basic [19], SMIL Host Language
Conformance requirements, chapter 2.4. Changes to SMIL Basic are shown in bold.
30
C.S0050-0 v1.0
1
Element
layout
Elements
head, body
layout, switch,
meta, metadata,
transition
TIMING-ELMS,
MEDIA-ELMS,
switch, a, prefetch
root-layout, region
root-layout
EMPTY
region
EMPTY
area
MEDIA-ELMS
area
EMPTY
smil
head
body
par, seq
switch
prefetch
meta
metadata
transition
Attributes
COMMON-ATTRS, CONTCTRL-ATTRS, xmlns
COMMON-ATTRS
COMMON-ATTRS
COMMON-ATTRS, CONTCTRL-ATTRS, type
COMMON-ATTRS, backgroundColor, height, width, skipcontent
COMMON-ATTRS, backgroundColor, bottom, fit, height,
left, right, showBackground, top, width, z-index, skipcontent, regionName
COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS,
repeat, region, MEDIA-ATTRS, clipBegin(clip-begin),
clipEnd(clip-end), alt, longDesc, readIndex, abstract,
author, copyright, transIn, transOut
COMMON-ATTRS, LINKING-ATTRS
COMMON-ATTRS, LINKING-ATTRS, TIMING-ATTRS,
repeat, shape, coords, nohref
TIMING-ELMS,
COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS,
MEDIA-ELMS,
repeat
switch, a, prefetch
TIMING-ELMS,
MEDIA-ELMS,
COMMON-ATTRS, CONTCTRL-ATTRS
layout, a, prefetch
COMMON-ATTRS, CONTCTRL-ATTRS, mediaSize,
EMPTY
mediaTime, bandwidth, src, skip-content
EMPTY
COMMON-ATTRS, content, name, skip-content
EMPTY
COMMON-ATTRS, skip-content
COMMON-ATTRS, CONTCTRL-ATTRS, dur, type,
EMPTY
subtype, startProgress, endProgress, direction,
fadeColor. skip-content
4
5
6
7
This section describes the qcp file format for reading and writing 13K vocoder packets.
RFC2658 [12] specifies RTP streaming for 13K vocoder but does not include a file
format. The qcp file format is described in [22]. The MIME type for .qcp files with 13K
vocoder is audio/qcelp.
9
10
11
12
13
14
15
This section presents a file format for creating multimedia files to combine 13K vocoder
speech, WAVE/RIFF sound [29], IMA ADPCM sound [27], MIDI [24], text, JPEG [25]
pictures, PNG pictures [26], BMP pictures [28] and animation data. This is similar to
standard MIDI and provides functionality similar to that provided by the SMIL
presentation layer. The format described here results in smaller files compared to SMIL,
while maintaining the functionality to create synchronized multimedia presentations.
Some examples of presentations that can be created using this format are:
31
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
In addition, the above format allows for addition of new and custom sub-chunks with
full backward compatibility. Media elements can be recycled, resulting in more compact
files.
10
11
12
This section describes .cmf, a binary format container for multimedia elements with
embedded synchronization information. The file format is presented in ABNF format as
specified in [21]
13
11.1.1
14
This section defines the definitions used in describing the cmf file format.
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
UINT32
= 4OCTET
; this field contains a 32-bit integer stored
; as a sequence of four octets, in
; big-endian order (most significant
; octet first)
UINT16
= 2OCTET
; this field contains a 16-bit integer stored
; as a sequence of two octets, in
; big-endian order (most significant
; octet first)
OCTET
= %x00-FF
; an octet, also called a byte - any possible
; combination of eight bits, forming a single
; integer or part of a larger integer having
; more than eight bits
length2
length4
= 2OCTET
= 4OCTET
; The above two define the length of a track,
; field, chunk, etc.
39
40
41
A cmf file is composed of file ID, file length, header information and one or more content
tracks, as shown below.
42
43
44
45
The following ABNF describes the cmf header. The header sets up some global
parameters for interpreting the cmf-tracks.
32
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
= OCTET
; Currently, up to 4 tracks are supported.
sub-chunk
sub-chunk
optional-chunk
vers-chunk
code-chunk
titl-chunk
date-chunk
sorc-chunk
copy-chunk
=
=
=
=
=
=
code-chunk
/ titl-chunk
/ date-chunk
/ sorc-chunk
/ copy-chunk
/ exsn-chunk
/ exsa-chunk
/ exsb-chunk
/ exsc-chunk
/ cuep-chunk
/ pcpi-chunk
/ cnts-chunk
/ prot-chunk
/ poly-chunk
/ wave-chunk
33
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
exsn-chunk
exsa-chunk
exsb-chunk
exsc-chunk
cuep-chunk
number
pcpi-chunk
cnts-chunk
prot-chunk
poly-chunk
wave-chunk
=
=
=
code-value
=
/
/
/
/
/
/
/
/
/
/
/
/
/
/
%b00000000 ;
ANSI CHARSET
%b00000001 ;
ISO8859-1
%b00000010 ;
ISO8859-2
%b00000011 ;
ISO8859-3
%b00000100 ;
ISO8859-4
%b00000101 ;
ISO8859-5
%b00000110 ;
ISO8859-6
%b00000111 ;
ISO8859-7
%b00001000 ;
ISO8859-8
%b00001001 ;
ISO8859-9
%b00001010 ;
ISO8859-10
%b10000001 ;
HANGUL CHARSET
%b10000010 ;
Chinese Simplified
%b10000011 ;
Chinese Traditional
%b10000100 ;
Hindi
;
Others Reserved
34
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
source-info
data
media-type
= %x00-ff
= ["SONG"
/"WAV"
/"TEXT"
/"PICT"
/"ANIM"
/"LED"
/"VIB"]
17
18
19
This section defines the cmf-tracks field. A cmf file can contain one or more tracks, as
indicated by the nTracks field.
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
;
;
;
;
;
;
;
Contains
Contains
Contains
Contains
Contains
Contains
Contains
MIDI
Wave sounds
text data
still image data
animation data
LED data
VIB data
= delta-time event-message
= OCTET
; Delta time is described the elapsed time from a
previous
; event. The unit of time is determined from Tempo and
; TimeBase. Default Tempo Value is 125.
; Default TimeBase Value is 48.
event-message = note-message
/ ext-A-message
/ ext-B-message
/ ext-info-message
note-message
note-status
gate-time
vel-oct-shift
= note-status gate-time
/ note-status gate-time vel-oct-shift
; If note-msg-config is 1,
; we have velocity-octaveShift info.
= OCTET
; vvkkkkkk
; vv = Assigned channel index, defined
; with respect to the channel reference index.
; kkkkkk = key number, from 0 to 62.
; Key number 63 is prohibited.
; Key number 15 is middle C of keyboard
= OCTET
; Continuation time from note-on to note-off.
; If a gate-time value of more than 255 is required
; multiple note-messages are used. See the examples.
= OCTET
; Velocity (6 bits) and OctaveShift (2 bits)
; eeeeeess
; eeeeee = Velocity. 0 to 63.
; ss
= Octave Shift
; 01 = Increase one octave
35
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
; 00 = ineffective
; 11 = decrease one octave
; 10 = decrease two octaves
; Extension Information message format
ext-A-message
= %xFF A-command-data
ext-B-message
= %xFF B-command-data
ext-info-message = %xFF %x11110001 length2
/ %xFF %x11110010 length2
/ %xFF %x11110011 length2
/ %xFF %x11110100 length2
A-command-data
wav-data
text-data
picture-data
animation-data
= UINT16
; Assigned channel fine pitch bend command
; A-command-data = 0vvnnnnn nnnnnnnn
; vv = Assigned channel index (0..3)
; nnnnnnnnnnnnn = Fine Pitch Bend [cents]
; (range: 0x0000 to 0x1fff) (see table)
; Fine pitchbend message sets the change value of the
; pitch specified in the note message.
B-command-data = master-volume
/ master-balance
/ master-tune
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
part-configuration
pause
stop
reset
timebase-tempo
cuepoint
jump
NOP
end-of-track
program-change
bank-change
volume
panpot
pitchbend
channel-assign
pitchbend-range
wave-channel-volume
wave-channel-panpot
text-control
picture-control
vib-control
LED-control
;
;
;
;
master-balance = %b10110001 %x00-7F ;
;
master-tune = %b10110011 %x34-4C
;
;
;
;
36
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
;
;
;
;
;
;
volume-attr = ccdddddd
cc = channel index
dddddd = volume change value
(range: 0x00 to 0x3f)
panpot
= %b11100011 panpot-attr
panpot-attr = OCTET
; panpot-attr = ccdddddd
; cc = channel index
; dddddd = panpot change value
; (range: 0x00 to 0x3f)
; 0x00 = Far Left
; 0x20 = Center
; 0x3F = Far Right
pitchbend
= %b11100100 pitchbend-attr
pitchbend-attr
= OCTET
; pitchbend-attr = ccdddddd
; cc = channel index
; dddddd = pitchbend change value
; (range: 0x00 to 0x3f) (see table)
37
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
; led-data = %b0cdddddd
; c = enable/disable LED
; dddddd = color pattern
; vib-data = %b0cdddddd
; c = enable/disable vib
; dddddd = vibrator pattern
38
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
;
;
;
;
= text-atrb *data
= OCTET ; Set/Append and XY Alignment
; 0cxxxyyy
; c = Set(0)/Append(1) a string
; xxx=x-align (000=Left, 001=Center, 010=Right)
; yyy=y-align (000=Bottom, 001=Center, 010=Top)
picture-data
*data
pict-atrb1
pict-atrb2
= OCTET
pict-atrb3
= OCTET
pict-x-off
= OCTET
pict-y-off
= OCTET
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
= OCTET;
;
;
;
;
;
;
;
;
;
ccaaaaaa
cc = reserved
aaaaaa = Picture packet ID (0-63)
Picture packet mode (2) and format (6)
rrffffff
rr = Picture packet mode
00 = Store Mode
01 = Set Mode
10 = Recycle Mode
ffffff = Picture format
000001 = BMP format
000010 = JPEG format
000011 = PNG format
Draw Mode: Normal
00000000
If subchunk for Picture packet = 0
00000000 = X-offset 0%
00000001 = X-offset 1%
39
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
animation-data
= anim-atrb0 anim-atrb1 anim-atrb2 anim-x-off anim-yoff *data
anim-atrb0
= 4OCTET / UINT32
anim-atrb1
= OCTET ; Animation packet mode (2) and packet ID (6)
; rrffffff
; rr = Animation packet mode
; 01 = reserved
; ffffff = Animation packet ID
; 000000 = reserved
anim-atrb2
= OCTET
anim-x-off
anim-y-off
;
;
;
;
;
;
;
;
;
;
= OCTET ;
;
;
;
;
;
;
;
;
;
cccaaaaa
ccc = Animation format
000 = Reserved animation format
001
= OCTET ;
;
;
;
;
;
;
;
;
;
45
11.5 Tables
46
47
11.5.1
48
TimeBase is expressed by the lower 4-bits of the status byte. The default value is 48.
TimeBase Values
49
----0000
TimeBase = 6
----0001
TimeBase = 12
40
C.S0050-0 v1.0
----0010
TimeBase = 24
----0011
TimeBase = 48
----0100
TimeBase = 96
----0101
TimeBase = 192
----0110
TimeBase = 384
----0111
Reserved
----1000
TimeBase = 15
----1001
TimeBase = 30
----1010
TimeBase = 60
----1011
TimeBase = 120
----1100
TimeBase = 240
----1101
TimeBase = 480
----1110
TimeBase = 960
----1111
Reserved
11.5.2
3
4
5
The following table contains a description of the contents of Pitch Bend Range.
RangeValue is defined in the section of Assigned Channel PitchBend Range. That
initial value is 2.
6
000000
011110
011111
100000
0 [cent]
100001
100010
111111
11.5.3
The following table contains a description of the fine pitch bend value.
10
0000000000000
41
C.S0050-0 v1.0
0111111111110
0111111111111
1000000000000
0 [cent]
1000000000001
1000000000010
1111111111111
3
4
5
6
7
It is required that compliant players check the acceptable profiles through the cnts
subchunk. Only the following profiles are specified. Other profiles are invalid
configurations of the CMF file format. Additional profiles may be added in future
versions.
11.6.1
cnts = WAV;PICT
10
11
This profile is primarily used for messaging applications. PICT can either be JPEG or
PNG for graphics/image data.
12
11.6.2
13
cnts = SONG;WAV
14
15
16
This profile is primarily used for ringers and other audio only applications such as the
audio portion of a game application. SONG is used for MIDI and WAV is used to
provide QCP or ADPCM sound effects.
17
11.6.3
18
cnts = SONG;WAV;PICT
19
20
This profile is an enhancement on 11.6.2 that adds graphics capability for picture or
animated ringers. PICT can be either PNG or JPEG format.
21
22
23
24
25
11.7.1
26
27
There are 3 required subchunks: note, vers, and cnts. All encoders are required to
include these subchunks and all decoders are required to verify the existence of these
Audio-only profile
Picture ringers
Subchunk Requirements
42
C.S0050-0 v1.0
1
11.7.2
3
4
5
Implementations of all MIDI related functions are interpreted the same as General MIDI
requirements set up by the MMA. Where parameters are provided with different
resolution, those parameters should be mapped to equivalent dynamic range.
11.7.3
MIDI Requirements
7
8
9
10
Encoders are recommended to break wave packets into reasonable subchunks and time
stamped with the use of the prev-flag to ensure continuous decoding of audio packets.
A typical implementation breaks audio packets into 0.5 second chunks with
appropriate time stamps.
11
12
13
14
15
16
17
18
19
11.7.4
20
21
11.7.4.1
Cuepoints
22
23
24
25
26
27
Cuepoints are used to provide an alternative play mode for CMF files. When in cuepoint play mode, the decoder should jump to the cue start point when starting
playback. All rules for setup that are observed for normal playback at the beginning of
the file should be observed. For example, an encoder is required to insert all
configuration events in between cuepoint boundaries even if those events are
redundant with configuration events outside cue-point boundaries.
28
11.7.4.2
29
30
31
32
33
34
35
Jump points are used to reuse portions of the playback using loops to reduce file size.
The decoder is required to parse a Jump Destination point and save a pointer to the
file. Up to 4 JUMP IDs can be saved for later reference. When a jump command is
received for a given destination ID, the decoder should continue playback from the
destination point as normal with no further memory. The loop number specifies the
number of times the jump should be taken. After the final jump, decoding should
continue as normal ignoring the final jump command.
Jump Points
36
37
11.7.5
Recycle Requirements
38
39
40
Recycling is supported in Picture, Wave, and Animation packets. The use of recycle is
recommended to optimize file sizes for data transmission. Each packet group allows for
up to 64 individual IDs to be selected be used for recycle.
43
C.S0050-0 v1.0
1
11.7.5.1
2
3
The store operation specifies that the decoder should not display the data, but instead
cache the data for displaying at some future time.
11.7.5.2
5
6
The set operation specifies that the decoder should both cache the data and
displaying it.
11.7.5.3
Store Command
Set Command
Recycle Command
8
9
10
11
12
The recycle operation specifies that the decoder should redisplay picture image data
previously cached by a store or set operation that used the same packet ID value
specified in the Attributes 1 field.
13
14
15
16
The media files created as per the above format specification shall use the extension of
.cmf, short for Compact Multimedia Format. The MIME type application/cmf is
expected to be registered and used.
17
18
19
20
21
22
23
24
ISO defines ISO Base Media File Format as a basis of developing a media container for
various usages. It describes a basic architecture of the multimedia file, and mandatory
/optional elements in it. There are some extensions over ISO Base Media File Format,
one of which is an MP4 file format to support MPEG-4 visual/audio codecs and various
MPEG-4 Systems features such as object descriptors and scene descriptions.
25
26
27
28
29
MP4 extension
30
31
32
33
34
35
36
44
C.S0050-0 v1.0
1
2
3
4
5
6
7
8
9
3GPP2 employs full aspects of ISO Base Media File Format. It also adds new codecs and
extends a 3GPP timed text. On the other hand, it only uses the same portion of MP4
extension as 3GPP does. Figure A.3 illustrates it.
10
11
12
13
14
15
16
17
18
19
20
21
22
23
3GPP2 does not prohibit this feature while 3GPP does. It may be useful for
various applications, e.g., pseudo-streaming and live authoring of a movie file.
24
Movie fragment
b) New extensions
25
26
SMVSampleEntry
27
EVRCSampleEntry
28
29
These codecs are used in 3GPP2 and there are no definitions in 3GPP file
format, so their encapsulations are defined.
30
31
32
33
Word wrap and link functionality for phone and mail are enhanced from 3GPP Timed
Text.
34
d) File identifications
35
3GPP2 has its own file extension, MIME types, and file brand identifier.
45
C.S0050-0 v1.0
3
4
5
6
The media types contained in the .3g2 file are restricted to those identified
for use in the .3gp file format [5],[6], specifically AMR and AMR-WB
speech, MPEG4 and H.263 video, MPEG4 AAC audio, and timed text
without optional textwrap feature.
8
9
10
11
12
13
FFMS is a generic standard and includes all features. Since some features may be
useful but others may not in some services, this section shows a usage guideline in
various services.
14
15
16
17
18
19
A streaming server stores multimedia content in 3GPP2 file format, reads out media
data from the file, and transmits to a client in an RTP packet. Hint tracks can be useful
to the server when creating RTP packets.
ISO file
moov
other boxes
trak (video)
trak (audio)
mdat
Interleaved, time-ordered, video
and audio frames, and hint
instructions
trak (hint)
20
21
22
23
24
25
26
46
C.S0050-0 v1.0
Client
Server
HTTP/1.1 200 OK
Start of playback
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
HTTP specifies the file as a control protocol, and the file is transmitted in the response
to the request. Playback starts during file download. This avoids a lengthy wait for a file
to finish downloading (due to the size of the file and/or the transmission channel
bitrate). Additionally, some receivers may not have sufficient memory to store an entire
file. To ensure smooth playback, the client must receive all necessary header
information and the first part of media data of sufficient temporal length to
accommodate the maximum jitter in the system. Accordingly, requirements for the file
format are as follows.
1) moov position
2)
moov has information necessary for decoding the media data in the file and
therefore needs to be positioned at the beginning of the file.
media interleave
3)
If multiple media are included in the file, then they should be interleaved as a
chunk. The size of a chunk relates waiting time to playback. A typical size of a
chunk is a few seconds.
movie fragmentation
4)
The size of moov becomes large for lengthy movies. Therefore, long movies should
be fragmented with each fragment having a header (moov or moof).
I-frame beginning
5)
The first video frame in mdat should start with an I-frame so that the client can
start decoding from the first frame.
Timed Text
Timed Text can be supported for the pseudo-streaming service. Text samples are
also interleaved as well as audio and video samples.
29
30
31
ftyp
moov
moof
32
33
34
chunk
fragment
File Format for Multimedia Services for cdma2000
47
C.S0050-0 v1.0
1
2
B.2 MMS
4
5
6
3GPP2 files used for the purpose of MMS contain rather short duration clips that are
transferred from a client to a server and vice versa. Considering the MMS feature
should require less complexity, the following restrictions are useful.
7
8
Movie fragment is not useful for short duration video, therefore it should not be
used.
9
10
The maximum number of tracks should be one for video, one for audio and one
for text.
11
12
The maximum number of sample entries should be one per track for video and
audio (but unrestricted for text).
13
14
15
16
17
18
19
20
A 3GPP2 file is downloaded to a client and played back locally. Random positioning in
the downloaded clip is an attractive feature. However, addressing information included
in the default boxes such as stts and stsc only have a relative time difference, thus
some processing is required to get the absolute address within the file that corresponds
to the indicated relative time difference. mfra Box provides direct address information
in the file and is useful for random positioning during play back.
21
22
23
24
25
26
27
The .qcp file format as described in [22] supports storage for EVRC and SMV
vocoders. The MIME type for .qcp files with EVRC is audio/evrc-qcp and the MIME
type for .qcp files with SMV is audio/smv-qcp.
28
29
30
48