1.0 Introduction
Many kinds of CODECs are available in the communications industry today, each differing in
specification, application and capability. As demand grows, tools must be built or adapted to meet
the requirements of the ever-expanding market for multimedia communication, especially over the
internet.
One of the most essential of these tools is video compression (video coding), whose usefulness
cannot be overstated. As the demand for video processing increases, video coding techniques must
evolve to keep pace.
Video compression is the process of converting digital video into a format that requires less
storage or transmission capacity. It is an indispensable tool in diverse applications such as digital
television, DVD-Video, mobile TV, video conferencing and internet video streaming. Hence it has
always been necessary to specify a standard for video compression that enables interoperability
between different vendors. There are several existing standards, but H.264 is the latest industry
standard for video compression and is the main subject of this work.
The H.264 standard defines a syntax for compressed video and a method for decoding this syntax
into a displayable video sequence. Video compression comes at a cost - loss of quality. Most video
compression formats are “lossy” since the quality of a video sequence has to be compromised to
achieve the desired compression.
Video quality is used to evaluate the performance of an encoding algorithm. It is inherently
subjective, since ratings vary from individual to individual, which makes reliable assessment a
challenge. Video quality can be assessed subjectively or measured objectively.
In subjective video quality assessment, humans observe and rate the quality of the compressed
video sequence. These ratings are compiled and analysed statistically and used to draw
conclusions on the compressed video quality. This method tends to be expensive and time
consuming but is important for a realistic measure of perceived video quality.
Objective video quality measurement uses mathematical means to estimate the quality of the
compressed video sequence; it produces more consistent results but may not always correspond to
actual subjective quality. Objective measures are particularly useful where fast and repeatable
measurements of quality are needed or when comparing a test video with a reference video. The
Peak Signal to Noise Ratio (PSNR) is commonly used as an objective measure of video quality.
Since the human eye is more sensitive to light intensity (luminance) than colour intensity
(chrominance), this knowledge is exploited to minimise the amount of data required to transmit an
acceptable full-colour image. Thus the video quality in this work is rated in terms of luminance
PSNR, denoted PSNR_Y, in decibels (dB).
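As a minimal sketch of how a luminance PSNR figure is conventionally obtained, the snippet below computes PSNR from the mean squared error between reference and decoded luma samples. It assumes 8-bit samples (peak value 255) held in flat lists; the function name `psnr_y` and the sample values are illustrative only and are not taken from the JM software.

```python
import math

def psnr_y(reference, decoded, peak=255):
    """Luminance PSNR in dB between two equally sized lists of luma samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all luma samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 2x2 luma block, made up for illustration.
ref = [100, 102, 98, 101]
dec = [101, 102, 97, 101]
print(round(psnr_y(ref, dec), 2))
```

A higher value means the decoded luma is closer to the reference; identical frames give an infinite PSNR.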
This work is based on the same principle as Objective video quality measurement but in this case,
a software simulation (the JM encoder) is used to carry out the video compression exercise.
The H.264/AVC reference software CODEC is used as the reference codec. It uses a configuration
file containing many parameters. These parameters are classified into basic and advanced
parameters.
The basic parameters to be investigated include: Context Based Adaptive Binary Arithmetic
Coding (CABAC), Context Based Adaptive Variable Length Coding (CAVLC), Rate Distortion
Optimization (RDO), Fast/Low Complexity options, B-Pictures and multiple reference frames. There
are other, advanced parameters, but those will not be part of this exercise.
An investigation will be carried out into choosing optimum parameters for H.264 encoding. The aim
is to find a combination of parameters that optimizes the rate-distortion performance while
minimizing encoding time. Three different test video sequences will be used to evaluate the
performance of the video CODEC.
A standard is like a universal language used by different parties to aid communication. Video
coding standards allow for a larger volume of information exchange and benefit vendors and
customers alike. Thus video standards should be efficient for compression of video content.
The ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group
(MPEG) are two formal organizations that develop video coding standards. These standards are
designed for a variety of video applications.
ISO/IEC MPEG created the MPEG-x series, which targets video storage, video broadcasting and
video streaming over the internet and mobile networks. The ITU-T standards, also known as
Recommendations (the H.26x series), are designed for applications such as video conferencing
and video telephony.
Fig. 1-1: Timeline of Video Development (Source: T. Wiegand, G.J. Sullivan, G. Bjøntegaard and
A. Luthra, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, no. 7, Jul. 2003)
Figure 1-1 shows the development of various standards over time up to the latest video coding
standard, H.264, previously called H.26L.
H.264 is the latest video coding standard in operation today. It was jointly developed by the ITU-T
as Recommendation H.264 and by ISO/IEC as International Standard 14496-10 (MPEG-4 Part 10).
This part of MPEG-4 is known as Advanced Video Coding, so the standard is also called H.264/AVC.
Compared with previous video coding standards, it can deliver significantly improved compression
efficiency (up to 50% more), offers more flexibility and resilience, and supports higher quality
video at lower bitrates.
Apart from being a standard, H.264 is also a format for video compression, the practice of
converting digital video into a form that occupies less space during storage or transmission.
It is also a toolkit for video compression. It has a set of tools used for encoding: prediction and
reconstruction, transform and quantization, entropy coding and mode selection. These tools are
some of the features that make H.264 exceptional. But these exceptional qualities come with a
drawback: high computational cost and complexity. Hence efforts are being made to see how this
downside of H.264 can be managed efficiently.
H.264 has a broad spectrum of applications ranging from low bitrate mobile video applications to
Video-IP, HDTV and HD-DVD.
In this work, the H.264/AVC Reference CODEC referred to as the JM CODEC is software based
and is used as the reference video CODEC. The whole encoding process was simulated on a
personal computer. The model of the CODEC used is JM 10 (FRExt) encoder version 10.2.
The JM encoder reads input parameters from a configuration file. A wide range of encoding
parameters can be changed using the configuration file.
The JM CODEC also gives relevant encoding details like bitrate of the encoded bitstream, video
quality in PSNR of luminance and chrominance components of the coded video and encoding
time. These values can be viewed in an output log file from the encoder.
Test Platform
A Dell Studio 1555 laptop with the following specifications was used as the test platform for the
software video codec.
Figure 2-1 shows the basic sequence or procedure for carrying out the tests.
[Figure 2-1: block diagram with the boxes "H.264/AVC Reference CODEC", "Configuration File",
"Altered Reference CODEC" and "Observe encoding time (view log file)"]
The aim of this experiment is to evaluate and optimize the performance of the Video encoder while
minimizing the encoding time. The performance indicators used for evaluation are the video quality
(PSNR), bitrate and encoding time.
Akiyo: This is a video of a lady reading the news. It has 300 frames and is in QCIF format. It
contains little motion: only the lady's head and facial movements are evident against a static
background.
Bus: This is a video of a bus in motion. It has 150 frames and is in CIF format. It has
high motion detail with a dynamic background.
Suzie: This is a video of a lady making a phone call. The background is plain and static; only the
movement of the lady's head and the phone she is holding is noticeable. It is in QCIF format and
contains 150 frames.
The QP for the I and P slices was changed from 33 to 13 in decrements of 5 and the PSNR,
bitrate and encoding time were observed and recorded at each setting. The same QP values were
used throughout the course of the experiment.
Each test video sequence was encoded to derive both the original reference CODEC and the
altered reference CODEC. The original reference CODEC is obtained using the default parameter
values in the configuration file and the altered reference CODEC is obtained by tweaking the
desired parameter value in the configuration file. The parameter values in the configuration file are
changed by creating a batch file and running the batch file in DOS command prompt. It is
important to note that the batch files were created and stored in the same folder as the test video
sequences. This is to ensure that the file path is found during execution. Hence each test
sequence had its own separate compilation of batch files.
The commands used in the batch files all follow the same format:
This command tells the encoder to look in the configuration file (encoder_basic_200.cfg in this
case), set the parameter value, run the encoding and then write the output of the process to a
log file.
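The exact command line is not reproduced in the text, so the following is only a rough sketch of how one batch line per QP value might be generated. It assumes the JM encoder executable is called `lencod.exe` and is invoked with `-d` for the configuration file and `-p` for a `Parameter=Value` override; the parameter names `QPISlice` and `QPPSlice` and the log-file naming scheme are likewise assumptions and should be checked against the configuration file actually used.

```python
def build_commands(cfg, param, value, qps, exe="lencod.exe"):
    """Return one command string per QP; every name here is an assumption."""
    commands = []
    for qp in qps:
        log = f"{param}_{value}_qp{qp}.log"  # hypothetical log-file name
        commands.append(
            f"{exe} -d {cfg} -p {param}={value} "
            f"-p QPISlice={qp} -p QPPSlice={qp} > {log}"
        )
    return commands

# QP values used in this work: 33 down to 13 in decrements of 5.
qps = list(range(33, 12, -5))
commands = build_commands("encoder_basic_200.cfg", "SymbolMode", 1, qps)
for line in commands:
    print(line)
```

Writing the lines to a `.bat` file kept in the same folder as the test sequence would match the procedure described above.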
For each output of the modified reference CODEC, the PSNR, bitrate and encoding time are read
from the log files and recorded. The bitrate is plotted against the PSNR and the resulting curve
from each altered reference CODEC is compared with the reference CODEC for evaluation.
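The comparison with the reference CODEC can also be expressed numerically. The sketch below pairs rows by QP and reports the PSNR difference and the percentage change in bitrate and encoding time. The figures are the Akiyo reference and CABAC results reported later in this work; the function names are illustrative.

```python
# (QP, PSNR dB, bitrate kbps, encoding time s) for Akiyo, copied from the
# reference CODEC and CABAC result tables of this report.
reference = [(33, 34.68, 19.09, 1.291), (28, 38.14, 36.28, 1.428),
             (23, 41.59, 79.89, 1.614), (18, 45.16, 153.94, 2.133),
             (13, 48.79, 296.32, 3.714)]
cabac = [(33, 34.68, 18.24, 1.248), (28, 38.14, 34.58, 1.308),
         (23, 41.59, 74.81, 1.458), (18, 45.16, 144.42, 1.77),
         (13, 48.79, 278.55, 2.537)]

def percent_change(new, old):
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old

def compare(ref_rows, alt_rows):
    """Per-QP PSNR difference (dB) and percent change in bitrate and time."""
    out = []
    for (qp, p0, b0, t0), (qp1, p1, b1, t1) in zip(ref_rows, alt_rows):
        if qp != qp1:
            raise ValueError("rows must be paired by QP")
        out.append((qp, round(p1 - p0, 2),
                    round(percent_change(b1, b0), 1),
                    round(percent_change(t1, t0), 1)))
    return out

result = compare(reference, cabac)
for row in result:
    print(row)
```

Negative percentages indicate a saving relative to the reference CODEC at the same QP.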
The Reference CODEC is obtained first then each altered Reference CODEC is plotted on the
same set of axes as the Reference CODEC. This procedure is carried out for each test sequence.
The results are displayed below:
The number of frames encoded for each test video sequence is 50 frames unless otherwise stated
as in the encoding of B-Pictures.
All tests were carried out using the baseline profile except those using CABAC and B-Pictures,
which the baseline profile does not support.
Test 1: Reference CODEC
The test video sequences were encoded with the parameters in the configuration file at their
default values.
Table 3-1a: Akiyo – Reference CODEC
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               34.68       19.09            1.291
28               38.14       36.28            1.428
23               41.59       79.89            1.614
18               45.16       153.94           2.133
13               48.79       296.32           3.714
Table 3-1b: Bus – Reference CODEC
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               30.74       637.99           12.326
28               34.54       1314.71          18.562
23               38.42       2519.92          34.066
18               42.42       4494.38          54.049
13               46.76       7577.73          72.557
Table 3-1c: Suzie – Reference CODEC
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               33.86       38.26            1.671
28               36.73       79.83            2.105
23               39.97       194.98           2.683
18               43.59       439.46           4.538
13               47.42       889.42           8.98
Test 2: RD Optimisation
By default this parameter is disabled (set to 0), which is the low complexity mode. It is enabled by
setting “RDOptimisation” in the configuration file to 1 (high complexity mode). It tells the encoder
to make a rate-distortion optimised mode decision, which requires exploring all the possible modes
before selecting the best one.
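The idea of a rate-distortion optimised mode decision can be sketched with the usual Lagrangian cost J = D + λR, where D is the distortion of a candidate mode, R its cost in bits and λ a weighting factor. This is a generic illustration, not the JM implementation; the mode names, distortion values and bit counts below are invented for the example.

```python
def best_mode(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost J = D + lam * R."""
    return min(candidates, key=lambda m: m["distortion"] + lam * m["bits"])

# Hypothetical candidates for one macroblock: distortion and bits are made up.
modes = [
    {"name": "SKIP",        "distortion": 900.0, "bits": 1},
    {"name": "INTER_16x16", "distortion": 400.0, "bits": 40},
    {"name": "INTER_8x8",   "distortion": 250.0, "bits": 110},
    {"name": "INTRA_4x4",   "distortion": 200.0, "bits": 260},
]

# A small lambda favours low distortion; a large lambda favours few bits.
for lam in (0.5, 5.0, 50.0):
    print(lam, best_mode(modes, lam)["name"])
```

Because D and R are only known once a candidate has actually been coded, every mode must be tried, which is consistent with the much longer encoding times observed when this option is enabled.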
Table 3-2a: Akiyo – RD Optimisation
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               34.85       18.36            78.625
28               38.14       34.49            68.674
23               41.7        74.36            60.922
18               45.33       147.45           55.219
13               48.83       273.43           62.941
Table 3-2b: Bus – RD Optimisation
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               30.89       615.59           297.934
28               34.64       1283.35          284.398
23               38.54       2474.82          284.768
18               42.54       4437.61          352.425
13               46.9        7499.19          441.109
Table 3-2c: Suzie – RD Optimisation
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               33.97       36.38            83.035
28               36.76       75.08            74.77
23               40.05       183.69           67.885
18               43.69       427.32           68.051
13               47.53       867.07           78.536
Test 3: CABAC
Context-Based Adaptive Binary Arithmetic Coding (CABAC) is an entropy coding method. It is
available in the main profile (and the high profiles) but not in the baseline or extended profiles,
unlike CAVLC (Context-Based Adaptive Variable Length Coding), which runs on the baseline profile.
When the CABAC engine processes a new slice, it first builds the context table before processing
the first macroblock of the current slice. Because entropy coding is lossless, CABAC does not
change the PSNR at a given QP; its advantage over CAVLC is a lower bitrate for the same quality.
It is enabled by setting the “SymbolMode” parameter in the configuration file to 1.
Table 3-3a: Akiyo – CABAC
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               34.68       18.24            1.248
28               38.14       34.58            1.308
23               41.59       74.81            1.458
18               45.16       144.42           1.77
13               48.79       278.55           2.537
Table 3-3b: Bus – CABAC
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               30.74       602.68           11.177
28               34.54       1255.9           15.952
23               38.42       2424.58          21.862
18               42.42       4331.44          36.106
13               46.76       7260.85          48.496
Table 3-3c: Suzie – CABAC
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               33.86       36.09            1.9
28               36.73       75.35            1.807
23               39.97       179.81           2.333
18               43.59       400.66           3.471
13               47.42       819.31           5.675
Test 4: B-Pictures
B-Pictures use both previous frames (I and P frames) and future frames (P frames) as references
for prediction, which is why they are termed “bi-directional”. The number of frames to be encoded
for this test was changed from 50 to 25 for B-Pictures = 1 and from 50 to 13 for B-Pictures = 3.
This was done to account for the B-Pictures introduced. In general, if n B-Pictures are used, the
number of frames to be encoded is the number of frames / (n+1). The “Frameskip” parameter
specifies the number of frames to be skipped during the encoding process and is equal to the
number of B-Pictures used. “Frameskip” is used to skip from an I to a P frame or from a P to a P
frame before the B-Picture in between can be encoded. Here is an illustration of how it works for
B-Pictures = 1:
Frame number: 0 1 2 3 4 5 6 7 8...
Frame type:   I B P B P B P B P...
The QPB slice parameter is also set to correspond with the values of the QPP and QPI slices.
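The frame-count rule above can be checked with a few lines of code. This is a sketch only: rounding up is an assumption, chosen because it reproduces the value 13 quoted for three B-Pictures (50 / 4 = 12.5), and the function names are illustrative.

```python
import math

def frames_to_encode(total_frames, n_b):
    """Frames to request so that about total_frames pictures are coded."""
    # Assumption: round up, which matches 50 / (3 + 1) = 12.5 -> 13.
    return math.ceil(total_frames / (n_b + 1))

def display_order_types(count, n_b):
    """Frame types in display order: one I, then n_b B-Pictures before each P."""
    types = []
    for i in range(count):
        if i == 0:
            types.append("I")
        elif i % (n_b + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return types

print(frames_to_encode(50, 1), frames_to_encode(50, 3))
print(" ".join(display_order_types(9, 1)))
```

With n B-Pictures, every (n+1)-th frame is coded as an I or P frame and the frames in between are coded as B-Pictures.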
Table 3-4a: Akiyo – B-Pictures
B-Pictures = 1
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               34.74       19.28            1.456
28               38.21       37.52            1.453
23               41.74       77.41            1.665
18               45.3        149.49           2.08
13               48.82       297.24           3.225
B-Pictures = 3
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               34.81       20.71            1.572
28               38.24       40.75            1.584
23               41.8        83.93            1.81
18               45.34       162.64           2.304
13               48.82       320.54           3.3
Table 3-4b: Bus – B-Pictures
B-Pictures = 1
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               30.78       646.18           13.225
28               34.54       1324.37          16.972
23               38.42       2531.33          24.089
18               42.41       4470.54          38.691
13               46.75       7434.27          47.717
B-Pictures = 3
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33               30.56       851.11           15.706
28               34.34       1653.11          20.924
23               38.27       3020.74          28.765
18               42.33       5107.52          38.691
13               46.72       8159.01          53.598
Table 3-4c: Suzie – B-Pictures
B-Pictures = 1
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
B-Pictures = 3
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
Test 5: Multiple Reference Frames
Multiple reference frames are used to predict the current frame before encoding. During the
prediction process, the encoder searches for the best matching frame to use in predicting
the current frame. These multiple reference frames are previously decoded pictures that are stored
in the decoded picture buffer.
Test 6: Fast/ Low Complexity options – Fast Motion Estimation (FME)
Fast motion estimation attempts to reduce the number of positions searched during encoding for
the best match while trying to maintain the same compression efficiency as the full search which
searches every position exhaustively before finding the best match. FME doesn’t guarantee the
best match but the full search does. Setting FME = 0 disables fast motion estimation and activates
full search while FME = 1 activates fast motion estimation.
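The trade-off between the two search strategies can be illustrated with a small block-matching sketch. The full search tests every displacement in the search window, while the fast search shown here is a classic three-step search, used purely as a stand-in: the fast motion estimation algorithm inside the JM encoder is not described in this work and is not what is implemented below. The frame contents, block size and search range are made up for the example.

```python
def sad(cur, ref, x, y):
    """Sum of absolute differences between cur and the ref block at (x, y)."""
    return sum(abs(cur[j][i] - ref[y + j][x + i])
               for j in range(len(cur)) for i in range(len(cur[0])))

def full_search(cur, ref, x0, y0, rng):
    """Test every displacement in the window; returns (sad, mv, positions)."""
    best, count = None, 0
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            cost = sad(cur, ref, x0 + dx, y0 + dy)
            count += 1
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[0], best[1], count

def three_step_search(cur, ref, x0, y0, step=4):
    """Coarse-to-fine search: test 8 neighbours, halve the step, repeat."""
    cx, cy = 0, 0
    best = sad(cur, ref, x0, y0)
    count = 1
    while step >= 1:
        centre = (cx, cy)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue  # centre already evaluated
                cost = sad(cur, ref, x0 + centre[0] + dx, y0 + centre[1] + dy)
                count += 1
                if cost < best:
                    best, cx, cy = cost, centre[0] + dx, centre[1] + dy
        step //= 2
    return best, (cx, cy), count

# Synthetic reference frame and a current block displaced by a known vector.
SIZE, BLOCK, X0, Y0 = 40, 8, 16, 16
ref = [[(x * x + 3 * y * y + x * y) % 256 for x in range(SIZE)]
       for y in range(SIZE)]
true_mv = (3, -2)
cur = [[ref[Y0 + true_mv[1] + j][X0 + true_mv[0] + i] for i in range(BLOCK)]
       for j in range(BLOCK)]

full = full_search(cur, ref, X0, Y0, 7)
fast = three_step_search(cur, ref, X0, Y0)
print("full search:", full)
print("three-step :", fast)
```

The full search is guaranteed to find the lowest cost in the window at the price of many more evaluations; the fast search checks far fewer positions and can therefore only match or fall short of that result.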
Table 3-6a: Akiyo – FME
FME = 0
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
FME = 1
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
Table 3-6b: Bus – FME
FME = 0
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
FME = 1
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
Table 3-6c: Suzie – FME
FME = 0
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
FME = 1
QPP/QPI Slices   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
See tables 3-7a, 3-7b and 3-7c for the R-D curves of each parameter compared with the reference
CODEC.
Table 3-7a: Akiyo – R-D Curves for each parameter compared with the reference CODEC
Table 3-7b: Bus – R-D Curves for each parameter compared with the reference CODEC
Table 3-7c: Suzie – R-D Curves for each parameter compared with the reference CODEC
Table 3-8 shows a kind of “appraisal” for the encoder, rating its performance on each sequence
with the various encoding parameters in terms of the performance indicators: PSNR, bitrate and
encoding time compared with those of the reference CODEC. A “tick” indicates that the
performance matches or improves on the reference CODEC; while a “cross” indicates a poor
performance as compared with the reference CODEC. The optimum parameters got a “tick” under
each performance indicator.
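Table 3-8: Encoder performance with each encoding parameter compared with the reference CODEC
                      Akiyo                   Bus                     Suzie
Encoding Parameters   PSNR  Bitrate  Time     PSNR  Bitrate  Time     PSNR  Bitrate  Time
RDO                   ✔     ✔        x        ✔     ✔        x        ✔     ✔        x
CABAC                 ✔     ✔        ✔        ✔     ✔        ✔        ✔     ✔        ✔
B-Frames 1            ✔     ✔        ✔        ✔     ✔        ✔        ✔     ✔        ✔
B-Frames 3            ✔     x        ✔        x     x        ✔        ✔     x        ✔
M.Ref Frames 2        ✔     ✔        ✔        ✔     ✔        ✔        ✔     ✔        ✔
M.Ref Frames 4        ✔     ✔        x        ✔     ✔        x        ✔     ✔        x
FME = 0               ✔     ✔        x        ✔     x        x        ✔     ✔        x
FME = 1               ✔     ✔        ✔        ✔     ✔        ✔        ✔     ✔        ✔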
The optimum parameters were chosen based on the degree to which the reference CODEC was
matched or improved upon. Thus my final choice of optimal parameters would be: CABAC turned
ON, B-Pictures = 1, FME = 1 and number of reference frames = 2.
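The selection rule stated above, keeping only the settings that received a tick under every performance indicator, can be expressed as a short sketch. The marks are transcribed from Table 3-8; the variable names are illustrative.

```python
# Tick/cross marks per row as given in Table 3-8 (True = tick, False = cross).
T, X = True, False
table_3_8 = {
    "RDO":            [T, T, X, T, T, X, T, T, X],
    "CABAC":          [T] * 9,
    "B-Frames 1":     [T] * 9,
    "B-Frames 3":     [T, X, T, X, X, T, T, X, T],
    "M.Ref Frames 2": [T] * 9,
    "M.Ref Frames 4": [T, T, X, T, T, X, T, T, X],
    "FME = 0":        [T, T, X, T, X, X, T, T, X],
    "FME = 1":        [T] * 9,
}

# Keep only the settings with a tick under every indicator for every sequence.
optimum = [name for name, marks in table_3_8.items() if all(marks)]
print(optimum)
```

The four settings that survive this filter are the same four chosen as the optimum parameters.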
Table 3-9a: Akiyo – Optimum parameters used
QPP/QPI   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33        34.72       18.62            2.636
28        38.17       34.9             2.312
23        41.76       68.12            2.411
18        45.32       130.8            2.677
13        48.88       261.3            3.538
Fig. 3-9a: Akiyo R-D Curve with Optimum parameters enabled.
Table 3-9b: Bus – Optimum parameters used
QPP/QPI   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
33        30.84       600.97           18.147
28        34.67       1222.7           21.627
23        38.48       2363.27          29.281
18        42.5        4216.53          38.011
13        46.8        7093.99          53.816
Table 3-9c: Suzie – Optimum parameters used
QPP/QPI   PSNR (dB)   Bitrate (kbps)   Encoding time (seconds)
would require higher bitrate values during encoding. This is because during prediction, more
motion vectors are required, which generates more bits and consequently increases the bitrate.
The performance of the encoder is evaluated based on the PSNR, bitrate and encoding time for
each sequence. Table 3-8 summarises the encoder performance with the various parameters
turned ON.
RDO improves PSNR slightly (by up to about 0.2 dB in the tables above) and at a lower bitrate
than the reference CODEC, but the encoding time increases hugely, which is a disadvantage in real
time communication, e.g. video conferencing. It is more suitable for offline applications where
time is not a constraint, e.g. encoding video for DVD.
CABAC gives good all-round performance: it maintains the PSNR while reducing the bitrate (by
roughly 4% to 9% in the tables above) and, in most cases, the encoding time. It is therefore
suitable for internet applications where time is essential.
The full search (FME = 0) gives better PSNR but a slightly higher bitrate and a longer encoding
time, so it is only suitable for applications that are not time sensitive and where video quality is
important. Fast motion estimation (FME = 1) tends to maintain the PSNR level while reducing the
bitrate and encoding time considerably. It is suitable for time-constrained applications like live
television broadcasting.
Using B-Frames = 1 tended to match the reference CODEC performance with relatively better
encoding time. When B-Frames was increased to 3, the PSNR dropped slightly and the bitrate
increased, which is poor performance even though the encoding time remained relatively low.
Using multiple reference frames = 2 yielded slightly better PSNR, a lower bitrate and reasonable
encoding time for Akiyo and Suzie, but relatively higher encoding time for the Bus sequence.
5.0 Conclusion
This work only scratched the surface of the huge potential H.264 possesses. There are still
numerous prospects offered by the H.264 codec. This work was primarily aimed at studying the
effects of certain parameters on the performance of the encoder in terms of the PSNR, bitrate and
encoding time. An attempt was then made to choose a set of parameters that optimizes the
rate-distortion performance of the encoder.
Changing coding parameters, adding optional coding modes and selecting various coding
algorithms affected the output of the encoder, resulting in different levels of visual quality (PSNR),
bitrate and encoding time. Generally, increased compression yields lower video quality, while good
video quality requires less compression and a higher bitrate. Attempts to keep the bitrate at an
affordable level and reduce encoding time always pose a challenge and result in more complex
coding schemes.
Video coding techniques are still evolving and H.264 is just another step in the rapidly advancing
multimedia field. Attempts are currently being made to improve on the already outstanding
features of H.264/AVC in the shape of H.265, whose goal is to halve the bitrate of H.264 while
maintaining the same video quality and achieving lower codec complexity. It is still at a
preliminary stage and undergoing research.
References
1. http://www.vcodex.com/links.html. [Accessed October 11, 2009].
2. T. Wiegand, G.J. Sullivan, G. Bjøntegaard and A. Luthra, “Overview of the H.264/AVC Video
Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13,
no. 7, Jul. 2003.