
Dalat University Journal of Science, Volume 6, Number 1, 2016

OVERVIEW OF VIDEO TRANSCODING FOR H.265/HEVC

Nguyen Van Phuc (a), Thai Duy Quy (b), Vo Phuong Binh (a)(*), Phan Thi Thanh Nga (a), Nguyen Thi Huyen Trang (a)

(a) Faculty of Information Technology, Dalat University, Lamdong, Vietnam
(b) International Relations Division, Dalat University, Lamdong, Vietnam

Abstract

The latest video coding standard, H.265/High Efficiency Video Coding (HEVC), has been
finalized and has become the state-of-the-art technology for video compression. HEVC adopts
various key techniques to achieve much higher coding efficiency than its predecessors such as
H.264/AVC (Advanced Video Coding), H.263, H.262/MPEG-2, and H.261. However, having
been the dominant video coding standard for a decade, H.264/AVC still prevails in the digital
video market, and an enormous amount of legacy content has been encoded with H.264 and
older standards. Efficient transcoding from previous standards to HEVC is therefore essential.
Moreover, the increasing popularity of mobile video applications has created a growing demand
for trans-rating and downscaling. In this paper, the basic transcoding architectures are reviewed
in depth, and a number of recent techniques for reducing the bitrate (trans-rating) or the
resolution (trans-sizing) of encoded video are examined. Finally, transcoding from previous
standards into HEVC is also discussed.

Keywords: H.265/HEVC; H.264/AVC; Video Coding; Video Transcoding; SDTA; FDTA.

1. INTRODUCTION

It is not practical to store or transmit video in its original, raw, uncompressed form
because it would require an enormous amount of space. A video encoder converts a large raw
video into a small bitstream that is suitable both for storage in memory and for transmission
over a network. A compressed video must be decompressed by a video decoder before it can be
displayed. The combination of the encoder and the decoder forms a video codec, as shown in
Figure 1.

Figure 1. Video codec process

* Corresponding author: binhvp@dlu.edu.vn

Video codecs are applied everywhere. They can be found in a smartphone, encoding
video from the built-in camera just before saving it to the phone's memory, or decoding video
from memory or from a YouTube stream before displaying it. Video codecs are also used in
cameras, laptops, TVs, and many other devices.

HEVC (High Efficiency Video Coding) is the latest video coding standard. It is a
joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC
Moving Picture Experts Group (MPEG) standardization organizations. The first version of
the HEVC standard was finalized in 2013 by both ITU-T and ISO/IEC. Further work extends
the standard to support additional application scenarios, including extended-range uses with
enhanced precision and color format support, scalable video coding, and 3D/stereo/multiview
video coding (Gary, Jens, Woo, & Thomas, 2012). In ISO/IEC, the HEVC standard has become
MPEG-H Part 2 (ISO/IEC 23008-2), with the aim of significantly improving compression
efficiency compared to the existing H.264/AVC High profile (Pourazad, Doutre, Azimi, &
Nasiopoulos, 2012).

HEVC has been designed to address essentially all existing applications of
H.264/MPEG-4 AVC and to focus particularly on two key issues: increasing video
resolution and increasing the use of parallel processing architectures. It has been widely
used in several applications, including broadcast of high-definition (HD) TV signals over
satellite, cable, and terrestrial transmission systems; video content acquisition and editing
systems; camcorders; security applications; Internet and mobile network video; Blu-ray
Discs; and real-time conversational applications such as video chat, video conferencing,
and telepresence systems. In particular, it can be used to increase the diversity of services,
as demonstrated by the growing popularity of HD video and the emergence of beyond-HD
formats (e.g., 4K×2K or 8K×4K resolution).

Many video coding standards have been developed, such as H.261, H.262/MPEG-2,
H.263, H.264/MPEG-4 AVC, and H.265/HEVC. The main issue is how to transform one
compressed video into another compressed video. Scalable video coding (SVC) is a coding
paradigm that allows once-encoded video content to be used across diverse bandwidths and
devices with rate control (Binh & Yang, 2016; Yang & Binh, 2016). Video transcoding is
the process of converting a video file from one compressed bitstream to another compressed
bitstream in order to support displaying video on various platforms and devices.

Video transcoding techniques aim to reduce the computational complexity of a
trivial full decoding and re-encoding transcoder in order to meet the requirements of real-life
applications. The video transcoding process consists of two phases: (1) decoding, in which the
original data is converted to an intermediate uncompressed format, and (2) re-encoding, in
which the decoded data is converted to the desired format, resolution, or bitrate.

Several video transcoding methods have been proposed for HEVC and previous
standards. In this paper, video transcoding methods from the previous standards to HEVC
are introduced and their effectiveness is shown.

This paper consists of four sections. The next section presents the video transcoding
architectures. Some video transcoding methods for HEVC are introduced in Section 3.
Section 4 provides the conclusions of this paper.

2. VIDEO TRANSCODING ARCHITECTURES

A video transcoder has a general model as shown in Figure 2 (Ahmad, Xiaohui,
Yu, & Ya-Qin, 2005). The transcoder receives a source compressed video stream as input,
and the output is another video stream compressed with a different set of parameters, such
as target bitrate, resolution, and video format. Generally, a transcoder includes two
components: a decoder and an encoder (Moiron, Ghanbari, Assunção, & Faria, 2009).

The video transcoding process typically consists of two phases. First, the source
compressed video stream is converted to an intermediate uncompressed format. Then, the
decoded data is re-compressed and transmitted to the new devices in the desired format or
bitrate. The former phase is called decoding, and the latter part of the process is known as
re-encoding.

Figure 2. General model for video transcoder



2.1. Decoder-encoder cascade architecture

As mentioned in the simple case above, the video transcoder follows the general
model, which consists of a decoder-encoder cascade. In this architecture, transcoding is
performed by fully decoding the incoming video down to the pixel domain and then re-
encoding the decoded video into the target video stream (Ahmad et al., 2005; Moiron et al.,
2009). The cascade architecture is depicted in more detail in Figure 3.

Figure 3. Cascade architecture

As shown in the figure, the source bitstream is first decoded by a Variable Length
Decoder (VLD). The decoded DCT coefficients are then inversely quantised and passed
through the Inverse Discrete Cosine Transform (IDCT). After this transform, the decoded
pixels represent the residual information, which is added to the reference frame produced
by the Motion Compensation (MC) process. Finally, the resulting uncompressed video
frame is re-encoded with a new set of coding parameters. This architecture is not only
simple but also flexible, in the sense that it allows changing the video format without
significantly reducing the image quality (Moiron et al., 2009). Therefore, it is usually used
as a reference against which the performance of other architectures is compared.
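To make the cascade concrete, the following minimal Python sketch drives the ffmpeg command-line tool to fully decode an H.264 stream and re-encode it with the libx265 HEVC encoder. It assumes ffmpeg with libx265 support is installed; the file names and target bitrate are placeholders, and no coding information from the source stream is reused.

```python
import subprocess

def cascade_transcode(src: str, dst: str, target_bitrate: str = "1500k") -> None:
    """Trivial cascade transcoder: full decode followed by a full re-encode.

    ffmpeg decodes every frame of the source down to the pixel domain and
    re-encodes it from scratch with the HEVC encoder, so no motion vectors or
    mode decisions from the incoming bitstream are reused.
    """
    cmd = [
        "ffmpeg", "-y",
        "-i", src,                 # full decode of the incoming bitstream
        "-c:v", "libx265",         # re-encode with the HEVC encoder
        "-b:v", target_bitrate,    # new target bitrate (trans-rating)
        dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical file names used only for illustration.
    cascade_transcode("input_h264.mp4", "output_hevc.mp4")
```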

2.2. Open-loop architecture

Although the cascade architecture is simple, it is time-consuming because every
video frame must be decoded and then re-encoded. In addition, in the case of reducing the
bitrate without changing any other parameters, much of this work is redundant. The
open-loop architecture was proposed to overcome these drawbacks. It is the simplest and
fastest form of video transcoding among the architectures considered here. Figure 4 shows
the block diagram of the open-loop architecture.

Figure 4. Open-loop architecture

First, the source bitstream is given to the entropy decoder to extract the quantised
DCT coefficients and other information. Second, these DCT coefficients are inverse
quantised and then passed to the encoder. Third, in the encoder, a larger quantisation step
size QT, determined by the smaller target bitrate, is used to re-quantise the coefficients
before sending them to the Variable Length Coder (VLC). This re-quantisation produces
more zero coefficients, which in turn reduces the number of bits needed to re-encode them.
An alternative method removes the high-order DCT coefficients instead (Moiron et al., 2009).
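The re-quantisation step at the heart of the open-loop transcoder can be sketched as follows, assuming a plain uniform quantiser and NumPy; the step sizes are illustrative values, not ones taken from any standard.

```python
import numpy as np

def requantise(levels: np.ndarray, q_in: float, q_out: float) -> np.ndarray:
    """Open-loop re-quantisation of a block of quantised DCT coefficients.

    levels : quantisation levels decoded from the source bitstream
    q_in   : quantisation step size used by the source encoder
    q_out  : larger step size chosen to hit the reduced target bitrate
    """
    coeffs = levels * q_in                      # inverse quantisation
    new_levels = np.round(coeffs / q_out)       # re-quantisation with QT = q_out
    return new_levels.astype(np.int32)

# Illustrative 4x4 block of quantisation levels.
block = np.array([[12, 5, 1, 0],
                  [ 4, 2, 0, 0],
                  [ 1, 0, 0, 0],
                  [ 0, 0, 0, 0]])
coarse = requantise(block, q_in=10.0, q_out=24.0)
print(coarse)   # the larger step size forces more coefficients to zero
```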

The computationally efficient open-loop architecture is a good choice for systems
requiring low complexity. However, since this technique discards some coefficients and
does not provide a mechanism to compensate for the resulting quality loss, a mismatch
arises between the encoder and decoder predictions, known as drift error (Ahmad et al.,
2005; Moiron et al., 2009; Vetro, Christopoulos, & Sun, 2003). This error propagates
through the predicted frames and accumulates in the prediction loop. The drift error is only
reset to zero when a non-predicted picture is decoded. Therefore, using the open-loop
architecture usually leads to a trade-off between distortion and bitrate reduction.

2.3. Closed-loop architecture

To overcome the disadvantage of the open-loop architecture, a feedback loop is
added to the closed-loop transcoder, as illustrated in Figure 5, giving it the ability to
remove the mismatch between the residual and the predicted frame (Moiron et al., 2009;
Vetro et al., 2003). The motion compensation in the feedback loop minimizes the drift
distortion and prevents significant loss in the final quality. Consequently, the visual quality
is significantly improved, at the cost of more computational complexity.

Figure 5. Closed-loop architecture
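The feedback loop can be illustrated with a simplified pixel-domain sketch of closed-loop re-quantisation: the re-quantisation error of each frame is stored and added back to the next residual so that drift does not accumulate. Zero-motion compensation stands in for the real MC block here, and all names are illustrative.

```python
import numpy as np

def requant(x: np.ndarray, q: float) -> np.ndarray:
    """Quantise and immediately dequantise with step size q."""
    return np.round(x / q) * q

def closed_loop_transrate(residuals, q_out: float):
    """Re-quantise a sequence of decoded residual frames with drift compensation.

    residuals : list of decoded residual frames (NumPy arrays) from the source stream
    q_out     : coarser quantisation step size for the reduced target bitrate
    """
    err = np.zeros_like(residuals[0], dtype=float)   # accumulated re-quantisation error
    out = []
    for res in residuals:
        # In a real transcoder the stored error would be motion-compensated with
        # the incoming motion vectors; zero motion is assumed here for brevity.
        compensated = res + err
        coded = requant(compensated, q_out)          # what the new decoder reconstructs
        err = compensated - coded                    # error fed back for the next frame
        out.append(coded)
    return out

frames = [np.random.randn(8, 8) * 20 for _ in range(3)]
print(len(closed_loop_transrate(frames, q_out=16.0)))
```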

2.4. Spatial domain transcoder architecture

One of the main drawbacks of the cascade architecture is that it does not reuse the
information retrieved from the decoder component, such as motion information, mode
decisions, and quantisation step sizes. The video encoder must therefore compute all of
these by itself, even though the motion estimation process alone accounts for 60%-70% of
the encoder computation (Ahmad et al., 2005). The spatial domain transcoder architecture
(SDTA) is based on the cascaded decoder-encoder, but it is extended with a converter
component between the decoder and the encoder, as depicted in Figure 6 (Moiron et al.,
2009). This new component has two functional blocks: the SDR block, which performs the
spatial domain resolution reduction, and the MVCR block, which converts the motion
vectors. New motion vectors are computed through a mapping function such that the input
motion vectors can be reused for the down-converted signal.

In the above architecture, the compressed input bitstream is decoded to obtain the
quantised DCT coefficients and to extract other information. The motion information is
reused by passing it to the MVCR, which recalculates the motion vectors. The quantised
DCT coefficients are inversely quantised to recover the DCT coefficients, which are then
passed through the IDCT to reconstruct the pixel values. The reconstructed values are
stored in a reference frame (Ref) before being down-sampled by the SDR and sent to the
encoder (Moiron et al., 2009). This architecture is commonly used in spatial resolution
reduction transcoding because it is flexible and drift free.
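One possible mapping the MVCR block can apply is sketched below: the motion vectors of the four input blocks that collapse onto one output block are averaged and then rescaled for a 2:1 downscaled picture. This is only one of several mapping functions discussed in the literature; the data layout is an assumption for this example.

```python
from typing import List, Tuple

MV = Tuple[float, float]   # (horizontal, vertical) motion vector components

def downconvert_mv(mvs: List[MV], scale: float = 0.5) -> MV:
    """Map the motion vectors of four co-located input blocks to one output block.

    mvs   : motion vectors of the four input blocks covering the output block
    scale : spatial downscaling factor (0.5 for 2:1 resolution reduction)
    """
    avg_x = sum(mv[0] for mv in mvs) / len(mvs)    # average the four vectors
    avg_y = sum(mv[1] for mv in mvs) / len(mvs)
    return (avg_x * scale, avg_y * scale)          # rescale to the new picture size

# Four 16x16 input blocks collapse onto one 16x16 block in the half-size picture.
print(downconvert_mv([(8.0, 4.0), (6.0, 4.0), (10.0, 2.0), (8.0, 6.0)]))
```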

Figure 6. Spatial domain transcoder architecture

2.5. Frequency domain transcoder architecture

Similar to the SDTA, the frequency domain transcoder architecture (FDTA) has a
converter component, as shown in Figure 7. This component contains the motion vector
converter/refinement (MVCR) block and the frequency domain spatial reduction (FDSR)
block. Because all operations are performed in the frequency domain, the encoder end also
includes a frequency domain motion compensation (FDMC) block (Moiron et al., 2009).
This architecture exploits redundancy in the structure of the SDTA; it is simpler while still
providing the same functionality as the SDTA (Ahmad et al., 2005).

Figure 7. Frequency domain transcoder architecture

In this architecture, frequency domain filters extract the low-frequency coefficients
from four original blocks to form a new, resized block. The objective is to find an efficient
way of merging four IDCTs and one DCT. In addition, when motion vectors are downscaled
to match the new resolution, the new motion vectors can be computed by various techniques,
such as averaging the four input motion vectors and then scaling them with respect to the
new picture size. However, it is difficult to guarantee good results, and this sometimes yields
a poor approximation (Moiron et al., 2009). In short, this architecture requires less
computation, but it lacks flexibility and may introduce drift error.
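To make the FDSR idea concrete, the sketch below downscales four 8x8 DCT blocks into one 8x8 block for a half-resolution picture by keeping only the low-frequency 4x4 corner of each block. It is a minimal NumPy/SciPy illustration of one well-known frequency-domain approach rather than the exact filter set of any particular transcoder; the 0.5 amplitude factor compensates for truncating an orthonormal 8-point transform to 4 points.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block: np.ndarray) -> np.ndarray:
    """2-D orthonormal DCT of a square block."""
    return dct(dct(block, norm="ortho", axis=0), norm="ortho", axis=1)

def idct2(block: np.ndarray) -> np.ndarray:
    return idct(idct(block, norm="ortho", axis=0), norm="ortho", axis=1)

def merge_four_blocks(b00, b01, b10, b11):
    """Downscale four adjacent 8x8 DCT blocks into one 8x8 DCT block.

    Each input block is reduced to its 4x4 low-frequency corner, brought back
    to the pixel domain, and the four 4x4 patches are reassembled into one 8x8
    block that is transformed again.
    """
    def low_freq_patch(coeffs):
        # 0.5 preserves amplitude when an orthonormal 8-point DCT is cut to 4 points.
        return 0.5 * idct2(coeffs[:4, :4])
    top = np.hstack([low_freq_patch(b00), low_freq_patch(b01)])
    bottom = np.hstack([low_freq_patch(b10), low_freq_patch(b11)])
    return dct2(np.vstack([top, bottom]))

frame_block = np.arange(256, dtype=float).reshape(16, 16)
quads = [dct2(frame_block[r:r + 8, c:c + 8]) for r in (0, 8) for c in (0, 8)]
print(merge_four_blocks(*quads).shape)   # (8, 8) block for the half-size picture
```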

2.6. Generic heterogeneous transcoding architecture

Heterogeneous transcoding is used to perform conversion between different
compression standards, such as H.264/AVC to H.265/HEVC or MPEG-2 to H.263. Due to
the asymmetry between the decoder and the encoder, the complexity of heterogeneous
transcoders is often much higher than that of homogeneous transcoders. This type of
transcoding requires a syntax conversion (SC) between the input and output standards,
and it may additionally change any format parameters to meet the capabilities of the target
devices (Ahmad et al., 2005; Moiron et al., 2009).

Figure 8. Heterogeneous transcoder architecture

Figure 8 shows the generic architecture for heterogeneous transcoding, in which
the compressed source bitstream is fully decompressed into the pixel domain before the
frame type, frame rate, resolution, and other parameters are changed. The MVCR function
block provides the ability to reuse the motion information and picture types on the encoder
side. The motion vectors extracted from the decoder are post-processed according to the
output encoding structure and may be scaled down to fit a lower spatial-temporal resolution
encoder if required. In addition, the extracted motion vectors can be refined to improve the
encoding efficiency. The decoded pictures are passed through the STR block for spatial or
temporal down-sampling, and the downsized images are encoded with the new motion
vectors. Because of the spatial-temporal subsampling and the different encoding format of
the output video, the decoder and encoder motion compensation tasks in the heterogeneous
transcoder are more complex (Moiron et al., 2009).

3. VIDEO TRANSCODING FOR H.265/HEVC

3.1. Trans-rating

Trans-rating, also known as bit-rate transcoding or bit-rate reduction, is one type of
video transcoding technique. Although transcoding in general may change any parameter
of the encoding, trans-rating only changes the bit-rate. Trans-rating is the process of
converting a source compressed bitstream to a reduced target bit-rate while maintaining
low complexity and the original video format and achieving the highest possible quality
(Vetro et al., 2003). Applications such as television broadcast and Internet streaming
require this type of conversion in order to transfer high-resolution content quickly and
smoothly. In addition, the increasing popularity of mobile video viewing on websites such
as YouTube has led to a growing demand for trans-rating.

Ideally, the quality of the reduced bit-rate stream should be the same as the quality
of a stream directly encoded from the source at the reduced bit-rate (Vetro et al., 2003).
The most straightforward approach to achieve this is to design a transcoder that follows
the cascade architecture, in which the source video bitstream is fully decoded and the
reconstructed signal is re-encoded at the new rate. However, this method has a high
complexity. To overcome this drawback while maintaining acceptable quality, simplified
architectures such as the open-loop and closed-loop transcoders are considered. This
section reviews some of the recent research on trans-rating HEVC videos.

Pham et al. (2013) proposed a fast trans-rating scheme based on early prediction
of the partition split-flags in P pictures. The proposed method investigates the correlation
between the coding information of the input CUs encoded at high quality and the coding
information of the co-located output CUs at a reduced quality in order to form decision
trees. These trees were designed, using machine learning techniques, to reduce the
trans-rating complexity. The model is used to predict the split-flag together with an
associated prediction accuracy so that the split process in the transcoder can be optimized.
The accuracy of the prediction is defined as the ratio of the correct to the incorrect
instances predicted by the decision trees. At each partition depth, the RD process for a CU
needs to be evaluated either only for the full size, only for the sub-partitions, or both.
Table 1 shows how the combinations of the predicted split-flag and its accuracy control
the RD evaluation process. Accordingly, the results of the RD evaluation at depth d and/or
d+1 are used to decide whether or not the recursive splitting process terminates. As a
result, the number of RD evaluations in the partitioning process is reduced.

Table 1. RD evaluation for a CU at depth d (Y=Yes/N=No)
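A hedged sketch of this kind of split-flag prediction is shown below: a decision tree is trained on features taken from the decoded input CUs, and at transcoding time its prediction and confidence control which RD evaluations are run. The features, training data, and threshold are plausible stand-ins, not the exact attribute set or rules of Pham et al. (2013).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative training data: one row per input CU with, e.g., its depth in the
# input stream, residual energy, motion vector magnitude and input QP.
X_train = np.array([[0, 950.0, 6.5, 22],
                    [1, 120.0, 1.0, 22],
                    [0, 800.0, 5.0, 22],
                    [2,  40.0, 0.5, 22],
                    [1, 300.0, 2.5, 22],
                    [2,  15.0, 0.2, 22]])
y_train = np.array([1, 0, 1, 0, 1, 0])   # 1 = split the co-located output CU

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

def rd_evaluations(features, confidence_threshold=0.8):
    """Decide which RD checks to run for one CU at the current depth."""
    proba = tree.predict_proba([features])[0]
    split = int(np.argmax(proba))
    if proba[split] < confidence_threshold:
        return "evaluate both the current size and the sub-partitions"
    return "evaluate sub-partitions only" if split else "evaluate current size only"

print(rd_evaluations([0, 900.0, 6.0, 22]))
```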

Experimental results show that the proposed trans-rating scheme reduces the
complexity of the trans-rating process by over 76% while maintaining the coding efficiency
of a cascaded decoder-encoder (Pham et al., 2013). Figure 9 compares the RD performance
obtained by the new model with the unmodified HEVC cascaded decoder-encoder and the
trivial method.

Figure 9. Comparison of RD performance for trans-rating at QP=6

Pham et al. (2014) introduced another method for reducing the complexity of
trans-rating by optimizing the TZSearch algorithm. By default, TZSearch applies a
diamond-shaped search pattern and a refinement step to obtain the optimal integer motion
vector. The diamond search starts at an initial search point, and the output is then refined.
However, this algorithm is characterized by a fixed search area and search pattern.
Moreover, the authors' observations indicated that the processing time for the diamond
search was 18% of the total encoding time (Pham et al., 2014). To address this problem,
the correlation between the input and output motion vectors was exploited to reduce the
complexity of the TZSearch algorithm in the HEVC encoder.

The proposed approach consists of three steps. First, the initial search point is
adaptively selected from a set of candidates instead of using a fixed base motion vector.
Second, based on the cost of this starting point, the search range is classified into a large
or a small area using an online-trained Bayes decision rule. Finally, the integer motion
vector is searched with one of two fast search patterns, depending on the search range size.
This control flow is depicted in Figure 10. By applying the new TZSearch algorithm, the
complexity of trans-rating was reduced by 16.4% with a slight bitrate increase (2.54%)
compared to the cascaded transcoder (Pham et al., 2014).

Figure 10. Flowchart of the improved fast TZSearch algorithm for trans-rating
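The control flow can be summarised by the schematic sketch below, covering the three steps: an adaptively selected starting point, a Gaussian Bayes-style decision on the search-range class, and one of two simple search patterns. The cost function, class statistics, and patterns are invented purely for illustration; the actual method trains the Bayes rule online inside the transcoder.

```python
import math

def sad_cost(mv, predicted_mv):
    """Toy matching cost: distance from a predicted motion vector."""
    return math.hypot(mv[0] - predicted_mv[0], mv[1] - predicted_mv[1])

def fast_tz_search(candidate_starts, predicted_mv,
                   mean_small=2.0, mean_large=12.0):
    """Schematic fast TZSearch for trans-rating.

    candidate_starts : candidate initial MVs (e.g. the co-located input MV,
                       the median predictor, and the zero vector)
    predicted_mv     : stand-in for the block being matched
    mean_small/large : assumed class means of the starting cost for the
                       Gaussian Bayes rule deciding the search-range class
    """
    # Step 1: adaptively pick the cheapest starting point.
    start = min(candidate_starts, key=lambda mv: sad_cost(mv, predicted_mv))
    cost0 = sad_cost(start, predicted_mv)

    # Step 2: Bayes decision (equal priors, unit variances) on the search range.
    small_range = (cost0 - mean_small) ** 2 < (cost0 - mean_large) ** 2
    radius = 2 if small_range else 8

    # Step 3: one of two fast patterns depending on the range size.
    pattern = [(0, 0), (radius, 0), (-radius, 0), (0, radius), (0, -radius)]
    best = min(((start[0] + dx, start[1] + dy) for dx, dy in pattern),
               key=lambda mv: sad_cost(mv, predicted_mv))
    return best

print(fast_tz_search([(0, 0), (4, 3), (6, 1)], predicted_mv=(5, 2)))
```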

One of the main remaining problems of trans-rating for HEVC is the high
computational complexity of the encoder part of the cascaded pixel domain transcoder
(Pham et al., 2016). To overcome this obstacle, Pham et al. (2016) proposed various
techniques and derived an optimal strategy to reduce the transcoding complexity of
closed-loop trans-rating for HEVC at both the CU and PU optimization levels with a
complexity-scalable scheme. At the CU level, CUs were evaluated in two ways, top-to-bottom
(T2B) and bottom-to-top (B2T), with two fast techniques for each. In the T2B approach,
CU evaluation is performed from the lower depths to the higher depths. The first technique
reuses the CU structure from the input bitstream and modifies this structure by evaluating
CUs at the lower depths. The second is a machine learning based method that exploits the
correlation between the coding information of the input CUs and the coding information
of the co-located output CUs. In the B2T approach, CU evaluation proceeds from the
higher depths to the lower depths. Both techniques used in this direction exploit the coding
information of the CUs from the incoming bitstream and of the split neighbouring CUs.
At the PU level, the PU candidates were adaptively selected based on the probabilities of
the CU size and the co-located input PU partitioning.

The transcoding performance of the proposed techniques was evaluated by
comparison with state-of-the-art related works. The results show that the newly proposed
methods outperform the others and can achieve a range of trade-offs between trans-rating
complexity and coding performance. The fastest approach is able to reduce the complexity
by 82% while keeping the bitrate increase below 3% (Pham et al., 2016). Figure 11
compares the average performance of the proposed methods, the HEVC transcoder, and
the trivial transcoder on four video sequences.

Figure 11. RD performance for trans-rating with QP=6 using the CBR scheme and the LP configuration
Note: (a) Johnny sequence; (b) BQSquare sequence; (c) PartyScene sequence; (d) BasketballDrive sequence

3.2. Trans-sizing

In many applications, such as video browsing and video conferencing, the video is
required to be transcoded at lower bit rates and resolutions. This task is known as video
trans-sizing or video downscaling. The objective is to reduce the frame size so that the
video content meets the requirements of different communication links and target devices.
In addition, heterogeneous networks such as the Internet include many end terminals with
limited display capability and processing power. Therefore, transcoding high-quality video
streams to a lower resolution through downscaling is highly needed. A straightforward
approach is to decode each frame of the input video, downscale it in the pixel domain, and
re-encode it to generate the output video sequence. However, this method is very
time-consuming and has a high computational complexity, because in addition to the
DCT/IDCT operations, full-search motion estimation has to be performed to obtain motion
vectors when re-encoding the downsized video. This makes the approach ineffective for
real-time applications.

On the other hand, although many approaches have been proposed for resolution
reduction transcoding, most of them target earlier video coding standards such as H.263
and H.264/AVC. Directly applying these methods to the new HEVC standard may result
in inefficiency and much higher complexity because of HEVC's different and more
complex coding features. Moreover, in HEVC, reducing the bit rate of high-resolution
video content by utilizing large CUs in the complex quad-tree depth structure leads to
heavy computational overhead, owing to the recursive search for the optimal CU size and
partition mode (Minyong, Minwoo, Minsik, & Won, 2014). In this section, some recent
and effective downsizing techniques that overcome these drawbacks are explained.

Figure 12. Transcoding process utilizing decoded information

First, Minyong et al. (2014) introduced an HEVC downscale transcoding technique
that reduces the complexity of the encoder by extracting and utilizing CU composition
information during the decoding process. In this method, the original quad-tree information
obtained from the decoding process is transformed, taking the scaling factor into
consideration, for the downscaled frames. Figure 12 shows how the depth information of
each CU is extracted, transformed, and stored together with the absolute position of the
CUs in each LCU. During the encoding process, by searching only the optimal depths in
the reconstructed quad-tree, the encoder can be accelerated without losing video quality.
For further optimization, an adaptive search range decision according to the picture order
count and the reference relationship is also implemented: for each frame, a search range
covering several depths is defined and used to perform the prediction search. The
experimental results show a maximum encoding speedup of 2.18 with only a 0.3% BD-rate
increase in the best case compared to the HEVC reference encoder (Minyong et al., 2014).
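The general idea of reusing the decoded quad-tree can be sketched as follows: the depth recorded at each input CU position is mapped through the scaling factor onto the downscaled frame, and the encoder then only evaluates a small window of depths around the mapped value. The grid layout and window size are assumptions for illustration, not the exact transformation of Minyong et al. (2014).

```python
import math

def map_depths(input_depths, scale=0.5, lcu=64, min_cu=8):
    """Map per-position CU depths of the input frame onto the downscaled frame.

    input_depths : dict {(x, y): depth} with the top-left pixel position and
                   quad-tree depth of each decoded CU
    scale        : spatial downscaling factor (dyadic factors keep the sizes
                   on the CU grid)
    """
    out = {}
    for (x, y), depth in input_depths.items():
        in_size = lcu >> depth                          # 64, 32, 16 or 8 pixels
        out_size = max(int(in_size * scale), min_cu)    # shrunk, clamped CU size
        out_depth = int(math.log2(lcu // out_size))     # size back to a depth
        out[(int(x * scale), int(y * scale))] = out_depth
    return out

def depth_search_window(mapped_depth, spread=1, max_depth=3):
    """Depths the downscale encoder actually evaluates at this position."""
    return list(range(max(mapped_depth - spread, 0),
                      min(mapped_depth + spread, max_depth) + 1))

depths = {(0, 0): 0, (64, 0): 1, (64, 32): 2, (96, 32): 3}
mapped = map_depths(depths)
print(mapped, depth_search_window(mapped[(32, 0)]))
```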

Second, in 2015, Nguyen and Do proposed another method that addresses one of
the most computationally intensive processes in HEVC, the CU size selection, and thus
reduces the complexity of the transcoder. The proposed technique uses the temporal
correlation of the depth levels among CUs and the continuity of the motion vector field in
the pre-coded video to derive the most probable CU depth ranges and to avoid the
unnecessary exhaustive CU size search at certain depth levels (Nguyen & Do, 2015). Based
on the depth correlation analysis, an early CU split strategy was proposed for the case
where the current depth level is less than the temporally predicted minimum depth,
avoiding the unnecessary, intensive mode decision at that level. To further reduce the
complexity, the information in the pre-coded video is used to decide whether the current
CU should be pruned. Based on an examination of the relation between the estimated and
optimal depth levels, the authors suggested using the estimated depth level obtained from
the sub-optimal CU quad-tree as the predicted maximum depth level of each CU for early
tree pruning. Experimental results show that the new method is effective compared with
existing ones; in particular, it reduces the overall transcoding time by about 41% with only
negligible coding performance degradation.

Third, with a similar idea of reducing transcoding complexity by early termination
of CU splitting, Pham et al. presented a machine learning approach for arbitrary
downsizing of pre-encoded video in HEVC. Figure 13 shows the proposed downsizing
architecture (Pham, Praeter, Wallendael, Cock, & Walle, 2015). The coding information is
extracted while decoding the input video, and the downsizing scaling factor is chosen
based on the network bandwidth constraints or the screen resolution of the target device.
The decoded video is then downscaled by this scaling factor. The prediction models use
the extracted coding information to predict the coding modes of the CUs in the output
stream. Finally, the downsized video is re-encoded using this predicted information.

Figure 13. A fast arbitrary downsizing architecture in HEVC

The Random Forests algorithm was used to build three prediction models, which
predict the splitting behaviour of CUs at three depth levels: 0, 1, and 2. In particular, the
machine learning technique examines the correlation between input and output coding
information to predict the split-flag of the coding units in a P-frame. The transcoding
complexity is controlled by a threshold on the confidence of the prediction: if the
confidence is larger than the threshold, the predicted split-flag decides the splitting
behaviour of the CU; otherwise, the CU is fully evaluated for both the split and non-split
cases. The experimental results indicate that, with the threshold varying from 0.9 to 0.5,
the proposed technique achieves a time saving from 20% to 70% with only a slight impact
on bit rate. In addition, by adjusting the threshold value, the complexity can be controlled,
and a good trade-off between complexity and coding performance can be achieved (Pham,
Praeter, et al., 2015).
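A minimal sketch of the confidence-thresholded prediction is given below, using scikit-learn's RandomForestClassifier; the features, toy training data, and threshold value are illustrative stand-ins rather than the actual models of Pham et al. Raising the threshold sends more CUs to full evaluation, trading speed for coding performance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training set: features of decoded input CUs at one depth level
# (input depth, residual energy, MV magnitude) and the observed output split-flag.
X = np.array([[0, 900.0, 5.1], [0, 850.0, 4.2], [1, 120.0, 0.8],
              [1, 100.0, 0.5], [2,  60.0, 0.4], [2, 700.0, 3.9]])
y = np.array([1, 1, 0, 0, 0, 1])

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def decide_split(features, threshold=0.8):
    """Use the predicted split-flag only when the forest is confident enough."""
    proba = model.predict_proba([features])[0]
    flag = int(np.argmax(proba))
    if proba[flag] >= threshold:
        return "split" if flag else "no split"        # trust the prediction
    return "full RD evaluation (split and no split)"  # fall back to the encoder

print(decide_split([0, 870.0, 4.8], threshold=0.8))
```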

Finally, in (Pham, Johan, Glenn, Jan, & Rik, 2015), Pham et al. performed a
performance analysis of various machine learning strategies for downsizing pre-encoded
HEVC video in order to determine the optimal strategy. Each algorithm was evaluated
with both online and offline training strategies, and the benefits of content-adaptive
feature selection were examined. The study concluded that machine learning algorithms
should only be used when properly optimized; otherwise, a trivial method may outperform
them. The experimental results also show that, among the investigated algorithms, random
forests gave the best transcoding performance, reducing complexity by 70% on average
with a bit rate increase of 5.4% (Pham, Johan, et al., 2015).

3.3. Transcoding Previous Standards into H.265/HEVC

Beyond H.265/HEVC trans-sizing and trans-rating, several methods have been
proposed for transcoding the previous standards (H.264/AVC, H.263, H.262/MPEG-2, and
H.261) into H.265/HEVC. In this paper, we review the methods of Shanableh, Peixoto,
and Izquierdo (2013); Zheng, Shi, Zhang, and Gao (2014); Peixoto, Macchiavello, Queiroz,
and Hung (2014); Antonio, Jose, Pedro, Jose, and Jose (2015); and Correa, Agostini, and
Cruz (2016).

Shanableh et al. (2013) proposed an efficient video transcoder from MPEG-2 to
HEVC. The transcoder introduces a content-based machine learning solution to predict the
depth of the HEVC coding units. The results show that a speedup factor of up to 3 is
achieved while reducing the bitrate of the incoming video by around 50%.

The main idea of the proposed method is to formalize the mapping as a
classification problem (Figure 14). Four classes are used, corresponding to the four depths
of the CUs. The transcoder attempts to predict the depth of the current CU and then
encodes that CU using the predicted depth, while testing all PUs for that depth in a
rate-distortion sense. The incoming MPEG-2 macroblocks are organized into the largest
CU size, i.e., 16 MBs arranged as a 4×4 group correspond to 64×64 pixels. The coding
parameters and motion information of the reorganized macroblocks are used to generate
feature vectors that are mapped to one of the four depth classes.

Figure 14. Proposed MPEG-2 to HEVC transcoding architecture.
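The sketch below shows how such feature vectors might be assembled from a 4x4 group of decoded MPEG-2 macroblocks and mapped to a depth class. The specific features, field names, and the logistic-regression classifier are illustrative assumptions; Shanableh et al. train their own content-based model during a re-encoding phase.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cu_features(mb_group):
    """Build one feature vector for a 64x64 CU from its 4x4 group of MPEG-2 MBs.

    mb_group : list of 16 dicts, each with the decoded MB's coded bits,
               motion vector and intra flag (hypothetical field names).
    """
    bits = np.array([mb["bits"] for mb in mb_group], dtype=float)
    mv_mag = np.array([np.hypot(*mb["mv"]) for mb in mb_group])
    intra = np.array([mb["intra"] for mb in mb_group], dtype=float)
    return np.array([bits.mean(), bits.var(), mv_mag.mean(), mv_mag.var(), intra.mean()])

# Toy training data: feature vectors and the HEVC depth class (0..3) observed
# when the same content was fully re-encoded during the training phase.
rng = np.random.default_rng(0)
X = rng.random((40, 5)) * [200, 400, 8, 4, 1]
y = rng.integers(0, 4, size=40)
clf = LogisticRegression(max_iter=1000).fit(X, y)

mbs = [{"bits": 120 + i, "mv": (2.0, 1.0), "intra": 0} for i in range(16)]
print("predicted CU depth:", clf.predict([cu_features(mbs)])[0])
```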



Figure 14 shows the flow chart of this method. During the training process, the
transcoder runs in re-encoding mode, in which the MPEG-2 video is fully decoded and
fully re-encoded using HEVC. Once the model weights are generated, a switch from
re-encoding to transcoding takes place. If needed, the transcoder can later switch back to
the re-encoding mode, during which the required MPEG-2 and HEVC coding information
is gathered and used to regenerate the model.

In the experiments, six video test sequences are used, namely BasketballDrill
(832×480, 50 Hz), PartyScene (832×480, 50 Hz), BQMall (832×480, 60 Hz), RaceHorses
(832×480, 30 Hz), Vidyo1 (1280×720, 60 Hz), and BasketballDrive (1920×1080, 50 Hz).
Both the training and testing times of the proposed transcoder are taken into account in the
experimental results shown in Figure 15.

Figure 15. Rate-distortion curves for the proposed transcoding solution versus MPEG-2 encoding and full HEVC re-encoding
Note: (a) BasketballDrill sequence; (b) PartyScene sequence; (c) BQMall sequence; (d) RaceHorses sequence; (e) Vidyo1 sequence; (f) BasketballDrive sequence

Zheng, Shi, Zhang, and Gao (2014) proposed a fast transcoding structure. It mainly
focuses on coding unit (CU) and prediction unit (PU) decisions that exploit the
homogeneity characteristics of blocks. Experimental results show that the proposed
algorithm achieves about 56.68% complexity saving in terms of encoding time, with only a
2.2% BD-BR loss compared to the trivial transcoder.

In the method of Zheng et al. (2014), fast PU mode decision algorithms based on
residual homogeneity (RHIs) are applied at CU depths 0 and 1, while a conventional mode
mapping method is employed for CU depths 2 and 3. In addition, an early CU partition
termination scheme is introduced to further speed up the HEVC transcoder. The proposed
method is presented in Figure 16.

Figure 16. CU and PU mode decision for depth 0 and 1



The results show that the proposed algorithm can reduce the total encoding time by
56.7% on average, with only a 0.072 dB BD-PSNR degradation or a 2.172% BD-BR
increase, which is negligible. For the BasketballDrive sequence, both algorithms lose more
bitrate than for the other sequences, mainly because this sequence contains rapid
movements that make the partitioning difficult to predict. More details of the RD
performance are given in Figure 17.

Figure 17. RD performance comparison of the proposed method in terms of PSNR vs. bitrate

Antonio et al. (2015) presented an Adaptive Fast Quadtree Level Decision
(AFQLD) algorithm that exploits the information gathered at the H.264/AVC decoder to
make faster decisions on CU splitting in HEVC, using a Naïve Bayes (NB) probabilistic
classifier obtained through a supervised data mining process. Experimental results show
that the proposed algorithm achieves a good trade-off between coding efficiency and
complexity compared with the anchor transcoder and, moreover, outperforms other related
works in the literature.

In the proposed method, CS and CN are the two categories to be predicted by the
decision function, or classifier. If the chosen decision is CS, some additional
considerations must be taken into account. On the other hand, if the decision is CN, the
current depth is considered final, all PUs at this CU depth are evaluated, and the algorithm
finishes for this CTU. Figure 18 schematically describes the proposed CU splitting
algorithm, called Adaptive Fast Quadtree Level Decision (AFQLD), where the term
Adaptive refers to the fact that the algorithm can be dynamically adapted to the content of
each sequence.

Figure 18. Diagram of the Proposed AFQLD algorithm

Correa et al. (2016) proposed a fast transcoder based on an extensive data mining
process applied to H.264/AVC decoding attributes. Experimental results show an average
reduction of 44% in the transcoding time, with a small bit rate increase of 1.67%. In this
method, the fast transcoder uses the most relevant attributes identified by the data mining
process, obtained from the H.264/AVC decoding process, to speed up the HEVC encoding
process, as shown in Figure 19.

Figure 19. Fast H.264/AVC to HEVC transcoding



As shown in Figure 20, the encoder receives the attributes and the video sequence,
calculates the decision tree outcome for each CU, and applies that decision to the HEVC
encoding flow. Figure 20 shows the flowchart of the transcoding algorithm at the encoder
side. For each CU larger than 8×8, the attributes corresponding to that image region in the
H.264-encoded video are retrieved from the decoder. At this point, the attributes for 32×32
and 64×64 CUs are calculated by summing the values obtained for each MB within the
CU region, while for 16×16 CUs the attributes are obtained directly without further
processing. The attributes are then applied to the decision tree corresponding to the
specific CU size. If the decision tree outcome is Split, the CU is partitioned into four
sub-CUs and the whole process is repeated recursively for each sub-CU. Otherwise, the
encoder decides the best partitioning mode and finishes the CU encoding process.

Figure 20. Flowchart of the transcoding algorithm at the encoder side
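A schematic Python rendering of this recursive flow is given below. The attribute aggregation and the per-size decision rules are hypothetical stand-ins for the trained decision trees of Correa et al., and a nested dictionary records the resulting CU partitioning.

```python
def region_attributes(mb_attrs, x, y, size, mb=16):
    """Sum the per-macroblock H.264 attributes covering one CU region."""
    total = {"bits": 0, "nonzero": 0}
    for j in range(y, y + size, mb):
        for i in range(x, x + size, mb):
            a = mb_attrs.get((i, j), {"bits": 0, "nonzero": 0})
            total["bits"] += a["bits"]
            total["nonzero"] += a["nonzero"]
    return total

def tree_says_split(attrs, size):
    """Hypothetical per-size decision rule standing in for the trained trees."""
    thresholds = {64: 1000, 32: 500, 16: 250}   # illustrative bit thresholds
    return attrs["bits"] > thresholds[size]

def encode_cu(mb_attrs, x=0, y=0, size=64):
    """Recursively decide the CU partitioning for one 64x64 CTU."""
    if size == 8:                                # smallest CU: no tree is queried
        return {"size": 8}
    attrs = region_attributes(mb_attrs, x, y, size) if size > 16 else \
            mb_attrs.get((x, y), {"bits": 0, "nonzero": 0})
    if tree_says_split(attrs, size):
        half = size // 2
        return {"size": size,
                "children": [encode_cu(mb_attrs, x + dx, y + dy, half)
                             for dy in (0, half) for dx in (0, half)]}
    return {"size": size}                        # encoder picks the best PU mode here

mb_attrs = {(i, j): {"bits": 150 if j < 32 else 20, "nonzero": 5}
            for i in range(0, 64, 16) for j in range(0, 64, 16)}
print(encode_cu(mb_attrs))
```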

Experimental results show that the proposed transcoder achieves an average
computational complexity reduction (CCR) of 44%, with a BD-rate increase of only 1.67%
in comparison to the trivial transcoder. The largest CCR is observed for the SlideEditing
video, which is the only screen content sequence. As this sequence is composed of large
flat areas, a larger number of 64×64 CUs are encoded and the fast transcoder more often
avoids testing smaller CUs.

4. CONCLUSION

Video transcoding is a core technology for providing Internet multimedia services
to end users with various bandwidths and devices. This paper reviewed several existing
video transcoding architectures. The different transcoding techniques provide trade-offs
between computational complexity and visual quality: SDTA-based architectures provide
the highest visual quality at higher complexity, while FDTA-based architectures provide
somewhat lower visual quality at lower complexity. Consequently, transcoding applications
can be designed by combining both approaches. The video compression algorithms used in
H.265/HEVC are very different from those of the previous video compression standards.
To obtain interoperability between H.265/HEVC and the other standards, H.265/HEVC-
related transcoding will become an even more challenging issue in future video
transcoding research.

REFERENCES

Ahmad, I., Xiaohui, W., Yu, S., & Ya-Qin, Z. (2005). Video transcoding: an overview of
various techniques and research issues. IEEE Transactions on Multimedia, 7(5),
793-804. doi: 10.1109/TMM.2005.854472.
Antonio, J. D.-H., Jose, L. M., Pedro, C., Jose, A. G., & Jose, M. P. (2015). Adaptive Fast
Quadtree Level Decision Algorithm for H.264 to HEVC Video Transcoding. IEEE
Transactions on Circuits and Systems for Video Technology, 26(1), 154-168. doi:
10.1109/TCSVT.2015.2473299.
Binh, V.P., & Yang, S.-H. (2016). Complexity-aware frame-level bit allocation and rate
control for H.264 scalable video coding, Journal of Information Science and
Engineering, 32(2), 329-347.
Correa, G., Agostini, L., & Cruz, L. A. d. S. (2016, 22-25 May 2016). Fast H.264/AVC to
HEVC transcoder based on data mining and decision trees. Paper presented at the
2016 IEEE International Symposium on Circuits and Systems (ISCAS).
Gary, J. S., Jens, R. O., Woo, J. H., & Thomas, W. (2012). Overview of the High Efficiency
Video Coding (HEVC) Standard. IEEE Transactions on Circuits and Systems for
Video Technology, 22(12), 1649-1668.
Minyong, S., Minwoo, K., Minsik, K., & Won, W. R. (2014). Accelerating HEVC
Transcoder by Exploiting Decoded Quadtree. Paper presented at the IEEE ISCE
2014, Jeju, Korea.
Moiron, S., Ghanbari, M., Assunção, P., & Faria, S. (2009). Video transcoding
techniques. In M. Grgic, K. Delac, & M. Ghanbari (Eds.), Recent Advances in
Multimedia Signal Processing and Communications (Vol. 231, pp. 245-270).
Berlin, Germany: Springer.
Nguyen, V. A., & Do, M. N. (2015). Efficient coding unit size selection for HEVC
downsizing transcoding. Paper presented at the ISCAS, Lisbon, Portugal.
Peixoto, E., Macchiavello, B., Queiroz, R. L. d., & Hung, E. M. (2014, 17-20 Aug. 2014).
Fast H.264/AVC to HEVC transcoding based on machine learning. Paper presented
at the 2014 International Telecommunications Symposium (ITS).
Pham, V. L., Cock, J. D., Diaz-Honrubia, A. J., Wallendael, G. V., Leuven, S. V., &
Walle, R. V. d. (2014, 27-30 Oct. 2014). Fast motion estimation for closed-loop
HEVC transrating. Paper presented at the 2014 IEEE International Conference on
Image Processing (ICIP).

Pham, V. L., Cock, J. D., Wallendael, G. V., Leuven, S. V., Rodríguez-Sánchez, R., &
Walle, R. V. d. (2013, 15-18 Sept. 2013). Fast transrating for high efficiency video
coding based on machine learning. Paper presented at the 2013 IEEE International
Conference on Image Processing.
Pham, V. L., Johan, D. P., Glenn, V. W., Jan, D. C., & Rik, V. d. W. (2015). Performance
Analysis of Machine Learning for Arbitrary Downsizing of Pre-Encoded HEVC
Video. IEEE Transactions on Consumer Electronics, 61(4), 507-515.
Pham, V. L., Praeter, J. D., Wallendael, G. V., Cock, J. D., & Walle, R. V. d. (2015, 9-12
Jan. 2015). Machine learning for arbitrary downsizing of pre-encoded video in
HEVC. Paper presented at the 2015 IEEE International Conference on Consumer
Electronics (ICCE).
Pham, V. L., Praeter, J. D., Wallendael, G. V., Leuven, S. V., Cock, J. D., & Walle, R. V.
d. (2016). Efficient Bit Rate Transcoding for High Efficiency Video Coding. IEEE
Transactions on Multimedia, 18(3), 364-378. doi: 10.1109/TMM.2015.2512231.
Pourazad, M. T., Doutre, C., Azimi, M., & Nasiopoulos, P. (2012). HEVC: The New Gold
Standard for Video Compression: How Does HEVC Compare with H.264/AVC?
IEEE Consumer Electronics Magazine, 1(3), 36-46. doi:
10.1109/MCE.2012.2192754.
Shanableh, T., Peixoto, E., & Izquierdo, E. (2013). MPEG-2 to HEVC Video Transcoding
With Content-Based Modeling. IEEE Transactions on Circuits and Systems for
Video Technology, 23(7), 1191-1196. doi: 10.1109/TCSVT.2013.2241352.
Vetro, A., Christopoulos, C., & Sun, H. (2003). Video Transcoding Architectures and
Techniques: An Overview. IEEE Signal Processing Magazine, 20, 18-29.
Yang, S.-H., & Binh, V. P. (2016). Adaptive bit allocation for consistent video quality in
scalable high-efficiency video coding, IEEE Transactions on Circuits and Systems
for Video Technology, PP(99), (Early Access).
Zheng, F., Shi, Z., Zhang, X., & Gao, Z. (2014, 7-9 July 2014). Fast H.264/AVC to HEVC
transcoding based on residual homogeneity. Paper presented at the 2014
International Conference on Audio, Language and Image Processing (ICALIP).

