Nguyen Van Phuca, Thai Duy Quyb, Vo Phuong Binha*, Phan Thi Thanh Ngaa, Nguyen Thi Huyen Tranga
a Faculty of Information Technology, Dalat University, Lamdong, Vietnam
b International Relations Division Department, Dalat University, Lamdong, Vietnam
Abstract
The latest video coding standard, H.265/High Efficiency Video Coding (HEVC), has been finalized and has become the state-of-the-art technology for video compression. HEVC adopts various key techniques to achieve much higher coding efficiency than its predecessors, such as H.264/AVC (Advanced Video Coding), H.263, H.262/MPEG-2, and H.261. However, having been the main video coding technique for a decade, H.264/AVC still dominates the digital video market, and an enormous amount of legacy content is encoded with H.264 and other older standards. Efficient transcoding from previous standards to HEVC is therefore essential. Moreover, the increasing popularity of mobile video applications has led to a growing demand for trans-rating and downscaling. In this paper, the basic transcoding architectures are reviewed in depth, and several of the most recent techniques for reducing encoded video bitrates (trans-rating) or frame sizes (trans-sizing) are surveyed. Finally, transcoding from the previous standards to HEVC is also discussed.
1. INTRODUCTION
* Corresponding author: binhvp@dlu.edu.vn
TẠP CHÍ KHOA HỌC ĐẠI HỌC ĐÀ LẠT [CHUYÊN SAN KHOA HỌC TỰ NHIÊN VÀ CÔNG NGHỆ]
Video codecs are applied everywhere. We find them in smartphones, encoding video from the camera right before saving it to the phone's memory, or decoding video from storage or from a YouTube stream before displaying it. Video codecs are also found in cameras, laptops, TVs, and many other devices.
HEVC (High Efficiency Video Coding) is the latest video coding standard, developed as a joint project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The first version of the HEVC standard was finalized in 2013 by both ITU-T and ISO/IEC. Further work extends the standard to support additional application scenarios, including extended-range uses with enhanced precision and color format support, scalable video coding, and 3D/stereo/multiview video coding (Gary, Jens, Woo, & Thomas, 2012). In ISO/IEC, the HEVC standard is designated MPEG-H Part 2 (ISO/IEC 23008-2), with the aim of significantly improving compression efficiency compared to the existing H.264/AVC high profile (Pourazad, Doutre, Azimi, & Nasiopoulos, 2012).
Many video coding standards have been developed, such as H.261, H.262/MPEG-2, H.263, H.264/MPEG-4, and H.265/HEVC. The main issue is how to transform one compressed video into another. Scalable video coding (SVC) is a coding paradigm that allows once-encoded video content to be used across diverse bandwidths and devices with rate control (Binh & Yang, 2016; Yang & Binh, 2016). Video transcoding is the process of converting a video file from one compressed bitstream to another in order to support displaying videos on various platforms and devices.
Several video transcoding methods have been proposed for HEVC and the previous standards. In this paper, video transcoding methods from the previous standards to HEVC are introduced, and their effectiveness is shown.
There are four sections in this paper. The next section presents the video transcoding architectures. Some video transcoding methods in HEVC are introduced in Section 3. Section 4 provides the conclusions of this paper.
2. VIDEO TRANSCODING ARCHITECTURES
The video transcoding process typically consists of two phases. First, the source compressed video stream is converted to an intermediate uncompressed format. Then, the decoded data are re-compressed and transmitted to the new devices in the desired format or bitrate. The former phase is called decoding, and the latter part of the process is known as re-encoding.
As mentioned in the simple case above, the video transcoder follows the general model consisting of a decoder-encoder cascade. In this architecture, transcoding is performed by fully decoding the incoming video down to the pixel domain and then re-encoding the decoded video into the target video stream (Ahmad et al., 2005; Moiron et al., 2009). The cascade architecture is depicted in more detail in Figure 3.
As shown in the figure, the source bitstream is first decoded by a Variable Length Decoder (VLD). The decoded DCT coefficients are then inversely quantised and converted back to the pixel domain by the Inverse Discrete Cosine Transform (IDCT). After the transform, the decoded pixels represent the residual information, which is added to the reference frame after it has passed through the Motion Compensation (MC) process. Finally, the resulting uncompressed video frame is re-encoded with a new set of coding parameters. This architecture is not only simple but also flexible, in the sense that it allows changing the video format without significantly reducing the image quality (Moiron et al., 2009). Therefore, it is usually used as the reference against which the performance of other architectures is compared.
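The decode-then-re-encode cascade can be illustrated with a deliberately simplified sketch in Python. The IDCT and motion compensation stages are omitted and the quantisers are plain scalar divisions, so this is only a schematic of the data flow, not real codec behaviour:

```python
def decode(levels, q_in, reference):
    """Toy decoder: inverse-quantise the residual levels and add the
    motion-compensated reference frame. A real decoder would also run
    an IDCT and full motion compensation; both are omitted here."""
    return [r + lv * q_in for lv, r in zip(levels, reference)]

def encode(pixels, q_out, reference):
    """Toy encoder: form the residual against the reference and
    quantise it with the (typically coarser) output step size."""
    return [int((px - r) / q_out) for px, r in zip(pixels, reference)]

def transcode(levels, q_in, q_out, reference):
    """Cascaded pixel-domain transcoding: fully decode, then re-encode
    with new coding parameters (here, just a new quantiser)."""
    return encode(decode(levels, q_in, reference), q_out, reference)

reference = [0, 0, 0, 0]
print(transcode([8, 4, 2, 1], q_in=2, q_out=4, reference=reference))
```

The full decode makes the architecture flexible, at the cost of repeating all encoding decisions from scratch.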
First, the source bitstream is given to the entropy decoder to extract the quantised DCT coefficients and the other coding information. Second, these DCT coefficients are inverse quantised and then passed to the encoder. Third, in the encoder, a larger quantisation step size Q_T, defined by the smaller target bitrate, is used to re-quantise the coefficients before sending them to the Variable Length Coder (VLC). This re-quantisation step produces more zero coefficients, which in turn lowers the number of bits needed to re-encode them. An alternative method was proposed (Moiron et al., 2009) that removes the high-order DCT coefficients.
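The effect of re-quantising with a larger step size can be seen in a small numerical sketch; the step sizes and coefficient values below are illustrative, not taken from any standard:

```python
def requantise(levels, q_in, q_out):
    """Open-loop re-quantisation: inverse-quantise the decoded DCT
    levels and re-quantise them with a larger step size Q_T (q_out).
    Coefficients smaller than the new step collapse to zero, so the
    VLC needs fewer bits to encode the result."""
    return [int(lv * q_in / q_out) for lv in levels]

levels = [12, 5, 3, 1, 1, 0, 0, 0]   # quantised DCT levels at q_in = 2
out = requantise(levels, q_in=2, q_out=8)
print(out)                            # number of zero coefficients grows
```

With the coarser quantiser the trailing small coefficients become zero, which is exactly what shortens the re-encoded bitstream.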
One of the main drawbacks of the cascade architecture is that it does not reuse the information retrieved from the decoder, such as motion information, mode decisions, and quantisation step sizes. Thus, the video encoder must perform everything itself, even though the motion estimation process alone accounts for 60%-70% of the encoder computation (Ahmad et al., 2005). The spatial domain transcoder architecture (SDTA) is based on the cascaded decoder-encoder but is extended with a converter component between the decoder and the encoder, as depicted in Figure 6 (Moiron et al., 2009). This new component has two functional blocks: the SDR block, which performs the spatial domain resolution reduction, and the MVCR block, which converts the motion vectors. New motion vectors are computed through a mapping function so that the input motion vectors are reused in the down-converted signal.
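A minimal sketch of such a motion vector mapping for a 2:1 downscale is given below. Averaging the four co-located input vectors before scaling is one common heuristic, used here only as an assumption; actual MVCR implementations may use medians or weighted schemes:

```python
def scale_motion_vector(mv, scale_num, scale_den):
    """Map an input motion vector to the down-converted resolution.
    For a 2:1 downscale, scale_num/scale_den = 1/2. Rounding keeps
    integer-pel precision in this toy example."""
    mvx, mvy = mv
    return (round(mvx * scale_num / scale_den),
            round(mvy * scale_num / scale_den))

def merge_motion_vectors(mvs, scale_num=1, scale_den=2):
    """Four input MVs cover the area of one output block after a 2:1
    downscale; here they are averaged before scaling (an assumption,
    one of several heuristics used in practice)."""
    n = len(mvs)
    avg = (sum(x for x, _ in mvs) / n, sum(y for _, y in mvs) / n)
    return scale_motion_vector(avg, scale_num, scale_den)

print(merge_motion_vectors([(8, 4), (10, 4), (8, 6), (6, 2)]))
```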
Figure 8 shows the generic architecture for heterogeneous transcoding, in which the compressed source bitstream is fully decompressed into the pixel domain before changing the frame type, frame rate, resolution, etc. The MVCR functional block provides the ability to reuse the motion information and picture type at the encoder side. The motion vectors extracted from the decoder are post-processed according to the output encoding structure and may be scaled down to fit a lower spatial-temporal resolution encoder if required. In addition, the extracted motion vectors can be refined to improve the encoding efficiency. The decoded pictures are passed through the STR block for spatial or temporal down-sampling, and the downsized images are encoded with the new motion vectors. Owing to the spatial-temporal subsampling and the different encoding formats of the output video, the motion compensation tasks of the decoder and encoder in the heterogeneous transcoder are more complex (Moiron et al., 2009).
3. VIDEO TRANSCODING METHODS IN HEVC
3.1. Trans-rating
Ideally, the quality of the reduced bit-rate stream should be the same as the quality of a stream directly encoded from the source at the reduced bit-rate (Vetro et al., 2003). The most straightforward way to achieve this is to design a transcoder that follows the cascade architecture: the source video bitstream is fully decoded and the reconstructed signal is then re-encoded at the new rate. However, this method has high complexity. To overcome this drawback while maintaining acceptable quality, simplified architectures such as open-loop and closed-loop transcoders are considered. This section reviews some of the recent research on trans-rating HEVC videos.
Pham et al. (2013) proposed a fast trans-rating scheme based on the early prediction of the partition split-flags in P pictures. The proposed method investigates the correlation between the coding information of the input CUs encoded at high quality and the coding information of the co-located output CUs at a reduced quality to form decision trees. These trees were designed with machine learning techniques to reduce the trans-rating complexity. The model is then used to predict the split-flag together with the associated prediction accuracy, so that the split process in the transcoder is optimized. The accuracy of the prediction is defined as the ratio of correctly predicted instances to all instances classified by the decision trees. At each partition depth, the RD process for a CU needs to be evaluated either only for the full size, only for the sub-partitions, or for both. Table 1 shows the combinations of the predicted split-flag and the accuracy that control the RD evaluation process. Accordingly, the results from the RD evaluation at depth d and/or d+1 are used to decide whether or not the recursive splitting process terminates. As a result, the number of RD evaluations for the partitioning process is reduced.
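The control logic can be sketched as a small decision function. The confidence threshold of 0.8 below is purely illustrative and is not taken from the paper's Table 1:

```python
def rd_candidates(predicted_split, accuracy, threshold=0.8):
    """Decide which RD evaluations to run for a CU at depth d, given
    the decision tree's predicted split-flag and its accuracy. When
    the prediction is reliable, only one branch is evaluated; with a
    low-accuracy prediction, the transcoder falls back to evaluating
    both the full-size CU and its sub-partitions."""
    if accuracy >= threshold:
        return ["sub_partitions"] if predicted_split else ["full_size"]
    return ["full_size", "sub_partitions"]

print(rd_candidates(predicted_split=True, accuracy=0.92))
print(rd_candidates(predicted_split=False, accuracy=0.60))
```

Skipping one of the two RD branches whenever the prediction is trusted is what yields the reported complexity reduction.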
Experimental results showed that the proposed trans-rating scheme reduced the complexity of the trans-rating process by over 76% while maintaining the coding efficiency of a cascaded decoder-encoder (Pham et al., 2013). Figure 9 compares the RD performance obtained by the new model with the unmodified HEVC cascaded decoder-encoder and the trivial method.
Pham et al. (2014) introduced another method for reducing the complexity of trans-rating by optimizing the TZSearch algorithm. By default, TZSearch applies a diamond-shaped search pattern and a refinement step to obtain the optimal integer motion vector. The diamond search starts at the initial search point, and the output is then refined. However, this algorithm is characterized by a fixed search area and search pattern. Moreover, the authors' observations indicated that the processing time of the diamond search was 18% of the total encoding time (Pham et al., 2014). To address this problem, the correlation between the input and output motion vectors was exploited to reduce the complexity of the TZSearch algorithm in the HEVC encoder.
The proposed approach consists of three steps. First, the initial search point is adaptively selected from a set of candidates instead of using a fixed base motion vector. Second, based on the cost of this starting point, the search range is classified into a large area or a smaller area using an online trained Bayes decision rule. Finally, the integer motion vector is searched with one of two fast search patterns, depending on the search range size. This control flow is depicted in Figure 10. By applying the new TZSearch algorithm, the complexity of trans-rating was reduced by 16.4% with a slight bitrate increase (2.54%) compared to the cascaded transcoder (Pham et al., 2014).
Figure 10. Flowchart of the improved fast TZSearch algorithm for trans-rating
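The diamond-pattern refinement at the heart of TZSearch can be sketched as follows. This is not the HM reference implementation; the cost function is a stand-in for the SAD/RD cost of a motion vector candidate:

```python
def diamond_points(center, stride):
    """Four diamond-shaped candidate positions around a search center."""
    cx, cy = center
    return [(cx + stride, cy), (cx - stride, cy),
            (cx, cy + stride), (cx, cy - stride)]

def small_diamond_search(cost, start, max_iters=32):
    """Minimal small-diamond integer-pel search: move to the cheapest
    neighbour until the center itself is the cost minimum. `cost`
    stands in for the matching cost of a motion vector candidate."""
    center = start
    for _ in range(max_iters):
        best = min(diamond_points(center, 1) + [center], key=cost)
        if best == center:
            return center
        center = best
    return center

# Toy cost: L1 distance to the "true" motion vector (3, -2).
cost = lambda p: abs(p[0] - 3) + abs(p[1] + 2)
print(small_diamond_search(cost, start=(0, 0)))
```

Choosing a better starting point, as the adaptive scheme does, shortens exactly this iteration.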
One of the main remaining problems of trans-rating for HEVC is the high computational complexity of the encoder part of the cascaded pixel domain transcoder (Pham et al., 2016). To overcome this obstacle, Pham et al. (2016) proposed various techniques and derived an optimal strategy to reduce the transcoding complexity of closed-loop trans-rating for HEVC at both the CU and PU optimization levels with a complexity-scalable scheme. At the CU level, CUs were evaluated in two ways, top-to-bottom (T2B) and bottom-to-top (B2T), with two fast techniques for each. In T2B, CU evaluation proceeds from the lower depths to the higher depths. The first technique reuses the CU structure from the input bitstream and modifies the structure by evaluating CUs at the lower depths. The second is a machine learning based method that exploits the correlation between the coding information of the input CUs and the coding information of the co-located output CUs. In B2T, CU evaluation moves from the higher depths down to the lower depths; both fast techniques used in this direction utilize the coding information of the CUs from the incoming bitstream and of the split neighbouring CUs. At the PU level, the PU candidates were adaptively selected based on the probabilities of the CU size and the co-located input PU partitioning.
Figure 11. RD performance for trans-rating with QP=6 using the CBR scheme and the LP configuration
Note: (a) Johnny sequence; (b) BQSquare sequence; (c) PartyScene sequence; (d) BasketballDrive sequence.
3.2. Trans-sizing
In many applications, such as video browsing and video conferencing, the video is required to be transcoded at lower bit rates and resolutions. This task is known as video trans-sizing or video downscaling. The objective is to reduce the frame size so that the video content meets the requirements of different communication links and target devices. In addition, heterogeneous networks such as the Internet include many end terminals with low display capability and limited processing power. Therefore, transcoding high quality video streams to lower resolutions through downscaling is highly needed. A straightforward approach is to decode each frame of the input video, downscale it in the pixel domain, and re-encode it to generate the output video sequence. However, this method is very time-consuming and computationally complex, because DCT/IDCT operations and a full-search motion estimation have to be performed to obtain the motion vectors when re-encoding the downsized video. This makes the approach ineffective for real-time applications.
On the other hand, although many approaches have been proposed for resolution-reduction transcoding, most works targeted existing video coding standards such as H.263 and H.264/AVC. Directly applying these methods to the new HEVC standard may result in inefficiency and much higher complexity, owing to the different and more complex coding features of HEVC. Moreover, in HEVC, the complex quad-tree depth structure used to reduce the bit rate of high resolution video content through large CU sizes leads to heavy computational overhead, because of the recursive search for the optimal CU size and partition mode (Minyong, Minwoo, Minsik, & Won, 2014). In this section, some recent and effective downsizing techniques that overcome these drawbacks are explained.
First, an accelerated HEVC transcoder that exploits the quad-tree information obtained from the decoding process was introduced by Minyong et al. (2014). In this method, the original quad-tree information obtained from the decoding process was transformed, in consideration of the scaling factor, for the downscaled frames. Figure 12 shows how the depth information of each CU is extracted, transformed, and stored with the absolute position of the CUs in each LCU. During the encoding process, by searching only the optimal depths in the reconstructed quad-tree, the encoder can be accelerated without losing video quality. For further optimization, an adaptive search-range decision based on the picture order count and the reference relationship is also implemented. For each frame, a search range that covers several depths is defined and used to perform the prediction search. The experimental results show a maximum encoding speedup of 2.18 with only a 0.3% BD-rate increase in the best case, compared to the HEVC reference encoder (Minyong et al., 2014).
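A simplified version of the depth transformation and the depth search range can be sketched as follows. The "subtract the downscale factor from each depth" rule and the clipping to depths 0-3 are simplifying assumptions, not the exact mapping of the paper:

```python
def transform_depths(input_depths, scale_log2=1):
    """Map decoded CU depths to the downscaled frame. With a 2:1
    downscale (scale_log2 = 1), a region coded at depth d roughly
    corresponds to depth d - 1 in the output; depths are clipped to
    the valid HEVC range [0, 3]. This is a simplified stand-in for
    the quad-tree transformation of Minyong et al. (2014)."""
    return [max(0, min(3, d - scale_log2)) for d in input_depths]

def search_range(depth, margin=1):
    """Adaptive depth search range around the predicted depth: only
    these depths are evaluated instead of the full recursion."""
    return range(max(0, depth - margin), min(3, depth + margin) + 1)

depths = transform_depths([0, 1, 2, 3])
print(depths)
print(list(search_range(depths[2])))
```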
Second, in 2015, Nguyen et al. proposed another method that handles one of the most computationally intensive processes in HEVC, the CU size selection, and thus reduces the complexity of the transcoder. The proposed technique used the temporal correlation of the depth levels among CUs and the continuity of the motion vector field in the pre-coded video to derive the most probable CU depth ranges and avoid the unnecessary exhaustive CU size search at certain depth levels (Nguyen & Do, 2015). Based on the depth correlation analysis, an early-split CU strategy was proposed for the case where the current depth level is less than the temporally predicted minimum depth, to avoid unnecessary intensive mode decisions at that level. To reduce the complexity further, the information in the pre-coded video was employed to decide whether the current CU should be pruned. After examining the relation between the estimated and optimal depth levels, the authors suggested using the estimated depth level obtained from the sub-optimal CU quad-tree as the predicted maximum depth level in each CU for early tree pruning. Experimental results show that the new method is effective compared with existing ones; in particular, it reduced the overall transcoding time by about 41% with only negligible coding performance degradation.
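The early-split and early-prune rules can be sketched as below. Deriving the depth range as the plain min/max of the co-located depths is an assumption standing in for the correlation analysis of Nguyen and Do (2015):

```python
def predicted_depth_range(colocated_depths):
    """Most probable depth range for the current CU, derived here
    simply as the min/max of the co-located CU depths in the
    pre-coded video (a simplified stand-in for the paper's
    correlation analysis)."""
    return min(colocated_depths), max(colocated_depths)

def depth_decision(current_depth, colocated_depths):
    """Early-split below the predicted minimum depth, early-prune at
    or above the predicted maximum, otherwise run the full mode
    decision at this depth."""
    dmin, dmax = predicted_depth_range(colocated_depths)
    if current_depth < dmin:
        return "early_split"    # skip mode decision, split directly
    if current_depth >= dmax:
        return "prune"          # stop splitting at this depth
    return "evaluate"

print(depth_decision(0, [2, 2, 3, 2]))
print(depth_decision(2, [1, 2, 2, 1]))
```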
Third, with a similar idea of reducing transcoding complexity by early termination of CU splitting, Pham et al. presented a machine learning approach for arbitrary downsizing of pre-encoded video in HEVC. Figure 13 shows the proposed downsizing architecture (Pham, Praeter, Wallendael, Cock, & Walle, 2015). The coding information is extracted while decoding the input video, and the downsizing scaling factor is chosen based on the network bandwidth constraints or the target device's screen resolution. The decoded video is then downscaled by this scaling factor. The prediction models utilize the extracted coding information to predict the coding modes of the CUs in the output stream. Finally, the downsized video is re-encoded using this predicted information.
The Random Forests algorithm was used to build three prediction models, which predict the splitting behaviour of CUs at three depth levels: 0, 1, and 2. In particular, the machine learning technique examined the correlation between the input and output coding information to predict the split-flag of coding units in a P-frame. The transcoding complexity was controlled by a threshold on the confidence of the prediction. If the confidence is larger than the threshold, the predicted split-flag decides the splitting behaviour of the CU; otherwise, the CU is fully evaluated for both the split and non-split options. The experimental results indicated that, with the threshold varying from 0.9 to 0.5, the proposed techniques achieve a time saving from 20% to 70% with only a slight bit rate reduction. In addition, by adjusting the threshold value, the complexity can be controlled and a good trade-off between complexity and coding performance can be achieved (Pham, Praeter, et al., 2015).
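The confidence-threshold control described above can be sketched in a few lines; the threshold value and the mode names are illustrative:

```python
def cu_split_decision(predicted_split, confidence, threshold=0.7):
    """Threshold-controlled transcoding: accept the model's split-flag
    only when its confidence exceeds the threshold; otherwise fall
    back to evaluating both the split and non-split options. Lowering
    the threshold trades coding efficiency for transcoding speed."""
    if confidence > threshold:
        return ["split"] if predicted_split else ["no_split"]
    return ["split", "no_split"]

print(cu_split_decision(True, 0.95))   # trust the prediction
print(cu_split_decision(True, 0.55))   # full evaluation
```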
Finally, in (Pham, Johan, Glenn, Jan, & Rik, 2015), Pham et al. analysed the performance of various machine learning strategies for downsizing pre-encoded HEVC video in order to determine the optimal strategy. Each of the different algorithms was evaluated with both online and offline training strategies, and the benefits of content-adaptive feature selection were examined. The study concluded that machine learning algorithms should only be used when optimized; otherwise, a trivial method may outperform them. The experimental results also showed that, among the investigated algorithms, random forests achieved the best transcoding performance, reducing complexity by 70% on average with a bit rate increase of 5.4% (Pham, Johan, et al., 2015).
Figure 14 shows the flow chart of this method. During the training process, the transcoder runs in re-encoding mode, in which the MPEG-2 video is fully decoded and fully re-encoded using HEVC. Once the model weights are generated, a switch from re-encoding to transcoding takes place. Later, the transcoder can switch back to the re-encoding mode, during which the required MPEG-2 and HEVC coding information is gathered and used to regenerate the model.
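The alternation between the re-encoding (training) mode and the transcoding mode can be sketched schematically. The GOP-level granularity, the scene-change trigger, and the placeholder model are all assumptions made for illustration:

```python
def transcode_stream(gops, train_gops=1):
    """Content-adaptive mode switch: the first GOP(s) are fully
    re-encoded to gather MPEG-2/HEVC coding statistics and fit the
    model; later GOPs use the trained model, and the transcoder drops
    back to re-encoding (here: on a scene change) to regenerate it.
    Sketch only; `gops` is a list of dicts describing each GOP."""
    model, modes = None, []
    for i, gop in enumerate(gops):
        if model is None or gop.get("scene_change"):
            modes.append("re-encode")     # gather training data
            if i + 1 >= train_gops:
                model = "trained"         # placeholder for model weights
        else:
            modes.append("transcode")     # fast path using the model
    return modes

print(transcode_stream([{}, {}, {"scene_change": True}, {}]))
```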
In the experiments, six video test sequences were used, namely, BasketballDrill (832 × 480, 50 Hz), PartyScene (832 × 480, 50 Hz), BQMall (832 × 480, 60 Hz), RaceHorses (832 × 480, 30 Hz), Vidyo1 (1280 × 720, 60 Hz), and BasketballDrive (1920 × 1080, 50 Hz). Both the training and testing times of the proposed transcoder are taken into account in the experimental results shown in Figure 15.
Figure 15. Rate-distortion curves for the proposed transcoding solution versus MPEG-2 encoding and full HEVC re-encoding
Note: (a) BasketballDrill sequence; (b) PartyScene sequence; (c) BQMall sequence; (d) RaceHorses sequence; (e) Vidyo1 sequence; (f) BasketballDrive sequence.
Zheng, Shi, Zhang, and Gao (2014) proposed a fast transcoding structure. It mainly focuses on the coding unit (CU) and prediction unit (PU) decisions by using the blocks' residual homogeneity information (RHI).
In the method of Zheng et al. (2014), the fast PU mode decision algorithms using RHIs are applied at CU depths 0 and 1. For CU depths 2 and 3, a conventional mode mapping method is employed. Meanwhile, an early CU partition termination scheme is also introduced to further speed up the HEVC transcoder. The proposed method is presented in Figure 16.
The results show that the proposed algorithm can reduce the total encoding time by 56.7% on average, with only a 0.072 dB BD-PSNR degradation or a 2.172% BD-BR increase, which is negligible. For the BasketballDrive sequence, both algorithms lose more bitrate than for the others, mainly because this sequence contains rapid movements, which make the partitioning difficult to predict. More details of the RD performance can be found in Figure 17.
In the proposed method, C_S and C_N are the two categories to be predicted by the decision function or classifier. If the chosen decision is C_S, further considerations must be taken into account. On the other hand, if the decision is C_N, then the current depth is considered final, all PUs at this CU depth are evaluated, and the splitting algorithm for this CU terminates.
Correa et al. (2016) proposed a fast transcoder based on an extensive data mining process applied to H.264/AVC decoding attributes. Experimental results showed an average reduction of 44% in the transcoding time, with a small bit rate increase of 1.67%. In this method, the fast transcoder uses the most relevant attributes, obtained from the H.264/AVC decoding process, to speed up the HEVC encoding process, as shown in Figure 19.
As shown in Figure 20, the encoder receives the attributes and the video sequence, calculates the decision tree outcome for each CU, and applies that decision to the HEVC encoding flow. Figure 20 shows the flowchart of the transcoding algorithm at the encoder side. For each CU larger than 8×8, the attributes corresponding to that image region in the H.264-encoded video are retrieved from the decoder. At this point, the attributes for 32×32 and 64×64 CUs are calculated by summing the values yielded for each MB within the CU region. For 16×16 CUs, the attributes are obtained directly without further processing. After that, the attributes are applied to the decision tree corresponding to the specific CU size. If the decision tree outcome is Split, the CU is partitioned into four sub-CUs and the whole process is repeated recursively for each sub-CU. Otherwise, the encoder decides the best partitioning mode and finishes the CU encoding process.
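The attribute aggregation and recursive decision flow can be sketched as follows. The `tree` classifier, its threshold, and the reuse of the first quarter of the MB values on recursion are all illustrative simplifications; the real transcoder indexes MBs by position and uses trained decision trees:

```python
def cu_attributes(mb_values, cu_size):
    """Attributes for a CU come from the 16x16 H.264 macroblocks that
    cover the same region: summed for 32x32 and 64x64 CUs, taken
    directly for 16x16 CUs."""
    if cu_size == 16:
        return mb_values[0]
    return sum(mb_values)

def encode_cu(cu_size, tree, mb_values, coded=None):
    """Recursive CU encoding flow: query the decision tree for each CU
    larger than 8x8; on 'split', recurse into four sub-CUs, otherwise
    finish the CU at this size. For brevity every sub-CU reuses the
    first quarter of the MB values instead of positional indexing."""
    if coded is None:
        coded = []
    attrs = cu_attributes(mb_values, cu_size)
    if cu_size > 16 and tree(cu_size, attrs) == "split":
        sub = mb_values[: max(1, len(mb_values) // 4)]
        for _ in range(4):                  # four sub-CUs
            encode_cu(cu_size // 2, tree, sub, coded)
    else:
        coded.append(cu_size)
    return coded

# Toy tree: split 64x64 CUs whose aggregated attribute exceeds 100.
tree = lambda size, attrs: "split" if size == 64 and attrs > 100 else "no_split"
print(encode_cu(64, tree, mb_values=[40] * 16))
```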
Experimental results show that the proposed transcoder achieves an average computational complexity reduction (CCR) of 44%, with a BD-rate increase of only 1.67% in comparison to the trivial transcoder. The largest CCR is observed for the SlideEditing video, which is the only screen-content sequence. As this sequence is composed of large flat areas, a larger number of 64×64 CUs are encoded and the fast transcoder more often avoids testing smaller CUs.
4. CONCLUSION
Transcoding architectures based on the full cascaded decoder-encoder provide the highest visual quality at higher complexity, while the FDTA-based architectures provide somewhat lower visual quality with lower complexity. Consequently, transcoding applications can be designed by combining both approaches. The video compression algorithms used in the H.265/HEVC standard are very different from those of the previous traditional video compression standards. To obtain inter-compatibility between H.265/HEVC and other standards, H.265/HEVC-related transcoding will become an even more challenging issue in future video transcoding research.
REFERENCES
Ahmad, I., Xiaohui, W., Yu, S., & Ya-Qin, Z. (2005). Video transcoding: an overview of
various techniques and research issues. IEEE transactions on multimedia, 7(5),
793-804. doi: 10.1109/TMM.2005.854472.
Antonio, J. D.-H., Jose, L. M., Pedro, C., Jose, A. G., & Jose, M. P. (2015). Adaptive Fast
Quadtree Level Decision Algorithm for H.264 to HEVC Video Transcoding. IEEE
Transactions on Circuits and Systems for Video technology, 26(1), 154-168. doi:
10.1109/TCSVT.2015.2473299.
Binh, V.P., & Yang, S.-H. (2016). Complexity-aware frame-level bit allocation and rate
control for H.264 scalable video coding, Journal of Information Science and
Engineering, 32(2), 329-347.
Correa, G., Agostini, L., & Cruz, L. A. d. S. (2016, 22-25 May 2016). Fast H.264/AVC to
HEVC transcoder based on data mining and decision trees. Paper presented at the
2016 IEEE International Symposium on Circuits and Systems (ISCAS).
Gary, J. S., Jens, R. O., Woo, J. H., & Thomas, W. (2012). Overview of the High Efficiency
Video Coding (HEVC) Standard. IEEE Transactions on Circuits and Systems for
Video technology, 22(12), 1649-1668.
Minyong, S., Minwoo, K., Minsik, K., & Won, W. R. (2014). Accelerating HEVC
Transcoder by Exploiting Decoded Quadtree. Paper presented at the IEEE ISCE
2014, Jeju, Korea.
Moiron, S., Ghanbari, M., Assunção, P., & Faria, S. (2009). Video transcoding techniques. In M. Grgic, K. Delac, & M. Ghanbari (Eds.), Recent Advances in Multimedia Signal Processing and Communications (Vol. 231, pp. 245-270). Springer Berlin Heidelberg.
Nguyen, V. A., & Do, M. N. (2015). Efficient coding unit size selection for HEVC
downsizing transcoding. Paper presented at the ISCAS, Lisbon, Portugal.
Peixoto, E., Macchiavello, B., Queiroz, R. L. d., & Hung, E. M. (2014, 17-20 Aug. 2014).
Fast H.264/AVC to HEVC transcoding based on machine learning. Paper presented
at the 2014 International Telecommunications Symposium (ITS).
Pham, V. L., Cock, J. D., Diaz-Honrubia, A. J., Wallendael, G. V., Leuven, S. V., &
Walle, R. V. d. (2014, 27-30 Oct. 2014). Fast motion estimation for closed-loop
HEVC transrating. Paper presented at the 2014 IEEE International Conference on
Image Processing (ICIP).
Pham, V. L., Cock, J. D., Wallendael, G. V., Leuven, S. V., Rodriguez-Sanchez, R., . . . Walle, R. V. d. (2013, 15-18 Sept. 2013). Fast transrating for high efficiency video coding based on machine learning. Paper presented at the 2013 IEEE International Conference on Image Processing.
Pham, V. L., Johan, D. P., Glenn, V. W., Jan, D. C., & Rik, V. d. W. (2015). Performance
Analysis of Machine Learning for Arbitrary Downsizing of Pre-Encoded HEVC
Video. IEEE Transactions on Consumer Electronics, 61(4), 507-515.
Pham, V. L., Praeter, J. D., Wallendael, G. V., Cock, J. D., & Walle, R. V. d. (2015, 9-12
Jan. 2015). Machine learning for arbitrary downsizing of pre-encoded video in
HEVC. Paper presented at the 2015 IEEE International Conference on Consumer
Electronics (ICCE).
Pham, V. L., Praeter, J. D., Wallendael, G. V., Leuven, S. V., Cock, J. D., & Walle, R. V.
d. (2016). Efficient Bit Rate Transcoding for High Efficiency Video Coding. IEEE
transactions on multimedia, 18(3), 364-378. doi: 10.1109/TMM.2015.2512231.
Pourazad, M. T., Doutre, C., Azimi, M., & Nasiopoulos, P. (2012). HEVC: The New Gold
Standard for Video Compression: How Does HEVC Compare with H.264/AVC?
IEEE Consumer Electronics Magazine, 1(3), 36-46. doi:
10.1109/MCE.2012.2192754.
Shanableh, T., Peixoto, E., & Izquierdo, E. (2013). MPEG-2 to HEVC Video Transcoding
With Content-Based Modeling. IEEE Transactions on Circuits and Systems for
Video technology, 23(7), 1191-1196. doi: 10.1109/TCSVT.2013.2241352.
Vetro, A., Christopoulos, C., & Sun, H. (2003). Video Transcoding Architectures and
Techniques: An Overview. IEEE Signal Processing Magazine, 20, pp 18-29.
Yang, S.-H., & Binh, V. P. (2016). Adaptive bit allocation for consistent video quality in scalable high-efficiency video coding. IEEE Transactions on Circuits and Systems for Video Technology, PP(99), (Early Access).
Zheng, F., Shi, Z., Zhang, X., & Gao, Z. (2014, 7-9 July 2014). Fast H.264/AVC to HEVC
transcoding based on residual homogeneity. Paper presented at the 2014
International Conference on Audio, Language and Image Processing (ICALIP).