C. E. Rhee et al.
: A Survey of Fast Mode Decision Algorithms for Inter-Prediction and Their Applications to High Efficiency Video Coding
1375
A Survey of Fast Mode Decision Algorithms for Inter-Prediction and Their Applications to High Efficiency Video Coding
Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee
Abstract: The emerging High Efficiency Video Coding (HEVC) standard attempts to improve the coding efficiency by a factor of two over H.264/AVC using new compression tools with high computational complexity. This increased complexity makes real-time execution with reasonable computing power one of the critical concerns for the commercialization of HEVC. The large number of prediction modes is the main cause of the increased complexity of HEVC. Thus, a fast prediction mode decision is needed to reduce the computational complexity. To take advantage of the large body of previous work and to find a guide for application to HEVC, this paper presents a survey of these efforts for the previous standards, especially H.264/AVC, and examines whether the previous algorithms are applicable to HEVC. To this end, previous algorithms are categorized and the effectiveness of each category for HEVC is evaluated. For this evaluation, a previous algorithm is modified when it is not directly applicable to HEVC. Simulation results show that most previous algorithms, with slight modification, improve the encoding speed with a relatively small degradation of the compression efficiency. Among them, hierarchical mode decision is especially effective, whereas mode pre-decision using motion or spatial homogeneity often gives inaccurate results.
Index Terms: Fast inter-prediction, Mode decision, Hardware encoder, HEVC, H.264/AVC.
I. INTRODUCTION
Video compression technologies, as well as video applications such as video conferencing, streaming, video storage and communication, have attracted industry attention due to the increasing demand for high-definition (HD) video content. H.264/AVC [1] has been regarded as the state-of-the-art video coding standard and is widely used. Recently, the next-generation video coding standard [2]-[4], known as High Efficiency Video Coding (HEVC), has been developed by ISO/IEC MPEG and ITU-T VCEG. In the emerging HEVC standard, several new features
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012R1A2A2A06047297). Chae Eun Rhee, Kyujoong Lee, Tae Sung Kim and Hyuk-Jae Lee are with the Inter-university Semiconductor Research Center (ISRC), Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea (e-mail: chae@capp.snu.ac.kr, kjlee@capp.snu.ac.kr, tskim@capp.snu.ac.kr, hyuk_jae_lee@capp.snu.ac.kr). Contributed Paper. Manuscript received 10/15/12. Current version published 12/28/12. Electronic version published 12/28/12.
are introduced, including a flexible block structure, an increased number of intra-coding directions, sophisticated interpolation filters, various in-loop filters, and enhanced entropy coding schemes. The HEVC standard aims at a bitrate saving by a factor of two over H.264/AVC at the expense of an increase in computational complexity. As in H.264/AVC, mode decisions with motion estimation (ME) remain among the most time-consuming computations in HEVC. In an inter-prediction mode decision, a full-search algorithm evaluates every possible block size and refines the results from integer-pel to quarter-pel resolution. Thus, a full-search algorithm guarantees the highest level of compression performance. However, the considerable computational complexity of the mode decision limits the encoding speed. Moreover, the main target resolution of HEVC is full HD (1920×1080) and beyond. Therefore, fast inter-prediction is both an important challenge and an urgent problem to be solved if HEVC compression is to be used in real-time consumer electronic devices.
Extensive research has been conducted to reduce the computational complexity of inter-prediction for H.264/AVC, pursuing an effective trade-off between the rate-distortion (RD) drop and the speed-up. In order to deal with the similar challenge for HEVC, this paper reviews the principal algorithms which have already been attempted for H.264/AVC. A survey of these algorithms and an evaluation of their contributions and limitations provide valuable leads for the development of fast algorithms for HEVC inter-prediction. Major differences between H.264/AVC and HEVC are also investigated from an algorithmic and architectural perspective. Previous algorithms for fast H.264/AVC inter-prediction are then modified and re-designed for HEVC inter-prediction so as to explore the possibilities for application to HEVC. The rest of the paper is organized as follows.
Section II gives an overview of inter-prediction in HEVC. Previous approaches for fast inter-predictions in H.264/AVC are surveyed in Section III, and the application of the fast inter-mode selection algorithms to HEVC is presented in Section IV. Conclusions are given in Section V.
II. OVERVIEW OF INTER-PREDICTION IN HEVC
A. Inter-Prediction Algorithm in HEVC
To achieve high compression performance for high-resolution videos, HEVC defines the coding unit (CU) as the basic processing unit instead of the macroblock (MB).
0098-3063/12/$20.00 © 2012 IEEE
IEEE Transactions on Consumer Electronics, Vol. 58, No. 4, November 2012
Unlike an MB, whose size is fixed at 16×16 pixels, the size of a CU is not fixed and varies from 8×8 to 64×64. A large CU reduces the amount of motion information data. Thus, the compression efficiency is improved in a lossless manner, especially for high-resolution videos. A CU can be partitioned into smaller CUs, and the structure among different CUs is represented by a quad-tree. The depth of this tree can be as large as four. The largest CU, at depth 0, is denoted as the LCU. For a CU whose size is denoted by 2N×2N, predictions are performed for the block sizes 2N×2N, 2N×N, N×2N and N×N. The processing unit for prediction is called the prediction unit (PU).
The HM5.0 reference software offers fast algorithms that shorten the prediction time by deciding the final prediction mode early after evaluating only a subset of the prediction modes. One of the fast algorithms is the early SKIP mode decision, in which the computation for the SKIP mode is performed first for a 2N×2N PU. If the RD cost is less than the average SKIP cost, as accumulated from the previous SKIP modes, not only the prediction of the other PU types at the same depth but also all predictions at deeper depths are omitted. Other fast algorithms are the early CU determination and the coded block flag (CBF)-based fast mode decision, denoted as ECU and CFM, respectively. A zero CBF indicates a block with a zero residual. In the ECU, the RD costs for the SKIP, 2N×2N, 2N×N and N×2N inter-modes as well as the intra-mode are calculated at the current depth. If the SKIP mode cost is the smallest, predictions for CUs smaller than the current CU are not performed. Meanwhile, the CFM is used to select the PU size in the current CU and to save the computation for predictions of less-probable PU sizes. The predictions for the SKIP, 2N×2N, 2N×N and N×2N PUs are processed in sequence.
If the CBF of the current PU is all zeros, the prediction is terminated and the computation for the remaining PU sizes is saved, as a zero CBF indicates that the RD performance is adequate when the current PU is taken as the best mode. Even if the current PU differs from the best PU, the difference in the RD cost between the two may be negligible.
B. Hardware Implementation for an HEVC Encoder
This subsection examines the impact of the HEVC coding structure on the hardware implementation. In this paper, it is assumed that the pipeline architecture of the HEVC hardware encoder is similar to the widely used architectures for H.264/AVC encoders [5]-[9], where the integer motion estimation (IME) is performed in stage 1 and the fractional motion estimation (FME) with the motion compensation (MC) is performed in stage 2. As a hardware encoder takes advantage of parallel and/or pipelined execution of multiple hardware resources, the dependence between computations in the HEVC standard often causes an unexpected slow-down. To support the parallel execution of IMEs for all block sizes in H.264/AVC, the sums of absolute differences (SADs) for all 4×4 blocks of an MB are calculated simultaneously. The
obtained SAD values are combined in the variable-block-size (VBS) adder tree, and 41 SADs for all block sizes are generated in one cycle. The problem is that the rate term of the RD cost function can be computed only after the motion vectors (MVs) of the neighboring blocks are determined, which causes dependence among the IMEs for blocks of various sizes. In addition, when IME and FME are processed not serially but in a pipelined manner, the left MB is still in the FME stage. Thus, the best mode and the best MV of the left block are not available. In H.264/AVC, a modified MV predictor (MVP) is applied to all 41 blocks [8]. Instead of the median of the MVs of the left, upper and upper-right blocks, the median of the MVs of the upper-left, upper and upper-right MBs is used for all 41 blocks equally in order to facilitate the parallel processing and the MB pipelining.
The solution for parallel IME execution in H.264/AVC can also be applied to an HEVC encoder. The MVP of the 2N×2N PU in the LCU is used for all blocks equally. When this MVP is derived, the left and below-left candidates among the spatial MVP candidates are excluded. With this modification of the MVP derivation, the IME execution in HEVC has large parallelism. Moreover, the parallelism in IME execution for HEVC is larger than that for H.264/AVC. In H.264/AVC, the parallel execution of IME is done for a 16×16 MB and its sub-blocks, whereas a 64×64 LCU and all blocks smaller than the LCU can have their IMEs processed in parallel in HEVC. When the same search range and search scheme are used for H.264/AVC and HEVC, a 16×16 MB and a 64×64 LCU are expected to have an identical IME time.
For FME, it is more difficult to exploit the available parallelism than for IME because the two-step FMEs for half- and quarter-pixel precisions must be performed sequentially. Furthermore, the modified MVP or the mode reduction generally decreases the compression efficiency more seriously than fast IME computation algorithms do.
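The combination of sixteen 4×4 SADs into 41 variable-block-size SADs, described earlier in this subsection, can be sketched in software as follows. This is a minimal illustration rather than a hardware design: the function name `vbs_sads`, the dictionary layout and the toy input are assumptions made for the example, and a real adder tree would reuse the smaller partial sums instead of recomputing them.

```python
# Sketch only: software model of a variable-block-size (VBS) SAD combination
# for one 16x16 MB at one search position. All names are illustrative.

# Partition name -> (width, height), measured in 4x4 blocks.
PARTITION_SHAPES = {
    "16x16": (4, 4), "16x8": (4, 2), "8x16": (2, 4), "8x8": (2, 2),
    "8x4": (2, 1), "4x8": (1, 2), "4x4": (1, 1),
}


def vbs_sads(sad4x4):
    """Combine a 4x4 grid of 4x4-block SADs into SADs for every partition.

    sad4x4[r][c] is the SAD of the 4x4 block in row r, column c of the MB.
    Returns {partition name: list of SADs in raster order}.
    """
    result = {}
    for name, (w, h) in PARTITION_SHAPES.items():
        result[name] = [
            sum(sad4x4[r][c]
                for r in range(top, top + h)
                for c in range(left, left + w))
            for top in range(0, 4, h)
            for left in range(0, 4, w)
        ]
    return result


if __name__ == "__main__":
    grid = [[4 * r + c for c in range(4)] for r in range(4)]  # toy SAD values
    sads = vbs_sads(grid)
    print(sum(len(v) for v in sads.values()))  # total number of SADs produced
    print(sads["16x8"], sads["8x16"])
```

The seven partition shapes yield 1 + 2 + 2 + 4 + 8 + 8 + 16 = 41 SADs, which matches the count given in the text.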
Besides, recent studies on H.264/AVC hardware encoders [10]-[16] also show that speeding up the execution is easier for IME than for FME. Thus, in H.264/AVC, FME is usually conducted one by one for the 41 blocks in a 16×16 MB. As a result, the execution time for FME with the MC is most likely to be larger than that for IME. Even when additional hardware resources are used for parallel FME execution [12], the encoding time for an LCU is most likely determined by the time for the FME followed by the MC and the mode decision.
III. FAST INTER-MODE SELECTION ALGORITHMS FOR H.264/AVC
In this section, fast inter-mode selection algorithms proposed for H.264/AVC are surveyed. An effective pre-selection of prediction block sizes is crucial for fast encoding. Reducing the number of prediction block sizes often causes an RD drop. An effective trade-off between the RD drop and the speed-up has been one of the main research subjects. The previous algorithms are categorized according to the decision stage and criteria, as shown in the classification tree in Fig. 1.
[Figure: classification tree. Fast inter-mode selection branches into three categories: (1) ME pre-decision, using neighboring (spatial and temporal) information, motion characteristics of the current MB, or spatial characteristics of the current MB; (2) Hierarchical decision, with remaining prediction according to the prior prediction result, or further prediction by comparing the prior predictions; (3) FME pre-decision, with mode pre-decision based on the rate-distortion cost from IME, or reduction of FME calculation from the reuse of integer-pel MVs.]
Fig. 1. Classification tree of fast inter-predictions for H.264/AVC
There are roughly three categories of algorithms for fast inter-mode prediction. In the first category, candidate block sizes are determined prior to ME, and prediction operations including ME are performed only for the selected candidate block sizes. This category is further classified into three sub-categories. In the first sub-category, spatial and/or temporal correlation in a video is used to select candidates, and the degree of correlation is obtained from neighboring information. For instance, if an MB is surrounded by neighboring MBs coded in the DIRECT or SKIP mode, the video sequence is assumed to be changing smoothly and the motion is assumed to be similar to that in the neighboring area. In this case, the current MB is very likely to be coded in the DIRECT or SKIP mode or with a large block size such as 16×16 [17]-[22]. In a similar manner, various algorithms [23]-[27] search for the best block size based on the spatial and temporal homogeneity of the neighboring blocks.
The algorithms in the second sub-category take advantage of the correlation between the motion homogeneity and the best block size. Natural video sequences include stationary or motionless regions for which the optimal block sizes are mostly large. Thus, the MVs of spatially and temporally adjacent MBs are used to classify the motion characteristics [28], whereas the absolute difference between consecutive frames is used to detect motion homogeneity [26][29][30]. Liu et al. [33] estimate the motion homogeneity of the current MB from MVs generated by ME on the 4×4 blocks inside the current MB.
In the algorithms in the third sub-category, candidate block sizes are predicted from the spatial characteristics of the current MB. A frame-level edge map or the variance of the MB is estimated to detect a homogeneous region [23][31][32]. In other studies, the image is down-sampled and pre-encoded [34][35].
The candidate block sizes are obtained by comparing the RD costs estimated during pre-encoding.
The algorithms of the second category explore the best block size in a hierarchical manner. In other words, certain block sizes are evaluated prior to the other block sizes, and the decision on whether to search further block sizes is then made using the prediction results of the prior block sizes [29][31][36]-[44]. In the first sub-category, the result of the prior prediction is tested, and the decision regarding a further prediction is based on the test result. One of the most popular algorithms is the early SKIP mode decision, where predictions of the remaining block sizes are performed only when the early SKIP condition is not satisfied. Kannangara et al. [43] make the early SKIP mode prediction by estimating a Lagrangian RD cost function which incorporates an adaptive model for the Lagrangian multiplier parameter based on local sequence statistics. Other studies [37][38][41][44] propose simple threshold-based algorithms to detect zero-coefficient blocks. Zero coefficients imply a small distortion in the RD cost function, so the SKIP mode decision is made early without the expensive computation of the real RD cost. Alternatively, if the RD cost of the SKIP mode is less than a threshold, the SKIP mode is selected as the best mode [36]. Here, the threshold is defined as N bits × the Lagrangian multiplier parameter, where N bits is the minimum number of bits required for a non-SKIP mode.
In the second sub-category, the results of the prior predictions are compared and further prediction is determined according to the comparison result. In the algorithms of Yu and Choi [37][40], the RD costs of block sizes are compared in order from large to small block sizes. If the current RD cost is larger than the RD cost of the larger block size, further searches for blocks smaller than the current block are stopped. In the studies of Yin and Lee [36][42], the RD costs of the square blocks, 16×16, 8×8 and 4×4, are tested first. If the trend of these RD costs is not monotonic, all other non-square blocks need to be tested. Otherwise, only the block sizes between the best two square block sizes are searched.
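The two comparison-based rules of the second sub-category can be sketched as follows. This is one possible reading of the prose rather than the exact procedures of [36], [37], [40] or [42]: the function names, the `rd_cost` callable and the block-size labels are hypothetical, and the mapping of "between the best two square block sizes" to 16×8/8×16 or 8×4/4×8 is an assumption.

```python
# Sketch only: hierarchical block-size decisions driven by RD-cost comparisons.

def search_large_to_small(sizes, rd_cost):
    """Evaluate block sizes from large to small and stop when the cost rises.

    sizes: block-size labels ordered from largest to smallest.
    rd_cost: hypothetical callable returning the RD cost of one block size.
    Returns (best size, list of sizes actually evaluated).
    """
    best_size, best_cost, prev_cost = None, float("inf"), None
    evaluated = []
    for size in sizes:
        cost = rd_cost(size)
        evaluated.append(size)
        if cost < best_cost:
            best_size, best_cost = size, cost
        if prev_cost is not None and cost > prev_cost:
            break  # cost exceeded that of the larger block: stop searching
        prev_cost = cost
    return best_size, evaluated


def nonsquare_candidates(cost16, cost8, cost4):
    """Choose non-square sizes to test from the three square-block RD costs."""
    increasing = cost16 <= cost8 <= cost4
    decreasing = cost16 >= cost8 >= cost4
    if not (increasing or decreasing):
        # Non-monotonic trend: every non-square size is tested.
        return ["16x8", "8x16", "8x4", "4x8"]
    if increasing:
        # Best two squares are 16x16 and 8x8 (assumed interpretation).
        return ["16x8", "8x16"]
    # Best two squares are 8x8 and 4x4 (assumed interpretation).
    return ["8x4", "4x8"]
```

For example, with square-block costs of 100, 120 and 50 for 16×16, 8×8 and 4×4, the first function stops after 8×8 and never evaluates 4×4, which illustrates both the saving and the possible RD loss of this kind of early termination.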
In addition to the hierarchical decision approach, many studies have proposed hybrid solutions that select candidate prediction block sizes prior to ME using the information mentioned in the first category, after which the prediction block sizes are searched in a hierarchical manner [18][31][37].
In the first sub-category of the third category, IME is performed for every block size. Next, the results of IME are used to select the candidate block sizes for FME. In the simplest approach, called mode pre-decision (MPD), the best combination of various block sizes (VBS) for an MB is selected from the IME results. FME simply refines the integer-pel MV of the selected block size to quarter-pel precision. MPD suffers from a significant RD drop because the best block size from the IME may change after refinement in the FME. To achieve a better trade-off between the compression efficiency and the computational complexity, the advanced MPD (AMPD) is proposed [45]. In the AMPD, more than one candidate block size is selected for the subsequent FME operations. Seven partitions, four 8×8 partitions together with the 16×8, 8×16 and 16×16 partitions, are sorted according to their IME cost. As a result, N (N = 1 to 7) partitions are selected for the FME. In AMPD2 [45], one candidate is selected from the 16×8, 8×16 and 16×16 partitions, whereas two are selected from the 8×8 partitions. Similarly, two partitions are selected by mode filtering (MF) for the FME operation from the IME phase [10]. One is selected from the 16×8, 8×16 and 16×16 partitions and the other is selected from the 8×8, 16×8,
8×16 and 16×16 partitions, where the 8×8 partition consists of the best sub-block sizes. In this MF algorithm, the number of selected block sizes is relatively low and larger block sizes are selected more frequently than in AMPD2.
In the second sub-category, computation-reuse techniques are adopted. Shao et al. [46] propose performing the FME for each block size one by one. If the integer MV of the current block is identical to that of a block already processed, no FME computation needs to be performed. In particular, in a homogeneous region, adjacent blocks tend to have the same integer MV after IME. Therefore, this reuse technique reduces the computation for the prediction of the block size with no RD drop. The same algorithm is applied to block sizes larger than an 8×8 block [47]. The FMEs for blocks smaller than 8×8 are omitted. Thus, the encoding time decreases more than with Shao's algorithm, at the cost of a reasonable PSNR drop.
IV. APPLICATION OF FAST INTER-MODE SELECTION ALGORITHMS TO HEVC
As explained in Section II, HEVC supports larger and more varied block sizes than H.264/AVC. If an early decision is made to select the prediction block size, the computational
complexity is significantly reduced by omitting the remaining predictions. Recently, several fast inter-mode selection algorithms have been proposed for HEVC. However, it is important first to take advantage of the considerable amount of previous work and to find a guide for application to HEVC. In this paper, several previous algorithms proposed for H.264/AVC are modified and tested for HEVC. For simplicity, algorithms which require an additional calculation to judge the texture characteristics or motion homogeneity, such as a frame-level edge map, are not used. The following algorithms are implemented in the HM5.0 reference software.
A. Prediction Block Size Pre-Decision
In Section III.A, the block size prediction algorithms are classified into three sub-categories that utilize spatial/temporal correlations, motion vector information and the spatial characteristics of the current MB, respectively. This subsection evaluates the effectiveness of the three types of algorithms when they are used for early block size decisions in HEVC. To this end, the relationship between the above information and the depth of the current LCU is examined with experiments. Note that the depth of the LCU determines the block size in HEVC. Ten video sequences, Akiyo, Container,
TABLE I
CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 352×288 TEST VIDEOS

Max(Neighboring MVs) (columns 3 to 7)
QP | Depth | Akiyo | Container | Foreman | Sean | Stefan
20 | 0 | | | | |
20 | 1 | 0.23 (0.41) | 0.22 (0.38) | 5.26 (21.81) | 3.81 (370.40) | 7.25 (91.98)
20 | 2 | 0.51 (0.56) | 0.28 (0.34) | 7.05 (120.28) | 2.73 (222.79) | 13.96 (295.62)
20 | 3 | 0.70 (2.01) | 0.49 (0.51) | 6.71 (145.70) | 10.99 (1151.85) | 16.76 (382.39)
32 | 0 | | | | |
32 | 1 | 0.43 (0.45) | 0.37 (0.86) | 9.12 (1042.00) | 5.42 (547.51) | 11.89 (171.20)
32 | 2 | 1.21 (2.29) | 0.43 (0.36) | 8.91 (203.11) | 6.87 (644.77) | 10.81 (194.82)
32 | 3 | 1.00 (0.86) | 0.67 (0.67) | 5.49 (55.16) | 6.74 (542.93) | 10.84 (189.52)

Variance of the current LCU (×10^3) (columns 8 to 12)
QP | Depth | Akiyo | Container | Foreman | Sean | Stefan
20 | 0 | | | | |
20 | 1 | 2.04 (6258.15) | 3.41 (10528.18) | 2.75 (7506.87) | 1.81 (736.33) | 3.33 (4891.66)
20 | 2 | 2.26 (3020.20) | 2.67 (7049.42) | 3.81 (11020.76) | 2.60 (1746.03) | 3.08 (3889.57)
20 | 3 | 2.93 (4844.59) | 1.96 (4592.76) | 2.65 (6035.12) | 2.30 (1767.59) | 2.56 (2883.81)
32 | 0 | | | | |
32 | 1 | 2.51 (6869.12) | 4.26 (9560.06) | 3.93 (12065.05) | 1.97 (806.67) | 3.19 (5411.69)
32 | 2 | 2.95 (3940.74) | 2.96 (3748.08) | 2.99 (6930.70) | 3.07 (1279.87) | 2.88 (2980.45)
32 | 3 | 2.70 (3864.05) | 1.18 (1387.00) | 2.17 (1105.39) | 2.65 (1885.03) | 2.72 (1729.40)

Max(Neighboring Depths) (columns 13 to 17)
QP | Depth | Akiyo | Container | Foreman | Sean | Stefan
20 | 0 | | | | |
20 | 1 | 1.84 (0.82) | 2.49 (0.39) | 2.81 (0.18) | 1.85 (0.76) | 2.88 (0.13)
20 | 2 | 2.85 (0.15) | 2.78 (0.20) | 2.80 (0.17) | 2.75 (0.31) | 2.95 (0.05)
20 | 3 | 2.93 (0.07) | 2.89 (0.13) | 2.97 (0.03) | 2.97 (0.03) | 2.98 (0.03)
32 | 0 | | | | |
32 | 1 | 1.29 (0.44) | 1.60 (0.63) | 2.00 (0.54) | 1.50 (0.59) | 2.21 (0.55)
32 | 2 | 2.14 (0.42) | 2.33 (0.60) | 2.29 (0.42) | 2.27 (0.63) | 2.55 (0.25)
32 | 3 | 2.70 (0.23) | 2.82 (0.28) | 2.72 (0.21) | 2.90 (0.09) | 2.94 (0.06)
TABLE II
CORRELATION BETWEEN THE CURRENT LCU AND THE NEIGHBORING LCUS FOR 1920×1080 TEST VIDEOS

Max(Neighboring MVs) (columns 3 to 7)
QP | Depth | Aspen | BasketBallDrive | BQTerrace | Cactus | Kimono1
20 | 0 | 69.63 (14436.13) | 73.68 (40665.65) | 27.17 (3582.74) | 11.82 (15.78) | 15.18 (624.55)
20 | 1 | 44.90 (7211.57) | 75.85 (22441.74) | 40.38 (8694.43) | 37.86 (16047.76) | 18.93 (721.20)
20 | 2 | 33.70 (3316.01) | 62.34 (10757.60) | 20.82 (3023.15) | 12.94 (3732.76) | 20.59 (893.41)
20 | 3 | 31.21 (3590.61) | 60.76 (7221.25) | 11.81 (3039.67) | 8.14 (509.86) | 24.33 (1560.22)
32 | 0 | 16.71 (1353.52) | 36.81 (2521.04) | 5.01 (101.63) | 1.79 (283.58) | 13.59 (440.45)
32 | 1 | 31.24 (3348.42) | 64.82 (8081.78) | 4.31 (52.68) | 7.98 (426.45) | 19.02 (1108.55)
32 | 2 | 12.48 (75.76) | 63.44 (4560.53) | 4.78 (68.73) | 12.27 (856.59) | 23.72 (2576.14)
32 | 3 | 55.96 (4838.08) | 57.45 (2710.68) | 4.02 (23.66) | 14.01 (1307.32) | 26.78 (6792.91)

Variance of the current LCU (×10^3) (columns 8 to 12)
QP | Depth | Aspen | BasketBallDrive | BQTerrace | Cactus | Kimono1
20 | 0 | 0.03 (12.57) | 0.19 (91.78) | 0.07 (0.45) | 0.06 (29.32) | 0.19 (135.24)
20 | 1 | 0.43 (432.46) | 0.52 (761.29) | 1.83 (10125.72) | 0.96 (2365.81) | 0.39 (363.05)
20 | 2 | 0.46 (332.50) | 0.54 (611.10) | 1.80 (4497.16) | 0.86 (1786.60) | 0.44 (587.28)
20 | 3 | 0.52 (356.23) | 0.69 (666.90) | 1.92 (3834.08) | 1.00 (2284.00) | 0.46 (669.37)
32 | 0 | 0.20 (108.80) | 0.23 (184.61) | 0.99 (1854.24) | 0.60 (875.66) | 0.27 (275.87)
32 | 1 | 0.77 (680.71) | 0.99 (997.58) | 2.90 (7925.55) | 1.53 (4714.41) | 0.45 (539.68)
32 | 2 | 0.58 (236.52) | 1.06 (640.00) | 2.38 (4373.66) | 1.03 (750.40) | 0.55 (696.87)
32 | 3 | 0.51 (121.25) | 1.03 (601.11) | 2.02 (3148.50) | 0.98 (638.76) | 0.68 (613.05)

Max(Neighboring Depths) (columns 13 to 17)
QP | Depth | Aspen | BasketBallDrive | BQTerrace | Cactus | Kimono1
20 | 0 | 1.57 (0.68) | 1.24 (0.75) | 1.29 (0.81) | 2.00 (0.70) | 2.02 (0.53)
20 | 1 | 2.21 (0.44) | 2.51 (0.35) | 2.17 (0.73) | 2.83 (0.23) | 2.40 (0.39)
20 | 2 | 2.51 (0.32) | 2.76 (0.20) | 2.88 (0.13) | 2.97 (0.03) | 2.62 (0.27)
20 | 3 | 2.77 (0.19) | 2.95 (0.05) | 3.00 (0.01) | 2.99 (0.01) | 2.82 (0.15)
32 | 0 | 0.76 (0.60) | 0.77 (0.61) | 0.95 (0.93) | 0.69 (0.81) | 1.44 (0.57)
32 | 1 | 1.64 (0.51) | 1.69 (0.51) | 1.92 (0.67) | 1.99 (0.54) | 1.82 (0.37)
32 | 2 | 2.14 (0.50) | 2.18 (0.38) | 2.43 (0.47) | 2.47 (0.36) | 2.02 (0.30)
32 | 3 | 2.07 (0.60) | 2.58 (0.27) | 2.81 (0.21) | 2.81 (0.20) | 2.34 (0.33)
Foreman, Sean and Stefan with a resolution of 352×288 as well as Aspen, BasketBallDrive, BQTerrace, Cactus and Kimono1 with a resolution of 1920×1080 are used. The 352×288 test videos are the same sequences used in the research for H.264 [17][28]. In Table I and Table II, the correlation between the depth of the current LCU and the information obtained from neighboring LCUs is presented for the 352×288 and 1920×1080 video sequences, respectively. The first column represents the quantization parameter (QP) values, while the second column represents the depth. The maximum value among the absolute MVs of the neighboring LCUs is obtained for each LCU and presented in the third to seventh columns of Tables I and II. The average and the variance (given in parentheses) of the maximum MVs for each depth are shown in these columns. For a low-resolution video in Table I, depth 0 is seldom selected. For these cases, the corresponding cells are left blank. For videos at a resolution of 352×288 and with QP=20, the depth of the LCU appears to increase as the average magnitude of the neighboring MVs increases. The variance is also not large in this case. This result is consistent with the data reported in [28], which uses H.264/AVC and targets 176×144 and 352×288 videos. For the 1920×1080 videos in Table II, however, the depth of the LCU does not increase along with the neighboring MVs and the variance is very large. This indicates that there is no clear correlation between the depth of the current LCU and the neighboring MVs. In high-resolution videos, the MV values are quite large, even when the motion seems stationary. Sometimes, a large block size is preferred, even with fast and complex motion, because the texture, brightness or colors of the same object can change differently in every frame. In this case, elaborate ME with a small block size cannot reduce the prediction error.
Therefore, the observation leads to the conclusion that the correlation between the depth of each LCU and the neighboring MVs becomes small for high-resolution videos.
If the pixel variance of a certain region is small, this region is likely to be spatially homogeneous and is probably encoded with a large block size. This possibility is tested and the results are presented in the eighth to twelfth columns of Tables I and II. In these columns, the correlation between the depth and the variance of the current LCU is presented. For the 1920×1080 videos encoded with QP=20 in Table II, the variance of the current LCU is quite low when its depth is 0. However, in other cases, the correlation between the depth and the variance of the current LCU is not very strong. In HEVC, the number of block sizes is significantly larger than that in H.264/AVC. Moreover, blocks can be encoded in the SKIP mode not only at the LCU size but also in every CU. The changed SKIP mode decision and the larger number of block sizes make it difficult to find a strong correlation between the depth and the variance.
In the thirteenth to seventeenth columns of Tables I and II, the correlation between the depth of the current LCU and the depth information of the neighboring LCUs is presented. The depth of the current LCU becomes large as the neighboring depths increase, while the variance is quite small. These results show that the correlation between the depths of the current and neighboring LCUs is positive.
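The per-depth averages and variances reported in Tables I and II can be gathered with a short script such as the one below. This is a sketch under stated assumptions: the function names are hypothetical, the "maximum absolute MV" is taken as the largest absolute component over all neighboring MVs, and the population variance is used, since the text does not specify either choice.

```python
# Sketch only: group one statistic per LCU by the chosen depth and summarize it.
from collections import defaultdict
from statistics import mean, pvariance


def max_abs_neighbor_mv(neighbor_mvs):
    """Largest absolute MV component among the neighboring LCUs (assumed metric)."""
    return max(max(abs(mv_x), abs(mv_y)) for mv_x, mv_y in neighbor_mvs)


def stats_by_depth(samples):
    """samples: iterable of (depth, value) pairs, one pair per LCU.

    Returns {depth: (average, variance)} with depths in ascending order.
    """
    groups = defaultdict(list)
    for depth, value in samples:
        groups[depth].append(value)
    return {d: (mean(v), pvariance(v)) for d, v in sorted(groups.items())}


if __name__ == "__main__":
    # Toy data: (selected depth, max |MV| of the neighbors) for three LCUs.
    lcus = [(1, 2.0), (1, 4.0), (2, 10.0)]
    print(stats_by_depth(lcus))
```

The same `stats_by_depth` helper applies unchanged to the pixel variance of the current LCU or to the maximum neighboring depth, since only the per-LCU value changes.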
From the above simulations, the neighboring depth information may be helpful for predicting the block size (or the LCU depth), whereas the MV or variance information may not be very useful.
B. Hierarchical Decision of Prediction Block Size
The algorithm of Yu [37] checks three conditions for an early SKIP mode decision. First, one of the neighboring blocks is a SKIP mode block. Second, the sum of absolute differences (SAD) of the current MB is less than the average SAD of the neighboring MBs. Here, the SAD is the difference between a block in the current frame and the co-located block in the reference frame. Lastly, the result of the fast transform-quantized coefficients is zero. In Table III, the early SKIP mode decision algorithm proposed in the HM5.0 reference software, denoted by ES, and Yu's algorithm [37] are tested. For the simulation of a hardware-based HEVC encoder, the encoding time to process an LCU is estimated by adding the time for the FME with the MC of each CU inside the LCU, as the stage of the FME with the MC operations takes the most time in the pipeline schedule, as discussed in Section II.B. The encoding configuration is low-complexity, low-delay and P-picture-only, and the number of reference frames is four at most. Twelve video sequences, BQMall, FlowerVase, Keiba and RaceHorses with a resolution of 832×480; FourPeople, KristenAndSara, Johnny and Vidyo1 with a resolution of 1280×720; and Aspen, BasketBallDrive, SnowMountain and Kimono1 with a resolution of 1920×1080, are used in the evaluation. There are 50 frames in each test sequence, and four QPs (20, 24, 28 and 32) are used. The first and the second columns represent the resolutions and test sequences used in the simulation. In the third to fifth columns, the increase in bitrate and PSNR and the time saved, denoted by B, P and T, respectively, are shown when the ES proposed in the reference software is applied.
The time is reduced by 60.76%, whereas the bitrate slightly decreases and the PSNR is degraded by 0.02 dB. Unlike the RD cost-based algorithm in the reference software, Yu's algorithm [37] makes the early SKIP mode decision by considering neighboring information and the characteristics of the current CU. In the sixth to eighth columns, the two algorithms are used together to complement each other. The time is reduced by 69.82%.
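The two early SKIP tests compared here can be sketched as follows. This is an illustrative reading of the prose, not code from HM5.0 or from [37]: all names are hypothetical, the three conditions of Yu's algorithm are assumed to be combined with a logical AND, and how the two tests are combined when used together is not specified in the text and is therefore left out.

```python
# Sketch only: two early SKIP tests described in the text.

def yu_early_skip(neighbor_modes, sad_current, neighbor_sads, quantized_coeffs):
    """Return True when all three conditions hold (AND is an assumption)."""
    if not neighbor_sads:
        return False  # no neighbors to compare against (assumed behavior)
    has_skip_neighbor = "SKIP" in neighbor_modes
    low_sad = sad_current < sum(neighbor_sads) / len(neighbor_sads)
    zero_coeffs = all(c == 0 for c in quantized_coeffs)
    return has_skip_neighbor and low_sad and zero_coeffs


class AverageSkipCost:
    """Running average of past SKIP RD costs for the RD cost-based ES test."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def is_early_skip(self, skip_rd_cost):
        """True when the SKIP RD cost is below the average of previous ones."""
        return self.count > 0 and skip_rd_cost < self.total / self.count

    def record(self, skip_rd_cost):
        """Accumulate the RD cost of a block finally coded in the SKIP mode."""
        self.total += skip_rd_cost
        self.count += 1
```

The first test relies only on neighboring modes, SADs and quantized coefficients, whereas the second needs a history of SKIP RD costs. This difference is why the two are described as complementary.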
TABLE III
RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY AN EARLY SKIP MODE DECISION

Size | Videos | ES in HM: B (%) | P (dB) | T (%) | ES in HM + Yu's: B (%) | P (dB) | T (%)
832×480 | BQMall | -0.32 | -0.02 | 53.81 | 2.45 | -0.13 | 67.62
832×480 | FlowerVase | -0.03 | -0.01 | 71.87 | -0.56 | -0.09 | 81.95
832×480 | Keiba | -0.13 | -0.01 | 38.96 | 2.76 | -0.08 | 53.05
832×480 | RaceHorses | -0.18 | -0.02 | 31.48 | 1.49 | -0.08 | 39.98
1280×720 | FourPeople | -0.84 | -0.03 | 82.55 | -0.30 | -0.09 | 88.00
1280×720 | Johnny | -1.15 | -0.03 | 78.36 | -1.30 | -0.07 | 84.50
1280×720 | KristenAndSara | -1.06 | -0.02 | 79.08 | -0.63 | -0.08 | 84.92
1280×720 | Vidyo1 | -0.62 | -0.02 | 78.08 | -0.29 | -0.10 | 85.61
1920×1080 | Aspen | -0.19 | -0.01 | 59.42 | 0.68 | -0.05 | 68.72
1920×1080 | BasketBallDrive | -0.43 | -0.02 | 54.68 | 0.97 | -0.07 | 63.27
1920×1080 | SnowMountain | -0.50 | -0.02 | 58.79 | -2.13 | -0.10 | 68.53
1920×1080 | Kimono1 | -0.16 | -0.01 | 42.09 | 0.77 | -0.04 | 51.73
Average | | -0.47 | -0.02 | 60.76 | 0.33 | -0.08 | 69.82
The ECU and CFM algorithms are applied to HEVC and the results are presented in Table IV. In the third to fifth columns, the ECU algorithm is used alone. The encoding time is reduced by 48.03% and the RD drop is marginal. When the ECU and CFM algorithms are used together, the encoding time saved is 64.03%, whereas the PSNR is 0.07 dB less than that of ECU. When the three algorithms of ECU, CFM and ES are used together, only 5% of the time is additionally saved.
According to the categorization in Section III, the early SKIP mode decision, ECU, and CFM all belong to the first sub-category of the hierarchical decision. On the other hand, no algorithms of the second sub-category are defined in the HM5.0 reference software. The second sub-category algorithms for H.264/AVC are therefore applied to HEVC compression, and the effect on the RD performance and the speed-up is investigated. In a number of previous block-size-reduction algorithms, the prediction of the block size at the lower depth is performed first and the searches at deeper depths are then stopped if a certain condition is satisfied [18][31][37]. The following three algorithms are classified as belonging to the second sub-category of the hierarchical decision according to the categorization in Section III.A.
Early termination of CU, which is similar to the algorithm proposed by Lee [31] (denoted as ETCU1 henceforth), is applicable to the reduction of the block size search. The predictions for the four CUs at depth (d+1) are performed after the prediction for the CU at depth (d). Every time the prediction for a CU at depth (d+1) is finished, its RD cost is accumulated and the accumulated cost is compared with the early termination threshold. If the current accumulated RD cost at depth (d+1) is larger than the threshold, the total RD cost of the four CUs at depth (d+1) is expected to be larger than that of the corresponding CU at depth (d). Thus, the ongoing prediction at depth (d+1) is terminated early.
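The accumulate-and-compare loop of ETCU1 can be sketched as follows. This is a minimal illustration with hypothetical names: the threshold is simply passed in as a parameter, and no particular derivation of it is assumed.

```python
# Sketch only: ETCU1-style early termination over the four CUs at depth d+1.

def etcu1(sub_cu_costs, threshold):
    """Accumulate depth-(d+1) RD costs and stop once they exceed threshold.

    sub_cu_costs: iterable yielding the RD cost of each CU at depth d+1 in
        coding order. A lazy generator can be passed so that the CUs after
        the termination point are never predicted.
    threshold: early termination threshold (supplied by the caller).
    Returns (terminated_early, number of CUs evaluated, accumulated cost).
    """
    accumulated = 0.0
    evaluated = 0
    for cost in sub_cu_costs:
        accumulated += cost
        evaluated += 1
        if accumulated > threshold:
            return True, evaluated, accumulated
    return False, evaluated, accumulated


if __name__ == "__main__":
    # Toy costs: the third CU pushes the running sum above the threshold.
    print(etcu1(iter([30, 40, 50, 60]), threshold=100))
```

In the toy call, the fourth CU is never consumed from the iterator, which is where the time saving comes from.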
The threshold is derived from the RD cost at depth (d). In Yu's algorithm [37], if the RD cost of 2N×2N at depth (d+1) is greater than a quarter of the best RD cost at depth (d), further searches on the 2N×N and N×2N PUs at depth (d+1), as well as at deeper depths, are not performed. This algorithm is denoted as ETCU2 henceforth. Another early termination algorithm proposed not performing an FME operation at each depth [18]. This strategy is denoted as FME_SKIP hereafter.
The SKIP mode plays an important role in compression efficiency and SKIP mode prediction is, thus, always performed, even when various fast mode decision schemes are applied. The result of the SKIP mode prediction is obtained very quickly due to its low complexity as compared to other inter- and intra-predictions. If the ME cost, as estimated in the middle of its computation, is greater than the SKIP cost, the ME operation is terminated. The specific algorithm is as follows. After IME, the IME cost is compared to the cost of the SKIP mode using the condition $C_{FME\_SKIP}$ defined in (1). Here, $COST_{SKIP}$ is the cost of the SKIP mode, whereas $COST_{IME}$ denotes the IME cost. If $COST_{SKIP}$ is less than $COST_{IME}$ multiplied by $W_{FME\_SKIP}$, FME is not performed for the current block. The weight value, $W_{FME\_SKIP}$, is chosen experimentally and is set to 0.8 because the cost obtained from FME is observed to be approximately 80% of $COST_{IME}$ on average. Therefore, the final cost of ME can be estimated as $0.8 \times COST_{IME}$, and this estimated ME cost is compared with $COST_{SKIP}$.
$$C_{FME\_SKIP}:\quad COST_{SKIP} < W_{FME\_SKIP} \times COST_{IME} \qquad (1)$$
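The two threshold tests described above, the quarter-cost test of ETCU2 and condition (1) of FME_SKIP, reduce to simple comparisons. The sketch below is a hedged illustration: the function and parameter names are hypothetical, and only the factor of one quarter and the weight 0.8 come from the text.

```python
W_FME_SKIP = 0.8  # weight reported in the text for condition (1)


def etcu2_stops_search(cost_2nx2n_next_depth: float,
                       best_cost_current_depth: float) -> bool:
    """ETCU2 test: True when the 2Nx2N RD cost at depth d+1 exceeds a
    quarter of the best RD cost at depth d, so further searches stop."""
    return cost_2nx2n_next_depth > best_cost_current_depth / 4.0


def fme_is_skipped(cost_skip: float, cost_ime: float,
                   weight: float = W_FME_SKIP) -> bool:
    """Condition (1): True when the SKIP cost is below the estimated
    final ME cost (weight * IME cost), so FME is not performed."""
    return cost_skip < weight * cost_ime


if __name__ == "__main__":
    # Made-up costs for illustration only.
    print(etcu2_stops_search(30.0, 100.0))  # 30 > 25, search stops
    print(fme_is_skipped(70.0, 100.0))      # 70 < 80, FME skipped
    print(fme_is_skipped(90.0, 100.0))      # 90 >= 80, FME performed
```

Both tests use strict inequalities, matching the wording "greater than" and "less than" in the description.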
In Table IV, the twelfth to fourteenth columns show the results when the ETCU1 algorithm is used alongside the ECU and CFM algorithms. The encoding time is reduced by 68.26%, whereas the increase in bitrate and the PSNR drop are 1.88% and 0.23 dB, respectively. The ETCU2 algorithm, in the fifteenth to seventeenth columns, achieves a time saving of 75.25%, but the RD drop is quite large (a 7.40% bitrate increase and a 0.37 dB PSNR drop on average). Lastly, the eighteenth to twentieth columns show the simulation results when the ECU, CFM and FME_SKIP algorithms are used together. A time saving of 85.95% is achieved, whereas the RD performance is much better than those of ETCU1 and ETCU2. For the three simulations, ECU+CFM+ETCU1, ECU+CFM+ETCU2 and ECU+CFM+FME_SKIP, additionally using the ES algorithm helps neither the time saving nor the RD performance.
TABLE IV RD PERFORMANCE DEGRADATION AND THE TIME SAVED BY THE ECU AND CFM ALGORITHMS PROPOSED IN THE HM5.0 REFERENCE SOFTWARE
                             ECU                   ECU+CFM               ES+ECU+CFM            ECU+CFM+ETCU1         ECU+CFM+ETCU2         ECU+CFM+FME_SKIP
Size      Videos             B (%) P (dB)  T (%)   B (%) P (dB)  T (%)   B (%) P (dB)  T (%)   B (%) P (dB)  T (%)   B (%) P (dB)  T (%)   B (%) P (dB)  T (%)
832×480   BQMall             -0.87  -0.05  44.84   -1.47  -0.12  60.83   -1.81  -0.16  65.49    3.48  -0.25  64.04   19.78  -0.58  72.04   -1.10  -0.15  82.49
          FlowerVase         -1.77  -0.07  66.12   -3.97  -0.21  82.30   -4.11  -0.23  86.46    1.79  -0.52  84.82    5.44  -0.63  86.34   -2.63  -0.29  96.35
          Keiba              -0.51  -0.03  30.16   -1.13  -0.09  47.67   -1.36  -0.11  51.64    3.23  -0.17  53.90   10.49  -0.31  64.86   -0.70  -0.12  78.27
          RaceHorses         -0.27  -0.02  14.67   -0.66  -0.09  32.62   -0.93  -0.14  38.16    5.98  -0.27  43.83   16.76  -0.48  55.73   -0.18  -0.10  66.37
1280×720  FourPeople         -1.15  -0.04  67.92   -1.89  -0.09  81.95   -2.42  -0.12  86.97    2.71  -0.28  83.15   10.20  -0.47  86.34   -1.85  -0.12  95.49
          Johnny             -1.96  -0.04  64.72   -3.95  -0.13  80.83   -4.85  -0.16  85.96   -0.36  -0.22  82.02    5.91  -0.39  85.71   -3.91  -0.18  95.05
          KristenAndSara     -1.79  -0.06  64.35   -2.79  -0.14  80.33   -3.52  -0.18  85.34    2.36  -0.30  81.70    9.21  -0.50  85.15   -2.67  -0.17  95.30
          Vidyo1             -1.89  -0.05  65.23   -2.80  -0.13  80.14   -3.24  -0.15  85.03    1.70  -0.30  82.14    6.88  -0.44  84.90   -2.85  -0.16  94.92
1920×1080 Aspen              -0.34  -0.01  45.75   -1.01  -0.05  62.68   -1.24  -0.07  67.58    1.71  -0.12  67.94    1.47  -0.14  74.94   -0.61  -0.08  86.06
          BasketBallDrive    -0.53  -0.02  35.41   -1.27  -0.07  53.89   -1.72  -0.10  60.35    2.16  -0.12  61.24    3.69  -0.17  69.32   -1.17  -0.09  81.94
          SnowMountain       -2.31  -0.08  52.21   -3.30  -0.12  64.35   -3.46  -0.13  67.82   -3.20  -0.17  66.15   -3.11  -0.28  75.54   -3.10  -0.13  82.42
          Kimono1            -0.16  -0.01  24.94   -0.32  -0.03  40.79   -0.64  -0.07  46.65    1.04  -0.06  48.15    2.14  -0.10  62.14   -0.06  -0.05  76.73
          Average            -1.13  -0.04  48.03   -2.05  -0.11  64.03   -2.44  -0.13  68.95    1.88  -0.23  68.26    7.40  -0.37  75.25   -1.74  -0.14  85.95
C. Decision of Prediction Block Sizes before FME
In H.264/AVC, AMPD (or AMPD2) or MF has been successfully used for block size reduction. As explained in Section II, the reduction of the FME time is very important for real-time encoding in a hardware-based encoder implementation. Thus, in this subsection, the above algorithms are applied to HEVC and their effectiveness is tested through simulation. To apply AMPD2 or MF to the HEVC mode decision, candidate partitions should be defined during the IME phase. Fig. 2 shows one example of a block size prediction in the IME phase. In Clusters 1 and 2, there are three 64×64 CU partitions and three 32×32 CU partitions, respectively, whereas there are four 16×16 CU partitions in Cluster 3. For the 8×8 CU partition, the best block size is selected based on the IME cost. For the 16×16 CU partition, the IME costs of the 2N×2N, N×2N, 2N×N and N×N types are sorted in ascending order, whereas the IME costs of the 2N×2N, N×2N and 2N×N types are sorted in ascending order for the 32×32 and 64×64 CU partitions. Through this process, ten partitions in total are selected for FME, as shown in Fig. 2.
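The sorting step described above can be sketched as a per-cluster ranking by IME cost. The sketch assumes that, when fewer candidates are wanted, the partitions with the lowest IME cost in each cluster are kept; that selection rule, the function name `select_fme_candidates`, the dictionary layout and all cost values are hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def select_fme_candidates(ime_costs: Dict[str, Dict[str, float]],
                          keep: Dict[str, int]) -> Dict[str, List[str]]:
    """Rank the partitions of each cluster by ascending IME cost and keep
    the first keep[cluster] of them as candidates for FME."""
    selected = {}
    for cluster, costs in ime_costs.items():
        ranked = sorted(costs, key=costs.get)  # ascending IME cost
        selected[cluster] = ranked[:keep[cluster]]
    return selected


# Made-up IME costs: three partition types for the 64x64 and 32x32 CUs,
# four types for the 16x16 CU, as in the description above.
IME_COSTS = {
    "cluster1": {"64x64 2Nx2N": 900.0, "64x64 Nx2N": 870.0, "64x64 2NxN": 940.0},
    "cluster2": {"32x32 2Nx2N": 450.0, "32x32 Nx2N": 480.0, "32x32 2NxN": 430.0},
    "cluster3": {"16x16 2Nx2N": 230.0, "16x16 Nx2N": 210.0,
                 "16x16 2NxN": 250.0, "16x16 NxN": 220.0},
}

if __name__ == "__main__":
    all_ten = select_fme_candidates(
        IME_COSTS, {"cluster1": 3, "cluster2": 3, "cluster3": 4})
    print(sum(len(v) for v in all_ten.values()))  # ten candidates in total
    fewer = select_fme_candidates(
        IME_COSTS, {"cluster1": 3, "cluster2": 2, "cluster3": 2})
    print(fewer["cluster3"])  # the two cheapest 16x16 partitions
```

Keeping the per-cluster counts as a parameter makes it easy to trade FME workload against the risk of discarding the partition that FME would have preferred.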
[Figure: candidate partitions grouped into Cluster 1, Cluster 2 and Cluster 3]
Fig. 2. Prediction modes pre-determined in the IME phase
The third to fifth columns in Table V show the RD performance degradation and the encoding time saved when FME is performed for the ten candidate partitions of Fig. 2. The time saving is 30.15%, whereas the RD drop is marginal. In the sixth to eighth columns, seven candidates are chosen: three from Cluster 1, two from Cluster 2 and two from Cluster 3. The time is reduced by 57.44%, whereas the increase in bitrate and the drop in the PSNR are 0.21% and 0.04 dB on average. In the ninth to eleventh columns, FME is performed for four candidate partitions: one from Cluster 1, one from Cluster 2 and two from Cluster 3. The encoding time is reduced by 73.36%, whereas the increase in bitrate and the drop in the PSNR are 0.26% and 0.04 dB on average. These simulations show that this algorithm is very effective for speed-up without a significant RD degradation for all types of video sequences.
TABLE V RD PERFORMANCE DEGRADATION AND ENCODING TIME SAVED ACCORDING TO MODES DETERMINED IN THE IME PHASE
                             10 candidates         7 candidates          4 candidates
Size      Videos             B (%) P (dB)  T (%)   B (%) P (dB)  T (%)   B (%) P (dB)  T (%)
832×480   BQMall              0.25  -0.03  30.23    0.58  -0.05  57.24    0.48  -0.05  72.98
          FlowerVase         -0.05  -0.04  30.23    0.20  -0.07  58.19    0.29  -0.07  73.93
          Keiba               0.16  -0.02  30.23    0.75  -0.05  56.74    1.01  -0.04  72.42
          RaceHorses          0.61  -0.04  30.23    1.30  -0.07  55.92    1.37  -0.07  71.73
1280×720  FourPeople         -0.04  -0.01  30.08    0.01  -0.03  57.71    0.08  -0.03  73.87
          Johnny             -0.46  -0.01  30.08   -0.62  -0.03  57.64   -0.24  -0.04  73.68
          KristenAndSara     -0.37  -0.02  30.08   -0.20  -0.03  57.71   -0.18  -0.03  73.75
          Vidyo1             -0.03  -0.01  30.08   -0.23  -0.02  57.71   -0.01  -0.03  73.75
1920×1080 Aspen              -0.05   0.00  30.14    0.39  -0.02  57.95    0.41  -0.02  73.86
          BasketBallDrive    -0.10  -0.01  30.14    0.42  -0.02  57.54    0.38  -0.02  73.50
          SnowMountain       -0.34  -0.04  30.14   -0.51  -0.06  57.12   -0.55  -0.06  73.15
          Kimono1            -0.09   0.00  30.14    0.39  -0.02  57.78    0.08  -0.02  73.74
          Average            -0.04  -0.02  30.15    0.21  -0.04  57.44    0.26  -0.04  73.36
D. Algorithm Evaluation
From Sections IV.A to IV.C, it can be inferred that, in HEVC, pre-decisions of prediction block sizes are very difficult, whereas hierarchical decisions or decisions based on the results from IME are useful for saving time. However, some of these algorithms offer a different degree of performance according to the video characteristics. In Fig. 3, the times saved by various algorithms are compared for the RaceHorses and FourPeople video sequences, denoted by black and gray bars, respectively. The FourPeople sequence has slow motion and its texture is smooth, whereas the RaceHorses sequence includes fast and irregular motion. The hierarchical decision presented in the HM5.0 reference software, including the ES scheme, is very effective for the FourPeople sequence. However, for the RaceHorses sequence, the benefit from the ES, ECU and CFM algorithms is not large and is less than half of that for the FourPeople sequence. Another notable observation is that the combination of ES, ECU and CFM increases the time saving. However, the rate of increase is not significant because the effects of those schemes overlap in many cases. When the ECU and CFM schemes are combined with the other hierarchical decision schemes, ETCU1, ETCU2 and FME_SKIP, the time saving for the RaceHorses sequence is improved substantially, whereas the time saving for the FourPeople sequence increases only slightly. Unlike the hierarchical decision algorithms, the AMPD algorithms show similar performance for both video sequences. The time saving increases as the number of candidates is reduced. As shown in Fig. 3, most algorithms show significant time savings for the FourPeople sequence, whereas the variation in the saved time is very large for the RaceHorses sequence. Only four combinations, ECU+CFM+ETCU2, ECU+CFM+FME_SKIP and the AMPD algorithms with 7 and 4 candidates, show time savings of over 50% for both FourPeople and RaceHorses.
Fig. 3. Algorithm comparison in terms of the time saved
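The selection of combinations that save more than 50% of the encoding time for both sequences can be checked against the RaceHorses and FourPeople rows of Tables IV and V. The short sketch below does this with values copied from those two tables; the variable and function names are hypothetical, and any configuration that appears only in Fig. 3 is not covered.

```python
# Time saved (%) as (RaceHorses, FourPeople), copied from Tables IV and V.
TIME_SAVED = {
    "ECU": (14.67, 67.92),
    "ECU+CFM": (32.62, 81.95),
    "ES+ECU+CFM": (38.16, 86.97),
    "ECU+CFM+ETCU1": (43.83, 83.15),
    "ECU+CFM+ETCU2": (55.73, 86.34),
    "ECU+CFM+FME_SKIP": (66.37, 95.49),
    "AMPD 10 candidates": (30.23, 30.08),
    "AMPD 7 candidates": (55.92, 57.71),
    "AMPD 4 candidates": (71.73, 73.87),
}


def saving_over(threshold: float) -> list:
    """Return the algorithms whose time saving exceeds the threshold for
    both the RaceHorses and the FourPeople sequence."""
    return [name for name, (race_horses, four_people) in TIME_SAVED.items()
            if race_horses > threshold and four_people > threshold]


if __name__ == "__main__":
    for name in saving_over(50.0):
        print(name)
```

With a threshold of 50, the four remaining entries are ECU+CFM+ETCU2, ECU+CFM+FME_SKIP and the AMPD variants with 7 and 4 candidates, in agreement with the observation above.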
IEEE Transactions on Consumer Electronics, Vol. 58, No. 4, November 2012
In Fig. 4, the RD performances of the ECU+CFM+ETCU2, ECU+CFM+FME_SKIP and AMPD algorithms with 7 and 4 candidates are compared to that of the HM5.0 reference software where no early decision algorithm is adopted. The horizontal and the vertical axes show the bitrate and the PSNR, respectively. The RaceHorses and FourPeople video sequences are used in Figs. 4(a) and (b), respectively. The RD performances of three algorithms, ECU+CFM+FME_SKIP and the AMPD algorithms with 7 and 4 candidates, are comparable to that of the HM5.0 reference software, whereas the RD drop of the ECU+CFM+ETCU2 algorithm, denoted by the dashed curve, is quite large.
[Figure: RD curves, PSNR (dB) versus bitrate (kbps), for ECU+CFM+FME_SKIP, AMPD 7Cand, AMPD 4Cand, HM5.0 and ECU+CFM+ETCU2]
Fig. 4. Algorithm comparison in terms of the RD performance: (a) 832×480-size RaceHorses sequence, (b) 1280×720-size FourPeople sequence
V. CONCLUSION
The HEVC standard employs a hybrid coding approach similar to that of the H.264/AVC standard. Thus, the two standards have much in common. In this paper, the fast mode decision algorithms for H.264/AVC are surveyed and then applied to speed up HEVC encoding. One of the major differences is that the number of block sizes supported by HEVC is 10 times more than that of H.264/AVC. The other is that the execution time for FME becomes much larger than that for IME because IME execution can be sped up by exploiting parallelism, while FME needs to be executed in a serial manner. This second difference makes the fast execution of FME more important than that of IME when a hardware-based encoder is used for HEVC compression. It is experimentally shown that a hierarchical inter-mode decision algorithm is a very effective solution for HEVC because there are many opportunities to terminate further prediction while searching a tree of CUs. In the future, the previous algorithms tested in this paper need to be further elaborated and enhanced.
REFERENCES
[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264-ISO/IEC 14496-10 AVC), 2003.
[2] ISO/IEC JTC 1 SC29 WG11, "Joint Call for Proposals on Video Compression Technology," Doc. N11113, Jan. 2010.
[3] ISO/IEC JTC 1 SC29 WG11, "Vision, Applications and Requirements of High-Performance Video Coding," Doc. N11096, Jan. 2010.
[4] T. Wiegand, W. J. Han, B. Bross, J. R. Ohm, and G. J. Sullivan, "WD4: Working Draft 4 of High-Efficiency Video Coding," JCTVC-F803, Torino, IT, July 2011.
[5] Y.-K. Lin, D.-W. Li, C.-C. Lin, T.-Y. Kou, S.-J. Wu, W.-C. Tai, W.-C. Chang, and T.-S. Chang, "A 242mW, 10mm² 1080p H.264/AVC High Profile Encoder Chip," in Proc. of Design Automat. Conf., pp. 78-83, July 2008.
[6] Y.-H. Chen, T.-D. Chuang, Y.-J. Chen, C.-T. Li, C.-J. Hsu, S.-Y. Chien, and L.-G. Chen, "An H.264/AVC scalable extension and high profile HDTV 1080p encoder chip," in Proc. of Sym. on VLSI Circuits, pp. 104-105, Aug. 2008.
[7] Y.-H. Chen, T.-C. Chen, and L.-G. Chen, "Power-scalable algorithm and reconfigurable macro-block pipelining architecture of H.264 encoder for mobile application," in Proc. Int. Conf. Multimedia Expo, pp. 281-284, Dec. 2006.
[8] T.-C. Chen, S.-Y. Chien, Y.-W. Huang, C.-H. Tsai, C.-Y. Chen, T.-W. Chen, and L.-G. Chen, "Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 6, pp. 673-688, June 2006.
[9] H.-C. Chang, Y.-C. Yang, J.-W. Chen, C.-L. Su, C.-A. Chien, J.-I. Guo, and J.-S. Wang, "A dynamic quality-scalable H.264 video encoder chip," in Proc. Asia South Pacific Design Automat. Conf., pp. 125-126, Feb. 2009.
[10] Y.-K. Lin, C.-C. Lin, T.-Y. Kuo, and T.-S. Chang, "A Hardware-Efficient H.264/AVC Motion-Estimation Design for High-Definition Video," IEEE Trans. Circuits and System I, vol. 55, no. 6, pp. 1526-1535, July 2008.
[11] C. Yang, S. Goto, and T. Ikenaga, "High Performance VLSI Architecture of Fractional Motion Estimation in H.264 for HDTV," in Proc. of Int. Symposium on Circuits and Systems, pp. 2605-2608, May 2006.
[12] C.-Y. Kao, C.-L. Wu, and Y.-L. Lin, "A High-Performance Three-Engine Architecture for H.264/AVC Fractional Motion Estimation," IEEE Trans. Very Large Scale Integration Sys., vol. 18, no. 4, pp. 662-666, April 2010.
[13] P. K. Tsung, W.-Y. Chen, L.-F. Ding, S.-Y. Chien, and L.-G. Chen, "Cache-based Integer Motion/Disparity Estimation for Quad-HD H.264/AVC and HD Multiview Video Coding," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 2013-2016, April 2009.
[14] C.-M. Ou, C.-F. Le, and W.-J. Hwang, "An efficient VLSI architecture for H.264 variable block size motion estimation," IEEE Trans. Consumer Electronics, vol. 51, no. 4, pp. 1291-1299, Nov. 2005.
[15] J. Kim and T. Park, "A novel VLSI architecture for full-search variable block-size motion estimation," IEEE Trans. Consumer Electronics, vol. 55, no. 2, pp. 728-733, May 2009.
[16] L. Zhang and W. Gao, "Reusable Architecture and Complexity-Controllable Algorithm for the Integer/Fractional Motion Estimation of H.264," IEEE Trans. Consumer Electronics, vol. 53, no. 2, pp. 749-756, May 2007.
[17] X. Lu, A. M. Tourapis, P. Yin, and J. Boyce, "Fast Mode Decision and Motion Estimation for H.264 with a Focus on MPEG-2/H.264 Transcoding," in Proc. of Int. Symposium on Circuits and Systems, vol. 2, pp. 1246-1249, May 2005.
[18] C. E. Rhee, J.-S. Kim, and H.-J. Lee, "Cascaded Direction Filtering for Fast Multidirectional Inter-Prediction in H.264/AVC Main and High Profile Compression," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 403-413, March 2012.
[19] B.-G. Kim, S.-K. Song, and C.-S. Cho, "Efficient inter-mode decision based on contextual prediction for the P-slice in H.264/AVC video coding," in Proc. Int. Conf. Image Processing, pp. 1333-1336, Oct. 2006.
[20] B.-G. Kim and C.-S. Cho, "A fast inter-mode decision algorithm based on macro-Block tracking for P slices in the H.264/AVC video standard," in Proc. Int. Conf. Image Processing, vol. 5, pp. 301-304, Sept. 2007.
[21] X. Jin, Y. Huang, Q. Liu, S. Wu, and T. Ikenaga, "Fast Spatial Direct Mode Decision for B Slice based on Temporal Information in H.264 Standard," in Proc. of Int. Sym. on Intelligent Signal Processing and Communication Systems, pp. 331-334, Jan. 2009.
[22] T. Zhao, H. Wang, S. Kwong, and C.-C. J. Kuo, "Fast Mode Decision Based on Mode Adaptation," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 5, pp. 697-705, May 2010.
[23] D. Wu, F. Pan, K. P. Lim, S. Wu, Z. G. Li, X. Lin, S. Rahardja, and C. C. Ko, "Fast Intermode Decision in H.264/AVC Video Coding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 953-958, July 2005.
[24] C. Y. Chang, C. H. Pan, and H. Chen, "Fast mode decision for P-frames in H.264," presented at the Picture Coding Symp., Dec. 2004.
[25] S.-H. Ri, Y. Vatis, and J. Ostermann, "Fast Inter-Mode Decision in an H.264/AVC Encoder Using Mode and Lagrangian Cost Correlation," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 2, pp. 302-306, Feb. 2009.
[26] X. Jing and L.-P. Chau, "Fast approach for H.264 INTER mode decision," Electronics Letters, vol. 40, no. 17, pp. 1050-1052, Aug. 2004.
[27] A. Ahmad, N. Khan, S. Masud, and M. A. Maud, "Selection of variable block sizes in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 173-176, May 2004.
[28] H. Zeng, C. Cai, and K.-K. Ma, "Fast Mode Decision for H.264/AVC Based on Macroblock Motion Activity," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 4, pp. 491-499, April 2009.
[29] J. Bu, S. Lou, C. Chen, and J. Zhu, "A predictive block-size mode selection for inter frame in H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pp. 917-920, May 2006.
[30] H. Ko, K. Yoo, and K. Sohn, "Fast mode-decision for H.264/AVC based on inter-frame correlations," Signal Processing: Image Commun., vol. 24, no. 10, pp. 803-813, Nov. 2009.
[31] J. Y. Lee and H. Park, "A Fast Mode Decision Method Based on Motion Cost and Intra Prediction Cost for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 3, pp. 393-402, March 2012.
[32] D. Wu, S. Wu, K. P. Lim, F. Pan, Z. G. Li, and X. Lin, "Block intermode decision for fast encoding of H.264," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 181-184, May 2004.
[33] Z. Liu, L. Shen, and Z. Zhang, "An Efficient Intermode Decision Algorithm Based on Motion Homogeneity for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 128-132, Jan. 2009.
[34] D. Zhu, Q. Dai, and R. Ding, "Fast inter-prediction mode decision for H.264," in Proc. Int. Conf. Multimedia Expo, vol. 2, pp. 1123-1126, June 2004.
[35] C.-H. Kuo, M. Shen, and C.-C. J. Kuo, "Fast inter-prediction mode decision and motion search for H.264," in Proc. IEEE Int. Conf. Multimedia Expo, vol. 1, pp. 663-666, June 2004.
[36] P. Yin, H.-Y. C. Tourapis, A. M. Tourapis, and J. Boyce, "Fast mode decision and motion estimation for JVT/H.264," in Proc. of the IEEE Int. Conf. on Image Processing, vol. 3, pp. 853-856, Sept. 2003.
[37] A. C. W. Yu, G. R. Martin, and H. Park, "Fast Inter-Mode Selection in the H.264/AVC Standard Using a Hierarchical Decision Process," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 186-195, April 2009.
[38] G. Kim, Y. Moon, and J. Kim, "An early detection of all-zero DCT block in H.264," in Proc. Int. Conf. Image Processing, vol. 1, pp. 453-456, Oct. 2004.
[39] J. Lee and B. W. Jeon, "Fast mode decision for H.264 with variable motion block size," Lecture Notes in Computer Science, vol. 2869, pp. 723-730, 2003.
[40] I. Choi, J. Lee, and B. Jeon, "Fast coding mode selection with rate-distortion optimization for MPEG-4 part-10 AVC/H.264," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 12, pp. 1557-1561, Dec. 2006.
[41] Y.-H. Kim, J.-W. Yoo, S.-W. Lee, J. Shin, J. Paik, and H.-K. Jung, "Adaptive mode decision for H.264 encoder," Electronics Letters, vol. 40, no. 19, pp. 1172-1173, Sept. 2004.
[42] J. Lee and B. Jeon, "Pruned mode decision based on variable block sizes motion compensation for H.264," Lecture Notes in Computer Science, vol. 2899, pp. 410-418, Nov. 2003.
[43] C. S. Kannangara, I. E. G. Richardson, M. Bystrom, J. R. Solera, Y. Zhao, A. MacLennan, and R. Cooney, "Low complexity skip prediction for H.264 through Lagrangian cost estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 2, pp. 202-208, Feb. 2006.
[44] Y. Moon, G. Kim, and J. Kim, "An improved early detection algorithm for all-zero blocks in H.264 video encoding," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 1053-1057, Aug. 2005.
[45] T.-C. Chen, Y.-W. Huang, and L.-G. Chen, "Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC," in Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 9-12, May 2004.
[46] M. Shao, Z. Liu, S. Goto, and T. Ikenaga, "Lossless VLSI oriented full computation reusing algorithm for H.264/AVC fractional motion estimation," IEICE Trans. Fundamentals, vol. 90-A, no. 5, pp. 756-763, April 2007.
[47] Y. Song, M. Shao, Z. Liu, S. Li, L. Li, T. Ikenaga, and S. Goto, "H.264/AVC fractional motion estimation engine with computation reusing in HDTV1080p real-time encoding applications," in Proc. of the IEEE Workshop on Signal Processing Systems, pp. 509-514, Oct. 2007.
BIOGRAPHIES
Chae Eun Rhee received the B.S., M.S. and Ph.D. degrees in Electrical Engineering and Computer Science from Seoul National University, Seoul, Korea, in 2000, 2002 and 2011, respectively. From 2002 to 2005, she was with the Digital TV Development Group, Samsung Electronics Company Ltd., Suwon City, Korea, as an Engineer, where she was involved in bus architecture and MPEG decoder development. She is currently working as a research professor in Electrical Engineering and Computer Science at Seoul National University, Korea. Her research interests include algorithm and architecture design of video coding for HEVC and H.264/AVC and configurable video coding for real-time systems.
Kyujoong Lee received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2002 and the M.S. degree in electrical engineering from the University of Southern California, Los Angeles, USA, in 2008. He is working toward the Ph.D. degree in electrical engineering at Seoul National University. From 2002 to 2005, he was with Com2us Corporation, Seoul, Korea, as a developer. His major research interests include the algorithm and architecture of H.264/AVC and SVC and noise reduction of video streams.
Tae-Sung Kim received the B.S. degree in electrical electronic engineering from Pusan National University, Pusan, Korea, in 2010. He is working toward the M.S. degree in electrical engineering at Seoul National University. His research interests include the algorithm and architecture of H.264/AVC and HEVC.
Hyuk-Jae Lee received the B.S. and M.S. degrees in Electronics Engineering from Seoul National University, Korea, in 1987 and 1989, respectively, and the Ph.D. degree in Electrical and Computer Engineering from Purdue University at West Lafayette, Indiana, in 1996. From 1998 to 2001, he worked at the Server and Workstation Chipset Division of Intel Corporation in Hillsboro, Oregon as a senior component design engineer. From 1996 to 1998, he was on the faculty of the Department of Computer Science of Louisiana Tech University at Ruston, Louisiana. In 2001, he joined the School of Electrical Engineering and Computer Science at Seoul National University, Korea, where he is currently working as a Professor. He is a founder of Mamurian Design, Inc., a fabless SoC design house for multimedia applications. His research interests are in the areas of computer architecture and SoC design for multimedia applications.

0098-3063/12/$20.00 © 2012 IEEE
