On Transform Coding Tools Under Development For VP10

Sarah Parker*, Yue Chen*, Jingning Han*, Zoe Liu*, Debargha Mukherjee*, Hui Su*, Yongzhe Wang*, Jim Bankoski*,
Shunyao Li+

Email: {sarahparker, yuec, jingning, zoeliu, debargha, huisu, yongzhe, jimbankoski}@google.com*,
lishunyaothu@gmail.com+

* Google, Inc., 1600 Amphitheatre Parkway, Mountain View, CA, USA 94043.
+ University of California, Santa Barbara, CA 93106.

ABSTRACT
Google started the WebM Project in 2010 to develop open-source, royalty-free video codecs designed specifically for
media on the Web. The second-generation codec released by the WebM project, VP9, is currently served by YouTube
and enjoys billions of views per day. Realizing the need for even greater compression efficiency to cope with the
growing demand for video on the web, the WebM team embarked on an ambitious project to develop a next-edition
codec, VP10, that achieves at least a generational improvement in coding efficiency over VP9. Starting from VP9, a set
of new experimental coding tools has already been added to VP10 to achieve decent coding gains. Subsequently,
Google joined a consortium of major tech companies called the Alliance for Open Media to jointly develop a new codec,
AV1. As a result, the VP10 effort is largely expected to merge with AV1. In this paper, we focus primarily on new tools
in VP10 that improve coding of the prediction residue using transform coding techniques. Specifically, we describe tools
that increase the flexibility of available transforms, allowing the codec to handle a more diverse range of residue
structures. Results are presented on a standard test set.

Keywords: video coding, VP8, VP9, VP10, WebM, H.264, HEVC, prediction, motion, transform, DCT, DST, Identity.
1. INTRODUCTION
Google embarked on the WebM project [1] to develop open-source, royalty-unencumbered video codecs for the Web.
The first codec released as part of the project was called VP8 [2] and is still used extensively in Google Hangouts. The
next edition of the codec, entitled VP9 [3][4], was released in mid-2013 and is the current-generation codec from the
WebM project. It achieves a coding efficiency similar to that of HEVC [5], the latest video codec from MPEG. VP9 has
found huge success with adoption by YouTube, and has delivered big improvements to the YouTube service in terms of
quality-of-experience metrics such as watch-time and mean-time-to-rebuffer over the primary format H.264/AVC [6].
Specifically, VP9 streams delivered by YouTube today are not only 30-40% more compact than corresponding
H.264/AVC streams but are also somewhat higher in quality. Consequently, even with predominantly software decoding
on compatible browsers (Chrome, Firefox, Opera) on capable devices, the number of VP9 videos viewed daily by
YouTube users today is on the order of billions. As VP9 hardware decoders become more readily available on mobile
devices, we expect the proliferation of VP9 to accelerate even more.
Even though the gains achieved with VP9 are tangible and significant, the continued growth in online video consumption
has made the need for efficient video coding increasingly critical. The WebM project has been focusing on developing
the next-generation video codec VP10 [7] since 2014, and modest gains in coding efficiency have already been achieved.
In 2015, Google joined a consortium of major tech companies called the Alliance for Open Media to jointly develop a
new royalty-free codec to be named AV1. The plan is to propose the experimental tools developed in VP10 to the AV1
process in due course. In this paper we primarily focus on the tools developed for VP10.
Applications of Digital Image Processing XXXIX, edited by Andrew G. Tescher, Proc. of SPIE
Vol. 9971, 997119 · © 2016 SPIE · CCC code: 0277-786X/16/$18 · doi: 10.1117/12.2239105
Though improvements in prediction modes can successfully decrease the prediction error, more than half of the bitrate in
modern video codecs is still spent coding the residual. In this paper we discuss the new transform coding tools that have
been added in VP10 to improve the coding of the residue. First, we discuss the super transform, which allows the
application of one large transform to a predictor created by combining several prediction blocks using overlapped block
motion compensation. Next, we discuss two extensions to our transform sizes: recursive transform units and rectangular
transforms. Finally, we discuss an expanded bank of transform types available to Intra and Inter prediction blocks.
Overall, we find that increasing the flexibility of available transforms allows VP10 to better handle a wide range of
residue structures and leads to a significant reduction in BDRATE.
2. VP9 TRANSFORM CODING FRAMEWORK
VP9 has already made great strides towards developing effective transform coding tools. In the current implementation, a
recursive block-partitioning scheme is used to break up each 64x64 superblock into a partition tree of smaller prediction
blocks. Each prediction block can be encoded using either an Intra or Inter mode. Both Intra and Inter modes use square
transforms of sizes less than or equal to that of the prediction block. Additionally, side information is provided for
transform size.
The structures of Intra and Inter prediction residues tend to differ in several respects, and thus require different transform
coding methods. First, VP9 aims to tailor the transform type towards the most likely energy distribution produced by
Intra and Inter prediction modes. Intra prediction tends to become less accurate farther from the prediction
border, producing residue with a higher energy concentration on one side. In these instances, an asymmetrical transform
such as ADST is most appropriate. VP9 offers a choice between ADST and DCT in both the horizontal and vertical
directions for Intra modes, providing a total set of 4 mode-dependent transform types:
$\{\mathrm{DCT}, \mathrm{ADST}\}_{\mathrm{horizontal}} \times \{\mathrm{DCT}, \mathrm{ADST}\}_{\mathrm{vertical}}$
or explicitly for horizontal-vertical pairs:
DCT-DCT, DCT-ADST, ADST-DCT, ADST-ADST.
The structure of Inter mode residue is less easily defined, and DCT-DCT is the only transform option for Inter prediction
blocks in VP9. VP9 also handles prediction differently for Intra and Inter modes. When the transform size is smaller than
the prediction block size, Intra coded blocks use a recursive prediction and transform to produce a full reconstruction,
allowing the next transform block to use the reconstruction as a better predictor. This process is not necessary for Inter
blocks since they use regions of previously reconstructed frames as predictors.
3. TRANSFORM TOOL ENHANCEMENTS IN VP10
VP10 seeks to build upon the previously mentioned transform tools in VP9, and introduce a richer and more flexible set
of available transforms for both Intra and Inter prediction modes. This section provides an overview of all new transform
coding tools currently under exploration.
SUPER-TRANSFORMS
VP9 uses a recursive block-partitioning scheme for the purpose of prediction; however, the transform used to code the
prediction residue of a prediction block is restricted to be of a size no larger than the prediction block itself. VP10
attempts to remove this restriction for Inter modes by allowing transform blocks to span across multiple prediction
blocks. Specifically, at any level of the partition tree, the syntax can optionally indicate that a single large transform will
be used at that level, irrespective of how fine the partition tree may be below that level. Fig. 1 shows an example of a
partition tree with two super-transform blocks indicating that the prediction residue will be coded jointly with a large
transform at these sizes.

Fig. 1. Partition tree with super-transform blocks
Through our investigations, we found that a simple juxtaposition of the predictors from different prediction blocks
to create one large final predictor is often non-ideal. Instead, super-transform creates a new predictor based on a
recursive application of overlapped block motion compensation [8]. In particular, predictors from the smallest
blocks within the super-transform tree are aggregated together with overlapped block motion compensation successively
in a recursive fashion, until the final predictor bubbles up to the super-transform level. Note that predictors at each level
need to be extended by a width equivalent to the width of the smoothing filter across prediction boundaries.
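The effect of smoothing across a prediction boundary can be pictured with a single-level sketch. It joins two horizontally adjacent predictors, each extended past the shared boundary by the overlap width, using a linear cross-fade. The linear ramp, the overlap width, and the name `blend_left_right` are assumptions made purely for illustration; the actual VP10 smoothing filter and its recursive application over the partition tree are not reproduced here.

```python
import numpy as np

def blend_left_right(pred_left, pred_right, overlap):
    """Blend two horizontally adjacent predictors across their shared boundary.

    With w the nominal block width, the joint block has 2 * w columns and:
      pred_left  covers joint columns [0, w + overlap)
      pred_right covers joint columns [w - overlap, 2 * w)
    so each predictor is extended by `overlap` columns past the boundary.
    """
    h, ext_w = pred_left.shape
    w = ext_w - overlap
    out = np.empty((h, 2 * w), dtype=np.float64)
    # Outside the overlap zone each predictor is used unchanged.
    out[:, : w - overlap] = pred_left[:, : w - overlap]
    out[:, w + overlap :] = pred_right[:, 2 * overlap :]
    # Inside the zone the left weight ramps down linearly (assumed filter shape).
    n = 2 * overlap
    w_left = 1.0 - (np.arange(n) + 0.5) / n
    zone_left = pred_left[:, w - overlap : w + overlap]
    zone_right = pred_right[:, :n]
    out[:, w - overlap : w + overlap] = w_left * zone_left + (1.0 - w_left) * zone_right
    return out

# Two flat predictors with different levels: the seam becomes a gradual ramp.
left = np.full((4, 10), 10.0)
right = np.full((4, 10), 20.0)
joint = blend_left_right(left, right, overlap=2)
```

A hard seam between the two predictors would put a step edge in the middle of the large transform block; the cross-fade replaces it with a ramp, which is the motivation for extending each predictor beyond its own block.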
RECURSIVE TRANSFORM UNITS
VP9 provides a wide range of available transform sizes, but each prediction block is limited to selecting only one of
these. In VP10, we remove this constraint and allow any Inter prediction block to use several different transform sizes.
Transforms within a single prediction block may now have recursive tree-structured partitions. A simple 2-way partition
quadtree with only square-split types is used to produce these recursive units. We have found that this size flexibility
allows finer targeting of high-energy regions in the residual signal. Fig. 2 illustrates the available partition types in the
2-way quadtree, as well as an example of a final transform partition tree within a single prediction block.
Fig. 2. Prediction block residue with recursively partitioned transform units using a 2-way partition tree
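The split-or-keep decision behind such a transform tree can be sketched as a recursion over square blocks. In the sketch below, `toy_cost` is a hypothetical stand-in for the encoder's rate-distortion cost (it counts significant floating-point DCT coefficients plus one flag); the threshold, the minimum size, and the function names are assumptions, not the VP10 implementation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def toy_cost(block, thresh=1.0):
    """Hypothetical cost proxy: significant DCT coefficients plus one flag."""
    t = dct_matrix(block.shape[0])
    coefs = t @ block @ t.T
    return int(np.count_nonzero(np.abs(coefs) > thresh)) + 1

def best_partition(block, min_size=4):
    """Return (cost, tree); tree is a block size or a list of four subtrees."""
    n = block.shape[0]
    whole = toy_cost(block)
    if n <= min_size:
        return whole, n
    half = n // 2
    # Quadrant order: top-left, top-right, bottom-left, bottom-right.
    quads = [block[r:r + half, c:c + half] for r in (0, half) for c in (0, half)]
    results = [best_partition(q, min_size) for q in quads]
    split = 1 + sum(cost for cost, _ in results)  # one extra flag for the split
    if split < whole:
        return split, [tree for _, tree in results]
    return whole, n

# A 16x16 residual whose energy sits in a single corner sample.
residual = np.zeros((16, 16))
residual[0, 0] = 100.0
cost, tree = best_partition(residual)
```

For this residual the recursion keeps the three empty 8x8 quadrants whole and splits only the quadrant holding the energy, which is the kind of targeting of high-energy regions that the recursive units are meant to allow.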
RECTANGULAR TRANSFORMS
In VP9, we were restricted to a set of transform sizes that are always square. In VP10, we expanded our transform sizes
for Inter modes to include rectangular transforms that can be 4x8, 8x4, 8x16, 16x8, 16x32, or 32x16. Rectangular
transforms are currently only available to rectangular prediction blocks and are always the same size as the prediction
block.

Fig. 3. Rectangular transform units within a superblock
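Because the 2D transforms are separable, a rectangular transform only requires 1D kernels of two different lengths. The following floating-point sketch illustrates this with orthonormal DCTs; the function names are illustrative, and the integer arithmetic and scaling used by the real codec are not modeled.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def rect_dct2(block):
    """Separable 2D DCT of an h x w block: size-h kernel on columns, size-w on rows."""
    h, w = block.shape
    return dct_matrix(h) @ block @ dct_matrix(w).T

def rect_idct2(coefs):
    """Inverse of rect_dct2 (the kernels are orthonormal, so transpose inverts)."""
    h, w = coefs.shape
    return dct_matrix(h).T @ coefs @ dct_matrix(w)

# A flat 4-row by 8-column residual collapses to a single DC coefficient.
flat = np.full((4, 8), 3.0)
flat_coefs = rect_dct2(flat)
```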
EXTENDED TRANSFORM TYPES
To code Inter prediction residues, VP9 exclusively uses DCTs of different sizes, namely 4x4, 8x8, 16x16 and 32x32;
however, for coding of Intra prediction residues, a richer set of transforms that includes hybrid combinations of DCTs
and Asymmetric DSTs (ADSTs) is used [9][10][11]. Intra prediction residues are likely to be smaller near the
boundaries from which they are predicted. As such, the asymmetric DST is better suited to code them. Specifically, VP9
uses DST-IV, which is an approximation to the original ADST [9], but with a faster butterfly implementation [11]. For
ease of exposition, we still refer to this transform as the ADST. In VP9, for each Intra predicted block size, 4x4, 8x8 and
16x16, up to four different separable 2D transforms may be used: DCT-DCT, DCT-ADST, ADST-DCT, ADST-ADST,
where each transform pair listed denotes the horizontal and vertical transforms of a separable 2D implementation,
respectively.
For VP10, we are exploring a richer set of transforms for coding Inter and Intra prediction residues. Inter prediction
residues do not have a well-defined structure as in the Intra case, but we have found that using a bank of transforms, each
adapted to a specific type of residue profile within the block, is generally helpful. In VP10, we use not only the ADST
(DST-IV) but also a flipped version of the ADST (FlipADST) that applies ADST in reverse order. Further, an identity
transform (IDTX) is now available, which seems to be particularly useful for coding residue with sharp lines and edges.
Previously, we experimented with a symmetric DST, namely DST-II, but found the identity transform to be more
beneficial for coding efficiency. Finally, both Inter and Intra modes continue to make use of DCT. Thus, for each coded
block, we can choose to use one of up to 16 different transforms as follows:
$\{\mathrm{DCT}, \mathrm{ADST}, \mathrm{FlipADST}, \mathrm{IDTX}\}_{\mathrm{horizontal}} \times \{\mathrm{DCT}, \mathrm{ADST}, \mathrm{FlipADST}, \mathrm{IDTX}\}_{\mathrm{vertical}}$
or explicitly for horizontal-vertical pairs:
DCT-DCT, DCT-ADST, ADST-DCT, ADST-ADST, DCT-FlipADST, FlipADST-DCT, FlipADST-FlipADST,
ADST-FlipADST, FlipADST-ADST, IDTX-DCT, DCT-IDTX, IDTX-ADST, ADST-IDTX,
IDTX-FlipADST, FlipADST-IDTX, IDTX-IDTX.
As block sizes get larger, some of these transforms begin to act similarly. Thus, a reduced set of transforms is used for
16x16, 32x32 and 64x64 block sizes. In the transform selection process for Inter and Intra modes, the encoder
searches over the entire set of transforms and selects the one that produces the best RD cost. Once a transform is selected, a
transform type symbol from the set of types available at that size is used to indicate the actual transform used in the
bitstream.
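A floating-point sketch of the 16-way search follows. It builds the four 1D kernels (DCT-II, DST-IV standing in for the ADST as described above, the reversed ADST, and the identity) and tries every horizontal-vertical pair. The cost here is the L1 norm of the coefficients, used only as a hypothetical sparsity proxy; a real encoder would quantize and entropy-code to obtain an RD cost, and the reduced sets for larger blocks are not modeled.

```python
import itertools
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def adst_matrix(n):
    """Orthonormal DST-IV matrix, used here as the ADST."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.sqrt(2.0 / n) * np.sin(np.pi * (2 * k + 1) * (2 * i + 1) / (4 * n))

def transform_bank(n):
    """The four 1D kernels; FlipADST applies the ADST to the reversed input."""
    adst = adst_matrix(n)
    return {"DCT": dct_matrix(n), "ADST": adst,
            "FlipADST": adst[:, ::-1], "IDTX": np.eye(n)}

def pick_transform(residual):
    """Try all 16 pairs on a square residual; return (cost, horizontal, vertical)."""
    bank = transform_bank(residual.shape[0])
    best = None
    for h_name, v_name in itertools.product(bank, repeat=2):
        # Vertical kernel acts on columns, horizontal kernel acts on rows.
        coefs = bank[v_name] @ residual @ bank[h_name].T
        cost = float(np.abs(coefs).sum())  # assumed proxy, not a true RD cost
        if best is None or cost < best[0]:
            best = (cost, h_name, v_name)
    return best

# A residual with one sharp vertical line: identity horizontally, DCT vertically.
line = np.zeros((8, 8))
line[:, 3] = 1.0
line_choice = pick_transform(line)[1:]
```

With this proxy the vertical-line residual selects IDTX horizontally and DCT vertically, a residual that grows from left to right selects ADST horizontally, and its mirror image selects FlipADST.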
Note that the one-dimensional transforms DCT-IDTX or IDTX-DCT in the list above are similar in spirit to directional
transforms [12] or 1-D transforms [13] in the literature. However, we chose to use only two directions, horizontal and
vertical, since these seem to be the minimal set that provides the best gains. Also, note that IDTX-IDTX is equivalent to
transform skip, which yields substantial benefit for screen content.
While the multiple transforms do not add any decoding complexity, since all transform sizes and types are explicitly
signaled, there is significant added complexity on the encoder side to make the best RD-based decision by
searching over the set of available transform types. We are currently experimenting with methods to mitigate this
complexity. Specifically, we are exploring classification schemes based on simple features derived from the residue
signal, to prune out transform types from the RD search set. In particular, one classifier is trained to prune out either DCT
or IDTX, and a second classifier is trained to prune out either ADST or FlipADST in each direction. The DCT vs. IDTX
classifier relies on features composed of horizontal and vertical neighboring pixel correlation in the residual, while the
ADST vs. FlipADST classifier relies on features composed of the energy distribution in various regions of the residual
signal. We continue to explore different methods to reduce the added encoder complexity burden produced by this
expanded transform set.
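The two kinds of features mentioned above can be sketched as follows. Only the feature extraction is shown: the exact feature definitions, the regions used, and the trained classifiers themselves are not given in this paper, so the correlation formula, the half-block energy split, and the function names below are assumptions for illustration.

```python
import numpy as np

def neighbor_correlation(residual):
    """Return (horizontal, vertical) correlation between adjacent residual samples."""
    def corr(a, b):
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return float((a * b).sum() / denom) if denom > 0 else 0.0
    res = residual.astype(np.float64)
    horiz = corr(res[:, :-1], res[:, 1:])   # each sample vs. its right neighbor
    vert = corr(res[:-1, :], res[1:, :])    # each sample vs. the one below it
    return horiz, vert

def energy_split(residual):
    """Return the fraction of energy in the left half and in the top half."""
    e = residual.astype(np.float64) ** 2
    total = e.sum()
    if total == 0:
        return 0.5, 0.5
    h, w = e.shape
    return float(e[:, : w // 2].sum() / total), float(e[: h // 2, :].sum() / total)

# Alternating vertical stripes: no horizontal smoothness, perfect vertical smoothness.
stripes = np.tile([1.0, -1.0], (8, 4))
stripe_corr = neighbor_correlation(stripes)
```

Intuitively, low correlation along a direction suggests that a DCT will not compact energy well there, which favors IDTX, while a lopsided energy split hints at which of ADST or FlipADST matches the residue profile.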
4. CODING RESULTS
To evaluate our new tools, we performed a controlled bitrate test using 3 different video sets:
● lowres, which includes 40 videos of CIF resolution,
● midres, which includes 30 videos of 480p and 360p resolution, and
● hdres, which contains 38 videos at 720p and 1080p resolution,
where we code 150 frames of each video with a single keyframe. The coding results are shown in Tables 1-3 below.
For quality metrics we use average sequence PSNR and SSIM [14], computed as the arithmetic average of the
combined PSNRs and SSIMs, respectively, for each frame. Combined PSNR for each frame is computed from the
combined MSE of the Y, Cb and Cr components. In other words:
$\mathrm{MSE}_{\mathrm{combined}} = (4\,\mathrm{MSE}_{Y} + \mathrm{MSE}_{Cb} + \mathrm{MSE}_{Cr}) / 6$, assuming 4:2:0 sampling
$\mathrm{PSNR}_{\mathrm{combined}} = \min\left(10 \log_{10}\left(255^2 / \mathrm{MSE}_{\mathrm{combined}}\right),\ 100\right)$
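The two formulas above translate directly into code. The sketch below assumes 8-bit planes passed as NumPy arrays in (Y, Cb, Cr) order; the function names and the input layout are illustrative choices rather than part of the test framework.

```python
import math
import numpy as np

def combined_psnr(ref_planes, rec_planes):
    """Combined PSNR of one 8-bit 4:2:0 frame; planes are given as (Y, Cb, Cr)."""
    mse = [float(np.mean((r.astype(np.float64) - d.astype(np.float64)) ** 2))
           for r, d in zip(ref_planes, rec_planes)]
    # Luma has four times as many samples as each chroma plane in 4:2:0.
    mse_combined = (4.0 * mse[0] + mse[1] + mse[2]) / 6.0
    if mse_combined == 0.0:
        return 100.0  # identical frames hit the 100 dB cap
    return min(10.0 * math.log10(255.0 ** 2 / mse_combined), 100.0)

def sequence_psnr(frames):
    """Arithmetic mean of per-frame combined PSNR; frames holds (ref, rec) pairs."""
    scores = [combined_psnr(ref, rec) for ref, rec in frames]
    return sum(scores) / len(scores)

# Example: luma off by one level everywhere, chroma untouched.
luma = np.full((16, 16), 100, dtype=np.uint8)
chroma = np.full((8, 8), 128, dtype=np.uint8)
reference = (luma, chroma, chroma)
decoded = (luma + 1, chroma, chroma)
frame_score = combined_psnr(reference, decoded)
```

The cap at 100 keeps a perfectly reconstructed frame from contributing an infinite value to the sequence average.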
SSIM for each component in each frame is computed by averaging the SSIM scores computed over 8x8 windows, without
applying a windowing function, for each component. Combined SSIM for the frame is computed from the SSIMs
of the Y, Cb and Cr components as follows:
$\mathrm{SSIM}_{\mathrm{combined}} = 0.8\,\mathrm{SSIM}_{Y} + 0.1\,(\mathrm{SSIM}_{Cb} + \mathrm{SSIM}_{Cr})$
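A minimal sketch of this computation is given below. The text does not state the window stride or the stabilizing constants, so the sketch assumes non-overlapping 8x8 windows and the usual constants from [14] for 8-bit data, C1 = (0.01 * 255)^2 and C2 = (0.03 * 255)^2; the function names are illustrative.

```python
import numpy as np

C1 = (0.01 * 255) ** 2  # assumed constants, following [14] for 8-bit data
C2 = (0.03 * 255) ** 2

def plane_ssim(ref, rec, win=8):
    """Mean SSIM over win x win windows with uniform weights (no window function)."""
    ref = ref.astype(np.float64)
    rec = rec.astype(np.float64)
    h, w = ref.shape
    scores = []
    # Assumed stride: non-overlapping windows.
    for r in range(0, h - win + 1, win):
        for c in range(0, w - win + 1, win):
            a = ref[r:r + win, c:c + win]
            b = rec[r:r + win, c:c + win]
            mu_a, mu_b = a.mean(), b.mean()
            var_a, var_b = a.var(), b.var()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            num = (2 * mu_a * mu_b + C1) * (2 * cov + C2)
            den = (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2)
            scores.append(num / den)
    return float(np.mean(scores))

def combined_ssim(ref_planes, rec_planes):
    """Weighted SSIM of one frame; planes are given as (Y, Cb, Cr)."""
    y, cb, cr = (plane_ssim(r, d) for r, d in zip(ref_planes, rec_planes))
    return 0.8 * y + 0.1 * (cb + cr)
```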
To compare RD curves obtained by two codecs we use a modified BDRATE [15] metric that uses piecewise cubic
Hermite polynomial interpolation (pchip) on the rate-distortion points before integrating the difference over a fine
grid using the trapezoid method. The OVERALL number at the bottom of each table is the arithmetic average of the BDRATE
numbers over all the videos in the same column. The BDRATE is computed separately based on the average sequence
PSNR and SSIM metrics as computed above.
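The procedure can be sketched with SciPy's `PchipInterpolator`. Following the usual Bjøntegaard formulation, the sketch interpolates log-bitrate as a function of quality and integrates over the overlapping quality range; the log domain, the grid density, and the function name `bd_rate` are assumptions, since the text only specifies pchip interpolation and trapezoid integration.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def bd_rate(rate_a, qual_a, rate_b, qual_b, samples=1000):
    """Average bitrate change (%) of codec B relative to codec A at equal quality."""
    order_a = np.argsort(qual_a)
    order_b = np.argsort(qual_b)
    qa = np.asarray(qual_a, dtype=np.float64)[order_a]
    qb = np.asarray(qual_b, dtype=np.float64)[order_b]
    log_ra = np.log(np.asarray(rate_a, dtype=np.float64))[order_a]
    log_rb = np.log(np.asarray(rate_b, dtype=np.float64))[order_b]
    # Shape-preserving piecewise cubic Hermite fit of log-rate versus quality.
    fit_a = PchipInterpolator(qa, log_ra)
    fit_b = PchipInterpolator(qb, log_rb)
    # Integrate only over the quality range covered by both curves.
    lo = max(qa[0], qb[0])
    hi = min(qa[-1], qb[-1])
    grid = np.linspace(lo, hi, samples)
    diff = fit_b(grid) - fit_a(grid)
    # Trapezoid rule on the fine grid, normalized to an average log-rate gap.
    avg = np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid)) / (hi - lo)
    return float((np.exp(avg) - 1.0) * 100.0)

# Codec B needs 10% fewer bits than codec A at every quality level.
rates = [100.0, 200.0, 400.0, 800.0]
psnrs = [30.0, 33.0, 36.0, 39.0]
saving = bd_rate(rates, psnrs, [0.9 * r for r in rates], psnrs)
```

A negative result means that codec B reaches the same quality with fewer bits. The OVERALL figure for a column is then the plain mean of the per-video values.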
For all the tables below, we use a slightly modified version of VP9 as the baseline, referred to as VP9+ for ease of
exposition, which was also the starting point of the AV1 codec. VP9+ is better than VP9 by about 0.6% because it
already incorporates multiple explicit transforms for Inter and Intra, with the set of four original VP9 transforms as
described in Section 2. Specifically, all the results below are generated on the nextgenv2 branch of the libvpx
repository, where the configurations tested are as follows:
VP9+ baseline:
--enable-av1 [very similar to the AV1 baseline codec]
Extended Transform Set:
--enable-av1 --enable-experimental --enable-ext-tx
Extended Transform Set + Rectangular Transforms:
--enable-av1 --enable-experimental --enable-ext-tx --enable-rect-tx
Super Transform:
--enable-av1 --enable-experimental --enable-supertx
All New Transform Tools:
--enable-av1 --enable-experimental --enable-supertx --enable-ext-tx --enable-rect-tx
At the time of writing, we had found some bugs in the Recursive Transform Units tool, so the results for
that tool are excluded.

Table 1. VP10 BDRATE results on lowres set (VP9+ baseline)
Columns: Extended Transform Set (Ext-Tx); Extended Transform Set + Rectangular Transforms (Ext-Tx+Rect-Tx); Super Transform (Super-Tx); All New Transform Tools (All). Each configuration reports BDRATE (PSNR) followed by BDRATE (SSIM).
Video Ext-Tx(PSNR) Ext-Tx(SSIM) Ext-Tx+Rect-Tx(PSNR) Ext-Tx+Rect-Tx(SSIM) Super-Tx(PSNR) Super-Tx(SSIM) All(PSNR) All(SSIM)
akiyo_cif.y4m 2.656% 2.024% 3.074% 1.833% 1.099% 0.868% 3.971% 3.424%
basketballpass_240p.y4m 2.982% 4.09% 3.818% 4.622% 0.664% 1.232% 4.235% 5.248%
blowingbubbles_240p.y4m 2.078% 1.681% 3.051% 3.028% 1.433% 1.731% 3.94% 4.205%
blowing_cif.y4m 1.889% 1.737% 4.15% 4.523% 0.703% 0.34% 4.152% 4.588%
bqsquare_240p.y4m 3.409% 3.579% 3.53% 3.619% 0.979% 0.359% 4.505% 4.409%
bridge_close_cif.y4m 4.016% 4.434% 4.396% 5.059% 0.354% 0.619% 3.645% 4.052%
bridge_far_cif.y4m 3.286% 2.959% 3.393% 3.276% 0.807% 0.688% 3.182% 3.044%
bus_cif.y4m 3.018% 2.339% 3.854% 3.322% 1.504% 1.597% 4.842% 4.464%
cheer_sif.y4m 2.669% 3.209% 3.225% 4.174% 0.441% 0.605% 3.409% 3.821%
city_cif.y4m 2.853% 2.31% 3.393% 2.988% 1.667% 1.326% 4.664% 3.865%
coastguard_cif.y4m 2.344% 2.212% 3.286% 3.414% 0.801% 0.932% 3.726% 4.074%
container_cif.y4m 2.497% 1.685% 3.141% 2.351% 0.639% 0.871% 3.389% 2.844%
crew_cif.y4m 1.72% 1.545% 3.158% 3.19% 0.732% 0.547% 3.274% 3.093%
deadline_cif.y4m 3.566% 3.382% 4.713% 4.826% 0.893% 0.78% 5.18% 5.523%
flower_cif.y4m 2.836% 2.881% 3.724% 4.66% 2.071% 2.41% 4.95% 6.117%
flowervase_240p.y4m 2.728% 2.314% 4.07% 4.038% 1.803% 1.369% 5.007% 4.641%
football_cif.y4m 1.657% 1.19% 2.44% 2.048% 0.44% 0.38% 2.605% 2.043%
foreman_cif.y4m 2.829% 2.707% 3.679% 3.665% 1.119% 1.182% 4.404% 4.744%
garden_sif.y4m 2.659% 2.823% 3.07% 3.632% 1.394% 1.436% 4.058% 4.993%
hallmonitor_cif.y4m 1.519% 0.442% 2.389% 1.625% 0.58% 0.58% 2.014% 1.505%
harbour_cif.y4m 3.499% 3.048% 4.776% 4.835% 0.89% 0.931% 5.092% 5.282%
highway_cif.y4m 1.653% 1.227% 2.454% 1.945% 0.612% 1.302% 1.63% 1.486%
husky_cif.y4m 3.637% 3.366% 4.189% 4.271% 0.751% 0.576% 4.704% 4.827%
ice_cif.y4m 3.303% 3.652% 3.526% 4.109% 0.611% 0.247% 3.897% 4.31%
keiba_240p.y4m 2.737% 1.984% 3.558% 3.034% 0.433% 0.432% 3.504% 2.702%
mobile_cif.y4m 3.149% 3.659% 3.434% 4.44% 1.341% 1.49% 4.508% 5.435%
mobisode2_240p.y4m 3.136% 2.78% 4.726% 4.661% 0.884% 0.669% 5.081% 5.285%
motherdaughter_cif.y4m 1.78% 2.056% 2.57% 2.678% 1.437% 1.753% 3.429% 3.578%
news_cif.y4m 3.454% 3.627% 4.236% 4.518% 0.862% 0.907% 4.859% 5.549%
pamphlet_cif.y4m 2.426% 2.228% 3.353% 3.546% 0.382% 0.155% 3.374% 3.21%
paris_cif.y4m 3.838% 3.452% 4.621% 4.644% 0.786% 0.754% 4.949% 4.751%
racehorses_240p.y4m 1.847% 1.645% 2.458% 2.494% 0.694% 0.529% 2.691% 2.576%
signirene_cif.y4m 2.018% 1.884% 2.725% 2.65% 1.014% 0.728% 3.656% 3.553%
silent_cif.y4m 2.473% 2.18% 3.309% 2.983% 1.186% 1.487% 4.037% 4.039%
soccer_cif.y4m 3.419% 2.947% 4.354% 3.953% 0.696% 0.848% 4.495% 4.354%
stefan_sif.y4m 4.658% 5.264% 5.393% 6.513% 1.079% 0.942% 5.884% 6.671%
students_cif.y4m 2.815% 2.649% 3.55% 3.485% 1.217% 1.62% 4.468% 4.629%
tempete_cif.y4m 3.923% 4.078% 4.395% 4.705% 1.197% 1.485% 5.203% 5.757%
tennis_sif.y4m 3.285% 2.614% 3.819% 3.181% 1.063% 1.396% 4.566% 4.056%
waterfall_cif.y4m 3.383% 2.391% 4.465% 3.6% 2.385% 3.362% 6.235% 6.673%
OVERALL 2.841% 2.657% 3.637% 3.653% 0.991% 1.037% 4.293% 4.235%

Table 2. VP10 BDRATE results on midres set (VP9+ baseline)
Columns: Extended Transform Set (Ext-Tx); Ext-Tx and Rectangular Transforms (Ext-Tx+Rect-Tx); Super Transform (Super-Tx); All New Transform Tools (All). Each configuration reports BDRATE (PSNR) followed by BDRATE (SSIM).
Video Ext-Tx(PSNR) Ext-Tx(SSIM) Ext-Tx+Rect-Tx(PSNR) Ext-Tx+Rect-Tx(SSIM) Super-Tx(PSNR) Super-Tx(SSIM) All(PSNR) All(SSIM)
BQMall_832x480_60.y4m 2.631% 2.581% 3.79% 3.797% 1.9% 1.815% 5.263% 5.309%
BasketballDrillText_832x480_50.y4m 2.106% 1.828% 3.115% 2.516% 0.793% 0.697% 3.773% 3.168%
BasketballDrill_832x480_50.y4m 1.787% 1.809% 2.732% 2.676% 1.228% 1.412% 3.224% 3.022%
Flowervase_832x480_30.y4m 4.08% 3.629% 5.738% 6.134% 0.812% 0.847% 6.091% 6.49%
Keiba_832x480_30.y4m 5.077% 3.58% 6.513% 5.205% 0.096% 0.316% 6.597% 5.54%
Mobisode2_832x480_30.y4m 2.364% 1.973% 3.635% 3.322% 0.228% 0.05% 3.745% 3.611%
PartyScene_832x480_50.y4m 2.473% 2.29% 3.064% 3.105% 0.978% 0.75% 3.812% 3.69%
RaceHorses_832x480_30.y4m 1.209% 1.177% 1.61% 2.242% 0.407% 1.024% 2.735% 3.261%
aspen_480p.y4m 1.627% 1.647% 1.881% 1.903% 0.681% 0.497% 2.349% 2.337%
city_4cif_30fps.y4m 1.504% 1.664% 2.243% 2.507% 1.638% 1.658% 3.384% 3.348%
controlled_burn_480p.y4m 1.625% 1.534% 2.166% 2.201% 0.623% 0.489% 2.898% 2.628%
crew_4cif_30fps.y4m 0.773% 0.871% 2.173% 2.564% 0.266% 0.184% 2.073% 2.419%
crowd_run_480p.y4m 2.423% 2.121% 2.458% 2.554% 0.425% 0.318% 2.696% 2.265%
ducks_take_off_480p.y4m 2.655% 2.235% 4.625% 4.179% 0.499% 0.646% 4.819% 4.51%
harbour_4cif_30fps.y4m 1.381% 1.262% 3.252% 3.489% 0.274% 0.191% 3.25% 3.216%
ice_4cif_30fps.y4m 2.697% 3.609% 3.146% 4.447% 0.222% 0.253% 3.446% 4.501%
into_tree_480p.y4m 1.938% 1.551% 2.71% 2.494% 0.349% 0.215% 2.896% 2.507%
old_town_cross_480p.y4m 1.825% 1.899% 2.171% 2.174% 1.171% 0.963% 3.032% 2.644%
park_joy_480p.y4m 2.555% 1.725% 2.916% 2.393% 0.673% 0.723% 3.43% 3.417%
red_kayak_480p.y4m 1.393% 1.504% 2.049% 2.463% 0.124% 0.24% 2.062% 2.383%
rush_field_cuts_480p.y4m 2.318% 1.993% 2.506% 2.438% 0.469% 0.086% 2.672% 2.34%
sintel_trailer_2k_480p24.y4m 7.238% 2.588% 9.368% 3.717% 0.6% 0.107% 9.06% 3.706%
snow_mnt_480p.y4m 2.729% 2.074% 3.298% 2.695% 0.456% 0.034% 3.484% 2.858%
soccer_4cif_30fps.y4m 2.835% 2.399% 4.698% 4.169% 0.442% 0.327% 4.557% 4.136%
speed_bag_480p.y4m 1.366% 1.396% 2.6% 3.238% 0.546% 0.23% 2.793% 3.39%
station2_480p25.y4m 2.082% 1.704% 2.666% 2.01% 3.886% 4.536% 5.778% 6.066%
tears_of_steel1_480p.y4m 1.113% 0.923% 2.699% 2.349% 0.999% 0.621% 3.293% 2.952%
tears_of_steel2_480p.y4m 2.336% 2.412% 3.698% 3.882% 0.732% 0.974% 3.754% 3.378%
touchdown_pass_480p.y4m 1.242% 0.739% 1.82% 1.011% 0.553% 0.651% 2.447% 1.836%
west_wind_easy_480p.y4m 2.013% 1.64% 2.524% 2.439% 0.013% 0.044% 2.428% 2.248%
OVERALL 2.313% 1.945% 3.262% 3.011% 0.735% 0.659% 3.728% 3.439%

Table 3. VP10 BDRATE results on hdres set (VP9+ baseline)
Columns: Extended Transform Set (Ext-Tx); Ext-Tx and Rectangular Transforms (Ext-Tx+Rect-Tx); Super Transform (Super-Tx); All New Transform Tools (All). Each configuration reports BDRATE (PSNR) followed by BDRATE (SSIM).
Video Ext-Tx(PSNR) Ext-Tx(SSIM) Ext-Tx+Rect-Tx(PSNR) Ext-Tx+Rect-Tx(SSIM) Super-Tx(PSNR) Super-Tx(SSIM) All(PSNR) All(SSIM)
basketballdrive_1080p50.y4m 2.675% 2.993% 4.386% 5.034% 0.693% 0.947% 4.611% 5.362%
blue_sky_1080p30.y4m 1.625% 1.72% 2.048% 1.999% 1.219% 1.224% 2.757% 3.323%
bqterrace_1080p60.y4m 2.506% 2.366% 3.308% 3.732% 0.628% 0.942% 3.695% 4.261%
cactus_1080p50.y4m 2.054% 2.256% 3.087% 3.432% 1.426% 1.286% 3.989% 4.163%
chinaspeed_xga.y4m 11.902% 7.894% 12.627% 9.447% 0.453% 0.123% 12.864% 9.146%
city_720p30.y4m 1.449% 1.3% 2.422% 2.318% 0.723% 0.548% 3.127% 3.157%
crew_720p30.y4m 0.766% 0.751% 2.3% 2.381% 0.087% 0.014% 2.595% 2.782%
crowd_run_1080p50.y4m 1.516% 1.49% 1.97% 2.109% 0.542% 0.465% 2.19% 2.32%
cyclists_720p30.y4m 1.573% 1.778% 2.433% 2.169% 0.335% 0.573% 2.405% 2.066%
dinner_1080p30.y4m 1.98% 1.69% 3.378% 3.858% 0.541% 0.534% 3.555% 4.044%
ducks_take_off_1080p50.y4m 1.12% 1.077% 2.72% 2.857% 0.336% 0.546% 2.697% 2.955%
factory_1080p30.y4m 1.407% 1.145% 2.485% 2.031% 1.716% 1.687% 3.483% 2.874%
fourpeople_720p60.y4m 3.416% 3.625% 4.459% 4.744% 0.669% 0.823% 5.048% 5.264%
in_to_tree_1080p50.y4m 1.925% 2.117% 2.335% 2.403% 0.596% 0.555% 2.682% 2.807%
jets_720p30.y4m 4.027% 4.82% 5.712% 7.18% 1.175% 1.358% 5.212% 6.813%
johnny_720p60.y4m 4.414% 4.674% 6.109% 6.446% 0.915% 1.209% 6.466% 6.883%
kimono1_1080p24.y4m 0.92% 0.951% 1.237% 1.147% 0.192% 0.111% 1.316% 1.254%
kristenandsara_720p60.y4m 3.952% 3.549% 5.656% 5.426% 0.644% 0.644% 5.769% 5.543%
life_1080p30.y4m 4.289% 2.928% 5.631% 4.05% 0.759% 0.57% 6.111% 4.685%
mobcal_720p50.y4m 1.478% 1.055% 2.383% 1.963% 1.274% 1.029% 2.927% 2.54%
night_720p30.y4m 2.09% 2.009% 3.14% 3.207% 0.334% 0.236% 3.412% 3.411%
old_town_cross_720p50.y4m 2.195% 2.041% 2.802% 2.956% 0.778% 0.941% 3.245% 3.603%
parkjoy_1080p50.y4m 1.783% 1.379% 2.102% 1.974% 0.573% 0.654% 2.536% 2.851%
parkrun_720p50.y4m 2.376% 1.953% 3.11% 3.2% 0.659% 0.859% 3.4% 3.553%
parkscene_1080p24.y4m 2.069% 2.004% 2.797% 2.929% 0.591% 0.718% 2.933% 3.024%
ped_1080p25.y4m 1.642% 2.045% 3.001% 3.533% 0.003% 0.089% 2.857% 3.467%
riverbed_1080p25.y4m 0.399% 0.338% 0.922% 0.784% 0.062% 0.069% 0.85% 0.693%
rush_hour_1080p25.y4m 1.052% 1.438% 1.77% 2.111% 0.042% 0.203% 1.677% 2.19%
sheriff_720p30.y4m 1.907% 1.799% 3.257% 3.375% 0.279% 0.149% 3.41% 3.355%
shields_720p50.y4m 2.022% 2.01% 3.029% 3.453% 0.97% 0.93% 3.46% 3.909%
station2_1080p25.y4m 1.599% 1.55% 2.293% 2.127% 2.099% 2.78% 3.94% 4.459%
stockholm_ter_720p60.y4m 2.983% 3.151% 3.8% 4.285% 0.869% 0.999% 4.007% 4.51%
sunflower_720p25.y4m 1.415% 1.885% 2.191% 2.785% 0.068% 0.227% 1.669% 1.766%
tennis_1080p24.y4m 1.192% 1.152% 3.397% 3.381% 1.081% 1.363% 3.926% 4.214%
tractor_1080p25.y4m 1.539% 1.601% 2.845% 3.052% 0.149% 0.269% 2.625% 2.684%
vidyo1_720p60.y4m 3.748% 3.984% 4.735% 5.321% 0.682% 0.948% 4.851% 5.657%
vidyo3_720p60.y4m 3.794% 4.491% 5.565% 6.523% 1.79% 1.662% 6.535% 7.302%
vidyo4_720p60.y4m 2.791% 2.116% 3.946% 3.381% 1.076% 1.242% 4.433% 4.094%
OVERALL 2.41% 2.293% 3.458% 3.503% 0.708% 0.768% 3.77% 3.868%

We observe at least a 3% reduction in BDRATE for both PSNR and SSIM in all 3 video sets, confirming the advantage of a rich
variety of transforms. Note that for comparison against VP9 (as opposed to the VP9+ baseline), these results could be
expected to be 0.5-0.6% better; however, that is hard to verify given the current structure of our codebase.
5. CONCLUSION
In this paper we have presented a brief overview of the new transform coding tools that are being explored as part of
VP10 development. Preliminary results indicate that increasing transform flexibility can achieve at least a 3% decrease
in BDRATE for both average PSNR and SSIM. Although this is an encouraging improvement, we are left with several
avenues to explore within the space of transform flexibility, and still have a way to go before we reach a viable
next-generation codec. VP10 development is an open-source project, and we invite the rest of the video coding community to
join the effort to create tomorrow’s royalty-free codec.
REFERENCES
[1] http://www.webmproject.org/
[2] J. Bankoski, J. Koleszar, L. Quillio, J. Salonen, P. Wilkins, Y. Xu, VP8 Data Format and Decoding Guide, RFC
6386, http://datatracker.ietf.org/doc/rfc6386/
[3] D. Mukherjee, J. Bankoski, R. S. Bultje, A. Grange, J. Han, J. Koleszar, P. Wilkins, Y. Xu, “The latest open-source
video codec VP9 - an overview and preliminary results,” Proc. IEEE Picture Coding Symp., pp. 390–393, San Jose,
Dec. 2013.
[4] D. Mukherjee, J. Bankoski, R. S. Bultje, A. Grange, J. Han, J. Koleszar, P. Wilkins, Y. Xu, “A Technical Overview
of VP9 - the latest open-source video codec,” SMPTE Motion Imaging Journal, Jan/Feb 2015.
[5] G. J. Sullivan, J.-R. Ohm, W.-J. Han, T. Wiegand, “Overview of the High Efficiency Video
Coding (HEVC) Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, Dec. 2012.
[6] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, A. Luthra, “Overview of the H.264/AVC Video Coding
Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.
[7] D. Mukherjee, H. Su, J. Bankoski, A. Converse, J. Han, Z. Liu, Y. Xu, “An overview of video coding tools under
consideration for VP10: the successor to VP9,” Proc. SPIE, Applications of Digital Image Processing XXXVIII, vol.
9599, Sep. 2015.
[8] Y. Chen, K. Rose, J. Han, D. Mukherjee, “A Pre-filtering Approach to Exploit Decoupled Prediction and
Transform Block Structures in Video Coding,” Proc. IEEE International Conference on Image Processing (ICIP),
Oct. 2014.
[9] J. Han, A. Saxena, K. Rose, “Towards jointly optimal spatial prediction and adaptive transform in video/image
coding,” Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Proc. (ICASSP), pp. 726–729, March 2010.
[10] J. Han, A. Saxena, V. Melkote, K. Rose, “Jointly optimized spatial prediction and block transform for video and
image coding,” IEEE Transactions on Image Processing, vol. 21, pp. 1874–1884, April 2012.
[11] J. Han, Y. Xu, D. Mukherjee, “A butterfly structured design of the hybrid coding scheme,” Proc. IEEE Picture
Coding Symp., pp. 1–4, San Jose, Dec. 2013.
[12] C.-L. Chang, M. Makar, S. S. Tsai, B. Girod, “Direction-adaptive partitioned block transform for color image
coding,” IEEE Transactions on Image Processing, vol. 19, no. 7, July 2010.
[13] F. Kamisli, J. S. Lim, “1-D transforms for the motion compensated residual,” IEEE Transactions on Image
Processing, vol. 20, no. 4, April 2011.
[14] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, “Image quality assessment: from error
visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
[15] G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” VCEG-M33, 13th VCEG meeting,
Austin, Texas, March 2001.