
Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG Document: JVT-Q039

(ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6) Filename: 356147453.doc


17th Meeting: Nice, FR, 14-21 October, 2005

Title: CE7 Report, FGS coding for low-delay applications


Status: Input Document to JVT
Purpose: Proposal
Author(s) or Contact(s): Yiliang Bao, Marta Karczewicz
Nokia Research Center
6000 Connection Drive
Irving, Texas 75039, USA
Tel: +1 972 374 1369
Email: yiliang.bao@nokia.com
Source: Nokia Research Center, Nokia Inc.
_____________________________

Abstract
This contribution reports Nokia's results for CE7 on improving the FGS layer coding performance
of closed-loop P frames. Temporal prediction is introduced into FGS layer coding; the prediction is
formed adaptively from the enhancement layer reference and the base layer reference, based on the
information coded in the base layer. The new algorithm controls the drift caused by partial
decoding while achieving high coding efficiency. Considerable effort was also spent on reducing
the complexity of the new solution by using low-complexity motion compensation and minimizing
additional transform operations. With a minimal increase in complexity, the FGS coding
performance of closed-loop P frames is improved by as much as 4 dB for one FGS layer on a base
layer coded at QP 42.

Introduction
For real-time video communication applications, such as video conferencing, minimizing the
end-to-end delay is critical to ensure good interaction among the participants. The low-delay
requirement is usually met by encoding each video frame as either an I-frame or a P-frame.

JSVM currently uses prediction only from the base layer in coding the FGS layer of closed-loop
P frames, in order to avoid drift. One problem with using the enhancement layer in prediction is
the mismatch between the reference frame used by the encoder and the one used by the decoder
when the bitstream is only partially decoded. This mismatch in the reference frames can cause
errors to accumulate, resulting in drift. The drift can be kept under control if the accumulated
error is bounded. Leaky prediction is an effective way to achieve this: it uses a reference signal
that is a weighted average of the base layer reference and the enhancement layer reference.
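
A simplified scalar model illustrates why leaky prediction bounds the drift. Assume, for illustration only (this model is not part of the proposal), that partial decoding introduces a new reference mismatch d in every frame and that the prediction carries over only a fraction alpha of the previous frame's mismatch, so that the accumulated error obeys e_n = alpha * e_(n-1) + d. For alpha < 1 the error converges to d / (1 - alpha); for alpha = 1, i.e. pure enhancement layer prediction, it grows without bound.

```python
def accumulated_drift(alpha: float, d: float, frames: int) -> list:
    """Hypothetical drift model: e_n = alpha * e_(n-1) + d, starting from e_0 = 0.

    alpha: leaky factor, the weight given to the enhancement layer part of the reference.
    d:     mismatch newly introduced in each frame by partial decoding (assumed constant).
    """
    errors = []
    e = 0.0
    for _ in range(frames):
        e = alpha * e + d  # only a fraction alpha of the old mismatch survives
        errors.append(e)
    return errors


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 0.9, 1.0):
        e = accumulated_drift(alpha, d=1.0, frames=30)
        limit = "unbounded" if alpha >= 1.0 else f"{1.0 / (1.0 - alpha):.2f}"
        print(f"alpha={alpha}: error after 30 frames = {e[-1]:.2f}, limit = {limit}")
```

The smaller alpha is, the tighter the bound, but the less the enhancement layer contributes to the prediction; AR-FGS trades these off adaptively.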

JVT-O054 proposed a leaky-prediction-based solution to improve the FGS coding of closed-loop
P frames, using a temporal prediction signal that is formed adaptively from both the
enhancement layer reference frame and the base layer reference frame, based on the information
coded in the base layer. This solution is referred to as AR-FGS (FGS coding with adaptive
reference). AR-FGS significantly improves the FGS layer coding efficiency while effectively
controlling the drift. However, it increases the complexity of the FGS coder, because additional
motion compensation and transform operations are required.

File: 356147453.doc Page: 1 Date Saved: 2005-10-12


JVT-P087 proposed an alternative motion compensation technique for computing the reference
signal used in enhancement layer coding. Specifically, it suggested performing a simpler
interpolation on the difference between the enhancement layer reference frame and the base
layer reference frame, and then adding the result to the base layer reference signal to obtain the
enhancement layer reference signal. The base layer reference signal is already computed during
base layer coding. This approach has similar complexity to applying the same simple filter
directly to the enhancement layer reference frame, but the quality of the reference signal is better.
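
The following 1-D sketch illustrates the JVT-P087 idea at a single half-sample position. The function names, the edge padding and the use of bilinear filtering for the difference are assumptions made for illustration. The base layer signal is interpolated with the AVC 6-tap half-sample filter (1, -5, 20, 20, -5, 1)/32, as is already done in base layer motion compensation, and only the difference between the enhancement and base layer reference rows goes through the cheap filter.

```python
def _sample(row, i):
    """Return row[i], repeating the edge samples outside the row (reference padding)."""
    return row[max(0, min(len(row) - 1, i))]


def avc_half_sample(row, x):
    """AVC 6-tap half-sample value between row[x] and row[x + 1], clipped to 8 bits."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * _sample(row, x - 2 + k) for k, t in enumerate(taps))
    return max(0, min(255, (acc + 16) >> 5))


def bilinear_half_sample(row, x):
    """Bilinear half-sample value; no clipping because differential samples may be negative."""
    return (_sample(row, x) + _sample(row, x + 1) + 1) >> 1


def enhancement_half_sample(base_ref, enh_ref, x):
    """JVT-P087 style: AVC-filtered base signal plus cheaply filtered difference."""
    base_pred = avc_half_sample(base_ref, x)  # already available from base layer MC
    diff = [e - b for e, b in zip(enh_ref, base_ref)]  # differential reference row
    return max(0, min(255, base_pred + bilinear_half_sample(diff, x)))
```

When the enhancement and base references coincide, the result equals the AVC-interpolated base signal exactly; the cheap filter only affects the refinement carried by the difference.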

In this experiment, the idea in JVT-P087 is incorporated into AR-FGS. We also performed
further tuning on the design to reduce the complexity and improve the coding performance.

Description of the approach

Change to the JSVM FGS coder


This proposal only changes how the prediction signal used in FGS coding is calculated. The
new prediction signal will replace the original base layer prediction signal, and the other parts of
the FGS coder are not affected.

Formation of new prediction signal

In JVT-O054, the coefficients being coded in the enhancement layer are classified based on the
information coded in the base layer, and different leaky factors are used in forming the
prediction for coefficients in different categories. Specifically, it proposed the following algorithm
to form the prediction signal used in FGS layer coding.

For a block of size $M \times N$, $X^n$, being coded in the FGS layer, the actual reference signal,
$R_a^n$, is formed as a weighted average of the base layer reconstruction, $X_b^n$, and the
enhancement reference signal, $R_e^{n-1}$, if the coefficients, $Q_b^n$, coded in the base layer
collocated block are all zero:

$$R_a^n = (1-\alpha)\,X_b^n + \alpha\,R_e^{n-1} \quad \text{if } Q_b^n = 0 \qquad (1)$$

Otherwise, a transform is performed on $X_b^n$ and $R_e^{n-1}$ to obtain the transform coefficients
$FX_b^n = f(X_b^n)$ and $FR_e^{n-1} = f(R_e^{n-1})$, respectively. A coefficient block $FR_a^n$ is
then formed based on the base layer coefficients:

$$FR_a^n(u,v) = (1-\beta)\,FX_b^n(u,v) + \beta\,FR_e^{n-1}(u,v) \quad \text{if } Q_b^n(u,v) = 0 \qquad (2)$$

$$FR_a^n(u,v) = FX_b^n(u,v) \quad \text{if } Q_b^n(u,v) \neq 0 \qquad (3)$$

Here $\alpha$ and $\beta$ are the leaky factors for all-zero base layer blocks and for individual zero
base layer coefficients, respectively. An inverse transform is performed on $FR_a^n$ to obtain the
new reference block.
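
Equations (1) to (3) can be sketched for a 4x4 block as follows. As an assumption for illustration, a floating-point orthonormal 4x4 DCT stands in for the transform f (the JSVM uses the AVC integer transform), and the leaky factors alpha and beta are plain function arguments; the function name form_reference_o054 is hypothetical.

```python
import math

N = 4
# Orthonormal 4x4 DCT-II basis, used here as a stand-in for the transform f.
C = [[math.sqrt((1 if k == 0 else 2) / N) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def transpose(a):
    return [list(r) for r in zip(*a)]


def forward(x):   # f(X) = C X C^T
    return matmul(matmul(C, x), transpose(C))


def inverse(f):   # X = C^T F C
    return matmul(matmul(transpose(C), f), C)


def form_reference_o054(x_b, r_e, q_b, alpha, beta):
    """R_a^n from X_b^n (x_b), R_e^(n-1) (r_e) and the base layer levels Q_b^n (q_b)."""
    if all(q == 0 for row in q_b for q in row):
        # Eq. (1): block-level weighted average in the sample domain.
        return [[(1 - alpha) * x_b[i][j] + alpha * r_e[i][j] for j in range(N)]
                for i in range(N)]
    fx_b, fr_e = forward(x_b), forward(r_e)
    # Eq. (2) where the base coefficient is zero, eq. (3) where it is nonzero.
    fr_a = [[(1 - beta) * fx_b[u][v] + beta * fr_e[u][v] if q_b[u][v] == 0 else fx_b[u][v]
             for v in range(N)] for u in range(N)]
    return inverse(fr_a)
```

This direct form needs transforms of both $X_b^n$ and $R_e^{n-1}$ for every block with nonzero base layer coefficients, which is the complexity the reformulation below reduces.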


If all the coefficients in the base layer collocated block are zero, the base layer reconstructed
block $X_b^n$ is the same as the base layer reference block $R_b^{n-1}$. Similarly, if one particular
coefficient coded in the base layer is zero, the corresponding coefficient in the base layer
reconstructed block is equal to the corresponding coefficient in the base layer reference block,
$FR_b^{n-1} = f(R_b^{n-1})$. Equations (1), (2) and (3) are therefore equivalent to equations (4), (5)
and (6). In equation (6), $DQ_b^n(u,v)$ denotes the base layer dequantized coefficients.

$$R_a^n = R_b^{n-1} + \alpha\,(R_e^{n-1} - R_b^{n-1}) \quad \text{if } Q_b^n = 0 \qquad (4)$$

$$FR_a^n(u,v) = FR_b^{n-1}(u,v) + \beta\,\big(FR_e^{n-1}(u,v) - FR_b^{n-1}(u,v)\big) \quad \text{if } Q_b^n(u,v) = 0 \qquad (5)$$

$$FR_a^n(u,v) = FR_b^{n-1}(u,v) + DQ_b^n(u,v) \quad \text{if } Q_b^n(u,v) \neq 0 \qquad (6)$$

Equations (4), (5) and (6) can be combined into a unified equation, where $P_b^n$ is the
reconstructed prediction residual coded in the base layer, so that $X_b^n = R_b^{n-1} + P_b^n$:

$$R_a^n = R_b^{n-1} + {R_d^{n-1}}' + P_b^n = X_b^n + {R_d^{n-1}}' \qquad (7)$$

The high-level structure of AR-FGS is illustrated in Figure 1. The highlighted area is the
additional module that needs to be added to JSVM.

Figure 1 Addition to the JSVM for incorporating AR-FGS

The adjusted differential reference block, ${R_d^{n-1}}'$, is calculated from the differential
reference block, $R_d^{n-1} = R_e^{n-1} - R_b^{n-1}$, as follows.

If the base layer collocated block does not have any nonzero coefficients, the differential
reference block is scaled by a scaling factor $\alpha$:

$${R_d^{n-1}}' = \alpha\,R_d^{n-1} \quad \text{if } Q_b^n = 0 \qquad (8)$$

Otherwise, a transform is performed on $R_d^{n-1}$ to obtain the transform coefficients
$FR_d^{n-1} = f(R_d^{n-1})$. A coefficient $FR_d^{n-1}(u,v)$ is scaled by a scaling factor $\beta$ if
the corresponding base layer coefficient is zero; otherwise it is set to 0:

$${FR_d^{n-1}}'(u,v) = \beta\,FR_d^{n-1}(u,v) \quad \text{if } Q_b^n(u,v) = 0 \qquad (9)$$

$${FR_d^{n-1}}'(u,v) = 0 \quad \text{if } Q_b^n(u,v) \neq 0 \qquad (10)$$

An inverse transform is then performed to obtain the adjusted differential reference block.

The adjusted differential reference block is added to the base layer reconstructed block to obtain
the reference block used in FGS layer coding.
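
The AR-FGS formulation of equations (7) to (10) can be sketched in the same way. Again, the orthonormal DCT is an illustrative stand-in for the transform, and the function names adjusted_differential and ar_fgs_reference are hypothetical. Only blocks with nonzero base layer coefficients go through the transform; an all-zero base block needs one multiplication per sample.

```python
import math

N = 4
# Orthonormal 4x4 DCT-II basis (illustrative stand-in for the AVC integer transform).
C = [[math.sqrt((1 if k == 0 else 2) / N) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def transpose(a):
    return [list(r) for r in zip(*a)]


def forward(x):
    return matmul(matmul(C, x), transpose(C))


def inverse(f):
    return matmul(matmul(transpose(C), f), C)


def adjusted_differential(r_d, q_b, alpha, beta):
    """Eqs. (8)-(10): adjusted differential reference block R_d^(n-1)'."""
    if all(q == 0 for row in q_b for q in row):
        return [[alpha * v for v in row] for row in r_d]       # eq. (8), no transform
    fr_d = forward(r_d)
    fr_adj = [[beta * fr_d[u][v] if q_b[u][v] == 0 else 0.0    # eqs. (9) and (10)
               for v in range(N)] for u in range(N)]
    return inverse(fr_adj)


def ar_fgs_reference(x_b, r_b, r_e, q_b, alpha, beta):
    """Eq. (7): R_a^n = X_b^n + R_d^(n-1)', with R_d^(n-1) = R_e^(n-1) - R_b^(n-1)."""
    r_d = [[e - b for e, b in zip(er, br)] for er, br in zip(r_e, r_b)]
    r_adj = adjusted_differential(r_d, q_b, alpha, beta)
    return [[x_b[i][j] + r_adj[i][j] for j in range(N)] for i in range(N)]
```

Compared with the direct form of equations (1) to (3), only one forward and one inverse transform of the differential block are needed, and the base layer reconstruction $X_b^n$ is reused as is.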

As suggested in JVT-P087, a simple interpolation filter can be applied to the differential reference
frame to calculate $R_d^{n-1}$.
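
A possible bilinear interpolation of the differential reference frame at quarter-sample accuracy is sketched below. The 2-D bilinear weighting, the rounding offset, the edge padding and all names are assumptions for illustration; the proposal does not specify these details here. Because differential samples can be negative, no clipping is applied.

```python
def diff_sample(frame, x, y):
    """Differential sample with edge padding (frame is a list of rows)."""
    h, w = len(frame), len(frame[0])
    return frame[max(0, min(h - 1, y))][max(0, min(w - 1, x))]


def bilinear_quarter(frame, x4, y4):
    """Bilinear value at quarter-sample position (x4 / 4, y4 / 4)."""
    x0, y0 = x4 >> 2, y4 >> 2          # integer sample position (floor)
    fx, fy = x4 & 3, y4 & 3            # quarter-sample phase, 0..3
    acc = ((4 - fx) * (4 - fy) * diff_sample(frame, x0, y0)
           + fx * (4 - fy) * diff_sample(frame, x0 + 1, y0)
           + (4 - fx) * fy * diff_sample(frame, x0, y0 + 1)
           + fx * fy * diff_sample(frame, x0 + 1, y0 + 1))
    return (acc + 8) >> 4              # the four weights sum to 16


def differential_reference_block(enh_ref, base_ref, bx, by, mv_x4, mv_y4, size=4):
    """R_d^(n-1) for the size x size block at (bx, by); motion vector in quarter samples."""
    # For clarity the differential frame is rebuilt here; an implementation would
    # form it once per reference frame.
    diff = [[e - b for e, b in zip(er, br)] for er, br in zip(enh_ref, base_ref)]
    return [[bilinear_quarter(diff, 4 * (bx + i) + mv_x4, 4 * (by + j) + mv_y4)
             for i in range(size)] for j in range(size)]
```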

Additional complexity due to transform operations

A transform is needed only for blocks whose base layer collocated blocks have nonzero
coefficients. Under normal coding conditions, the percentage of such blocks is usually
negligible.

According to the experimental results, the enhancement layer reference block no longer helps in
coding the current block if the base layer collocated block has more than a certain number of
nonzero coefficients. In generating the CE7 results, only those 4x4 blocks whose base layer
collocated blocks have 1 to 4 nonzero coefficients are transformed and undergo coefficient-based
scaling. If there are more than 4 nonzero coefficients in the base layer collocated block, no
enhancement reference is used, and no transform is needed either.
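
The block-level decision described above can be summarized as a small classification function. The enum and function names are hypothetical; the threshold of 4 nonzero coefficients is the value used for the CE7 results.

```python
from enum import Enum


class RefMode(Enum):
    BLOCK_SCALING = 1  # no nonzero base coefficients: scale R_d by alpha, no transform (eq. 8)
    COEFF_SCALING = 2  # 1..4 nonzero base coefficients: transform, per-coefficient scaling (eqs. 9-10)
    BASE_ONLY = 3      # more than 4: R_d' = 0, so the reference is X_b and no transform is needed


MAX_NONZERO_FOR_AR = 4  # threshold used in generating the CE7 results


def reference_mode(base_levels):
    """Choose how R_d is adjusted for a 4x4 block from its collocated base layer levels."""
    nonzero = sum(1 for row in base_levels for q in row if q != 0)
    if nonzero == 0:
        return RefMode.BLOCK_SCALING
    if nonzero <= MAX_NONZERO_FOR_AR:
        return RefMode.COEFF_SCALING
    return RefMode.BASE_ONLY


def needs_transform(base_levels):
    """Only the coefficient-scaling case requires the extra forward/inverse transform."""
    return reference_mode(base_levels) is RefMode.COEFF_SCALING
```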

Additional classification on zero base layer blocks


Blocks whose base layer collocated blocks do not have any nonzero coefficients can be further
classified, since they have different probabilities of becoming nonzero in the enhancement
layer. In the submitted software, the CABAC coding context for the coded block flag is used to
classify these blocks. A nonzero coding context means that at least one of the two neighboring
blocks is nonzero, so the current block is more likely to become nonzero in the FGS layer. In this
case, a smaller scaling factor is used, i.e., less of the enhancement layer reference signal is used.
Other classification schemes, such as explicit signaling, could also be used; these approaches
are yet to be explored.
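
A minimal sketch of this classification, assuming a simplified H.264/AVC-style coded_block_flag context increment, condTermFlagA + 2 * condTermFlagB, over the left (A) and above (B) neighbors, with the availability and macroblock-type special cases of the real derivation omitted. The function names and the two factor values are hypothetical parameters, not values from the proposal.

```python
def cbf_context(left_nonzero: bool, above_nonzero: bool) -> int:
    """Simplified coded_block_flag context increment: condTermFlagA + 2 * condTermFlagB."""
    return int(left_nonzero) + 2 * int(above_nonzero)


def zero_block_scale(left_nonzero: bool, above_nonzero: bool,
                     alpha_isolated: float, alpha_near_nonzero: float) -> float:
    """Scaling factor for R_d of a block whose base layer collocated block is all zero.

    A nonzero context means the block is more likely to become nonzero in the FGS
    layer, so the second (expected smaller) factor is chosen, i.e. less of the
    enhancement layer reference is used.
    """
    if cbf_context(left_nonzero, above_nonzero) != 0:
        return alpha_near_nonzero
    return alpha_isolated
```

Because the context is already derived for CABAC coding of the coded block flag, this classification adds no side information to the bitstream.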

Proposed syntax modifications


slice_header_in_scalable_extension( ) {                     C   Descriptor
    if( slice_type == Progressive_Refinement ) {
        ar_fgs_usage_flag                                       u(1)
        if( ar_fgs_usage_flag ) {
            ref_scale_for_zero_base_block_plus_1                ue(v)
            ref_scale_for_zero_base_coeff                       u(4)
        }
    }
}
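
The new slice header fields can be read with a plain bit reader, sketched below. The BitReader class and parse_ar_fgs_fields function are hypothetical helpers; ue(v) follows the standard H.264/AVC Exp-Golomb decoding (codeNum = 2^leadingZeroBits - 1 + read_bits(leadingZeroBits)), and the fixed-length fields are read as unsigned integers. Only the fields added by this proposal are handled, and treating an absent ar_fgs_usage_flag as 0 is an assumption.

```python
class BitReader:
    """MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zero_bits = 0
        while self.u(1) == 0:
            leading_zero_bits += 1
        return (1 << leading_zero_bits) - 1 + self.u(leading_zero_bits)


def parse_ar_fgs_fields(reader: BitReader, progressive_refinement: bool) -> dict:
    """Read the AR-FGS syntax elements of slice_header_in_scalable_extension()."""
    fields = {"ar_fgs_usage_flag": 0}  # assumed 0 when not present
    if progressive_refinement:
        fields["ar_fgs_usage_flag"] = reader.u(1)
        if fields["ar_fgs_usage_flag"]:
            fields["ref_scale_for_zero_base_block_plus_1"] = reader.ue()
            fields["ref_scale_for_zero_base_coeff"] = reader.u(4)
    return fields
```

For example, the two bytes 0x9A 0x00 (bits 1, 00110, 1000, then padding) decode to ar_fgs_usage_flag = 1, ref_scale_for_zero_base_block_plus_1 = 5 and ref_scale_for_zero_base_coeff = 8.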



Semantics

ar_fgs_usage_flag signals that the FGS layer is coded with AR-FGS.

ref_scale_for_zero_base_block_plus_1 signals the scaling factor to be applied to the differential
reference signal if the base layer collocated block does not have any nonzero coefficients. Its
value is in the range [0, 32], inclusive. The actual scaling factor is
(32 - (ref_scale_for_zero_base_block_plus_1 - 1))/32. ref_scale_for_zero_base_block_plus_1 equal to
0 indicates that the prediction is entirely from the base layer.

ref_scale_for_zero_base_coeff signals the scaling factor to be applied to a coefficient in the
differential reference block if the corresponding coefficient in the base layer collocated block is
zero. Its value is in the range [0, 15], inclusive. The actual scaling factor is
(16 - ref_scale_for_zero_base_coeff)/16. ref_scale_for_zero_base_coeff equal to 0 indicates that
the prediction is completely from the base layer.
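
Reading the block-level formula as (32 - (v - 1))/32 with v = ref_scale_for_zero_base_block_plus_1, the scaling factor could be derived as follows. This minimal sketch covers only the block-level factor; the function name is hypothetical, and exact fractions are used to avoid rounding questions.

```python
from fractions import Fraction


def zero_base_block_scale(ref_scale_for_zero_base_block_plus_1: int) -> Fraction:
    """Scaling factor applied to R_d when the base layer collocated block is all zero."""
    v = ref_scale_for_zero_base_block_plus_1
    if not 0 <= v <= 32:
        raise ValueError("ref_scale_for_zero_base_block_plus_1 must be in [0, 32]")
    if v == 0:
        return Fraction(0)  # prediction entirely from the base layer
    return Fraction(32 - (v - 1), 32)


if __name__ == "__main__":
    for v in (0, 1, 9, 32):
        print(v, zero_base_block_scale(v))
```

Under this reading, value 1 selects the full differential reference (factor 1), and each increment lowers the factor by 1/32, down to 1/32 at value 32.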

Experiments and results


According to the test conditions specified in the CE7 description document, JVT-P307r1, the
following tests were performed.

Single FGS layer coding tests
  o Test sequences: bus, foreman, football and mobile CIF sequences at 15 fps; city, crew,
    harbour and soccer 4CIF sequences at 30 fps.
  o For each sequence, every frame except the first (intra) frame is coded as a closed-loop
    P frame. The base layer is coded at QP 42, and one FGS layer is coded on top of the base
    layer.
  o Repeat the experiments with a base layer QP of 30.
  o Decode the bitstream at 10% bit rate increments, starting from the base layer.
Combined scalability tests
  o Test sequences: bus, foreman, football, mobile and crew.
  o Use the common test conditions.

For the single FGS layer coding tests, 4 different FGS coders are used:
  o Original JSVM
  o Modified JSVM with AR-FGS, using the AVC interpolation filter in enhancement layer motion
    compensation
  o Modified JSVM with AR-FGS, using bilinear interpolation in differential reference frame
    motion compensation
  o Modified JSVM with AR-FGS, using 4-tap polyphase filters in differential reference frame
    motion compensation

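The 4-tap polyphase filters originally proposed in JVT-D029 were tested. As analyzed in
JVT-D029, direct interpolation using a 4-tap polyphase filter requires much less computation than
the 2-step AVC interpolation. The filter taps for the four quarter-sample phases are listed below;
each set of taps sums to 16.

   Phase 0:    {  0, 16,  0,  0 }
   Phase 1/4:  { -2, 14,  5, -1 }
   Phase 1/2:  { -2, 10, 10, -2 }
   Phase 3/4:  { -1,  5, 14, -2 }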
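
A 1-D sketch of direct interpolation with these filters is shown below. It assumes that the taps apply to samples x-1 to x+2 around integer position x, a rounding offset of 8 before the shift by 4, and edge padding; the function name is hypothetical and 2-D interpolation is not shown. Since the filter is applied to the differential reference frame, the result is not clipped.

```python
# JVT-D029 4-tap polyphase filters, indexed by quarter-sample phase (taps sum to 16).
POLYPHASE_4TAP = (
    (0, 16, 0, 0),     # phase 0 (integer position)
    (-2, 14, 5, -1),   # phase 1/4
    (-2, 10, 10, -2),  # phase 1/2
    (-1, 5, 14, -2),   # phase 3/4
)


def interp_4tap(row, x4):
    """Interpolate row at quarter-sample position x4 / 4 in a single filtering pass."""
    x, phase = x4 >> 2, x4 & 3
    n = len(row)
    acc = sum(t * row[max(0, min(n - 1, x - 1 + k))]
              for k, t in enumerate(POLYPHASE_4TAP[phase]))
    return (acc + 8) >> 4
```

Each output sample costs four multiplications in one pass, whereas the AVC luma interpolation first computes half-sample values with the 6-tap filter and then averages them for quarter-sample positions.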

All the experiment data are listed in the attached file JVT-Q039.xls.



The performance of the FGS coder with the new algorithms is generally significantly better than
that of the original JSVM 3.0. The improvement can be as much as 4 dB for one FGS layer when
the base layer is coded at QP 42. However, for some sequences, the improvement in FGS coding
performance at base layer QP 30 is very small. These sequences are relatively noisy, and when
the base layer is coded at QP 30 the enhancement layer reference no longer provides a better
prediction.

Among the three types of filters, the AVC filter gives the best performance. However, direct
interpolation using the 4-tap polyphase filter is almost as good as the AVC filter, except for city.
The bilinear filter also gives very good performance, especially considering that the complexity of
bilinear interpolation is minimal.

In generating the results, motion estimation is performed on a reference frame upsampled with
the AVC interpolation filter. The performance of the FGS coder using either the 4-tap polyphase
filter or the bilinear filter could potentially be improved if the filter used in motion estimation
matched the one used in motion compensation.

For the combined scalability tests, the coding performance improves by up to 0.19 dB for luma
and 0.38 dB for chroma, even though the new algorithms are applied only to coding the FGS layer
of closed-loop P frames.

Software modifications
The new algorithms have been integrated into the latest JSVM software. The integrated
software was extensively tested with different configurations and has been distributed to other
participants of the CE.

Conclusions
The proposed solution significantly improves the coding performance of the FGS layer of
closed-loop P frames. The additional complexity is minimal, since a large coding gain can be
achieved even with an interpolation filter as simple as bilinear interpolation.

We propose that the method be included in the next release of JSVM.

References
1. Yiliang Bao, Marta Karczewicz, Justin Ridge, Xianglin Wang, "Improvements to Fine Granularity
   Scalability for Low-Delay Applications", JVT-O054, Busan, Korea, April 2005.
2. Julien Reichel, Heiko Schwarz, Mathias Wien, "Scalable Video Coding Working Draft 3",
   JVT-P201, Poznan, Poland, July 2005.
3. Yiliang Bao, "Core Experiment on FGS coding for low-delay applications (CE-7)", JVT-P307r1,
   Poznan, Poland, July 2005.
4. Antti Hallapuro, Jani Lainema, Marta Karczewicz, "4-tap filter for bi-predicted macroblocks",
   JVT-D029, Klagenfurt, Austria, July 2002.



(Append for Proposal Documents)

JVT Patent Disclosure Form


International Telecommunication Union, Telecommunication Standardization Sector
International Organization for Standardization
International Electrotechnical Commission

Joint Video Coding Experts Group - Patent Disclosure Form


(Typically one per contribution and one per Standard | Recommendation)

Please send to:


JVT Rapporteur Gary Sullivan, Microsoft Corp., One Microsoft Way, Bldg. 9, Redmond WA 98052-6399, USA
Email (preferred): Gary.Sullivan@itu.int Fax: +1 425 706 7329 (+1 425 70MSFAX)

This form provides the ITU-T | ISO/IEC Joint Video Coding Experts Group (JVT) with information about the patent
status of techniques used in or proposed for incorporation in a Recommendation | Standard. JVT requires that all
technical contributions be accompanied with this form. Anyone with knowledge of any patent affecting the use of
JVT work, of their own or of any other entity (third parties), is strongly encouraged to submit this form as well.

This information will be maintained in a living list by JVT during the progress of their work, on a best effort basis.
If a given technical proposal is not incorporated in a Recommendation | Standard, the relevant patent information
will be removed from the living list. The intent is that the JVT experts should know in advance of any patent
issues with particular proposals or techniques, so that these may be addressed well before final approval.

This is not a binding legal document; it is provided to JVT for information only, on a best effort, good faith basis.
Please submit corrected or updated forms if your knowledge or situation changes.

This form is not a substitute for the ITU ISO IEC Patent Statement and Licensing Declaration, which should be
submitted by Patent Holders to the ITU TSB Director and ISO Secretary General before final approval.

Submitting Organization or Person:


Organization name              Nokia
Mailing address                6000 Connection Drive, Irving, TX 75039
Country                        USA
Contact person                 Yiliang Bao
Telephone                      972-374-1369
Fax
Email                          yiliang.bao@nokia.com
Place and date of submission   Nice, France, October, 2005
Relevant Recommendation | Standard and, if applicable, Contribution:
Name (ex: JVT) JVT, Scalable Video Coding
Title
Contribution number JVT-Q039

(Form continues on next page)



Disclosure information Submitting Organization/Person (choose one box)

2.0 The submitter is not aware of having any granted, pending, or planned patents associated with the
technical content of the Recommendation | Standard or Contribution.

or,

The submitter (Patent Holder) has granted, pending, or planned patents associated with the technical content of the
Recommendation | Standard or Contribution. In which case,

2.1 The Patent Holder is prepared to grant on the basis of reciprocity for the above Recommendation |
Standard a free license to an unrestricted number of applicants on a worldwide, non-discriminatory
basis to manufacture, use and/or sell implementations of the above Recommendation | Standard.

2.2 The Patent Holder is prepared to grant on the basis of reciprocity for the above Recommendation |
Standard a license to an unrestricted number of applicants on a worldwide, non-discriminatory basis
and on reasonable terms and conditions to manufacture, use and/ or sell implementations of the above
Recommendation | Standard.

Such negotiations are left to the parties concerned and are performed outside the ITU | ISO/IEC.

x 2.2.1 The same as box 2.2 above, but in addition the Patent Holder is prepared to grant a royalty-free license
to anyone on condition that all other patent holders do the same.

2.3 The Patent Holder is unwilling to grant licenses according to the provisions of either 2.1, 2.2, or 2.2.1
above. In this case, the following information must be provided as part of this declaration:
patent registration/application number;
an indication of which portions of the Recommendation | Standard are affected.
a description of the patent claims covering the Recommendation | Standard;

In the case of any box other than 2.0 above, please provide the following:

Patent number(s)/status

Inventor(s)/Assignee(s)

Relevance to JVT

Any other remarks:

(please provide attachments if more space is needed)

(form continues on next page)



Third party patent information fill in based on your best knowledge of relevant patents granted, pending, or
planned by other people or by organizations other than your own.

Disclosure information Third Party Patents (choose one box)

3.1 The submitter is not aware of any granted, pending, or planned patents held by third parties associated
with the technical content of the Recommendation | Standard or Contribution.

3.2 The submitter believes third parties may have granted, pending, or planned patents associated with the
technical content of the Recommendation | Standard or Contribution.

For box 3.2, please provide as much information as is known (provide attachments if more space needed) - JVT will
attempt to contact third parties to obtain more information:

3rd party name(s)

Mailing address
Country
Contact person
Telephone
Fax
Email
Patent number/status
Inventor/Assignee
Relevance to JVT

Any other comments or remarks:
