Figure 4: MVME scheme.

Figure 5: Basic concept for the VSME technique.

Moreover, the proposed method uses only temporal image signals to generate the final SI, while MVME uses both temporal and inter-view signals.

3. VIEW SYNTHESIS MOTION ESTIMATION

As already described, one fatal drawback of MCTI, which interpolates frames temporally, is its assumption of linear uniform motion. On the other hand, VISI, which is one of the geometrically precise inter-view frame interpolation methods, fails to generate high-quality SI because it cannot compensate for the inter-view signal mismatch caused by heterogeneous camera settings and non-Lambertian reflection. Therefore, we propose a novel SI generation method that combines these two methods while eliminating their defects. In other words, the proposed method predicts image signals temporally with no assumption about the motion model.

The main idea behind the proposed method, View Synthesis Motion Estimation (VSME), is depicted in Figure 5. Motion vectors are estimated between VISIs, which are generated for both the KF and the WZF. Because VISI achieves geometrically precise frame interpolation, the VISI for the WZF can be regarded as a correct estimate of the scene at the WZF. Therefore, motion between the KF and the WZF can be estimated with no assumption about the motion model by estimating motion between the VISI for the KF and the VISI for the WZF. It is also possible to conduct motion estimation between the KF itself and the VISI for the WZF; however, it is difficult to find precise motion vectors in that case because of the signal mismatch between the KF and the VISI, and the motion estimation errors degrade the resulting SI quality. That drop in SI quality could be offset by the lower computational complexity, because VISI generation requires a large amount of computation. Therefore, one item of future work is to establish an accurate motion estimation algorithm that works directly on the KF and the VISI. Details of VSME are explained below.

First, VISIs are generated for the KFs and the WZF; the KFs are the temporal reference frames of motion compensated prediction. Any algorithm can be applied to interpolate the inter-view frames; the most important requirement of this view synthesis process is that it provide geometrically correct predictions. In this paper, depth maps are estimated for the view synthesis target frame, and simple 3D warping is used to synthesize the view [17]. Note that depth estimation is conducted at the decoder side, not at the encoder side; this means that no depth information is encoded. Depth maps d are estimated by minimizing the following cost function E:

E(d) = Σ_p ( I_L(d_p, p) − I_R(d_p, p) )² + λ Σ_{{p,q}∈N} |d_p − d_q|,   (1)

where p represents a pixel in the depth map, N represents the set of adjacent pixel pairs, d_p represents the depth value of pixel p, and I_L(d_p, p) and I_R(d_p, p) represent the values of the pixels on the left and right views, respectively, that correspond to pixel p of the center view with depth d_p. This minimization problem is solved by graph cuts.

Second, all generated VISIs are low-pass filtered to improve the reliability of the motion vectors, because warping-based view interpolation introduces artificial noise, especially at high frequencies. Next, a block matching algorithm is used to estimate the motion of each block in the KF by using the VISI for the KF and the VISI for the WZF. The parameters are the block size, the search window size, and the search range.

Third, a weighted vector median filter is applied to increase the spatial coherence of the estimated motion vectors [18].

Last, motion compensated frame prediction is performed using the obtained motion vector field and the KF. If both forward and backward KFs are available, motion vector fields are estimated separately, and the bidirectional motion compensation of standard video coding is performed to obtain the final SI.

4. EXPERIMENTS

The breakdancers and ballet multiview test sequences were used in the simulations [19]. The spatial resolution is 256x192 and the temporal resolution is 15 fps for both sequences. Both sequences contain rapid and random motions with a relatively large static background.

There are many possibilities for the positions of the KFs and WZFs. We chose the simplest setting, as illustrated in Figure 6. The views are separated into two categories: intra-view and inter-view. An inter-view is decoded jointly with the other views, while an intra-view is decoded without any other views. Every second view was defined as an intra-view and the others as inter-views. All frames in the intra-views are encoded by H.264/AVC Intra. However, the proposed scheme does not care whether the intra-views employ the mono-view DVC scheme or not, as long as the frames that have the same timestamps as the KFs and the WZF, which are the reference and target frames for frame interpolation, are available. We also assumed that KFs are placed at every second position and encoded by H.264/AVC Intra. Note that it is easy to consider the case where the KF interval is longer. In the experiments below, only view 4 was evaluated because it is easy to extend the simulation to the other inter-views.

In the experiments, we evaluated just the SI quality. Since many SIs have been proposed for MDVC, fusion techniques have also been investigated to generate better SI by
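The cost function E(d) of Eq. (1) can be evaluated directly for a candidate depth map. The sketch below is illustrative only (the function name and the `lam` parameter are ours): it assumes rectified views in which the depth value of a pixel acts as an integer horizontal disparity, whereas the paper uses full 3D warping; it also only scores a candidate d, while the actual minimization over all depth maps is done with graph cuts.

```python
import numpy as np

def depth_cost(d, center, left, right, lam=0.1):
    """Evaluate the cost E(d) of Eq. (1) for a candidate depth map d.

    Simplification: rectified views, so depth value d_p acts as an
    integer horizontal disparity.  `center` is the center view image
    (it only defines the pixel grid p here); `left`/`right` are the
    neighbouring views.  All arguments are 2D float arrays.
    """
    h, w = center.shape
    ys, xs = np.mgrid[0:h, 0:w]
    disp = d.astype(int)
    # Corresponding pixels of the center view's pixel p on the left
    # and right views, clipped at the image border.
    xl = np.clip(xs + disp, 0, w - 1)
    xr = np.clip(xs - disp, 0, w - 1)
    data = np.sum((left[ys, xl] - right[ys, xr]) ** 2)
    # Smoothness term over 4-connected neighbour pairs {p, q}.
    smooth = np.sum(np.abs(d[:, 1:] - d[:, :-1])) \
           + np.sum(np.abs(d[1:, :] - d[:-1, :]))
    return data + lam * smooth
```

A graph-cut solver would search for the d minimizing this value; this routine corresponds to the objective it optimizes.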
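The block matching of the second step, between the VISI for the KF and the VISI for the WZF, can be sketched as a plain full-search SAD matcher. This is a minimal sketch with names of our own choosing; the low-pass prefiltering of the VISIs and the paper's actual parameter values are omitted, and the search window and search range are folded into one parameter.

```python
import numpy as np

def block_motion(visi_kf, visi_wzf, block=8, search=4):
    """Full-search block matching between two synthesized views.

    For each block of visi_kf, find the SAD-minimizing displacement
    within +/-search pixels in visi_wzf.  Returns an array of (dy, dx)
    integer vectors, one per block.
    """
    h, w = visi_kf.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = visi_kf[y:y + block, x:x + block]
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate block falls outside the frame
                    cand = visi_wzf[yy:yy + block, xx:xx + block]
                    sad = np.abs(ref - cand).sum()
                    if sad < best:
                        best, best_mv = sad, (dy, dx)
            mvs[by, bx] = best_mv
    return mvs
```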
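The weighted vector median filter of the third step [18] replaces each block's motion vector with the candidate (the block's own vector or a neighbour's) that minimizes the weighted sum of distances to all candidates. A minimal sketch; how the weights are derived (e.g., from matching errors) is our assumption, not a detail given here.

```python
import numpy as np

def weighted_vector_median(vectors, weights):
    """Return the vector v_i minimizing sum_j w_j * ||v_i - v_j||.

    vectors: (N, 2) array of candidate motion vectors (a block's own
    vector plus its spatial neighbours'); weights: (N,) array.
    The output is always one of the input vectors.
    """
    vectors = np.asarray(vectors, dtype=float)
    # Pairwise Euclidean distances between all candidate vectors.
    dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=2)
    costs = dists @ np.asarray(weights, dtype=float)
    return vectors[np.argmin(costs)]
```

Because the output is restricted to the input set, outlier vectors are replaced by a coherent neighbour rather than an averaged, possibly meaningless, vector.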
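The final step, when both forward and backward KFs are available, forms the SI by averaging the two motion-compensated predictions, as in the bidirectional prediction of standard codecs. A sketch under the simplifying assumption that every displaced block stays inside the frame (names are ours):

```python
import numpy as np

def compensate(ref, mvs, block=8):
    """Motion-compensate a frame from ref using per-block (dy, dx) vectors."""
    h, w = ref.shape
    out = np.zeros_like(ref)
    for by in range(h // block):
        for bx in range(w // block):
            dy, dx = mvs[by, bx]
            y, x = by * block, bx * block
            out[y:y + block, x:x + block] = ref[y + dy:y + dy + block,
                                                x + dx:x + dx + block]
    return out

def bidirectional_si(kf_prev, mvs_fwd, kf_next, mvs_bwd, block=8):
    """Average forward and backward predictions to obtain the final SI."""
    fwd = compensate(kf_prev, mvs_fwd, block)
    bwd = compensate(kf_next, mvs_bwd, block)
    return 0.5 * (fwd + bwd)
```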
Figure 9: RD performance (top: breakdancers, bottom: ballet). PSNR-Y [dB] versus bitrate [bpp] for MCTI, DCVP, MVME-Motion, MVME-All, VISI, AFVISI, and VSME.

IEEE Workshop on Statistical Signal Processing, 2003, pp. 30–33.

[5] G. Toffetti, M. Tagliasacchi, M. Marcon, A. Sarti, S. Tubaro, and K. Ramchandran, "Image compression in a multi-camera system based on a distributed source coding approach," in Proceedings of European Signal Processing Conference 2005, September 2005.

[6] X. Artigas, E. Angeli, and L. Torres, "Side information generation for multiview distributed video coding using a fusion approach," in Proceedings of the 7th Nordic Signal Processing Symposium 2006 (NORSIG 2006), June 2006, pp. 250–253.

[7] M. Ouaret, F. Dufaux, and T. Ebrahimi, "Iterative multiview side information for enhanced reconstruction in distributed video coding," EURASIP Journal on Image and Video Processing, 2009.

[8] M. Ouaret, F. Dufaux, and T. Ebrahimi, "Multiview distributed video coding with encoder driven fusion," in The 2007 European Signal Processing Conference (EUSIPCO-2007), 2007.

[9] M. Ouaret, F. Dufaux, and T. Ebrahimi, "Fusion-based multiview distributed video coding," in VSSN '06: Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks, 2006, pp. 139–144.

[10] D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, vol. 19, no. 4, pp. 471–480, July 1973.

[11] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Transactions on Information Theory, vol. 22, no. 1, pp. 1–10, January 1976.

[12] R. Puri and K. Ramchandran, "PRISM: A video coding architecture based on distributed compression principles," EECS Department, University of California, Berkeley, Tech. Rep. UCB/ERL M03/6, 2003.

[13] A. Aaron, R. Zhang, and B. Girod, "Wyner-Ziv coding of motion video," in Proc. Asilomar Conference on Signals and Systems, 2002, pp. 240–244.

[14] J. Ascenso, C. Brites, and F. Pereira, "Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding," in the 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, July 2005.

[15] S. Shimizu, Y. Tonomura, H. Kimata, and Y. Ohtani, "Improved view interpolation for side information in multiview distributed video coding," in Proceedings of the Third ACM/IEEE International Conference on Distributed Smart Cameras, 2009, pp. 1–8.

[16] X. Artigas, F. Tarres, and L. Torres, "A comparison of different side information generation methods for multiview distributed video coding," in SIGMAP, 2007, pp. 450–455.

[17] S. Yea and A. Vetro, "View synthesis prediction for rate-overhead reduction in FTV," in Proc. 3DTV-Conference, May 2008, pp. 145–148.

[18] L. Alparone, M. Barni, F. Bartolini, and V. Cappellini, "Adaptively weighted vector-median filters for motion-fields smoothing," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996, pp. 2267–2270.

[19] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, "High-quality video view interpolation using a layered representation," ACM Trans. Graph., vol. 23, no. 3, pp. 600–608, 2004.

[20] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, "The DISCOVER codec: Architecture, techniques and evaluation," in Proceedings of Picture Coding Symposium 2007, 2007.