

A. Giannoula and D. Hatzinakos

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario M5S 3G4, e-mail: {alexia, dimitris}

Considerable research has lately been devoted to integrating the dual processes of signal compression and perceptual coding, in order to achieve high signal quality at low bit rates while, at the same time, ensuring copyright protection of the transmitted digital media. This paper introduces an unconventional data hiding technique that leads to compressed forms of source video signals: both the audio information and the chrominance (I and Q) components of the video frames are hidden in the wavelet transform coefficients of the luminance (Y) component, on a frame-by-frame basis. Furthermore, the proposed scheme allows a subsequent JPEG compression of each frame for additional bitrate reduction, while maintaining satisfactory quality of the reconstructed audio and chrominance information.

Various compression algorithms have been proposed so far for distributing video signals through communication networks that operate on low-bandwidth channels. The majority of these techniques [1, 2, 3] attempt to minimize the bit rate in the digital representation of the input signal while preserving acceptable levels of signal quality, by exploiting the perceptual irrelevancy of the source data and removing spatial and temporal redundancies. At the same time, data hiding has lately emerged as a means of copyright enforcement (watermarking) and content verification of multimedia data, in order to prevent their illegal manipulation and retransmission through digital networks. By allowing signature information to be imperceptibly embedded in the host signal, data hiding techniques work against the process of compression, since they add redundancy to the encoder bit stream. Therefore, several methods have been explored [4, 5, 6, 7, 8] in order to achieve a fair tradeoff between these processes and enhance the efficiency of digital communications. In [9], a generic source and channel coding framework for hiding compressible signature data in host video is proposed. In [10], an unconventional approach is undertaken, where wavelet-based data hiding is used for improving practical compression of still images. That technique has indeed been reported to outperform JPEG and straightforward SPIHT.

A compressive data hiding technique for video signals, operating on a frame-by-frame basis, is introduced in this paper. The proposed scheme essentially evolves the one presented in [10], in the sense that it is still based on subband and wavelet decompositions and, by utilizing perceptual redundancies, it directly replaces specific wavelet transform coefficients of the luminance (Y component) of each frame with a) the color information (I and Q components) after subsampling and b) the DCT coefficients of the audio signal. The novelty of the proposed framework lies mainly in hiding the audio along with the color information, which benefits both compression and synchronization. The suggested scheme guarantees efficient reproduction of both the audio signal and the chrominance components at the receiver, even after JPEG frame compression has been imposed for further reduction of the bitrate.

This work was supported by the Communications and Information Technology Ontario (CITO).


0-7803-7750-8/03/$17.00 ©2003 IEEE

Consider that the source video signal, to be compressed and transmitted through a given communication network, is represented by a sequence of $N$ still color images (frames) denoted by $P_i$, $i = 1, \ldots, N$, and an audio signal $S$. The YIQ color space is used throughout this paper, where the luminance component $Y_i$ of each frame, which contains most of the visual information, acts as the host signal. Furthermore, the method works in the discrete wavelet transform (DWT) domain, because of its good energy compaction properties and its frequency splitting, which allows for efficient coding matched to the statistics of each frequency band and to the characteristics of the human visual system. Finally, the audio signal is treated independently and is initially divided into segments of equal length for increased efficiency. Subsequently, each audio segment is DCT transformed and then hidden in the wavelet coefficients of the luminance component. For this purpose, a group of frames (GOF) is considered, such that the same audio information is embedded in a few successive frames, so that the method can deal efficiently with added or dropped frames and is more robust to compression and noise. The DCT domain was chosen on the basis of the well-established audio compression standards [11], since it concentrates the energy in a few low-frequency coefficients, and it was experimentally verified to be the best choice for preprocessing the sound signal prior to embedding, for the scheme under consideration.
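The audio path described above (equal-length segmentation, a DCT per segment, and the same coefficients carried by GOF successive frames) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the zero-padding of the last segment, and the additive Gaussian perturbation that stands in for compression distortion are all hypothetical. Averaging the copies at the receiver is what allows a larger GOF to suppress independent distortions.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a list of floats (naive O(n^2) version)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n)
                               for t in range(n)))
    return out

def idct(X):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(X)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * X[k]
                * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
            for t in range(n)]

def segment(signal, seg_len):
    """Split into equal-length segments; zero-padding the last one is an assumption."""
    count = math.ceil(len(signal) / seg_len)
    padded = list(signal) + [0.0] * (count * seg_len - len(signal))
    return [padded[k * seg_len:(k + 1) * seg_len] for k in range(count)]

def average_copies(copies):
    """Element-wise mean over the GOF received copies of one coefficient set."""
    return [sum(values) / len(copies) for values in zip(*copies)]

def simulate(signal, seg_len, gof, noise_std, seed=0):
    """Carry each segment's DCT in `gof` frames, perturb every copy, then recover."""
    rng = random.Random(seed)
    recovered = []
    for seg in segment(signal, seg_len):
        coeffs = dct(seg)
        # Hypothetical channel: independent Gaussian noise per hosting frame.
        copies = [[c + rng.gauss(0.0, noise_std) for c in coeffs]
                  for _ in range(gof)]
        recovered.extend(idct(average_copies(copies)))
    return recovered[:len(signal)]

speech = [math.sin(2 * math.pi * 3 * t / 40) for t in range(40)]
clean = simulate(speech, seg_len=16, gof=3, noise_std=0.0)
noisy = simulate(speech, seg_len=16, gof=3, noise_std=0.05)
```

With `noise_std=0.0` the round trip is exact up to floating-point error; the `noisy` run shows where averaging over the group of frames enters the recovery.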

According to the technique developed in [10], the luminance component $Y_i$ initially undergoes a one-level wavelet decomposition (DWT), on a frame-by-frame basis, based on an 8-tap Daubechies filter:

$$Y_i \rightarrow \{Y_{LL_i},\, Y_{LH_i},\, Y_{HL_i},\, Y_{HH_i}\}, \quad i = 1, \ldots, N \qquad (1)$$

The mid-frequency subbands, $Y_{LH_i}$ and $Y_{HL_i}$, are chosen to host the chrominance information, since they provide a good compromise between visual imperceptibility and robustness to image processing operations, such as lossy JPEG compression and lowpass filtering, which are known to affect mainly the high-frequency coefficients. After a further wavelet decomposition is performed, the perceptually insignificant subbands $Y_{HL_i,ll}$ and $Y_{LH_i,ll}$ are directly replaced by the lowpass chrominance subbands $I_{2LL_i}$ and $Q_{2LL_i}$, respectively, obtained by a two-level DWT of the $I_i$ and $Q_i$ components of each frame. However, an energy normalization that matches the perceptual quality of the corresponding host parts is performed prior to embedding, and the corresponding normalization values $N_I$, $N_Q$ need to be transmitted to the decoder.

To proceed, the lower $Y_{LL_i}$ subband is now selected to hide the audio data, since the effective reproduction of the audio signal at the receiver appears to be a much more critical concern than that of the color information. However, directly modifying the lower subbands may introduce severe quality degradation. Therefore, injecting the audio information into the $Y_{LL_i,hh}$ wavelet coefficients, after decomposing the subband once more and zeroing out some of the generated highpass (LL-hh) coefficients, performs satisfactorily in terms of both imperceptibility and reconstruction accuracy. For this purpose, the audio signal $S$ of length $N_S$ is first divided into $M$ segments $x_k(n)$, $k = 1, \ldots, M$, $n = 1, \ldots, N_a$, of equal length $N_a$, with $M$ and $N_a$ set according to GOF, where GOF stands for the number of successive frames that host the same sound information. This approach was found to introduce an additional layer of tolerance to image processing operations and to achieve higher reproduction efficiency. However, the size of the GOF is limited by the host subband bandwidth, and it can easily be shown that it is described by the following formula:

where $N_1$, $N_2$ represent the original frame dimensions and $l$ defines the final level of the frame decomposition (in our case $l = 2$).

Each audio segment is then DCT transformed and the resulting coefficients $X_k(n)$, $k = 1, \ldots, M$, $n = 1, \ldots, N_a$, replace $N_a$ wavelet coefficients of the $Y_{LL_i,hh}$ subband in GOF successive frames (it is implied that $N_a$ does not exceed the number of coefficients available in that subband). For handling security issues, a pseudo-random criterion known to both the transmitter and the receiver, and an encryption key accessible only to authorized receivers, are utilized. The produced modified subband is denoted by $Y^e_{LL_i,hh}$, $i = 1, \ldots, N$.

After the color and audio embedding processes have been performed, the modified subbands $Y^e_{HL_i}$, $Y^e_{LH_i}$ and $Y^e_{LL_i}$ are obtained by applying the inverse wavelet transform (IDWT) to the corresponding two-level decomposition subbands:

$$\{I_{2LL_i},\, Y_{HL_i,lh},\, Y_{HL_i,hl},\, Y_{HL_i,hh}\} \rightarrow Y^e_{HL_i} \qquad (3)$$

$$\{Q_{2LL_i},\, Y_{LH_i,lh},\, Y_{LH_i,hl},\, Y_{LH_i,hh}\} \rightarrow Y^e_{LH_i} \qquad (4)$$

$$\{Y_{LL_i,ll},\, Y_{LL_i,lh},\, Y_{LL_i,hl},\, Y^e_{LL_i,hh}\} \rightarrow Y^e_{LL_i} \qquad (5)$$


Finally, the embedded luminance component $Y^e_i$ of each frame is obtained by the following IDWT:

$$\{Y^e_{LL_i},\, Y^e_{LH_i},\, Y^e_{HL_i},\, Y_{HH_i}\} \rightarrow Y^e_i, \quad i = 1, \ldots, N \qquad (6)$$
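The chrominance part of the embedding, and its extraction at the receiver, can be sketched in a few lines. This is a simplified illustration under clearly stated assumptions, not the authors' code: an orthonormal Haar filter is swapped in for the 8-tap Daubechies filter, the energy normalization and the audio embedding are omitted, no JPEG stage is applied, and all function names are hypothetical. In this sketch the first subband letter refers to the filter applied along the rows and the second to the filter applied along the columns, which is a convention chosen here and may differ from the paper's.

```python
def haar_1d(v):
    """One-level orthonormal Haar analysis of an even-length list -> (low, high)."""
    s = 2 ** 0.5
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) / s for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) / s for i in range(half)]
    return low, high

def ihaar_1d(low, high):
    """Inverse of haar_1d()."""
    s = 2 ** 0.5
    out = []
    for a, d in zip(low, high):
        out += [(a + d) / s, (a - d) / s]
    return out

def transpose(m):
    return [list(r) for r in zip(*m)]

def dwt2(img):
    """One-level 2D Haar DWT -> dict of subbands 'll', 'lh', 'hl', 'hh'."""
    lo_rows, hi_rows = [], []
    for row in img:
        l, h = haar_1d(row)
        lo_rows.append(l)
        hi_rows.append(h)

    def along_columns(m):
        lo, hi = [], []
        for col in transpose(m):
            l, h = haar_1d(col)
            lo.append(l)
            hi.append(h)
        return transpose(lo), transpose(hi)

    ll, lh = along_columns(lo_rows)
    hl, hh = along_columns(hi_rows)
    return {"ll": ll, "lh": lh, "hl": hl, "hh": hh}

def idwt2(b):
    """Inverse of dwt2()."""
    def along_columns(lo, hi):
        cols = [ihaar_1d(l, h) for l, h in zip(transpose(lo), transpose(hi))]
        return transpose(cols)

    lo_rows = along_columns(b["ll"], b["lh"])
    hi_rows = along_columns(b["hl"], b["hh"])
    return [ihaar_1d(l, h) for l, h in zip(lo_rows, hi_rows)]

def embed_chroma(y, i_ll2, q_ll2):
    """Replace the ll parts of the HL and LH subbands with chrominance lowpass bands."""
    top = dwt2(y)
    hl, lh = dwt2(top["hl"]), dwt2(top["lh"])
    hl["ll"], lh["ll"] = i_ll2, q_ll2
    top["hl"], top["lh"] = idwt2(hl), idwt2(lh)
    return idwt2(top)

def extract_chroma(y_e):
    """Receiver side: decompose twice and read the hidden bands back."""
    top = dwt2(y_e)
    return dwt2(top["hl"])["ll"], dwt2(top["lh"])["ll"]

# Toy 8x8 luminance frame and 2x2 chrominance lowpass bands (made-up values).
y = [[float((r * 8 + c) % 17) for c in range(8)] for r in range(8)]
i_ll2 = [[1.0, 2.0], [3.0, 4.0]]
q_ll2 = [[-1.0, 0.5], [0.25, 2.0]]
y_e = embed_chroma(y, i_ll2, q_ll2)
i_rec, q_rec = extract_chroma(y_e)
```

Because the transform is orthonormal and no lossy stage intervenes, `i_rec` and `q_rec` match the embedded bands up to floating-point error, and the LL subband of the frame is untouched by the chrominance embedding.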

The generated signal exhibits no perceptual disparities from the host frame, and the only information that must be transmitted through the digital network is the sequence of the embedded luminance frame components $Y^e_i$, $i = 1, \ldots, N$, along with the normalization values $N_I$, $N_Q$ and the value of the GOF variable. The latter values can be sent as a header at the beginning of the video stream, and no other control bits, such as synchronization information, which would increase the overhead, need to be transmitted. In addition, for a further bitrate reduction, lossy JPEG compression is applied to each frame and the resulting sequence of compressed luminance frames $Y^{comp}_i$, $i = 1, \ldots, N$, is finally transmitted. The ability of the proposed technique to successfully reconstruct the hidden information, even after the JPEG compression, will be demonstrated in Section 5 using numerical experiments.
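Since only the embedded luminance plane is transmitted, the saving before any JPEG stage can be estimated with simple arithmetic. The sketch below is a hedged back-of-the-envelope calculation, not a computation taken from the paper: it assumes 8 bits per sample for each of the three color planes and for the embedded luminance plane, ignores the small header, and uses a hypothetical frame size, because the frame dimensions are not stated in the text. Under these assumptions the color planes alone give a saving of two thirds, which is in line with the 66.7% figure reported in the experiments.

```python
def data_rates(pixels_per_frame, fps=25, audio_rate_hz=8000, audio_bits=16):
    """Return (original, transmitted) data rates in bits per second.

    Assumes 8 bits per sample for every color plane and for the embedded
    luminance plane; header overhead is ignored.
    """
    video_original = 3 * 8 * pixels_per_frame * fps
    audio_original = audio_rate_hz * audio_bits
    transmitted = 8 * pixels_per_frame * fps  # only the embedded luminance plane
    return video_original + audio_original, transmitted

def reduction(original, transmitted):
    """Fractional reduction of the data rate."""
    return 1.0 - transmitted / original

# Color planes only: one 8-bit plane is sent instead of three.
video_only = reduction(3 * 8, 8)

# Hypothetical 128x128 frames, counting the hidden audio stream as well.
original, sent = data_rates(pixels_per_frame=128 * 128)
with_audio = reduction(original, sent)
print(f"video only: {video_only:.1%}, with audio: {with_audio:.1%}")
```

Counting the audio stream that no longer needs its own channel makes the saving slightly larger than two thirds, by an amount that depends on the frame size.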



At the receiver, the color video frames are retrieved from the compressed bit stream $Y^{comp}_i$, $i = 1, \ldots, N$, by first extracting the single-subband estimates on a frame-by-frame basis:

$$Y^{comp}_i \rightarrow \{\hat{Y}_{LL_i},\, \hat{Y}_{LH_i},\, \hat{Y}_{HL_i},\, \hat{Y}_{HH_i}\}, \quad i = 1, \ldots, N \qquad (7)$$

The estimated chrominance information, $\hat{I}_i$ and $\hat{Q}_i$, is obtained by performing the DWT:

$$\hat{Y}_{HL_i} \rightarrow \{\hat{Y}_{HL_i,ll} = \hat{I}_{2LL_i},\, \hat{Y}_{HL_i,lh},\, \hat{Y}_{HL_i,hl},\, \hat{Y}_{HL_i,hh}\}$$

$$\hat{Y}_{LH_i} \rightarrow \{\hat{Y}_{LH_i,ll} = \hat{Q}_{2LL_i},\, \hat{Y}_{LH_i,lh},\, \hat{Y}_{LH_i,hl},\, \hat{Y}_{LH_i,hh}\}$$

and then by upsampling to the original image dimensions. An estimate of the DCT coefficients $\hat{X}_k(n)$, $k = 1, \ldots, M$, $n = 1, \ldots, N_a$, of each audio segment is calculated by using the appropriate security key, applying the same pseudo-random criterion to the estimated subband and subsequently taking the average over GOF successive frames.

The IDCT is then applied to each of the $M$ estimated sets of audio transform coefficients $\hat{X}_k(n)$, to obtain the corresponding audio segments $\hat{x}_k(n)$, $k = 1, \ldots, M$, in the time domain and produce the recovered sound signal $\hat{S}$. After zeroing out $\hat{Y}_{HL_i,ll}$, $\hat{Y}_{LH_i,ll}$ and the $N_a$ pre-specified coefficients, the subbands $\tilde{Y}_{HL_i}$, $\tilde{Y}_{LH_i}$ and $\tilde{Y}_{LL_i}$ are reconstructed by a one-level IDWT as follows:

$$\{0,\, \hat{Y}_{HL_i,lh},\, \hat{Y}_{HL_i,hl},\, \hat{Y}_{HL_i,hh}\} \rightarrow \tilde{Y}_{HL_i}$$

$$\{0,\, \hat{Y}_{LH_i,lh},\, \hat{Y}_{LH_i,hl},\, \hat{Y}_{LH_i,hh}\} \rightarrow \tilde{Y}_{LH_i}$$

$$\{\hat{Y}_{LL_i,ll},\, \hat{Y}_{LL_i,lh},\, \hat{Y}_{LL_i,hl},\, \hat{Y}_{LL_i,hh}(\hat{X}_k(n) = 0)\} \rightarrow \tilde{Y}_{LL_i} \qquad (14)$$

and finally, an estimate of the luminance $\hat{Y}_i$ is obtained according to the formula:

$$\{\tilde{Y}_{LL_i},\, \tilde{Y}_{LH_i},\, \tilde{Y}_{HL_i},\, \hat{Y}_{HH_i}\} \rightarrow \hat{Y}_i, \quad i = 1, \ldots, N$$

To illustrate the performance of the proposed video compressive data hiding scheme and establish its efficiency, both in terms of bitrate reduction and robustness to compression, an uncompressed video signal operating at 25 frames per second was used, consisting of 100 RGB color frames (24-bit video) and a mono speech audio signal, sampled at 8 kHz with 16 bits per sample. The maximum value for the GOF variable was set equal to 3 according to (2). The application of the proposed technique, in the case of no additional compression, resulted in a reduction of 66.7% in the average data rate and provided perfect reproduction of the audio information and very good perceptual quality.

Figure 1: (a) original RGB frames No. 19, 50, 65 and 97; (b) reconstructed frames No. 19, 50, 65 and 97 after data hiding and JPEG 92% (GOF = 3).

Figure 2: (a) original speech signal; (b) reconstructed speech signal after data hiding and JPEG 92% (GOF = 3).

A large number of experiments was performed to investigate the robustness of the proposed technique to JPEG compression. An additional bitrate reduction was achieved in the previous experiment by applying a frame-by-frame JPEG compression of 92% quality. The signal-to-noise ratio (SNR) of the recovered speech was measured at 17.1 dB. Fig. 2 presents both the original and the reconstructed audio waveforms, while Fig. 1 compares the visual quality of four source video frames and their reconstructed versions, where no significant degradation can be perceived. The SNR values for the extracted speech signal against various JPEG quality factors, for three different values of the GOF variable (GOF = 1, 2, 3), are shown in Fig. 3. Visual inspection indicates that the audio reconstruction efficiency improves as the JPEG quality factor takes higher values (better image quality). Furthermore, the small but steady improvement in the SNR as the GOF increases highlights the benefit gained by using as many successive frames as the host subband bandwidth permits to hide the same audio data.


Figure 3: SNR values (in dB) for the reconstructed audio signal at various JPEG quality factors for 3 different GOFs.

Table 1: Average NCD evaluation for the reconstructed frames

JPEG quality   75       80       85       90       95
NCD            0.0867   0.0773   0.0659   0.0518   0.0354

In a last set of experiments, a quantitative performance evaluation was conducted on the reconstructed color frames, by measuring the average normalized color distance (NCD [12]) over the entire frame sequence. The corresponding average NCD values for six JPEG qualities (GOF = 2) can be seen in Table 1; as expected, they decrease as the JPEG quality increases. In all cases, the values remain within acceptable levels.

A compressive data hiding technique for effectively transmitting video signals through communication networks with low bandwidth is introduced in this paper. Both the chrominance components at their coarsest resolution and the DCT coefficients of the segmented audio signal are hidden in the luminance component, on a frame-by-frame basis, by directly replacing the middle and a portion of the low (LL-hh) frequency subbands, respectively, after a two-level wavelet decomposition. The proposed scheme guarantees visual imperceptibility and efficient audio reproduction even after an additional JPEG compression is applied to each frame. Further experiments are being conducted on enhancing the algorithm by applying a wavelet-based encoder instead of the DCT-based JPEG, such as the JPEG 2000 standard or the SPIHT algorithm, to match the nature of the entire data hiding model. In addition, tests are being performed to deal with capacity issues and to hide higher-quality and/or multi-channel audio. Future work also involves hiding other sorts of data and/or metadata, such as motion information, content descriptors, text, etc. Capacity issues arise in these cases as well.

[1] T. Ebrahimi, E. Reusens, and W. Li, New trends in very low bitrate video coding, Proceedings of the IEEE, vol. 83, pp. 877-891, June 1995.

[2] D. Marpe and H. L. Cycon, Very low bit-rate video coding using wavelet-based techniques, IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 1, February 1999.

[3] F. Kossentini, W. Chung, and M. Smith, Rate-distortion-constrained subband video coding, IEEE Transactions on Image Processing, vol. 8, no. 2, February 1999.

[4] J. Lacy, S. Quackenbush, A. Reibman, D. Shur, and J. Snyder, On combining watermarking with perceptual coding, in Proc. of ICASSP98, Seattle, Washington, USA, 1998, vol. 6, pp. 3725-3728.

[5] H. J. Wang and C. J. Kuo, Embedding visible video watermarks in the compressed domain, in Proc. of ICASSP98, Chicago, USA, 1998, vol. 1.

[6] D. Kundur and D. Hatzinakos, Mismatching perceptual models for effective watermarking in the presence of compression, in Proc. of SPIE, Multimedia Systems and Applications II, Sept. 1999.

[7] C. Fei, D. Kundur, and R. Kwong, The choice of watermark domain in the presence of compression, in Proc. IEEE Int. Conf. on Information Technology: Coding and Computing, April 2001.

[8] B. Zhu and A. H. Tewfik, Media compression via data hiding, in Thirty-First Asilomar Conf. on Signals, Systems and Computers, 1997, vol. 1, pp. 647-650.

[9] D. Mukherjee, J. J. Chae, S. K. Mitra, and B. S. Manjunath, A source and channel-coding framework for vector-based data hiding in video, IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 4, June 2000.

[10] P. Campisi, D. Kundur, D. Hatzinakos, and A. Neri, Compressive data hiding: An unconventional approach for improved color image coding, EURASIP Journal of Applied Signal Processing, no. 2, pp. 152-163, 2002.

[11] Stephen J. Solari, Digital Video and Audio Compression, McGraw-Hill, 1997.

[12] K. N. Plataniotis and A. N. Venetsanopoulos, Color Image Processing and Applications, Springer, 2000.
