15-18 December 2003, Singapore    3B3.1
0-7803-8185-8/03/$17.00 © 2003 IEEE

Abstract

A method to implement digital audio watermarking in the frequency domain is presented. The embedding is performed by using chaos-based sequences to increase the signature hiding properties. The proposed scheme takes into account the MPEG psychoacoustic model I to improve the robustness against MP3 compression. Some tests have been performed to verify the system robustness against MP3 compression and also against signal cropping, resampling, requantization and filtering.

1 Introduction

Research on digital audio and video signals has made it possible to design systems with low distortion and noise, increasing the diffusion of digital sources on the worldwide market. The negative counterpart is the ease with which audio and video material can be copied and digital files tampered with.

As a result, performers, studios, distributors and retailers need a reliable, tamperproof and permanent audio watermarking solution ([1], [2], [3], [4]) to embed inaudible (or not visible) and indelible information, to protect and track inclusive content. In particular, to meet today's copyright protection requirements, electronic watermarking techniques applied to audio signals must satisfy four basic requirements [5]: the watermark should be inaudible, i.e., the sound quality must not be significantly corrupted; indelible, i.e., it should not be possible to remove the watermark from the audio signal; robust, i.e., the watermark should be resistant to the main digital manipulations; invisible, i.e., the watermark presence should not be easily verified, in order to prevent its removal. Besides, multiple watermarks should be supported to track the passing of ownership or the movement of the proprietary source. Finally, the watermark detection should not require a copy of the original audio track.

This paper proposes a new scheme to protect audio signal copyright, taking into consideration the above-mentioned requirements. An often used watermarking algorithm is the Patchwork approach [1]. A similar approach working in the time domain has been discussed in [2]. In [3] a frequency domain approach is shown, where the watermark detection requires a copy of the original audio track. Other methods act in the frequency domain, e.g., [4]. The novelty of our work, with respect to [1], [2], [3], [4], is that it integrates different aspects: the watermark detection does not require a copy of the original track, the watermark is applied in the frequency domain, it is based on chaotic sequences, and the embedding algorithm is based on the MPEG psychoacoustic model I, to improve robustness against MP3 compression.

2 MPEG Psychoacoustic Model I

The human auditory system [6] can detect sounds with frequencies between 20 Hz and 20 kHz and acts as a frequency analyzer. Thus, it can be modeled as a set of 32 bandpass filters with bandwidths increasing with frequency. These 32 bands are usually known as "critical bands": if a faint tone lies in the critical band of a louder tone, the faint one turns out to be masked. However, temporal masking is also present, i.e., to hear a faint tone following a louder one, a given amount of time must elapse. The MPEG audio compression algorithms are based on these masking effects: perceptually irrelevant information is removed to increase the data compression. Let us briefly review the basic steps involved in the psychoacoustic model I algorithm:

- Audio data time alignment: a frame structure, with a given number of samples depending on the MP3 layer, is created.
- Frequency domain representation: the audio samples are converted to their frequency domain representation by using a Fast Fourier Transform (FFT).
- Tonal and nontonal components identification: the spectral values are divided into tonal and nontonal components; the model identifies tonal components on the basis of the local peaks of the audio power spectrum, then it sums all the remaining (nontonal) components to obtain a single value for each critical band.
- Masked components removal: by means of an empirical masking threshold, chosen close to the lower bound of the sound audibility, the masked components are decimated.
- Global masking threshold choice: for each band a global masking threshold is chosen, on the basis of the remaining tonal and nontonal components.
- Signal-to-mask ratio (SMR) evaluation: the SMR, defined as the ratio between the signal energy within a given subband and the minimum masking threshold for that band, is computed. This value is used by the MPEG encoder to decide how many bits to allocate in each band: the greater the SMR, the more bits are allocated.

3 Watermark Embedding

The watermark embedding is performed by dividing the audio track into blocks and by superimposing the same watermarking sequence on each block. Thus, the sequence casting is repeated several times along the audio content. In particular, the main embedding steps are:

- the audio file is decoded to get the audio sample stream;
- the audio sample stream is divided into blocks and sub-blocks: each block contains $N$ sub-blocks, where the sub-block length is equal to the FFT block size, $L$; therefore, each block has length $NL$;
- the FFT is applied to each sub-block, with a resulting vector of length $L/2$, due to the FFT symmetry;
- a chaotic sequence of size $NL/2$ is generated;
- the chaotic sequence is superimposed on each block, of size $NL/2$, containing the samples in the frequency domain.

The signature sequence to be hidden can be chosen in many ways, but watermarking schemes often use pseudorandom (PN) number generators. In this work we are interested in using chaotic sequence generators, which give easy control over the statistical behavior of the chaotic trace in terms of expected value, auto and cross correlation; moreover, some studies in the literature show better performance of chaotic watermarking with respect to the PN one.

The chaotic watermarking signal for audio sources can be obtained by the recursive application of a suitable one-dimensional discrete-time dynamical system, i.e., a chaotic map [7], [8].
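As a minimal sketch of how a watermarking trace can be produced by recursively applying a one-dimensional map to a seed, the Python fragment below iterates a skew tent map. This map is only a stand-in chosen for illustration (an assumption, not the map adopted in this paper), and the names `skew_tent_map`, `chaotic_sequence` and the breakpoint `a` are hypothetical.

```python
def skew_tent_map(x, a=0.37):
    """Stand-in piecewise-affine chaotic map on [0, 1].

    Assumption: used here only for illustration; it is NOT the
    (n,t)-tailed shift map of the paper.
    """
    return x / a if x < a else (1.0 - x) / (1.0 - a)


def chaotic_sequence(seed, length, chaotic_map=skew_tent_map):
    """Return `length` iterates of the map, starting from `seed`."""
    sequence = []
    x = seed
    for _ in range(length):
        sequence.append(x)
        x = chaotic_map(x)  # recursive application of the map
    return sequence


key = 0.123456789                            # hypothetical seed
trace = chaotic_sequence(key, 1000)
other = chaotic_sequence(key + 1e-9, 1000)   # slightly different seed
```

Because the map is expanding, two seeds that differ by a tiny amount give traces that separate after a few tens of iterations, which is what makes the seed usable as a secret parameter.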
Let us refer to a particular class of maps [9], called Piecewise Affine Markov Maps (PWAM), characterized by a map $M : X \to X$, where $X = [0,1]$, with $z_{k+1} = M(z_k)$, and assume that $n+1$ points $0 = a_0 < a_1 < \dots < a_{n-1} < a_n = 1$ exist, defining the intervals $X_j = [a_{j-1}, a_j]$, for $j = 1,\dots,n$, such that, for any couple of indices $j$ and $k$, either $X_k \subseteq M(X_j)$ or $X_k \cap M(X_j) = \emptyset$. A particular class of PWAM are the so-called $(n,t)$-tailed shift maps [10], where $n$ and $t$ are integers such that $n$ is even and $t < n/2$:

$$M(z) = \begin{cases} (n-t)\,z \pmod{\tfrac{n-t}{n}} + \tfrac{t}{n} & \text{if } 0 \le z < \tfrac{n-t}{n} \\ t\left(z - \tfrac{n-t}{n}\right) \pmod{\tfrac{t}{n}} & \text{otherwise} \end{cases}$$

The sequences that we have superimposed on the audio content in the frequency domain have been obtained by iterating the $(n,t)$-tailed shift maps, where the first iteration element $z_0$ is the seed of the sequence and represents the watermarking key. Before the casting, the generic chaotic sequence $z_k$ has been processed by means of the function $Q : X \to [-1,+1]$, where $Q(z_k) = 2 z_k - 1$, in order to obtain a sequence with null mean. With this, the final chaotic sequence is $\hat z_k = Q(z_k)$. Furthermore, we have considered just $NL/4$ different values of the final chaotic sequence, $\hat z_k$, with $k = 0,\dots,NL/4-1$, and put them into the even positions of the audio frequency block with the natural sign, and into the odd ones with the opposite sign, i.e., $y_k = (-1)^k \hat z_{\lfloor k/2 \rfloor}$, with $k = 0,\dots,NL/2-1$. This procedure yields lower cross-correlations, as verified by means of experimental tests. Let us underline that this mapping does not change the auto and cross correlation statistics of the watermarking sequence, considering that the same law is used in the detection unit.

In order to minimize the sound distortion, let us observe that the audio spectrum components which are less audible by human hearing can be more affected by the watermark cast than the more audible components. So, we have shaped the chaotic sequence in each sub-block to enhance the terms assigned to those spectrum components which are less audible by human hearing. In particular, we have considered the absolute threshold curve, $T(f)$, i.e., the minimum audibility threshold for the human auditory system as a function of the frequency, studied in [11] and reported in Figure 1. Then we have identified a simple first-degree polynomial function, based on $T(f)$, which yields the shaped quantized chaotic sequence in each block, $\tilde z_k$, with $k = 0,\dots,NL/2-1$. In this shaping rule, $S$ indicates the shaping strength, $f_M$ is the maximum frequency of the absolute threshold curve, and $T_M$ and $T_m$ represent the maximum and minimum values of such curve, respectively.

Figure 1: Absolute threshold curve, function of the frequency (kHz).

After the shaping rule application, the energy of the sequence $\tilde z_k$, $E_{\tilde z} = \sum_k \tilde z_k^2$, has been forced to a reference value, $E_z$, in order to enhance the reliability of the watermarking detection process, as discussed in the following. The final normalized embedding sequence results $\bar z_k = \tilde z_k \sqrt{E_z / E_{\tilde z}}$.

The $E_z$ value is chosen such that $\bar z_k \ll 1$, a condition guaranteeing a good detection, as reported in Section 4. An example of the evolution of the normalized shaped chaotic sequence is reported in Figure 2.

Figure 2: Example of watermarking sequence after shaping.

The casting of the watermark is made by using a spread spectrum technique, as in [12] and [13]. A possible implementation of this technique is obtained by slightly modifying the magnitude of each frequency tone in each frequency sub-block (in this case no modification is imposed on the tone phase). Let us call $R_k$ and $W_k$ the original FFT sample modulus and the watermarked sample modulus, respectively. Then: $W_k = R_k (1 + \bar z_k)$.

Note that the amplitude of the casting is modulated by means of the parameter $E_z$. Another possible casting rule has been investigated: $W_k = R_k + \bar z_k$, which is characterized by a simple additive process. Unfortunately, this rule suffers particularly from the problem of magnitude clipping, i.e., the cropping of possible negative amplitudes, due to the superimposition of negative watermark values. In particular, the negative clipping destroys both the audibility and the watermarking detection capability, so this technique has not been further investigated. Finally, let us observe that, since $R_k$ and $\bar z_k$ are independent and the sequence $\bar z_k$ has null mean, it follows that:

$$\sum_k R_k \bar z_k \approx 0 \qquad (1)$$

4 Watermark Detection

The classical watermarking detection system, independently of the domain in which the watermarking is applied, is based on a correlation scheme. Thus, the trivial way to implement a detection consists of performing a simple correlation between the watermarking sequence and the investigated audio stream. By implementing this strategy the correlation is:

$$c = \sum_k W_k \bar z_k = \sum_k R_k \bar z_k + \sum_k R_k \bar z_k^2 \approx \sum_k R_k \bar z_k^2$$

where the approximation follows the observation in (1) and where $k = 0,\dots,NL/2-1$ (i.e., the correlation is computed on each block).

Observe that $y_k$ and the sample $R_k$ are independent of each other and that this property also holds when the $\bar z_k$ sequence is considered, due to the independence of the absolute threshold curve from $R_k$. With this, the previous correlation can be approximated by a term proportional to $\sum_k \bar z_k^2 \sum_k R_k$. Even if the term $\sum_k \bar z_k^2$ is well known, the term $\sum_k R_k$ depends on the particular audio stream. So, it should be measured or approximately evaluated at the receiver from the watermarked stream, with some possible errors.
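The multiplicative casting rule of Section 3 and the plain correlation discussed above can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: a uniform random sequence stands in for the shaped chaotic one, Gaussian noise stands in for audio, the DC bin is left untouched so that exactly L/2 bins are marked, and the function names are hypothetical.

```python
import numpy as np


def embed_subblock(samples, z):
    """Scale the FFT magnitudes of one sub-block by (1 + z_k), keeping the phase."""
    spectrum = np.fft.rfft(samples)
    # Assumption: bins 1..L/2 carry the watermark, the DC bin is not modified.
    # Multiplying by the positive real factor (1 + z_k) changes only the modulus.
    spectrum[1:] *= 1.0 + z
    return np.fft.irfft(spectrum, n=len(samples))


def linear_correlation(samples, z):
    """Plain correlation sum_k W_k z_k between magnitudes and the sequence."""
    magnitudes = np.abs(np.fft.rfft(samples))[1:]
    return float(np.sum(magnitudes * z))


L = 1024                                     # hypothetical FFT block size
rng = np.random.default_rng(0)
z = 0.05 * rng.uniform(-1.0, 1.0, L // 2)    # stand-in sequence, |z_k| << 1
z -= z.mean()                                # null mean, as required by the scheme
audio = rng.standard_normal(L)               # stand-in for one audio sub-block

marked = embed_subblock(audio, z)
c_quiet = linear_correlation(marked, z)
c_loud = linear_correlation(embed_subblock(10.0 * audio, z), z)
```

With the same watermark, `c_loud` is ten times `c_quiet`: the plain correlation scales with the audio level, which is why the detector would need information about the magnitudes of the original stream.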
In practice, the optimal detection should be based on the knowledge of the original audio stream.

Another problem with this approach is related to the potentially high dynamic range of $R_k$: in the correlation computation the terms with higher energy are weighted more strongly than the others, so the statistical properties of the watermarking sequence are broken by the particular $R_k$ stream.

To reduce the two above-cited problems, we propose to use a logarithmic correlation index, as follows:

$$c = \frac{1}{E_z} \sum_k \bar z_k \ln W_k$$

By observing that we have chosen $E_z$ such that $\bar z_k \ll 1$ and recalling (1), it is possible to consider the approximation $\ln W_k = \ln[R_k(1 + \bar z_k)] = \ln R_k + \ln(1 + \bar z_k) \approx \ln R_k + \bar z_k$, which gives $c \approx 1$ if the watermarking is present, while $c$ vanishes otherwise. In this way we have removed both the need to know the original audio track and the problem of the potentially high dynamic range of $R_k$. Note that the variances of the auto and cross correlations decrease, independently of the audio stream considered, by increasing the watermarking sequence length. So, to enhance this effect we have averaged the correlation over a certain number of available blocks, $N_B$, each of dimension $NL/2$, obtaining $\bar c$.

Regarding the watermarking robustness to MPEG audio compression, let us observe that the tones having the greater SMR in a sub-block are allocated the largest number of bits for their representation. So these tones have the higher probability of not being corrupted by the MPEG compression. Thus, in order to improve the watermarking robustness, the detection may take into account only the sets of tones, in each sub-block, having the greater SMR. In particular, to identify the fraction of tones selected for the detection with respect to the total, a parameter $P$, representing the ratio between the selected tones and the number of available tones $L/2$, has been introduced. When $P = 1$ the psychoacoustic model is not considered. By considering $P < 1$ and by decreasing $P$, the robustness against MPEG compression increases, i.e., the mean value of the correlations does not change considerably with respect to the case without attack, even if, since the number of watermarking sequence samples is lower, the variance increases (note that this trend can be compensated by increasing the number of blocks $N_B$).

Furthermore, if we consider $P < 1$, an intrinsic robustness to bandpass filter manipulation is acquired, because the detection is performed only by considering components located at medium frequencies, which are the more audible, following the psychoacoustic model. Note that, in order to jointly treat the cases $P = 1$ (without tone selection) and $P < 1$ (with tone selection), the indexes of the sums present in the various mathematical expressions are not explicitly specified.

To take a decision regarding the presence of the watermarking, the correlation $\bar c$ must be compared with a threshold, in order to verify if the watermark is present ($\bar c \approx 1$) or not ($\bar c \approx 0$). Let us note that $\bar c$ should be null both in the case of absence of the watermark selected to prove the ownership and in the case in which a watermark is present but the detection is attempted with an incorrect watermark. The detection threshold should be in the interval $[0, 1]$, and its choice depends on how the auto and cross correlations are distributed around 1 and 0, respectively. In particular, this choice should minimize the probabilities of false alarm (i.e., in the audio track a wrong watermark is detected) and false rejection (i.e., in the audio track the selected watermark is not detected). In general, the pdfs of the auto and cross correlations show a Gaussian shape with mean values around 1 and 0, respectively, and variances depending on the statistical properties of the watermarking sequence and on the audio track characteristics.

Note that many types of manipulations, such as compression, filtering, resampling or requantization, shift the autocorrelation pdf and its expected value toward lower values and increase the relative variance. The cross-correlation pdf also shows a slight shift toward the left and a variance increase. In any case, the increase of the auto and cross correlation variances can be reduced by increasing the watermarking sequence length (in our case $N_B$).

By calling $\bar z'_k$ the watermarking sequence used for the detection, it follows that:

$$\bar c = \frac{1}{N_B} \sum_{i=1}^{N_B} c_i \qquad (2)$$

where $c_i$ is the correlation on the block $i$. If $\bar z'_k = \bar z_k$ we expect $\bar c \approx 1$; if $\bar z'_k \neq \bar z_k$, $\bar c \approx 0$.

The knowledge of the Gaussian shape of the auto and cross correlation pdfs, with their means and variances, permits selecting the optimal threshold value [3]. In particular, if we assume that the Gaussian pdfs have means $m_c$ and $m_a$ and variances $\sigma_c^2$ and $\sigma_a^2$, for cross and auto correlations, respectively, the false alarm and false rejection probabilities, $P_{fa}$ and $P_{fr}$, respectively, result:

$$P_{fa} = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{th - m_c}{\sqrt{2\sigma_c^2}}\right), \qquad P_{fr} = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{m_a - th}{\sqrt{2\sigma_a^2}}\right)$$

where $th$ is the decision threshold. The optimal threshold can be obtained by considering $P_{fa} = P_{fr}$, having specified means and variances.

Unfortunately, the means and variances of the auto and cross correlation pdfs change with the watermarking sequences, with the audio stream considered and with the particular digital manipulation applied to the audio file, while the threshold $th$ must be fixed a priori. So, $th$ is in general fixed by considering only one watermarking sequence and a given audio stream, without considering any attack [3]. To avoid dramatic effects when attacks are present or when a detection with different sequences is tried, $th$ should be selected by considering all possible watermarking sequences and all possible digital manipulations, independently of the audio sample. Note that experimental trials have shown that the attacks act in a relevant way especially on the autocorrelation, while the cross correlation is less affected; its mean remains close to 0 and its variance does not vary if the watermark sequence is applied to a long audio stream. So, due to the low variability of the cross-correlation pdf, a possible solution for the $th$ selection is to fix it closer to 0 than to 1 (to counteract the shift of the autocorrelation), having evaluated the average mean and variance of the cross-correlation pdf in different cases, and having fixed a given $P_{fa}$. In particular, we estimate the average mean and variance of the cross-correlation pdf using a given set of watermarking sequences (e.g., generated by a selected chaotic map with a given number of digits to specify the seed), considering an extended but limited set of audio samples, and considering a given set of attacks. On the basis of this average mean and variance (depending on the watermark length), we set the $P_{fa}$ and obtain the relative $th$. Having fixed $th$, we can proceed to verify a posteriori the actual values for $P_{fa}$ and $P_{fr}$.
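The detection chain described in this section can be sketched as below. The sketch rests on several assumptions that are not taken from the paper: the logarithmic index is assumed to be normalized by the sequence energy (so that it is close to 1 when the right sequence is present), uniform random sequences stand in for the shaped chaotic ones, Rayleigh-distributed values stand in for FFT magnitudes, and every numeric value and function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def log_correlation(W, z):
    """Logarithmic correlation of one block: sum_k z_k ln(W_k) / sum_k z_k^2."""
    return float(np.sum(z * np.log(W)) / np.sum(z * z))


def averaged_log_correlation(W_blocks, z):
    """Average of the block correlations over the available blocks."""
    return float(np.mean([log_correlation(W, z) for W in W_blocks]))


def threshold_for_false_alarm(mean_cross, var_cross, p_fa):
    """Threshold leaving probability p_fa above it for a Gaussian cross-correlation."""
    return NormalDist(mean_cross, var_cross ** 0.5).inv_cdf(1.0 - p_fa)


def false_rejection(mean_auto, var_auto, th):
    """Probability that a Gaussian autocorrelation falls below the threshold."""
    return NormalDist(mean_auto, var_auto ** 0.5).cdf(th)


rng = np.random.default_rng(1)
K, n_blocks = 2048, 64                        # hypothetical block size and block count
z = 0.1 * rng.uniform(-1.0, 1.0, K)
z -= z.mean()                                 # embedded sequence (stand-in), null mean
z_wrong = 0.1 * rng.uniform(-1.0, 1.0, K)
z_wrong -= z_wrong.mean()                     # a different, incorrect sequence

R = rng.rayleigh(1.0, (n_blocks, K)) + 1e-3   # stand-in magnitudes, strictly positive
W = R * (1.0 + z)                             # multiplicative casting on every block

c_auto = averaged_log_correlation(W, z)       # close to 1: right sequence
c_cross = averaged_log_correlation(W, z_wrong)  # close to 0: wrong sequence

# Hypothetical Gaussian parameters, only to show how th follows from a target P_fa.
th = threshold_for_false_alarm(mean_cross=0.0, var_cross=0.01, p_fa=1e-3)
p_fr = false_rejection(mean_auto=0.8, var_auto=0.01, th=th)
```

The original magnitudes never enter the detector, and the threshold is derived from the cross-correlation statistics alone, in line with the selection strategy described above.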
psychoacoustic model is used in the detection, the experimental pdfs have mean close to 1, while with $P = 1$ they are shifted toward the left. In both cases the variances can be kept low by considering a sufficiently high number of blocks $N_B$. Consequently, the false rejection probability is lower if the psychoacoustic model is used than in the case without it.

To prove the robustness against the resampling of the audio track, we have considered an original audio signal sampled at a rate of 44.1 kHz. Then, we have applied the watermarking sequence. Furthermore, the sampling rate of the watermarked audio stream has been reduced to 22 kHz and, finally, the track has been resampled at the original rate of 44.1 kHz. Although the above processing causes noticeable distortions, the watermark remains detectable, as reported in Figure 6, where the autocorrelation is shown. In this case the average value, with $N_B = 1$, is about 0.76 and the variance 0.14. With $th = 0.36$ and block numbers $N_B = 32, 64, 108$, we have $P_{fa} = 0.60 \cdot 10^{\ldots}$, $1.00 \cdot 10^{\ldots}$ and $4.00 \cdot 10^{\ldots}$, respectively, and $P_{fr} = 0.77 \cdot 10^{\ldots}$ for $N_B = 32$ and $P_{fr}$ less than $10^{\ldots}$ for $N_B = 64, 108$.

Figure 6: Experimental autocorrelation pdf in the case of resampling attack and Gaussian distribution.

Regarding the robustness against requantization, the watermarked audio samples have been requantized at 8 bits and then back at 16 bits. Although the above processing causes noticeable distortions, the watermark remains detectable, as reported in Figure 7, where the autocorrelation is shown for $N_B = 1$. In this case the average value is about 0.78 and the variance 0.064. With $th = 0.36$ and block numbers $N_B = 32, 64, 108$, we have the same $P_{fa}$ as in the case with resampling and $P_{fr}$ less than ...

Figure 7: Experimental autocorrelation pdf in the case of requantization attack and Gaussian distribution.

To test the robustness against filtering, the watermarked signal has been processed with a bandpass 2nd order filter. The cut-off frequencies are 200 Hz and 10 kHz. This filtering causes audible distortions on the original signals. Also in this case the watermark remained detectable, as reported in Figure 8, for $N_B = 1$. In this case the average value is about 0.7 and the variance 0.15. With $th = 0.36$ and block numbers $N_B = 32, 64, 108$, we have the same $P_{fa}$ as in the case with resampling and $P_{fr} = 0.66 \cdot 10^{\ldots}$, $P_{fr} = 4.18 \cdot 10^{\ldots}$ and $P_{fr} < 10^{\ldots}$, respectively.

Figure 8: Experimental autocorrelation pdf in the case of filtering attack and Gaussian distribution.

As described in Section 3, the same watermarking sequence is superimposed on blocks of the audio stream. We have selected blocks of length about 0.1 s. In case of cropping, correct detection is possible by using a synchronism search procedure that finds the starting point of the watermarking sequence inside the audio track. This synchronism research pro

6 Conclusions

A new watermarking scheme applied to digital audio streams, with the scope of copyright protection, has been proposed and tested. The embedding of the digital signature into the audio file has been performed in the frequency domain by integrating a shaping function derived from the MPEG psychoacoustic model I, whose effect is to enhance the reliability of the watermarking detection without degrading the audio quality. The watermarking signals have been generated by considering a class of PWAM chaotic maps, guaranteeing a high number of distinguishable watermarks.

The proposed watermarking technique, thanks to the psychoacoustic model I, shows high robustness against MP3 compression. Furthermore, this scheme guarantees high reliability in case of digital manipulations like resampling, requantization, bandpass filtering and cropping. Finally, this technique supports multiple watermarking and does not require the original signal during detection.

References

[1] Bender W., Gruhl D., Morimoto N., Lu A., "Techniques for data hiding", IBM Systems Journal, Vol. 35, 1996.
[2] Bassia P., Pitas I., "Robust audio watermarking in the time domain", in Proc. of EUSIPCO'98, Rhodes, Sept. 1998.
[3] Boney L., Tewfik A. H., Hamdy K. N., "Digital Watermarks for Audio Signals", in Proc. of EUSIPCO'96, Trieste, Italy, Sept. 1996.
[7] A. Lasota, M. C. Mackey, "Chaos, Fractals, and Noise", Springer-Verlag, 2nd Edition, 1994.
[8] S. Tsekeridou, V. Solachidis, N. Nikolaidis, I. Pitas, "Statistical Analysis of a Watermarking System Based on Bernoulli Chaotic Sequences", Signal Processing 81, Elsevier, Special Issue on Information Theoretic Issues in Digital Watermarking, (6) (2001), pp. 1273-1293.
[9] ... "... chaotic watermarking ...", ...
[10] M. P. Kennedy, R. Rovatti, G. Setti, "Chaotic Electronics in Telecommunications", CRC Press, 2000.
[11] AA.VV., "Psychoacoustics", http://...
[12] Cox I., Kilian J., Leighton T., Shamoon T., "A Secure, Robust Watermark for Multimedia", Workshop on Information Hiding, Newton Institute, Univ. of Cambridge, May 1996.
[13] Cox I., Kilian J., Leighton T., Shamoon T., "Secure Spread Spectrum Watermarking for Multimedia", IEEE Trans. on Image Processing, 6, 12, 1673-1687, 1997.