AND WAVELET DECOMPOSITION
M. Deriche
S. Boland
ABSTRACT
Most current work in the area of high quality audio coding falls under one of two categories: transform or sub-band
coding. LPC coders, being based on models of the human voice
production system, are found to be inappropriate for modelling music and other non-speech sounds. A better
model for such signals is shown to be the Multipulse LPC
model.
In this paper we propose to improve the quality of
the Multipulse model by first passing the signal of interest
through a filter bank and then extracting the Multipulse
parameters from each of the bandpass filter outputs. The
idea of the wavelet decomposition is utilised for the design of
the filter bank. Both the Multipulse model and the wavelet
decomposition are well known, but the combination of the two
has not yet been exploited. This combination is expected to
lead to a new approach to high quality, low bit rate audio coding.
1. INTRODUCTION
2. WAVELET TRANSFORM
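In [8], Daubechies showed that, in the space of square integrable functions, a signal $f(t)$ can be represented by translates and dilations of a single wavelet $W(t)$ as

$$f(t) = \sum_{j=J}^{\infty} \sum_{k=-\infty}^{\infty} b_{j,k}\, W(2^j t - k) + \sum_{k=-\infty}^{\infty} a_k\, g(2^J t - k) \tag{1}$$

where $b_{j,k}$ and $a_k$ are the expansion coefficients and $g(t)$ is the scaling function.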
The wavelet is constructed from the scaling function $g(t)$ as

$$W(t) = \sum_{k=0}^{K-1} (-1)^k c_{1-k}\, g(2t - k) \tag{2}$$
where the $c_k$ are the coefficients that define the scaling function, $g(t)$, which obeys the dilation equation given by:

$$g(t) = \sum_{k=0}^{K-1} c_k\, g(2t - k) \tag{3}$$
4.
To construct the wavelet $W(t)$, the coefficients $c_k$ must satisfy certain conditions. In most cases, attention is restricted
to wavelets with compact support, i.e. $c_k$ is nonzero only for
$0 \le k \le K-1$.
Mallat [9] has shown that the discrete wavelet transform
can be implemented by a recursive algorithm. This is done
by using the coefficients $c_k$ as the low pass filter coefficients
of a pair of quadrature mirror filters. The output of the low
pass filter, G, is the approximation of the input signal for
that level, while the output of the high pass filter, H, is the
detail for that level. The impulse response of the low pass
filter, $\{c_k\}$, and that of the high pass filter, $\{d_k\}$, are related
through the equation
$$d_k = (-1)^k c_{1-k} \tag{4}$$
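The relation between the two impulse responses, and one level of the recursive decomposition, can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not say which wavelet was used, so the four-tap Daubechies coefficients (normalised so that they sum to 2) are a stand-in; the index `1 - k` is shifted by the even amount `K - 2` so that the high pass filter occupies the same support `0..K-1` as the low pass filter; the frame is extended periodically; and the `1/sqrt(2)` scaling is chosen so that the step preserves energy. The function names are hypothetical.

```python
import numpy as np

def daubechies4():
    """Four-tap Daubechies low pass coefficients c_k, normalised to sum to 2 (assumed wavelet)."""
    s3 = np.sqrt(3.0)
    return np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 4.0

def highpass_from_lowpass(c):
    """d_k = (-1)^k c_{1-k}, with the index shifted by K-2 (even) to stay on 0..K-1."""
    K = len(c)
    k = np.arange(K)
    return ((-1.0) ** k) * c[K - 1 - k]

def analysis_level(x, c):
    """One level of the recursive decomposition with periodic extension.

    Returns (approximation, detail), each half the length of x.
    """
    x = np.asarray(x, dtype=float)
    d = highpass_from_lowpass(c)
    N, K = len(x), len(c)
    # Row n gathers the samples x[2n], x[2n+1], ..., x[2n+K-1] (indices wrap around).
    idx = (2 * np.arange(N // 2)[:, None] + np.arange(K)[None, :]) % N
    taps = x[idx]
    approx = taps @ c / np.sqrt(2.0)   # low pass output, decimated by 2
    detail = taps @ d / np.sqrt(2.0)   # high pass output, decimated by 2
    return approx, detail
```

Applying `analysis_level` again to the approximation output gives the next, coarser level, which is the recursion described above.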
Figure 2. Block diagram of the Multipulse LPC coder for subband i
Multipulse coders assume that the input signal can be modelled by an all-pole filter driven by a train of
pulses of different amplitudes that are not necessarily equidistant from each other. The excitation (pulse amplitudes and positions) is computed using an analysis-by-synthesis procedure such as the one described in [7] (see Figure 2).
The error signal between the original signal and the output of the all-pole filter excited by the train of pulses is sent
back to the excitation generator. The excitation positions
and amplitudes are evaluated through the minimisation of
the energy of the error signal. Note that since the Multipulse
LPC procedure is preceded by a frequency decomposition,
the perceptual weighting of the error signal is not crucial.
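A minimal sketch of such an analysis-by-synthesis loop is given below. It is not claimed to be the procedure of [7]: it assumes a simple greedy search that places one pulse at a time, choosing the position and amplitude that most reduce the error energy, and it omits perceptual weighting in line with the remark above. The all-pole coefficients and all function names are hypothetical.

```python
import numpy as np

def impulse_response(a, n):
    """First n samples of the impulse response of 1/A(z), A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    h = np.zeros(n)
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for j, aj in enumerate(a, start=1):
            if i - j >= 0:
                acc -= aj * h[i - j]
        h[i] = acc
    return h

def multipulse_search(target, h, n_pulses):
    """Greedy pulse search: each pulse minimises the energy of the remaining error."""
    residual = np.asarray(target, dtype=float).copy()
    n = len(residual)
    # Column m is the filter response to a unit pulse at position m, truncated to the frame.
    H = np.zeros((n, n))
    for m in range(n):
        H[m:, m] = h[: n - m]
    energy = np.sum(H ** 2, axis=0)
    positions, amplitudes = [], []
    for _ in range(n_pulses):
        corr = H.T @ residual                 # correlation of the error with each response
        m = int(np.argmax(corr ** 2 / energy))
        amp = corr[m] / energy[m]             # optimal amplitude for that position
        residual -= amp * H[:, m]             # error fed back to the next iteration
        positions.append(m)
        amplitudes.append(amp)
    return positions, amplitudes, residual
```

In this sketch the same position may be selected more than once; a practical coder would typically merge such pulses or re-optimise all amplitudes jointly after the positions are fixed.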
The advantage of decomposing the audio signal into subbands is that a different number of pulses and a different parameter
set can be used for modelling each of the subband signals.
This is in contrast to the coder in [4], where the same number of pulses and the same LPC order are used for the entire audio
spectrum. Also since the filter bank resembles the discrete
wavelet transform, the advantages of the wavelet decompo-
5.
The quality of the reconstructed audio signal and the overall bit rate required are dependent upon several parameters.
A 768 point frame was found to be the best choice as input to the filter bank, after investigating
frame sizes of 560, 768 and 960 points. Each frame is multiplied by a Hamming window and decomposed into subbands,
and the window is then advanced by 256 points. Decimation was used
in the wavelet decomposition to give 192, 192 and 384 point
subband signals in the 0-5.5 kHz, 5.5-11 kHz and 11-22 kHz
subbands respectively. The excitation was computed for the
middle 64 points in the 0-5.5 kHz and 5.5-11 kHz subbands,
and the middle 128 points in the 11-22 kHz subband.
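The frame bookkeeping described above can be checked with a short Python sketch. It reproduces only the sizes: a two-level split with decimation by 2 turns a 768 point frame into 192, 192 and 384 point subbands, and the 256 point advance corresponds to 64, 64 and 128 subband samples. The Haar filter pair used in `split_band` is a placeholder for the wavelet filters, which the text does not specify, and all names are hypothetical.

```python
import numpy as np

FRAME = 768   # frame length in samples
HOP = 256     # frame advance in samples

def split_band(x):
    """Two-channel split with decimation by 2 (Haar pair used as a placeholder)."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return low, high

def decompose_frame(frame):
    """Window one frame and split it into the three subbands named in the text."""
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    low, high = split_band(windowed)        # 0-11 kHz and 11-22 kHz, 384 points each
    low_low, low_high = split_band(low)     # 0-5.5 kHz and 5.5-11 kHz, 192 points each
    return low_low, low_high, high

def middle(x, n):
    """Central n samples of a subband signal, where the excitation is computed."""
    start = (len(x) - n) // 2
    return x[start:start + n]

def frames(signal):
    """Overlapping frames of FRAME samples, advanced by HOP samples each time."""
    for start in range(0, len(signal) - FRAME + 1, HOP):
        yield signal[start:start + FRAME]
```

Because the two lower subbands are decimated by 4 and the upper one by 2, the middle 64, 64 and 128 points together cover exactly the 256 input samples by which the frame advances.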
The quality of the reconstructed audio signal was found to
largely depend upon the excitation pulse density in the 0-5.5
kHz and 5.5-11 kHz subbands. For high quality audio, 10-15
pulses were needed in each of these subbands. When more than
15 pulses were used, only a very small difference in both
the subjective quality and the average segmental SNR was
recorded. Due to auditory masking, the excitation pulse
density of the 11-22 kHz subband had very little bearing
on the quality. Consequently, 10 pulses or fewer were typically
used in the 11-22 kHz subband.
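For reference, an average segmental SNR of the kind mentioned above is commonly computed as the mean of per-segment SNR values in dB. The sketch below assumes that common definition; the segment length of 256 samples and the function name are assumptions, since the text does not state how the measure was computed.

```python
import numpy as np

def segmental_snr(original, decoded, frame_len=256):
    """Mean over segments of 10*log10(signal energy / error energy), in dB."""
    original = np.asarray(original, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    snrs = []
    for i in range(len(original) // frame_len):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        signal_energy = np.sum(original[seg] ** 2)
        error_energy = np.sum((original[seg] - decoded[seg]) ** 2)
        # Skip silent or perfectly reconstructed segments to avoid log(0).
        if signal_energy > 0 and error_energy > 0:
            snrs.append(10.0 * np.log10(signal_energy / error_energy))
    return float(np.mean(snrs)) if snrs else float("nan")
```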
CONCLUSION
The proposed audio coder takes advantage of the constant
Q properties of the wavelet transform by filtering the audio
signal into subbands. Combined with Multipulse LPC, this
is an original design for coding high quality digital audio
signals. The algorithm is easy to implement, and the
proposed audio coder is currently being tested intensively on a
variety of audio signals. Complete results will be presented
at the conference.
7.
8. REFERENCES
[1] K. Brandenburg and G. Stoll, "The ISO/MPEG-Audio Codec: A generic standard for coding of high quality
digital audio," presented at the 92nd AES Convention, Vienna, Austria, March 1992.
DISCUSSION
high quality, the bit rates of the proposed coder were found
to be in the range of 80-90 kb/s. To obtain lower bit rates,
further areas in the coder design are currently being considered.
ACKNOWLEDGEMENT
This work was partially supported by a grant from the Australian Telecommunications and Electronics Research Board
(ATERB).