Speech Processing Research Paper 24

HIGH QUALITY AUDIO CODING USING MULTIPULSE LPC
AND WAVELET DECOMPOSITION
M. Deriche
S. Boland
Signal Processing Research Centre

Queensland University of Technology
GPO Box 2434 Brisbane Q 4001, AUSTRALIA
ABSTRACT
Most current work in the area of high quality audio coding falls under
one of two categories: transform or sub-band coding. LPC coders, since
they are based on modelling the human voice production system, are found
to be inappropriate for modelling music and other non-speech sounds. A
better model for such signals is shown to be the Multipulse LPC model.
In this paper we propose to improve the quality of the Multipulse model
by first passing the signal of interest through a filter bank and then
extracting the Multipulse parameters from each of the bandpass filter
outputs. The idea of the wavelet decomposition is utilised for the design
of the filter bank. Both the Multipulse model and the wavelet
decomposition are well known, but a combination of both has not been
exploited yet. This combination is expected to lead to a new way in high
quality low bit rate audio coding.
1. INTRODUCTION
Without compression, the bit rate needed for transmission or storage of
compact disk (CD) quality audio is 705 kb/s per channel (16 bit PCM and
44.1 kHz sampling frequency). Emerging applications such as digital audio
broadcasting, multimedia/hypermedia and satellite TV require high quality
audio at low bit rates. Several approaches have been investigated for
transparent coding of CD quality digital audio at low bit rates. Most
previous methods use either subband coding [1] or transform coding [2].
It is known that the MPEG Layer III audio coder is capable of transparent
or near-transparent coding of monophonic CD audio signals at 64 kb/s. For
the above applications, however, further reductions in the total bit rate
are desirable. The aim of this research is to achieve lower bit rates
without loss of quality in the reconstructed audio signals.
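As a quick check of the figures quoted above, the uncompressed rate follows directly from the sample format. The sketch below assumes a single (monophonic) channel and 1 kb = 1000 bits; the function name `pcm_bit_rate_kbps` is illustrative only.

```python
def pcm_bit_rate_kbps(bits_per_sample: int, sample_rate_hz: float, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kb/s (1 kb = 1000 bits)."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0


# CD format for one channel: 16 bit samples at 44.1 kHz.
cd_mono = pcm_bit_rate_kbps(16, 44_100)

# Compression ratio needed to reach the 64 kb/s quoted for MPEG Layer III.
ratio_at_64 = cd_mono / 64.0

print(f"{cd_mono:.1f} kb/s uncompressed, about {ratio_at_64:.1f}:1 at 64 kb/s")
```

The same arithmetic gives a ratio of roughly 8:1 for a coder running at 80-90 kb/s.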
Recently, the wavelet transform has been investigated for high quality
audio coding [3]. Unlike the Fourier transform, the wavelet transform is
characterised by a constant relative bandwidth, or constant Q. This is
attractive for coding audio signals since audiological research has shown
that the human ear behaves in a similar way. The ear integrates sounds
over ranges of frequencies called critical bands. Below 500 Hz, the
critical bands are approximately 100 Hz wide, while above 500 Hz they are
approximately a third of an octave wide. Bit rates of 48-66 kb/s are
claimed for the wavelet transform based coder proposed in [3]. While this
coder is capable of achieving lower bit rates than previous methods, its
drawback is an optimisation procedure that is computationally very
expensive and not guaranteed to converge to the optimum solution. Another
drawback is the large search of the dynamic dictionary.
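The critical-band rule of thumb quoted above can be written down directly, which makes the constant-Q behaviour above 500 Hz visible. This is only a sketch of that approximation, not a psychoacoustic model; the function names are illustrative, and the hard switch at 500 Hz is a simplification of the text's "approximately".

```python
def critical_bandwidth_hz(centre_hz: float) -> float:
    """Approximate critical bandwidth: 100 Hz below 500 Hz, a third octave above."""
    if centre_hz < 500.0:
        return 100.0
    # A third-octave band centred on f spans f * 2**(-1/6) to f * 2**(1/6).
    return centre_hz * (2.0 ** (1.0 / 6.0) - 2.0 ** (-1.0 / 6.0))


def relative_bandwidth(centre_hz: float) -> float:
    """Bandwidth divided by centre frequency (the reciprocal of Q)."""
    return critical_bandwidth_hz(centre_hz) / centre_hz


for f in (250.0, 1000.0, 4000.0, 16000.0):
    print(f, round(critical_bandwidth_hz(f), 1), round(relative_bandwidth(f), 3))
```

Above 500 Hz the relative bandwidth stays fixed at about 0.23, which is the property a constant-Q filter bank shares with the ear.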
Linear Predictive Coding (LPC) based coders are generally considered
unsuitable for coding music and other non-speech sounds since they rely
on human vocal tract models. For speech signals, LPC coders use a
time-varying linear predictor to approximate the original signal. For
voiced speech, the linear predictor is driven with a train of impulses
separated by the pitch period, while for unvoiced speech, the excitation
is white noise. Multipulse LPC coders, however, use an
analysis-by-synthesis technique where the pulse amplitudes and pulse
positions of the excitation signal are computed for each frame.
Originally developed for speech coding, Multipulse LPC has been shown in
[4] to be equally applicable to coding CD-quality audio signals down to
128 kb/s. A hybrid scheme consisting of a modified Multipulse LPC
algorithm and a subband decomposition was also proposed in [5].
In this paper we propose a hybrid audio coder using a combination of
Multipulse LPC and a wavelet-like decomposition of the audio signal. This
combination has not been exploited previously and is expected to achieve
low bit rates while preserving high quality.
2. WAVELET TRANSFORM
In [8], Daubechies showed that in the space of square integrable
functions, a signal $f(t)$ can be represented by translates and dilations
of a single wavelet $W(t)$ as

$$f(t) = \sum_{j=J}^{\infty} \sum_{k=-\infty}^{\infty} b(j,k)\, 2^{j/2}\, W(2^j t - k) + \sum_{k=-\infty}^{\infty} a(J,k)\, 2^{J/2}\, g(2^J t - k) \tag{1}$$

where $b(j,k) = \int f(t)\, 2^{j/2}\, W(2^j t - k)\, dt$ and
$a(j,k) = \int f(t)\, 2^{j/2}\, g(2^j t - k)\, dt$. This expansion
provides a multiresolution decomposition of the signal $f(t)$. The
coefficients $b(j,k)$ represent details of the original signal at
different levels of resolution $j$, and the coefficients $a(J,k)$
represent an approximation of the original signal $f(t)$ at resolution
$J$.

0-7803-2431-5/95 $4.00 © 1995 IEEE

The wavelet $W(t)$ is obtained from a scaling function $g(t)$ as

$$W(t) = \sum_{k=0}^{K-1} (-1)^k c_{1-k}\, g(2t - k) \tag{2}$$

where the $c_k$ are the coefficients that define the scaling function
$g(t)$, which obeys the dilation equation given by:

$$g(t) = \sum_{k=0}^{K-1} c_k\, g(2t - k) \tag{3}$$

To construct the wavelet $W(t)$, the coefficients $c_k$ must satisfy
certain conditions. In most cases, attention is restricted to wavelets
with compact support, i.e. $c_k$ is nonzero only for $0 \le k \le K-1$.

Mallat [9] has shown that the discrete wavelet transform can be
implemented by a recursive algorithm. This is done by using the
coefficients $c_k$ as the low pass filter coefficients of a pair of
quadrature mirror filters. The output of the low pass filter, G, is the
approximation of the input signal for that level, while the output of the
high pass filter, H, is the detail for that level. The impulse response
of the low pass filter $\{c_k\}$ and that of the high pass filter
$\{d_k\}$ are related through the equation

$$d_k = (-1)^k c_{1-k} \tag{4}$$

In contrast to the short-time Fourier transform (STFT), where the
bandwidth of the bandpass filters used for the decomposition is constant,
the bandwidth of the WT bandpass filters is proportional to the central
frequency or, equivalently, the filter's quality factor is independent of
the frequency (see Figure 1).

Figure 1. Constant Q Behaviour of the Wavelet Transform

3. MULTIPULSE LPC CODERS
The Multipulse coders assume that the given input of interest can be
modelled by an all-pole filter driven by a train of pulses of different
amplitudes and not necessarily equidistant from each other. The
excitations (amplitudes and positions) are computed using an
analysis-by-synthesis procedure such as the one described in [7] (see
Figure 2).

Figure 2. Block Diagram of Multipulse LPC Coder for subband i

4. HYBRID AUDIO CODER
In subband coding, the frequency ranges of the decomposed subbands cannot
be made the same as the critical bands. However, the discrete wavelet
transform (DWT) and, in particular, the wavelet packet transform (WPT)
can be used to obtain a subband decomposition very close to the critical
band divisions [3]. The advantage here is that the perceptibility of the
quantisation noise can be controlled more accurately. The constant Q
property of the wavelet transform is exploited in the proposed coder by
first decomposing the audio signal into non-equal subbands. The full
wavelet decomposition is not performed. Instead, the audio signal, which
is sampled at 44.1 kHz with 16 bits/sample PCM, is decomposed into the
three subbands listed in Table 1. This is similar to the speech
compression technique proposed in [6].

Table 1. Chosen Subbands

Band    Frequency Range (Hz)
1       0-5513
2       5513-11025
3       11025-22050

After filtering the audio signal into subbands, each of the subband
signals, i, is modelled using the modified Multipulse LPC coder
illustrated in Figure 2. In the Multipulse coder, the signal is modelled
by a linear filter driven by an excitation consisting of a sequence of
pulses. The pulse amplitudes and locations for each excitation signal,
$u_i$, are computed using a modified version of the algorithm proposed in
[7]. The algorithm is based on the minimisation of the mean square error
between the original subband signal, $s_i$, and the synthetic signal,
$\hat{s}_i$. For each subband, the energy, the LPC coefficients, the
pulse amplitudes and the pulse locations are determined, quantised, then
sent to the receiver.

The error signal between the original signal and the output of the
all-pole filter excited by the train of pulses is sent back to the
excitation generator. The excitation positions and amplitudes are
evaluated through the minimisation of the energy of the error signal.
Note that since the Multipulse LPC procedure is preceded by a frequency
decomposition, the perceptual weighting of the error signal is not
crucial.

The advantage of decomposing the audio signal into subbands is that a
different number of pulses and a different parameter set can be used for
modelling each of the subband signals. This is opposed to the coder in
[4], where the same number of pulses and LPC order are used for the
entire audio spectrum. Also, since the filter bank resembles the discrete
wavelet transform, the advantages of the wavelet decomposition for
processing audio signals are obtained.
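The error-minimising pulse search described in this section can be sketched as follows. This is the basic sequential (greedy) multipulse search, placing one pulse at a time where it removes the most error energy. It is not the modified algorithm of [7]: there is no amplitude re-optimisation and no perceptual weighting. The filter convention 1 / (1 - sum_k a_k z^-k), the function names and the first-order example predictor are all assumptions made for illustration.

```python
from typing import List, Tuple


def impulse_response(lpc: List[float], length: int) -> List[float]:
    """Impulse response of the all-pole filter 1 / (1 - sum_k lpc[k-1] z^-k)."""
    h = [0.0] * length
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * h[n - k]
        h[n] = acc
    return h


def multipulse_search(
    target: List[float], lpc: List[float], n_pulses: int
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Place pulses one at a time, each minimising the remaining error energy."""
    n = len(target)
    h = impulse_response(lpc, n)
    residual = list(target)
    pulses: List[Tuple[int, float]] = []
    for _ in range(n_pulses):
        best_pos, best_amp, best_gain = 0, 0.0, -1.0
        for m in range(n):
            # Correlation of the current error with the filter response at m.
            corr = sum(residual[i] * h[i - m] for i in range(m, n))
            energy = sum(h[i] * h[i] for i in range(n - m))
            gain = corr * corr / energy  # error energy removed by a pulse at m
            if gain > best_gain:
                best_pos, best_amp, best_gain = m, corr / energy, gain
        pulses.append((best_pos, best_amp))
        # Subtract the contribution of the new pulse from the error signal.
        for i in range(best_pos, n):
            residual[i] -= best_amp * h[i - best_pos]
    return pulses, residual


lpc = [0.9]  # hypothetical first-order predictor
h = impulse_response(lpc, 32)
# Target made from a single pulse of amplitude 2 at sample 5.
target = [0.0] * 5 + [2.0 * v for v in h[:27]]
pulses, residual = multipulse_search(target, lpc, n_pulses=1)
print(pulses)
```

In the hybrid coder this search would run once per subband, with its own pulse count and predictor, which is the flexibility the last paragraph refers to.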
5. SELECTION OF AUDIO CODER PARAMETERS
The quality of the reconstructed audio signal and the overall bit rate
required are dependent upon several parameters. A 768 point frame was
found to be the best choice as input to the filter bank. This was chosen
after investigating frame sizes of 560, 768 and 960 points. Each frame is
multiplied by a Hamming window and decomposed into subbands, and the
frame is then advanced by 256 points. Decimation was used in the wavelet
decomposition to give 192, 192 and 384 point subband signals in the
0-5.5 kHz, 5.5-11 kHz and 11-22 kHz subbands respectively. The excitation
was computed for the middle 64 points in the 0-5.5 kHz and 5.5-11 kHz
subbands, and the middle 128 points in the 11-22 kHz subband, which
matches the 256 point frame advance after decimation by 4 and 2
respectively.
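The frame sizes above follow from two Mallat analysis steps, the second applied only to the low band. The sketch below reproduces the 192/192/384 split for one windowed frame. Several things here are assumptions: the paper does not say which wavelet was used, so a Daubechies 4-tap filter is used as a stand-in; the circular signal extension and all function names are illustrative choices.

```python
import math
from typing import List, Tuple

SQRT3 = math.sqrt(3.0)
# Daubechies 4-tap scaling coefficients c_k (they sum to 2). Assumed choice.
C = [(1 + SQRT3) / 4, (3 + SQRT3) / 4, (3 - SQRT3) / 4, (1 - SQRT3) / 4]
LO = [c / math.sqrt(2.0) for c in C]  # scaled so the filter pair is orthonormal
# Causal version of d_k = (-1)^k c_{1-k}, delayed by K-2 samples.
HI = [(-1) ** k * LO[len(LO) - 1 - k] for k in range(len(LO))]


def analysis_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One Mallat step with circular extension: filter, keep every second sample."""
    n = len(x)
    approx = [sum(LO[k] * x[(2 * i + k) % n] for k in range(len(LO))) for i in range(n // 2)]
    detail = [sum(HI[k] * x[(2 * i + k) % n] for k in range(len(HI))) for i in range(n // 2)]
    return approx, detail


def three_band_split(frame: List[float]) -> Tuple[List[float], List[float], List[float]]:
    low, band_11_22 = analysis_step(frame)    # 0-11 kHz and 11-22 kHz
    band_0_5, band_5_11 = analysis_step(low)  # 0-5.5 kHz and 5.5-11 kHz
    return band_0_5, band_5_11, band_11_22


def middle(x: List[float], count: int) -> List[float]:
    """Central `count` samples of a subband signal."""
    start = (len(x) - count) // 2
    return x[start:start + count]


FRAME, HOP = 768, 256
window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (FRAME - 1)) for n in range(FRAME)]
signal = [math.sin(2 * math.pi * 1000.0 * n / 44100.0) for n in range(FRAME + HOP)]
starts = list(range(0, len(signal) - FRAME + 1, HOP))  # frame positions: 0, 256

frame = [w * s for w, s in zip(window, signal[starts[0]:starts[0] + FRAME])]
b1, b2, b3 = three_band_split(frame)
print(len(b1), len(b2), len(b3))  # 192 192 384
print(len(middle(b1, 64)), len(middle(b2, 64)), len(middle(b3, 128)))  # 64 64 128
```

Because the assumed filter pair is orthonormal, the three subbands together carry the same energy as the windowed frame, and the 1 kHz test tone ends up almost entirely in the lowest band.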
The quality of the reconstructed audio signal was found to depend largely
upon the excitation pulse density in the 0-5.5 kHz and 5.5-11 kHz
subbands. For high quality audio, 10-15 pulses were needed in each of
these subbands. When more than 15 pulses were used, only a very small
difference in both the subjective quality and the average segmental SNR
was recorded. Due to auditory masking, the excitation pulse density of
the 11-22 kHz subband had a very small bearing on the quality.
Consequently, 10 pulses or fewer were typically used in the 11-22 kHz
subband.
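The average segmental SNR mentioned above is the mean of per-segment SNR values in dB, so that quiet passages count as much as loud ones. A minimal sketch follows; the 256-sample segment length and the rule of skipping segments with no signal or no error are assumptions, since the paper does not give its exact definition.

```python
import math
from typing import Sequence


def segmental_snr_db(original: Sequence[float], decoded: Sequence[float], segment: int = 256) -> float:
    """Mean of per-segment SNRs in dB; degenerate segments are skipped."""
    usable = min(len(original), len(decoded))
    snrs = []
    for start in range(0, usable - segment + 1, segment):
        sig = sum(original[i] ** 2 for i in range(start, start + segment))
        err = sum((original[i] - decoded[i]) ** 2 for i in range(start, start + segment))
        if sig > 0.0 and err > 0.0:
            snrs.append(10.0 * math.log10(sig / err))
    if not snrs:
        raise ValueError("no usable segments")
    return sum(snrs) / len(snrs)


# A decoded signal with a uniform 10 % amplitude error gives 20 dB in every segment.
original = [math.sin(2 * math.pi * 440.0 * n / 44100.0) for n in range(1024)]
decoded = [0.9 * v for v in original]
print(round(segmental_snr_db(original, decoded), 2))
```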
At this stage more work has to be performed to determine the optimal
number of subbands and the frequency divisions of the subbands. In our
initial experiments the three-band decomposition was found to give better
results than the two- and the four-band decompositions. Other
decompositions are currently being investigated with respect to audio
quality for a given bit rate. However, it was found that as the number of
subbands increased above five, the quality began to decrease
significantly.
7. CONCLUSION
The proposed audio coder takes advantage of the constant Q properties of
the wavelet transform by filtering the audio signal into subbands.
Combined with Multipulse LPC, this is an original design for coding high
quality digital audio signals. The algorithm is easy to implement, and
the proposed audio coder is currently being intensively tested on a
variety of audio signals. Complete results will be presented at the
conference.
8. REFERENCES
[1] K. Brandenburg and G. Stoll, The ISO/MPEG-Audio Codec: A generic
standard for coding of high quality digital audio, presented at the 92nd
AES Convention, Vienna, Austria, March 1992.
Experiments were also performed with 10th, 16th and 24th order LPC
filters to investigate the effect of the predictor order. In all cases
the predictor order was found to have little effect on the quality and
the segmental SNR of the reconstructed audio signal. A 10th order LPC
filter was therefore selected for each subband signal, in order to obtain
the lowest possible bit rate.
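One way to compare predictor orders is to look at how much the prediction error energy drops as the order grows. The sketch below uses the standard autocorrelation method with the Levinson-Durbin recursion; the paper does not state which LPC analysis it used, so this is an assumed method, and the synthetic second-order test signal is purely illustrative and says nothing about real audio.

```python
import random
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Return (a, err) with x[n] ~ sum_k a[k-1] x[n-k] and err the residual energy."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient for order i + 1
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Synthetic second-order autoregressive signal (an assumption for the demo).
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(4000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))

r = autocorrelation(x, 24)
for order in (10, 16, 24):
    _, err = levinson_durbin(r, order)
    print(order, round(err / r[0], 4))  # normalised prediction error energy
```

When the normalised error barely changes between orders, the extra coefficients cost bits without improving the model, which is the trade-off behind choosing the lowest order.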
For quantisation of the LPC coefficients, initial experiments backed by
the results in [4] suggest that differential quantisation is most
appropriate. A similar approach to [4] has been adopted in quantising the
pulse positions and pulse amplitudes. The actual value of the first pulse
location is uniformly quantised. The remaining locations are uniformly
quantised as differences from the previous location. A differential
logarithmic quantiser is used to quantise a gain value per frame. This is
the magnitude of the largest pulse in a frame. The individual pulses are
then encoded with a sign bit and as a fraction of the gain value (in dB).
Other possible schemes for quantising the LPC and excitation parameters
are also being investigated.
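The excitation coding steps above can be sketched as an encoder and decoder pair. This is one possible reading of the description, not the authors' implementation: the step sizes, the number of levels and the function names are hypothetical, and bit packing is left out.

```python
import math
from typing import List, Tuple

GAIN_STEP_DB = 1.5   # hypothetical step of the differential log gain quantiser
LEVEL_STEP_DB = 2.0  # hypothetical step for pulse level relative to the gain
LEVELS = 16          # hypothetical number of relative levels (4 bits)

Pulse = Tuple[int, float]


def encode_frame(pulses: List[Pulse], prev_gain_db: float):
    """Return position codes, gain code, (sign, level) codes and the decoded gain in dB."""
    pulses = sorted(pulses)
    positions = [p for p, _ in pulses]
    # First location as-is, the rest as differences from the previous location.
    pos_codes = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
    # Frame gain: magnitude of the largest pulse, coded as a change in dB.
    gain_db = 20.0 * math.log10(max(abs(a) for _, a in pulses))
    gain_code = round((gain_db - prev_gain_db) / GAIN_STEP_DB)
    gain_hat_db = prev_gain_db + gain_code * GAIN_STEP_DB
    amp_codes = []
    for _, a in pulses:
        below_db = gain_hat_db - 20.0 * math.log10(max(abs(a), 1e-12))
        level = min(LEVELS - 1, max(0, round(below_db / LEVEL_STEP_DB)))
        amp_codes.append((0 if a >= 0 else 1, level))  # sign bit, relative level
    return pos_codes, gain_code, amp_codes, gain_hat_db


def decode_frame(pos_codes, gain_code, amp_codes, prev_gain_db: float) -> List[Pulse]:
    gain_hat_db = prev_gain_db + gain_code * GAIN_STEP_DB
    positions, pos = [], 0
    for code in pos_codes:
        pos += code
        positions.append(pos)
    amps = []
    for sign, level in amp_codes:
        mag = 10.0 ** ((gain_hat_db - level * LEVEL_STEP_DB) / 20.0)
        amps.append(-mag if sign else mag)
    return list(zip(positions, amps))


example = [(40, -0.5), (3, 1.0), (17, 0.25)]
pos_codes, gain_code, amp_codes, gain_hat_db = encode_frame(example, prev_gain_db=0.0)
decoded_pulses = decode_frame(pos_codes, gain_code, amp_codes, prev_gain_db=0.0)
print(pos_codes, gain_code, amp_codes)
print(decoded_pulses)
```

Pulse locations survive exactly because they are integers, while each amplitude is reproduced to within the chosen dB step; the decoded gain would be carried forward as `prev_gain_db` for the next frame.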
6. DISCUSSION
For our experiments, the proposed audio coder was tested on several
seconds of piano, bass, drums, guitar and orchestral signals. Listening
tests using headphones were used to assess the quality of the
reconstructed audio signals. For high quality, the bit rates of the
proposed coder were found to be in the range of 80-90 kb/s. To obtain
lower bit rates, further areas in the coder design are currently being
considered.
ACKNOWLEDGEMENT
This work was partially supported by a grant from the Australian Telecommunications and Electronics Research Board
(ATERB).
[2] J. Johnston, Perceptual transform coding of wideband stereo signals,
Proc. ICASSP, pp. 1993-1995, 1989.
[3] D. Sinha and A. Tewfik, Low bit rate transparent audio compression
using adapted wavelets, IEEE Transactions on Signal Processing,
pp. 3464-3479, vol. 41, no. 12, 1993.
[4] S. Singhal, High quality audio coding using multipulse LPC, Proc.
ICASSP, pp. 1101-1104, 1990.
[5] X. Lin and R. Steele, Subband coding with modified multipulse LPC for
high quality audio, Proc. ICASSP, pp. 201-204, 1993.
[6] S. Park, Speech compression using ARMA model and the wavelet
transform, Proc. ICASSP, pp. 209-212, 1994.
[7] S. Singhal and B. Atal, Amplitude optimization and pitch prediction
in multipulse coders, IEEE Transactions on Acoustics, Speech and Signal
Processing, pp. 317-327, vol. 37, no. 3, 1989.
[8] I. Daubechies, Orthonormal bases of compactly supported wavelets,
IEEE Transactions on Information Theory, vol. 36, pp. 961-1005,
September 1990.
[9] S. Mallat, A theory for multiresolution signal decomposition: The
wavelet representation, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 11, no. 7, 1989.