
International Conference on Communication Technology

ICCT'98

October 22-24, 1998

Beijing, China

A Wavelet Filter Optimization Algorithm for Speech Recognition


Yao Kaisheng and Cao Zhigang
Department of Electronic Engineering, Tsinghua University
100084, Beijing, P.R.China
ksyao@mail.cic.tsinghua.edu.cn, caozg@mdc.tsinghua.edu.cn
ABSTRACT
In this paper, a wavelet filter optimization algorithm
is proposed. The proposed algorithm is based on
wavelet matrix parameterization, through which
wavelet filters can be factorized to a smaller set of
parameters. Based on the cross validation cost
function defined in this paper, Monte Carlo
simulation is used to search for the optimal wavelet
filters for speech recognition. Using this wavelet
filter optimization algorithm, we can obtain optimal
wavelet filters for an isolated word recognition
system. The paper also describes mel wavelet cepstral
coefficients, the speech feature generated by the
optimized symmetrical shift-invariant wavelet.
Simulation results show that the
Continuous Density HMM based Isolated Word
Recognition System can be optimized to achieve a
recognition rate of 100% for 10 digits.
Key Words: Wavelet, Speech Recognition, Hidden
Markov Models

I. INTRODUCTION

Wavelets are an attractive mathematical tool
because of their powerful structure and enormous
freedom. The multiresolution structure of wavelets
allows one to zoom in on local signal behaviour to
analyse signal detail, or zoom out to get a global
view of the signal. For this reason, there are
currently many efforts to adopt the discrete wavelet
transform (DWT) in the field of speech
recognition. However, the classical discrete
wavelet transforms are signal-independent,
fixed transforms. It is clearly desirable to have
transforms that adapt to the input signal for
a given signal processing task. Another well-known
disadvantage of the discrete wavelet transform
is its lack of shift invariance, which is a critical
prerequisite for many signal processing
tasks. This disadvantage arises because differently
shifted versions of the same signal have many
different legitimate DWTs. In the speech
recognition field, wavelet transforms have shown
some advantages over ordinary transforms such as
the FFT because of their constant-Q property. Since
there are many different kinds of wavelet filters with
different properties, it is appropriate to
search for optimised wavelet filters for a speech
recognition system.
In this paper, symmetrical and shift-invariant
wavelet transforms rather than ordinary DWTs
are used for speech feature generation. Using a
wavelet matrix parameterisation algorithm, we
can represent wavelet filters by a smaller set of
factorised parameters. Through Monte Carlo
simulations, optimal wavelet filters for each class
of speech features are searched for, resulting in a
high recognition rate for an isolated word
recognition system.
This paper is organised as follows. The second
section describes the Mel Wavelet Cepstral
Coefficients, which are the speech feature for the
given speech recognition system, and the symmetrical
shift-invariant wavelet. The third section
covers wavelet matrix parameterisation and
optimisation. The details of the proposed algorithm
and the simulation results are given in the fourth
and fifth sections respectively. Conclusions are
drawn in the last section.
II. MEL WAVELET CEPSTRAL COEFFICIENTS AND SYMMETRICAL SHIFT-INVARIANT WAVELET

In this paper, the Mel Wavelet Cepstral
Coefficients (MWCC) are used as speech features
for speech recognition. The procedure for generating
MWCC is as follows. The input
speech is first pre-emphasised by a factor $\alpha$,
which is typically set to 0.98 or 0.95. After
Hamming windowing, a frame of the
processed speech signal is transformed to the
wavelet domain using a symmetrical and
shift-invariant wavelet. The transformed signal is
then grouped into $M$ mel frequency channels, with $M$
typically set to 13. The DCT of the log powers of
these channels gives the final MWCC feature
for a frame of the input speech signal. Note
that we use a symmetrical and shift-invariant
wavelet to calculate MWCC, whereas in
[4] MWCC is calculated by the ordinary DWT,
which does not have the shift-invariant property.
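The parts of this procedure that do not depend on the wavelet transform can be sketched as below. This is only an illustration under stated assumptions: pre-emphasis is taken to be the usual first-order difference y[n] = x[n] - alpha * x[n-1], the DCT is taken to be an unnormalised DCT-II, and the function names are hypothetical. The wavelet transform and the grouping into mel channels are not specified in enough detail here to reproduce, so the last function simply accepts the channel powers as input.

```python
import math

def pre_emphasise(x, alpha=0.98):
    """Assumed form: y[n] = x[n] - alpha * x[n-1]; first sample passed through."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hamming(n_samples):
    """Standard Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def cepstrum_from_channel_powers(powers, floor=1e-12):
    """Unnormalised DCT-II of the log channel powers (assumed DCT variant)."""
    m = len(powers)
    logs = [math.log(max(p, floor)) for p in powers]  # floor avoids log(0)
    return [sum(logs[j] * math.cos(math.pi * c * (j + 0.5) / m)
                for j in range(m))
            for c in range(m)]

# One 512-sample frame of a synthetic signal, pre-emphasised and windowed.
frame = [math.sin(0.1 * n) for n in range(512)]
emphasised = pre_emphasise(frame, alpha=0.98)
windowed = [s * w for s, w in zip(emphasised, hamming(len(emphasised)))]

# 13 hypothetical channel powers standing in for the mel channel outputs.
features = cepstrum_from_channel_powers([1.0 + 0.1 * j for j in range(13)])
```

In a full front end, the windowed frame would be passed through the shift-invariant wavelet transform and the resulting coefficients pooled into the 13 channels before the final step.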
As mentioned above, one disadvantage of the
ordinary DWT is its lack of shift invariance.
For this reason, this paper adopts shift-invariant
wavelets for signal decomposition.
Several approaches have been reported for
constructing shift-invariant wavelets [2,5,6]. In
this paper, the autocorrelation shell of an
orthonormal basis is used to construct the
shift-invariant wavelet decomposition filter
coefficients [7].
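As a rough illustration of the autocorrelation-shell construction of [7], the sketch below builds the two symmetric filters from an orthonormal low-pass filter h and applies one undecimated decomposition step. The formulas in the comments are the ones from [7] as we understand them. The periodic boundary handling, the function names, and the use of the 4-tap Daubechies filter as input are assumptions made only for this example.

```python
import math

def autocorrelation_shell_filters(h):
    """Return symmetric filters p, q as dicts mapping lag l -> coefficient.

    a_i = 2 * sum_l h[l] * h[l+i] for odd i, and 0 for even i;
    p_0 = q_0 = 1/sqrt(2); p_l = a_|l| / (2*sqrt(2)); q_l = -p_l otherwise.
    """
    L = len(h)
    a = [0.0] * L
    for i in range(1, L, 2):
        a[i] = 2.0 * sum(h[l] * h[l + i] for l in range(L - i))
    r = 1.0 / math.sqrt(2.0)
    p, q = {0: r}, {0: r}
    for l in range(1, L):
        p[l] = p[-l] = a[l] / (2.0 * math.sqrt(2.0))
        q[l] = q[-l] = -p[l]
    return p, q

def decompose_one_level(s_prev, p, q, j):
    """Step from scale j-1 to scale j without down-sampling.

    Filter taps are spaced 2**(j-1) samples apart; indices wrap around
    (periodic extension, an assumption of this sketch).
    """
    n = len(s_prev)
    step = 2 ** (j - 1)
    s = [sum(c * s_prev[(i + step * l) % n] for l, c in p.items())
         for i in range(n)]
    d = [sum(c * s_prev[(i + step * l) % n] for l, c in q.items())
         for i in range(n)]
    return s, d

# 4-tap Daubechies low-pass filter, normalised so that sum(h) = sqrt(2).
s3 = math.sqrt(3.0)
h_d4 = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
        (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]
p, q = autocorrelation_shell_filters(h_d4)
smooth, detail = decompose_one_level([float(n % 5) for n in range(32)], p, q, j=1)
```

Because no down-sampling takes place, the output of each level has the same length as the input, which is what makes the representation shift-invariant.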
Two symmetrical filters, $P = \{p_l\}_{-L+1 \le l \le L-1}$
and $Q = \{q_l\}_{-L+1 \le l \le L-1}$, are applied
recursively to the input speech signal, projecting
the input speech onto the two orthogonal subspaces
$V_j$ and $W_j$ at each scale $j$. Suppose that $S^j$
and $D^j$ are the signals projected onto the spaces
$V_j$ and $W_j$. Following [7],

$$S_i^j = \sum_{l=-L+1}^{L-1} p_l \, S_{i+2^{j-1}l}^{j-1} \qquad (1)$$

$$D_i^j = \sum_{l=-L+1}^{L-1} q_l \, S_{i+2^{j-1}l}^{j-1} \qquad (2)$$

where $0 \le i < N$, $1 \le j \le J$, $N$ is the input
signal length, $J$ is the depth of the wavelet
transform, and $L$ is the low-pass filter length of
ordinary DWTs.
The filter coefficients $p_l$ and $q_l$ are

$$p_l = \begin{cases} 1/\sqrt{2}, & \text{for } l = 0 \\ a_{|l|}/(2\sqrt{2}), & \text{otherwise} \end{cases} \qquad (3)$$

and

$$q_l = \begin{cases} 1/\sqrt{2}, & \text{for } l = 0 \\ -p_l, & \text{otherwise} \end{cases} \qquad (4)$$

where the coefficients $\{a_i\}_{i=1,\ldots,L-1}$ are the
correlation of the low-pass filter,
$\{h_l\}_{l=0,\ldots,L-1}$, of the ordinary DWT, i.e.,

$$a_i = \begin{cases} 2\sum_{l=0}^{L-1-i} h_l h_{l+i}, & \text{for } i \text{ odd} \\ 0, & \text{for } i \text{ even} \end{cases} \qquad (5)$$

The low-pass wavelet filter coefficients
$\{h_l\}_{l=0,\ldots,L-1}$ can be set to the low-pass
filter coefficients of the Daubechies or Meyer
wavelet filter families. From this point of view,
the low-pass wavelet filter coefficients
$\{h_l\}_{l=0,\ldots,L-1}$ of ordinary DWTs are
initial estimates for the search for the optimal
wavelet filter.

III. WAVELET MATRIX PARAMETERIZATION AND OPTIMIZATION

Since different wavelet filters have different
properties when decomposing signals, we wish to
decide which filter is the best for a given signal
processing task. In this paper, a wavelet matrix
parameterisation procedure [1,3] is used to
decompose the wavelet filter into a smaller set of
parameters, through which the optimal wavelet filter
can be obtained. Based on a given cross validation
cost function, Monte Carlo simulation can be used
to search for the optimal wavelet filter.

A. Wavelet Matrices
Given integers $m \ge 2$ and $q \ge 1$, a wavelet
matrix is an $m \times mq$ matrix

$$A = (A_0 \; A_1 \; \cdots \; A_{q-1}) \qquad (6)$$

($A_j$ are $m \times m$ blocks) from which the scaling
and wavelet functions of the multiresolution analysis
can be built. This matrix also fully determines the
transform. The given signal is convolved with
every row of the wavelet matrix and the results are
down-sampled by the factor $m$. This process is
repeated recursively until the desired depth is
reached.
The first row of $A$,

$$h^T = (h_0 \; h_1 \; \cdots \; h_{mq-1}) \qquad (7)$$

plays a special role both in the transform and in
the construction of such matrices. The necessary
and sufficient conditions on a wavelet matrix for
the existence of the corresponding multiresolution
analysis and the wavelet basis (both of which are
unique) can be formulated as follows:
i. the shifted orthogonality condition

$$\sum_j A_j A_{j+l}^T = \delta_{0l} I$$

ii. the basic regularity condition

$$\sum_i h_i = \sqrt{m}$$

iii. the Lawton matrix condition: 1 must be a
simple eigenvalue of the Lawton matrix $W$,
the elements of which are

$$(W)_{ij} = \sum_n h_n h_{n+j-mi}$$
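For $m = 2$, the shifted orthogonality condition and the basic regularity condition can be checked numerically for a candidate filter. The sketch below is a minimal illustration, assuming that the second row of the wavelet matrix is the usual alternating-flip high-pass filter $g_n = (-1)^n h_{L-1-n}$ (the text does not state how the second row is chosen) and using hypothetical function names. The Lawton matrix condition is not checked here.

```python
import math

def wavelet_matrix_blocks(h):
    """Split the 2 x L wavelet matrix into 2 x 2 blocks A_0 .. A_{q-1}.

    Row 0 is the low-pass filter h; row 1 is assumed to be the
    alternating flip g[n] = (-1)**n * h[L-1-n].
    """
    L = len(h)
    g = [(-1) ** n * h[L - 1 - n] for n in range(L)]
    return [[[h[2 * j], h[2 * j + 1]], [g[2 * j], g[2 * j + 1]]]
            for j in range(L // 2)]

def mat_mul_transpose(a, b):
    """Return a @ b.T for 2 x 2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[c][k] for k in range(2)) for c in range(2)]
            for r in range(2)]

def shifted_orthogonality_ok(blocks, tol=1e-12):
    """Check sum_j A_j A_{j+l}^T = delta_{0l} I for l = 0 .. q-1.

    Negative shifts follow by transposition, so they are not re-checked.
    """
    q = len(blocks)
    for l in range(q):
        total = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(q - l):
            prod = mat_mul_transpose(blocks[j], blocks[j + l])
            for r in range(2):
                for c in range(2):
                    total[r][c] += prod[r][c]
        for r in range(2):
            for c in range(2):
                target = 1.0 if (l == 0 and r == c) else 0.0
                if abs(total[r][c] - target) > tol:
                    return False
    return True

def basic_regularity_ok(h, m=2, tol=1e-12):
    """Check that the first row sums to sqrt(m)."""
    return abs(sum(h) - math.sqrt(m)) <= tol

# Example input: the 4-tap Daubechies low-pass filter (q = 2).
s3 = math.sqrt(3.0)
h_d4 = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
        (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]
d4_is_valid = (shifted_orthogonality_ok(wavelet_matrix_blocks(h_d4))
               and basic_regularity_ok(h_d4))
```

A filter such as (0.5, 0.5, 0.5, 0.5) has unit energy but fails both checks, which shows that the conditions constrain more than the filter norm.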

B. The Parameterisation
The wavelet matrix can be written as a product of
linear factors

$$A = (A_0 \; A_1 \; \cdots \; A_{q-1}) = H \otimes F_1 \otimes \cdots \otimes F_{q-1} \qquad (8)$$

where
1. $F_i = (P_i \;\; I - P_i)$, and $P_i$ is a symmetric
projector ($P_i = P_i^2 = P_i^T$), and
2. $H$ is orthogonal ($HH^T = I$).
The polynomial product $\otimes$ is defined as follows:

$$(B_0 \; B_1 \cdots B_{p-1}) \otimes (C_0 \; C_1 \cdots C_{r-1}) = (D_0 \; D_1 \cdots D_{p+r-2}), \quad D_k = \sum_j B_j C_{k-j} \qquad (9)$$

The basic regularity condition (ii) is then
equivalent to the requirement that the orthogonal
matrix $H$ has its first row equal to
$(1/\sqrt{m})(1, \ldots, 1)$.
When (i) and (ii) hold, the Lawton matrix
condition fails only in some exceptional
degenerate cases.
Using the results mentioned above, all we need for
the parameterisation of wavelet matrices is a
parameterisation of matrices with orthonormal
columns. Provided that all the projectors $P_i$
in the factorisation are of rank 1, we need
just one normalised $m$-vector to describe each of
the linear factors. For the case $m = 2$,
we use a fixed orthogonal matrix $H$. This
leaves us with $q$ free parameters.

C. The Cross Validation Cost Function
The choice of the cost function is something of an
art. It must capture the essence of the problem,
and fine tuning may be required to achieve the
desired results.
In this paper, the cross validation cost function
is used as the cost function for optimising the
wavelet filter for a 10-digit speech recognition
system.
For the HMM based speech recognition system,
the score of classifying input speech feature $X$
into class $k$ is defined as

$$g_k(X, \Phi_k) = \ldots \qquad (10)$$

where $0 \le k \le V-1$, $V$ is the total number of
classes of input speech, $\Phi_k$ is the $k$-th HMM
parameter set, and $\theta^k$ represents a state of the
$k$-th HMM.
If the input speech feature $X$ belongs to class $k$,
the classification distance between class $k$ and
the other classes can be defined as

$$d_k(X) = -g_k(X, \Phi_k) + \max_{i \ne k} g_i(X, \Phi_i) \qquad (11)$$

The cost function for an input speech feature $X$
belonging to class $k$ is defined as

$$G_k(X, \Phi_k) = \frac{1}{1 + \exp(-\gamma d_k(X) + \xi)} \qquad (12)$$

where $\gamma$ controls the sensitivity of
$G_k(X, \Phi_k)$ to $d_k(X)$, and $\xi$ is a shift of the
classification distance.
The total cost function for input speech feature
$X$ is defined as

$$G(X) = \sum_{k=0}^{V-1} G_k(X, \Phi_k)\, \delta(X \in C_k) \qquad (13)$$

where $\delta(\cdot)$ is the Kronecker function.
If the input speech feature $X$ is in the testing
speech set, $G(X)$ is a cross validation cost
function, which can prevent over-fitting.
The averaged cost function for a set of input speech
features of class $k$ can be defined as

$$G^k = \frac{1}{I_k} \sum_{i=0}^{I_k - 1} G(X_i)$$

where $X_i$ is one element of the speech feature set
for class $k$, and $I_k$ is the total number of
elements in the speech feature set for class $k$.
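The cost computation can be illustrated with a small sketch that starts from per-class scores, since the HMM scoring itself is outside its scope. It assumes the classification distance $d_k = -g_k + \max_{i \ne k} g_i$ and the sigmoid cost $G_k = 1 / (1 + \exp(-\gamma d_k + \xi))$, averaged over the utterances of one class. The function names, the default values of `gamma` and `xi`, and the example scores are hypothetical.

```python
import math

def classification_distance(scores, k):
    """d_k = -g_k + max over i != k of g_i, for a list of class scores."""
    rival = max(s for i, s in enumerate(scores) if i != k)
    return -scores[k] + rival

def class_cost(scores, k, gamma=1.0, xi=0.0):
    """Sigmoid cost 1 / (1 + exp(-gamma * d_k + xi)) for true class k."""
    z = -gamma * classification_distance(scores, k) + xi
    z = min(z, 700.0)  # guard against overflow in math.exp
    return 1.0 / (1.0 + math.exp(z))

def averaged_cost(score_lists, k, gamma=1.0, xi=0.0):
    """Mean cost over all utterances whose true class is k."""
    return sum(class_cost(s, k, gamma, xi) for s in score_lists) / len(score_lists)

# Two hypothetical test utterances of class 0, scored against three classes.
utterance_scores = [[-10.0, -20.0, -30.0], [-12.0, -11.0, -25.0]]
cost_class0 = averaged_cost(utterance_scores, k=0)
```

A correctly classified utterance with a wide margin contributes a cost near 0, a misclassified one contributes a cost above 0.5, so lowering the averaged cost on the testing set corresponds to widening the margins.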

IV. THE WAVELET FILTER OPTIMIZATION ALGORITHM

The framework of the wavelet filter optimisation
algorithm for speech recognition is shown in
Figure 1. The algorithm is an iterative procedure.
Each repetition of the procedure consists of two
phases. The first is the training phase. In this
phase, for a time index of $n+1$, the previous
wavelet filter coefficients
$\{h_l^k(n)\}_{l=0,\ldots,L-1}$, the HMM parameters
$\Phi_k(n)$ for speech of class $k$, and the cross
validation cost function $G^k(n)$ for the previous
testing set of speech belonging to class $k$ are
stored. After this, a new set of wavelet matrix
parameters, $\{\theta_i^k(n+1),\ 0 \le i \le q-1\}$,
from which new wavelet filter coefficients
$\{h_l^k(n+1)\}_{l=0,\ldots,L-1}$ can be constructed,
is randomly generated, and new HMM parameters
$\Phi_k(n+1)$ for this class of speech are trained
on the resulting training set of features,
$\{X_i^k(n+1),\ 0 \le i < I_k^{training}\}$
($I_k^{training}$ is the total number of training
utterances for speech of class $k$). The second is
the testing phase. In this phase, the testing set
of MWCC speech features,
$\{X_i^k(n+1),\ 0 \le i < I_k^{testing}\}$ (where
$I_k^{testing}$ is the total number of testing
utterances for speech of class $k$), is extracted
by the symmetrical shift-invariant wavelet filters
$P$ and $Q$ generated in the previous training
phase. Each element in the testing set of speech
features is classified by the trained HMMs. The
current cross validation cost function, $G^k(n+1)$,
for the testing set of speech belonging to class
$k$ is calculated and compared with the stored
cross validation cost function $G^k(n)$. If
$G^k(n+1)$ is smaller than $G^k(n)$,
$\{h_l^k(n+1)\}_{l=0,\ldots,L-1}$ is further tested
against the Lawton matrix condition. If the
condition is satisfied by
$\{h_l^k(n+1)\}_{l=0,\ldots,L-1}$, the wavelet filter
coefficients $\{h_l^k(n+1)\}_{l=0,\ldots,L-1}$ and
HMM parameters $\Phi_k(n+1)$ are used in the next
repetition; otherwise, the wavelet filter
coefficients $\{h_l^k(n)\}_{l=0,\ldots,L-1}$ and HMM
parameters $\Phi_k(n)$ are restored. Each class of
wavelet filter is recursively optimised by this
procedure until a given total cost function
threshold or a maximum allowed number of
optimisation repetitions is reached.

[Figure 1. The Wavelet Filter Optimisation Algorithm. Block labels recoverable from the source: Input Speech, Parameterization.]

V. SIMULATION RESULTS

In this paper, we test the proposed algorithm on a
10-digit speech recognition system. $V$ was set to
10 for this system. For each class $k$ of input
speech, the wavelet filter
$\{h_l^k\}_{l=0,\ldots,L-1}$, with a filter length $L$
of 8, was optimised through the proposed
optimisation algorithm. In the simulation, the
factorised wavelet matrices at time index $n$ for
speech class $k$ were set to the rotation matrices

$$\begin{pmatrix} \cos\theta_i^k(n) & \sin\theta_i^k(n) \\ -\sin\theta_i^k(n) & \cos\theta_i^k(n) \end{pmatrix}$$

where $0 < \theta_i^k(n) < 2\pi$, $0 \le i \le q-1$,
$0 \le k \le V-1$ and $m = 2$. $\theta_i^k(n)$ was
randomly generated in the training phase of the
optimisation procedure for class $k$ at time
index $n$.
For each class $k$, pure speech signals were
digitised at 11025 Hz and stored in A-law format.
After conversion to a linear scale, a Hamming
analysis window with a frame length of 512 samples,
shifted by 128 samples per frame, was used to
calculate 13 mel channel MWCC features, which
depended on each class's wavelet filter
$\{h_l^k(n)\}_{l=0,\ldots,L-1}$ at time index $n$.
For each class of speech, 50 utterances were used
in the training phase and 10 utterances were
used in the testing phase.
The wavelet filter for each class of speech signals
was optimised using the above-mentioned
optimisation algorithm. The optimisation procedure
was terminated after 100 repetitions for each class
of speech.
The speech recognition system used in this paper
was a continuous density HMM based
speaker-dependent isolated digit recognition
system. After optimisation of the wavelet filter
for each class of input speech, the system achieved
a recognition rate of 100% for the recognition of
10 Chinese digits (from 0 to 9).
VI. CONCLUSIONS

By parameterising wavelet matrices, we can
factorise a wavelet filter into a smaller set of
parameters, which can be assigned randomly
generated values provided that the shifted
orthogonality condition, the basic regularity
condition, and the Lawton matrix condition are
satisfied by the wavelet filter. Based on the
defined cross validation cost function, Monte Carlo
simulation was used to search for optimal wavelet
filters for a continuous density HMM based isolated
word recognition system. Symmetrical
shift-invariant wavelets were used in the feature
generation for the speech recognition system. After
100 repetitions of wavelet filter optimisation for
each class of speech, the speech recognition system
achieves a recognition rate of 100% for a
speaker-dependent 10-digit recognition task.
References:
[1] Jaroslav Kautsky and Radka Turcajova,
Adaptive Wavelets for Signal Analysis, Lecture
Notes in Computer Science, Vol. 970, V. Hlavac
and R. Sara editors, Springer Verlag 1995, pp.
906-911;
[2] Haitao Guo, Theory and Applications of the
Shift-Invariant, Time-Varying and Undecimated
Wavelet Transform, M.Sci. Thesis, Electrical and
Computer Engineering Department, Rice
University, May, 1995;
[3] Yvette Mallet, Danny Coomans, Jerry Kautsky,
and Olivier De Vel, Classification Using
Adaptive Wavelets for Feature Extraction, IEEE
Trans. on Pattern Analysis and Machine
Intelligence, Vol. 19, No. 10, October 1997;
[4] Hubert Wassner and Gerard Chollet, New
time-frequency derived cepstral coefficients for
automatic speech recognition, Proceedings of the
8th European Signal Processing Conference
(Eusipco96);
[5] Mark J. Shensa, The Discrete Wavelet
Transform: Wedding the A Trous and Mallat
Algorithms, IEEE Trans. on Signal Processing,
Vol. 40, No. 10, Oct., 1992;
[6] Olivier Rioul and Pierre Duhamel, Fast
Algorithms for Discrete and Continuous Wavelet
Transforms, IEEE Trans. on Information Theory,
Vol. 38, No. 2, Mar. 1992;
[7] N. Saito and G. Beylkin, Multiresolution
representations using the auto-correlation
functions of compactly supported wavelets, IEEE
Trans. on Signal Processing, 41 (1993), pp. 3585-3590.

