Marc Thuillard
Siemens Building Technologies
Alte Landstrasse
CH-8708 Maennedorf
Switzerland
mailto:Marc.Thuillard@bluemail.ch
INTRODUCTION
Over the last 5 years, we have developed a number of techniques combining wavelet theory
and soft computing. (Soft computing is generally defined by its techniques: neural networks,
fuzzy logic, uncertainty modeling, evolutionary computing.) These techniques have been
successfully applied in the domain of fire detection. For instance, the combination of fuzzy
logic and multiresolution analysis has led to new algorithms that permit a better and more
reliable detection of fires with multisensor fire detectors. Wavelet analysis also plays a
central role in our latest generation of flame detectors. The detection of a fire with a flame
_____________________________________________
*Email: Marc.thuillard@CCH.CERBERUS.CH
“Fuzzy logic in the wavelet framework,“ M. Thuillard, Proc. Toolmet’2000 —Tool Environments and
Development Methods for Intelligent Systems, April 13-14 2000 ”, L. Yliniemi, E. Juuso (eds.), Oulu, 15-36
(2000). For a complete exposition see “Wavelets in Soft Computing”, M. Thuillard, World Scientific Press
(2001).
detector represents a difficult problem. The radiation from a flame is typically one hundred
times smaller than the measured radiation of the sun in the considered optical bands, and the
reaction time of the flame detector must be of the order of a few seconds. A flame detector
detects a fire by measuring the pulsating radiation of a flame. A model predicting the
pulsation frequency was developed and compared to the outcome of numerous experiments
[1]. A number of expert rules that characterize the spectra of flame radiation were found.
These rules are used on-line by combining fuzzy techniques and wavelet analysis [2]. With
these new algorithms, the detector can recognize the fingerprint of a flame and exclude
possible false alarms due, for instance, to the flickering of sunlight reflected on a
water surface. This represents a major advance in flame detection.
In this article, we first review the connections between fuzzy logic and wavelets
and explain some of the techniques used in flame detectors. In the second part of the article,
we focus on on-line learning. We present methods to determine locally and adaptively an
appropriate resolution of the membership functions as well as the fuzzy rules describing a
control surface. These methods require very little computing power and can be implemented
in detectors, either for diagnosis or during the field campaigns that are carried out early
in the development of a new detector.
[Figure 1 diagram labels: Flame, Detector, Signal, Wavelet analysis, Fuzzy logic, Decision algorithms]
Figure 1: The new generation of flame detectors from SBT (Cerberus Div.) combines fuzzy
logic and wavelet analysis for signal processing.
1. Fuzzy-wavelet methods
Fuzzy logic has found applications in basically all domains of science, from biology to
particle physics. The majority of applications are clearly in the domain of control. The
linguistic interpretability of fuzzy rules is certainly one of the main reasons for the success of
fuzzy logic. Fuzzy logic furnishes a framework to fuse qualitative or even imprecise
knowledge with information contained in a databank or in mathematical expressions. A major
challenge to fuzzy logic is the translation of the information contained implicitly in a
collection of datapoints into linguistically interpretable fuzzy rules. Neurofuzzy methods
have been developed for this purpose. A serious difficulty with most neurofuzzy methods is
that they often furnish rules without a transparent interpretation; a rule is said to be
transparent if it has a clear and intuitively correct linguistic interpretation. A solution to this
problem is furnished by multiresolution techniques. The basic idea is to use a dictionary of
membership functions forming a multiresolution and to determine which membership
functions are the most appropriate to describe the control surface (figure 2). In order to
associate a linguistic interpretation with each membership function, the membership functions
are chosen among the family of scaling functions that are symmetric, everywhere positive and
have a single maximum. This family includes, among others, splines and
some radial functions [3]. The main advantage of using a dictionary of membership functions
is that each term, such as "small" or "large", is well defined beforehand and is not modified
during learning. The multiresolution properties of the membership functions in the dictionary
make it easy to fuse or split membership functions so as to put the control
surface in a form that is linguistically understandable and intuitive for the human expert.
Different techniques, generally referred to by the term "fuzzy-wavelet", have been developed
for data on a regular grid. Before explaining them, let us describe fuzzy logic in the
framework of the Takagi-Sugeno model.
In the Takagi-Sugeno model the fuzzy rules are expressed in the form [4]:
$$R_i: \text{if } x \text{ is } A_i \text{ then } y_j = f_j(x). \qquad (1)$$
Here $A_i$ are linguistic terms, $x$ is the input linguistic variable, while $y = (y_1, \ldots, y_j, \ldots, y_{max})$ is the
output variable. The value of the input linguistic variable may be crisp or fuzzy. If the value
of the input variable is a crisp number, then the variable $x$ is called a singleton. As an
example, suppose that $x_i$ is a linguistic variable for the temperature. The value $\hat{x}_i$ of the input
linguistic variable may be given by a crisp number such as "30 (°C)" or by "about 25", in
which "about 25" is itself a fuzzy set. For a crisp input $\hat{x}$, the output is
$$\hat{y}_j = \frac{\sum_i \beta_i \cdot f_j(\hat{x})}{\sum_i \beta_i} \qquad (2)$$
in which the degree of fulfillment $\beta_i$ is given by the expression $\beta_i = \mu_{A_i}(\hat{x})$, with $\mu_{A_i}(\hat{x})$
the membership function to the linguistic term $A_i$. In many applications, a linear function is
taken: $f(\hat{x}) = a^T \cdot \hat{x} + b$. If a constant $b_j$ is chosen to describe the crisp output $y_j$, the system
becomes:
$$R_i: \text{if } x \text{ is } A_i \text{ then } y_j = b_j. \qquad (3)$$
In this particular case, and with spline membership functions, the output $y$ is a linear sum of
translated and dilated splines. This means that in this last form the Takagi-Sugeno model is
equivalent to a multiresolution spline model. It follows that wavelet-based techniques can be applied here.
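The constant-consequent case of eq. (2) and (3) can be illustrated with a short numerical sketch. It assumes triangular (second order spline) membership functions on a regular grid and one constant per rule, written `b[i]` here; the function names, the three linguistic terms and the numerical values are illustrative assumptions, not taken from the detector algorithms.

```python
import numpy as np

def hat(t):
    """Triangular (second order B-spline) function centered at 0 (assumed shape)."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def takagi_sugeno(x, centers, width, b):
    """Weighted average of eq. (2) with constant rule outputs as in eq. (3)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Degree of fulfillment beta_i = mu_{A_i}(x) for every input and every rule.
    beta = hat((x[:, None] - centers[None, :]) / width)
    return (beta @ b) / beta.sum(axis=1)

# Three hypothetical linguistic terms: "small", "medium", "large".
centers = np.array([0.0, 1.0, 2.0])
b = np.array([10.0, 20.0, 40.0])      # hypothetical constant rule outputs
y = takagi_sugeno([0.0, 0.5, 1.0, 1.75], centers, 1.0, b)
print(y)
```

Because the triangular functions sum to one between the first and last center, the denominator of eq. (2) equals 1 there and the output reduces to a plain sum of translated splines weighted by the rule constants.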
[Figure: membership functions labelled small, medium and large; axes x and y; further labels: Signal, 1, 2. Caption not recovered.]
1.2.1 Introduction
Though the idea of combining fuzzy logic and wavelet theory is new, wavelet theory is a
well-established domain of research. Wavelet analysis started in the 80's. Scientists processing
recordings of seismic waves recognized the need for methods allowing the analysis of the
signal at different resolutions. In the 90's, multiresolution analysis grew into a very
active field with the appearance of very efficient computing methods, which has led to a
synthetic view of the work done in the signal processing and the mathematical communities.
A wavelet decomposition consists of the iterative decomposition of a signal into a coarse
approximation and a detail part. The original signal can be reconstructed with a second algorithm. The
possibility of reconstructing the signal after decomposition has resulted in several applications
in the domain of noise reduction and data compression.
[Figure: signal plotted against time. Caption not recovered.]
One of the first applications of multiresolution analysis was in the domain of data
compression. Multiresolution techniques were successfully implemented to compress the FBI
fingerprint datafiles.
Some of the most fruitful soft computing methods are inspired by the biological world. The
nervous system has found its mathematical counterpart in the neural network, and Darwinism has
influenced evolutionary computing and genetic algorithms. Wavelets, or multiresolution
analysis, can also be related to the biological world. The mechanisms behind the perception of
colour by the brain seem indeed to rely on wavelets. Wavelet theory can also be related to
common sense. Everyday experience teaches that the difference between two actions often
lies in small details. Finding the important details is difficult, since experience also shows
that focusing only on details leads to a tremendous waste of time and to unproductive work.
Finding the right balance between details and coarse features or actions is a highly human
activity that somehow finds its mathematical expression in wavelet theory.
Multiresolution analysis has become in the last few years a quite standard tool in signal
processing. The main applications of multiresolution analysis have so far been mostly in the
domain of image processing and speech recognition. The image processing community has
been using algorithms containing elements of multiresolution analysis for quite some
years already. Multiresolution has also been used successfully in speech processing and data
compression. Historically, one generally finds the roots of wavelet theory in the work of
Morlet, a Frenchman working for Elf-Aquitaine in the domain of oil research. Morlet
recognized the need for signal processing techniques for the analysis of seismic waves going
beyond Gabor analysis of short-time signals. Morlet modified the Gaussian window used by
Gabor. To overcome a drawback of Gabor's approach, namely the poor resolution
obtained at high frequencies due to the constant window size, Morlet used
variable-sized windows. Morlet coined the name wavelet, meaning little wave. Due to lack of
funding and interest by his company, no real-world applications appeared then. Grossmann,
another French scientist, heard of Morlet's work. Helped by his background in quantum
mechanics, he rapidly grasped the potential of Morlet's wavelet and contributed significantly
to further developments. In the 80's a further important development took place. Mallat, a
French scientist working at the time in New York, proposed an algorithm that very
considerably reduces the computing burden of a wavelet transform. After the discovery of
this algorithm, the close connection between the theory of subband coding,
quadrature filters and the fast wavelet decomposition was recognized. This has permitted the
unification of two seemingly disjoint fields: wavelet theory, which was essentially the domain of
mathematicians, and filter theory. Recently, wavelets of the second generation have appeared.
They are more flexible and make it possible to solve important problems, such as the
representation of a signal close to boundaries.
In recent years, new developments have shown the utility of wavelet theory and
multiresolution analysis in the domain of soft computing. The connection between wavelets
and neural networks was recognized, followed by the use of evolutionary algorithms in
connection with search algorithms. Very recently, important new applications of wavelet theory
have appeared in the domain of fuzzy techniques.
One of the main ideas of wavelet analysis is already contained in the short-time Fourier
transform, namely the decomposition of a signal on dilated and translated versions of a basis function.
The main difference between the wavelet transform and the Gabor transform is that the
time-frequency window of the Gabor transform is independent of the position and dilation of the
window, while for a wavelet transform the time-frequency window depends on the
dilation factor of the wavelet. At low frequencies, the time window is much larger than at
higher frequencies. This property is often very desirable in signal processing: it is often
necessary to analyze the high-frequency part of the signal with a good time resolution,
while a poorer time resolution for the low frequencies is not much of a problem in most
applications. Figure 4 compares the windows of the short-time Fourier transform to those of the
wavelet transform. We first give below a general definition of a wavelet.
Definition 1:
A function $\psi$ is called a wavelet if there exists a dual function $\tilde{\psi}$ such that a function
$f \in L^2(\mathbb{R})$ can be decomposed as
$$f(x) = \sum_{m,n=-\infty}^{\infty} \langle f, \tilde{\psi}_{m,n} \rangle \, \psi_{m,n}(x) \qquad (5)$$
The series representation of $f$ is called a wavelet series. The wavelet coefficients $d_{m,n}$ are given
by $d_{m,n} = \langle f, \tilde{\psi}_{m,n} \rangle$.
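Definition 2:
A function $\psi$ is called an orthogonal wavelet if the family $\{\psi_{m,n}\}$ forms an
orthonormal basis of $L^2(\mathbb{R})$ ($\langle \psi_{j,k}, \psi_{l,m} \rangle = \int_{-\infty}^{\infty} \psi_{j,k}(x)\,\psi_{l,m}(x)\,dx = \delta_{j,l}\,\delta_{k,m}$) and every
$f \in L^2(\mathbb{R})$ can be written as
$$f(x) = \sum_{m,n=-\infty}^{\infty} d_{m,n}\,\psi_{m,n}(x) \qquad (6)$$
The wavelet coefficients $d_{m,n}$ of an orthogonal wavelet are given by $d_{m,n} = \langle f, \psi_{m,n} \rangle$. This
follows from the fact that for an orthogonal wavelet the dual function $\tilde{\psi}$ is identical to the
wavelet $\psi$.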
Figure 4: Tiling of the time-frequency domain (axes: time, frequency). Left: short-time Fourier
transform. Right: wavelet transform. Below: example of dilated and translated Haar wavelets.
The definition of an orthogonal wavelet is quite similar to the definition of a Fourier series.
Actually, the only differences lie in the choice of the candidate functions used for the
projection and in the relaxation of the condition that the function must be periodic. In a
Fourier series, cosine and sine are used as basis functions together with the integer dilates of the
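two basis functions $\cos(2\pi t)$ and $\sin(2\pi t)$. In orthogonal wavelets, dilates and translates of a
single function are taken: $\psi_{m,n}(x) = 2^{m/2} \cdot \psi(2^m x - n)$ with $m, n \in \mathbb{Z}$. An example is the Haar wavelet:
$$\psi_H(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 < x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$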
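The orthonormality required in Definition 2 can be checked numerically for the Haar wavelet of eq. (7). The sketch below assumes the usual dyadic convention $\psi_{m,n}(x) = 2^{m/2}\psi_H(2^m x - n)$ and evaluates the inner products with a midpoint rule on a dyadic grid, which is exact for these piecewise constant functions; the chosen levels, translations and grid are arbitrary illustrative choices.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet psi_H of eq. (7)."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t > 0.5) & (t <= 1.0), -1.0, 0.0))

def psi(m, n, x):
    """Dilated and translated wavelet psi_{m,n}(x) = 2^(m/2) psi_H(2^m x - n)."""
    return 2.0 ** (m / 2) * haar(2.0 ** m * x - n)

# Midpoints of a dyadic grid on [-2, 2); they never fall on a breakpoint.
h = 2.0 ** -9
x = -2.0 + (np.arange(2048) + 0.5) * h

# A small sub-family: levels m = 0, 1, 2 and translations n = -1, 0, 1.
family = [(m, n) for m in range(3) for n in range(-1, 2)]
basis = np.array([psi(m, n, x) for m, n in family])

# Gram matrix of inner products <psi_{j,k}, psi_{l,m}>.
gram = basis @ basis.T * h
print(np.round(gram, 12))
```

The printed Gram matrix should be the identity: each wavelet has unit norm and distinct wavelets are orthogonal, whether they differ in scale or in position.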
The fast wavelet transform is, from the practical point of view, the most important algorithm in
multiresolution analysis. Contrary to the fast Fourier transform, which can be applied in most
cases without a deep knowledge of the algorithm, it is advisable to understand the fast
wavelet algorithm before using wavelets in an application.
The fast wavelet transform permits the efficient computation of the wavelet transform. At each level
of the transform, the data are processed through a low-pass and a high-pass filter. The
high-pass filtered data are known as the detail wavelet coefficients. The result of the low-pass
filtering is used as input data to compute the next level of detail wavelet coefficients.
In order to understand the fast wavelet transform algorithm, we first introduce a few new
concepts that represent the foundations of multiresolution analysis. Most textbooks follow
here a somewhat abstract approach. We simplify the formulation to convey the main
ideas of multiresolution analysis.
One of the most important concepts of multiresolution analysis is that of nested
spaces. Nested spaces are like Russian dolls: they fit nicely into each other, and each smaller
doll is contained in the larger ones. Figure 5 shows an example of nested spaces, together
with a representation of the complementary spaces $W_0$ and $W_{-1}$.
[Figure 5 diagram labels: V-1, V0, V1, W-1, W0]
Figure 5: Example of nested spaces: $V_{-1} \subset V_0 \subset V_1$. The space $W_{-1}$ is the complementary space
of $V_{-1}$ in $V_0$, with $V_0 = W_{-1} \oplus V_{-1}$. Similarly, $V_1 = W_0 \oplus V_0$.
The concept of nested spaces can be applied to spaces generated by linear combinations of a
function, say $\varphi$. We define $V_1$ as the space generated by $\varphi(2x)$ and its translates. The
space $V_1$ corresponds to all possible linear combinations of the functions $\varphi(2x - n)$:
$V_1: \{\varphi(2x - n)\}$.
Let us now consider a second space $V_0$, generated by the function $\varphi(x)$, dilated by a factor of 2
with respect to $\varphi(2x)$, and its integer translates: $V_0: \{\varphi(x - n)\}$.
Generally speaking, it follows from $V_0 \subset V_1$ that any function in $V_0$, and in particular $\varphi(x)$,
can be written as a linear combination of the functions generating $V_1$:
$$\varphi(x) = \sum_n g_n \, \varphi(2x - n)$$
The space $W_0$ is the complement of the space $V_0$ in $V_1$. Following the same line of thought as
previously, we have $W_0 \subset V_1$. It follows that any function $\psi$ in $W_0$ can be written as a linear
combination of the basis functions of $V_1$:
$$\psi(x) = \sum_n h_n \, \varphi(2x - n)$$
The two equations above are the so-called dilation equations or two-scale relations. These
equations are central to multiresolution analysis. They permit the reconstruction of a signal
starting from the wavelet coefficients (or detail coefficients) and the lowest level of
approximation coefficients.
Figure 6: Illustration of the relation $\varphi(x) = \sum_n g_n \, \varphi(2x - n)$ for the second order B-spline. The
triangular spline function can be decomposed into a sum of translated triangular functions
at the next higher level of resolution.
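The two-scale relation for the triangular spline can be verified numerically. The sketch assumes a hat function centered at the origin, for which the coefficients are $g_{-1} = 1/2$, $g_0 = 1$, $g_1 = 1/2$; for a spline supported on $[0, 2]$ the same values apply with the indices shifted by one. The grid is an arbitrary choice.

```python
import numpy as np

def hat(x):
    """Second order B-spline (triangular function) centered at 0 (assumed convention)."""
    return np.maximum(0.0, 1.0 - np.abs(x))

# Two-scale coefficients g_n of the centered hat function.
g = {-1: 0.5, 0: 1.0, 1: 0.5}

x = np.linspace(-2.0, 2.0, 801)
# Right-hand side of phi(x) = sum_n g_n * phi(2x - n).
rhs = sum(g_n * hat(2.0 * x - n) for n, g_n in g.items())
max_err = np.max(np.abs(hat(x) - rhs))
print(max_err)
```

The maximal deviation is at the level of rounding errors: the coarse triangle is exactly the weighted sum of three triangles of half the width.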
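[Figure: diagram of the fast wavelet decomposition. Legend: $d_{m,n}$, wavelet coefficients or detail coefficients at the m-th level of decomposition; $c_{m,n}$, approximation coefficients or coarse approximation coefficients at the m-th level of decomposition. The high-pass filter outputs are labelled $d_{m-1,n}$, $d_{m-2,n}$, $d_{m-3,n}$. Caption not recovered.]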
The decomposition algorithm uses the following iterative decomposition relations to compute
the approximation and detail coefficients at the next lower level of resolution:
$$c_{m-1,n} = \sum_r g_{r-2n} \cdot c_{m,r}, \qquad d_{m-1,n} = \sum_r h_{r-2n} \cdot c_{m,r}$$
with the filter coefficients $g_{r-2n}$ corresponding to the low-pass filter coefficients and
$h_{r-2n}$ to the high-pass filter coefficients.
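A minimal sketch of the decomposition and reconstruction cascade is given below. It assumes the standard form of the decomposition relations, $c_{m-1,n} = \sum_r g_{r-2n} c_{m,r}$ and $d_{m-1,n} = \sum_r h_{r-2n} c_{m,r}$, uses the Haar filters as the simplest orthogonal example and treats the signal as periodic; the function names and the test signal are illustrative.

```python
import numpy as np

# Haar filters, chosen only as the simplest orthogonal example (assumption).
G = np.array([1.0, 1.0]) / np.sqrt(2.0)    # low-pass filter g
H = np.array([1.0, -1.0]) / np.sqrt(2.0)   # high-pass filter h

def analysis_step(c, g=G, h=H):
    """One level: c_{m-1,n} = sum_r g_{r-2n} c_{m,r}, d_{m-1,n} = sum_r h_{r-2n} c_{m,r}."""
    c = np.asarray(c, dtype=float)
    half = len(c) // 2
    approx, detail = np.zeros(half), np.zeros(half)
    for n in range(half):
        for k in range(len(g)):            # k stands for r - 2n
            r = (2 * n + k) % len(c)       # periodic boundary handling
            approx[n] += g[k] * c[r]
            detail[n] += h[k] * c[r]
    return approx, detail

def synthesis_step(approx, detail, g=G, h=H):
    """Inverse step, valid for orthogonal filters."""
    out = np.zeros(2 * len(approx))
    for n in range(len(approx)):
        for k in range(len(g)):
            out[(2 * n + k) % len(out)] += g[k] * approx[n] + h[k] * detail[n]
    return out

def decompose(signal, levels):
    """Iterate the analysis step on the low-pass output."""
    c, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        c, d = analysis_step(c)
        details.append(d)
    return c, details

def reconstruct(c, details):
    """Rebuild the signal from the coarsest approximation and all details."""
    for d in reversed(details):
        c = synthesis_step(c, d)
    return c

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
coarse, details = decompose(signal, levels=3)
restored = reconstruct(coarse, details)
print(coarse, restored)
```

Only the low-pass output is processed again at each level, so the total cost stays proportional to the signal length. For biorthogonal wavelets, the synthesis step would use a second pair of filters instead of reusing `G` and `H`.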
1.3 Determining an appropriate local resolution for the membership functions in a fuzzy
system.
The problem of finding a good description of a control surface in terms of the scaling
functions can be solved by using the larger coefficients among the reconstructed coefficients
$c'_{m,n}$ and the lowest-level approximation coefficients. The coefficients $c'_{m,n}$ correspond to
thresholded coefficients of the scaling function after reconstruction with the filter coefficients
$q_n$. This procedure is illustrated in figure 8.
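$$c'_{m,n} = \begin{cases} 0 & \text{if } \left| \sum_r q_{n-2r} \cdot d_{m-1,r} \right| < T \\ \sum_r q_{n-2r} \cdot d_{m-1,r} & \text{if } \left| \sum_r q_{n-2r} \cdot d_{m-1,r} \right| \ge T \end{cases}$$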
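The thresholding rule can be sketched in a few lines. The example assumes that the magnitude of the reconstructed value is compared with the threshold $T$, uses the Haar high-pass reconstruction filter for $q_n$ and treats the coefficients as periodic; the filter, the detail coefficients and the threshold are illustrative values only.

```python
import numpy as np

# Haar high-pass reconstruction filter, used here as a stand-in for q_n (assumption).
Q = np.array([1.0, -1.0]) / np.sqrt(2.0)

def reconstruct_details(d, q=Q):
    """Compute sum_r q_{n-2r} * d_{m-1,r} for every n (periodic boundary)."""
    out = np.zeros(2 * len(d))
    for r, d_r in enumerate(d):
        for k, q_k in enumerate(q):        # k stands for n - 2r
            out[(2 * r + k) % len(out)] += q_k * d_r
    return out

def threshold(values, T):
    """Keep a reconstructed coefficient only if its magnitude reaches T."""
    return np.where(np.abs(values) >= T, values, 0.0)

d_lower = np.array([0.1, 2.0, -0.05, 0.0])   # hypothetical detail coefficients d_{m-1,r}
c_prime = threshold(reconstruct_details(d_lower), T=0.5)
print(c_prime)
```

Only the two coefficients generated by the large detail value survive, which indicates where a membership function of higher resolution is needed locally.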
A matching pursuit algorithm, similar to the one in [10], may be used to decompose the signal
as a sum of scaling functions. For low order splines, we have used a slightly modified
matching pursuit, which is described below. The algorithm is especially efficient if the
original signal f(x) is a sum of splines with a few large coefficients.
Figure 9: Appropriate membership functions and rules to describe a set of data may be
determined with a matching pursuit algorithm. (Diagram label: Residue.)
Define a dictionary $D = \{\Phi^k_{m,n}\}$ of spline scaling functions, with the index $k$ denoting the order of
the spline function, $m$ the dilation and $n$ the translation. (The functions $\Phi^k_{m,n}$ fulfill the
following condition: $\sum_n \Phi^k_{m,n}(x) = 1, \ \forall k, m$.)
1) For each scaling function in the dictionary compute the coefficients $c^k_{m,n} = \langle \Phi^k_{m,n}(x), f(x) \rangle$.
The condition $|c^k_{m,n}| > \beta \cdot \sup_{m',n'} |c^k_{m',n'}|$ with $0 < \beta \le 1$ ensures the convergence of the matching
pursuit [10]. Further, the algorithm tries to locate single large coefficients. This is done by
choosing a value of $\beta$ that is not too high and by keeping the lowest-resolution coefficients
among the ones satisfying the above condition.
Figure 10 illustrates the algorithm with a simple example: the decomposition of a second
order spline function with a semi-orthogonal spline construction, using a dictionary of second
order splines. A value of $\beta$ above 0.68 restricts the best matching coefficients to the bold
coefficients. The coefficient corresponding to the scaling function with the lowest resolution
is then chosen (underlined). In this example, the algorithm furnishes the best matching
function after a single iteration.
Coefficients:
0 0.5 1 0.5 0
0 1 0
Figure 10: Illustration of the search algorithm with a modified matching pursuit algorithm
for a second order spline using a dictionary of second order splines.
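The selection rule of the modified matching pursuit (keep the coefficients within a factor $\beta$ of the largest one, then prefer the lowest resolution) can be sketched as follows. This is only one plausible reading: the dictionary consists of sampled, unit-norm triangular splines, plain inner products replace the semi-orthogonal construction, and the selected function is subtracted from the residue at each iteration. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def hat(t):
    """Second order B-spline (triangular function) centered at 0."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def build_dictionary(x, levels, span):
    """Sampled, unit-norm scaling functions phi(2^m x - n), labelled by (m, n)."""
    atoms, labels = [], []
    for m in levels:
        for n in range(span * 2 ** m + 1):
            a = hat(2.0 ** m * x - n)
            atoms.append(a / np.linalg.norm(a))
            labels.append((m, n))
    return np.array(atoms), labels

def modified_matching_pursuit(f, atoms, labels, beta=0.8, n_iter=5, tol=1e-10):
    residue = np.asarray(f, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        c = atoms @ residue                       # coefficients <Phi_{m,n}, residue>
        c_max = np.max(np.abs(c))
        if c_max < tol:
            break
        # Candidates fulfilling |c| >= beta * sup|c| ...
        candidates = np.flatnonzero(np.abs(c) >= beta * c_max)
        # ... among which the lowest resolution (smallest m) is preferred.
        best = min(candidates, key=lambda i: (labels[i][0], -abs(c[i])))
        picks.append((labels[best], float(c[best])))
        residue = residue - c[best] * atoms[best]
    return picks, residue

x = np.linspace(0.0, 4.0, 513)
atoms, labels = build_dictionary(x, levels=(0, 1, 2), span=4)
f = 2.0 * hat(x - 2.0)          # a single coarse spline, as in the example of Figure 10
picks, residue = modified_matching_pursuit(f, atoms, labels)
print(picks, np.linalg.norm(residue))
```

With $\beta = 0.8$, the finer spline centered at the same position also passes the test, but the coarse function is chosen because of the preference for the lowest resolution, and the residue vanishes after a single iteration.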
Preprocessing with orthogonal wavelets can be used in connection with fuzzy logic to express
fuzzy rules in the frequency domain. Multiresolution analysis considerably simplifies, for
instance, the design of a fuzzy controller containing rules in the frequency domain of the
form
[Figure 11 plots, titled "Daubechies Filter Trees": $|T_m(\omega)|^2$ (0 to 1) against frequency in Hz (0 to 16), with high-pass and low-pass curves, shown for two filter trees.]
Figure 11: Fuzzy rules in the frequency domain can be designed with a lot of flexibility using
wavelet-packet techniques. The membership functions are determined by the filter tree used
for the multiresolution. The bottom part shows how to introduce a new membership function to
reach a better resolution (arrows).
The method uses the fact that a multiresolution analysis with orthogonal wavelets corresponds
to the iterative filtering of the signal with quadrature filters [2]. An orthogonal wavelet
decomposition therefore fulfills the power complementarity condition. This means that the
frequency windows $T_m$ corresponding to the first $p$ high-pass filters and the window $T_{low}$ of the
$p$th low-pass filter satisfy the condition:
$$\sum_{m=1 \ldots p} |T_m(\omega)|^2 + |T_{low}(\omega)|^2 = 1 \quad \forall \omega. \qquad (15)$$
The degree of membership $\mu(T_m)$ to the frequency window $T_m$ is computed from the wavelet coefficients:
$$\mu(T_m) = \frac{\sum_n (d_{m,n})^2}{\sum_{m'=1}^{p} \sum_n (d_{m',n})^2 + \sum_n (c_{low,n})^2} \qquad (16)$$
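Eq. (16) amounts to computing the fraction of the signal energy found in each frequency band. The sketch below uses a Haar decomposition as the orthogonal wavelet (an assumption made for brevity; Figure 11 refers to Daubechies filters) and a hypothetical test signal with a period of four samples.

```python
import numpy as np

def haar_levels(signal, p):
    """p levels of an orthonormal Haar decomposition (signal length divisible by 2^p)."""
    c, details = np.asarray(signal, dtype=float), []
    for _ in range(p):
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # high-pass output d_{m,n}
        c = (even + odd) / np.sqrt(2.0)               # low-pass output
    return details, c

def band_memberships(signal, p):
    """Degrees of membership mu(T_m) of eq. (16), plus the low-pass energy fraction."""
    details, c_low = haar_levels(signal, p)
    energies = np.array([np.sum(d ** 2) for d in details])
    low_energy = np.sum(c_low ** 2)
    total = energies.sum() + low_energy
    return energies / total, low_energy / total

t = np.arange(64)
signal = np.sin(2.0 * np.pi * t / 4.0)     # hypothetical pulsating signal
mu, mu_low = band_memberships(signal, p=3)
print(mu, mu_low)
```

Because the decomposition is orthogonal, the memberships and the low-pass fraction add up to one, which is the counterpart of the power complementarity condition (15) on the coefficient side. The values can be fed directly into fuzzy rules formulated on frequency bands.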
The main advantage of this method resides in its simplicity. In the Fourier domain, the same
result would be obtained by multiplying the power spectra obtained after Fourier transforming
the signal by |Tm(ω)|2 . The Fourier approach is more demanding in computing power than the
wavelet approach. From the industrial point of view, the wavelet approach is extremely
interesting. The wavelet coefficients may also be used for other purposes, such as signal
denoising or offset removal. Many different functions can thus be carried out with reasonable
computing power. This is very important in speed-, price- or current-critical applications.
In the first part, we have addressed the problem of developing a fuzzy controller from
experimental data. We have considered datapoints on a regular grid and presented methods to
find locally appropriate membership functions and rules to describe the underlying control
surface. Learning was off-line and the experimental data were stored in a databank. In this
second part, we address the on-line learning problem. We present 3 different learning methods
based on biorthogonal wavelet networks [6-9], perceptrons and estimators. We consider the
case in which the signal processor is capable of making some computations but has too little
memory to store many datapoints. This case is quite often encountered in on-line problems,
for instance in sensorics. Under these conditions, most cross-validation methods are not
implementable (for reviews on wavelet-based estimators see [11,12]), and a very simple
cross-validation technique must be used. We propose two related approaches based on a validation
procedure using either the reconstruction algorithm or the fast wavelet decomposition
algorithm. In the first case, the local estimation of the underlying curve in the space spanned
by the scaling function at a given level of resolution must be consistent with an estimation in
the space spanned by the scaling function and the biorthogonal wavelets at one level of
resolution lower. The second cross-validation method compares the estimation of the surface
at one level of resolution with the estimation at one lower level of resolution. This is done by
decomposing the approximation coefficients with the low-pass filter associated with the fast
wavelet decomposition algorithm.
The similarities between the structure of a perceptron and a wavelet decomposition
have been used in so-called wavelet networks. We discuss here the simple one-dimensional
case and compare the structures of the perceptron and of the wavelet decomposition.
In several dimensions, the formalism is essentially the same; only the notation is slightly
more complicated.
Perceptron
$$f(x) = \sum_{i=1}^{k} \omega_i \cdot \psi(a_i \cdot x + b_i) \qquad (17)$$
with $\psi$ the activation function and $a_i$, $b_i$, $\omega_i$ the network parameters (weights) that are
optimized during learning. One possibility is to use a wavelet as activation function. In this case
the perceptron is described as a wavelet network (or sometimes a wavenet). Depending on the
problem, different approaches can be taken. The dilation, translation and weights can all be
optimized by the network. In this case, the parameters are generally obtained from a gradient
descent or through least mean squares. If the network is properly initialized, then the network
can be quite parsimonious. An interesting alternative consists of using a dictionary of dyadic
wavelets and optimizing only the weights $\omega_i$.
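The last alternative can be sketched with a fixed dictionary of dyadic Haar wavelets on $[0, 1)$ in which only the weights are adapted, one datapoint at a time, by a stochastic gradient step. The target weights, the learning rate and the number of steps are illustrative assumptions; no datapoint is stored.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet (value at the breakpoints chosen for convenience)."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def dictionary(x, max_level=2):
    """Values at x of psi_{m,n}(x) = 2^(m/2) psi(2^m x - n), m = 0..max_level, n = 0..2^m - 1."""
    return np.array([2.0 ** (m / 2) * haar(2.0 ** m * x - n)
                     for m in range(max_level + 1) for n in range(2 ** m)])

rng = np.random.default_rng(0)
true_w = np.array([1.0, -0.5, 0.25, 0.0, 0.3, -0.2, 0.1])   # hypothetical target weights

def target(x):
    """Function to be learned: a finite wavelet series."""
    return true_w @ dictionary(x)

w = np.zeros_like(true_w)      # network weights, the only adapted parameters
lr = 0.05                      # learning rate (stable below 2/7 for this dictionary)
for k in range(2000):
    x_k = rng.uniform(0.0, 1.0)            # i.i.d. input point
    psi_k = dictionary(x_k)
    error = w @ psi_k - target(x_k)        # network output minus datapoint y_k
    w -= lr * error * psi_k                # gradient step on the squared error
print(np.round(w, 6))
```

Since the dilations and translations are fixed, the error is quadratic in the weights and the gradient descent cannot be trapped in a local minimum; the weights converge to the wavelet coefficients of the target function.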
[Figure 12 diagram labels: inputs $x_2$, ..., $x_n$; nodes $\psi(a_i \cdot x + b_i)$ with wavelets as activation functions; output $f(x) = \sum_{i=1}^{k} \omega_i \cdot \psi(a_i \cdot x + b_i)$]
Figure 12: The structure of a wavelet network is very often that of the perceptron.
Wavelet network
In its simplest version, a wavelet network corresponds to a 3-layer perceptron using wavelets
as activation functions:
$$f(x) = \sum_{m,n} d_{m,n} \cdot \psi_{m,n}(x) + \bar{f} \qquad (18)$$
with $\bar{f}$ the average value of $f$, $d_{m,n}$ the coefficients of the neural network and $\psi$ the wavelet.
If orthogonal wavelets are taken and only the weights are optimized, then a simple gradient
descent leads to a global minimum under the conditions described below. Let us assume a
function $f(x) = \sum_{m,n} d_{m,n} \cdot \psi_{m,n}(x)$ (with a finite number of coefficients). At each step, a new
datapoint $(y_k, x_k)$, satisfying $f(x_k) = y_k$, is furnished to the network. The error $E(k)$ is given by
$$E(k) = \Big(\hat{f}(x) - \sum_{m,n} d_{m,n} \cdot \psi_{m,n}(x)\Big)^2 = \sum_{m,n} \big((\hat{d}_{m,n} - d_{m,n}) \cdot \psi_{m,n}(x)\big)^2 .$$
The non-diagonal (cross) terms vanish due to orthogonality. Taking the derivative with respect to $\hat{d}_{m,n}$, one obtains:
$$\frac{\partial E(k)}{\partial \hat{d}_{m,n}} = 2\,(\hat{d}_{m,n} - d_{m,n}) \cdot \psi_{m,n}(x)^2$$
Fuzzy wavenets
Fuzzy wavenets are wavelet networks based on wavelets with some special properties: the
scaling function associated with these wavelets must be symmetric, everywhere positive and
have a single maximum. Under these conditions, the scaling functions can be interpreted as
membership functions.
Figure 13: The most popular wavelet networks are based on the perceptron structure. Fuzzy
wavelet networks, also called fuzzy wavenets, can be regarded as a neurofuzzy model which
belongs at the same time to the set of wavelet networks. (Diagram labels: Wavelet network, Perceptron.)
2.2 Fuzzy-wavenets
Figure 14 shows the architecture of the learning algorithm. It consists of a series of neural
networks, using both wavelets $\psi_{m,n}(x)$ and scaling functions $\varphi_{m,n}(x)$ as activation functions.
Each neural network uses activation functions of a given resolution. The $m$th neural network
optimizes the coefficients $\hat{c}_{m,n}$ and $\hat{d}_{m,n}$, with $f_m(x)$ the output of the $m$th neural network.
The structure of the network is similar to that of wavenets [5,6]. The main difference is that the
method is generalized to biorthogonal wavelets. The motivation is that if spline wavelets or
radial wavelets are taken, then it is straightforward to transform the results into a linguistically
interpretable fuzzy system (see part 1).
The evolution equations for the detail coefficients $\hat{d}_{m,n}(k)$ and the approximation coefficients
$\hat{c}_{m,n}(k)$ at step $k$ are given by
$$\hat{d}_{m,n}(k) = \hat{d}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\psi}_{m,n}(x) \qquad (22)$$
$$\hat{c}_{m,n}(k) = \hat{c}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\varphi}_{m,n}(x) \qquad (23)$$
with $y_k(x)$ the $k$th input point, $LR$ the learning rate, and $\tilde{\psi}_{m,n}(x)$, $\tilde{\varphi}_{m,n}(x)$ the dual functions
to $\psi_{m,n}(x)$ and $\varphi_{m,n}(x)$. Eqs. (22, 23) describe the evolution of $f_m(x)$. Let us
assume that the datapoints $y_k$ lie on the function $f(x)$ and that the $x_k$ are i.i.d. (uniform
distribution). At each step, the function $f_m(x)$ is updated by a term whose expectation is
proportional to the difference between $f_m(x)$ and the projection of $f(x)$ on the space $V_{m+1}$,
spanned by $\{\psi_{m,n}(x), \varphi_{m,n}(x)\}$:
$$\sum_n \left[ \langle f(x) - f_m(x), \tilde{\varphi}_{m,n}(x) \rangle \cdot \varphi_{m,n}(x) + \langle f(x) - f_m(x), \tilde{\psi}_{m,n}(x) \rangle \cdot \psi_{m,n}(x) \right] = \sum_n \left[ \langle f(x), \tilde{\varphi}_{m,n}(x) \rangle \cdot \varphi_{m,n}(x) + \langle f(x), \tilde{\psi}_{m,n}(x) \rangle \cdot \psi_{m,n}(x) \right] - f_m(x)$$
In the adiabatic sense, the expectation of the function $f_m(x)$ converges to the projection of $f(x)$
on the space $V_{m+1}$. Since $\psi_{m,n}(x)$, $\varphi_{m,n}(x)$ are independent, it follows that $\hat{c}_{m,n} \to c_{m,n}$
and $\hat{d}_{m,n} \to d_{m,n}$. A simple local validation criterion for an approximation coefficient
$\hat{c}_{m,n}$ is to require that this coefficient can be approximated from the approximation and
detail coefficients $\{\hat{c}_{m-1,n}, \hat{d}_{m-1,n}\}$ at one lower level of resolution. At each iteration step,
the weights from the different networks are cross-validated using a property of the wavelet
decomposition, namely that the approximation coefficients $\hat{c}_{m,n}$ at level $m$ can be computed
from the approximation and wavelet coefficients at level $m-1$ using the reconstruction
algorithm [10].
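[Figure 14 diagram labels: Input, ...; validation test $\left| \hat{c}_{2,n} - \sum_r \left( p_{n-2r} \cdot \hat{c}_{1,r} + q_{n-2r} \cdot \hat{d}_{1,r} \right) \right| \le \Delta$]
In order for a coefficient to be validated, the difference between the weight of the membership
function (model $m$) and the weight computed from the approximation and wavelet
coefficients at one level of resolution lower (model $m-1$) must be smaller than a given
threshold. As validation criterion for the coefficient $\hat{c}_{m,n}$, we require
$$\left| \hat{c}_{m,n} - \sum_r \left( p_{n-2r} \cdot \hat{c}_{m-1,r} + q_{n-2r} \cdot \hat{d}_{m-1,r} \right) \right| \le \Delta$$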
Figure 15 shows an example of this strategy with i.i.d. datapoints. The biorthogonal spline
wavelets and scaling functions (with $p = 2$, $\tilde{p} = 4$) proposed by
Cohen et al. [9] are used as activation functions. The model consists of an array of 3 neural
networks, each corresponding to a different resolution. The most appropriate membership
functions and rules are chosen adaptively during learning. With only a few points, little
is known about the control surface, which is then better described with a
small number of rules. As the number of points increases, the number of rules is raised if
necessary. The method thus provides an automatic procedure to determine adaptively the "best"
membership functions and rules. The decision on which coefficient to use is obtained from the
validation eq.(12) ($\Delta = 0.1$). The "best" coefficients are chosen adaptively among the set of
validated coefficients: the validated coefficients corresponding locally to the highest
resolution are kept (default coefficient = average value).
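The selection rule can be sketched as follows, under a strong simplification: each coefficient is treated as covering one dyadic cell, so a coefficient at one level covers exactly two cells of the next finer level. Spline supports overlap in practice, so this is only a schematic view. The function name `select_levels` and the convention of returning -1 for "use the default coefficient" are assumptions of the sketch.

```python
def select_levels(valid_masks):
    """Pick, for each cell of the finest grid, the highest validated level.

    valid_masks[i] is a list of booleans for level i, ordered from coarse
    to fine, whose length doubles from one level to the next.
    Returns one level index per finest cell, or -1 if no level is validated
    there (the default coefficient, e.g. the average value, is then used).
    """
    n_levels = len(valid_masks)
    n_cells = len(valid_masks[-1])
    chosen = []
    for cell in range(n_cells):
        best = -1
        for level, mask in enumerate(valid_masks):
            # Index of the coefficient at this level that covers the cell.
            idx = cell >> (n_levels - 1 - level)
            if mask[idx]:
                best = level  # later (finer) validated levels overwrite coarser ones
        chosen.append(best)
    return chosen
```

In this form, a region with few datapoints falls back to a coarse level or to the default value, while a well-sampled region is described at the finest validated resolution.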
Figure 15: Input function and output of the fuzzy-wavenet after 60 steps (above).
Below: output of the 3 neural networks at step 60.
The fuzzy wavenet method converges rather slowly, since for stability reasons it requires
a small learning rate in comparison to a perceptron. For this reason,
one may consider another approach using only scaling functions. The basic structure of the
network is similar to that of the fuzzy wavenets, except that the $m$th neural network optimizes the
coefficients $\hat{c}_{m,n}$, with $f_m(x)$ the output of the $m$th neural network.
The validation procedure uses the decomposition algorithm to compare the results at two
levels of resolution,
with $g$ the coefficients of the lowpass decomposition filter in the fast
wavelet decomposition algorithm. The validation criterion for $\hat{c}_{m,n}$ is then
$$f(x) = \frac{\sum_{k=1}^{k_{max}} K((x - x_k)/\lambda) \cdot y_k}{\sum_{k=1}^{k_{max}} K((x - x_k)/\lambda)} \qquad (30)$$
Nadaraya-Watson estimators have two interesting properties: they are local mean-square
estimators, and in the case of a random design they can be shown to be Bayesian estimators of
$(x_k, y_k)$, in which the $(x_k, y_k)$ are i.i.d. copies of a continuous random variable $(X, Y)$. (In order to
simplify the formalism, and without loss of generality, we have used 1-dimensional
estimators.)
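A minimal sketch of the estimator of eq.(30) is given below. The kernel is passed in as a function; the hat function used in the usage note is only a convenient stand-in and is not prescribed by the text. Returning NaN where no datapoint falls inside the kernel support is a choice of this sketch.

```python
import numpy as np

def nadaraya_watson(x, xk, yk, lam, kernel):
    """Kernel-weighted average of yk at the points x, following eq.(30).

    x      : scalar or array of evaluation points
    xk, yk : arrays of datapoints
    lam    : bandwidth (lambda in eq.(30))
    kernel : function K applied elementwise
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    w = kernel((x[:, None] - xk[None, :]) / lam)   # weights K((x - x_k) / lambda)
    num = w @ yk
    den = w.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        # NaN where no datapoint contributes (denominator equal to zero).
        return np.where(den > 0, num / den, np.nan)

def hat(t):
    """Hat (linear B-spline) kernel with support [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(t))
```

For example, `nadaraya_watson([0.5, 1.0], [0, 1, 2], [0, 2, 4], 1.0, hat)` averages the two neighbouring values at 0.5 and reproduces the datapoint at 1.0.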
The spline functions $\varphi(x)$ and their duals $\tilde{\varphi}(x)$ can be used as estimators. Let us first use the
function $\tilde{\varphi}(x)$ to estimate $f(x)$ with $\lambda = 2^{-m}$ ($m$ integer) at $x_n$ with $x_n \cdot 2^m \in \mathbb{Z}$:
Using the symmetry of $\tilde{\varphi}(x)$, eq.(26) for the dual spline is equivalent to using estimators
centered at $x_n$.
$$\hat{f}(x_n) = \frac{\sum_{k=1}^{k_{max}} \tilde{\varphi}((x_k - x_n) \cdot 2^m) \cdot y_k}{\sum_{k=1}^{k_{max}} \tilde{\varphi}((x_k - x_n) \cdot 2^m)} \qquad (31)$$
[Figure 16 panels: datapoints; estimation on regular grid; projection on dual splines]
Figure 16: Multiresolution spline estimators use dual spline estimators based on the functions
$\tilde{\varphi}_{m,n}(x)$ to estimate the coefficients $\hat{c}_{m,n}$ in $f_m(x) = \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x)$.
In order to validate the coefficient $\hat{c}_{m,n}$, two validation conditions are necessary:
$$\sum_{k=1}^{k_{max}} \tilde{\varphi}((x_k - x_n) \cdot 2^m) > T \qquad (34)$$
The strength of this approach is that the computation of a coefficient $\hat{c}_{m,n}$ requires only the
storage of two values: the numerator and the denominator in eq.(31). The method is therefore
well suited to on-line learning on low-end microprocessors with little memory.
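The two-value storage can be sketched as follows. The class and method names are hypothetical, and the kernel is passed in as a function: the hat function mentioned in the usage note is only a stand-in for the dual spline, which is not reproduced here. Only the data-density condition of eq.(34) is implemented.

```python
class OnlineCoefficient:
    """Running estimate of one coefficient at grid point x_n and level m, after eq.(31)."""

    def __init__(self, x_n, m, kernel):
        self.x_n = x_n
        self.m = m
        self.kernel = kernel   # stand-in for the dual spline
        self.num = 0.0         # running numerator of eq.(31)
        self.den = 0.0         # running denominator of eq.(31)

    def update(self, x_k, y_k):
        """Add one datapoint; only the two sums are stored, not the data."""
        w = self.kernel((x_k - self.x_n) * 2.0 ** self.m)
        self.num += w * y_k
        self.den += w

    def estimate(self):
        """Current estimate, or None if no datapoint has contributed yet."""
        if self.den == 0.0:
            return None
        return self.num / self.den

    def enough_data(self, threshold):
        """Condition of eq.(34): the accumulated weight must exceed T."""
        return self.den > threshold
```

With a hat kernel `max(0.0, 1.0 - abs(t))`, a datapoint located at the grid point contributes with weight 1, and datapoints outside the kernel support leave both sums unchanged.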
Conclusions
We have presented a number of methods to determine a fuzzy system from data. The
membership functions are chosen adaptively with multiresolution algorithms. These methods
have been used with success during the development of several fire detectors, and wavelet
techniques in combination with fuzzy logic are implemented in commercial products.
Fuzzy-wavelet methods represent a good compromise between quality of modeling and linguistic
transparency of the fuzzy rules. For on-line learning, the tables below summarize the different
methods described in the previous chapters.
*: For a small learning rate LR, the expected error on the coefficient $c_{m,n}$ is bounded by a value $\varepsilon(LR)$ as the
number of datapoints tends to infinity, provided the datapoints are i.i.d. (independent, identically distributed)
copies of a random variable X with uniform distribution.
References
[1] Thuillard, M. (1999) New Results on the flames' pulsation mechanisms permit to improve
the quality of detection of pool fires, Proc. Fire Suppression and Detection Research
Application Symposium, Feb. 24-26, 1999 (Orlando), Ed. Fire Protection Research
Foundation, 171-189.
[2] Thuillard, M. (1998) Fuzzy-wavelets: theory and applications, Proc. EUFIT'98, Sixth
European Congress on Intelligent Techniques and Soft Computing, Sept. 8-10, 1998 (Aachen),
Ed. H.-J. Zimmermann, Mainz Verlag, Vol. 2, 1149.
[3] Micchelli, C. A., Rabut, C., and Utreras, F. I. (1991) Using the Refinement Equation for the
Construction of Pre-Wavelets III: Elliptic Splines, Num. Algorithms 1, 331-352.
[4] Takagi, T. and Sugeno, M. (1985) Fuzzy identification of systems and its applications to
modeling and control, IEEE Trans. Syst. Man, Cybern. 15, 116-132.
[6] Bakshi, B.B. and Stephanopoulos, G. (1992) Wavelets as Basis Functions for Localized
Learning in a Multi-Resolution Hierarchy, IJCNN Int. Joint Conf. on Neural Networks, vol.
1, IEEE, Baltimore, II-141.
[7] Zhang, Q. and Benveniste, A. (1992) Wavelet Network, IEEE Trans. Neural Networks 3,
889.
[8] Szu, H. H., Telfer, B., and Kadambe, S. (1992) Neural Network Adaptive Wavelets for
Signal Representation and Classification, Opt. Eng. 31, 1907.
[9] Cohen, A., Daubechies, I., and Feauveau, J.-C. (1992) Biorthogonal Bases of Compactly
Supported Wavelets, Commun. on Pure and Appl. Math. 45, 485.
[10] Mallat, S. (1998) A Wavelet Tour of Signal Processing, Academic Press (San Diego).
[12] Abramovich, F., Sapatinas, T., and Silverman, B. W. (2000) Stochastic expansions in an
overcomplete wavelet dictionary, Probability Theory & Related Fields, to appear.