Fuzzy Logic in The Wavelet Framework

“Fuzzy logic in the wavelet framework,” M. Thuillard, Proc. Toolmet’2000, Tool Environments and
Development Methods for Intelligent Systems, April 13-14, 2000, L. Yliniemi, E. Juuso (eds.), Oulu,
15-36 (2000). For a complete exposition see “Wavelets in Soft Computing”, M. Thuillard, World
Scientific Press (2001).
FUZZY LOGIC IN THE WAVELET FRAMEWORK
Marc Thuillard
Siemens Building Technologies
Alte Landstrasse
CH-8708 Maennedorf
Switzerland
mailto:Marc.Thuillard@bluemail.ch
Abstract: Translating the knowledge contained in a databank into linguistically interpretable
fuzzy rules has proven difficult in real applications. Multiresolution techniques furnish a
solution to this problem. A dictionary of functions forming a multiresolution is used as a set
of candidate membership functions. The membership functions are chosen among the family of
scaling functions that are symmetric, everywhere positive and have a single maximum. This
family includes, among others, splines and some radial functions. The main advantage of using
a dictionary of membership functions is that each term, such as "small" or "large", is well
defined beforehand and is not modified during learning. After reviewing the connection between
a Takagi-Sugeno fuzzy model and spline modeling, we show how a multiresolution fuzzy system
can be developed from data by using wavelet techniques. For regularly spaced datapoints, a
matching pursuit algorithm may be used to determine appropriate fuzzy rules and membership
functions. For on-line problems, biorthogonal spline wavenets are used to determine the fuzzy
rules and the resolution of the membership functions. An alternative technique, based on
wavelet estimators, is also presented. Multiresolution fuzzy techniques, also known as
"fuzzy-wavelet" techniques, have found applications in fire detection. For instance, wavelet
analysis has been combined with fuzzy logic in flame detectors for on-line signal processing.
The resulting algorithms have greatly helped to translate a new understanding of flame
dynamics into algorithms that are capable of discriminating between a real fire and possible
interferences, such as those caused by the sun's radiation.
Keywords: wavelet, fuzzy, neurofuzzy, spline, wavenet, estimator, learning, perceptron, validation
INTRODUCTION
Over the last five years, we have developed a number of techniques combining wavelet theory
and soft computing. (Soft computing is generally defined by its techniques: neural networks,
fuzzy logic, uncertainty modeling and evolutionary computing.) These techniques have been
successfully applied in the domain of fire detection. For instance, the combination of fuzzy
logic and multiresolution analysis has led to new algorithms that permit a better and more
reliable detection of fires with multisensor fire detectors. Wavelet analysis also plays a
central role in our latest generation of flame detectors. The detection of a fire with a flame
_____________________________________________
*Email: Marc.thuillard@CCH.CERBERUS.CH
detector represents a difficult problem. The radiation from a flame is typically one hundred
times smaller than the measured radiation of the sun in the considered optical bands, and the
reaction time of the flame detector must be of the order of a few seconds. A flame detector
detects a fire by measuring the pulsating radiation of a flame. A model predicting the
pulsation frequency was developed and compared to the outcome of numerous experiments
[1]. A number of expert rules that characterize the spectra of flame radiation were found.
These rules are used on-line by combining fuzzy techniques and wavelet analysis [2]. With
these new algorithms, the detector is capable of recognizing the fingerprint of a flame and of
excluding possible false alarms due, for instance, to the flickering of sunlight reflected on a
water surface. This represents a major advance in flame detection.
In this article, we first review the connections between fuzzy logic and wavelets and explain
some of the techniques used in flame detectors. In the second part of the article, we focus on
on-line learning. We present methods to determine locally and adaptively an appropriate
resolution of the membership functions as well as the fuzzy rules describing a control
surface. These methods have the advantage of requiring only very little computing power and
can be implemented in detectors either for diagnosis or during the field campaigns that are
made early in the development of a new detector.
[Figure 1: block diagram with the labels Flame, Detector, Signal, Wavelet analysis, Fuzzy logic
and Decision Algorithms.]
Figure 1: The new generation of flame detectors from SBT (Cerberus Div.) combines fuzzy
logic and wavelet analysis for signal processing.
1. Fuzzy-wavelet methods
1.1 Fuzzy rule-based systems
Fuzzy logic has found applications in basically all domains of science, from biology to
particle physics. The majority of applications are clearly in the domain of control. The
linguistic interpretability of fuzzy rules is certainly one of the main reasons for the success of
fuzzy logic. Fuzzy logic furnishes a framework to fuse qualitative or even imprecise
knowledge with information contained in a databank or in mathematical expressions. A major
challenge to fuzzy logic is the translation of the information contained implicitly in a
collection of datapoints into linguistically interpretable fuzzy rules. Neurofuzzy methods
have been developed for this purpose. A serious difficulty with most neurofuzzy methods is
that they often furnish rules without a transparent interpretation; a rule is said to be
transparent if it has a clear and intuitively correct linguistic interpretation. A solution to
this problem is furnished by multiresolution techniques. The basic idea is to use a dictionary
of membership functions forming a multiresolution and to determine which membership
functions are the most appropriate to describe the control surface (figure 2). In order to
associate a linguistic interpretation to each membership function, the membership functions
are chosen among the family of scaling functions that are symmetric, everywhere positive and
have a single maximum. This family includes, among others, splines and some radial functions
[3]. The main advantage of using a dictionary of membership functions is that each term, such
as "small" or "large", is well defined beforehand and is not modified during learning. The
multiresolution properties of the membership functions in the dictionary make it quite easy to
fuse or split membership functions so as to put the control surface into a form that is
linguistically understandable and intuitive for the human expert.
Different techniques, generally referred to by the term "fuzzy-wavelet", have been developed
for data on a regular grid. Before explaining them, let us describe fuzzy logic in the
framework of the Takagi-Sugeno model.
In the Takagi-Sugeno model the fuzzy rules are expressed in the form [4]:
$R_i$: if $x$ is $A_i$ then $y_j = f_j(x)$. (1)
Here the $A_i$ are linguistic terms, $x$ is the input linguistic variable, while
$y = (y_1, \ldots, y_j, \ldots, y_{max})$ is the output variable. The value of the input linguistic
variable may be crisp or fuzzy. If the value of the input variable is a crisp number, then the
variable x is called a singleton. As an example, suppose that $x_i$ is a linguistic variable for
the temperature. The value $\hat{x}_i$ of the input linguistic variable may be given by a crisp
number such as "30 (°C)" or by "about 25", in which "about 25" is itself a fuzzy set.
For a crisp input, the output of the fuzzy system is given by:
$\hat{y}_j = \sum_i \beta_i \cdot f_j(\hat{x}) / \sum_i \beta_i$ (2)
in which the degree of fulfillment $\beta_i$ is given by the expression
$\beta_i = \mu_{A_i}(\hat{x})$, with $\mu_{A_i}(\hat{x})$ the membership function of the linguistic
term $A_i$. In many applications, a linear function is taken: $f(\hat{x}) = a^T \cdot \hat{x} + b$.
If a constant $b_j$ is chosen to describe the crisp output $y_j$, the system becomes:
$R_i$: if $x$ is $A_i$ then $y_j = b_j$. (3)
If spline functions $N_k$ are taken, for instance, as membership functions,
$\mu_{A_i}(\hat{x}) = N_k(2^m(\hat{x} - n))$, then the system of eq. (3) is equivalent to
$\hat{y}_j = \sum b_j \cdot N_k(2^m(\hat{x} - n))$. (4)
The denominator of eq. (2) disappears because the translated splines sum to one. In this
particular case, the output y is a linear sum of translated and dilated splines. This means
that in this last form the Takagi-Sugeno model is equivalent to a multiresolution spline
model. It follows that wavelet-based techniques can be applied here.
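The equivalence between eqs. (2)-(3) and eq. (4) can be checked numerically. The following Python sketch is an illustration only: it assumes triangular (second order B-spline) membership functions centred on the integers at a single resolution, and the names hat, takagi_sugeno and spline_model as well as the rule outputs b are hypothetical choices made for the example.

```python
import numpy as np

def hat(x):
    """Second order (triangular) B-spline centred at 0 with support [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def takagi_sugeno(x, centers, b):
    """Eq. (2) with constant consequents b_i and memberships hat(x - n_i)."""
    beta = hat(x - centers)               # degrees of fulfillment, one per rule
    return float(beta @ b / beta.sum())

def spline_model(x, centers, b):
    """Eq. (4): plain weighted sum of translated splines, no normalization."""
    return float(hat(x - centers) @ b)

centers = np.arange(0.0, 5.0)             # assumed rule positions n = 0, ..., 4
b = np.array([1.0, 3.0, 2.0, 5.0, 4.0])   # assumed crisp rule outputs b_i

for x in (0.0, 1.25, 2.5, 3.9):
    print(x, takagi_sugeno(x, centers, b), spline_model(x, centers, b))
```

Between the first and the last rule centre the translated triangular functions sum to one, so both functions return the same value there.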
[Figure 2: a signal is passed through a cascade of low-pass and high-pass filters, giving the
approximation coefficients $c_{m-1,n}$, $c_{m-2,n}$ and the detail coefficients $d_{m-1,n}$,
$d_{m-2,n}$. The scaling functions at each level carry linguistic labels ("very small", "small",
"medium", "large", "very large", "quite small", "quite large", ...), with the coefficients
$c_1, \ldots, c_9$ attached to them, and are plotted over the axes x and y.]
Figure 2: The multiresolution structure of spline wavelets makes it possible to develop fuzzy
algorithms in the Takagi-Sugeno formalism that have the nice property of being linguistically
interpretable. In this approach, the scaling functions are interpreted as membership
functions.
1.2 What is wavelet theory?
1.2.1 Introduction
Though the idea of combining fuzzy logic and wavelet theory is new, wavelet theory is a
well-established domain of research. Wavelet analysis started in the 1980s, when scientists
processing recordings of seismic waves recognized the need for methods allowing the analysis
of a signal at different resolutions. In the 1990s, multiresolution analysis grew into a very
active field. The appearance of very efficient computing methods has led to a synthetic view
of the work done in the signal processing and the mathematical communities.
A wavelet decomposition consists of the iterative decomposition of a signal into a coarse
approximation and a detail part. The original signal can be reconstructed with a second
algorithm. The possibility of reconstructing the signal after decomposition has resulted in
several applications in the domains of noise reduction and data compression.
[Figure 3: plot of a signal against time.]
Figure 3: The wavelet decomposition of a signal corresponds to a multiresolution analysis of
the signal.
One of the first applications of multiresolution analysis was in the domain of data
compression. Multiresolution techniques were successfully implemented to compress the FBI
fingerprint data files.
An interesting application of multiresolution analysis in the domain of noise reduction has
been the processing of the only recording of Brahms playing a sonata. The recording was of
such bad quality that transcribing the music was not possible. After processing with
multiresolution techniques, it became possible to compare Brahms's score with his own
interpretation.
Some of the most fruitful soft computing methods are inspired by the biological world. The
nervous system has found its mathematical counterpart in the neural network, and Darwinism
has influenced evolutionary computing and genetic algorithms. Wavelets and multiresolution
analysis can also be related to the biological world: the mechanisms behind the perception of
colour by the brain indeed seem to rely on wavelets. Wavelet theory can also be related to
common sense. Everyday experience teaches that the difference between two actions often lies
in small details. Finding the important details is difficult, since experience also shows
that focusing only on details leads to a tremendous waste of time and to unproductive work.
Finding the right balance between details and coarse features or actions is a highly human
activity that somehow finds its mathematical expression in wavelet theory.
Multiresolution analysis has become in the last few years a quite standard tool in signal
processing. So far, the main applications of multiresolution analysis have been in the
domains of image processing and speech recognition. The image processing community has
been using algorithms containing elements of multiresolution analysis for quite some years
already. Multiresolution has also been used successfully in speech processing and data
compression. Historically, one generally finds the roots of wavelet theory in the work of
Morlet, a Frenchman working for Elf-Aquitaine in the domain of oil research. Morlet
recognized the need for signal processing techniques for the analysis of seismic waves going
beyond Gabor's analysis of short-time signals. Morlet modified the Gaussian window used by
Gabor. In order to remedy a drawback of Gabor's approach, namely the poor resolution
obtained at high frequencies due to the constant size of the window, Morlet used
variable-sized windows. Morlet coined the name wavelet, meaning little wave. Due to a lack of
funding and interest by his company, no real-world applications appeared then. Grossmann,
another French scientist, heard of Morlet's work. Helped by his background in quantum
mechanics, he rapidly grasped the potential of Morlet's wavelets and contributed significantly
to further developments. In the 1980s a further important development took place. Mallat, a
French scientist working at the time in New York, proposed an algorithm that very
considerably reduces the computing burden of a wavelet transform. After the discovery of
this algorithm, the close connection between the theory of subband coding, quadrature filters
and the fast wavelet decomposition was recognized. This made it possible to unify two
seemingly disjoint fields: wavelet theory, which was essentially the domain of
mathematicians, and filter theory. Recently, wavelets of the second generation have appeared.
They are more flexible and make it possible to solve important problems, such as the
representation of a signal close to boundaries.
In recent years, new developments have shown the utility of wavelet theory and
multiresolution analysis in the domain of soft computing. The connection between wavelets
and neural networks was recognized, followed by the use of evolutionary algorithms in
connection with search algorithms. Very recently, important new applications of wavelet theory
have appeared in the domain of fuzzy techniques.
One of the main ideas of wavelet analysis is already contained in the short-time Fourier
transform, namely the decomposition of a signal onto dilated and translated versions of a
basis function. The main difference between the wavelet transform and the Gabor transform is
that the time-frequency window of the Gabor transform is independent of the position and
dilation of the window, while for a wavelet transform the time-frequency window depends on
the dilation factor of the wavelet. At low frequencies, the time window is much larger than at
higher frequencies. This property is often very desirable in signal processing: it is often
necessary to analyze the high-frequency part of the signal with a good time resolution,
while a poorer time resolution for the low frequencies is not much of a problem in most
applications. Figure 4 compares the windows of the short-time Fourier transform to those of
the wavelet transform. We first give below a general definition of a wavelet.
Definition 1:
A function $\psi$ is called a wavelet if there exists a dual function $\tilde{\psi}$ such that a
function $f \in L^2(\mathbb{R})$ can be decomposed as
$f(x) = \sum_{m,n=-\infty}^{\infty} \langle f, \tilde{\psi}_{m,n} \rangle \, \psi_{m,n}(x)$ (5)
The series representation of f is called a wavelet series. The wavelet coefficients $d_{m,n}$
are given by $d_{m,n} = \langle f, \tilde{\psi}_{m,n} \rangle$.
Definition 2:
A function $\psi \in L^2(\mathbb{R})$ is called an orthogonal wavelet if the family
$\{\psi_{j,k}\}$ is an orthonormal basis of $L^2(\mathbb{R})$, that is
$\langle \psi_{j,k}, \psi_{l,m} \rangle = \int_{-\infty}^{\infty} \psi_{j,k}(x) \, \psi_{l,m}(x) \, dx = \delta_{j,l} \, \delta_{k,m}$,
and every $f \in L^2(\mathbb{R})$ can be written as
$f(x) = \sum_{m,n=-\infty}^{\infty} d_{m,n} \, \psi_{m,n}(x)$ (6)
with $\psi_{m,n}(x) = 2^{m/2} \, \psi(2^m (x - n))$, $m, n \in \mathbb{Z}$.
The wavelet coefficients $d_{m,n}$ of an orthogonal wavelet are given by
$d_{m,n} = \langle f, \psi_{m,n} \rangle$. This follows from the fact that for an orthogonal
wavelet the dual function $\tilde{\psi}$ is identical to the wavelet $\psi$.
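[Figure 4: two time-frequency planes (axes: Time, Frequency) showing the tiling of each
transform and, below them, Haar wavelets at several dilations and translations.]
Figure 4: Tiling of the time-frequency domain. Left: short-time Fourier transform.
Right: wavelet transform. Below: examples of dilated and translated wavelets.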
The definition of an orthogonal wavelet is quite similar to the definition of a Fourier series.
The only differences lie in the candidate functions used for the projection and in the
relaxation of the condition that the function must be periodic. In a Fourier series, the basis
functions are the integer dilations of the two functions cos(2πt) and sin(2πt). In orthogonal
wavelets, dilated and translated versions of a single function are taken:
$\psi_{m,n}(x) = 2^{m/2} \cdot \psi(2^m (x - n))$ with $m, n \in \mathbb{Z}$.
The simplest example of an orthogonal wavelet is the Haar function, defined as
$\psi_H(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$ (7)
1.2.2 What is a multiresolution?
From the practical point of view, the fast wavelet transform is the most important algorithm in
multiresolution analysis. Contrary to the fast Fourier transform, which can be applied in most
cases without a deep knowledge of the algorithm, it is advisable to understand the fast
wavelet algorithm before using wavelets in an application.
The fast wavelet transform permits the efficient computation of the wavelet transform. At each
level of the transform, the data are processed through a low-pass and a high-pass filter. The
high-pass filtered data are known as the detail wavelet coefficients. The result of the
low-pass filtering is used as input data to compute the next level of detail wavelet
coefficients.
In order to understand the fast wavelet transform algorithm, we first introduce a few new
concepts that represent the foundations of multiresolution analysis. Most textbooks follow
a somewhat abstract approach here. We simplify the formulation to convey the main
ideas of multiresolution analysis.
One of the most important concepts of multiresolution analysis is that of nested spaces.
Nested spaces are like Russian dolls: they fit nicely into each other, and each smaller
doll is contained in the larger ones. Figure 5 shows an example of nested spaces, together
with a representation of the complementary spaces $W_0$ and $W_{-1}$.
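[Figure 5: nested regions labeled $V_{-1}$, $V_0$ and $V_1$, with the complementary regions
$W_{-1}$ and $W_0$.]
Figure 5: Example of nested spaces: $V_{-1} \subset V_0 \subset V_1$. The space $W_{-1}$ is the
complementary space of $V_{-1}$ in $V_0$, with $V_0 = W_{-1} \oplus V_{-1}$. Similarly,
$V_1 = W_0 \oplus V_0$.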
The concept of nested spaces can be applied to spaces generated by linear combinations of a
function, say $\varphi$. We define $V_1$ as the space generated by $\varphi(2x)$ and its
translates, that is, the space of all possible linear combinations of the functions
$V_1: \{\varphi(2x - n)\}$.
Let us now consider a second space $V_0$, generated by the function $\varphi(x)$, dilated by a
factor of two, and its integer translates: $V_0: \{\varphi(x - n)\}$.
The space $V_0$ is nested in $V_1$ if $V_0 \subset V_1$.
It follows from $V_0 \subset V_1$ that any function in $V_0$ can be written as a linear
combination of the functions generating $V_1$. In particular,
$\varphi(x) = \sum_n g_n \, \varphi(2x - n)$ (8)
Since $V_0 \subset V_1$, the space $V_1$ can be written as $V_1 = V_0 \oplus W_0$.
The space $W_0$ is the complement of the space $V_0$ in $V_1$. Following the same line of
thought as previously, we have $W_0 \subset V_1$. It follows that any function $\psi$ in $W_0$
can be written as a linear combination of the basis functions of $V_1$:
$\psi(x) = \sum_n h_n \, \varphi(2x - n)$ (9)
The two equations above are the so-called dilation equations or two-scale relations. These
equations are central to multiresolution analysis. They permit the reconstruction of a signal
starting from the wavelet coefficients (or detail coefficients) and the lowest level of
approximation coefficients.
Figure 6: Illustration of the relation $\varphi(x) = \sum_n g_n \, \varphi(2x - n)$ for the second
order B-spline. The triangular spline function can be decomposed into a sum of translated
triangular functions at the next higher level of resolution.
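The two-scale relation of eq. (8) can be verified numerically for the triangular spline. The sketch below is only an illustration under stated assumptions: the spline is taken as the hat function centred at the origin, for which the coefficients are g = (1/2, 1, 1/2) at n = -1, 0, 1, and the names hat, g and max_error are hypothetical.

```python
import numpy as np

def hat(x):
    """Triangular (second order) B-spline centred at 0 with support [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

# Assumed two-scale coefficients g_n of the centred hat function.
g = {-1: 0.5, 0: 1.0, 1: 0.5}

x = np.linspace(-2.0, 2.0, 401)
rhs = sum(g_n * hat(2.0 * x - n) for n, g_n in g.items())  # right side of eq. (8)
max_error = float(np.max(np.abs(hat(x) - rhs)))
print(max_error)  # difference between both sides of eq. (8) on the grid
```

The difference stays at the level of floating-point rounding, which reflects the fact that the coarse hat function is an exact sum of three finer ones.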
1.2.3 Decomposition and reconstruction algorithms

The fast wavelet decomposition corresponds to a cascade filter. The signal is iteratively
filtered with a low-pass and a high-pass filter. The wavelet coefficients correspond to the
high-passed signal coefficients, while the approximation coefficients result from the low-pass
filtering. The low-pass coefficients are then decimated by a factor two and used as input
signal at the next level of resolution. After the decimation, the same two filters are applied to
the data. The algorithm is invertible and the signal can be reconstructed iteratively from the
wavelet coefficients together with the last level coefficients of the low-pass filter.
The decomposition algorithm of a function f ∈ V1 is computed through filtering (figure 7).
[Figure 7: cascade of filters. The low-pass filter maps $c_{m,n}$ to $c_{m-1,n}$, $c_{m-2,n}$ and
$c_{m-3,n}$ in turn; at each stage the high-pass filter produces $d_{m-1,n}$, $d_{m-2,n}$ and
$d_{m-3,n}$. Legend: $d_{m,n}$ are the wavelet coefficients (or detail coefficients) at the m-th
level of decomposition; $c_{m,n}$ are the approximation coefficients (or coarse approximation
coefficients) at the m-th level of decomposition.]
Figure 7: Decomposition algorithm.
The decomposition algorithm uses the following iterative relations to compute the
approximation and detail coefficients at the next lower level of resolution:
$c_{m-1,n} = \sum_r g_{r-2n} \cdot c_{m,r}$ (11)
$d_{m-1,n} = \sum_r h_{r-2n} \cdot c_{m,r}$ (12)
with the coefficients $g_{r-2n}$ corresponding to the low-pass filter and the coefficients
$h_{r-2n}$ to the high-pass filter. Both relations act on the approximation coefficients
$c_{m,r}$ of the level above, in agreement with the cascade of figure 7.
The decomposition algorithm can be used iteratively in a cascade filter.
The decomposition algorithm is invertible, permitting the lossless reconstruction of the
original signal. The reconstruction algorithm is given by the expression:
$c_{m,n} = \sum_r \left( p_{n-2r} \cdot c_{m-1,r} + q_{n-2r} \cdot d_{m-1,r} \right)$ (13)
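The decomposition and reconstruction relations can be illustrated with the simplest possible filters. The sketch below assumes the orthonormal Haar filters (low-pass coefficients 1/√2, 1/√2 and high-pass coefficients 1/√2, -1/√2, with p = g and q = h) and a signal whose length is a multiple of 2^levels; the function names and the test signal are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_step(c):
    """One decomposition level: low-pass and high-pass filtering plus decimation."""
    c = np.asarray(c, dtype=float)
    approx = (c[0::2] + c[1::2]) / SQRT2   # approximation coefficients
    detail = (c[0::2] - c[1::2]) / SQRT2   # detail (wavelet) coefficients
    return approx, detail

def haar_inverse_step(approx, detail):
    """One reconstruction level, inverting haar_step."""
    c = np.empty(2 * len(approx))
    c[0::2] = (approx + detail) / SQRT2
    c[1::2] = (approx - detail) / SQRT2
    return c

def haar_decompose(signal, levels):
    """Cascade: the low-pass output is the input of the next level."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

def haar_reconstruct(approx, details):
    """Rebuild the signal from the coarsest approximation and all details."""
    for d in reversed(details):
        approx = haar_inverse_step(approx, d)
    return approx

signal = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx, details = haar_decompose(signal, levels=3)
restored = haar_reconstruct(approx, details)
print(approx, np.allclose(restored, signal))
```

With orthonormal filters the energy of the signal is preserved: the sum of the squared detail and approximation coefficients equals the sum of the squared samples.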
1.3 Determining an appropriate local resolution for the membership functions in a fuzzy
system.
1.3.1 A simple approach
The problem of finding a good description of a control surface in terms of the scaling
functions can be solved by using the larger coefficients among the reconstructed coefficients
$c'_{m,n}$ together with the lowest level approximation coefficients. The coefficients
$c'_{m,n}$ correspond to thresholded coefficients of the scaling function after reconstruction
with the filter coefficients $q_n$. This procedure is illustrated in figure 8.
$c'_{m,n} = 0$ if $\left| \sum_r q_{n-2r} \cdot d_{m-1,r} \right| < T$
$c'_{m,n} = \sum_r q_{n-2r} \cdot d_{m-1,r}$ if $\left| \sum_r q_{n-2r} \cdot d_{m-1,r} \right| \ge T$
[Figure 8: decomposition cascade $c_{m,n} \to c_{m-1,n} \to c_{m-2,n}$ (approximation
coefficients) with the detail coefficients $d_{m-1,n}$, $d_{m-2,n}$ and the thresholded
coefficients $c'_{m,n}$, $c'_{m-1,n}$ derived from them.]
Figure 8: A very simple algorithm to determine the best fuzzy rules consists of keeping the
larger coefficients among the wavelet coefficients expressed in terms of the scaling functions,
together with the approximation coefficients at the lowest level of resolution (i.e. the bold
coefficients).
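The thresholding step can be sketched for one level of decomposition. This is a simplified illustration, not the implementation used in the detectors: it assumes Haar filters (q_0 = 1/√2, q_1 = -1/√2), a threshold applied to the magnitude of the coefficients, and hypothetical names and data.

```python
import numpy as np

def detail_contribution(detail):
    """sum_r q_{n-2r} d_{m-1,r} for the Haar filter q = (1/sqrt(2), -1/sqrt(2))."""
    out = np.empty(2 * len(detail))
    out[0::2] = detail / np.sqrt(2.0)
    out[1::2] = -detail / np.sqrt(2.0)
    return out

def thresholded_coefficients(detail, T):
    """c'_{m,n}: keep the contributions whose magnitude reaches T, zero the rest."""
    contrib = detail_contribution(detail)
    return np.where(np.abs(contrib) >= T, contrib, 0.0)

# Assumed test signal: flat, with one local feature and a little ripple.
signal = np.array([1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.1, 0.9])
approx = (signal[0::2] + signal[1::2]) / np.sqrt(2.0)   # lowest level approximation
detail = (signal[0::2] - signal[1::2]) / np.sqrt(2.0)   # detail coefficients
c_prime = thresholded_coefficients(detail, T=0.5)
print(c_prime)
```

Only the coefficients around the local feature survive the threshold; the small ripple at the end of the signal is discarded, so that a fine membership function would be introduced only where the surface requires it.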
1.3.2 Matching pursuit approaches
A matching pursuit algorithm, similar to the one in [10], may be used to decompose the signal
as a sum of scaling functions. For low order splines, we have used a slightly modified
matching pursuit, which is described below. The algorithm is especially efficient if the
original signal f(x) is a sum of splines with a few large coefficients.
[Figure 9: schematic of the matching pursuit, with the label "Residue".]
Figure 9: Appropriate membership functions and rules to describe a set of data may be
determined with a matching pursuit algorithm.
Description of the algorithm
Define a dictionary $D = \{\Phi^k_{m,n}\}$ of spline scaling functions, in which the index k is
the order of the spline function, m the dilation and n the translation. (The functions
$\Phi^k_{m,n}$ fulfill the following condition: $\sum_n \Phi^k_{m,n}(x) = 1, \ \forall k, m$.)
1) For each scaling function in the dictionary compute the coefficients
$c^k_{m,n} = \langle \Phi^k_{m,n}(x), f(x) \rangle$.
2) Keep for each k (the spline order) the approximation coefficient $c^k_{m,n}$ with the lowest
m among those satisfying $|c^k_{m,n}| > \beta \cdot \sup_{m',n'} |c^k_{m',n'}|$ with $0 < \beta \le 1$.
3) Choose the coefficient that minimizes the residue, i.e. write
$f(x) = c^k_{m,n} \cdot \Phi^k_{m,n}(x) + R(x)$ and choose the coefficient that minimizes
$\langle R(x), R(x) \rangle$.
4) Take the residue as the new input signal.
Repeat the procedure until the residue is below a given value.
The condition $|c^k_{m,n}| > \beta \cdot \sup_{m',n'} |c^k_{m',n'}|$ with $0 < \beta \le 1$ ensures
the convergence of the matching pursuit [10]. Further, the algorithm tries to locate single
large coefficients. This is done by choosing a value of β that is not too high and by keeping
the lowest resolution coefficients among the ones satisfying the above condition.
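The steps above can be sketched as follows. The code is a simplified illustration under several assumptions that are not part of the original description: a single spline order (triangular functions), atoms sampled on a regular grid and normalized to unit Euclidean norm, plain inner products instead of a semi-orthogonal construction, centres on the dyadic grid n/2^m, and a non-strict inequality in step 2. All names are hypothetical.

```python
import numpy as np

def hat(x):
    """Triangular (second order) B-spline centred at 0."""
    return np.maximum(0.0, 1.0 - np.abs(x))

def build_dictionary(x, max_level, x_max):
    """Unit-norm sampled atoms hat(2^m x - n) and their labels (m, centre)."""
    atoms, labels = [], []
    for m in range(max_level + 1):
        for n in range(x_max * 2**m + 1):
            atom = hat(2.0**m * x - n)
            atoms.append(atom / np.linalg.norm(atom))
            labels.append((m, n / 2.0**m))
    return np.array(atoms), labels

def matching_pursuit(f, atoms, labels, beta=0.7, tol=1e-10, max_iter=50):
    residue, picks = np.array(f, dtype=float), []
    levels = np.array([m for m, _ in labels])
    for _ in range(max_iter):
        if residue @ residue < tol:
            break
        c = atoms @ residue                                   # step 1
        keep = np.flatnonzero(np.abs(c) >= beta * np.abs(c).max())
        coarse = keep[levels[keep] == levels[keep].min()]     # step 2: lowest m
        best = coarse[np.argmax(np.abs(c[coarse]))]           # step 3
        residue = residue - c[best] * atoms[best]             # step 4
        picks.append((labels[best], float(c[best])))
    return picks, residue

x = np.arange(0.0, 8.0 + 1e-9, 1.0 / 16.0)
atoms, labels = build_dictionary(x, max_level=2, x_max=8)
f = 3.0 * hat(x - 4.0)            # assumed test signal: one wide triangle
picks, residue = matching_pursuit(f, atoms, labels)
print(picks[0][0], len(picks))
```

For this test signal, finer atoms centred at the same position also pass the β condition, but the preference for the lowest resolution selects the wide triangle and the residue vanishes after a single iteration.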
Figure 10 illustrates the algorithm with a simple example: the decomposition of a second
order spline function with a semi-orthogonal spline construction using a dictionary of second
order splines. A value of β above 0.68 restricts the best matching coefficients to the bold
coefficients. The coefficient corresponding to the scaling function with the lowest resolution
is then chosen (underlined). In this example, the algorithm furnishes the best matching
function after a single iteration.
Coefficients:
0 0.5 1 0.5 0
0 1 0
0.03 -0.11 0.68 -0.11 0.03
Figure 10: Illustration of the search algorithm with a modified matching pursuit algorithm
for a second order spline using a dictionary of second order splines.
1.4 Fuzzy logic in the frequency domain
Preprocessing with orthogonal wavelets can be used in connection with fuzzy logic to express
fuzzy rules in the frequency domain. Multiresolution analysis considerably simplifies, for
instance, the design of a fuzzy controller containing rules in the frequency domain of the
form
if "signal frequency" is low .... then ... (14)
[Figure 11: two panels titled "Daubechies Filter Trees" showing the frequency windows
$|T_m(\omega)|^2$ (vertical axis, 0 to 1) against frequency in Hz (horizontal axis, 0 to 16),
with the high-pass and low-pass windows labeled.]
Figure 11: Fuzzy rules in the frequency domain can be designed with a lot of flexibility using
wavelet-packet techniques. The membership functions are determined by the filter tree used
for the multiresolution. The bottom part shows how to introduce a new membership function to
reach a better resolution (arrows).
The method uses the fact that a multiresolution analysis with orthogonal wavelets corresponds
to the iterative filtering of the signal with quadrature filters [2]. An orthogonal wavelet
decomposition therefore fulfills the power complementarity condition. This means that the
frequency windows $T_m$ corresponding to the first p high-pass filters and the frequency
window $T_{low}$ of the p-th low-pass filter satisfy the condition:
$\sum_{m=1 \ldots p} |T_m(\omega)|^2 + |T_{low}(\omega)|^2 = 1 \quad \forall \omega$. (15)
Figure 11 is an illustration for Daubechies wavelets. The different frequency windows
$|T_m(\omega)|^2$ can be used as membership functions. The degree of membership of a
variable to the frequency window is given by
$\mu(T_m) = \sum_n (d_{m,n})^2 \Big/ \left( \sum_{m=1}^{p} \sum_n (d_{m,n})^2 + \sum_n (c_{low,n})^2 \right)$ (16)
The main advantage of this method resides in its simplicity. In the Fourier domain, the same
result would be obtained by multiplying the power spectrum obtained after Fourier transforming
the signal by $|T_m(\omega)|^2$. The Fourier approach is more demanding in computing power
than the wavelet approach. From the industrial point of view, the wavelet approach is
extremely interesting: the wavelet coefficients may also be used for other purposes, such as
denoising the signal or removing an offset, so that many different functions can be carried
out with a reasonable computing power. This is very important in speed-, price- or
current-critical applications.
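The degrees of membership of eq. (16) follow directly from the energies of the wavelet coefficients. The sketch below assumes the Haar wavelet (the simplest member of the Daubechies family) rather than the filters of figure 11, p = 3 levels of decomposition and two hypothetical test signals; the last entry of each result is the membership to the low-pass window.

```python
import numpy as np

def haar_decompose(signal, levels):
    """Haar cascade: returns the coarsest approximation and the list of details."""
    approx, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        approx, d = ((approx[0::2] + approx[1::2]) / np.sqrt(2.0),
                     (approx[0::2] - approx[1::2]) / np.sqrt(2.0))
        details.append(d)
    return approx, details

def band_memberships(signal, levels):
    """Eq. (16): normalized energy per frequency window, low-pass window last."""
    approx, details = haar_decompose(signal, levels)
    energies = np.array([np.sum(d**2) for d in details] + [np.sum(approx**2)])
    return energies / energies.sum()

t = np.arange(64)
slow = np.sin(2.0 * np.pi * t / 32.0)       # assumed slowly varying signal
fast = np.where(t % 2 == 0, 1.0, -1.0)      # assumed signal at the highest frequency

mu_slow = band_memberships(slow, levels=3)
mu_fast = band_memberships(fast, levels=3)
print(mu_slow, mu_fast)
```

The memberships sum to one, as required by the power complementarity condition of eq. (15). The slowly varying signal belongs mostly to the low-pass window, while the alternating signal belongs entirely to the first high-pass window, so that a rule such as "if signal frequency is low" can be evaluated from these values.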
2. On-line learning with biorthogonal wavelets
In the first part, we addressed the problem of developing a fuzzy controller from
experimental data. We considered datapoints on a regular grid and presented methods to
find locally appropriate membership functions and rules to describe the underlying control
surface. Learning was off-line and the experimental data were stored in a databank. In this
second part, we address the on-line learning problem. We present three different learning
methods based on biorthogonal wavelet networks [6-9], perceptrons and estimators. We consider
the case in which the signal processor is capable of making some computations but has too
little memory to store many datapoints. This case is quite often encountered in on-line
problems, for instance in sensor technology. Under these conditions, most cross-validation
methods cannot be implemented (for reviews on wavelet-based estimators see [11,12]), and a
very simple cross-validation technique must be used. We propose two related approaches based
on a validation procedure using either the reconstruction algorithm or the fast wavelet
decomposition algorithm. In the first case, the local estimation of the underlying curve in
the space spanned by the scaling functions at a given level of resolution must be consistent
with an estimation in the space spanned by the scaling functions and the biorthogonal wavelets
at one level of resolution lower. The second cross-validation method compares the estimation
of the surface at one level of resolution with the estimation at one lower level of
resolution. This is done by decomposing the approximation coefficients with the low-pass
filter associated with the fast wavelet decomposition algorithm.
2.1 Introduction to wavelet networks
The similarities between the structure of a perceptron and a wavelet decomposition
have been used in so-called wavelet networks. We discuss here the simple one-dimensional
case and compare the structures of the perceptron and of the wavelet decomposition.
In several dimensions, the formalism is essentially the same; only the notation is slightly
more complicated.
Perceptron
The output of the 3-layer perceptron is
$f(x) = \sum_{i=1}^{k} \omega_i \cdot \psi(a_i \cdot x + b_i)$ (17)
with $\psi$ the activation function and $a_i$, $b_i$, $\omega_i$ the network parameters (weights)
that are optimized during learning. One possibility is to use a wavelet as activation function.
In this case the perceptron is described as a wavelet network (or sometimes a wavenet).
Depending on the problem, different approaches can be taken. The dilations, translations and
weights can all be optimized by the network. In this case, the parameters are generally
obtained from a gradient descent or through least mean squares. If the network is properly
initialized, then the network can be quite parsimonious. An interesting alternative consists
of using a dictionary of dyadic wavelets and optimizing only the weights $\omega_i$.
[Figure 12: network diagram with inputs $x_1, x_2, \ldots, x_n$, a hidden layer of units
$\psi(a_i \cdot x + b_i)$ using wavelets as activation functions, and the output
$f(x) = \sum_{i=1}^{k} \omega_i \cdot \psi(a_i \cdot x + b_i)$.]
Figure 12: The structure of a wavelet network is very often that of the perceptron.
Wavelet network
In its simplest version, a wavelet network corresponds to a 3-layer perceptron using wavelets
as activation functions:
$f(x) = \sum_{n,m} d_{m,n} \cdot \psi_{m,n}(x) + \bar{f}$ (18)
with $\bar{f}$ the average value of f, $d_{m,n}$ the coefficients of the neural network and
$\psi$ the wavelet.
If orthogonal wavelets are taken and only the weights $d_{m,n}$ are optimized, then a simple
gradient descent will lead to a global minimum under the conditions described below. Let us
assume a function $f(x) = \sum_{m,n} d_{m,n} \cdot \psi_{m,n}(x)$ (with a finite number of
coefficients). At each step, a new datapoint $(y_k, x_k)$, satisfying $f(x_k) = y_k$, is
furnished to the network. With $\hat{f}(x) = \sum_{m,n} \hat{d}_{m,n} \cdot \psi_{m,n}(x)$ the
current output of the network, the error E(k) is given by
$E(k) = \left( \hat{f}(x) - \sum_{m,n} d_{m,n} \cdot \psi_{m,n}(x) \right)^2 = \sum_{m,n} \left( (\hat{d}_{m,n} - d_{m,n}) \cdot \psi_{m,n}(x) \right)^2$.
The second equality holds on average over x: the cross terms between different wavelets
vanish due to orthogonality. Taking the derivative with respect to $\hat{d}_{m,n}$ at
$x = x_k$ one obtains:
$\partial E(k) / \partial \hat{d}_{m,n} \big|_{x_k} = 2 \, \psi_{m,n}(x_k) \cdot (\hat{y}(x_k) - y(x_k))$ (19)
In this case, a gradient descent will converge to a global minimum.
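The convergence of such an on-line gradient descent can be observed in a small experiment. The sketch below is an illustration under stated assumptions: Haar wavelets on the interval [0, 1) indexed with the usual dyadic convention ψ(2^m x - n), three levels of resolution, noise-free datapoints drawn uniformly, and a learning rate of 0.1 that absorbs the factor 2 of the gradient. The names and the target coefficients are hypothetical.

```python
import numpy as np

def haar(x):
    """Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where((0.0 <= x) & (x < 0.5), 1.0,
                    np.where((0.5 <= x) & (x < 1.0), -1.0, 0.0))

# Seven orthonormal wavelets on [0, 1): levels m = 0, 1, 2.
INDEX = [(m, n) for m in range(3) for n in range(2**m)]

def features(x):
    """Values 2^(m/2) * haar(2^m x - n) for every (m, n) in INDEX."""
    return np.array([2.0**(m / 2.0) * haar(2.0**m * x - n) for m, n in INDEX])

rng = np.random.default_rng(0)
d_true = rng.normal(size=len(INDEX))   # coefficients of the unknown function
d_hat = np.zeros(len(INDEX))           # weights learned on-line
learning_rate = 0.1

for _ in range(5000):
    x_k = rng.uniform(0.0, 1.0)
    psi = features(x_k)
    y_k = d_true @ psi                 # datapoint lying on the function
    # Gradient step: the update direction is (network output - y_k) * psi.
    d_hat -= learning_rate * (d_hat @ psi - y_k) * psi

print(np.max(np.abs(d_hat - d_true)))
```

Only one datapoint is held in memory at a time, which matches the on-line setting. The learning rate is kept small enough for every single update to reduce the distance between the learned and the true coefficients.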
Fuzzy wavenets
A subset of wavelet networks are the so-called fuzzy wavelet networks or fuzzy wavenets.
Using the two-scale relation eq. (9), a wavelet can be decomposed into a sum of scaling
functions, $\psi_{m,n}(x) = \sum_r h_{n-2r} \, \varphi_{m+1,r}(x)$. The wavelet network given by
eq. (18) can then be put in the form:
$f(x) = \sum_{m,n} \sum_r d_{m,n} \cdot h_{n-2r} \cdot \varphi_{m+1,r}(x) + \bar{f}$ (20)
Fuzzy wavenets are wavelet networks based on wavelets with some special properties: the
scaling function associated with these wavelets must be symmetric, everywhere positive and
have a single maximum. Under these conditions, the scaling functions can be interpreted as
membership functions.
[Figure 13 diagram, labels: fuzzy wavelet networks or fuzzy wavenets; wavelet network; perceptron.]
Figure 13: The most popular wavelet networks are based on the perceptron structure. Fuzzy
wavelet networks, also called fuzzy wavenets, can be regarded as a neurofuzzy model which
belongs at the same time to the set of wavelet networks.
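The interpretation of scaling functions as membership functions can be made concrete with a short sketch. It assumes the second-order B-spline (triangular function) as scaling function, and the linguistic labels are hypothetical. The code evaluates the degrees of membership at one resolution and checks that they sum to one and that the spline satisfies a two-scale relation.

```python
import numpy as np

def hat(t):
    """Second-order cardinal B-spline centred at 0 (triangular function)."""
    return np.maximum(0.0, 1.0 - np.abs(t))

def memberships(x, m, shifts):
    """Degrees of membership phi(2^m x - n), one column per shift n."""
    x = np.asarray(x, dtype=float)
    return np.stack([hat(2.0**m * x - n) for n in shifts], axis=-1)

# Hypothetical linguistic terms on [0, 1] at resolution m = 1 (centres 0, 0.5, 1).
terms = ["small", "medium", "large"]
x = np.linspace(0.0, 1.0, 11)
mu = memberships(x, m=1, shifts=[0, 1, 2])
sums_to_one = np.allclose(mu.sum(axis=-1), 1.0)

# Two-scale relation of the triangular spline: a coarse term is a fixed
# combination of three terms at the next finer resolution.
t = np.linspace(-2.0, 2.0, 41)
two_scale_ok = np.allclose(hat(t), 0.5 * hat(2 * t + 1) + hat(2 * t) + 0.5 * hat(2 * t - 1))

for label, degree in zip(terms, memberships(0.25, m=1, shifts=[0, 1, 2])):
    print(label, degree)
print(sums_to_one, two_scale_ok)
```

Since the terms are fixed by the resolution and the shift, their meaning does not drift during learning; only the weights attached to them change.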
2.2 Fuzzy-wavenets
Figure 14 shows the architecture of the learning algorithm. It consists of a series of neural
networks, using both wavelets $\psi_{m,n}(x)$ and scaling functions $\varphi_{m,n}(x)$ as activation functions.
Each neural network uses activation functions of a given resolution. The $m$th neural network
optimizes the coefficients $\hat{c}_{m,n}$ and $\hat{d}_{m,n}$, with $f_m(x)$ the output of the $m$th neural network.
$$f_m(x) = \sum_n \hat{d}_{m,n} \cdot \psi_{m,n}(x) + \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x) \qquad (21)$$
The structure of the network is similar to wavenets [5,6]. The main difference is that the
method is generalized to biorthogonal wavelets. The motivation is that if spline-wavelets or
radial wavelets are taken, then it is straightforward to transform the results into a linguistically
interpretable fuzzy system (see part 1).
The evolution equations for the detail coefficients $\hat{d}_{m,n}(k)$ and the approximation coefficients
$\hat{c}_{m,n}(k)$ at step $k$ are given by
$$\hat{d}_{m,n}(k) = \hat{d}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\psi}_{m,n}(x) \qquad (22)$$
$$\hat{c}_{m,n}(k) = \hat{c}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\varphi}_{m,n}(x) \qquad (23)$$
with $y_k(x)$ the $k$th input point, $LR$ the learning rate, and $\tilde{\psi}_{m,n}(x)$, $\tilde{\varphi}_{m,n}(x)$ the dual functions
of $\psi_{m,n}(x)$ and $\varphi_{m,n}(x)$. Eqs.(22,23) describe the evolution of $f_m(x)$. Let us
assume that the datapoints $y_k$ lie on the function $f(x)$ and that the $x_k$ are i.i.d. (uniform
distribution). At each step, the function $f_m(x)$ is updated by a term whose expectation is
proportional to the difference between $f_m(x)$ and the projection of $f(x)$ on the space $V_{m+1}$,
spanned by $\{\psi_{m,n}(x), \varphi_{m,n}(x)\}$:
$$\sum_n \langle f - f_m, \tilde{\varphi}_{m,n} \rangle \cdot \varphi_{m,n}(x) + \sum_n \langle f - f_m, \tilde{\psi}_{m,n} \rangle \cdot \psi_{m,n}(x) = \sum_n \langle f, \tilde{\varphi}_{m,n} \rangle \cdot \varphi_{m,n}(x) + \sum_n \langle f, \tilde{\psi}_{m,n} \rangle \cdot \psi_{m,n}(x) - f_m(x)$$
In the adiabatic sense, the expectation of the function $f_m(x)$ converges to the projection of $f(x)$
on the space $V_{m+1}$. Since the functions $\psi_{m,n}(x)$, $\varphi_{m,n}(x)$ are linearly independent, it follows that
$\hat{c}_{m,n} \to c_{m,n}$ and $\hat{d}_{m,n} \to d_{m,n}$. A simple local validation criterion for an approximation coefficient
$\hat{c}_{m,n}$ is to require that this coefficient can be approximated from the approximation and
detail coefficients $\{\hat{c}_{m-1,n}, \hat{d}_{m-1,n}\}$ at one lower level of resolution. At each iteration step,
the weights from the different networks are cross-validated using a property of the wavelet
decomposition, namely that the approximation coefficients $\hat{c}_{m,n}$ at level $m$ can be computed
from the approximation and wavelet coefficients at level $m-1$ using the reconstruction
algorithm [10].
$$c_{m,n} = \sum_r \big( p_{n-2r} \cdot c_{m-1,r} + q_{n-2r} \cdot d_{m-1,r} \big) \qquad (24)$$
with $p_{n-2r}$ and $q_{n-2r}$ the filter coefficients for reconstruction.

[Figure 14 diagram: the input feeds an array of neural networks with outputs $f_1(x), f_2(x), \dots, f_m(x)$, where $f_m(x) = \sum_n \hat{d}_{m,n} \cdot \psi_{m,n}(x) + \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x)$; a validation module tests $|\hat{c}_{m,n} - \sum_r (p_{n-2r} \cdot \hat{c}_{m-1,r} + q_{n-2r} \cdot \hat{d}_{m-1,r})| \le \Delta$ at each level.]

Figure 14: Structure of a fuzzy-wavenet. The input signal is approximated at several
resolutions as a weighted sum of wavelets $\psi_{m,n}$ and scaling functions $\varphi_{m,n}$ at a given
resolution. The validation module compares the approximation coefficients $\hat{c}_{m,n}$ to the
approximation and wavelet coefficients at one level of resolution lower.
In order for a coefficient to be validated, the difference between the weight of the membership
function (model $m$) and the weight computed from the approximation and wavelet
coefficients at one level of resolution lower (model $m-1$) must be smaller than a given
threshold. As validation criterion for the coefficient $\hat{c}_{m,n}$, we require
$$\Big| \hat{c}_{m,n} - \sum_r \big( p_{n-2r} \cdot \hat{c}_{m-1,r} + q_{n-2r} \cdot \hat{d}_{m-1,r} \big) \Big| \le \Delta \qquad (25)$$
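The reconstruction step of eq.(24) and the threshold test of eq.(25) can be sketched in a few lines. The sketch assumes Haar reconstruction filters in place of the biorthogonal spline filters, a finite grid without boundary handling, and made-up coefficient values; the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
P = np.array([1.0, 1.0]) / SQRT2     # Haar lowpass reconstruction filter (assumed)
Q = np.array([1.0, -1.0]) / SQRT2    # Haar highpass reconstruction filter (assumed)

def reconstruct(c_prev, d_prev, p=P, q=Q):
    """Eq. (24): c_{m,n} = sum_r p_{n-2r} c_{m-1,r} + q_{n-2r} d_{m-1,r}."""
    c_prev = np.asarray(c_prev, dtype=float)
    d_prev = np.asarray(d_prev, dtype=float)
    c_m = np.zeros(2 * len(c_prev))
    for r in range(len(c_prev)):
        for j in range(len(p)):          # j = n - 2r indexes the filter taps
            n = 2 * r + j
            if n < len(c_m):             # taps falling outside the grid are dropped
                c_m[n] += p[j] * c_prev[r] + q[j] * d_prev[r]
    return c_m

def validated(c_hat_m, c_hat_prev, d_hat_prev, delta):
    """Eq. (25): boolean mask of the coefficients that pass the threshold."""
    predicted = reconstruct(c_hat_prev, d_hat_prev)
    return np.abs(np.asarray(c_hat_m, dtype=float) - predicted) <= delta

# Made-up coefficients: level m-1 predicts [1, 1, 3, 1] at level m.
c_prev = np.array([1.0, 2.0]) * SQRT2
d_prev = np.array([0.0, 1.0]) * SQRT2
c_hat_m = np.array([1.05, 0.8, 3.0, 1.0])
mask = validated(c_hat_m, c_prev, d_prev, delta=0.1)
print(reconstruct(c_prev, d_prev), mask)
```

In this toy case the second coefficient differs from its prediction by 0.2 and is rejected, while the others are accepted.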
Figure 15 shows an example using this strategy with i.i.d. datapoints. Biorthogonal spline
wavelets and scaling functions (biorthogonal spline wavelets with $p = 2$, $\tilde{p} = 4$) proposed by
Cohen et al. [9] are used as activation functions. The model consists of an array of 3 neural
networks, each corresponding to a different resolution. The most appropriate membership
functions and rules are chosen adaptively during learning. With only a few points, not much
information on the control surface is known and the control surface is better described with a
small number of rules. As the number of points increases, the number of rules is raised if
necessary. The method furnishes an automatic procedure to determine adaptively the „best“
membership functions and rules. The decision on which coefficient to use is obtained from the
validation eq.(25) ($\Delta = 0.1$). The „best“ coefficients are chosen adaptively among the set of
validated coefficients. The validated coefficients corresponding locally to the highest
resolution are kept (default coefficient = average value).
Figure 15: Input function and output of the fuzzy-wavenet after 60 steps (above).
Below: output of the 3 neural networks at step 60.
2.3 Learning with multiresolution perceptron-like networks
The fuzzy-wavenet method converges rather slowly because, for stability reasons, it requires
a small learning rate in comparison to a perceptron. For this reason,
one may consider another approach using only scaling functions. The basic structure of the
network is similar to the fuzzy wavenets except that the $m$th neural network optimizes only the
coefficients $\hat{c}_{m,n}$, with $f_m(x)$ the output of the $m$th neural network.
$$f_m(x) = \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x) \qquad (26)$$
The evolution equation is given by the following expression
$$\hat{c}_{m,n}(k) = \hat{c}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\varphi}_{m,n}(x) \qquad (27)$$
The validation procedure uses the decomposition algorithm to compare the results at two
levels of resolution.
$$c_{m,n} = \sum_p g_{p-2n} \cdot c_{m+1,p} \qquad (28)$$
with $g$ the coefficients of the lowpass decomposition filter in the fast
wavelet decomposition algorithm. The validation criterion for $\hat{c}_{m,n}$ is then
$$\Big| \hat{c}_{m,n} - \sum_p g_{p-2n} \cdot \hat{c}_{m+1,p} \Big| < \Delta \qquad (29)$$
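Eqs.(26) to (29) can be put together in a small sketch. It assumes Haar scaling functions on [0, 1), which are their own duals, so that the lowpass decomposition filter is $g = (1/\sqrt{2}, 1/\sqrt{2})$; the target function, the learning rate and the threshold are made up for the example, and the function names are hypothetical.

```python
import numpy as np

def phi(m, n, x):
    """Haar scaling function phi_{m,n}(x) = 2^{m/2} on [n/2^m, (n+1)/2^m), else 0."""
    return 2.0 ** (m / 2) if n <= x * 2**m < n + 1 else 0.0

def train_level(m, samples, lr):
    """On-line learning of the coefficients c_hat_{m,n} of one network, eq. (27)."""
    c_hat = np.zeros(2**m)
    for x_k, y_k in samples:
        basis = np.array([phi(m, n, x_k) for n in range(2**m)])
        f_m = c_hat @ basis                    # network output, eq. (26)
        c_hat -= lr * (f_m - y_k) * basis      # Haar: dual function = function itself
    return c_hat

def decompose(c_fine):
    """Eq. (28) with the Haar lowpass filter g = (1/sqrt 2, 1/sqrt 2)."""
    return (c_fine[0::2] + c_fine[1::2]) / np.sqrt(2.0)

def f(x):
    """Made-up target that is piecewise constant on the two halves of [0, 1)."""
    return 1.0 if x < 0.5 else -2.0

rng = np.random.default_rng(1)
samples = [(x, f(x)) for x in rng.uniform(0.0, 1.0, size=500)]

c1 = train_level(1, samples, lr=0.1)           # coarse network, m = 1
c2 = train_level(2, samples, lr=0.1)           # fine network, m = 2
mask = np.abs(c1 - decompose(c2)) < 0.1        # validation criterion, eq. (29)
print(c1, c2, mask)
```

Here the target is exactly representable at the coarse level, so both networks agree after decomposition and every coarse coefficient is validated.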
2.4 On-line learning with biorthogonal spline-wavelet estimators

A third approach to determine adaptively an appropriate resolution to describe locally an
underlying hypersurface during on-line learning uses wavelet estimators. A well-known
estimator is the Nadaraya-Watson estimator. The equation of the hypersurface $f(x)$ is estimated
by the expression:
$$f(x) = \sum_{k=1}^{k_{max}} K((x - x_k)/\lambda) \cdot y_k \Big/ \sum_{k=1}^{k_{max}} K((x - x_k)/\lambda) \qquad (30)$$
Nadaraya-Watson estimators have two interesting properties: they are local mean-squares
estimators and, in the case of a random design, they can be shown to be Bayesian estimators of
$(x_k, y_k)$, in which $(x_k, y_k)$ are i.i.d. copies of a continuous random variable $(X, Y)$. (In order to
simplify the formalism and without loss of generality, we have used 1-dimensional
estimators.)
The spline functions $\varphi(x)$ and their duals $\tilde{\varphi}(x)$ can be used as estimators. Let us first use the
function $\tilde{\varphi}(x)$ to estimate $f(x)$ with $\lambda = 2^{-m}$ ($m$ integer) at $x_n$ with $x_n \cdot 2^m \in \mathbb{Z}$.
Using the symmetry of $\tilde{\varphi}(x)$, eq.(30) for the dual spline is equivalent to using estimators
centered at $x_n$:
$$\hat{f}(x_n) = \sum_{k=1}^{k_{max}} \tilde{\varphi}((x_k - x_n) \cdot 2^m) \cdot y_k \Big/ \sum_{k=1}^{k_{max}} \tilde{\varphi}((x_k - x_n) \cdot 2^m) \qquad (31)$$
The expectation of the numerator in eq.(31) is proportional to the approximation coefficient
$c_{m,n}$. Eq.(31) furnishes an estimate of $\hat{c}_{m,n}$ in $f_m(x) = \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x)$:
$$\hat{c}_{m,n} = \hat{f}(x_n) \qquad (32)$$
[Figure 16 diagram, labels: datapoints; estimation on regular grid; projection on dual splines.]
Figure 16: Multiresolution spline estimators use dual spline estimators based on the functions
$\tilde{\varphi}_{m,n}(x)$ to estimate the coefficients $\hat{c}_{m,n}$ in $f_m(x) = \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x)$.
In order to validate the coefficient $\hat{c}_{m,n}$, two validation conditions are necessary:
$$\Big| \hat{c}_{m,n} - \sum_p g_{p-2n} \cdot \hat{c}_{m+1,p} \Big| < \Delta \qquad (33)$$
with the filter coefficients $g$ corresponding to the lowpass decomposition coefficients for
splines. Further, one also requires that
$$\sum_{k=1}^{k_{max}} \tilde{\varphi}((x_k - x_n) \cdot 2^m) > T \qquad (34)$$
to prevent division by very small values.
The strength of this approach is that the computation of a coefficient $\hat{c}_{m,n}$ requires only the
storage of two values: the denominator and the numerator in eq.(31). The method is therefore
well suited to on-line learning using low-end microprocessors with a low-capacity memory.
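The two-value storage scheme can be sketched as a running estimator. This is an illustration under stated assumptions: the triangular function stands in for the dual spline $\tilde{\varphi}$ (the true dual spline has a different shape), the class and its names are hypothetical, and the data are made up.

```python
def hat(t):
    """Triangular kernel, a stand-in for the dual spline (assumption)."""
    return max(0.0, 1.0 - abs(t))

class RunningCoefficient:
    """Estimate of one coefficient c_hat_{m,n} from two stored sums, eq. (31)."""

    def __init__(self, x_n, m, threshold, kernel=hat):
        self.x_n = x_n                  # grid point, with x_n * 2^m an integer
        self.scale = 2.0 ** m
        self.threshold = threshold      # T in eq. (34)
        self.kernel = kernel
        self.numerator = 0.0
        self.denominator = 0.0

    def update(self, x_k, y_k):
        """Add one datapoint to the two running sums."""
        weight = self.kernel((x_k - self.x_n) * self.scale)
        self.numerator += weight * y_k
        self.denominator += weight

    def estimate(self):
        """Return the estimate, or None while condition (34) is not met."""
        if self.denominator > self.threshold:
            return self.numerator / self.denominator
        return None

coefficient = RunningCoefficient(x_n=0.5, m=2, threshold=1.0)
before = coefficient.estimate()            # no data yet, so condition (34) fails
for x_k in (0.5, 0.6, 0.4):
    coefficient.update(x_k, 2.0 * x_k)     # made-up data on the line y = 2x
after = coefficient.estimate()
print(before, after)
```

Each new datapoint costs one kernel evaluation and two additions per coefficient, and no past datapoints need to be kept in memory.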
Conclusions
We have presented a number of methods to determine a fuzzy system from data. The
membership functions are chosen adaptively with multiresolution algorithms. These methods
have been used with success during the development of several fire detectors. Wavelet
techniques in combination with fuzzy logic are also implemented in commercial products.
Fuzzy-wavelet methods represent a good compromise between quality of modeling and linguistic
transparency of the fuzzy rules. For on-line learning, the tables below summarize the different
methods described in the previous sections.
Method | Network | Learning
--- | --- | ---
Fuzzy-wavenets | $f_m(x) = \sum_n \hat{d}_{m,n} \cdot \psi_{m,n}(x) + \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x)$ | $\hat{c}_{m,n}(k) = \hat{c}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\varphi}_{m,n}(x)$; $\hat{d}_{m,n}(k) = \hat{d}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\psi}_{m,n}(x)$
Biorthogonal spline-wavelet estimator | $f_m(x) = \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x)$ | $\hat{c}_{m,n}(k) = \sum_{i=1}^{k} \tilde{\varphi}_{m,n}(x_i) \cdot y_i \big/ \sum_{i=1}^{k} \tilde{\varphi}_{m,n}(x_i)$
„Perceptron-like“ networks | $f_m(x) = \sum_n \hat{c}_{m,n} \cdot \varphi_{m,n}(x)$ | $\hat{c}_{m,n}(k) = \hat{c}_{m,n}(k-1) - LR \cdot (f_m(x) - y_k(x)) \cdot \tilde{\varphi}_{m,n}(x)$

Method | Input data: $x_k$ i.i.d., $k \to \infty$ | Cross-validation
--- | --- | ---
Fuzzy-wavenets | $E(\lvert \hat{c}_{m,n} - c_{m,n} \rvert) < \varepsilon$ (*) if $LR \ll 1$ | Reconstruction algorithm
Biorthogonal spline-wavelet estimator | $E(\lvert \hat{c}_{m,n} - c_{m,n} \rvert) < \varepsilon$ | Decomposition algorithm + $\sum_{k=1}^{k_{max}} \tilde{\varphi}((x_k - x_n) \cdot 2^m) > T$
„Perceptron-like“ networks | $E(\lvert \hat{c}_{m,n} - c_{m,n} \rvert) < \varepsilon$ if $LR \ll 1$ | Decomposition algorithm

(*) For a small learning rate LR, the expected error on the coefficient $c_{m,n}$ is bounded by a value $\varepsilon(LR)$ as the
number of datapoints tends to infinity, provided the datapoints are i.i.d. (independent, identically distributed)
copies of a random variable X with uniform distribution.
References
[1] Thuillard, M. (1999) New Results on the flames' pulsation mechanisms permit to improve
the quality of detection of pool fires, Proc. Fire Suppression and Detection Research
Application Symposium, Feb. 24-26, 1999 (Orlando), Ed. Fire Protection Research
Foundation, 171-189.
[2] Thuillard, M. (1998) Fuzzy-wavelets: theory and applications, Proc. EUFIT'98, Sixth
European Congress on Intelligent Techniques and Soft Computing, Sept. 8-10, 1998 (Aachen),
Ed. H.-J. Zimmermann, Mainz Verlag, Vol. 2, 1149.
[3] Micchelli, C. A., Rabut, C., and Utreras, F. I. (1991) Using the Refinement Equation for the
Construction of Pre-Wavelets III: Elliptic Splines, Num. Algorithms 1, 331-352.
[4] Takagi, T. and Sugeno, M. (1985) Fuzzy identification of systems and its applications to
modeling and control, IEEE Trans. Syst. Man, Cybern. 15, 116-132.
[5] Thuillard, M. (1999) Fuzzy wavenets: an adaptive, multiresolution, neurofuzzy learning
scheme, Seventh European Congress on Intelligent Techniques and Soft Computing,
Sept. 13-16, 1999 (Aachen), Contrib. cc6-1, CD Proc.
[6] Bakshi, B.B. and Stephanopoulos, G. (1992) Wavelets as Basis Functions for Localized
Learning in a Multi-Resolution Hierarchy, IJCNN Int. Joint Conf. on Neural Networks, vol.
1, IEEE, Baltimore, II-141.
[7] Zhang, Q. and Benveniste, A. (1992) Wavelet Network, IEEE Trans. Neural Networks 3,
889.
[8] Szu, H. H., Telfer, B., and Kadambe, S. (1992) Neural Network Adaptive Wavelets for
Signal Representation and Classification, Opt. Eng. 31, 1907.
[9] Cohen, A., Daubechies, I., and Feauveau, J.-C. (1992) Biorthogonal Bases of Compactly
Supported Wavelets, Commun. on Pure and Appl. Math. 45, 485.
[10] Mallat, S. (1998) A Wavelet Tour of Signal Processing, Academic Press (San Diego).
[11] Antoniadis, A. (1999) Wavelets in Statistics: A Review, J. Italian Stat. Soc. 6, 1.
[12] Abramovich, F., Sapatinas, T., and Silverman, B.W. (2000) Stochastic expansions in an
overcomplete wavelet dictionary, Probability Theory & Related Fields, to appear.