
Radial Basis Functions

An Introduction
Prof. Sarat K. Patra
Senior Member, IEEE
National Institute of Technology, Rourkela
Odisha, India
Email: skpatra@nitrkl.ac.in
Presentation Outline
Books and reference materials:
S. Haykin, Neural Networks: A Comprehensive Foundation, Pearson Education.
C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press.
B. Mulgrew, "Applying radial basis functions," IEEE Signal Processing Magazine, vol. 13, no. 2, 1996.
What are we going to cover
Introduction
Soft computing Techniques
NN Architectures
Linear and non-linearly separable
Basis Functions
Regularized RBF; Generalized RBF
RBF Training and Examples
Difference with MLP
Conclusion
Different NN Architectures
Perceptron (Only one neuron)
Linear decision boundary
Limited functionality
MLP
RBF
Recurrent networks
Self-organizing maps
Many more
Linear and Non-linearly Separable
Take a 2-input, single-output system.
Plot each category's output in the input space using different symbols.
Take the inputs in the x-y plane.
Can you have a line separating the points into 2 categories?
Yes: linearly separable (OR, AND gates)
No: non-linearly separable (XOR gate)
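As a quick numerical check (a minimal sketch, not from the slides; the function name and learning rate are illustrative), the perceptron learning rule finds a separating line for the OR targets but never converges on XOR:

```python
import numpy as np

def perceptron_separable(X, t, epochs=100, lr=0.1):
    """Try to find a separating line w.x + b = 0; return True on convergence."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, t):
            pred = 1 if w @ x + b > 0 else 0
            if pred != target:           # misclassified: nudge the boundary
                w += lr * (target - pred) * x
                b += lr * (target - pred)
                errors += 1
        if errors == 0:                  # one clean pass => linearly separable
            return True
    return False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(perceptron_separable(X, np.array([0, 1, 1, 1])))  # OR  -> True
print(perceptron_separable(X, np.array([0, 1, 1, 0])))  # XOR -> False
```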
Why network models beyond the MLP?
The MLP is already a universal approximator, but:
The MLP can have many local minima.
It is often too slow to train an MLP.
Sometimes it is extremely difficult to optimize the structure of an MLP.
There may exist other network architectures, in terms of the number of elements in each layer, whose performance could be superior to the one used.
Radial Basis Function (RBF)
Networks
RBF networks (RBFNs) are artificial neural networks applied to problems of supervised learning:
Regression
Classification
Parametric Regression
Parametric regression: the form of the function is known, but not the parameter values.
Typically, the parameters (of both the dependent and independent variables) have physical meaning.
E.g., fitting a straight line to a bunch of points.
Nonparametric Regression
No prior knowledge of the true form of the
function.
Using many free parameters which have no
physical meaning.
The model should be able to represent a very
broad class of functions.
Classification
Purpose: assign previously unseen patterns to
their respective classes.
Training: previous examples of each class.
Output: a class out of a discrete set of classes.
Classification problems can be made to look
like nonparametric regression.
Time Series Prediction
Estimate the next value and future values of a sequence from its observed history.
The problem is that the sequence is usually not an explicit function of time. Normally, time series are modeled as auto-regressive in nature, i.e. the outputs, suitably delayed, are also the inputs:

$\hat{x}_{n+1} = f(x_n, x_{n-1}, \ldots, x_{n-d})$

To create the training set from the available historical sequence first requires choosing how many, and which, delayed outputs affect the next output.
Supervised Learning in RBFN
Neural networks, including radial basis function
networks, are nonparametric models and their
weights (and other parameters) have no
particular meaning in relation to the problems to
which they are applied.
Estimating values for the weights of a neural
network (or the parameters of any nonparametric
model) is never the primary goal in supervised
learning.
The primary goal is to estimate the underlying
function (or at least to estimate its output at
certain desired values of the input).
The idea of RBFNN
The MLP is one way to get non-linearity. The other is to use the generalized linear discriminant function:
$y(\mathbf{x}) = \sum_j w_j\, \phi_j(\mathbf{x})$
The idea of RBFNN
For a radial basis function (RBF), the basis function is radial:
It is symmetric with respect to the input, and its value is determined by the distance from the data point to the RBF center.
$\phi_j(\mathbf{x}) = \exp\!\left(-\|\mathbf{x}-\mathbf{c}_j\|^2 / 2\sigma_j^2\right)$

where $\mathbf{c}_j$ represents the center and $\sigma_j$ the width; $\|\cdot\|$ is the distance measure. For Euclidean distance,

$\|\mathbf{x}-\mathbf{c}_j\|^2 = \sum_{m=1}^{M} (x_m - c_{jm})^2$
The Gaussian Kernel
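A minimal sketch of this Gaussian basis function in Python (the function name and test values are illustrative, not from the slides):

```python
import numpy as np

def gaussian_rbf(x, c, sigma):
    """Gaussian basis function: exp(-||x - c||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))

x = np.array([0.5, 0.2])
c = np.array([0.0, 0.0])               # center of the basis function
print(gaussian_rbf(x, c, sigma=1.0))   # close to 1 near the center
print(gaussian_rbf(x + 5.0, c, 1.0))   # decays toward 0 far from it
```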
Cover's Theorem
A complex pattern-classification problem cast in a high-dimensional space nonlinearly is more likely to be linearly separable than in a low-dimensional space (Cover, 1965).
Radial Basis Function Networks
In its most basic form, a radial-basis-function (RBF) network involves three layers with entirely different roles.
The input layer is made up of source nodes that
connect the network to its environment.
The second layer, the only hidden layer, applies a
nonlinear transformation from the input space to
the hidden space.
The output layer is linear, supplying the response
of the network to the activation pattern applied
to the input layer.
The idea of RBFNN
For RBFNN, we expect that the function to be
learnt can be expressed as a linear
superposition of a number of RBFs.
The function is described as a linear superposition of three basis functions.
RBF Structure
RBFNN: a two-layer network.
Free parameters:
-- The network weights w in the 2nd layer
-- The form of the basis functions
-- The number of basis functions
-- The locations of the basis functions
E.g., for a Gaussian RBFNN, they are the number, the centers and the widths of the basis functions.
[Figure: two-layer RBF network with input x, basis functions φ, second-layer weights w and output y]
Some Theory
Given a set of $N$ different points $\{\mathbf{x}_i \in \mathbb{R}^{m_0},\; i=1,2,\ldots,N\}$ and a corresponding set of $N$ real numbers $\{d_i \in \mathbb{R}^1,\; i=1,2,\ldots,N\}$, find a function $F: \mathbb{R}^{m_0} \to \mathbb{R}^1$ that satisfies the interpolation condition

$F(\mathbf{x}_i) = d_i, \quad i=1,2,\ldots,N$

The radial-basis-function technique consists of choosing a function $F$ of the form

$F(\mathbf{x}) = \sum_{i=1}^{N} w_i\, \varphi(\|\mathbf{x}-\mathbf{x}_i\|)$
Some Theory
Micchelli's Theorem
Let $\{\mathbf{x}_i\}_{i=1}^{N}$ be a set of distinct points in $\mathbb{R}^{m_0}$. Then the $N$-by-$N$ interpolation matrix $\boldsymbol{\Phi}$, whose $ji$-th element is $\Phi_{ji} = \varphi(\|\mathbf{x}_j-\mathbf{x}_i\|)$, is non-singular.
Regularization Networks
The regularization network is a universal approximator
The regularization network has the best approximation
property
The solution computed by the regularization network is
optimal.
Generalized RBF Networks
When N is large, the one-to-one correspondence between the training input data and the Green's functions produces a regularization network that may be considered expensive. We therefore seek an approximation of the regularized network.
Generalized RBF Networks
The approach taken involves searching for a suboptimal solution in a lower-dimensional space that approximates the regularized solution (Galerkin's method):

$F^*(\mathbf{x}) = \sum_{i=1}^{m_1} w_i\, \varphi_i(\mathbf{x})$

where $\{\varphi_i(\mathbf{x}) \mid i=1,2,\ldots,m_1 \le N\}$ is a new set of linearly independent basis functions and the $w_i$ constitute a new set of weights.
We set $\varphi_i(\mathbf{x}) = G(\|\mathbf{x}-\mathbf{t}_i\|),\; i=1,2,\ldots,m_1$, where the set of centers $\{\mathbf{t}_i \mid i=1,2,\ldots,m_1\}$ is to be determined.
Note that this particular choice of basis functions is the only one that guarantees that, in the case of $m_1 = N$ and $\mathbf{x}_i = \mathbf{t}_i,\; i=1,2,\ldots,N$, the correct solution is consistently recovered.
[Figure: localized vs. non-localized basis functions]
RBF Structure (2)
Universal approximation: a Gaussian RBFNN is capable of approximating any function.
Exact Interpolation
The idea of RBFNN is that we interpolate the target function by using the sum of a number of basis functions.
To illustrate this idea, we consider a special case of exact interpolation, in which the number of basis functions M is equal to the number of data points N (M = N) and all the basis functions are centered at the data points.
We want the target values to be exactly interpolated by the summation of basis functions.
Exact Interpolation
$y(\mathbf{x}^n) = \sum_{j=1}^{M} w_j\, \phi_j(\mathbf{x}^n) = t^n, \quad \text{for } n=1,\ldots,N, \qquad \phi_j(\mathbf{x}) = \phi(\|\mathbf{x}-\mathbf{c}_j\|)$

or, in matrix form, $\boldsymbol{\Phi}\mathbf{w} = \mathbf{t}$.

Since $M = N$, $\boldsymbol{\Phi}$ is a square matrix, and it is non-singular in general cases, so the result is

$\mathbf{w} = \boldsymbol{\Phi}^{-1}\mathbf{t}$
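A small numerical sketch of this solve, assuming Gaussian basis functions and NumPy (the toy data, function name and width are illustrative):

```python
import numpy as np

def gaussian_design_matrix(X, centers, sigma):
    """Phi[n, j] = exp(-||x_n - c_j||^2 / (2 sigma^2))."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy data: 5 points of a noisy sine; with M = N the centers are the data points
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 5)[:, None]
t = np.sin(2 * np.pi * X[:, 0]) + 0.2 * rng.standard_normal(5)

Phi = gaussian_design_matrix(X, X, sigma=0.2)   # square N x N matrix
w = np.linalg.solve(Phi, t)                     # w = Phi^{-1} t
print(np.allclose(Phi @ w, t))                  # targets exactly interpolated -> True
```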
RBF Output with 3 centers
1-dimensional problem
Center locations: (−1, 0, 1)
RBF Output with 4 Centers (XOR)
[Figure: RBF outputs for σ² = 0.1 and σ² = 1.0]
RBF Output with 4 Centers
[Figure: RBF outputs for σ² = 0.1 and σ² = 1.0]
An example of exact interpolation
For a Gaussian RBF (1D input), 21 data points are generated by $y = \sin(2\pi x)$ plus noise (strength = 0.2).
The target data points are
indeed exactly interpolated,
but the generalization
performance is not good.
The hybrid training procedure
The number of basis functions need not be equal to the number of data points. Actually, in a typical situation, M should be much less than N.
The centers of the basis functions are no longer constrained to be at the input data points. Instead, the determination of centers becomes part of the training process.
Instead of having a common width parameter σ, each basis function can have its own width, which is also to be determined by learning.
An example of RBFNN
Exact interpolation, σ = 0.1
RBFNN, 4 basis functions, σ = 0.4
The hybrid training procedure
Unsupervised learning in the first layer. This fixes the basis functions using only knowledge of the input data. For a Gaussian RBF, it often includes deciding the number, the locations and the widths of the RBFs.
Supervised learning in the second layer. This determines the network weights in the second layer. If we choose the sum-of-squares error, it becomes a quadratic optimization problem, which is easy to solve.
In summary, the hybrid training avoids using supervised learning simultaneously in both layers, and greatly reduces the computational cost.
Basis function optimization
The form of the basis function is predefined, and is often chosen to be Gaussian.
The number of basis functions often has to be determined by trial, e.g. through monitoring the generalization performance.
The key issue in unsupervised learning is to determine the locations and the widths of the basis functions.
Algorithms for basis function
optimization
Subsets of data points.
Randomly select a number of input data points as basis function centers.
The width can be chosen to be equal for all basis functions and given by some multiple of the average distance between the basis function centers.
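A minimal sketch of this selection heuristic, assuming NumPy (the function name and the choice multiple=2.0 are illustrative):

```python
import numpy as np

def random_centers_and_width(X, M, multiple=2.0, seed=0):
    """Pick M random data points as centers; set a shared width from
    the average pairwise distance between the chosen centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=M, replace=False)]
    d = np.sqrt(np.sum((centers[:, None] - centers[None, :]) ** 2, axis=-1))
    avg_dist = d[np.triu_indices(M, k=1)].mean()   # mean over distinct pairs
    return centers, multiple * avg_dist            # sigma = multiple * avg distance
```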
Algorithms for basis function
optimization
Gaussian mixture models.
The basis functions are chosen to model the density distribution of the input data (intuitively, we want the centers of the basis functions to lie in high-density regions). We may assume the input data are generated by a mixture of Gaussian distributions. Optimizing the probability density model returns the basis function centers and widths.
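A sketch of this approach, assuming scikit-learn is available (a spherical covariance is chosen here so that each component yields a single width; this choice is an assumption, not from the slides):

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # assumes scikit-learn is installed

def gmm_centers_and_widths(X, K):
    """Fit a K-component spherical Gaussian mixture to the inputs;
    the component means/variances give RBF centers and widths."""
    gm = GaussianMixture(n_components=K, covariance_type="spherical").fit(X)
    centers = gm.means_                # (K, D) basis function centers
    widths = np.sqrt(gm.covariances_)  # (K,) one sigma per component
    return centers, widths
```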
Algorithms for basis function
optimization
Clustering algorithms.
In this approach the input data are assumed to consist of a number of clusters. Each cluster corresponds to one basis function, with the cluster center being the basis function center. The width can be set equal to some multiple of the average distance between all centers.
K-means clustering algorithm (1)
The algorithm partitions the data points into K disjoint subsets (K is predefined).
The clustering criteria are:
The cluster centers are set in the high-density regions of the data.
A data point is assigned to the cluster whose center it is closest to.
Mathematically, this is equivalent to minimizing the sum-of-squares clustering function:
K-means clustering algorithm (2)
$J = \sum_{j=1}^{K} \sum_{n \in S_j} \|\mathbf{x}^n - \mathbf{c}_j\|^2$

where $S_j$ is the $j$-th cluster, containing $N_j$ data points, and

$\mathbf{c}_j = \frac{1}{N_j} \sum_{n \in S_j} \mathbf{x}^n$

is the mean of the data points in cluster $j$.
K-means clustering algorithm (3)
Step 1: Initially, randomly assign data points to one of the K clusters. Each data point then has a cluster label.
Step 2: Calculate the mean (center) of each cluster.
Step 3: Check whether each data point has the right cluster label. For each data point, calculate its distances to all K centers. If the minimum distance is not the distance to its own cluster center, the cluster identity of the data point is updated to the one that gives the minimum distance.
Step 4: After each epoch of checking (one pass over all data points), if no update occurs, i.e., J has reached its minimum value, then stop. Otherwise, go back to Step 2.
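The four steps translate directly into a short NumPy sketch (the empty-cluster guard is an implementation detail, not from the slides):

```python
import numpy as np

def kmeans(X, K, max_epochs=100, seed=0):
    """K-means following the four steps above: random initial labels,
    recompute means, reassign to nearest center, repeat until stable."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))          # Step 1: random assignment
    for _ in range(max_epochs):
        # Step 2: mean of each cluster (guard against empty clusters)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else X[rng.integers(len(X))] for j in range(K)])
        # Step 3: reassign each point to its nearest center
        d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        new_labels = np.argmin(d2, axis=1)
        if np.array_equal(new_labels, labels):        # Step 4: stop when stable
            break
        labels = new_labels
    return centers, labels
```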
An example of data clustering
[Figure: data before and after clustering]
The network training
The network output after clustering is
$y(\mathbf{x}) = \sum_{j=0}^{K} w_j\, \phi_j(\mathbf{x})$

where $\phi_j(\mathbf{x}) = \exp(-\|\mathbf{x}-\mathbf{c}_j\|^2 / 2\sigma^2)$ for $j > 0$ is the Gaussian RBF, the $\mathbf{c}_j$ are the centers obtained by clustering, and $\phi_0(\mathbf{x}) = 1$ is the bias term.

The output error is

$E = \frac{1}{2} \sum_{n=1}^{N} \left( \sum_{j=0}^{K} w_j\, \phi_j(\mathbf{x}^n) - t^n \right)^2$
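A minimal sketch of this second-layer solve with NumPy's least squares (function names are illustrative; a shared width σ is assumed, as on the slide):

```python
import numpy as np

def train_output_weights(X, t, centers, sigma):
    """Build the design matrix [1, phi_1..phi_K] and solve the quadratic
    (sum-of-squares) problem for the second-layer weights."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))        # (N, K) hidden activations
    Phi = np.hstack([np.ones((len(X), 1)), Phi])  # prepend phi_0 = 1 (bias)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # minimizes E = 1/2 ||Phi w - t||^2
    return w

def rbf_predict(X, w, centers, sigma):
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2.0 * sigma ** 2))])
    return Phi @ w
```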
RBF in Time series Prediction
We will show an example of using an RBFNN for time series prediction.
Time series prediction: predict the system behavior based on its history.
Suppose the time course of a system is denoted as $\{S(1), S(2), \ldots, S(n)\}$, where $S(n)$ is the system state at time step $n$. The task is to predict the system behavior at $n+1$ based on the knowledge of its history, i.e., $\{S(n), S(n-1), S(n-2), \ldots\}$. This is possible for many problems in which system states are correlated over time.
RBF in Time series Prediction
Consider a simple example, the logistic map, in which the system state $x$ is updated iteratively according to

$x_{n+1} = r\, x_n (1 - x_n)$

Our task is to predict the value of $x$ at any step based on its values in the previous two steps, i.e., to estimate $x_n$ based on $x_{n-1}$ and $x_{n-2}$.
Generating training data from the logistic
map
The logistic map, though simple, shows many interesting behaviors. (More detail can be found at http://mathworld.wolfram.com/LogisticMap.html)
The data collection process:
Choose r = 4, and the initial value of x to be 0.3.
Iterate the logistic map 500 steps, and collect 100 examples from the last 100 iterations (chopping the data into triplets; each triplet gives one input-output pair).
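A sketch of this data-collection process in NumPy (the exact chopping convention here is one reasonable reading of the slide):

```python
import numpy as np

def logistic_map_data(r=4.0, x0=0.3, n_steps=500, n_examples=100):
    """Iterate x_{n+1} = r x_n (1 - x_n), then chop the tail of the
    sequence into triplets: inputs (x_{n-2}, x_{n-1}) -> target x_n."""
    x = np.empty(n_steps)
    x[0] = x0
    for n in range(n_steps - 1):
        x[n + 1] = r * x[n] * (1.0 - x[n])
    tail = x[-(n_examples + 2):]                 # last iterations only
    X = np.stack([tail[:-2], tail[1:-1]], axis=1)  # (x_{n-2}, x_{n-1}) pairs
    t = tail[2:]                                   # targets x_n
    return X, t
```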
Generating training data from the logistic
map
[Figures: the input data space; the time course of the system state]
Clustering the input data
We cluster the input data using the K-means clustering algorithm.
We choose K = 4. The clustering result returns the centers of the basis functions and the scale of the width.
The training result of RBFNN
[Figure: the relationship between $x_{n-2}$ and $x_n$]
The training result of RBFNN
[Figure: the relationship between $x_{n-1}$ and $x_n$]
Time series predicted data
Comparison with MLP
RBF
Simple structure: one hidden layer, a linear combination at the output layer.
Simple training: the hybrid procedure: clustering plus a quadratic error function.
Localized representation: the input space is covered by a number of localized basis functions; a given input typically activates significantly only a limited number of hidden units (those within a close distance).
MLP
Complicated structure: often many layers and many hidden units.
Complicated training: multiple layers are optimized together; local minima and slow convergence.
Distributed representation: for a given input, typically many hidden units will be activated.
Comparison with MLP (2)
Different ways of interpolating data
MLP: data are classified by hyper-planes. RBF: data are classified according to clusters
Shortcomings of RBFNN
Unsupervised learning implies that the RBFNN may achieve only a sub-optimal solution, since the training of the basis functions does not take the output distribution into account.
Example: a basis function is chosen based only on the density of the input data, which gives p(x). It does not match the real output function h(x).
Shortcomings of RBFNN
Example: the output function is determined by only one input component; the other component is irrelevant. Because its first layer is trained without supervision, the RBFNN is unable to detect this irrelevant component, whereas an MLP may do so (the network weights connected to irrelevant components will tend to have smaller values).
Some Theory
The XOR problem: $(x_1 \text{ OR } x_2) \text{ AND NOT } (x_1 \text{ AND } x_2)$
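A small numeric illustration of the classic two-Gaussian construction for XOR (centers at (0,0) and (1,1), as in Haykin): after the hidden-layer mapping, the two classes become linearly separable.

```python
import numpy as np

# The four XOR patterns and their targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0])

# Two Gaussian hidden units with centers at (0,0) and (1,1)
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
Phi = np.exp(-np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1))

for phi, target in zip(Phi, t):
    print(phi, target)
# In (phi_1, phi_2) space the classes fall on opposite sides of the line
# phi_1 + phi_2 = 0.95: class 1 sums to ~0.74, class 0 sums to ~1.14
print(Phi.sum(axis=1))
```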
Summary
The structure of an RBF network is unusual in
that the constitution of its hidden units is
entirely different from that of its output units.
Tikhonov's regularization theory provides a sound mathematical basis for the formulation of RBF networks.
The Green's function $G(\mathbf{x}, \boldsymbol{\xi})$ plays a central role in the theory.
Queries?