Introduction
In this lecture we will look at RBFs, networks where the activation of a hidden unit is based on the distance between the input vector and a prototype vector.
RBFs allow for a straightforward interpretation of the internal representation produced by the hidden layer. RBFs have training algorithms that are significantly faster than those for MLPs. RBFs can be implemented as support vector machines.
CSCI 5521: Paul Schrater
Error function
Mixed: different error criteria are typically used for the hidden and output layers. Hidden layer: input approximation. Output layer: training error.
Optimization
Simple least squares: with hybrid training, the solution is unique.
The non-linear functions are radially symmetric kernels with free parameters.
In practice, a trade-off exists between using a small number of basis functions with many parameters or a larger number of less flexible functions [Bishop, 1995].
RBF vs. NN
Form:

y(\mathbf{x}) = \sum_{i=1}^{n} w_i \, \phi(\| \mathbf{x} - \mathbf{x}_i \|)

Solve for the weights $w_i$.
\begin{pmatrix} \phi_{11} & \cdots & \phi_{1n} \\ \phi_{21} & \cdots & \phi_{2n} \\ \vdots & & \vdots \\ \phi_{n1} & \cdots & \phi_{nn} \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}, \qquad \phi_{ij} = \phi(\| \mathbf{x}_i - \mathbf{x}_j \|)

\Phi \mathbf{w} = \mathbf{t} \quad\Rightarrow\quad \mathbf{w} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}
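A minimal numerical sketch of this exact-interpolation scheme, assuming NumPy; the Gaussian kernel and the function names are my own illustrative choices, not the lecture's:

```python
import numpy as np

def gaussian_rbf(r, sigma=1.0):
    # Radially symmetric kernel: depends only on the distance r
    return np.exp(-r**2 / (2 * sigma**2))

def fit_interpolation_weights(X, t, sigma=1.0):
    # Solve Phi w = t with Phi_ij = phi(||x_i - x_j||); the square
    # matrix is nonsingular when the points x_i are distinct
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.linalg.solve(gaussian_rbf(r, sigma), t)

def rbf_predict(X_train, w, X_new, sigma=1.0):
    # y(x) = sum_i w_i phi(||x - x_i||)
    r = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return gaussian_rbf(r, sigma) @ w
```

By construction the fit passes exactly through every training target.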
Solving conditions
Micchelli's Theorem: if the points $\mathbf{x}_i$ are distinct, then the interpolation matrix $\Phi$ will be nonsingular.
Mhaskar and Micchelli, Approximation by superposition of sigmoidal and radial basis functions, Advances in Applied Mathematics, 13, 350-373, 1992.
y_k(\mathbf{x}) = \sum_{i=1}^{m} w_{ki} \, \phi_i(\mathbf{x}) + w_{k0} = \sum_{i=0}^{m} w_{ki} \, \phi_i(\mathbf{x}), \qquad \phi_0(\mathbf{x}) = 1
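The bias trick above can be sketched as a design matrix whose first column is the constant basis function (NumPy assumed; `design_matrix` and its Gaussian kernel are illustrative choices of mine):

```python
import numpy as np

def design_matrix(X, centers, sigma=1.0):
    # Columns phi_1..phi_m are Gaussian basis evaluations; a leading
    # column of ones plays the role of phi_0(x) = 1, so the bias w_k0
    # is absorbed as just another weight
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    Phi = np.exp(-r**2 / (2 * sigma**2))
    return np.hstack([np.ones((len(X), 1)), Phi])
```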
Learning
RBFs are commonly trained following a hybrid procedure that operates in two stages
Unsupervised selection of RBF centers
RBF centers are selected so as to match the distribution of training examples in the input feature space. This is the critical step in training, normally performed in a slow iterative manner. A number of strategies are used to solve this problem.
E = \sum_{l=1}^{L} \sum_{k=1}^{n} \bigl( y_k(\mathbf{x}_l) - t_{kl} \bigr)^2 = \sum_{l=1}^{L} \bigl( \mathbf{y}(\mathbf{x}_l) - \mathbf{t}_l \bigr)^T \bigl( \mathbf{y}(\mathbf{x}_l) - \mathbf{t}_l \bigr)

= \sum_{l=1}^{L} \bigl( \boldsymbol{\phi}(\mathbf{x}_l)^T W - \mathbf{t}_l^T \bigr) \bigl( \boldsymbol{\phi}(\mathbf{x}_l)^T W - \mathbf{t}_l^T \bigr)^T

Setting the gradient with respect to $W$ to zero gives

W = (\Phi^T \Phi)^{-1} \Phi^T T

where row $l$ of $\Phi$ is $\boldsymbol{\phi}(\mathbf{x}_l)^T$ and row $l$ of $T$ is $\mathbf{t}_l^T$.
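A sketch of this closed-form weight solve (NumPy assumed; `solve_output_weights` is my name for it):

```python
import numpy as np

def solve_output_weights(Phi, T):
    # Least-squares minimizer of ||Phi W - T||^2, i.e.
    # W = (Phi^T Phi)^{-1} Phi^T T; lstsq computes the same
    # minimizer more stably than forming the explicit inverse
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W
```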
Once the center positions have been selected, the spread parameters $\sigma_j$ can be estimated, for instance, from the average distance between neighboring centers.
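One way this heuristic might look in code (NumPy assumed; the choice of k = 2 neighbors is an arbitrary default of mine):

```python
import numpy as np

def estimate_spreads(centers, k=2):
    # sigma_j = mean distance from center j to its k nearest
    # neighboring centers
    D = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    nearest = np.sort(D, axis=1)[:, 1:k + 1]  # column 0 is the self-distance
    return nearest.mean(axis=1)
```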
Clustering
Alternatively, RBF centers may be obtained with a clustering procedure such as the k-means algorithm. The spread parameters can be computed as before, or from the sample covariance of the examples of each cluster.
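A minimal Lloyd's-algorithm sketch for this center-selection step (NumPy assumed; not the course's reference implementation):

```python
import numpy as np

def kmeans_centers(X, k, n_iter=50, seed=0):
    # Plain Lloyd's algorithm: the k cluster means become RBF centers
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # guard against empty clusters
                centers[j] = members.mean(axis=0)
    return centers
```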
Density estimation
The positions of the RBF centers may also be obtained by modeling the feature-space density with a Gaussian mixture model fit with the Expectation-Maximization algorithm. The spread parameters for each center are then obtained automatically from the covariance matrices of the corresponding Gaussian components.
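A simplified EM sketch under the assumption of spherical covariances (NumPy assumed; the approach described above would use full covariance matrices, and here the initial means are passed in explicitly):

```python
import numpy as np

def gmm_em(X, mu0, n_iter=100):
    # EM for a spherical Gaussian mixture: the fitted means give
    # RBF centers, the fitted variances give the spreads
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu0, dtype=float)
    n, d = X.shape
    k = len(mu)
    var = np.full(k, X.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r_lj = p(component j | x_l)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        logp = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - sq / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances
        Nk = r.sum(axis=0)
        weights = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        var = (r * sq).sum(axis=0) / (d * Nk)
    return mu, var, weights
```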
Introduce Mixture
Make a penalty on large curvature:

\Omega_{ij} = \int \left( \frac{d^2 \phi_i(x)}{dx^2} \right) \left( \frac{d^2 \phi_j(x)}{dx^2} \right) dx

Minimize $E + \lambda \, \mathbf{w}^T \Omega \mathbf{w}$.
Equivalent Kernel
Linear regression solutions admit a kernel interpretation.
\mathbf{w} = (\Phi^T \Phi + \lambda \Omega)^{-1} \Phi^T \mathbf{y}

Let $S = (\Phi^T \Phi + \lambda \Omega)^{-1}$. Then

y_{pred}(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^T S \, \Phi^T \mathbf{y} = \sum_i K(\mathbf{x}, \mathbf{x}_i) \, y_i

where $K(\mathbf{x}, \mathbf{x}_i) = \boldsymbol{\phi}(\mathbf{x})^T S \, \boldsymbol{\phi}(\mathbf{x}_i)$ is the equivalent kernel.
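The smoother-matrix view of the equivalent kernel can be sketched as follows (NumPy assumed; `equivalent_kernel` is an illustrative name):

```python
import numpy as np

def equivalent_kernel(Phi, Omega, lam):
    # S = (Phi^T Phi + lam*Omega)^{-1}; the matrix mapping targets y
    # directly to fitted values has entries K_ij = phi(x_i)^T S phi(x_j)
    S = np.linalg.inv(Phi.T @ Phi + lam * Omega)
    return Phi @ S @ Phi.T
```

Predictions at the training points are then a linear smoothing of the targets, `y_pred = K @ y`; with `lam = 0` and a square invertible `Phi` this reduces to exact interpolation (`K` is the identity).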
Eigenanalysis and DF
An eigenanalysis of the effective kernel gives the effective degrees of freedom (DF), i.e., the effective number of basis functions.
Evaluate $K$ at all training points: $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$. Eigenanalysis: $K = V D V^{-1} = U S U^T$ (since $K$ is symmetric).
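A sketch of computing the effective DF as the sum of the smoother's eigenvalues, i.e. its trace (NumPy assumed):

```python
import numpy as np

def effective_dof(Phi, Omega, lam):
    # Smoother A = Phi (Phi^T Phi + lam*Omega)^{-1} Phi^T;
    # DF = trace(A) = sum of its eigenvalues
    A = Phi @ np.linalg.inv(Phi.T @ Phi + lam * Omega) @ Phi.T
    return np.linalg.eigvalsh((A + A.T) / 2).sum()
```

With `lam = 0` and a full-rank tall `Phi`, the smoother is a projection onto the column space of `Phi`, so the DF equals the number of basis functions.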