
Support Vector Machine Classifiers for Asymmetric Proximities

Alberto Muñoz¹, Isaac Martín de Diego¹, and Javier M. Moguerza²

¹ University Carlos III de Madrid, c/ Madrid 126, 28903 Getafe, Spain
{albmun,ismdiego}@est-econ.uc3m.es
² University Rey Juan Carlos, c/ Tulipán s/n, 28933 Móstoles, Spain
j.moguerza@escet.urjc.es
Abstract. The aim of this paper is to afford classification tasks on asymmetric kernel matrices using Support Vector Machines (SVMs). Ordinary SVM theory requires working with symmetric proximity matrices. In this work we examine the performance of several symmetrization methods in classification tasks. In addition, we propose a new method that specifically takes the classification labels into account to build the proximity matrix. The performance of the considered methods is evaluated on a variety of artificial and real data sets.
1 Introduction
Let X be an n × p data matrix representing n objects in ℝ^p. Let S be the n × n matrix made up of object similarities using some similarity measure. Assume that S is asymmetric, that is, s_ij ≠ s_ji. Examples of such matrices arise when considering citations among journals or authors, sociometric data, or word association strengths [11]. In the first case, suppose a paper (Web page) i cites (links to) a paper (Web page) j, but the opposite is not true. In the second example, a child i may select another child j to sit next to in their classroom, but not reciprocally. In the third case, word i may appear in documents where word j occurs, but not conversely.
Classification tasks on such data sets often arise. For instance, we may have an asymmetric link matrix among Web pages, together with topic labels for some of the pages (computer science, sports, etc.). Note that there exists no Euclidean representation of the Web pages in this problem, and classification must be done using solely the cocitation matrix: we are given the S matrix, but there is no X matrix in this case. The SVM parametrization [1,2] of the classification problem is well suited to this case. By the representer theorem (see for instance [3,8]), SVM classifiers always take the form f(x) = Σ_i α_i K(x, x_i), where K is a positive definite matrix. Thus, if we are given the similarity matrix K = (s_ik) and this matrix admits a Euclidean representation (via classical scaling), this is all we need to classify data using an SVM. In the case of an asymmetric K = S, Schölkopf et al. [9] suggest working with the symmetric matrix S^T S. Tsuda [10]
elaborates on the SVD of S, producing a new symmetric similarity matrix that serves as input to the SVM.
A standard way to achieve symmetrization is to define K_ij = (s_ij + s_ji)/2, taking the symmetric part in the decomposition S = (1/2)(S + S^T) + (1/2)(S − S^T). This choice can be interpreted in a classification setting as follows: we assign the same weight (one half) to s_ij and s_ji before applying the classifier. However, note that this choice wastes the information provided by the classification labels. In addition, ignoring the skew-symmetric part implies a loss of information.
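As a minimal illustration of this decomposition (a sketch of ours, not part of the original paper; it assumes the asymmetric similarities are stored in a NumPy array S and uses an illustrative helper name), the symmetric part used by the averaging method and the skew-symmetric part it discards can be computed as follows:

import numpy as np

def split_parts(S):
    """Decompose a square matrix S into its symmetric and skew-symmetric parts."""
    S_sym = 0.5 * (S + S.T)    # K_ij = (s_ij + s_ji)/2, the averaging symmetrization
    S_skew = 0.5 * (S - S.T)   # information discarded by the averaging method
    return S_sym, S_skew

# Toy asymmetric similarity matrix.
S = np.array([[1.0, 0.8, 0.1],
              [0.3, 1.0, 0.6],
              [0.9, 0.2, 1.0]])
K_avg, K_skew = split_parts(S)
assert np.allclose(K_avg + K_skew, S)   # the two parts recover S exactly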
In the next section we elaborate on an interpretation of asymmetry that could explain why and when some symmetrization methods may succeed. In addition, we show the relation between the methods of Tsuda and of Schölkopf and coworkers. In section 3 we propose a new method to build a symmetric Gram matrix from an asymmetric proximity matrix. The proposed method specifically takes the labels of the data points into account to build the Gram matrix. The different methods are tested in section 4 on a collection of both artificial and real data sets. Finally, section 5 summarizes.
2 A Useful Interpretation of Asymmetry
There is a particular choice of s_ij that makes sense in a number of interesting cases. Denote by ∧ the fuzzy AND operator, and define:

s_ij = |x_i ∧ x_j| / |x_i| = Σ_k |min(x_ik, x_jk)| / Σ_k |x_ik|          (1)
where the existence of a data matrix X is assumed. Suppose X corresponds to a terms × documents matrix. Then |x_i| measures the number of documents indexed by term i, and |x_i ∧ x_j| the number of documents indexed by both terms i and j. Therefore, s_ij may be interpreted as the degree to which the topic represented by term i is a subset of the topic represented by term j. This numeric measure of subsethood is due to Kosko [4]. In the case of a cocitation matrix, |x_i| is the number of cites received by author (or Web page) i, and |x_i ∧ x_j| measures the number of authors (or Web pages) that simultaneously cite authors i and j.
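A small sketch of this subsethood similarity (ours, under the assumption that X is a non-negative NumPy array of shape (n_terms, n_documents) and that every term occurs in at least one document; the function name is illustrative):

import numpy as np

def subsethood_matrix(X):
    """s_ij = sum_k min(x_ik, x_jk) / sum_k x_ik, as in eq. (1), for non-negative X."""
    n = X.shape[0]
    norms = X.sum(axis=1)        # |x_i|; assumed strictly positive
    S = np.empty((n, n))
    for i in range(n):
        # |x_i AND x_j| / |x_i| for all j at once
        S[i] = np.minimum(X[i], X).sum(axis=1) / norms[i]
    return S

For a binary terms × documents matrix, S[i, j] is the fraction of documents containing term i that also contain term j, so S is asymmetric in general.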
All these problems have in common that the norms of the individuals (the |x_i|'s) follow Zipf's law [6]: there are a few individuals with very large norms (very cited), and at the opposite end of the distribution there are many individuals with very small norms. This asymmetry can be interpreted as a particular type of hierarchy. Individuals organize into a kind of tree: at the top lie words with large norms, corresponding to broad topics (authorities in the case of Web pages); at the base lie words with small norms, corresponding to rare topics.
We next relate norms to asymmetry. In the decomposition s_ij = (1/2)(s_ij + s_ji) + (1/2)(s_ij − s_ji), the second term conveys the information provided by asymmetry (it equals zero if S is symmetric). This skew-symmetric term can be written as follows:

(1/2)(s_ij − s_ji) = (1/2)(|x_i ∧ x_j|/|x_i| − |x_i ∧ x_j|/|x_j|) = (|x_i ∧ x_j| / (2|x_i||x_j|)) (|x_j| − |x_i|) ∝ |x_j| − |x_i|          (2)
Thus asymmetry is directly related to differences in norms, and will naturally arise when the norms of the data points follow Zipf's law.
The method suggested by Schölkopf et al. in [9] consists in taking K = S^T S as the kernel matrix. This method makes sense for the case of cocitation matrices, because K_ij = 1 when there is a k such that s_ki = s_kj = 1: there exists an author who simultaneously cites both i and j. However, we will lose a type of similarity that arises when two authors both cite a third one (this information is conveyed by SS^T).
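To make the distinction concrete, here is a toy illustration of ours (the 0/1 matrix is hypothetical): with s_ki = 1 meaning "author k cites author i", (S^T S)_ij counts common citers of i and j, while (S S^T)_ij counts authors cited by both i and j.

import numpy as np

# Rows cite columns: S[k, i] = 1 means "author k cites author i".
S = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
common_citers = S.T @ S   # entry (i, j): number of authors citing both i and j
common_refs = S @ S.T     # entry (i, j): number of authors cited by both i and j
print(common_citers[1, 2])  # 1: author 0 cites both authors 1 and 2
print(common_refs[0, 1])    # 1: authors 0 and 1 both cite author 2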
The method proposed by Tsuda [10] builds a symmetric kernel matrix as follows: a transformation WH is defined, where H(x) = (s(x, x_1), ..., s(x, x_n)) and W = L^{-1/2} U^T. Here L and U come from the SVD of the data matrix HX, whose i-th column vector is H(x_i): HX = ULV^T. The new similarity matrix is now symmetric: K_ij = (WH(x_i))^T (WH(x_j)) = (H(x_i))^T U L^{-1} U^T H(x_j).

Tsuda's method produces a kernel matrix close to S^T S. Consider the SVD of the original asymmetric similarity matrix S = (s_ij): S = HX = ULV^T. It is straightforward to show that the corresponding kernel matrix is V L V^T. To conclude, note that the kernel matrix S^T S = V L^2 V^T.
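The relation just stated can be checked numerically. The following sketch (ours, adopting the paper's convention HX = S) verifies that Tsuda's kernel equals V L V^T while S^T S equals V L^2 V^T:

import numpy as np

rng = np.random.default_rng(0)
S = rng.random((6, 6))                          # toy asymmetric similarity matrix
U, L, Vt = np.linalg.svd(S)                     # S = U diag(L) V^T

K_tsuda = S.T @ U @ np.diag(1.0 / L) @ U.T @ S  # (H(x_i))^T U L^{-1} U^T H(x_j)
assert np.allclose(K_tsuda, Vt.T @ np.diag(L) @ Vt)       # Tsuda's kernel = V L V^T
assert np.allclose(S.T @ S, Vt.T @ np.diag(L**2) @ Vt)    # S^T S = V L^2 V^T

The two kernels share eigenvectors; their eigenvalues differ by a square, which is why Tsuda's matrix is "close to" S^T S.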
3 Combining Kernels for Asymmetric Proximities
Following the notation of section 1, let X and S be, respectively, the data matrix and the asymmetric proximity matrix. For the sake of clarity, in this paper we focus on binary classification problems. Let C_1 and C_2 denote the two classes. To use SVM classifiers on X, S needs to be a positive definite symmetric matrix. Thus we must transform S in order to meet the SVM conditions.

From a geometric point of view, the solution of a binary classification problem is given by a hyperplane or some type of decision surface. If it is possible to solve a classification problem in this way, then the following topological assumption must hold: given a single datum, points in a sufficiently small neighborhood should belong to the same class (excluding points lying on the decision surface). As a consequence, if we are going to classify a data set relying on a given proximity matrix, points close to each other under these proximities should in general belong to the same class.

Therefore, K_ij should be large for i and j in the same class, and small for i and j in different classes. We have two possibly contradictory sources of information: s_ij and s_ji. We should define K_ij as a function f(s_ij, s_ji) that conforms to the preceding rule. In this work we adopt a simple and intuitive choice:
K_ij = max(s_ij, s_ji),  if i and j belong to the same class,
K_ij = min(s_ij, s_ji),  if i and j belong to different classes.          (3)
In this way, if i and j are in the same class, K_ij is guaranteed to be as large as possible, given the available information. If i and j belong to different classes, we can expect a low similarity between them, and this is achieved by the choice K_ij = min(s_ij, s_ji). The kernel matrix K is now symmetric and reduces to the usual case when S is symmetric. However, positive definiteness is not assured. In that case, K should be replaced by K + εI, with ε > 0 large enough to make all the eigenvalues of the kernel matrix positive. We will call this method the pick-out method.

Note that this kernel makes sense only for classification tasks, since we need the class labels to build it.
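A possible implementation sketch of this construction (ours, not the authors' code; it assumes the similarities are in a NumPy array S and the class labels in a vector y, and applies the eigenvalue shift K + εI described above):

import numpy as np

def pick_out_kernel(S, y, jitter=1e-8):
    """Build the pick-out kernel of eq. (3) and shift it to be positive definite."""
    y = np.asarray(y)
    same = y[:, None] == y[None, :]            # True where i and j share a class
    K = np.where(same, np.maximum(S, S.T), np.minimum(S, S.T))
    lam_min = np.linalg.eigvalsh(K).min()      # K is symmetric by construction
    if lam_min <= 0:                           # replace K by K + eps*I, eps > 0
        K = K + (jitter - lam_min) * np.eye(len(y))
    return K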
4 Experiments
In this section we show the performance of the preceding methods on both artificial and real data sets. The testing methodology follows this scheme: after building the K matrix, we have a representation for point x_i given by (K(x_i, x_1), ..., K(x_i, x_n)). Consider the X matrix defined as (K(x_i, x_j))_ij. Next, we produce Euclidean coordinates for the data points from the matrix X by a classical scaling process. The embedding in a Euclidean space is convenient to make the notion of a separating surface meaningful, and allows data visualization. Next, we use a linear SVM on the resulting data set and, finally, classification errors are computed. For all the methods, we use 70% of the data for training and 30% for testing.
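The following sketch (ours, under the stated setup; helper names are illustrative) reproduces this pipeline: classical scaling of a symmetric matrix K to obtain Euclidean coordinates, followed by a linear SVM with a 70/30 split.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def classical_scaling(K, dim=2):
    """Euclidean coordinates from a symmetric similarity matrix via classical scaling."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = J @ K @ J                              # doubly centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dim]         # keep the leading eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def train_test_error(K, y, seed=0):
    Z = classical_scaling(K, dim=2)
    Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.3, random_state=seed)
    clf = SVC(kernel="linear").fit(Z_tr, y_tr)
    return 1.0 - clf.score(Z_tr, y_tr), 1.0 - clf.score(Z_te, y_te)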
Regarding the pick-out method, we need a way to calculate K(x, x_i) for non-labelled data points x. Given a point x, we build two different sets of K_xi = K(x, x_i): the first assuming x belongs to class C_1, and the second assuming x belongs to class C_2. Suppose an SVM classifier has been trained with the labelled data points. Now, calculate the distance of the two Euclidean representations of x to the SVM hyperplane. Decide that x belongs to class C_1 if the second representation is the closer to this hyperplane, and to class C_2 otherwise.
4.1 Artificial Data Sets
The two-servers data set. This data set contains 300 data points in ℝ². There are two linearly separable groups. Initially, the kernel matrix is defined by s_ij = 1 − d_ij / max{d_ij}, where d_ij denotes the Euclidean distance. Suppose that the entries of the matrix are corrupted at random: for each pair (i, j), one element of the pair (s_ij, s_ji) is substituted by a random number in [0, 1]. This data set illustrates the situation that arises when there are two groups of computers (depending on two servers) sending e-mails among them: d_ij corresponds to the time that a message takes to travel from computer i to computer j. The asymmetry between d_ij and d_ji is explained by the two different
routes the information can take between i and j. The randomness is introduced because it is not always true that d_ij < d_ji or conversely. Therefore, it is not possible to find kernels K_1 and K_2 that allow the kernel to be expressed in the form K = λ_1 K_1 + λ_2 K_2.
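A sketch of this construction (ours, with illustrative group centres and scales) is:

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(150, 2)),    # group of server 1
               rng.normal([4.0, 4.0], 0.5, size=(150, 2))])   # group of server 2
y = np.repeat([0, 1], 150)

D = cdist(X, X)                      # Euclidean distances d_ij
S = 1.0 - D / D.max()                # s_ij = 1 - d_ij / max{d_ij}
n = len(X)
for i in range(n):
    for j in range(i + 1, n):        # corrupt one element of each pair (s_ij, s_ji)
        if rng.random() < 0.5:
            S[i, j] = rng.random()
        else:
            S[j, i] = rng.random()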
We run the four methods and the average results are shown in table 1.
Table 1. Classification errors for the two-servers data set.

Method          Train error   Test error
Pick-out        6.6 %         8.0 %
1/2(S + S^T)    10.0 %        11.5 %
S^T S           21.3 %        23.1 %
Tsuda           14.0 %        15.9 %
The pick-out method achieves the best performance. Since we introduce label information into the pick-out kernel, we expect this kernel to be more useful than the others for data visualization. To check this conjecture, we plot the first two coordinates obtained by multidimensional scaling for each of the methods. The result is shown in figure 1 and confirms our supposition.
[Figure 1 about here: four MDS scatter plots, (a) MDS for the pick-out matrix, (b) MDS for the 1/2(S + S^T) matrix, (c) MDS for the S^T S matrix, (d) MDS for Tsuda's matrix.]

Fig. 1. Multidimensional scaling (MDS) representation of symmetrized kernels.
Two groups with different scattering matrices. In this data set there are 350 points in ℝ², divided into two groups (175 in each class). Each group C_i corresponds to a normal cloud with diagonal covariance matrix σ_i² I, where σ_2 = 5σ_1. The overlap in the data set amounts to about 5%. We define s_ij = exp(−d_ij² / σ_j²), where σ_j² denotes the variance in the vicinity of point j, estimated as the sample variance using the k nearest neighbors of point j. Here we take k = 3. The underlying idea is to use a locally normalized distance: if the distance from point i to point j is large relative to the average of the distances in the neighborhood of j, then s_ij will be small.
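A sketch of this locally normalized similarity (ours; one plausible reading of the local variance is the mean squared distance from j to its k nearest neighbours, which may differ from the authors' exact estimator):

import numpy as np
from scipy.spatial.distance import cdist

def local_similarity(X, k=3):
    """s_ij = exp(-d_ij^2 / sigma_j^2) with a k-NN estimate of sigma_j^2."""
    D = cdist(X, X)
    knn_d = np.sort(D, axis=1)[:, 1:k + 1]      # distances to the k nearest neighbours
    sigma2 = (knn_d ** 2).mean(axis=1)          # local scale around point j (our choice)
    return np.exp(-(D ** 2) / sigma2[None, :])  # column j is scaled by sigma_j^2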
Results for this data set are shown in table 2.
Table 2. Classification errors for the two groups with different scattering matrices.

Method          Train error   Test error
Pick-out        5.1 %         8.0 %
1/2(S + S^T)    6.4 %         11.5 %
S^T S           7.1 %         8.5 %
Tsuda           6.9 %         9.2 %
Again the pick-out method attains the best results. The MDS representations of the symmetrized kernel matrices are very similar to those of the preceding case and are not displayed.
A false two-groups classification problem. A natural question is whether the pick-out method will separate any data set with arbitrary labels. It should not. To test this hypothesis we generated a normal spherical cloud in ℝ² and assigned random labels to the data points. In this case there is no continuous classification surface able to separate the data into two classes. As expected, the classification error rates are close to 50% for each of the proposed methods.
4.2 A Text Data Base

Next we work on a small text data base, to check the methods in a high-dimensional setting. The first class is made up of 296 records from the LISA data base, with the common topic "library science". The second class contains 394 records on pattern recognition from the INSPEC data base. There is a mild overlap between the two classes, due to records dealing with automatic abstracting. We select terms that occur in at least 10 documents; there are 982 such terms. Labels are assigned to terms by voting on the classes of the documents in which these terms appear. The similarity coefficient defined by eq. (1) is used, and therefore we are in the asymmetry situation described in section 2. The overlap in the term data set comes from words common to both topics and also from
common words present in records of both classes. The task is to classify the database terms using the information provided by the matrix (s_ij). Note that we are dealing with about 1000 points in 600 dimensions, and this is a nearly empty set. This means that it will be very easy to find a hyperplane that divides the two classes. Nevertheless, the example is still useful to gauge the relative performance of the proposed methods.

Following the same scheme as in the preceding examples, table 3 shows the results of classifying terms using the SVM with the symmetrized matrices returned by the four studied methods.
Table 3. Classification errors for the term data base.

Method          Train error   Test error
Pick-out        2.0 %         2.2 %
1/2(S + S^T)    2.1 %         2.4 %
S^T S           3.8 %         4.2 %
Tsuda           3.3 %         3.6 %
[Figure 2 about here: four MDS scatter plots for the term data base, (a) MDS for the pick-out matrix, (b) MDS for the 1/2(S + S^T) matrix, (c) MDS for the S^T S matrix, (d) MDS for Tsuda's matrix.]

Fig. 2. MDS representation of symmetrized kernels.
The best results are obtained with the pick-out method. The MDS representation of the symmetrized kernel matrix for each method is shown in figure 2. The symmetrization methods achieve similar performance on this data set, owing to the high sparseness of the data, as explained above. The best visualization is obtained when using the pick-out kernel matrix. When working with larger textual data sets [5,7], the method using K = 1/2(S + S^T) seems to give poor results, due to the loss of the skew-symmetric part of the similarity matrix.
5 Conclusions
In this work on asymmetric kernels we propose a new technique to build a symmetric kernel matrix from an asymmetric similarity matrix in classification problems. The proposed method compares favorably with other symmetrization methods proposed in the classification literature. In addition, the proposed scheme seems appropriate for data structure visualization. Further research will focus on the theoretical properties of the method and on extensions.
Acknowledgments. This work was partially supported by DGICYT grant
BEC2000-0167 and grant TIC2000-1750-C06-04 (Spain).
References
1. C. Cortes and V. Vapnik. Support Vector Networks. Machine Learning, 20:1-25, 1995.
2. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
3. T. Evgeniou, M. Pontil and T. Poggio. Statistical Learning Theory: A Primer. International Journal of Computer Vision, vol. 38, no. 1, 2000, pages 9-13.
4. B. Kosko. Neural Networks and Fuzzy Systems: A Dynamical Approach to Machine Intelligence. Prentice Hall, 1991.
5. M. Martin-Merino and A. Muñoz. Self Organizing Map and Sammon Mapping for Asymmetric Proximities. Proc. ICANN (2001), LNCS, Springer, 429-435.
6. A. Muñoz. Compound Key Words Generation from Document Data Bases using a Hierarchical Clustering ART Model. Journal of Intelligent Data Analysis, vol. 1, no. 1, 1997.
7. A. Muñoz and M. Martin-Merino. New Asymmetric Iterative Scaling Models for the Generation of Textual Word Maps. Proc. JADT (2002), INRIA, 593-603. Available from Lexicometrica Journal at www.cavi.univ-paris3.fr/lexicometrica/index-gb.htm.
8. B. Schölkopf, R. Herbrich, A. Smola and R. Williamson. A Generalized Representer Theorem. NeuroCOLT2 TR Series, NC2-TR2000-81, 2000.
9. B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K. Müller, G. Rätsch and A. Smola. Input Space versus Feature Space in Kernel-based Methods. IEEE Transactions on Neural Networks 10 (5) (1999) 1000-1017.
10. K. Tsuda. Support Vector Classifier with Asymmetric Kernel Function. Proc. ESANN (1999), D-Facto public., 183-188.
11. B. Zielman and W.J. Heiser. Models for Asymmetric Proximities. British Journal of Mathematical and Statistical Psychology, 49:127-146, 1996.