Author's Accepted Manuscript

Computational Performance Optimization of Support Vector Machine Based on Support Vectors

Xuesong Wang, Fei Huang, Yuhu Cheng

www.elsevier.com/locate/neucom

PII: S0925-2312(16)30564-1
DOI: http://dx.doi.org/10.1016/j.neucom.2016.04.059
Reference: NEUCOM17176
To appear in: Neurocomputing
Received date: 31 July 2015
Revised date: 18 March 2016
Accepted date: 6 April 2016
Cite this article as: Xuesong Wang, Fei Huang and Yuhu Cheng, Computational
Performance Optimization of Support Vector Machine Based on Support
Vectors, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2016.04.059
This is a PDF file of an unedited manuscript that has been accepted for
publication. As a service to our customers we are providing this early version of
the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting galley proof before it is published in its final citable form.
Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Computational Performance Optimization of Support Vector Machine Based on Support Vectors

Xuesong Wang, Fei Huang, Yuhu Cheng


School of Information and Electrical Engineering, China University of Mining and Technology, 221116, Xuzhou, China

*Corresponding author. chengyuhu@163.com

Abstract

The computational performance of a support vector machine (SVM) depends mainly on the size and dimension of the training sample set. Because support vectors determine the SVM classification hyperplane, a method for optimizing the computational performance of SVM based on support vectors is proposed. On the one hand, while selecting the super-parameters of the SVM, we use the Karush-Kuhn-Tucker condition to eliminate non-support vectors from the training sample set without losing potential support vectors, which reduces the sample size and hence the computational complexity of the SVM. On the other hand, by analyzing the correlation between the number of support vectors and the intrinsic dimension, we propose a simple method for estimating the intrinsic dimension of the SVM training sample set. Comparative experimental results indicate that the proposed method effectively improves computational performance.

Keywords: support vector machine; support vector; sample size; intrinsic dimension; computational performance

1. Introduction
Support vector machine (SVM), proposed by Vapnik [1], is one of the most important machine learning models. SVMs achieve good performance through the structural risk minimization principle and the kernel trick [2]. Owing to their strong generalization ability and their capacity for small-sample and nonlinear processing, SVMs have been successfully applied to classification decision [3, 4], regression modeling [5], fault diagnosis [6, 7] and bioinformatics [8, 9]. Although the SVM can effectively avoid the curse of dimensionality, its computational performance still degrades as the sample size or dimension increases. This problem is usually addressed in two ways: one is to improve the learning algorithm, such as sequential minimal optimization (SMO) [10], successive overrelaxation (SOR) [11] and LIBSVMCBE [12]; the other is to simplify the computation by reducing the size or dimension of the training sample set. Both reduction strategies are studied in this paper.
When dealing with large datasets, reducing the number of training samples is one of the most important ways to improve computational efficiency. Major difficulties arise when processing large data with a fully dense nonlinear kernel matrix. To overcome these computational difficulties, some authors have proposed low-rank approximations to the full kernel matrix. As an alternative, Lee and Mangasarian [13] proposed the reduced support vector machine (RSVM). Its key idea is as follows: prior to training, a small random subset is selected to generate a thin rectangular kernel matrix, which then replaces the full kernel matrix in the nonlinear SVM formulation. Ke and Zhang [14] proposed the editing support vector machine (ESVM), which removes some samples near the boundary from the training set. Its basic scheme is similar to that of editing nearest-neighbor methods in statistical pattern recognition: the training set is randomly divided into subsets, one subset is used to edit the other, and the final decision boundary is then obtained from the remaining samples. Koggalage and Halgamuge [15] proposed a reverse algorithm that reduces the size of the training set by removing samples that make no contribution to classification. In summary, the basic idea of these methods is to represent the original training sample set by finding more suitable alternatives. Generally speaking, the more representative the reduced dataset is, the better the performance of the SVM. However, in the above methods the reduced dataset is chosen randomly and therefore lacks representativeness.
When dealing with large datasets, reducing the dimension of the training samples is another important means of improving the computational efficiency of SVM. There are two ways to realize dimensionality reduction (DR): feature selection and feature extraction. Feature selection simplifies the training sample set by selecting a subset of relevant features from the original input features and eliminating irrelevant ones [16, 17]. Feature extraction applies a mathematical transformation to the input features and projects the high-dimensional data into a low-dimensional space. Dimensionality reduction has its downside, because any projection results in a loss of information from the original high-dimensional data. The difficulty in DR is therefore to obtain the simplest low-dimensional representation while retaining the essential features of the original high-dimensional data, that is, to find the lowest possible dimension (the intrinsic dimension) [18] while preserving as many of the essential features as possible.
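As an illustration of the feature-extraction route, the short sketch below projects high-dimensional data onto a lower-dimensional subspace with PCA, the DR algorithm also used in the experiments of Section 4. It is a minimal example assuming scikit-learn; the synthetic data and the chosen `n_components` are purely illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 500 samples with 64 correlated features driven by 8 free variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 8))
X = latent @ rng.normal(size=(8, 64)) + 0.01 * rng.normal(size=(500, 64))

# Feature extraction: project onto a low-dimensional subspace.
pca = PCA(n_components=8)            # in practice this value would come from an intrinsic-dimension estimate
X_low = pca.fit_transform(X)

print(X_low.shape)                           # (500, 8)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained by the projection
```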
For the various dimensionality reduction methods, the intrinsic dimension is a critical parameter that still requires further study. Accurately estimating the intrinsic dimension of high-dimensional data plays an important role in the subsequent dimensionality reduction. Current estimation methods fall into two categories: algebraic eigenvalue-based methods [19] and geometric methods such as the maximum likelihood estimator (MLE) [20], correlation dimension (CorrDim) [21], packing numbers [22], nearest neighbor dimension (NearNbDim) [23] and the geodesic minimum spanning tree (GMST) [24]. The eigenvalue method estimates the intrinsic dimension by sorting the eigenvalues of the covariance matrix and then determining the number of important features through an eigenvalue threshold (e.g. the accumulative contribution rate in PCA). However, neither the threshold value nor the proper number of retained features can be indicated reliably, so the estimation is unsatisfactory for nonlinear manifolds. MLE establishes a likelihood function for the neighbor distances and obtains a maximum likelihood estimate of the intrinsic dimension; research by Belkin indicates that MLE is an unbiased estimator [25]. In the NearNbDim method, a neighborhood graph is built by calculating the Euclidean distances between samples, so the computational complexity grows rapidly with the sample size and the method is not suitable for large datasets. The GMST method builds the neighborhood graph with geodesic distance instead of Euclidean distance, which is also computationally expensive.
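To make the MLE approach concrete, the following sketch implements the Levina-Bickel estimator with NumPy and scikit-learn's NearestNeighbors. The fixed neighborhood size k, the function name and the toy data are assumptions for illustration; they do not reproduce the neighborhood range used later in this paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mle_intrinsic_dimension(X, k=10):
    """Levina-Bickel maximum likelihood estimate of intrinsic dimension.

    For each point, the local estimate is the inverse of the mean log-ratio
    between the distance to the k-th neighbor and the distances to the
    closer neighbors; the global estimate averages the local ones.
    """
    # k + 1 neighbors because the first returned neighbor is the point itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    dist = dist[:, 1:]                                   # drop the zero self-distance
    log_ratios = np.log(dist[:, -1][:, None] / dist[:, :-1])   # log(T_k / T_j), j = 1..k-1
    local_dim = (k - 1) / log_ratios.sum(axis=1)
    return local_dim.mean()

# Toy check: points on a 2-D plane embedded in a 10-D space.
rng = np.random.default_rng(0)
plane = rng.normal(size=(1000, 2)) @ rng.normal(size=(2, 10))
print(mle_intrinsic_dimension(plane))                    # close to 2
```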
It is well known that the computational performance of SVM also depends strongly on the selection of super-parameters, so an optimal choice is of critical importance for good performance in pattern recognition problems. Since SVMs were introduced, much effort has been devoted to developing efficient methods for optimizing super-parameters. The commonly used selection methods, including empirical selection, grid search, gradient descent and intelligent optimization [26, 27], have defects such as low efficiency, high computational complexity or undesirable super-parameters. Moreover, determining appropriate super-parameters is not easy for a large-scale training sample set.
In summary, sample size reduction, intrinsic dimension estimation and super-parameter selection are research focuses in the field of SVM. Related research has been carried out and reported. However, existing methods address only one of these topics, and none addresses them all. Because the classification hyperplane of an SVM is entirely determined by its support vectors (SVs) [28], a novel method for optimizing the computational performance of SVM based on support vectors is proposed in this paper. The paper makes three contributions: 1) all three topics are treated simultaneously from the support-vector point of view; 2) we present a method for properly eliminating non-support vectors from the original training sample set by using the Karush-Kuhn-Tucker (KKT) condition, so that the computational complexity is reduced through a smaller sample size; 3) we design a simple intrinsic dimension estimation method by analyzing the correlation between the number of support vectors and the intrinsic dimension, which is validated by experiments.
The paper is organized as follows. SVM is briefly introduced in Section 2 and the proposed support-vector-based method is described in Section 3, followed by experimental validation, discussion and conclusions.

2. SVM model
We consider a supervised binary classification task. The training sample set is denoted as $T = \{(x_i, y_i) \mid x_i \in \mathbb{R}^k\}$, in which $x_i$ is a training sample labeled with class $y_i \in \{-1, +1\}$. The goal of SVM is to maximize classification accuracy by making the margin between the decision hyperplane and the data as wide as possible. For a classical linear SVM, the class label is assigned according to the position of a sample relative to the decision hyperplane, which is defined as [1]:

$$w^T x + b = 0 \qquad (1)$$

where w and b represent the weight vector and the bias term of the hyperplane respectively. In the
original feature space, the constraint for perfect classification can be described as:

$$y_i (w^T x_i + b) \ge 1 \qquad (2)$$

Next, the decision function is expressed as:

$$f(x) = \mathrm{sgn}(w^T x + b) \qquad (3)$$

where $\mathrm{sgn}(\cdot)$ is defined as:

$$\mathrm{sgn}(a) = \begin{cases} 1, & a \ge 0 \\ -1, & a < 0 \end{cases} \qquad (4)$$
By introducing the regularization term $\frac{1}{2}\|w\|_2^2$ and the slack variables $\xi_i$, the problem of obtaining the optimal hyperplane can be transformed into the following quadratic programming problem [1]:

$$(w^*, b^*, \xi^*) = \arg\min_{w, b, \xi} \; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{l} \xi_i$$
$$\text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad C > 0 \qquad (5)$$

where $\|\cdot\|_2$ stands for the L2-norm and $C$ is the penalty factor that balances the maximization of the margin width against the minimization of the training error. By introducing Lagrange multipliers $\alpha$, we obtain the dual problem:

$$\min_{\alpha} W(\alpha) = \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i^T x_j) - \sum_{j=1}^{l} \alpha_j \qquad (6)$$
$$\text{s.t.} \quad \sum_{i=1}^{l} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C$$

The corresponding decision function is:


$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i (x_i^T x) + b \right) \qquad (7)$$

SVM is an excellent kernel-based method for data classification. For real applications in which the data are not linearly separable, the original data are commonly mapped into a high-dimensional kernel feature space in which they become linearly separable. Let $\phi(\cdot): \mathbb{R}^k \to \mathbb{R}^{k'}$ denote a mapping from the original space to the high-dimensional feature space; then $x_i^T x$ can be replaced by $\phi^T(x_i)\phi(x)$. If there is a kernel function $K(\cdot,\cdot)$ satisfying the Mercer theorem, then:

$$K(x_i, x) = \phi^T(x_i)\phi(x) \qquad (8)$$

The decision function is written as:


$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right) \qquad (9)$$

A kernel function, through its expression and parameters, implicitly defines the nonlinear mapping from the input space to a high-dimensional feature space. Traditional kernel functions include the linear, polynomial, Gaussian, sigmoid and Fourier-series kernels. The Gaussian kernel is particularly popular because it is simple, computationally cheap and easy to use, so it is selected here:

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \qquad (10)$$

where the kernel parameter $\gamma > 0$ controls the dispersion of the kernel in the input space. In short, the super-parameters to be determined in the following are the penalty factor $C$ and the kernel parameter $\gamma$ of the SVM with a Gaussian kernel.
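For reference, the minimal sketch below trains a Gaussian-kernel SVM with scikit-learn and evaluates the decision function of Eq. (9); the synthetic dataset and the particular values of $C$ and $\gamma$ are placeholders, not the settings of the proposed method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder binary classification data with k = 20 features.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X = StandardScaler().fit_transform(X)            # zero mean, unit variance per feature

# Gaussian (RBF) kernel SVM: the two super-parameters are C and gamma.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0 / (2 * X.shape[1]))
clf.fit(X, y)

print(clf.n_support_)                 # number of support vectors per class
print(clf.decision_function(X[:5]))   # sum_i alpha_i y_i K(x_i, x) + b
print(clf.predict(X[:5]))             # its sign, i.e. f(x) in Eq. (9)
```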

3. Computational performance optimization of SVM based on SVs


3.1. Sample size reduction based on SVs
For a support vector machine, the classification hyperplane is entirely determined by the support vectors [29]. Therefore, the key problem in sample size reduction is to select a small number of representative training samples from the large training sample set while retaining all support vectors as far as possible. If the support vectors are changed significantly, performance will certainly be affected.

The optimal solution of the SVM, $\alpha^* = (\alpha_1^*, \ldots, \alpha_l^*)^T$, can be obtained from Eq. (6), and all samples then satisfy the following KKT conditions [1]:

$$\begin{cases} \alpha_i^* = 0 & \Rightarrow \; y_i f(x_i) \ge 1 \\ 0 < \alpha_i^* < C & \Rightarrow \; y_i f(x_i) = 1 \\ \alpha_i^* = C & \Rightarrow \; y_i f(x_i) \le 1 \end{cases} \qquad (11)$$

Fig. 1 is the schematic diagram of the KKT conditions, where $f(x) = 0$ is the classification hyperplane and $f(x) = \pm 1$ are the boundaries of the classification margin. Samples with $\alpha_i^* = 0$ lie outside the margin, samples with $0 < \alpha_i^* < C$ lie exactly on the margin boundaries, and samples with $\alpha_i^* = C$ fall inside the margin or on the wrong side of the hyperplane.

Fig. 1 Schematic diagram of KKT condition

It can be seen from Fig. 1 that the optimal hyperplane $h$ is determined by the sample points lying on the boundaries of the classification margin; these points are called support vectors. For SVM learning, the SVs are the key elements of the training sample set, whereas non-SVs contribute nothing. Moreover, in a large dataset the SVs account for only a small fraction of the samples. If we can find all the SVs and use them instead of the original large set, the computational complexity can be reduced while the classification accuracy is maintained.
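The sketch below makes the three cases of Eq. (11) concrete by recovering the multipliers $\alpha_i$ from a trained scikit-learn SVC (which stores $y_i \alpha_i$ for the support vectors in dual_coef_) and partitioning the training samples accordingly; the data, the tolerance and the value of $C$ are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
C = 1.0
clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2 * X.shape[1])).fit(X, y)

# alpha_i = |y_i * alpha_i| for support vectors; all other samples have alpha_i = 0.
alpha = np.zeros(len(X))
alpha[clf.support_] = np.abs(clf.dual_coef_[0])

tol = 1e-8
non_sv   = np.where(alpha <= tol)[0]                       # alpha_i = 0:     y_i f(x_i) >= 1
free_sv  = np.where((alpha > tol) & (alpha < C - tol))[0]  # 0 < alpha_i < C: on the margin boundary
bound_sv = np.where(alpha >= C - tol)[0]                   # alpha_i = C:     inside the margin or misclassified

print(len(non_sv), len(free_sv), len(bound_sv))
```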
It can be seen from Eq. (9) that, in practice, the computational performance depends to a great extent on the proper selection of the super-parameters. Unsuitable super-parameters cause under-fitting or over-fitting, which not only degrades the classification accuracy but also results in excessively long training times, and may even prevent the optimization from converging within a limited number of steps [30]. We have therefore proposed a super-parameter selection method for the SVM with a Gaussian kernel, which consists of two stages [30]. First, the kernel parameter is chosen so that a sufficiently large number of potential support vectors are retained in the training sample set. Then outliers are screened out of the set by assigning a special value to the penalty factor, and the optimal penalty factor is trained on the remaining outlier-free set. In this super-parameter selection process the support vectors and the super-parameters are closely related. The following elaborates on how to use the KKT conditions to properly eliminate non-support vectors while determining the super-parameters.
First, we normalize the training samples with the zero-mean standardization method so that every feature has mean 0 and variance 1. Assuming that the spatial distribution of the samples approximately follows a multivariate Gaussian distribution, the kernel parameter under the Gaussian kernel is [30]:

$$\gamma = \frac{1}{2k} \qquad (12)$$

where $k$ is the feature dimension. Then, we set $C = 1$, train the SVM on the training sample set and use the KKT conditions to eliminate non-support vectors, so that only the support vectors that are vital to the determination of the decision function are retained.
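A minimal sketch of this elimination stage, assuming scikit-learn and NumPy arrays X and y: the features are standardized to zero mean and unit variance, $\gamma$ is set to $1/(2k)$ as in Eq. (12), $C$ is fixed to 1, and only the samples the trained model reports as support vectors ($\alpha_i > 0$) are kept. The function name is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def eliminate_non_support_vectors(X, y):
    """Keep only the training samples that act as support vectors
    under gamma = 1/(2k) and C = 1 (first stage of the method)."""
    X_std = StandardScaler().fit_transform(X)          # zero mean, unit variance
    k = X_std.shape[1]
    clf = SVC(kernel="rbf", C=1.0, gamma=1.0 / (2 * k)).fit(X_std, y)
    sv_idx = clf.support_                              # indices with alpha_i > 0
    return X_std[sv_idx], y[sv_idx], clf, sv_idx
```

On a large training set most samples typically fall into the $\alpha_i = 0$ group, so the retained subset is usually much smaller than the original.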
In addition, because of mislabeling, sparsely distributed samples of one class near the boundary, or overlapping class boundaries, some labeled samples lie close to the samples of the other class and far away from those of their own class; such samples are outliers. When the spatial distribution of the samples contains outliers, the super-parameter optimization is disturbed by them and often fails to converge quickly, so it is necessary to identify and screen out these outliers. The SVM obtained after training is used to test each support vector $x_i^{sv}$; misclassified samples, i.e. those satisfying $y_i^{sv} f(x_i^{sv}) < 0$, are treated as outliers. We then screen these outliers out of the training sample set to avoid over-learning caused by excessive dependence on their class information. The initial training sample set, with the non-support vectors and outliers removed, forms a reduced training set, which achieves the goal of sample size reduction.
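Continuing the previous sketch, the outlier screen below removes the support vectors that the trained model itself misclassifies, i.e. those with $y_i f(x_i) < 0$. It assumes binary labels coded as -1/+1, and the function name is again hypothetical.

```python
import numpy as np

def screen_outliers(X_sv, y_sv, clf):
    """Remove support vectors misclassified by the trained SVM,
    i.e. those with y_i * f(x_i) < 0, and return the reduced training set."""
    margin = y_sv * clf.decision_function(X_sv)   # y_i * f(x_i), labels assumed in {-1, +1}
    keep = margin >= 0
    return X_sv[keep], y_sv[keep]
```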

3.2. Intrinsic dimension estimation based on SVs


Dimensionality reduction aims at removing redundant dimensions to facilitate further data analysis and processing, provided that the essential features of the original data are retained as far as possible. The intrinsic dimension of $n$-dimensional data is the minimum number of free variables needed to describe the distribution characteristics of the data; it determines whether the distribution characteristics of the original $n$-dimensional space can be fully described in an $n'$-dimensional subspace, where $n'$ is the intrinsic dimension of the sample spatial distribution.
It is known that a point is needed to divide a straight line, a straight line to divide a plane and a plane to divide a three-dimensional space; that is, an $(n-1)$-dimensional hyperplane is needed to divide an $n$-dimensional space into two completely separate parts. Because the learned optimal hyperplane is a globally optimal solution, at least $2n$ samples are needed to determine the $(n-1)$-dimensional hyperplane. If the number of classes is $k$ and the samples are linearly separable, the minimum number of support vectors needed to form the optimal hyperplane is $k \cdot n'$. Therefore, a theoretical lower limit of the intrinsic dimension can be obtained:

$$n' = \min\left( \frac{N_{sv}}{k},\; n \right) \qquad (13)$$

where $N_{sv}$ denotes the number of support vectors.
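Under the reading of Eq. (13) adopted above, $n' = \min(N_{sv}/k,\ n)$, the estimate is a one-liner; rounding up is an assumption of this sketch, since the rounding rule is not specified.

```python
import math

def intrinsic_dimension(n_sv, n_classes, n_features):
    """Lower-bound style estimate of Eq. (13): n' = min(Nsv / k, n)."""
    return min(math.ceil(n_sv / n_classes), n_features)
```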


When investigating the asymptotic behavior of the super-parameters, Keerthi et al. [31] found that when the kernel parameter is fixed and the penalty factor $C \to \infty$, the SVM strictly separates the training sample set; if noise is present in the training set, over-fitting occurs, i.e. the structural risk minimization model degrades into the original empirical risk minimization model. From the constraints of the SVM optimization problem, all Lagrange multipliers satisfy $0 \le \alpha_i \le C$. Based on experiments, Hsu and Lin [32] concluded that bounded vectors become free vectors as $C$ increases, i.e. the number of bounded vectors decreases while the number of free vectors increases. In addition, the support vector set is contained in the bounded vector set, so a decreasing number of bounded vectors implies a decreasing number of support vectors. In summary, the number of support vectors gradually decreases as $C$ increases. Therefore, obtaining the minimum number of support vectors to reach the lower limit of the intrinsic dimension corresponds to empirical risk minimization and cannot reflect the essential features of the actual samples. Moreover, determining an appropriate number of support vectors involves selecting the penalty factor. We have proposed a method for selecting $C$ in [30]: a relatively large value (e.g. $C = 1024$) is assigned and the SVM is trained on the reduced training sample set; the largest value of the Lagrange multipliers is then taken as the optimal penalty factor. The selected penalty factor is of appropriate size, which has two advantages: on the one hand, it avoids a penalty factor that is too small, which would restrain non-outliers and lead to under-fitting of the SVM; on the other hand, it avoids a penalty factor that is too large, which would let outliers interfere with the learning process and lead to over-fitting of the SVM [30]. After selecting the proper super-parameters $\gamma$ and $C$, we train the SVM on the reduced training sample set to obtain the number of support vectors $N_{sv}$. The intrinsic dimension of the original training sample set can then be obtained from Eq. (13).
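Putting the stages together, the sketch below follows the procedure described in this section under the stated assumptions: scikit-learn's SVC, binary -1/+1 labels, the reconstructed Eq. (13), and the hypothetical helper functions eliminate_non_support_vectors, screen_outliers and intrinsic_dimension from the earlier listings.

```python
import numpy as np
from sklearn.svm import SVC

def optimize_svm(X, y, n_classes):
    # Stage 1: reduce the sample size (gamma = 1/(2k), C = 1, keep SVs, drop outliers).
    X_sv, y_sv, clf0, _ = eliminate_non_support_vectors(X, y)
    X_red, y_red = screen_outliers(X_sv, y_sv, clf0)

    gamma = 1.0 / (2 * X_red.shape[1])

    # Stage 2: train with a large C on the reduced set and take the largest
    # multiplier alpha_i as the selected penalty factor.
    probe = SVC(kernel="rbf", C=1024.0, gamma=gamma).fit(X_red, y_red)
    C_opt = float(np.abs(probe.dual_coef_).max())

    # Final model with the selected super-parameters; its support-vector count
    # feeds the intrinsic-dimension estimate of Eq. (13).
    final = SVC(kernel="rbf", C=C_opt, gamma=gamma).fit(X_red, y_red)
    n_sv = len(final.support_)
    n_dim = intrinsic_dimension(n_sv, n_classes, X_red.shape[1])
    return final, C_opt, n_dim
```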

4. Experimental results
In order to verify the validity and superiority of the proposed method, experiments on 3 remote sensing
(KSC, ROSIS and Salinas) and 5 UCI (100PSL, ISOLET, USPS, HAR and Gas) datasets were carried out,
which are shown in Table 1.
Table 1 Description of datasets

Dataset   Sample size   Number of classes   Dimension   Training sample size   Testing sample size
KSC       3827          10                  178         2891                   963
ROSIS     9167          9                   222         6422                   2745
Salinas   54129         16                  224         37883                  16246
100PSL    1600          100                 64          1100                   500
ISOLET    7797          26                  617         6238                   1559
USPS      9298          10                  256         2789                   6509
HAR       10299         6                   561         7352                   2947
Gas       13910         12                  128         9737                   4173

The comparative experiments include two parts. In the first part, the proposed method (denoted DR&I) was compared with EigenValue [19], MLE [20] and GMST [24]. The parameters of these intrinsic dimension estimation methods were set as follows: 1) PCA is adopted as the dimensionality reduction algorithm; 2) parameters common to the methods are set to the same values; 3) the super-parameters of SVM are selected with our proposed method; 4) the cumulative contribution rate in EigenValue is 97.5%; 5) the neighborhood search range in MLE is $k_1 = 6$ and $k_2 = 12$. Table 2 compares the performance in terms of classification accuracy and estimated intrinsic dimension (denoted Dim). It can be observed that: 1) except on the ROSIS dataset, where the classification accuracy of DR&I is slightly lower than that of GMST, DR&I achieves the highest accuracy on the other 7 datasets; 2) although EigenValue, MLE and GMST give relatively small estimates of the intrinsic dimension, they also lose high-dimensional information, causing a significant reduction in classification accuracy compared with DR&I.
Table 2 Performance comparison of different intrinsic dimension estimation methods

           EigenValue        MLE               GMST              DR&I
Dataset    Acc(%)   Dim      Acc(%)   Dim      Acc(%)   Dim      Acc(%)   Dim
KSC        79.34    4        78.79    10       78.79    10       93.88    14
ROSIS      78.01    14       79.38    18       92.32    108      92.20    75
Salinas    86.33    5        85.25    13       86.36    7        92.11    70
100PSL     71.60    22       64.00    8        71.40    16       78.00    10
ISOLET     66.97    8        76.78    17       81.59    21       96.15    137
USPS       83.90    8        83.90    8        89.38    14       96.42    78
HAR        77.10    6        83.68    17       82.73    12       95.45    197
Gas        54.08    1        69.34    3        62.06    2        93.32    34

In the second part, the classification performance of SVM under different configurations of DR&I was compared in detail. Table 3 shows the experimental results, where PS time, DR time, Train time, Test time and Total time denote the computation time for super-parameter selection, dimensionality reduction, SVM training, SVM testing and the sum of all of these, respectively, and NDR&NI and NDR&I are special cases of DR&I. NDR&NI means that neither the elimination of non-support vectors nor the dimensionality reduction is performed during SVM learning, while NDR&I means that the former is performed but the latter is not. It can be observed that: 1) NDR&NI has the highest classification accuracy, because, in addition to producing a relatively small sample size and low dimension, NDR&I and DR&I also have a certain negative effect on classification accuracy; 2) although DR&I requires the additional dimensionality reduction and elimination of non-support vectors, it consumes the least computation time; 3) without a drastic loss of classification accuracy, DR&I achieves higher computational efficiency.
Table 3 Comparison of classification performance of SVM under different conditions of DR&I

Method    Dataset   Acc(%)   PS(s)    DR(s)   Train(s)   Test(s)   Total(s)
NDR&NI    KSC       94.33    0.50     0.00    0.07       0.47      1.07
          ROSIS     92.99    12.02    0.00    2.89       10.52     25.43
          Salinas   95.11    609.29   0.00    184.09     88.91     882.30
          100PSL    82.80    1.87     0.00    0.54       0.42      2.85
          ISOLET    96.15    165.69   0.00    40.30      19.73     225.72
          USPS      96.42    9.81     0.00    2.29       8.20      20.30
          HAR       95.83    89.27    0.00    17.26      12.54     119.07
          Gas       99.21    38.09    0.00    2.86       0.94      41.90
NDR&I     KSC       94.33    0.27     0.00    0.05       0.38      0.70
          ROSIS     92.93    9.94     0.00    2.35       10.14     22.44
          Salinas   94.15    362.05   0.00    85.56      83.60     531.25
          100PSL    82.80    1.91     0.00    0.51       0.41      2.84
          ISOLET    96.15    121.68   0.00    24.94      18.61     165.23
          USPS      96.33    5.50     0.00    0.58       7.76      13.84
          HAR       95.83    44.96    0.00    5.63       11.83     62.42
          Gas       98.83    26.91    0.00    1.80       0.90      29.61
DR&I      KSC       93.88    0.26     1.00    0.03       0.12      1.41
          ROSIS     92.20    9.87     2.16    0.81       3.23      16.07
          Salinas   92.11    352.36   2.46    50.91      27.93     433.66
          100PSL    78.00    1.86     0.15    0.33       0.23      2.58
          ISOLET    96.15    120.64   6.72    8.24       4.91      140.52
          USPS      96.42    5.48     2.31    0.46       1.83      10.08
          HAR       95.45    44.64    5.71    2.72       5.29      58.35
          Gas       93.32    27.47    0.34    0.48       0.20      28.47

5. Conclusions
Support vector machines play a very important role in data classification because of their good generalization performance. Although the SVM can effectively avoid the curse of dimensionality, its computational performance still degrades as the sample size or dimension increases. Because support vectors play a decisive role in determining the SVM classification hyperplane, in this paper we studied how to improve computational performance from the support-vector point of view. Comparative experimental results on 3 remote sensing datasets and 5 UCI machine learning datasets indicate that the proposed method requires the least computation time while achieving high classification accuracy.

Acknowledgements
This work is supported by National Natural Science Foundation of China (61273143, 61472424) and Fundamental
Research Funds for the Central Universities (2013RC10, 2013RC12 and 2014YC07).

References
[1] V.N. Vapnik, An overview of statistical learning theory, IEEE Transactions on Neural Networks 10(5) (1999) 988-999.
[2] X.P. Hua, S.F. Ding, Weighted least squares projection twin support vector machines with local information,
Neurocomputing 160 (2015) 228-237.
[3] Z.N. Wu, H.G. Zhang, J.H. Liu, A fuzzy support vector machine algorithm for classification based on a novel PIM
fuzzy clustering method, Neurocomputing 125 (2014) 119-124.
[4] S.F. Ding, X.P. Hua, Recursive least squares projection twin support vector machines for nonlinear classification,
Neurocomputing 130 (2014) 3-9.
[5] X.J. Peng, D. Xu, J.D. Shen, A twin projection support vector machine for data regression, Neurocomputing 138
(2014) 131-141.
[6] Z.Q. Su, B.P. Tao, Z.R. Liu, Y. Qin, Multi-fault diagnosis for rotating machinery based on orthogonal supervised
linear local tangent space alignment and least square support vector machine, Neurocomputing 157 (2015) 208-222.
[7] Y. Tian, M.Y. Fu, F. Wu, Steel plates fault diagnosis on the basis of support vector machines, Neurocomputing 151(P1)
(2015) 296-303.
[8] L. Khedher, J. Ramírez, J.M. Górriz, A. Brahim, F. Segovia, Early diagnosis of Alzheimer's disease based on partial
least squares, principal component analysis and support vector machine using segmented MRI images,
Neurocomputing 151(P1) (2015) 139-150.
[9] D.J. Qu, W. Li, Y. Zhang, B. Sun, Y. Zhong, J.H. Liu, D.Y. Yu, M.Q. Li, Support vector machines combined with
wavelet-based feature extraction for identification of drugs hidden in anthropomorphic phantom, Measurement 46(1)
(2013) 284-293.
[10] Y.L. Lin, J.G. Hsieh, H.K. Wu, J.H. Jeng, Three-parameter sequential minimal optimization for support vector
machines, Neurocomputing, 74(17) (2011) 3467-3475.
[11] W.J. Chen, Y.H. Shao, H.Y. Wang, Successive overrelaxation for twin parametric-margin support vector machines,
Journal of Information and Computational Science 10(3) (2013) 791-799.
[12] M. Marzolla, Fast training of support vector machines on the cell processor, Neurocomputing 74(17) (2011)
3700-3707.
[13] Y.J. Lee, O.L. Mangasarian, RSVM: reduced support vector machines, Wisconsin: University of Wisconsin, 2000.
[14] H.X. Ke, X.G. Zhang, Editing support vector machines, in: Proceedings of International Joint Conference on Neural
Networks, 2001, pp. 1464-1467.
[15] R. Koggalage, S. Halgamuge, Reducing the number of training samples for fast support vector machine classification,
Neural Information Processing-Letters and Reviews 2(3) (2004) 57-65.

[16] X. Zhang, G. Wu, Z.M. Dong, C. Crawford, Embedded feature-selection support vector machine for driving pattern
recognition, Journal of the Franklin Institute, 352(2) (2015) 669-685.
[17] X.B. Kong, X.J. Liu, R.F. Shi, K.Y. Lee, Wind speed prediction using reduced support vector machines with feature
selection, Neurocomputing 169 (2015), 449-456.
[18] A. Fernández, A.M. González, J.L. Díaz, J.R. Dorronsoro, Diffusion maps for dimensionality reduction and
visualization of meteorological data, Neurocomputing 163 (2015) 25-37.
[19] T.P. Minka, Automatic choice of dimensionality for PCA, in: NIPS, 2000, pp. 598-604.
[20] E. Levina, P.J. Bickel, Maximum likelihood estimation of intrinsic dimension, in: NIPS, 2004.
[21] F. Camastra, A. Vinciarelli, Intrinsic dimension estimation of data: an approach based on Grassberger-Procaccia's
algorithm, Neural Processing Letters 14(1) (2001) 27-34.
[22] B. Kégl, Intrinsic dimension estimation using packing numbers, in: NIPS, 2002, pp. 681-688.
[23] J.A. Costa, A. Girotra, A. Hero, Estimating local intrinsic dimension with k-nearest neighbor graphs, in: Proceedings
of the 13th IEEE/SP Workshop on Statistical Signal Processing, 2005, pp. 417-422.
[24] J.A. Costa, A.O. Hero, Geodesic entropic graphs for dimension and entropy estimation in manifold learning, IEEE
Transactions on Signal Processing, 52(8) (2004) 2210-2221.
[25] M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: NIPS, 2001, pp.
585-591.
[26] S.J. Zhai, T. Jiang, A new sense-through-foliage target recognition method based on hybrid differential evolution and
self-adaptive particle swarm optimization-based support vector machine, Neurocomputing 149 (2015) 573-584.
[27] H.Q. Jiang, Z.B. Yan, X.G. Liu, Melt index prediction using optimized least squares support vector machines based on
hybrid particle swarm optimization algorithm, Neurocomputing 119 (2013) 469-477.
[28] S.F. Ding, X.P. Hua, J. Yu, An overview on nonparallel hyperplane support vector machines, Neural Computing &
Applications 25(5) (2014) 975-982.
[29] S.F. Ding, H.J. Huang, J.Z. Yu, H. Zhao, Research on the hybrid models of granular computing and support vector
machine, Artificial Intelligence Review 43(4), (2015) 565-577.
[30] X.S. Wang, F. Huang, Y.H. Cheng, Super-parameter selection for Gaussian-kernel SVM based on outlier-resisting,
Measurement 58 (2014) 147-153.
[31] S.S. Keerthi, C.J. Lin, Asymptotic behaviors of support vector machines with Gaussian kernel, Neural Computation,
15(7) (2003) 1667-1689.
[32] C.W. Hsu, C.J. Lin. A simple decomposition method for support vector machines, Machine Learning, 46 (2002)
291-314.

Xuesong Wang received the PhD degree from China University of Mining and Technology in 2002.
She is currently a professor in the School of Information and Electrical Engineering, China University
of Mining and Technology. Her main research interests include machine learning, bioinformatics, and
artificial intelligence. In 2008, she was the recipient of the New Century Excellent Talents in
University from the Ministry of Education of China.

Fei Huang received the Master's degree from China University of Mining and Technology in 2014.

Yuhu Cheng received the PhD degree from the Institute of Automation, Chinese Academy of Sciences
in 2005. He is currently a professor in the School of Information and Electrical Engineering, China
University of Mining and Technology. His main research interests include machine learning, transfer
learning, and intelligent system. In 2010, he was the recipient of the New Century Excellent Talents in
University from the Ministry of Education of China.
