Comparison of Linear and Nonlinear Methods for EEG Signal Classification


Deon Garrett, David A. Peterson, Charles W. Anderson, Michael H. Thaut

Abstract: The reliable operation of brain-computer interfaces (BCIs) based on spontaneous electroencephalogram (EEG) signals requires accurate classification of multichannel EEG. The design of EEG representations and classifiers for BCI are open research questions whose difficulty stems from the need to extract complex spatial and temporal patterns from noisy multidimensional time series obtained from EEG measurements. It is possible that the amount of noise in EEG limits the power of nonlinear methods; linear methods may perform just as well as nonlinear methods. This article reports the results of a linear classifier (linear discriminant analysis) and two nonlinear classifiers (neural networks and support vector machines) applied to the classification of spontaneous, six-channel EEG. The nonlinear classifiers produce only slightly better classification results. An approach to feature selection based on genetic algorithms is also presented with preliminary results.

Index Terms: EEG, electroencephalogram, pattern classification, neural networks, support vector machines, feature selection, genetic algorithms

I. INTRODUCTION

Recently, much research has been performed into alternative methods of communication between humans and computers. The standard keyboard/mouse model of computer use is not only unsuitable for many people with disabilities, but also somewhat clumsy for many tasks regardless of the capabilities of the user. Electroencephalogram (EEG) signals provide one possible means of human-computer interaction which requires very little in terms of physical abilities. By training the computer to recognize and classify EEG signals, users could manipulate the machine by merely thinking about what they want it to do within a limited set of choices. Currently, most research into EEG classification uses such machine learning stalwarts as Neural Networks (NNs). In this article, we examine the application of support vector machines (SVMs) to the problem of EEG classification and compare the results to those obtained using neural networks and linear discriminant analysis. Section II provides an overview of SVM theory and practice, and the problem of multi-class classification is considered in Section III. Section IV discusses the acquisition of EEG signals. The results of this study are detailed in Section V. Section VII describes preliminary experiments using genetic algorithms to search for good subsets of features in an EEG classification problem. Section VIII summarizes the findings of this article and their implications.

D. Garrett is a Ph.D. candidate in the Department of Computer Science, Colorado State University, Fort Collins, CO (e-mail: garrett@cs.colostate.edu). D. Peterson is a Ph.D. candidate in the Department of Computer Science, Colorado State University, Fort Collins, CO (e-mail: petersod@cs.colostate.edu). C. Anderson is with the Department of Computer Science, Colorado State University, Fort Collins, CO (e-mail: anderson@cs.colostate.edu). M. Thaut is with the Department of Music, Theatre, and Dance and the Center for Biomedical Research, Colorado State University, Fort Collins, CO (e-mail: michael.thaut@colostate.edu). D. Peterson, C. Anderson, and M. Thaut are also with the Molecular, Cellular, and Integrative Neuroscience Program at Colorado State University.

II. SUPPORT VECTOR MACHINES FOR BINARY CLASSIFICATION

The support vector machine (SVM) is a classification method rooted in statistical learning theory. The motivation behind SVMs is to map the input into a high-dimensional feature space, in which the data might be linearly separable. In this regard, SVMs are very similar to other neural-network-based learning machines. The principal difference between these machines and SVMs is that the latter produce the optimal decision surface in the feature space. Conventional neural networks can be difficult to build due to the need to select an appropriate number of hidden units. The network must contain enough hidden units to be able to approximate the function in question to the desired accuracy. However, if the network contains too many hidden units, it may simply memorize the training data, causing very poor generalization. The ability of the machine to learn features of the training data is often referred to as learning capacity, and is formalized in a concept called VC dimension. Support vector machines are constructed by solving a quadratic programming problem. In solving this problem, SVM training algorithms simultaneously maximize the performance of the machine while minimizing a term representing the VC dimension of the learning machine. This minimization of the capacity of the machine ensures that the system cannot overfit the training data, for a given set of parameters.

A. Linear Support Vector Machines

In this section, the training of a support vector machine is described for the case of a binary classification problem for which a linear decision surface exists that can perfectly classify the training data. In later sections, the requirement of linear separability will be relaxed. The assumption of linear separability means that there exists some hyperplane which perfectly separates the data. This hyperplane is a decision surface of the form

$w \cdot x + b = 0,$    (1)

where w is an adjustable weight vector, x is an input vector, and b is a bias term. The assumption of separability means that

there exists some set of values w and b such that the following constraints hold for all input vectors, given that the classes are labeled +1 and -1:

$w \cdot x_i + b \geq +1$ for $y_i = +1$,    (2)
$w \cdot x_i + b \leq -1$ for $y_i = -1$,    (3)

or

$y_i (w \cdot x_i + b) - 1 \geq 0 \quad \forall i.$    (4)
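As a small numerical illustration of constraints (2)-(4), the sketch below checks them for a made-up weight vector, bias, and training set (all values here are invented for the example, not taken from the paper):

```python
import numpy as np

w, b = np.array([1.0, 1.0]), 0.0             # a candidate separating hyperplane
X = np.array([[1.0, 0.5], [2.0, 1.0],        # class +1 points
              [-1.0, -0.5], [-2.0, -1.0]])   # class -1 points
y = np.array([+1, +1, -1, -1])

# Constraint (4): y_i (w . x_i + b) - 1 >= 0 for every training point.
print(y * (X @ w + b) - 1)   # all entries are >= 0, so (2)-(4) are satisfied
```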

As previously stated, the support vector machine training algorithm finds the optimal hyperplane for separation of the training data. Specifically, it finds the hyperplane which maximizes the margin of separation of the classifier. Consider the set of training examples which satisfy (2) exactly. These examples are those which lie closest to the hyperplane on the positive side. Similarly, the training examples satisfying (3) exactly lie closest to the hyperplane on the negative side. These particular training examples are called support vectors. Note that requiring the existence of points exactly satisfying the constraints is equivalent to simply rescaling w and b by an appropriate amount. The distance between these points and the hyperplane is given by 1/‖w‖. We define the margin of the hyperplane to be the distance between the positive examples nearest the hyperplane and the negative examples nearest the hyperplane, which is equal to 2/‖w‖. Therefore, we can maximize the margin of the classifier by minimizing ‖w‖, subject to the constraints of (4). Thus the problem of training the SVM can be stated as follows: find w and b such that the resulting hyperplane correctly classifies the training data and the Euclidean norm of the weight vector is minimized.

To solve the problem described above, it is typically reformulated as a Lagrangian optimization problem. In this reformulation, nonnegative Lagrange multipliers A = {α_1, α_2, ..., α_n} are introduced, yielding the Lagrangian

$L = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right).$    (5)

We must minimize this Lagrangian with respect to w and b, and simultaneously maximize it with respect to the Lagrange multipliers α_i. Differentiating with respect to w and b and applying the results to the Lagrangian yields two conditions of optimality,

$w = \sum_{i=1}^{n} \alpha_i y_i x_i$    (6)

and

$\sum_{i=1}^{n} \alpha_i y_i = 0.$    (7)

There are two important consequences of these conditions: the optimal weight vector w_o is described in terms of the training data, and only those training examples whose corresponding Lagrange multipliers are non-zero contribute to w_o. From the Karush-Kuhn-Tucker (KKT) conditions [12], [15], [10], [3], it follows that the training patterns corresponding to the nonzero multipliers are those that satisfy (4) exactly. To understand why this is true, recall that we wish to maximize the Lagrangian L with respect to A. Thus, assuming w and b are constant, the second term of L must be minimized. If (y_i (w · x_i + b) - 1) > 0, then α_i must be zero in order to maximize L. Therefore, only the training points lying closest to the optimal hyperplane, the support vectors, have any effect on its calculation.

Substituting the optimality conditions, (6) and (7), into (5) yields the Wolfe dual [3] of the optimization problem: find multipliers α_i such that

$L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$    (8)

is maximized subject to the constraints

$\alpha_i \geq 0$    (9)

and

$\sum_{i=1}^{n} \alpha_i y_i = 0,$    (10)

yielding a decision function of the form

$f(x) = \mathrm{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x \cdot x_i) + b \right).$    (11)
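To make the dual form concrete, the following minimal sketch evaluates the decision function of (11) for a hard-margin linear SVM, assuming the multipliers, support vectors, labels, and bias have already been obtained from a quadratic programming solver; the variable names and toy values are illustrative only and do not come from any toolbox used in this study.

```python
import numpy as np

def svm_decision(x, alphas, sv_X, sv_y, b):
    """Evaluate f(x) = sign(sum_i alpha_i * y_i * (x . x_i) + b), as in (11).

    alphas : (m,) nonzero Lagrange multipliers of the support vectors
    sv_X   : (m, d) support vectors
    sv_y   : (m,) labels in {+1, -1}
    b      : bias term
    """
    score = np.sum(alphas * sv_y * (sv_X @ x)) + b
    return np.sign(score)

# Toy values consistent with (6) and (7): w = [1, 1], margin points at +/-[0.5, 0.5].
sv_X = np.array([[0.5, 0.5], [-0.5, -0.5]])
sv_y = np.array([+1.0, -1.0])
alphas = np.array([1.0, 1.0])
b = 0.0
print(svm_decision(np.array([2.0, 0.5]), alphas, sv_X, sv_y, b))  # -> 1.0
```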

Note that while w is directly determined by the set of support vectors, the bias term b is not. Once the weight vector is known, the bias may be computed by substituting any support vector into (4) and solving it as an equality constraint, although numerically it is better to take an average over all support vectors.

B. Relaxing the Separability Restriction

The previous derivation assumed that the training data was linearly separable. The constraints of (4) are too rigid for use with non-linearly separable data; they force all training examples to lie outside the margin of the classifier. The key idea in extending the support vector machine to handle nonseparable data is to allow these constraints to be violated, but only if accompanied by a penalty in the objective function. We thus introduce a set of nonnegative slack variables, {ξ_1, ξ_2, ..., ξ_n}, into the constraints [7]. The new constraints are

$w \cdot x_i + b \geq +1 - \xi_i$ for $y_i = +1$,    (12)
$w \cdot x_i + b \leq -1 + \xi_i$ for $y_i = -1$,    (13)

and

$\xi_i \geq 0 \quad \forall i.$    (14)

An error thus occurs only when ξ_i > 1. Therefore, the sum $\sum_{i=1}^{n} \xi_i$ effectively serves as an upper bound on the number of errors committed by the SVM. We modify the original objective by adding a term that penalizes errors. The new optimization problem thus becomes: minimize

$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i,$

where C is a user-defined parameter which controls the degree to which training errors can be tolerated. Proceeding in a manner analogous to that above, the Wolfe dual of the new Lagrangian is

$L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j),$    (15)

which is identical to (8). As in the separable case, L_D must be maximized subject to constraints on the Lagrange multipliers. However, the addition of the ξ_i produces a subtle difference in these constraints. Specifically, the constraint given in (9) becomes the following:

$0 \leq \alpha_i \leq C \quad \forall i.$    (16)

The second constraint,

$\sum_{i=1}^{n} \alpha_i y_i = 0,$    (17)

remains the same as in the separable problem. Thus, bounding the values of the Lagrange multipliers from above allows the support vector machine to construct decision boundaries for training data which cannot be linearly separated.
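As a concrete illustration of the box constraint (16), the hedged sketch below fits a soft-margin linear SVM with scikit-learn and verifies that the magnitudes of the dual coefficients (which equal α_i y_i) never exceed C. The data are synthetic, and scikit-learn is used purely for illustration; it is not the toolbox employed in this study.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes, so the data are not linearly separable.
X = np.vstack([rng.normal(-1.0, 1.5, size=(50, 2)),
               rng.normal(+1.0, 1.5, size=(50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

C = 10.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors, so its absolute
# values are the alpha_i and must satisfy 0 <= alpha_i <= C (constraint (16)).
alphas = np.abs(clf.dual_coef_).ravel()
assert np.all(alphas <= C + 1e-8)
print("support vectors:", len(clf.support_), "max alpha:", alphas.max())
```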

C. Relaxing the Linearity Restriction

Thus far, it has been assumed that the SVM was to construct a linear boundary between two classes represented by a set of training data. Of course, most interesting problems cannot be adequately classified by a linear machine. In order to generalize the SVM to non-linear decision functions, we introduce the notion of a kernel function [1], [5]. The training data only appear in the optimization problem (15) in the form of dot products between the input vector and the support vectors. If the input vectors are mapped into some high-dimensional space via some nonlinear mapping Φ(x), then the optimization problem would consist of dot products in this higher-dimensional space, Φ(x_i) · Φ(x_j). Given a kernel function K(x_i, x_j) = Φ(x_i) · Φ(x_j), the optimization problem would be unchanged except that the dot product x_i · x_j would be replaced with the kernel function K(x_i, x_j). The actual mapping Φ(x) would not appear in the optimization problem and would never need to be calculated, or even known. Cover's theorem on the separability of patterns [9] essentially says that data cast nonlinearly into a high-dimensional feature space is more likely to be linearly separable there than in a lower-dimensional space. Even though the SVM still produces a linear decision function, the function is now linear in the feature space, rather than the input space. Because of the high dimensionality of the feature space, we can expect the linear decision function to perform well, in accordance with Cover's theorem. Viewed another way, because of the nonlinearity of the mapping to feature space, the SVM is capable of producing arbitrary decision functions in input space, depending on the kernel function. Thus the fact that the SVM constructs only hyperplane boundaries is of little consequence.

The above discussion makes use of the kernel function K(x_i, x_j), but does not specify how to choose a suitable kernel. Mercer's theorem [18], [8] provides the theoretical basis for determining whether a given kernel function K is equal to a dot product in some space, the requirement for admissibility as an SVM kernel. A discussion of Mercer's theorem is outside the scope of this paper. Instead, we simply give two examples of suitable kernel functions which will be used here:

Polynomial kernel:
$K(x_i, x_j) = (x_i^T x_j + 1)^p$    (18)

Radial basis function kernel:
$K(x_i, x_j) = \exp\left( -\frac{1}{2\sigma^2} \|x_i - x_j\|^2 \right)$    (19)
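Both kernels in (18) and (19) can be written directly as drop-in replacements for the dot product in the dual. The following minimal sketch implements them; the degree p and width σ are free parameters chosen here only for illustration.

```python
import numpy as np

def polynomial_kernel(xi, xj, p=3):
    """Polynomial kernel of (18): K(xi, xj) = (xi . xj + 1)^p."""
    return (np.dot(xi, xj) + 1.0) ** p

def rbf_kernel(xi, xj, sigma=0.5):
    """RBF kernel of (19): K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))."""
    diff = xi - xj
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

# Either function can replace the dot product x_i . x_j in the dual (15).
x1, x2 = np.array([1.0, 0.0]), np.array([0.5, -0.5])
print(polynomial_kernel(x1, x2), rbf_kernel(x1, x2))
```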

III. MULTI-CLASS CLASSIFICATION

The best way to generalize SVMs to the multi-class case is an ongoing research problem. One such method, proposed by Platt et al. [23], is based on the notion of Decision Directed Acyclic Graphs (DDAGs). A given DDAG is evaluated much like a binary decision tree, where each internal node implements a decision between two of the k classes of the classification problem. At each node, one class is eliminated from consideration. When the traversal of the graph reaches a terminal node, only one class is left and the decision is made. The principal difference between the DDAG and the conventional decision tree is that DDAGs are not constrained in the same manner as trees. However, a DDAG does not take on arbitrary graph structures. It is a specific form of graph which differs from a tree only in how it handles duplication of decisions. In a decision tree, if the same decision is required in multiple locations in the tree, then each decision is represented through distinct but identical nodes. A DDAG allows two nodes to share a child. Because an algorithm using the DDAG has no need to backtrack through the graph, the algorithm can treat the graph as though it is a standard decision tree. In the so-called DAGSVM algorithm, each decision node uses a 1-v-1 SVM to determine which class to eliminate from consideration. A separate classifier must be constructed to separate all pairs of classes. For the EEG classification task presented here, there are five classes, and therefore a total of ten SVMs. Because each classifier deals only with approximately 40% of the available training data, assuming that each class is represented nearly equally, each may be trained relatively quickly. In addition, only four of the classifiers are used to classify any given unknown input. Figure 1 shows a possible DDAG for the EEG classification task.
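The DAGSVM evaluation amounts to repeatedly discarding one of the remaining classes with a pairwise classifier until a single class is left. The sketch below is a hedged illustration of that traversal, not the authors' implementation; the class labels and the pairwise-classifier interface are assumed for the example.

```python
def ddag_classify(x, classes, pairwise_classifiers):
    """Classify x with a DDAG of 1-v-1 SVMs.

    classes              : list of the k candidate class labels
    pairwise_classifiers : dict mapping frozenset({a, b}) to a function
                           f(x) -> a or b (one trained 1-v-1 SVM per pair)
    Each node compares the first and last remaining classes and eliminates
    the loser, so exactly k - 1 classifiers are evaluated per input.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise_classifiers[frozenset((a, b))](x)
        if winner == a:
            remaining.pop()      # b eliminated
        else:
            remaining.pop(0)     # a eliminated
    return remaining[0]
```

With five classes this evaluates four of the ten pairwise SVMs, matching the counts given in the text.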

IV. EEG SIGNAL ACQUISITION

The data used in this study were from the work of Keirn and Aunon [13], [14] and were collected using the following procedure. Subjects were placed in a dim, sound-controlled room, and electrodes were placed at positions C3, C4, P3, P4, O1, and O2 as defined by the 10-20 system of electrode placement [11] and referenced to two electrically linked mastoids at A1 and A2. The impedance of all electrodes was kept below five kilohms. Data were recorded at a sampling rate of 250 Hz with a Lab Master 12-bit A/D converter mounted in an IBM-AT computer.

Fig. 1. A Decision Directed Acyclic Graph (DDAG) for the EEG classification problem. Each node represents a 1-v-1 SVM trained to differentiate between the two classes compared by the node.

Before each recording session, the system was calibrated with a known voltage. The electrodes were connected through a bank of Grass 7P511 amplifiers with analog bandpass filters from 0.1 to 100 Hz. Eye blinks were detected by means of a separate channel of data recorded from two electrodes placed above and below the subject's left eye. An eye blink was defined as a change in magnitude greater than 100 microvolts within a 10 millisecond period. With the recording instruments in place, the subjects were asked to perform five separate mental tasks. These tasks were chosen to invoke hemispheric brainwave asymmetry. The subjects were asked first to relax as much as possible. This task represents the baseline against which the other tasks are to be compared. The subjects were also asked to mentally compose a letter to a friend, compute a non-trivial multiplication problem, visualize a sequence of numbers being written on a blackboard, and rotate a three-dimensional solid. For each of these tasks, the subjects were asked not to vocalize or gesture in any way. Data were recorded for 10 seconds for each task, and each task was repeated five times during each session. The data from each channel were divided into half-second segments overlapping by one quarter-second. After segments containing eye blinks were discarded, the remaining data contained at most 39 segments.
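For reference, the half-second windows with quarter-second overlap can be written as a simple sliding window over the 250 Hz recordings. The sketch below is an illustrative reconstruction rather than the original preprocessing code; the sample counts (125-sample windows stepped by 62 samples) are implied by the stated rates, since 0.25 s does not divide evenly into samples.

```python
import numpy as np

FS = 250          # sampling rate in Hz (Section IV)
WIN = FS // 2     # half-second window: 125 samples
STEP = FS // 4    # quarter-second shift: 62 samples (rounded down)

def segment(eeg, blink_mask):
    """Split a (channels, samples) recording into overlapping windows.

    eeg        : array of shape (n_channels, n_samples)
    blink_mask : boolean array of shape (n_samples,), True where a blink occurred
    Returns a list of (n_channels, WIN) segments, discarding any window that
    overlaps an eye blink, mirroring the procedure described in the text.
    """
    segments = []
    for start in range(0, eeg.shape[1] - WIN + 1, STEP):
        if not blink_mask[start:start + WIN].any():
            segments.append(eeg[:, start:start + WIN])
    return segments

# A 10-second, 6-channel trial yields at most 39 usable segments.
trial = np.random.randn(6, 10 * FS)
print(len(segment(trial, np.zeros(10 * FS, dtype=bool))))  # -> 39
```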

V. RESULTS

In testing the classification algorithms, five trials from one subject were selected from one day of experiments. Each trial consisted of the subject performing all five mental tasks. The first classifier tested is linear discriminant analysis (LDA). The second type of classifier is a feedforward neural network, consisting of 36 input units, 20 hidden units, and five binary output units. The activation function at each unit is the tanh function. The networks were trained using backpropagation with a learning rate of 0.1 and no momentum term. Training was halted after 2,000 iterations or when generalization began to fail, as determined by a small set of validation data chosen without replacement from the training data. The third type of classifier is the support vector machine (SVM), trained using radial basis function (RBF) kernels or polynomial kernels. The RBF-based classifiers were trained using 0.5, 1.0, and 2.0 as standard deviations of the kernel functions. Polynomial kernels of degrees two, three, five, and ten were trained to test the polynomial machines. For all kernel functions, the regularization parameter C was tested at values 1.0, 10.0, and 100.0. The support vector machines were trained and tested using the DAGSVM algorithm described earlier. Each of the 1-v-1 SVMs was trained using Platt's Sequential Minimal Optimization (SMO) algorithm [21], [22]. SMO reduces the quadratic programming stage of training to a series of pairwise optimizations among the Lagrange multipliers. By solving the optimization problem two variables at a time, the optimization can be performed analytically. Platt shows significant speedups resulting from the SMO algorithm as compared to using a traditional quadratic programming routine.

The training data were selected from the full set of five trials as follows. One trial was selected as test data. Of the four remaining trials, one was chosen to be a validation set, which was used to determine when to halt training of the neural networks and which values of the kernel parameters and regularization parameter to use for the SVM tests. Finally, the remaining three trials were compiled into one set of training data. The experiments were repeated for each of the 20 ways to partition the five trials in this manner, and the results of the 20 experiments were averaged to produce the results shown in Table I. This choice of training paradigm is based on earlier results [2]. The SVM results reported in Table I are those corresponding to the choice of kernel function and regularization parameter, C, which produced the best results. Specifically, the SVM used for the comparisons was constructed with a radial basis function (RBF) kernel using a standard deviation σ = 0.5 and a regularization parameter C equal to 1.

LDA provides extremely fast evaluations of unknown inputs, performed by distance calculations between a new sample and the mean of the training data samples in each class, weighted by their covariance matrices. Neural networks are also efficient after the training phase is complete. SVMs are similar to neural networks, but generally require more computation due to the comparatively large numbers of support vectors. The time required to compute class membership for an SVM is directly dependent on the number of support vectors. The number of support vectors resulting from the experiments reported here ranged from 140 to 308.
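For readers who want to assemble a comparable (though not identical) pipeline with off-the-shelf tools, the hedged sketch below instantiates the three classifier types with hyperparameters matching those reported above. scikit-learn is used here only as an illustration, not the MATLAB toolboxes of the original study; its SVC uses the gamma parameterization of the RBF kernel, so σ = 0.5 corresponds to gamma = 1/(2σ²) = 2, and its built-in one-vs-one scheme stands in for the DAGSVM.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def build_classifiers():
    """Classifiers roughly matching the configurations reported in Section V."""
    lda = LinearDiscriminantAnalysis()
    # 36 spectral inputs -> 20 tanh hidden units, plain SGD at learning rate 0.1,
    # no momentum, early stopping on held-out validation data.
    nn = MLPClassifier(hidden_layer_sizes=(20,), activation="tanh",
                       solver="sgd", learning_rate_init=0.1, momentum=0.0,
                       max_iter=2000, early_stopping=True)
    # RBF kernel with sigma = 0.5 (gamma = 2.0) and C = 1, the best setting found.
    svm = SVC(kernel="rbf", gamma=2.0, C=1.0, decision_function_shape="ovo")
    return {"LDA": lda, "NN": nn, "SVM": svm}
```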

TABLE I: Percentage of test data correctly classified, broken down by task. The support vector machine in these experiments used the set of parameters which resulted in the highest correct rate of classification among all SVMs tested.

Task                       LDA    NN     SVM
Rest                       47.3   64.3   59.4
Math                       45.1   47.3   44.5
Letter                     51.1   54.7   52.7
Rotate                     38.8   51.1   57.0
Count                      44.5   47.3   47.9
Total                      44.8   52.8   52.3
Average over 20 windows    66.0   69.4   72.0

VI. FUTURE WORK

All data used in this study were collected from a single subject during the same day. A logical step to take from here is to test the performance of the classifiers on data collected on later days and to repeat these experiments on data collected from other subjects. In addition, there have been several other attempts at generalizing kernel-based learners to multi-class classification. Weston and Watkins [24] have extended the theory of SVMs directly into the multi-class domain. Import Vector Machines [28] seem to offer similar performance while using significantly fewer support vectors. Each of these methods provides a slightly different approach to the classification problem, and could offer performance improvements.

VII. FEATURE SELECTION WITH GENETIC ALGORITHMS

A. Introduction

Both invasive and non-invasive BCI systems produce a very large amount of electrophysiological data. However, only a relatively small percentage of the potentially informative features of the data are utilized. High-resolution analysis of spatial, temporal, and spectral aspects of the data, and allowing for their interactions, leads to a very high dimensional feature space. Leveraging a higher percentage of potential features in the measured data requires more powerful signal analysis and classification capabilities. We have developed an EEG analysis system that integrates advanced concepts and tools from the fields of machine learning and artificial intelligence to address this challenge. One of our initial test applications of the system is the self-paced key typing dataset from Blankertz et al. [4], which includes 413 pre-key-press epochs of EEG recorded from one subject.

B. Method

The overall system is composed of two main parts: feature composition and feature selection (see Figure 2). Feature composition entails data preprocessing, feature derivation, and assembling all of the features into a single large feature matrix. In this specific experiment, we used a fixed set of six electrodes: F3, F4, C3, C4, CP3, and CP4. We partitioned each trial into 500 ms windows shifted by 100 ms over the entire epoch, zero-meaned the signals, zero-padded them to length 1024, and computed their power spectra at 1 Hz frequency resolution. We used mean power over the standard EEG frequency bands of delta (2-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta1 (13-20 Hz), beta2 (20-35 Hz), and gamma (35-46 Hz).
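A minimal sketch of this feature-composition step is shown below. It assumes a sampling rate near 1000 Hz, so that a 1024-point FFT gives approximately 1 Hz bins; the rate is not stated in this section, and the function and variable names are illustrative rather than taken from the original system.

```python
import numpy as np

# Frequency bands used for the mean-power features (Hz), as listed above.
BANDS = {"delta": (2, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta1": (13, 20), "beta2": (20, 35), "gamma": (35, 46)}

def band_power_features(window, fs):
    """Mean band power of one (channels, samples) window.

    The window is zero-meaned per channel, zero-padded to 1024 samples, and
    its power spectrum is computed; powers are averaged over the bins in each
    band. Returns a flat feature vector of length n_channels * len(BANDS).
    """
    x = window - window.mean(axis=1, keepdims=True)      # zero-mean each channel
    spec = np.abs(np.fft.rfft(x, n=1024, axis=1)) ** 2   # power spectrum
    freqs = np.fft.rfftfreq(1024, d=1.0 / fs)
    feats = []
    for lo, hi in BANDS.values():
        idx = (freqs >= lo) & (freqs < hi)
        feats.append(spec[:, idx].mean(axis=1))
    return np.concatenate(feats)
```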

The feature selection part included a support vector machine (SVM) for predicting (classifying) the pressed key laterality and a genetic algorithm (GA) for searching the space of feature subsets [25], [27]. We used the radial basis function kernel, with gamma = 0.2. The SVM has several advantages over alternative classifiers. Unlike most neural network classifiers, the SVM is not susceptible to local optima. SVMs involve many fewer parameters than neural networks, have built-in regularization, are theoretically well-grounded, and, particularly important for ultimate real-time use in a BCI, are extremely fast. The GA was implemented with a population of 20, a 2-point crossover probability of 0.66, and a mutation rate of 0.008. Individuals in the population were binary strings, with 1 indicating that a feature was included and 0 indicating that it was not. We used a GA to search the space of feature subsets for two main reasons. First, exhaustive exploration of search spaces with greater than about 20 features is computationally intractable (i.e., 2^20 possible subsets). Second, unlike gradient-based search methods, the GA is inherently designed to avoid the pitfall of local optima. We searched over the eleven time windows and six frequency bands, while always including all six electrodes in each case. Thus, the dimensionality of the searchable feature space was 66 (11 x 6). Each time an individual (feature subset) in the GA population was evaluated, we trained and tested the SVM using 10x10-fold cross validation and used the average classification accuracy as the individual's fitness measure.

C. Results

The GA evolves a population of feature subsets whose corresponding fitness (classification accuracy) improves over iterations of the GA (Figure 3). Note that although both the population average and the best individual fitness improve over successive generations, only the best individual fitness does so in a monotonic fashion. The best fitness obtained was a classification accuracy of 76%. It was stable for over 50 generations of the GA. The standard deviation of the classification accuracy produced by the SVM was typically about 6%. Figure 4 shows the feature subset exhibiting the highest classification accuracy. The feature subset included features from every time window and every frequency band. This suggests that alternative methods that include only a few time windows or frequencies may be missing features that could improve classification accuracy. Furthermore, all frequency bands were included in the third time window, suggesting that early wideband activity may be a significant feature of the process for deciding finger laterality.
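The wrapper search described in Section VII-B can be sketched as follows. The GA operators shown (tournament selection together with the stated 2-point crossover and mutation rates) are illustrative stand-ins for the commercial FlexTool implementation actually used, and the fitness argument is assumed to be a function returning the cross-validated SVM accuracy of a candidate feature subset.

```python
import numpy as np

N_FEATURES = 66        # 11 time windows x 6 frequency bands
POP_SIZE, P_CROSS, P_MUT = 20, 0.66, 0.008

def evolve(fitness, n_generations=100, rng=np.random.default_rng(0)):
    """Wrapper feature selection: evolve binary feature masks with a GA.

    fitness : function mask -> cross-validated SVM accuracy on that subset
    Returns the best mask found and its fitness.
    """
    pop = rng.integers(0, 2, size=(POP_SIZE, N_FEATURES))
    best_mask, best_fit = None, -np.inf
    for _ in range(n_generations):
        fits = np.array([fitness(ind) for ind in pop])
        if fits.max() > best_fit:
            best_fit, best_mask = fits.max(), pop[fits.argmax()].copy()
        children = []
        while len(children) < POP_SIZE:
            # Tournament selection of two parents (selection scheme assumed).
            i, j = rng.integers(POP_SIZE, size=2), rng.integers(POP_SIZE, size=2)
            p1, p2 = pop[i[fits[i].argmax()]].copy(), pop[j[fits[j].argmax()]].copy()
            if rng.random() < P_CROSS:                   # 2-point crossover
                a, b = np.sort(rng.integers(N_FEATURES, size=2))
                p1[a:b], p2[a:b] = p2[a:b].copy(), p1[a:b].copy()
            for child in (p1, p2):
                flip = rng.random(N_FEATURES) < P_MUT    # bitwise mutation
                child[flip] ^= 1
                children.append(child)
        pop = np.array(children[:POP_SIZE])
    return best_mask, best_fit
```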

Fig. 2. System architecture for mining the EEG feature space. The space of feature subsets is searched in a wrapper fashion, whereby the search is directed by the performance of the classifier, in this case a support vector machine.

Fig. 3. Classification accuracy (population fitness) evolves over iterations (generations) of the genetic algorithm. The thin line is the average fitness of the population. The thick line is the fitness of the best individual in the population.

D. Discussion

Although the best classification accuracy (76%) was considerably higher than chance, it was much lower than the approximately 95% classification accuracy obtained by Blankertz et al. [4]. One possible reason is that we used data from only a small subset of the electrodes recorded (6 of 27) in order to reduce computation time by restraining the dimensionality of the feature vector presented to the SVM. Optimizing classification accuracy was not, however, our primary goal. Instead, we sought insight into the nature of the features that would provide the best classification accuracy. The feature selection method showed that a diverse subset of spectrotemporal features in the EEG contributed to the best classification accuracy. However, most BCIs that use EEG frequency information in imagined or real movement look at only the alpha (mu) and beta bands over only one or a few time windows [20], [19], [26]. Furthermore, the system is amenable to on-line applications. One could use the full system, including the GA, to learn the best dissociating features for a given subject and task, then use the trained SVM with the best dissociating features in real time. Thus, preliminary results from this research suggest that BCI performance could be improved by leveraging advances in machine learning and artificial intelligence for systematic exploration of the EEG feature space.

VIII. CONCLUSIONS

Support vector machines provide a powerful method for data classification. The SVM algorithm has a very solid foundation in statistical learning theory and is guaranteed to find the optimal decision function for a set of training data, given a set of parameters determining the operation of the SVM. The empirical evidence presented here shows that the algorithm performs very well on one real problem. Finally, we are currently working with alternative representations of the EEG data. Preliminary results indicate that applying a KL transform to the raw data produces a data set which is much more susceptible to accurate classification by many types of classifiers.
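The Karhunen-Loeve (KL) transform mentioned above is a decorrelating projection of the data onto the eigenvectors of its covariance matrix. The sketch below shows one common reading of such a transform (principal component analysis over pooled, flattened segments); it is an assumption for illustration, not necessarily the authors' exact procedure.

```python
import numpy as np

def kl_transform(segments, n_components=10):
    """Project flattened EEG segments onto their leading principal components.

    segments : array of shape (n_segments, n_features), e.g. flattened
               channel-by-sample windows
    Returns the projected data and the projection matrix.
    """
    X = segments - segments.mean(axis=0)          # center the data
    cov = np.cov(X, rowvar=False)                 # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :n_components]        # keep the top components
    return X @ W, W
```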

Fig. 4. Features selected for the best individual. Black indicates the feature was included in the subset; white indicates it was not. Time windows correspond to the number of 100 ms shifts from epoch onset, i.e., time window 1 is early in the epoch and time window 11 ends 120 ms before the key press.

ACKNOWLEDGMENT

The support vector machines used in the comparison of Section V were constructed using the SVM MATLAB toolbox developed by Cawley [6]. The SVM used for feature selection in Section VII was implemented with the OSU SVM Classifier Matlab Toolbox [17]. The GA was implemented with the commercial FlexTool GA software [16].

REFERENCES
[1] A. Aizerman, E. M. Braverman, and L. I. Rozoner. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 1964.
[2] C. W. Anderson, S. V. Devulapalli, and E. A. Stolz. Determining mental state from EEG signals using neural networks. Scientific Programming, 4(3):171-183, 1995.
[3] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.
[4] B. Blankertz, G. Curio, and K. R. Muller. Classifying single trial EEG: Towards brain computer interfacing. Advances in Neural Information Processing Systems, 14, 2002. To appear.
[5] B. E. Boser, I. M. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Fifth Annual Workshop on Computational Learning Theory, 1992.
[6] G. C. Cawley. MATLAB support vector machine toolbox, http://theoval.sys.uea.ac.uk/gcc/svm/toolbox. University of East Anglia, School of Information Systems, Norwich, Norfolk, U.K. NR4 7TJ, 2000.

[7] C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20, 1995.
[8] R. Courant and D. Hilbert. Methods of Mathematical Physics, volumes I and II. Wiley Interscience, 1970.
[9] T. M. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, 1965.
[10] R. Fletcher. Practical Methods of Optimization. John Wiley and Sons, Inc., 2nd edition, 1987.
[11] H. Jasper. The ten twenty electrode system of the international federation. Electroencephalography and Clinical Neurophysiology, 10:371-375, 1958.
[12] W. Karush. Minima of functions of several variables with inequalities as side constraints. Master's thesis, University of Chicago, 1939.
[13] Z. A. Keirn. Alternative modes of communication between man and machine. Master's thesis, Purdue University, West Lafayette, IN, 1988.
[14] Z. A. Keirn and J. I. Aunon. A new mode of communication between man and his surroundings. IEEE Transactions on Biomedical Engineering, 37(12):1209-1214, 1990.
[15] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability, 1951.
[16] CynapSys LLC. FlexGA. www.cynapsys.com, 2002.
[17] J. Ma, Y. Zhao, and S. Ahalt. OSU SVM Classifier Matlab Toolbox. http://eewww.eng.ohio-state.edu/maj/osu svm/, 2002.
[18] J. Mercer. Functions of positive and negative type, and their connection with the theory of integral equations. Transactions of the London Philosophical Society, 1909.
[19] G. Pfurtscheller, C. Neuper, C. Guger, W. Harkam, H. Ramoser, A. Schlögl, B. Obermaier, and M. Pregenzer. Current trends in Graz brain-computer interface (BCI) research. IEEE Transactions on Rehabilitation Engineering, 8(2):456-460, 2000.
[20] J. A. Pineda, B. Z. Allison, and A. Vankov. The effects of self-movement, observation, and imagination on mu rhythms and readiness potentials: Toward a brain-computer interface. IEEE Transactions on Rehabilitation Engineering, 8(2):219-222, June 2000.
[21] John C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Scholkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 185-208. MIT Press, 1998.
[22] John C. Platt. Using analytic QP and sparseness to speed training of support vector machines. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11. MIT Press, 1999.
[23] John C. Platt, Nello Cristianini, and John Shawe-Taylor. Large margin DAGs for multiclass classification. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 547-553, 2000.
[24] J. Weston and C. Watkins. Multi-class support vector machines. Technical report, Royal Holloway University of London, 1998.
[25] D. Whitley, R. Beveridge, C. Guerra, and C. Graves. Messy genetic algorithms for subset feature selection. In T. Baeck, editor, Proc. Int. Conf. on Genetic Algorithms, Boston, MA, 1997. Morgan Kaufmann.
[26] J. R. Wolpaw, D. J. McFarland, and T. M. Vaughan. Brain-computer interface research at the Wadsworth Center. IEEE Transactions on Rehabilitation Engineering, 8(2):222-226, June 2000.
[27] Jihoon Yang and Vasant Honavar. Feature subset selection using a genetic algorithm. In Huan Liu and Hiroshi Motoda, editors, Feature Extraction, Construction and Selection: A Data Mining Perspective, pages 117-136. Kluwer Academic, Boston, MA, 1998.
[28] Ji Zhu and Trevor Hastie. Kernel logistic regression and the import vector machine. In NIPS 2001, 2001.
