Multi-layer perceptrons: Selection of a multi-layer perceptron for a specific data classification task

Warren Gauci

Abstract

This paper considers the different parameters of multi-layer perceptron architectures and suggests a suitable architecture for a specific data set classification task. The results reported in this paper were obtained with the MathWorks Neural Network Toolbox software. Rigorous testing with variable hidden layer size, learning rate and test sets led to a neural architecture with the best performance measures. All testing was performed on the Iris data set. Performance measures are evaluated using the confusion matrix, the mean square error plot and the receiver operating characteristic chart. This paper will contribute to further advancements in the field of neuron training and in the field of distinguishing and classifying linearly and non-linearly separable data.
1. INTRODUCTION

An artificial neural network (ANN) is an information-processing system that has certain performance characteristics in common with biological neural networks. Neural networks possess a remarkable ability to derive meaning from complicated or imprecise data, which can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A simple neuron is a device with many inputs and one output. The neuron has two modes of operation: the training mode and the using mode. A more sophisticated neuron is presented in the McCulloch and Pitts (MCP) model. The difference from the simple neuron model is that the inputs are 'weighted': the influence of each input on the decision depends on the weight of that particular input.
All weighted inputs are then added together and, if the sum exceeds a pre-set threshold value, the neuron fires. An adjustment to the MCP model led to the formulation of the perceptron, a term coined by Frank Rosenblatt. A perceptron is an MCP model with additional, fixed, preprocessing. This paper deals with perceptron architectures, as this kind of neuron is best suited for pattern recognition (see [1]). Perceptrons may be grouped in single-layer or multi-layer architectures. Single-layer architectures are restricted to classifying only linearly separable data; thus in this paper only multi-layer perceptron (MLP) networks are used, as the connections from one layer to the next allow for the recognition and classification of non-linearly separable data. For a comprehensive overview of other kinds of networks refer to [2]. The objective of this paper is to find the MLP parameters that lead to the best MLP architecture for the classification of a given data set. In general, ANNs are scalable, i.e. they have parameters that can be changed. In most cases the structure and size of an ANN are chosen a priori or after the ANN learns a given task. Two questions that are relevant in this case are: What size and structure are necessary to solve a given task? What size and structure are sufficient to solve a given task? This paper gives an answer to the first question; the second question is also relevant but is not within the scope of this paper. In fact, an MLP model can approximate functions to any given accuracy; for a good overview refer to [3].
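For concreteness, the MCP firing rule described above can be stated as follows (standard notation, added here for illustration; the symbols are not taken from the paper):

y = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} w_i x_i \geq \theta \\ 0, & \text{otherwise} \end{cases}

where x_i are the inputs, w_i the corresponding weights and \theta the pre-set threshold.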
2. BACKGROUND THEORY

The ANN must be trained using a learning process. This process involves the memorisation of patterns and the subsequent response of the neural network, and can be categorised into two paradigms: associative mapping and regularity detection. Learning is performed by updating the value of the weights associated with each input. The methodology used in this paper makes use of an adaptive network, in which the neurons found in the input layer are capable of changing their weights. The adaptive network is subjected to a supervised learning procedure, in which each neuron knows the target output and adjusts the weights of its input signals to minimise the error. The error is quantified using a least-mean-square convergence technique. The behaviour of an ANN also depends on the input-output transfer function specified for the units. This paper makes use of sigmoid units, whose output varies continuously, but not linearly, as the input changes. Sigmoid units bear a greater resemblance to real neurons than linear units do. In order to train the ANN to perform a classification task, some kind of weight-adjusting technique must be set. This paper considers a back-propagation algorithm, in which the error derivative with respect to the weights (EW) is computed. In this way the network is able to calculate how the error will change as each weight is increased or decreased slightly.
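As a brief illustration of the units and the weight adjustment described above (standard definitions, not quoted from the paper), the sigmoid transfer function and the gradient-descent update driven by EW are:

\sigma(u) = \frac{1}{1 + e^{-u}}, \qquad \sigma'(u) = \sigma(u)\bigl(1 - \sigma(u)\bigr), \qquad w \leftarrow w - \eta \, \frac{\partial E}{\partial w}

where \eta is the learning rate and \partial E / \partial w is the error derivative of the weight computed by back-propagation.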
3. MATERIALS

3.1 NEURAL NETWORK TOOLBOX

The Neural Network Toolbox provides functions that allow the modelling of complex nonlinear systems. This toolbox was used in the development of this paper to design, train, simulate and assess different ANN architectures. The pattern recognition tool was applied to the Iris dataset (see 3.2). The toolbox allowed for the division of the data set into training, validation and test sets. The function that changes the number of neurons in the hidden layer was used to vary the MLP architecture. The performance of the different ANNs was assessed using the performance plots provided by this software.
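The following sketch shows how these toolbox facilities are typically invoked (assuming the toolbox's patternnet interface; the function and parameter names are standard toolbox identifiers, not quoted from the paper):

net = patternnet(10);                 % hidden layer size is the argument
net.divideFcn = 'dividerand';         % random division of the data set
net.divideParam.trainRatio = 0.70;    % training set
net.divideParam.valRatio   = 0.15;    % validation set
net.divideParam.testRatio  = 0.15;    % test set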
3.2 DATASET

The dataset used for classification is the Iris data set. Created by R. A. Fisher, this data set is a classic in the field of pattern recognition. It contains 3 classes of 50 instances each, where each class refers to a type of Iris plant: Setosa, Versicolour and Virginica. One class is linearly separable from the other two, while the latter are not linearly separable from each other. Each instance has four attributes: sepal length, sepal width, petal length and petal width.
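The toolbox ships with a copy of this data set; a minimal sketch of loading and inspecting it (assuming the toolbox helper iris_dataset):

[x, t] = iris_dataset;   % toolbox copy of Fisher's Iris data
size(x)                  % 4 x 150: four attributes per instance
size(t)                  % 3 x 150: one-hot targets for the three classes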
4. METHOD

The best MLP structure for the given classification task was determined using an empirical procedure. The method allowed for the variation of all the adjustable parameters, namely: input weights, epoch limit, learning rate and hidden layer size. The specific steps performed are presented in a flow chart in Appendix A. The method may be divided into three parts: running the train.gd algorithm, collection of data, and choice of the best data. A total of 80 samples were enumerated in the selection process of the best ANN architecture.
4.1 COLLECTION OF DATA

The Iris dataset was divided into training, validation and test data [70% : 15% : 15%]; a hidden layer size was chosen;
Table 1: Mean square error of different hidden layer sizes (columns: Sample No 1-10, Min mse, Average mse, Standard deviation).
The train.gd algorithm was run 10 times, saving and assessing the performance plots of each sample; the sample with the minimum mse was chosen, considering the validation and test data plots; the procedure was repeated for 5 different hidden layer sizes [5, 10, 35, 60, 100]. This part of the method had the following outcomes: the production of 10 samples for each of the 5 hidden layer sizes (50 samples in total) and the determination of the best data for each hidden layer size.
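A sketch of this sampling procedure, producing the statistics recorded in Table 1 (assuming the toolbox functions patternnet and train; the variable names are ours):

[x, t] = iris_dataset;                  % 4 x 150 inputs, 3 x 150 targets
hiddenSizes = [5 10 35 60 100];
runs = 10;
mseVal = zeros(numel(hiddenSizes), runs);
for i = 1:numel(hiddenSizes)
    for r = 1:runs
        net = patternnet(hiddenSizes(i), 'traingd');
        net.performFcn = 'mse';                    % the paper reports mse
        net.divideParam.trainRatio = 0.70;
        net.divideParam.valRatio   = 0.15;
        net.divideParam.testRatio  = 0.15;
        net.trainParam.lr     = 5;                 % learning rate (see 4.2)
        net.trainParam.epochs = 2000;              % epoch limit (see 4.2)
        net.trainParam.showWindow = false;
        [net, tr] = train(net, x, t);              % weights re-initialised per run
        mseVal(i, r) = tr.best_vperf;              % minimum validation mse
    end
end
minMse = min(mseVal, [], 2);    % per-size statistics as in Table 1
avgMse = mean(mseVal, 2);
stdMse = std(mseVal, 0, 2);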
4.2 RUNNING THE TRAIN.GD ALGORITHM

Assign and initialise the weights of the input data; define the learning rate and epoch limit; train the network using the pre-defined training set; evaluate the network using the validation data; update the weights and terminate when the validation error is at a minimum. All these steps were performed by the train.gd function in the software. The best learning rate and epoch limit values were determined in a pre-test and remained fixed throughout this method (a learning rate of 5 and an epoch limit of 2000). The outcome of this part of the method was the determination of the minimum mean square error on the validation data set for each sample taken in the previously defined method.
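A minimal sketch of one such run, assuming MATLAB's traingd training function and its training record output (the field names are standard toolbox identifiers, not quoted from the paper):

[x, t] = iris_dataset;
net = patternnet(35, 'traingd');   % gradient-descent back-propagation
net.performFcn = 'mse';
net.trainParam.lr     = 5;         % learning rate fixed in the pre-test
net.trainParam.epochs = 2000;      % epoch limit fixed in the pre-test
[net, tr] = train(net, x, t);      % stops when validation error stops improving
tr.best_epoch                      % epoch of minimum validation error
tr.best_vperf                      % minimum validation mse for this sample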
4.3 CHOICE OF BEST DATA

Upload the best sample for each hidden layer size from the collection-of-data step; determine the best overall sample and the corresponding hidden layer size (using the average and standard deviation functions); work out another 10 samples using the best determined HL size; select the best overall sample using the validation and test performance plots; save the parameters of the best sample and try this ANN architecture on a new set of data. This part of the method allowed for the determination of the overall best ANN architecture using a further set of samples.
5. RESULTS

The most relevant results are tabulated in Table 1. All results were obtained using the following percentage ratios for the training, validation and test sets: 70% : 15% : 15%. Results are also based on a learning rate of 5 and an epoch limit of 2000; these values were justified by a pre-testing procedure using the same dataset. The best ANN architecture, taking into account the mean square error of both the validation and test data, is the one containing 35 neurons in the hidden layer. This choice is based on the average and standard deviation values of the mean square error. Results also show this architecture applied to a different data set. Figure 1 and Figure 2 show the mse vs epoch plot and the confusion matrix for the best data sample. Figure 1 shows that for the best data sample, validation converged with an mse of 2.0241e-08 at 18 epochs, with a test error of 9e-09.

Figure 1: Mse vs epoch plot.
Figure 2: Confusion matrix.
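The performance measures discussed here can be reproduced with the toolbox's standard plotting functions (a sketch; the plot function names are toolbox identifiers, not quoted from the paper):

[x, t] = iris_dataset;
net = patternnet(35, 'traingd');
net.performFcn = 'mse';
[net, tr] = train(net, x, t);
y = net(x);
plotperform(tr);       % mse vs epoch, as in Figure 1
plotconfusion(t, y);   % confusion matrix, as in Figure 2
plotroc(t, y);         % receiver operating characteristic chart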
6. DISCUSSION
The samples with an HL size of 35 were initially not those with the minimum average and standard deviation values. Further samples were taken and more consistent results were obtained, which was evidence of good and consistent data. This architecture also produced a consistent mse value when tested on the thyroid dataset, which has the same number of classes but more attributes. The confusion matrix in Figure 2 shows perfect classification on the training and validation data but a misclassification rate of 4.3% on the test data. This shows that a classification error still occurred even though training was executed perfectly. The latter is reinforced by the ROC plots, which show true positives and no false positives for the training and validation data and a few false positives in the test data.
7. CONCLUSION

It may be concluded that although the results are not always satisfactory, consistency is present only in networks with a comparatively small HL size. Furthermore, the results show that classes 2 and 3 are the classes containing non-linearly separable data. It may also be concluded that a specific MLP architecture can be chosen for a particular classification task, but the classification is random in nature and not always consistent.

8. REFERENCES
[1] M. Nørgaard, O. Ravn, N. Poulsen, and L. Hansen, Neural Networks for Modelling and Control of Dynamic Systems, M. Grimble and M. Johnson, Eds. London: Springer-Verlag, 2000.

[2] S. Haykin, Neural Networks, J. Griffin, Ed. New York: Macmillan College Publishing Company, 1994.

[3] F. Lewis, J. Huang, T. Parisini, D. Prokhorov, and D. Wunsch, IEEE Trans. Neural Networks, vol. 18, no. 4, pp. 969–972, July 2007.