International Conference on Electrical, Computer and Communication Engineering (ECCE), February 16-18, 2017, Cox’s Bazar, Bangladesh

Dropout Effect On Probabilistic Neural Network


Nazmul Shahadat¹, Bushra Rahman², Faisal Ahmed², Ferdows Anwar²
¹Prime University, ²City University
Dhaka, Bangladesh
nazmul.ruet@gmail.com, bushra_cse@yahoo.com, fysal165@gmail.com, ferdows.cse@gmail.com

Abstract—Real-life datasets often contain noisy, skewed, correlated, imbalanced, and otherwise unnecessary features, and feature subset selection combined with a learning algorithm faces difficulties in isolating the relevant ones. Factors such as skew, high kurtosis, and dependence or correlation among features degrade classifier performance. In this study, the Dropout technique is treated as a feature subset selection method for choosing relevant and important features and is combined with a Probabilistic Neural Network (PNN) classifier to prevent the irrelevant features from harming performance. Several techniques (omitting fixed sets of features, namely skewed, high-kurtosis, and dependent or correlated attributes, as well as Dropout) were applied with PNN on several datasets. The overall performance on every dataset was significantly improved or unchanged for Dropout with PNN (DPNN), and DPNN consistently performed better than the alternatives.

Keywords—Classification; Feature selection method; Correlation; Dropout; Probabilistic Neural Network; PNN

I. INTRODUCTION

In this decade, classification of data gained from information sources is a central challenge in data mining and machine learning. Various applications of neural networks to classification have been widely studied, and among the different neural network architectures, the Probabilistic Neural Network (PNN) has emerged as one of the most popular algorithms for supervised learning: it is an effective and easy-to-apply tool for many classification problems. However, many training samples in the pattern/input layer of a PNN can be unnecessary. Most real-world classification problems require supervised learning in which the class-conditional probabilities are unknown. Datasets contain dependent/correlated features, independent features, and features with high or low skewness and high or low kurtosis; consequently, many candidate features are introduced. These features affect the classifier either partially or completely, and the effect is sometimes visible and sometimes not. Dependent, skewed, and high-kurtosis features are irrelevant and form an obstacle to high classification accuracy. Removing such irrelevant features drastically reduces the running time of a learning algorithm and improves the accuracy of the system [1].

Many feature selection methods have been studied. Donald A. Singer et al. investigated the effect of removing skewness by a logarithmic transform [2]. Feature selection methods try to pick a subset of features that are relevant to the output [3]. Lei et al. examined the effect of feature selection techniques in the context of highly skewed data [4]. Yang et al. analyzed feature selection approaches for text categorization, evaluating selection-based methods such as information gain (IG), document frequency, mutual information (MI), the Chi-square test (CHI), and term strength (TS); of these, they found IG and CHI most effective [5]. These previous attempts analyzed the effect of feature selection on class distribution skewness and removed some features, but such removal processes are not fully effective in all cases: the tests used to select features are not sufficient, because beyond the dependent, skewed, or high-kurtosis features, other features may also affect the overall output.

On the other hand, a large network structure makes the classifier very sensitive to the training data [6]. Moreover, the smoothing parameter plays an important role in the PNN classifier, and an appropriate smoothing parameter often depends on the data. This is why the PNN algorithm was chosen here. In this study, we propose a modified PNN algorithm that performs feature selection while minimizing the classification error rate. The proposed algorithm works iteratively and consists of two parts: the first performs smoothing parameter selection, and the second performs pattern layer neuron selection. In the summation layer, the output becomes a linear combination of the neuron outputs, and an orthogonal algorithm is subsequently employed to select important neurons [7][8]. Because of the incorporation of an orthogonal transform in neuron selection, the proposed algorithm is more efficient than others.

This work addressed two related questions. First, irrelevant and unnecessary features have to be reduced from datasets: how effectively can this be done? Second, does this reduction improve the overall efficiency of the classifier? To answer these questions, the modified PNN algorithm was formulated and tested on a variety of datasets.

II. BACKGROUND

Several key elements are examined in these experiments.

A. Class Imbalance

Class imbalance is an extremely common problem in real-world applications such as oil spillage detection, fraud detection, anomaly detection, medical diagnosis, and facial recognition, where one class has a larger number of instances (known as the major class) than the other class (known
as a minor class). Imbalanced datasets thus have an unequal class distribution, with ratios such as 100:1, 1,000:1, or 10,000:1; this ratio is known as the Imbalance Ratio (IR) and is defined as

IR = \frac{N_{major}}{N_{minor}}    (1)

where N_{major} is the total number of major class instances and N_{minor} is the total number of minor class instances.

B. Skewness

In probability, skewness measures how far the probability distribution of a real-valued random variable leans away from its mean. The skewness value can be positive, negative, or even undefined. In multimodal distributions, skewness is difficult to interpret.

C. Kurtosis

Kurtosis describes whether data are heavy-tailed or light-tailed relative to a normal distribution. When a dataset is plotted, it is usually compared with a standard normal distribution, a bell curve with a central peak and thin tails. When excess kurtosis is present, the distribution differs from this bell shape: datasets with high kurtosis have heavier tails than the normal distribution, while datasets with low kurtosis have lighter ones.

D. Correlation & Dependence

Statistical correlation is a number that shows whether, and how strongly, two random variables are related. Correlation refers to any statistical relationship involving dependence; it is most often used to measure the degree to which two variables have a linear relationship. A well-designed correlation test can lead to a greater understanding of the data. Precisely, dependence refers to any situation in which random variables do not satisfy the mathematical condition of probabilistic independence. Mutual information can also be applied to measure the dependence/correlation between two variables.

E. Dropout

Dropout is a stochastic regularization method for neural networks that can be interpreted as adding noise to the hidden units. Adding noise had been used previously by Vincent et al. (2008, 2010), where noise is added to the input units and the network is trained as if the input were noise-free [9]. When training with dropout, each hidden unit works with a randomly chosen sample of the other units. Each hidden unit therefore becomes more robust and is driven toward creating useful features on its own, without depending on other hidden units to fix its mistakes.

Consider a neural network with L hidden layers, and let l ∈ {1, …, L} index them. Let z^{(l)} be the input vector into layer l, y^{(l)} the output vector from layer l (with y^{(0)} = x, the input), and W^{(l)} and b^{(l)} the weights and biases at layer l. The feed-forward operation of a standard neural network (fig. 1) can then be described as (for l ∈ {0, …, L−1} and any hidden unit i)

z_i^{(l+1)} = W_i^{(l+1)} y^{(l)} + b_i^{(l+1)}    (1)
y_i^{(l+1)} = f(z_i^{(l+1)})    (2)

where f is any activation function, for example

f(x) = \frac{1}{1 + \exp(-x)}    (3)

With dropout, the feed-forward operation becomes

r_i^{(l)} \sim \mathrm{Bernoulli}(p)    (4)
\tilde{y}^{(l)} = r^{(l)} * y^{(l)}    (5)
z_i^{(l+1)} = W_i^{(l+1)} \tilde{y}^{(l)} + b_i^{(l+1)}    (6)
y_i^{(l+1)} = f(z_i^{(l+1)})    (7)

In these equations, r^{(l)} is a vector of independent Bernoulli random variables for layer l, each of which is 1 with probability p. This vector is multiplied element-wise with the outputs y^{(l)} of that layer to create the thinned outputs \tilde{y}^{(l)}, which are then used as input to the next layer; the process is repeated for all layers [9][10][11].

III. PROBABILISTIC NEURAL NETWORK

The probabilistic neural network is a feed-forward artificial neural network that graphically resembles a multilayer perceptron. It is a supervised learning network evolved from the Bayesian classifier [12]; no iterative learning process is required, and the network's initial weights do not need to be set [7]. It is orders of magnitude faster than back-propagation [13] as the size of the training set increases, and it is guaranteed to approach an optimal solution [6]. It was chosen here for its simple structure and training manner [13]. The architecture of the PNN is closely related to the Parzen window probability density function (pdf) estimator [14][15], and a PNN consists of several sub-networks.

Fig. 1. Architecture of PNN. (The figure shows the input layer X1, …, XN; hidden-layer neurons grouped into Class 1, Class 2, and Class 3 groups; the output layer; and the decision layer.)

A PNN network has four layers [16]. The input layer calculates the distance of the input; the input units then simply distribute the values to the hidden layer units, each of which calculates its output [17]. The multivariate pdf estimator, g(x), may be expressed as [16][18]

g_1(x_1, x_2, \ldots, x_p) = \frac{1}{\sigma_1 \sigma_2 \cdots \sigma_p} \sum_{i=1}^{n} W\!\left(\frac{x_1 - x_{1,i}}{\sigma_1}, \frac{x_2 - x_{2,i}}{\sigma_2}, \ldots, \frac{x_p - x_{p,i}}{\sigma_p}\right)    (8)
g_2(y_1, y_2, \ldots, y_p) = \frac{1}{\sigma_1 \sigma_2 \cdots \sigma_p} \sum_{i=1}^{n} W\!\left(\frac{y_1 - y_{1,i}}{\sigma_1}, \frac{y_2 - y_{2,i}}{\sigma_2}, \ldots, \frac{y_p - y_{p,i}}{\sigma_p}\right)    (9)

where σ_1, σ_2, …, σ_p are the smoothing parameters, representing the standard deviation around the mean of the p random variables x_1, …, x_p or y_1, …, y_p, and n is the total number of training examples. If all smoothing parameters are assumed equal (i.e., σ_1 = σ_2 = ⋯ = σ_p = σ) and a bell-shaped Gaussian function is used for W, a reduced form is obtained:

g_1(x) = \frac{1}{(2\pi)^{p/2} \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)    (10)

g_2(y) = \frac{1}{(2\pi)^{p/2} \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|y - y_i\|^2}{2\sigma^2}\right)    (11)

The second layer is the hidden layer, built from Gaussian functions centered at the data points, with one neuron per training point. For each class of the second layer, an average of the outputs is executed, which can be expressed as

f_1(x) = \frac{1}{(2\pi)^{p/2} n \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)    (12)

f_2(y) = \frac{1}{(2\pi)^{p/2} n \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|y - y_i\|^2}{2\sigma^2}\right)    (13)

The fourth layer is the decision layer: for each target category it compares the weighted values and selects the largest [19][20].

IV. PROPOSED PNN

Dropout is a method of dropping neurons from a network. In this paper, Dropout was used for the same purpose: to remove unnecessary features from the input and hidden layers. Every feature selection process is performed together with a classifier; here PNN was chosen for its simplicity and its multilevel network without weights. We propose a new PNN network with Dropout; the architecture of the proposed "Dropout PNN" algorithm is shown in Fig. 3.

Fig. 3. Proposed PNN with dropout. (The figure shows each input Xi passing through a dropout mask r to give the thinned input X̃i, thinned hidden-node groups per class, summation nodes f1 and f2, and the decision layer.)

Like the original PNN, the proposed network has four layers. The dropout technique is added at the input layer as

r_i^{(l)} \sim \mathrm{Bernoulli}(p)    (14)
\tilde{X}_N = r^{(l)} * X_N    (15)

where r^{(l)} is a vector of independent Bernoulli random variables for the input layer l, each of which is 1 with probability p. In the hidden layer,

g_1(\tilde{x}) = \frac{1}{(2\pi)^{p/2} \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|\tilde{x} - \tilde{x}_i\|^2}{2\sigma^2}\right)    (16)

g_2(\tilde{y}) = \frac{1}{(2\pi)^{p/2} \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|\tilde{y} - \tilde{y}_i\|^2}{2\sigma^2}\right)    (17)

In the output layer, an average operation is executed for each class of the second layer, which can be expressed as

f_1(\tilde{x}) = \frac{1}{(2\pi)^{p/2} n \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|\tilde{x} - \tilde{x}_i\|^2}{2\sigma^2}\right)    (18)

f_2(\tilde{y}) = \frac{1}{(2\pi)^{p/2} n \sigma^p} \sum_{i=1}^{n} \exp\!\left(-\frac{\|\tilde{y} - \tilde{y}_i\|^2}{2\sigma^2}\right)    (19)

The results from the two summation neurons are then compared, and the largest is fed forward to the output neuron to yield the computed class and the probability that the example belongs to that class. The main procedure of the proposed algorithm is as follows.

Algorithm
Input: Training data D1, testing data D2, classifier L
Output: Adaptive detection rate by PNN
Procedure:
Training (model construction):
Step 1. Read the training samples D1 and the class numbers.
Step 2. Draw r^{(l)} (independent Bernoulli random variables) from Bernoulli(p).
Step 3. Apply this Bernoulli mask to the input layer to make the network a thinned network.
Step 4. Sort the thinned inputs into K sets, where each set contains the vectors of one class.
Step 5. For each k, define a Gaussian function over set k.
Testing (model use): Once the PNN is defined, vectors can be fed into it and classified as follows.
Step 1. Read the testing sample D2 and feed it to the thinned network constructed above for each class.
Step 2. Compute the thinned Gaussian functionals for each group of thinned hidden nodes (the K sets) with D2.
Step 3. For each group of hidden nodes, feed all its thinned Gaussian functional values to the decision layer, where a single thinned decision node is made from each group.
Step 4. At each class decision node, sum all the thinned Gaussian functions and multiply by a constant.
Step 5. Finally, find the maximum of all summed functional values at the output nodes.
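The training and testing procedure above can be sketched as a short program. This is a minimal illustration, not the authors' implementation: the dropout probability p, the smoothing parameter sigma, the random seed, and equal class priors are all assumed values the paper leaves open.

```python
import numpy as np

def dpnn_classify(train_X, train_y, test_X, sigma=0.5, p=0.8, seed=0):
    """Sketch of Dropout-PNN: a Bernoulli(p) mask thins the input
    features (eqs. 14-15), then per-class averaged Gaussian kernels
    (eqs. 16-19) score each test point and the largest score wins."""
    rng = np.random.default_rng(seed)
    # Steps 2-3: one Bernoulli mask over the input features (1 keeps a feature).
    r = rng.binomial(1, p, size=train_X.shape[1])
    Xt = train_X * r            # thinned training inputs
    Tt = test_X * r             # the same mask is applied at test time
    classes = np.unique(train_y)
    preds = []
    for x in Tt:
        scores = []
        for c in classes:
            # Steps 4-5: averaged Gaussian kernel for each class group.
            Xc = Xt[train_y == c]
            d2 = np.sum((Xc - x) ** 2, axis=1)
            scores.append(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)
```

Setting p = 1.0 keeps every feature, so the sketch reduces to a plain PNN; this is one way the paper's baseline comparisons could be reproduced under the same assumptions.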
V. RESULT ANALYSIS

For this experiment, the PNN algorithm was compared with DPNN on several datasets in different ways; four of the results are discussed here.

A. Used Datasets

The Glass Identification dataset (Dataset I), from the KEEL Repository [21], consists of 171 learning and 43 testing examples, each with 10 numeric attributes. The Ecoli dataset (Dataset II) has 268 learning and 68 testing examples with 8 numeric attributes [22]. The Breast Cancer dataset (Dataset III), from the UC Irvine Machine Learning Repository [23], consists of 228 learning and 60 testing examples. The Statlog Heart dataset (Dataset IV), also from the UC Irvine Machine Learning Repository [24], contains 270 instances with 14 attributes.

B. Implementation

Four datasets were used in this experiment. Their class imbalance ratios and output accuracies are as follows:

TABLE I. Training datasets with their IR values and corresponding output accuracy

Datasets    | # of training instances | # of input attributes | Imbalance Ratio (IR) | Output accuracy
Dataset I   | 171 | 10 | 1.8  | 65.12
Dataset II  | 268 | 8  | 5.54 | 91.18
Dataset III | 228 | 10 | 2.36 | 76.67
Dataset IV  | 216 | 14 | 1.25 | 85.18

The accuracies of the original PNN algorithm were not sufficient. The class imbalance ratios of the training data might have some impact on these accuracies, but imbalance cannot simply be ignored: in many datasets it is normal for the positive or negative class to be much larger than the other (for example, in a disease-diagnosis dataset the normal patients may far outnumber the abnormal ones, or vice versa), i.e., to have an imbalance ratio greater than 1.

The datasets may also have attributes that are skewed, correlated/dependent, or of high kurtosis. For the four datasets mentioned above, these attributes are:

TABLE II. Training datasets with their original, skewed, high-kurtosis, and correlated/dependent attributes

Datasets    | # of input attributes | Skewed attributes | High-kurtosis attributes | Correlated/dependent attributes
Dataset I   | 10 | 6      | 6,8,9        | 1,3,5,6,9
Dataset II  | 8  | 3,4    | 4            | 2,5,6
Dataset III | 10 | 4      | 4            | 4,5,9
Dataset IV  | 14 | 5,6,10 | 5,6,10,11,12 | 4,12

The datasets were then applied to the PNN algorithm to find the output accuracies with all attributes, without the skewed attributes, and without the high-kurtosis attributes:

TABLE III. Output accuracy with all attributes, without skewed attributes, and without high-kurtosis attributes in the PNN algorithm

Datasets    | Accuracy for all attributes | Without skewness: # of attributes / accuracy | Without kurtosis: # of attributes / accuracy
Dataset I   | 65.12 | 9 / 60.46  | 7 / 60.46
Dataset II  | 91.18 | 6 / 91.18  | 7 / 91.17
Dataset III | 76.67 | 9 / 76.67  | 9 / 76.67
Dataset IV  | 85.18 | 11 / 81.48 | 9 / 85.18

From this accuracy table, the output accuracy of the PNN algorithm without the skewed attributes (removing 1, 2, or 3 attributes) decreased for Datasets I and IV and was unchanged for Datasets II and III compared with the original PNN. Likewise, without the high-kurtosis attributes (removing 1, 3, or 5 attributes), the output accuracy decreased for Dataset I and was unchanged for Datasets II, III, and IV. So the accuracies were neither stable nor improved for these two methods, although the change in the number of features might still affect the datasets and the output accuracies.

From Table II, all datasets have some correlated/dependent attributes, some more and some fewer. The number of dependent/correlated attributes is called the correlation rank of the dataset. The datasets were then examined in the PNN algorithm with the correlated/dependent attributes removed, with the following effect on the output accuracy:

TABLE IV. Correlated attributes, correlation rank, and output accuracy of the PNN algorithm without these correlated attributes

Datasets    | 1st rank: attributes / rank / accuracy | 2nd rank: attributes / rank / accuracy
Dataset I   | 1,3,5,6,9 / 5 / 65.12 | 4,7,8 / 3 / 69.76
Dataset II  | 2,5,6 / 3 / 86.76     | 5,6 / 2 / 88.23
Dataset III | 4,5,9 / 3 / 68.34     | 5,6 / 2 / 73.34
Dataset IV  | 8,11 / 2 / 83.34      | 1 / 1 / 83.34

The output accuracies of the PNN algorithm without correlated attributes were thus obtained. The table has two result columns, one for the highest correlation rank and one for the second-highest; only for some datasets (I, II, and III) is the accuracy for the second-highest rank greater than for the highest rank. These two accuracy columns are compared with the original PNN accuracy below:

TABLE V. Comparison of output accuracy between the original PNN and PNN without correlated attributes

Datasets    | PNN original accuracy (all attributes) | Without correlated attributes (highest rank) | Without correlated attributes (2nd-highest rank)
Dataset I   | 65.12 | 65.12 | 69.76
Dataset II  | 91.18 | 86.76 | 88.23
Dataset III | 76.67 | 68.34 | 73.34
Dataset IV  | 85.18 | 83.34 | 83.34
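The attribute screening behind Tables II-V (flagging skewed, high-kurtosis, and correlated attributes) can be reproduced with standard moment statistics. A minimal sketch, under assumed thresholds (|skewness| > 1, excess kurtosis > 3, |r| > 0.8) that the paper does not specify:

```python
import numpy as np

def moments(x):
    # Sample skewness and excess kurtosis via standardized central moments.
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean(), (z ** 4).mean() - 3.0

def screen_attributes(X, skew_thr=1.0, kurt_thr=3.0, corr_thr=0.8):
    """Return indices of attributes that are heavily skewed, heavy-tailed,
    or strongly correlated with another attribute. The thresholds are
    illustrative assumptions, not values taken from the paper."""
    skewed, heavy = [], []
    for j in range(X.shape[1]):
        g1, g2 = moments(X[:, j])
        if abs(g1) > skew_thr:
            skewed.append(j)
        if g2 > kurt_thr:
            heavy.append(j)
    corr = np.corrcoef(X, rowvar=False)
    correlated = sorted({j for a in range(X.shape[1])
                           for b in range(a + 1, X.shape[1])
                           for j in (a, b)
                           if abs(corr[a, b]) > corr_thr})
    return skewed, heavy, correlated
```

Applied to one of the datasets above, such a screen would produce candidate lists like the attribute columns of Table II; the exact lists depend on the chosen thresholds.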
From Tables IV and V, the output accuracy was affected by removing correlated attributes. For Datasets II, III, and IV the output accuracy decreased, and only for Dataset I did it increase. Moreover, from Table IV, the accuracy without correlated attributes increased only for the second-highest correlation rank, not for the highest. That is, improving the PNN algorithm by considering correlation/dependence is also not a stable or properly effective technique, just like omitting the skewed and high-kurtosis attributes.

The four datasets mentioned above were then examined with DPNN, yielding different output accuracies. For Dataset I, the output accuracies were obtained after dropping out different combinations of attributes; the dropped attributes and the corresponding accuracies are shown in a bar chart.

Fig. 4. Output accuracy of DPNN with several dropped attributes for Dataset I (maximum 76.74%).

The bar graph shows that the maximum output accuracy for Dataset I was obtained by dropping attributes 1 and 3. The same process was applied to Datasets II, III, and IV, giving the following bar graphs:

Fig. 5. Output accuracy of DPNN with several dropped attributes for Dataset II (maximum 92.65%).

Fig. 6. Output accuracy of DPNN with several dropped attributes for Dataset III (maximum 80%).

Fig. 7. Output accuracy of DPNN with several dropped attributes for Dataset IV (maximum 88.89%).

Figures 5, 6, and 7 show, for Datasets II, III, and IV, the maximum output accuracies found by dropping out different combinations of attributes. The highest accuracy was found after dropping two attributes (2 and 3) for Dataset II, one attribute (9) or two attributes (1 and 9) for Dataset III, and five attributes (2, 3, 4, 6, and 8) for Dataset IV. That is:

TABLE VI. DPNN accuracy and corresponding dropped attributes

Datasets    | Dropped attributes | DPNN accuracy
Dataset I   | 1,3       | 76.74
Dataset II  | 2,3       | 92.65
Dataset III | 9 or 1,9  | 80
Dataset IV  | 2,3,4,6,8 | 88.89

These proposed-method accuracies were then compared with all the accuracies found above: the original PNN, PNN without skewed attributes, PNN without high-kurtosis attributes, and PNN without correlated/dependent attributes, for all datasets:

TABLE VII. Output accuracy of the original PNN, without skewed, without kurtosis, without correlated attributes, and the proposed PNN

Datasets    | PNN original (%) | Without skewness (%) | Without kurtosis (%) | Without correlated (%) | Proposed PNN (%)
Dataset I   | 65.12 | 60.46 | 60.46 | 69.76 | 76.74
Dataset II  | 91.18 | 91.18 | 91.18 | 88.23 | 92.65
Dataset III | 76.67 | 76.67 | 76.67 | 73.34 | 80
Dataset IV  | 85.18 | 81.48 | 85.18 | 83.34 | 88.89

In Table VII, the proposed method (DPNN) performed better on all the practical datasets than every other technique tested here.

C. Result Analysis

Table I shows the class imbalance ratio for the different datasets, together with the accuracy of the original PNN algorithm. The IR value must have some impact on a dataset, but it was not our concern here because the IR values were calculated from the training sample ratios. In Tables III and IV, the outputs were obtained by removing the skewed, high-kurtosis, and correlated attributes, because these
factors can prevent the classifier from classifying properly. It is clear from the tables above that these attributes had some impact on the datasets, but removing them did not increase the accuracy of the PNN algorithm as expected for all datasets. DPNN was then used to examine the datasets and gave better results than the others; the proposed method's results and the comparison with the other methods are given in Tables VI and VII. These better results were obtained on different real-world practical datasets by choosing specific and proper attributes/features, and DPNN showed better performance than all the others; that is, the correct choice of features was made by dropout. It can therefore be said that attributes whose removal increased performance had no useful effect on the datasets and should be dropped, while attributes whose removal decreased the accuracy should not be dropped, even though they were skewed or dependent. So dropout was the best way to select specific features for real-world problems.

VI. CONCLUSION

The aim of this work was to improve the PNN algorithm by selecting specific and appropriate attributes/features. Several techniques were tested, because a classifier performs well on a dataset only when the data are not imbalanced, not skewed, without high kurtosis, and not dependent on or correlated with each other. When it proved hard to improve the output accuracy by omitting the skewed, high-kurtosis, and dependent/correlated attributes, the proposed method was helpful and performed well on these datasets, and should do so on other real-world problems.

References
[1] Rao, P. V. Nageswara, et al. "A probabilistic neural network approach for protein superfamily classification." Journal of Theoretical and Applied Information Technology 6.1 (2009): 101-105.
[2] Singer, Donald A., and Ryoichi Kouda. "Application of a feedforward neural network in the search for Kuroko deposits in the Hokuroku district, Japan." Mathematical Geology 28.8 (1996): 1017-1023.
[3] Specht, Donald F. "Probabilistic neural networks." Neural Networks 3.1 (1990): 109-118.
[4] Tang, Lei, and Huan Liu. "Bias analysis in text classification for highly skewed data." Data Mining, Fifth IEEE International Conference on. IEEE, 2005.
[5] Yang, Yiming, and Jan O. Pedersen. "A comparative study on feature selection in text categorization." ICML. Vol. 97. 1997.
[6] Ancona, Fabio, et al. "Implementing probabilistic neural networks." Neural Computing & Applications 5.3 (1997): 152-159.
[7] Bhende, C. N., S. Mishra, and B. K. Panigrahi. "Detection and classification of power quality disturbances using S-transform and modular neural network." Electric Power Systems Research 78.1 (2008): 122-128.
[8] Sawant, Shreepad S., and Preeti S. Topannavar. "Introduction to probabilistic neural network - used for image classifications." International Journal of Advanced Research in Computer Science and Software Engineering 5.4 (2015): 279-283.
[9] Srivastava, Nitish. Improving neural networks with dropout. Diss. University of Toronto, 2013.
[10] Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
[11] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
[12] Specht, Donald F. "Probabilistic neural networks." Neural Networks 3.1 (1990): 109-118.
[13] Wu, Stephen Gang, et al. "A leaf recognition algorithm for plant classification using probabilistic neural network." Signal Processing and Information Technology, 2007 IEEE International Symposium on. IEEE, 2007.
[14] Duda, Richard O., and Peter E. Hart. Pattern Classification and Scene Analysis. Vol. 3. New York: Wiley, 1973.
[15] Parzen, Emanuel. "On estimation of a probability density function and mode." The Annals of Mathematical Statistics 33.3 (1962): 1065-1076.
[16] Mishra, Madhusmita, Amrut Ranjan Jena, and Raja Das. "A probabilistic neural network approach for classification of vehicle." Safety 2.7 (2013).
[17] Mao, Ke Zhi, K.-C. Tan, and Wee Ser. "Probabilistic neural-network structure determination for pattern classification." Neural Networks, IEEE Transactions on 11.4 (2000): 1009-1016.
[18] Hajmeer, M., and I. Basheer. "A probabilistic neural network approach for modeling and classification of bacterial growth/no-growth data." Journal of Microbiological Methods 51.2 (2002): 217-226.
[19] Chandra, Bala, and K. V. Naresh Babu. "An improved architecture for probabilistic neural networks." Neural Networks (IJCNN), The 2011 International Joint Conference on. IEEE, 2011.
[20] Shahadat, Nazmul, and Biprodip Pal. "An empirical analysis of attribute skewness over class imbalance on Probabilistic Neural Network and Naive Bayes classifier." Computer and Information Engineering (ICCIE), 2015 1st International Conference on. IEEE, 2015.
[21] B. German. (1987) On the use of Glass Identification Data Set [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Glass+Identification. File: glass.data.
[22] Kenta Nakai. (1996) On the use of Ecoli Data Set [Online]. Available: http://sci2s.ugr.es/keel/dataset.php?cod=61#sub2. File: ecoli.data.
[23] Matjaz Zwitter and Milan Soklic. (1988) On the use of Breast Cancer Data Set [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer. File: breast-cancer-data.
[24] On the use of Statlog (Heart) Data Set [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29. File: heart.dat.