Using Intelligent Techniques for Breast Cancer Classification

Hesham Arafat¹, Sherif Barakat² and Amal F. Goweda³

¹ Mansoura University, Faculty of Engineering, Department of Computer Engineering and Systems
² Mansoura University, Faculty of Computers and Information, Department of Information Systems
³ Mansoura University, Faculty of Computers and Information, Department of Information Systems

International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org  Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 1, Issue 3, September-October 2012, ISSN 2278-6856

Abstract: Attribute reduction is an important issue in rough set theory, and fast, effective approximate algorithms are needed to generate sets of discriminatory features. The main objective of this paper is to investigate a strategy that combines Rough Set Theory (RST) with Particle Swarm Optimization (PSO). Rough set theory is recognized as one of the powerful tools for medical feature selection. The supplementary component, PSO, belongs to the field of swarm intelligence, which studies the emergent collective intelligence of groups of simple agents. It is based on social behavior observed in nature, such as bird flocks and fish schools, in which individuals with limited capabilities achieve intelligent solutions to complex problems. PSO is widely used and rapidly developing because it is easy to implement and has few parameters to tune. The proposed hybrid approach embodies an adaptive feature selection procedure that dynamically accounts for the relevance and dependence of the features. The selected feature subsets are used to generate decision rules for the breast cancer classification task, differentiating benign from malignant cases by assigning classes to objects. The hybrid approach can improve classification accuracy and find more robust features that enhance classifier performance.
Keywords: Breast cancer, Rough Sets, Feature Selection,
Particle Swarm Optimization (PSO), Classification.
1. INTRODUCTION
Breast cancer occurs when cells become abnormal and divide without control or order, forming a cancerous growth that begins in the tissues of the breast. Breast cancer has become the most common cancer among women [1]. The most effective way to reduce breast cancer deaths is to detect it earlier; early detection and accurate diagnosis of the tumor are extremely vital. Early detection allows physicians to differentiate benign breast tumors from malignant ones without resorting to surgical biopsy, and it offers an accurate, timely analysis of the patient's particular type of cancer and the available treatment options. Extensive research has been carried out on automating the critical diagnosis procedure, and various machine learning algorithms have been developed to aid physicians in making this decision effectively.
Rough set theory offers a novel approach to managing uncertainty and has been used for discovering data dependencies, evaluating the importance of features, finding patterns in sample data, reducing feature space dimensionality and classifying objects. While rough sets on their own provide a powerful technique, they are often combined with other computational intelligence techniques such as neural networks, fuzzy sets, genetic algorithms, Bayesian approaches, swarm optimization and support vector machines. Particle Swarm Optimization is a newer evolutionary computation technique in which each potential solution is seen as a particle flying through the problem space with a certain velocity. Particle swarms find optimal regions of complex search spaces through the interaction of individuals in the population. PSO is attractive for feature selection because the particles discover the best feature combinations as they fly within the subset space. Compared with other evolutionary techniques, PSO requires only primitive and simple mathematical operators. In this research, rough sets are applied to improve feature selection and data reduction, and PSO is used to optimize the rough set feature reduction so as to effectively classify breast cancer tumors as either malignant or benign.
This paper is organized as follows. Section 2 briefly reviews recent related work on cancer classification using intelligent techniques. Section 3 presents the vision of the proposed hybrid technique. Section 4 discusses basic concepts of rough set theory and the basic idea of Particle Swarm Optimization, and reports the experimental results. Section 5 concludes the research.
2. RELATED WORK
Beyond the introduction given above, a literature review is presented on Particle Swarm Optimization and rough sets, tracing their journey from inception to their implementation in various problems.
The use of machine learning and data mining techniques [2] has revolutionized the whole process of breast cancer diagnosis and prognosis. Overviews of data mining methods, including Decision Trees, Support Vector Machines (SVM), Genetic Algorithms (GAs)/Evolutionary Programming (EP), Fuzzy Sets, Neural Networks and Rough Sets, have been carried out to enhance breast cancer diagnosis and prognosis.
Data mining classification techniques for breast cancer diagnosis are discussed in [3]. A. Soltani Sarvestani, A. A. Safavi, N. M. Parandeh and M. Salehi compared the capabilities of various neural networks, such as the Multilayer Perceptron (MLP), Self-Organizing Map (SOM), Radial Basis Function (RBF) network and Probabilistic Neural Network (PNN), used to classify the WBC and NHBCD data. The performance of these neural network structures was investigated for the breast cancer diagnosis problem. RBF and PNN proved to be the best classifiers on the training set, but PNN gave the best classification accuracy on the test set. By applying several neural network structures, a diagnostic system was constructed that performed quite well, showing that statistical neural networks can be used effectively for breast cancer diagnosis.
In [4], Wei-Pin Chang and Der-Ming Liou found that a genetic algorithm model yielded better results than other data mining models for analyzing breast cancer patient data, in terms of the overall accuracy of patient classification and the expressiveness and complexity of the classification rule. An artificial neural network, a decision tree, logistic regression and a genetic algorithm were used for the comparative studies, with accuracy and positive predictive value as the evaluation indicators. The WBC database was used for the data analysis, followed by 10-fold cross-validation. The results showed that the genetic algorithm described in the study produced accurate results in classifying breast cancer data, and the classification rule it identified was more acceptable and comprehensible.
In [5], K. Rajiv Gandhi, Marcus Karnan and S. Kannan constructed classification rules using the Particle Swarm Optimization algorithm for breast cancer datasets. To cope with heavy computational effort, feature subset selection was used as a pre-processing step that learns fuzzy rule bases using a GA implementing the Pittsburgh approach, producing a smaller fuzzy rule base system with higher accuracy. The datasets resulting from feature selection were then classified using the particle swarm optimization algorithm, and the developed rules captured the underlying attributes effectively with a good rate of accuracy.
Das et al. [6] hybridized rough set theory with the Particle Swarm Optimization (PSO) algorithm. The hybrid rough-PSO technique was used for grouping the pixels of an image in its intensity space. The authors treated image segmentation as a clustering problem: each cluster is modeled with a rough set, and PSO is employed to tune the threshold and the relative importance of the upper and lower approximations of the rough sets.
Another approach that uses rough sets with PSO was proposed by Wang et al. in [18]. The authors applied rough sets to predict the degree of malignancy in brain glioma. The feature subsets selected by rough sets with PSO are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a PSO-based search method is proposed and compared with other rough set reduction algorithms. Experimental results show that the reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. Moreover, the decision rules induced by the rough set rule induction algorithm reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful to medical experts.
This review has shown different data mining techniques, including rough sets and evolutionary algorithms, used for working out the feature selection problem, classification, data analysis and clustering. No attempt has been made to cover all approaches existing in the literature; our intention has been to highlight the role played by data mining techniques, and especially rough set theory combined with evolutionary algorithms.
Table 1 depicts a comparison between supervised learning approaches used for the classification task, which concentrates on predicting the value of the decision class for an object from a predefined set of class values, given the values of some attributes of the object; more details on the comparison between learning algorithms can be found in [8].
Table 1: A comparison between supervised learning approaches

Decision Tree Induction
  Strengths: generates rules that are effective and simple; identifies the fields that are the important ones.
  Weaknesses: less appropriate for predicting continuous attribute values; can be expensive to train.

Bayesian Classification
  Strengths: easy to implement; good results obtained in most cases.
  Weaknesses: loses accuracy when its assumptions do not hold, i.e. when dependencies exist among variables.

Neural Network Classifier
  Strengths: high tolerance to noisy data; well suited to continuous-valued inputs and outputs; algorithms are parallel.
  Weaknesses: long training time; requires a number of parameters; poor interpretability.

Support Vector Machines
  Strengths: training is relatively easy; non-traditional data can be used as input instead of feature vectors.
  Weaknesses: needs a good kernel function.

Genetic Algorithms
  Strengths: parallelism allows evaluating many schemata at once; performs well on problems whose fitness landscape is complex.
  Weaknesses: the language used to specify candidate solutions must be robust; the fitness function must be written carefully so that higher fitness is attainable.

To evaluate which model is best for the classification task, several dimensions of comparison are taken into account, as follows:
- Error in numeric predictions
- Cross-validation
- Speed of model application
- Classification accuracy
- Total cost/benefit
- Noise tolerance

3. ROUGH SET AND PSO BASED FEATURE SELECTION
3.1 Problem Definition
Feature selection aims to determine a minimal feature subset from a problem domain while retaining suitably high accuracy in representing the original features. The significance of feature selection can be viewed in two facets. The first facet is filtering out noise and removing redundant and irrelevant features; according to Jensen and Shen [9], feature selection is compulsory due to the abundance of noisy, irrelevant or misleading features in a dataset. In the second facet, feature selection can be implemented as an optimization procedure that searches for an optimal subset of features that better satisfies a desired measure [10]. Thus, the proposed rough set feature selection algorithm, based on a search method called Particle Swarm Optimization (PSO), is used to select feature subsets that describe the decisions as efficiently as the original whole feature set while discarding redundant features, leading to better prediction accuracy. After the features that influence the decision concepts have been selected, they are employed within a decision rule generation process, creating descriptive rules for the classification task.
3.2 Objective
The target of the problem is to find an optimal minimal feature subset from a problem domain, removing features considered irrelevant that increase the complexity of the learning process and decrease the accuracy of the induced knowledge. This not only improves the speed of data manipulation but also improves the classification rate by reducing the influence of noise. The aim is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to testing instances for which the values of the predictor features are known but the value of the class label is unknown.
Figure 1 shows the feature selection procedure adopted in this study. The structure consists of two important phases. The feature selection tier deploys Particle Swarm Optimization to refine the feature set and recommend only the significant features. The classification tier exploits the optimized features to extract classification rules used to differentiate benign cases from malignant cases.
In processing medical data, the advantages of choosing an optimal subset of features are as follows [11]:
- Reducing the dimensionality of the attributes reduces the complexity of the problem and allows researchers to focus more clearly on the relevant attributes.
- Simplifying the data description may help physicians make a prompt diagnosis.
- Having fewer features means less data needs to be collected; collecting data is never easy in medical applications because it is time-consuming and costly.

Figure 1 Rough-PSO Feature Selection Process

The idea of PSO reducts for the optimal feature selection problem can be shown [12] in Figure 2:

(1)  ∀i: Xi ← randomPosition(); Vi ← randomVelocity()
(2)  fit ← bestFit(X); globalbest ← fit; pbest ← bestPos(X)
(3)  ∀i: Pi ← Xi
(4)  while (stopping criterion not met)
(5)    for i = 1, ..., S                 // for each particle
(6)      if (fitness(i) > fit)           // local best
(7)        fit ← fitness(i)
(8)        pbest ← Xi
(9)      if (fitness(i) > globalbest)    // global best
(10)       globalbest ← fitness(i)
(11)       gbest ← Xi; R ← getReduct(Xi) // convert to reduct
(12)     updateVelocity(); updatePosition()

Figure 2: An algorithm to compute reducts
Initially, a population of particles is constructed with random positions and velocities in an S-dimensional problem space. For each particle, the fitness function is evaluated. If the current particle's fitness is better than pbest, this particle becomes the current best, and its position and fitness are stored. Next, the current particle's fitness is compared with the population's overall previous best fitness. If the current value is better than gbest, gbest is set to the current particle's position and the global best fitness is updated. This position represents the best feature subset encountered so far, and is thus converted to a reduct and stored in R. The velocity and position of the particle are then updated according to Equations (13) and (14). This process loops until a stopping criterion is met, usually a sufficiently good fitness or a maximum number of iterations (generations). The chosen subsets are then employed within a decision rule generation process, creating descriptive rules for the classification task.
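To make Figure 2 concrete, the following is a minimal Python sketch of a binary PSO search over attribute subsets. It is an illustration under our own assumptions (positions in [0, 1]^n thresholded at 0.5 to obtain a subset, velocity clamped to Vmax = 1), not the implementation used in the experiments; the names pso_reduct and fitness are ours.

import random

def pso_reduct(n_attrs, fitness, n_particles=20, n_iters=20,
               w=0.33, c1=0.34, c2=0.33):
    """Binary PSO over attribute subsets, a sketch of Figure 2.

    `fitness` maps a set of attribute indices to a number to be maximized.
    Positions in [0, 1]^n are thresholded at 0.5 to obtain a subset.
    """
    def subset(pos):
        return frozenset(i for i, p in enumerate(pos) if p > 0.5)

    X = [[random.random() for _ in range(n_attrs)] for _ in range(n_particles)]
    V = [[random.uniform(-1, 1) for _ in range(n_attrs)] for _ in range(n_particles)]
    pbest = [row[:] for row in X]                   # personal best positions
    pfit = [fitness(subset(p)) for p in pbest]      # personal best fitness values
    g = max(range(n_particles), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]              # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            f = fitness(subset(X[i]))
            if f > pfit[i]:                         # local best (lines 6-8 of Figure 2)
                pfit[i], pbest[i] = f, X[i][:]
                if f > gfit:                        # global best; R <- getReduct(Xi)
                    gfit, gbest = f, X[i][:]
            for d in range(n_attrs):                # line 12: update velocity and position
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (pbest[i][d] - X[i][d])
                           + c2 * random.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-1.0, min(1.0, V[i][d]))   # clamp to Vmax = 1
                X[i][d] += V[i][d]
    return subset(gbest), gfit

With a toy fitness such as lambda s: 10 * (2 in s) + 10 * (5 in s) - len(s), a call pso_reduct(9, ...) typically converges on the subset {2, 5}. In the rough set setting, the fitness would instead be built from the dependency degree gamma defined in Section 4.1.1, rewarding high dependency and small subsets.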
The motivation behind this study is to provide a practical tool for optimizing the feature selection problem, the number of reducts found and the classification accuracy when applied to the classification of complex, real-world datasets. PSO has a strong search capability in the problem space and can efficiently find minimal reducts; therefore, combining both with domain intelligence leads to better knowledge.
4. TECHNIQUES USED IN THE STUDY
The structure described in Figure 1 involves two techniques: rough sets and PSO.
4.1 Rough Set Theory
Rough set theory [13] is a mathematical approach for handling vagueness and uncertainty in data analysis. Objects may be indiscernible due to the limited available information. A rough set is characterized by a pair of precise concepts, called the lower and upper approximations, which are generated using object indiscernibility. The most important issues are the reduction of attributes and the generation of decision rules. The rough set approach seems to be of fundamental importance to AI and the cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, inductive reasoning and pattern recognition.
Irrelevant features, uncertainties and missing values often exist in medical data such as breast cancer data, so the analysis of medical data often requires dealing with incomplete and inconsistent data. This distinguishes rough sets from other intelligent techniques such as neural networks, decision trees and fuzzy theory, which are mainly based on assumptions (e.g., knowledge about dependencies, probability distributions or a large number of experiments).
Rough set theory [13] can deal with uncertainty and incompleteness in data analysis. The attribute reduction algorithm removes redundant information or features and selects a feature subset that has the same discernibility as the original set of features. The medical goal is to identify subsets of the most important attributes influencing the treatment of patients. Rough set rule induction algorithms generate decision rules that help medical experts analyze and understand the dimensions of the problem.
The main advantage of rough set theory in data analysis is that it does not need any preliminary or additional information about the data, such as probability distributions in statistics or grades of membership or possibility values in fuzzy set theory. Rough sets offer many advantages that attract researchers:
- Providing efficient algorithms for finding hidden patterns in data, most of which are suited to parallel processing
- Evaluating the significance of data
- Generating concise and valuable sets of decision rules from data
- Finding minimal sets of data (data reduction)
- Requiring no membership functions or prior parameter settings, owing to their simplicity
Rough sets have been a useful tool for medical applications. Hassanien [14] reported an application of rough sets to breast cancer data that generated rules with 98% accuracy. Tsumoto [15] proposed a rough set algorithm to generate diagnostic rules based on the hierarchical structure of differential medical diagnosis; experimental evaluation showed that the rules represent experts' decision processes. Komorowski and Ohrn [16] used a rough set approach to identify the group of patients in need of a scintigraphic scan for subsequent modeling. In [15], a rough set classification algorithm exhibited higher classification accuracy than decision tree algorithms such as ID3 and C4.5, and the generated rules were more understandable than those produced by the decision tree methods.
4.1.1 Basic Rough Set Concepts
Let $I = (U, A \cup \{d\})$ be an information system [13], where $U$ is the universe, a non-empty finite set of objects; $A$ is a non-empty finite set of condition attributes; and $d$ is the decision attribute (such a table is also called a decision table). For every $a \in A$ there is a corresponding function $f_a : U \to V_a$, where $V_a$ is the set of values of $a$. If $P \subseteq A$, there is an associated equivalence relation:

$$IND(P) = \{(x, y) \in U \times U \mid \forall a \in P,\ f_a(x) = f_a(y)\} \quad (1)$$

The partition of $U$ generated by $IND(P)$ is denoted $U/P$. If $(x, y) \in IND(P)$, then $x$ and $y$ are indiscernible with respect to $P$. The equivalence classes of the $P$-indiscernibility relation are denoted $[x]_P$. Let $X \subseteq U$; the $P$-lower approximation $\underline{P}X$ and the $P$-upper approximation $\overline{P}X$ of the set $X$ can be defined as:

$$\underline{P}X = \{x \in U \mid [x]_P \subseteq X\} \quad (2)$$

$$\overline{P}X = \{x \in U \mid [x]_P \cap X \neq \emptyset\} \quad (3)$$

Let $P, Q \subseteq A$ be equivalence relations over $U$; then the positive, negative and boundary regions can be defined as:

$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X \quad (4)$$

$$NEG_P(Q) = U - \bigcup_{X \in U/Q} \overline{P}X \quad (5)$$

$$BND_P(Q) = \bigcup_{X \in U/Q} \overline{P}X - \bigcup_{X \in U/Q} \underline{P}X \quad (6)$$

The positive region of the partition $U/Q$ with respect to $P$, abbreviated $POS_P(Q)$, is the set of all objects of $U$ that can be certainly classified into blocks of the partition $U/Q$ by means of $P$. Functional dependence: for a given information system and $P, Q \subseteq A$, $P \Rightarrow Q$ denotes the functional dependence of $Q$ on $P$, which holds if and only if $IND(P) \subseteq IND(Q)$. Dependencies to a degree are also considered in [13]: $Q$ depends on $P$ in a degree $k$ ($0 \le k \le 1$), denoted $P \Rightarrow_k Q$, with

$$k = \gamma_P(Q) = \frac{|POS_P(Q)|}{|U|} \quad (7)$$

If $k = 1$, $Q$ depends totally on $P$; if $0 < k < 1$, $Q$ depends partially on $P$; and if $k = 0$, $Q$ does not depend on $P$. When $P$ is a set of condition attributes and $Q$ is the decision, $\gamma_P(Q)$ is the quality of classification [13], [17].
The goal of attribute reduction is to remove redundant attributes so that the reduced set provides the same quality of classification as the original. The set of all reducts is defined as [18]:

$$\mathrm{Red} = \{R \subseteq C \mid \gamma_R(D) = \gamma_C(D),\ \forall B \subset R,\ \gamma_B(D) \neq \gamma_C(D)\} \quad (8)$$

A dataset may have many attribute reducts. The set of all optimal (minimal) reducts is:

$$\mathrm{Red}_{\min} = \{R' \in \mathrm{Red} \mid \forall R \in \mathrm{Red},\ |R'| \le |R|\} \quad (9)$$

The intersection of all reducts is called the core, the elements of which are those attributes that cannot be eliminated. The core is defined as [13], [18]:

$$\mathrm{Core}(C) = \bigcap \mathrm{Red} \quad (10)$$
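To illustrate equations (1)-(7), the following Python sketch (helper names are our own, not from the paper) computes partitions, approximations and the dependency degree on a toy decision table:

from collections import defaultdict

def partition(rows, attrs):
    """U/P: group object indices by their values on attrs (equation (1))."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return [set(b) for b in blocks.values()]

def lower_upper(rows, attrs, X):
    """P-lower and P-upper approximations of X (equations (2) and (3))."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= X:
            lower |= block          # [x]_P lies fully inside X
        if block & X:
            upper |= block          # [x]_P overlaps X
    return lower, upper

def gamma(rows, attrs, decision):
    """Dependency degree gamma_P(Q) = |POS_P(Q)| / |U| (equations (4) and (7))."""
    pos = set()
    for X in partition(rows, [decision]):
        pos |= lower_upper(rows, attrs, X)[0]
    return len(pos) / len(rows)

# Toy decision table: two condition attributes (columns 0, 1), decision in column 2.
table = [(1, 0, 'b'), (1, 0, 'b'), (0, 1, 'm'), (1, 1, 'm'), (1, 1, 'b')]
print(gamma(table, [0, 1], 2))   # 0.6: objects 3 and 4 are indiscernible yet differ in class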
4.1.2 Decision Rules
An expression $c: (a = v)$, where $a \in A$ and $v \in V_a$, is an elementary condition (atomic formula) of a decision rule, which can be checked for any $x \in U$. An elementary condition $c$ can be interpreted as a mapping $c: U \to \{true, false\}$. A conjunction $C$ of $q$ elementary conditions is denoted by $C = c_1 \wedge c_2 \wedge \ldots \wedge c_q$. The cover of a conjunction $C$, denoted by $[C]$ or $\|C\|_A$, is the subset of examples that satisfy the conditions represented by $C$, as shown in [17]: the cover $[C] = \{x \in U : C(x) = true\}$ is called the support of the descriptor. If $K$ is a concept, the positive cover $[C]_K^+ = [C] \cap K$ denotes the set of positive examples covered by $C$.
A decision rule $r$ for $A$ is any expression of the form $\phi \rightarrow (d = v)$, where $\phi = c_1 \wedge c_2 \wedge \ldots \wedge c_q$ is a conjunction satisfying $[\phi]_K^+ \neq \emptyset$ and $v \in V_d$, with $V_d$ the set of values of $d$. The set of attribute-value pairs occurring on the left-hand side of the rule $r$ is the condition part, Pred(r), and the right-hand side is the decision part, Succ(r). An object $u \in U$ is matched by a decision rule $\phi \rightarrow (d = v)$ if and only if $u$ supports both the condition part and the decision part of the rule. If $u$ is matched by $\phi \rightarrow (d = v)$, we say that the rule classifies $u$ to decision class $v$. The number of objects matched by a decision rule $\phi \rightarrow (d = v)$, denoted Match(r), is equal to $card(\|\phi\|_A)$. The support of the rule, $card(\|\phi \wedge (d = v)\|_A)$, is the number of objects supporting the decision rule.
As in [19], the accuracy and coverage of a decision rule $\phi \rightarrow (d = v)$ are defined as:

$$accuracy(r) = \frac{card(\|\phi \wedge (d = v)\|_A)}{card(\|\phi\|_A)} \quad (11)$$

$$coverage(r) = \frac{card(\|\phi \wedge (d = v)\|_A)}{card(\|d = v\|_A)} \quad (12)$$
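As a worked illustration of equations (11) and (12), the short Python sketch below (the rule representation is our own) counts the covers directly:

def rule_stats(rows, conds, decision_attr, decision_val):
    """Accuracy and coverage of the rule 'conds -> (d = v)' per equations (11), (12).

    conds is a dict {attribute_index: required_value}.
    """
    matched = [r for r in rows if all(r[a] == v for a, v in conds.items())]   # ||phi||_A
    decided = [r for r in rows if r[decision_attr] == decision_val]           # ||d = v||_A
    support = sum(1 for r in matched if r[decision_attr] == decision_val)     # intersection
    accuracy = support / len(matched) if matched else 0.0
    coverage = support / len(decided) if decided else 0.0
    return accuracy, coverage

table = [(1, 0, 'b'), (1, 0, 'b'), (0, 1, 'm'), (1, 1, 'm'), (1, 1, 'b')]
print(rule_stats(table, {0: 1, 1: 1}, 2, 'm'))   # (0.5, 0.5): the rule matches objects 3 and 4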
4.1.3 Rough Set Feature Selection
Using rough sets for feature selection [20] is valuable, as the selected feature subset can generate more general decision rules and better classification quality for new samples. Because exact reduct search is costly, heuristic or approximation algorithms have to be considered. K.Y. Hu [21] computes the significance of an attribute using heuristic ideas from discernibility matrices and proposes a heuristic reduction algorithm (DISMAR). X. Hu [22] gives a rough set reduction algorithm using a positive region-based attribute significance measure as a heuristic (POSAR). G.Y. Wang [23] develops a conditional information entropy based reduction algorithm (CEAR).
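These heuristics share a greedy skeleton: repeatedly add the attribute that most increases the dependency degree until it equals that of the full attribute set. The following self-contained Python sketch is a QuickReduct-style illustration of that idea under our own naming, not a reproduction of DISMAR, POSAR or CEAR:

from collections import defaultdict

def gamma(rows, attrs, decision):
    """Dependency degree: fraction of objects whose attrs-block is pure in the decision."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    pure = sum(len(v) for v in blocks.values() if len(set(v)) == 1)
    return pure / len(rows)

def quickreduct(rows, n_attrs, decision):
    """Greedy forward selection toward a reduct (a QuickReduct-style sketch)."""
    full = gamma(rows, list(range(n_attrs)), decision)
    reduct = []
    while gamma(rows, reduct, decision) < full:
        best = max((a for a in range(n_attrs) if a not in reduct),
                   key=lambda a: gamma(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

table = [(1, 0, 0, 'b'), (1, 0, 1, 'b'), (0, 1, 0, 'm'), (1, 1, 1, 'm')]
print(quickreduct(table, 3, 3))   # [1]: attribute 1 alone already discerns the two classes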
4.2 Particle Swarm Optimization (PSO)
Particle swarm optimization (PSO) is an evolutionary computation technique developed by Kennedy and Eberhart [24], [25]. The original idea was to graphically simulate the choreography of a bird flock. Y. Shi introduced the concept of inertia weight into the particle swarm optimizer to produce the standard PSO algorithm [26]. The concept of particle swarms has become very popular as an efficient search and optimization technique. PSO [27], [30] does not require any gradient information about the function to be optimized, uses only primitive mathematical operators, and is conceptually very simple. Since its advent in 1995, PSO has attracted the attention of many researchers all over the world, resulting in a huge number of variants of the basic algorithm and many parameter automation strategies.
An analysis of the advantages of the basic particle swarm optimization algorithm is given in [28]:
- PSO is based on swarm intelligence and can be applied to both scientific research and engineering use.
- PSO has no overlapping or mutation calculations; the search is carried out by the speed of the particles. Over several generations, only the most promising particle transmits information to the others, so the search is very fast.
- The calculation in PSO is very simple. Compared with other developing computations, it offers great optimization ability and can be completed easily.
- PSO adopts a real-number encoding, determined directly by the solution; the number of dimensions is equal to the number of constants in the solution.
PSO is initialized with a population of particles. Each particle is treated as a point in an S-dimensional space. The $i$th particle is represented as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iS})$. The best previous position (pbest, the position giving the best fitness value) of any particle is $P_i = (p_{i1}, p_{i2}, \ldots, p_{iS})$. The index of the global best particle is represented by gbest. The velocity of particle $i$ is $V_i = (v_{i1}, v_{i2}, \ldots, v_{iS})$. The particles are manipulated according to the following equations:

$$v_{id} = w \cdot v_{id} + c_1 \cdot rand() \cdot (p_{id} - x_{id}) + c_2 \cdot Rand() \cdot (p_{gd} - x_{id}) \quad (13)$$

$$x_{id} = x_{id} + v_{id} \quad (14)$$
where $w$ is the inertia weight; suitable selection of the inertia weight provides a balance between global and local exploration, and thus requires fewer iterations on average to find the optimum. If a time-varying inertia weight is employed, better performance can be expected [29]. The acceleration constants $c_1$ and $c_2$ in equation (13) weight the stochastic acceleration terms that pull each particle toward the pbest and gbest positions. Low values allow particles to roam far from target regions before being tugged back, while high values result in abrupt movement toward, or past, target regions. $rand()$ and $Rand()$ are two random functions in the range [0, 1]. Particle velocities on each dimension are limited to a maximum velocity $V_{max}$. If $V_{max}$ is too small, particles may not explore sufficiently beyond locally good regions; if $V_{max}$ is too high, particles might fly past good solutions.
The first part of equation (13) gives the flying particles memory capability and the ability to explore new search space areas. The second part is the "cognition" part, which represents the private thinking of the particle itself. The third part is the "social" part, which represents the collaboration among the particles. Equation (13) is used to update the particle's velocity; the particle then flies toward a new position according to equation (14). The performance of each particle is measured according to a pre-defined fitness function.
The process for implementing the PSO algorithm is as
follows [7]:

(1)  procedure PSO
(2)    repeat
(3)      for i = 1 to number of individuals do
(4)        if G(xi) > G(pi) then        // G() evaluates goodness
(5)          for d = 1 to dimensions do
(6)            pid = xid                // pid is the best state found so far
(7)          end for
(8)        end if
(9)        g = i                        // arbitrary
(10)       for j = indexes of neighbors do
(11)         if G(pj) > G(pg) then
(12)           g = j                    // g indexes the best performer in the neighborhood
(13)         end if
(14)       end for
(15)       for d = 1 to number of dimensions do
(16)         vid(t) = f(xid(t - 1), vid(t - 1), pid, pgd)   // update velocity
(17)         vid ∈ (-Vmax, +Vmax)                           // clamp velocity
(18)         xid(t) = f(vid(t), xid(t - 1))                 // update position
(19)       end for
(20)     end for
(21)   until stopping criterion
(22) end procedure

Figure 3: Standard Particle Swarm Optimization (PSO)
Definitions and variables used in Figure 3:
- t is the current time step; t - 1 is the previous time step.
- xid(t) is the current state (position) at site d of individual i.
- vid(t) is the current velocity at site d of individual i.
- Vmax is the upper/lower bound placed on vid.
- pid is individual i's best state (position) found so far at site d.
- pgd is the neighborhood's best state found so far at site d.
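As a runnable counterpart to Figure 3 and equations (13) and (14), here is a minimal continuous PSO in Python maximizing a simple goodness function; the parameter values are illustrative only, not those used in the experiments below:

import random

def sphere_goodness(x):
    return -sum(v * v for v in x)            # G(): higher is better, peak at the origin

def pso(dims=3, n=20, iters=100, w=0.7, c1=1.5, c2=1.5, vmax=0.5):
    X = [[random.uniform(-5, 5) for _ in range(dims)] for _ in range(n)]
    V = [[0.0] * dims for _ in range(n)]
    P = [x[:] for x in X]                                     # pbest positions
    g = max(range(n), key=lambda i: sphere_goodness(P[i]))    # gbest index
    for _ in range(iters):
        for i in range(n):
            if sphere_goodness(X[i]) > sphere_goodness(P[i]):
                P[i] = X[i][:]                                # update personal best
                if sphere_goodness(P[i]) > sphere_goodness(P[g]):
                    g = i                                     # update global best
            for d in range(dims):
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (P[i][d] - X[i][d])
                           + c2 * random.random() * (P[g][d] - X[i][d]))  # eq. (13)
                V[i][d] = max(-vmax, min(vmax, V[i][d]))                  # clamp to Vmax
                X[i][d] += V[i][d]                                        # eq. (14)
    return P[g]

print(pso())   # approaches [0, 0, 0]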
4.3 Problem Description and Basic Experimentation Setup
The breast cancer UCI dataset [31] was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg. We perform our experiments on the dataset summarized in Table 2.
Table 2: Data used in the experiments

Name: Wisconsin Breast Cancer (Diagnostic)
Instances: 699
Attributes: 11
Class distribution: benign cases 458 (65.5%); malignant cases 241 (34.5%)
Validation: training 80%; testing 140 cases

The data comprise nine conditional features and one decision feature, since the first attribute, the sample code number, is removed as shown later. The experiments were implemented in the WEKA software; more information about it can be found in [32].
Steps to be implemented:
Step 1: Remove the sample code number from the data (its removal has no effect on classification) using the Remove filter.
Step 2: The dataset was converted from numeric to nominal using the NumericToNominal filter, an instance filter that turns a range of numeric attributes in the dataset into nominal attributes.
Step 3: Replace missing values for nominal and numeric attributes with the modes and means from the training data, using the ReplaceMissingValues filter.
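For readers working outside WEKA, steps 1-3 can be approximated in a few lines of pandas. This is a sketch under the assumption that the raw UCI file breast-cancer-wisconsin.data is comma-separated with no header and '?' marking missing values; the column names are our own, and this is not the exact WEKA pipeline.

import pandas as pd

cols = ["id", "clump", "size_unif", "shape_unif", "adhesion", "epi_size",
        "bare_nuclei", "chromatin", "nucleoli", "mitoses", "class"]
df = pd.read_csv("breast-cancer-wisconsin.data", header=None, names=cols,
                 na_values="?")

df = df.drop(columns="id")               # Step 1: remove the sample code number
df = df.astype("category")               # Step 2: numeric-to-nominal conversion
df = df.fillna(df.mode().iloc[0])        # Step 3: replace missing values with the mode
print(df["class"].value_counts())        # 2 = benign (458), 4 = malignant (241)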
Step 4: To find the reducts, we applied the supervised attribute selection evaluator RSARSubsetEval (Rough Set Attribute Reduction), an implementation of the QuickReduct algorithm for rough set attribute reduction, together with the PSOSearch search method, which explores the attribute space using the Particle Swarm Optimization (PSO) algorithm described in [33], with the parameters shown in Figure 4 and Table 3.

Table 3: PSO Parameters
Individual Weight: 0.34; Inertia Weight: 0.33; Social Weight: 0.33; Iterations: 20

Figure 4: Preprocess Implementation

This stage is important because rough set filtering eliminates the unimportant and redundant features (the first phase in Figure 1) and reduces the number of iterations that PSO must perform to find an optimal feature subset.
Step 5: We used several classification techniques, shown in Table 4, to classify the data. The number of decision rules and the classification accuracy are also reported.
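As a hedged sketch of Step 5, an analogous Naïve Bayes run can be reproduced with scikit-learn; the paper's figures come from WEKA's classifiers, so results from this sketch will be comparable but not identical. The file path, random seed and min_categories setting are our assumptions.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import accuracy_score, confusion_matrix

df = pd.read_csv("breast-cancer-wisconsin.data", header=None, na_values="?")
df = df.drop(columns=0)                          # drop the sample code number
df = df.fillna(df.mode().iloc[0]).astype(int)    # impute missing values with the mode

X, y = df.drop(columns=10), df[10]               # column 10 holds the class (2 or 4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=140, random_state=0)

clf = CategoricalNB(min_categories=11).fit(X_tr, y_tr)   # treat the 1-10 scores as categories
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred))                # compare against the figures in Table 4
print(confusion_matrix(y_te, pred))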
From the results, we conclude that increasing the number of particles/individuals above 20 does not bring any relevant improvement in the algorithm's performance. Increasing or decreasing the number of iterations also has no influence on performance; experimentally, the ideal result is obtained at 20 iterations. For all of the classification algorithms, the best result is obtained with the minimum feature subset, which supports our aim of obtaining the best results with the fewest features. Finally, the evaluation results show that Naïve Bayes with a population size of 5 and 20 iterations obtains the best result among the methods.
Table 4: Comparison of classification results using various classification techniques (140 test instances; class a = 2 benign, class b = 4 malignant; confusion matrix rows are the actual classes a, b and columns the predicted classes)

Naïve Bayes
  Pop. 20, 10 attributes: 135 correct (96.4286%), 5 incorrect (3.5714%); TP rate 0.964, FP rate 0.038, precision 0.965, recall 0.964.
    Confusion matrix: [87 3; 2 48]
  Pop. 10, 9 attributes: 136 correct (97.1429%), 4 incorrect (2.8571%); TP rate 0.971, FP rate 0.025, precision 0.972, recall 0.971.
    Confusion matrix: [87 3; 1 49]
  Pop. 5, 8 attributes: 136 correct (97.1429%), 4 incorrect (2.8571%); TP rate 0.971, FP rate 0.025, precision 0.972, recall 0.971.
    Confusion matrix: [87 3; 1 49]

Decision Table
  Pop. 20, 10 attributes: 129 correct (92.1429%), 11 incorrect (7.8571%); TP rate 0.921, FP rate 0.106, precision 0.921, recall 0.921.
  Pop. 10, 9 attributes (10 rules, 0.08 s): 129 correct (92.1429%), 11 incorrect (7.8571%); TP rate 0.921, FP rate 0.106, precision 0.921, recall 0.921.
  Pop. 5, 8 attributes (10 rules, 0.07 s): 129 correct (92.1429%), 11 incorrect (7.8571%); TP rate 0.921, FP rate 0.106, precision 0.921, recall 0.921.
    Confusion matrix for the three cases above: [86 4; 7 43]

Prism
  Pop. 20, 10 attributes: 122 correct (87.1429%), 9 incorrect (6.4286%), 9 unclassified (6.4286%); TP rate 0.931, FP rate 0.104, precision 0.932, recall 0.931.
    Confusion matrix: [82 2; 7 40]
  Pop. 10, 9 attributes: 119 correct (85%), 10 incorrect (7.1429%), 11 unclassified (7.8571%); TP rate 0.922, FP rate 0.126, precision 0.927, recall 0.922.
    Confusion matrix: [81 1; 9 38]
  Pop. 5, 8 attributes: 120 correct (85.7143%), 9 incorrect (6.4286%), 11 unclassified (7.8571%); TP rate 0.93, FP rate 0.126, precision 0.937, recall 0.93.
    Confusion matrix: [83 0; 9 37]

J48
  Pop. 20, 10 attributes (29 leaves, tree size 32): 133 correct (95%), 7 incorrect (5%); TP rate 0.95, FP rate 0.054, precision 0.95, recall 0.95.
  Pop. 10, 9 attributes (29 leaves, tree size 32): 133 correct (95%), 7 incorrect (5%); TP rate 0.95, FP rate 0.054, precision 0.95, recall 0.95.
    Confusion matrix: [86 4; 3 47]
  Pop. 5, 8 attributes (37 leaves, tree size 41): 130 correct (92.8571%), 10 incorrect (7.1429%); TP rate 0.929, FP rate 0.093, precision 0.928, recall 0.929.
    Confusion matrix: [86 4; 6 44]

JRip (k = 2 optimizations, 3 folds)
  Pop. 20, 10 attributes (14 rules, 0.11 s): 130 correct (92.8571%), 10 incorrect (7.1429%); TP rate 0.929, FP rate 0.084, precision 0.929, recall 0.929.
    Confusion matrix: [85 5; 5 45]
  Pop. 10, 9 attributes (13 rules, 0.08 s): 135 correct (96.4286%), 5 incorrect (3.5714%); TP rate 0.964, FP rate 0.055, precision 0.965, recall 0.964.
    Confusion matrix: [89 1; 4 46]
  Pop. 5, 8 attributes (15 rules): 133 correct (95%), 7 incorrect (5%); TP rate 0.95, FP rate 0.072, precision 0.95, recall 0.95.
    Confusion matrix: [88 2; 5 45]


The Naïve Bayes classifier shows that classification with only 8 attributes achieves the highest result, 136 correctly classified instances; the same result is achieved with 9 attributes and a population size of 10. This confirms that the best results come from the minimum feature selection. The Decision Table classifier gives 129 correctly classified instances with the minimum feature subset: only 8 attributes give the same result as 10 or 9 attributes. The Prism classifier gives the worst results. It achieves 122 correctly classified instances with ten attributes, dropping to 120 when 8 attributes and a population size of 5 are used; reducing the number of attributes with Prism is therefore counterproductive and decreases classification accuracy. The J48 classifier achieves its best result of 133 correctly classified instances with 9 attributes and a population size of 10. JRip achieves its best result of 135 correctly classified instances with 9 attributes and a population size of 10. Overall, the best results in terms of classification accuracy and feature reduction were obtained by Naïve Bayes, followed by the JRip classifier.
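All of the reported measures follow directly from the confusion matrices. As a check, the sketch below recomputes the weighted-average TP rate, FP rate and precision (the class-weighted averaging WEKA reports, to our understanding) from the best Naïve Bayes matrix above:

def weka_style_averages(cm):
    """Weighted-average TP rate, FP rate and precision from a 2x2 confusion matrix.

    cm[i][j] = number of instances of actual class i predicted as class j.
    """
    n = sum(sum(row) for row in cm)
    tp_rate = fp_rate = precision = 0.0
    for k in range(2):
        actual = sum(cm[k])                      # instances of class k
        others = n - actual                      # instances of the other class
        predicted = cm[0][k] + cm[1][k]          # instances predicted as class k
        tp = cm[k][k]
        fp = predicted - tp
        w = actual / n                           # class weight
        tp_rate += w * tp / actual
        fp_rate += w * fp / others
        precision += w * (tp / predicted if predicted else 0.0)
    return tp_rate, fp_rate, precision           # recall equals the weighted TP rate

print(weka_style_averages([[87, 3], [1, 49]]))   # ~ (0.971, 0.025, 0.972), as in Table 4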

5. FUTURE PLANS
Blending with other intelligent optimization algorithms [28]: the blending process combines the advantages of PSO with the advantages of other intelligent optimization algorithms to create compound algorithms with practical value. For example, the particle swarm optimization algorithm can be improved by the simulated annealing (SA) approach, and it can be combined with genetic algorithms, ant colony optimization, fuzzy methods and so on.
The application area of the algorithm: at present, most research applies PSO in coordinate systems; there is less research on applying the PSO algorithm in non-coordinate systems, scattered systems and compound optimization systems.
6. CONCLUSION
Medical diagnosis is an intricate task that needs to be carried out precisely and efficiently, and its automation would be highly beneficial. Clinical decisions are often made based on a doctor's intuition and experience. Data mining techniques have the potential to generate a knowledge-rich environment that can help to significantly improve the quality of clinical decisions. Rough set theory supplies essential tools for knowledge analysis: it provides algorithms for knowledge reduction, concept approximation, decision rule induction and object classification. The methods of rough set theory rest on indiscernibility and related notions, in particular notions related to rough inclusions. All constructs needed to implement rough set based algorithms can be derived from data tables, with no need for a priori estimates or preliminary assumptions. The combination of rough sets with other intelligence techniques can provide a more effective approach. We have illustrated that rough sets can be successfully combined with particle swarm optimization, a heuristic optimization method based on swarm intelligence that is very simple, easily implemented and needs few parameters, which has made it widely developed and applied to the feature selection task.
References
[1] R. Roselin, K. Thangavel, and C.
Velayutham,Fuzzy-Rough Feature Selection for
Mammogram Classification, Journal of Electronic
Science and Technology, Vol. 9, No. 2, JUNE 2011.
[2] J.R. Quinlan, Induction of Decision Trees,
Machine Learning, pp.81-106, Vol.1, 1986.
[3] A. Soltani Sarvestani, A. A. Safavi, N. M. Parandeh, and M. Salehi, "Predicting Breast Cancer Survivability Using Data Mining Techniques," Software Technology and Engineering (ICSTE), 2nd International Conference, pp. 227-231, Vol. 2, 2010.
[4] W.-P. Chang and D.-M. Liou, "Comparison of Three Data Mining Techniques with Genetic Algorithm in the Analysis of Breast Cancer Data." Available: http://www.ym.edu.tw/~dmliou/Paper/compar_threedata.pdf.
[5] K. Rajiv Gandhi, Marcus Karnan, and S. Kannan, "Classification Rule Construction Using Particle Swarm Optimization Algorithm for Breast Cancer Datasets," Signal Acquisition and Processing (ICSAP), International Conference, pp. 233-237, 2010.
[6] S.Das, A. Abraham, S.K. Sarkar,A Hybrid
Rough SetParticle Swarm Algorithm for Image
Pixel Classification,Proc.of the SixthInt.Conf. on
Hybrid Intelligent Systems, pp. 26-32, 2006.
[7] Matthew Settles, An Introduction to Particle
Swarm Optimization, November 2007.
[8] S. B. Kotsiantis, Supervised Machine Learning:
A Review of Classification Techniques,
Informatica, Vol.31, 249-268, 2007.
[9] R. Jensen and Q. Shen, "Fuzzy-Rough Data Reduction with Ant Colony Optimization," Fuzzy Sets and Systems, pp. 5-20, Vol. 149, 2005.
[10] Monteiro, S., Uto, TK., Kosugi, Y.,
Kobayashi,N.,Watanabe, E. and Kameyama,
K,Feature Extraction of Hyperspectral Data for
Under Spilled Blood Visualization Using Particle
Swarm Optimization,International Journal of
Bioelectromagnetism, pp.232-235, Vol. 7,No.1 ,
2005.
[11] Yan Wang and Lizhuang Ma, "Feature Selection for Medical Dataset Using Rough Set Theory," Proceedings of the 3rd WSEAS International Conference on Computer Engineering and Applications.
[12] Jensen, R., Shen, Q., & Tuson, A.,Finding
Rough Set Reducts with SAT, In Proceedings of the
10th International conference on Rough Sets, Fuzzy
Sets, Data Mining and Granular Computing, LNAI
3641, pp. 194-203, 2005.
[13] Z. Pawlak, Rough Sets: Theoretical aspects of
reasoning about data, Kluwer Academic Publishers,
Dordrecht, 1991.
[14] A.E. Hassanien,Rough Set Approach for
Attribute Reduction and Rule Generation: A Case
of Patients with Suspected Breast Cancer, Journal
of the American society for Information science and
Technology ,pp. 954-962, Vol.55,No.11, 2004.
[15] S. Tsumoto,Mining Diagnostic Rules from
Clinical Databases Using Rough Sets and Medical
Diagnostic Model, Information Sciences, pp.65-80,
Vol.162, 2004.
[16] J. Komorowski, A. Ohrn, Modeling Prognostic
Power of Cardiac Tests Using Rough Sets, Artificial
Intelligence in Medicine , pp. 167-191, Vol.15,
1999.
[17] Z. Pawlak, "Rough Set Approach to Knowledge-Based Decision Support," European Journal of Operational Research, pp. 48-57, Vol. 99, 1997.
[18] Xiangyang Wang, Jie Yang, Xiaolong Teng, Weijun Xia, and Richard Jensen, "Feature Selection Based on Rough Sets and Particle Swarm Optimization."
[19] A.Skowron, C.Rauszer, The Discernibility
Matrices and Functions in Information Systems, In:
R.W. Swiniarski (Eds.): Intelligent Decision
SupportHandbook of Applications and Advances
of the Rough Sets Theory, Kluwer Academic
Publishers, Dordrecht, pp. 311-362, 1992.
[20] R. W. Swiniarski and A. Skowron, "Rough Set Methods in Feature Selection and Recognition," Pattern Recognition Letters, pp. 833-849, Vol. 24, 2003.
[21] K. Y. Hu, Y. C. Lu, and C. Y. Shi, "Feature Ranking in Rough Sets," AI Communications, pp. 41-50, Vol. 16, No. 1, 2003.
[22] X. Hu, Knowledge Discovery in Databases: An
Attribute-Oriented Rough Set Approach,Ph.D
thesis, Regina University,1995.
[23] G.Y. Wang, J. Zhao, J.J. An, Y. Wu, Theoretical
Study on Attribute Reduction of Rough Set Theory:
Comparison of Algebra and Information Views", In:
Proceedings of the Third IEEE International
Conference on Cognitive Informatics, (ICCI04),
2004.
[24] J .Kennedy, R.Eberhart, Particle Swarm
Optimization",In :Proc IEEE Int. Conf. On Neural
Networks, Perth, pp. 1942-1948, 1995.
[25] J.Kennedy, R.C.Eberhart,A new optimizer
using particle swarm theory, In: Sixth International
Symposium on Micro Machine and Human Science,
Nagoya, pp. 39-43, 1995.
[26] Y. Shi, R. Eberhart,A Modified Particle Swarm
Optimizer", In: Proc. IEEE Int. Conf. On
Evolutionary Computation, Anchorage, AK, USA,
pp. 69-73, 1998.
[27] Kennedy.J,Small Worlds and Mega-Minds:
Effects of Neighborhood Topology on Particle
Swarm Performance", Proceedings of the 1999
Congress of Evolutionary Computation, IEEE Press,
Vol. 3, pp. 1931-1938, 1999.
[28] Qinghai Bai, Analysis of Particle Swarm
Optimization Algorithm",Computer and Information
Science,Vol.3,No.1, 2010.
[29] Y. Shi, R. C. Eberhart,Parameter Selection in
Particle Swarm Optimization in Evolutionary
Programming, VII: Proc. EP98, New York:
Springer-Verlag, pp. 591-600, 1998.
[30] R.C. Eberhart, Y. Shi,Particle Swarm
Optimization: Developments, Applications and
Resources, In: Proc. IEEE Int. Conf. On
Evolutionary Computation, Seoul, pp. 81-86, 2001.
[31] http://archive.ics.uci.edu/ml/machine-learning-
databases/breast-cancer-wisconsin/breast-cancer-
wisconsin.data
[32] http://www.cs.waikato.ac.nz/ml/weka/
[33] A. Moraglio, C. Di Chio, and R. Poli, "Geometric Particle Swarm Optimization," EuroGP, LNCS 4445, pp. 125-135, 2007.
