
Classification of Wisconsin Breast Cancer Diagnostic and Prognostic Dataset Using Polynomial Neural Network

A Dissertation Work

Submitted in Partial Fulfillment of the Requirements for the Award of

Post Graduate Degree of Master of Technology

In Computer Science & Engineering

Submitted to

Rajiv Gandhi Proudyogiki Vishwavidhyalaya,

Bhopal (M.P.)

Submitted By:

Shweta Saxena

0126CS10MT17

Under the Guidance of

Dr. Kavita Burse

Director, OCT, Bhopal.

Department of Computer Science & Engineering

ORIENTAL COLLEGE OF TECHNOLOGY,

BHOPAL

(Formerly known as Thakral College of Technology, Bhopal)

Approved by AICTE New Delhi & Govt. of M.P.

Affiliated to Rajiv Gandhi Proudyogiki Vishwavidhyalaya, Bhopal (M.P.)

Session 2012-13


ORIENTAL COLLEGE OF TECHNOLOGY, BHOPAL

(Formerly known as Thakral College of Technology, Bhopal)

Approved by AICTE New Delhi & Govt. of M.P. and Affiliated to Rajiv

Gandhi Proudyogiki Vishwavidhyalaya Bhopal (M.P.)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

THIS IS TO CERTIFY THAT THE DISSERTATION ENTITLED Classification of Wisconsin Breast Cancer Diagnostic and Prognostic Dataset using Polynomial Neural Network BEING SUBMITTED BY Shweta Saxena IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF M.TECH DEGREE IN COMPUTER SCIENCE & ENGINEERING TO ORIENTAL COLLEGE OF TECHNOLOGY, BHOPAL (M.P.) IS A RECORD OF BONA FIDE WORK DONE BY HER UNDER MY GUIDANCE.

Dr. Kavita Burse
Director
OCT, Bhopal
(Guide)

Prof. Roopali Soni
Head of Department, CSE
OCT, Bhopal


ORIENTAL COLLEGE OF TECHNOLOGY, BHOPAL

(Formerly known as Thakral College of Technology, Bhopal)

Approved by AICTE New Delhi & Govt. of M.P. and Affiliated to Rajiv

Gandhi Proudyogiki Vishwavidhyalaya Bhopal (M.P.)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

APPROVAL CERTIFICATE

This dissertation work entitled Classification of Wisconsin Breast Cancer Diagnostic and Prognostic Dataset using Polynomial Neural Network, submitted by Shweta Saxena, is approved for the award of the degree of Master of Technology in Computer Science & Engineering.

INTERNAL EXAMINER
Date:

EXTERNAL EXAMINER
Date:


CANDIDATE DECLARATION

I hereby declare that the dissertation work presented in the report entitled Classification of Wisconsin Breast Cancer Diagnostic and Prognostic Dataset using Polynomial Neural Network, submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology in Computer Science & Engineering of Oriental College of Technology, is an authentic record of my own work.

I have not submitted this report, in part or in full, for the award of any other degree or diploma.

Date:

Shweta Saxena
(0126CS10MT17)

This is to certify that the above statement made by the candidate is correct to the best of my knowledge.

Dr. Kavita Burse

Director

OCT, Bhopal

(Guide)


ACKNOWLEDGEMENT

I would like to express my deep sense of respect and gratitude towards my advisor and guide Dr. Kavita Burse, Director, Oriental College of Technology, who has given me an opportunity to work under her. She has been a constant source of inspiration throughout my work. She displayed unique tolerance and understanding at every step of the progress of this work and encouraged me incessantly. Her invaluable knowledge and innovative ideas helped me to take the work to the final stage. I consider it my good fortune to have worked under such a wonderful person.

I express my respect to Prof. Roopali Soni, Head, Computer Science & Engineering Department, Oriental College of Technology, for her constant encouragement and invaluable advice in every aspect of my academic life. I am also thankful to all faculty members of the Computer Science and Engineering Department for their support and guidance.

I am especially thankful to my father Mr. Damodar Saxena, my mother Mrs. Nirmala Saxena, and my loving sisters Shikha and Shraddha for their love, sacrifice and support on every path of my life. I extend a special word of thanks to my husband Mr. Ashish Saxena for his moral support and help in achieving my aim.

Last but not least, I am extremely thankful to all who have directly or indirectly helped me in the completion of this work.

Shweta Saxena

(0126CS10MT17)


ORGANIZATION OF DISSERTATION

The report Classification of Wisconsin Breast Cancer Diagnostic and Prognostic Dataset using Polynomial Neural Network has been divided into six chapters as follows:

Chapter 1 Introduction

Chapter 1 first describes the motivation of this research work. It then describes the breast cancer disease, its symptoms and its types in detail. The chapter also describes the diagnosis and prognosis process of the disease.

Chapter 2 Literature Review

Different neural network techniques for the diagnosis and prognosis of breast cancer are described in this chapter, along with the related work concerned with these techniques. The chapter also compares the accuracies of the different techniques at the end.

Chapter 3 Artificial Neural Network and Principal Component Analysis

In this chapter the Artificial Neural Network is described in detail along with its advantages and medical applications. The chapter describes in detail the higher order or polynomial neural network, along with the back propagation algorithm, which are used in this research for classification. The chapter next provides detailed information about the data preprocessing technique named Principal Component Analysis and its advantages.

Chapter 4 MATLAB

The technology used for the implementation of the proposed work is MATLAB. The chapter gives a brief introduction to MATLAB along with its advantages, and a detailed description of the Neural Network Toolbox available in MATLAB for the design of neural networks. The chapter also explains the neural network design process using the Neural Network Toolbox.


Chapter 5 Simulation and Results

Chapter 5 presents the description of the dataset used for the implementation of this research and the results of the implementation.

Chapter 6 Conclusion and Future Scope

Chapter 6 concludes the dissertation and provides possible directions for relevant future work.


ABSTRACT

Breast cancer is the most common form of cancer and a major cause of death in women. Normally, the cells of the breast divide in a regulated manner. If cells keep on dividing when new cells are not needed, a mass of tissue forms. This mass is called a tumor. This tumor can be cancerous or non-cancerous. The goal of diagnosis is to distinguish between cancerous and non-cancerous cells. Once a patient is diagnosed with breast cancer, the prognosis gives the anticipated long-term behavior of the ailment. Breast cancer detection, classification, scoring and grading of histopathological images is the standard clinical practice for the diagnosis and prognosis of breast cancer. In a large hospital, a pathologist typically handles a number of cancer detection cases per day. It is, therefore, a very difficult and time-consuming task. Owing to their wide range of applicability and their ability to learn complex and non-linear relationships, including noisy or less precise information, Artificial Neural Networks (ANNs) are very well suited to solve problems in biomedical engineering. ANNs can be applied to medicine in four basic fields: modeling, bioelectric signal processing, diagnosis and prognostics. There are several systems available for the diagnosis and selection of therapeutic strategies in breast cancer.

In this research we propose a neural network based clinical support system to provide medical data analysis for the diagnosis and prognosis of breast cancer. The system classifies the breast cancer diagnostic data provided as input to the neural network into two sets, benign (non-cancerous) and malignant (cancerous), to obtain the diagnostic results. To obtain prognosis results, the system classifies the prognostic data given as input to the neural network into two classes, recurrent and non-recurrent. Results belonging to the recurrent class indicate that the cancer recurred after some time. A polynomial neural network (PNN) structure is used along with the back propagation algorithm for classification of the breast cancer data. Wisconsin Breast Cancer (WBC) datasets from the UCI Machine Learning Repository are used as input datasets to the PNN. A data pre-processing technique named Principal Component Analysis (PCA) is used as a feature reduction transformation method to improve the accuracy of the PNN. In our results the Mean Square Error (MSE) is substantially reduced for PCA-preprocessed data as compared to normalized data. Hence we get more accurate diagnosis and prognosis results.

Keywords: breast cancer, polynomial neural network, principal component analysis, Wisconsin breast cancer dataset.


CONTENTS

DESCRIPTION PAGE NO.

List of Figures XII

List of Tables XIII

Chapter 1

Introduction 1-7

1.1 Research Motivation 2

1.2 Introduction 3

1.3 Symptoms of breast cancer 4

1.4 Types of breast cancer 4

1.5 Breast cancer diagnosis 5

1.6 Breast cancer prognosis 6

Chapter 2

Literature Review 8-26

2.1 Introduction 9

2.2 Neural network techniques for diagnosis and prognosis of breast cancer 11

2.3 Comparison of neural network techniques for breast cancer diagnosis and

prognosis 26

Chapter 3

Artificial Neural Network and Principal Component Analysis 27-40

3.1 Overview of ANN 28

3.2 Basics of ANN 28

3.3 Feed Forward Neural Network with Back propagation 29


3.4 Higher order or polynomial neural network 33

3.5 Advantages of ANN 35

3.6 Medical Applications 35

3.7 Overview of data Preprocessing 36

3.7.1 Feature selection 37

3.7.2 Feature extraction 37

3.8 Principal Component Analysis 38

3.8.1 Dimension reduction 38

3.8.2 Lower dimensionality basis 39

3.8.3 Selection of principal components 39

3.8.4 Selecting best lower dimensional space 39

3.8.5 Linear transformation implied 40

3.9 Advantages of PCA 40

Chapter 4

MATLAB 41-48

4.1 Introduction 42

4.2 Advantages of MATLAB 42

4.3 Limitations of MATLAB 43

4.4 Neural Network Toolbox 44

4.5 Neural Network Design using Neural Network Toolbox 45

4.5.1 Collecting the data 46

4.5.1.1 Pre-processing and post-processing the data 46

4.5.1.2 Representing Unknown or Don't Care Targets 47

4.5.1.3 Dividing the Data 47


4.5.2 Creating and configuring the network 47

4.5.3 Initializing weights and biases 47

4.5.4 Training the network 47

4.5.5 Validation of network 48

4.5.6 Use the network 48

Chapter 5

Simulation and Results 49-60

5.1 Introduction 50

5.2 Description of dataset 52

5.3 Results and discussions 57

5.3.1 Diagnosis Results 57

5.3.2 Prognosis Results 58

Chapter 6

Conclusion and Future Scope 61-62

6.1 Conclusion 62

6.2 Future work 62

List of Publications 63-64

References 65-74

LIST OF FIGURES


FIGURE NO. TITLE PAGE NO.

Fig. 1.1 Breast Cancer 3

Fig. 1.2 FNA Images of benign and malignant breast mass 6

Fig. 2.1 An MLP structure 11

Fig. 2.2 Probabilistic neural network for cancer diagnosis 16

Fig. 3.1 A single neuron 26

Fig. 3.2 Feed Forward NN model for Breast Cancer diagnosis 27

Fig. 3.3 Node structure of PNN 30

Fig. 3.4 Polynomial Neural Network 30

Fig. 3.5 Data Pre-processing using PCA 34

Fig. 4.1 Pre-processing and post-processing 42

Fig. 5.1 Flow chart of ANN process 47

Fig. 5.2 Comparison of the convergence performance for WPBC dataset (50 iterations) 55

Fig. 5.3 (a) Testing error for normalization and PCA data for WPBC dataset over 100 data 55

Fig. 5.3 (b) Testing error for normalization PCA for WPBC dataset over 198 data 56

LIST OF TABLES

TABLE NO. TITLE PAGE NO.

Table 2.1 Accuracy comparison for test data classification 23

Table 4.1 Pre-processing and post-processing functions 42

Table 5.1 A brief description of breast cancer datasets 46

Table 5.2 Attribute information for WBC dataset 48

Table 5.3 Attribute information for WDBC dataset 49


Table 5.4 Attribute information for WPBC dataset 50-51

Table 5.5 Training performance for WBC dataset 52

Table 5.6 Testing performance for WBC dataset 53

Table 5.7 Training performance for WDBC dataset 53

Table 5.8 Testing performance for WDBC dataset 53-54

Table 5.9 Training performance for WPBC dataset 54

Table 5.10 Testing performance for WPBC dataset 54


Chapter 1

Introduction

1.1 Research Motivation

According to the World Health Organization (WHO), breast cancer is currently the top cancer in women worldwide and the second highest cause of death among females. Diagnosis and prognosis of breast cancer at a very early stage is difficult due to various factors which are cryptically interconnected with each other, and we are oblivious to many of them. Until an effective preventive measure becomes widely available, early detection followed by effective treatment is the only recourse for reducing breast cancer mortality. Most breast cancers are detected by the patient as a lump in the breast. The majority of breast lumps are benign (non-cancerous), so it is the physician's responsibility to diagnose breast cancer. The goal of diagnosis is to distinguish between malignant (cancerous) and benign breast lumps. Once a patient is diagnosed with breast cancer, the malignant lump must be excised. During this procedure, or during a different post-operative procedure, physicians must determine the prognosis of the disease. Prognosis gives the anticipated long-term behavior of the ailment. A major class of problems in medical science involves the diagnosis and prognosis of breast cancer based upon various tests performed upon the patient. When several tests are involved, the ultimate diagnosis and prognosis may be difficult to obtain, even for a medical expert. In human analysis of test results, calculation errors may also occur, and these result in faulty treatment for the patients. This has given rise, over the past few decades, to computerized diagnostic and prognostic tools intended to aid the physician in making sense out of the welter of data. A prime target for such computerized tools is the domain of cancer diagnosis and prognosis. Neural networks are computer-based tools inspired by the vertebrate nervous system that have been increasingly used in the past decade to model biomedical domains. The motivation for this research is to create a neural network based tool for doctors to use for classifying the results obtained from various tests performed upon the patient. The neural network based clinical support system proposed in this research provides medical data analysis for diagnosis and prognosis in a shorter time and remains unaffected by human errors caused by inexperience or fatigue. Use of ANN increases the accuracy of most of the methods and reduces the need for a human expert. The back propagation algorithm has been used to train the neural network, keeping in view the significant characteristics of NN and its advantages for the implementation of the classification problem. PCA is used as a feature reduction transformation method to improve the accuracy of the ANN. Advantages of feature reduction include the identification of a reduced set of features, among a large set of features, that are used for outcome prediction. Though the proposed neural network model is implemented on the standard Wisconsin dataset obtained from the UCI machine learning repository, it can also be implemented using similar datasets.
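The PCA-based feature reduction mentioned above can be sketched briefly. The following is an illustrative Python/NumPy sketch (the dissertation itself uses MATLAB), run on randomly generated stand-in data rather than the actual Wisconsin datasets; the function name `pca_reduce` and the toy dimensions are assumptions for illustration only.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.

    X : (n_samples, n_features) data matrix
    k : number of principal components to keep
    Returns the reduced (n_samples, k) matrix and the component directions.
    """
    # Center each feature at zero mean (a standard PCA preprocessing step).
    Xc = X - X.mean(axis=0)
    # Singular value decomposition; rows of Vt are the principal directions,
    # ordered by decreasing explained variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    return Xc @ components.T, components

# Toy stand-in for a 9-feature cytology dataset (the WBC data has 9 attributes);
# the actual experiments use the UCI Wisconsin datasets instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))
X_reduced, comps = pca_reduce(X, k=4)
print(X_reduced.shape)  # (100, 4)
```

The reduced matrix can then be fed to the classifier in place of the raw features, which is the role PCA plays in the proposed system.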

1.2 Introduction

Breast cancer is the major cause of death by cancer in the female population [1]. Most breast cancer cases occur in women aged 40 and above, but certain women with high-risk characteristics may develop breast cancer at a younger age [2]. Breast cancer occurs in humans and other mammals. While the overwhelming majority of human cases occur in women, male breast cancer can also occur [3]. Cancer is a disease in which cells become abnormal and form more cells in an uncontrolled way. With breast cancer, the cancer begins in the tissues that make up the breasts. The breast consists of lobes, lobules, and bulbs that are connected by ducts. The breast also contains blood and lymph vessels. These lymph vessels lead to structures that are called lymph nodes. Clusters of lymph nodes are found under the arm, above the collarbone, in the chest, and in other parts of the body. Together, the lymph vessels and lymph nodes make up the lymphatic system, which circulates fluid called lymph throughout the body. Lymph contains cells that help fight infection and disease. Normally, the cells of the breast divide in a regulated manner. If cells keep dividing when new cells are not needed, a mass of tissue forms. This mass is called a tumor, as shown in Fig. 1.1 [4]. A tumor can be benign or malignant. A benign tumor is not cancer and will not spread to other parts of the body. A malignant tumor is cancer. Cancer cells divide and damage tissue around them. When breast cancer spreads outside the breast, cancer cells are most often found under the arm in the lymph nodes. In many cases, if the cancer has reached the lymph nodes, cancer cells may have also spread to other parts of the body via the lymphatic system or through the bloodstream. This can be life-threatening [5].

Fig 1.1 Breast Cancer

In addition to being the most frequently diagnosed cancer among women in the United States, breast cancer accounts for up to 20 percent of the total costs of cancer overall. Women covered by Medicaid have unique challenges when it comes to this disease. For example, Medicaid recipients are more likely to be diagnosed at an advanced stage. They also have much lower screening rates compared to the general population. A new study found a high prevalence of breast cancer in Medicaid patients as well as significantly higher health care use and costs [6].

1.3 Symptoms of Breast Cancer

The first noticeable symptom of breast cancer is typically a lump that feels different from the rest of the breast tissue. More than 80% of breast cancer cases are discovered when the woman feels a lump. Lumps found in lymph nodes located in the armpits can also indicate breast cancer [7]. Indications other than a lump may include thickening different from the other breast tissue, one breast becoming larger or lower, a nipple changing position or shape or becoming inverted, skin puckering or dimpling, a rash on or around a nipple, discharge from the nipple(s), constant pain in part of the breast or armpit, and swelling beneath the armpit or around the collarbone [8]. Inflammatory breast cancer is a particular type of breast cancer which can pose a substantial diagnostic challenge. Symptoms may resemble a breast inflammation and may include itching, pain, swelling, nipple inversion, warmth and redness throughout the breast, as well as an orange-peel texture to the skin [7]. Another reported symptom complex of breast cancer is Paget's disease of the breast. This syndrome presents as eczematoid skin changes such as redness and mild flaking of the nipple skin. As Paget's advances, symptoms may include tingling, itching, increased sensitivity, burning, and pain. There may also be discharge from the nipple. Approximately half of women diagnosed with Paget's also have a lump in the breast [9].

1.4 Types of Breast Cancer

Breast cancer can develop in different ways and may affect different parts of the breast. The location of the cancer will affect its progression and the treatment. Breast cancer is divided mainly into the pre-invasive or in-situ form and the invasive or infiltrating form. The pre-invasive form is restricted to the breast itself and has not yet invaded any of the lymphatics or blood vessels that surround the breast tissue. Therefore, it does not spread to lymph nodes or other organs in the body [5]. The pre-invasive forms of breast cancer are:

a) Ductal carcinoma in situ (DCIS):

This is the most common pre-invasive breast cancer. It is seen more commonly now because this form is generally visible on a mammogram and is identified by unusual calcium deposits or puckering of the breast tissue (called a stellate appearance). If left untreated, DCIS will progress to invasive breast cancer.

b) Lobular carcinoma in situ (LCIS):

Unlike DCIS, LCIS is not really cancer at all. Most physicians consider the finding of LCIS to be accidental, and it is thought to be a marker for breast cancer risk. That is, women with LCIS seem to have a 7-10 times increased risk of developing some form of breast cancer (usually invasive lobular carcinoma) over the next 20 years. LCIS does not warrant treatment by surgery or radiation therapy. Close follow-up is most commonly indicated, and LCIS is not easily seen on a mammogram. Recent data suggest that this condition may be a precursor to invasive lobular cancer. There may be some forms of LCIS (i.e., the pleomorphic subtype) that require more aggressive local therapy and closer follow-up.

The invasive forms of breast cancer are:

a) Ductal carcinoma:

This is the most common form of breast cancer and accounts for 70% of breast cancer cases. This cancer begins in the milk ducts and grows into surrounding tissues.

b) Lobular carcinoma:

This originates in the milk-producing lobules of the breast. It can spread to the fatty tissue and other parts of the body. About 1 in 10 breast cancers are of this type [10].

c) Medullary, mucinous, and tubular carcinomas:

These are three relatively slower-growing types of breast cancer.

d) Inflammatory carcinoma:

This is the fastest growing and most difficult type of breast cancer to treat. This cancer invades the lymphatic vessels of the skin and can be very extensive. It is very likely to spread to the local lymph nodes.

e) Paget's disease:

Paget's disease is cancer of the areola and nipple. It is very rare (about 1% of all breast cancers). In general, women who develop this type of cancer have a history of nipple crusting, scaling, itching, or inflammation.

1.5 Breast Cancer Diagnosis

Most breast cancers are detected by the patient as a lump in the breast. The majority of breast lumps are benign (non-cancerous), so it is the physician's responsibility to diagnose breast cancer. The goal of diagnosis is to distinguish between malignant (cancerous) and benign breast lumps. The three methods currently used for breast cancer diagnosis are mammography, fine needle aspirate (FNA) and surgical biopsy [11]. Mammography has a reported sensitivity (probability of correctly identifying a malignant lump) which varies between 68% and 79% [12]. Taking a fine needle aspirate (i.e. extracting fluid from a breast lump using a small-gauge needle) and visually inspecting the fluid under a microscope has a reported sensitivity varying from 65% to 98% [13]. Fig. 1.2 shows FNA images of a benign and a malignant breast mass.

Fig 1.2 FNA Images of benign and malignant breast mass

The more invasive and costly surgical biopsy has close to 100% sensitivity and remains the only test that can confirm malignancy. Therefore mammography lacks sensitivity, FNA sensitivity varies widely, and surgical biopsy, although accurate, is invasive, time consuming and costly [11]. The goal of the diagnostic aspect of our research is to develop a neural network system that diagnoses breast cancer with the help of the Wisconsin Breast Cancer database, which is obtained from FNAs.
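The sensitivity figures quoted above follow from a simple definition: sensitivity is the fraction of truly malignant cases that a test flags as malignant. A minimal Python sketch, with hypothetical counts chosen only to mirror the 79% mammography figure:

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity = TP / (TP + FN): the fraction of malignant cases
    that the test correctly identifies as malignant."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical screening outcome: 79 of 100 malignant lumps detected.
print(sensitivity(79, 21))  # 0.79, i.e. 79% sensitivity
```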

1.6 Breast Cancer Prognosis

Once a patient is diagnosed with breast cancer, the malignant lump must be excised. During this procedure, or during a different post-operative procedure, physicians must determine the prognosis of the disease [14]. This is simply the long-term outlook for the disease for patients whose cancer has been surgically removed [11]. Prognosis is important because the type and intensity of the medications are based on it. Currently, the most reliable method of determining the prognosis is by axillary clearance (the dissection of axillary lymph nodes) [Choong]. Unfortunately, for patients with unaffected lymph nodes, the result is unnecessary numbness, pain, weakness, swelling, and stiffness [15]. Prognosis poses a more difficult problem than that of diagnosis, since the data is censored. That is, there are only a few cases where we have an observed recurrence of the disease [14]. A patient can be classified as a recur case if the disease is observed at some time subsequent to tumor excision; a patient whose cancer has not recurred, and may never recur, has an unknown or censored [16] time to recur (TTR). On the other hand, we do not observe recurrence in most patients. For these, there is no real point at which we can consider the patient a non-recurrent case. So, the data is considered censored, since we do not know the time of recurrence. For such patients, all we know is the time of their last check-up. We call this the disease-free survival time (DFS) [14]. The prognostic aspect of the proposed research is to develop a neural network system that classifies the Wisconsin Breast Cancer Prognostic database into two classes: recur and non-recur patients.
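The recur/censored distinction described above can be illustrated with a tiny Python sketch; the record layout, field names and numbers here are hypothetical, not the actual WPBC file format:

```python
# Illustrative prognostic records: for recurred patients the time field is
# the time to recur (TTR); for the rest it is only the disease-free survival
# time (DFS) up to the last check-up, i.e. a censored observation.
records = [
    {"id": 1, "recurred": True,  "months": 14},  # observed TTR
    {"id": 2, "recurred": False, "months": 60},  # censored DFS
    {"id": 3, "recurred": True,  "months": 7},   # observed TTR
    {"id": 4, "recurred": False, "months": 32},  # censored DFS
]

# The two-way split used as class labels in the prognostic classification.
recur     = [r for r in records if r["recurred"]]
non_recur = [r for r in records if not r["recurred"]]  # censored cases

print(len(recur), len(non_recur))  # 2 2
```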


Chapter 2

Literature Review

2.1 Introduction

Neural network techniques have been successfully applied to the diagnosis and prognosis of breast cancer. This chapter reviews the existing and popular neural network techniques for the diagnosis and prognosis of breast cancer; the various techniques are compared at the end. The Wisconsin breast cancer data set is used to study the classification accuracy of the neural networks. Two research papers that were helpful in shaping this survey are:

An Analysis of the methods employed for breast cancer diagnosis by M. M. Beg and M. Jain.

Breast cancer diagnosis using statistical neural networks by T. Kiyan and T. Yildirim.

A brief description of the above two papers is as follows.

An Analysis of the methods employed for breast cancer diagnosis, Author: M. M. Beg and M. Jain [17]

Abstract:

Breast cancer research over the last decade has been tremendous. The groundbreaking innovations and novel methods help in early detection, in setting the stages of the therapy, and in assessing the response of the patient to the treatment. The prediction of recurrent cancer is also crucial for the survival of the patient. This paper studies various techniques used for the diagnosis of breast cancer. Different methods are explored for their merits and demerits for the diagnosis of breast lesions. Some of the methods are yet unproven, but the studies look very encouraging. It was found that the recent use of the combination of Artificial Neural Networks in most of the instances gives accurate results for the diagnosis of breast cancer, and their use can also be extended to other diseases.

Comments:

This paper reviews the existing and popular methods which employ soft computing techniques for the diagnosis of breast cancer. The paper demonstrated the better performance of multiple neural networks over monolithic neural networks for the diagnosis of breast cancer. It can be concluded from this study that neural network based clinical support systems provide the medical experts with a second opinion, thus removing the need for biopsy and excision and reducing unnecessary expenditure. Use of ANN increases the accuracy of most of the methods and reduces the need for a human expert. The ANN, Support Vector Machine, Genetic Algorithm (GA), and K-nearest neighbor may be used for classification problems. The GA is better used for feature selection. The fuzzy co-occurrence matrix and fuzzy entropy methods can also be used for feature extraction. Almost all intelligent computational learning algorithms use supervised learning. A supervised ANN outperforms an unsupervised network, but in the case of a patient with no previous medical records the unsupervised ANN is the only solution.

Breast cancer diagnosis using statistical neural networks, Author: T. Kiyan and T. Yildirim [18]

Abstract:

Breast cancer is the second largest cause of cancer deaths among women. The performance of the statistical neural network structures radial basis function network (RBF), general regression neural network (GRNN) and probabilistic neural network (PNN) is examined on the Wisconsin breast cancer data (WBCD) in this paper. This is a well-used database in machine learning, neural networks and signal processing. Statistical neural networks are used to increase the accuracy and objectivity of breast cancer diagnosis.

Comments:

This paper shows how statistical neural networks are used in actual clinical diagnosis of breast cancer. The simulations were realized using the MATLAB 6.0 Neural Network Toolbox. Four different neural network structures, multilayer perceptron (MLP), RBF, PNN and GRNN, were applied to the WBCD database to show the performance of statistical neural networks on breast cancer data. According to the results, RBF and PNN are the best classifiers on the training set, whereas GRNN gives the best classification accuracy when the test set is considered. According to the overall results, the most suitable neural network model for classifying WBCD data is GRNN.

2.2 Neural network techniques for diagnosis and prognosis of breast cancer

The various techniques for the diagnosis and prognosis of breast cancer are:

Multilayer Perceptron (MLP):

MLP has been widely used for cancer prediction and prognosis [19]. MLP is a class of feed forward neural networks which is trained in a supervised manner to become capable of outcome prediction for new data [20]. The structure of an MLP is shown in Fig. 2.1. An MLP consists of a set of interconnected artificial neurons connected only in a forward manner to form layers. One input, one or more hidden and one output layer are the layers forming an MLP [21]. An artificial neuron is the basic processing element of a neural network. It receives signals from other neurons, multiplies each signal by the corresponding connection strength, that is, its weight, sums up the weighted signals, passes the sum through an activation function and feeds the output to other neurons [22].

Fig. 2.1 MLP structure
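The neuron computation just described, a weighted sum of input signals passed through an activation function, can be sketched in a few lines. This Python sketch is illustrative only; the inputs, weights and choice of sigmoid activation are assumptions, not values from the dissertation:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: the weighted sum of the input signals
    plus a bias, passed through a sigmoid activation function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))   # logistic (sigmoid) activation

# Example: three input signals with illustrative weights.
out = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.1)
print(round(out, 3))  # 0.55
```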


The simplest form of trainable neural network, first developed (Rosenblatt, 1958),

composed of two layers of nodes namely input and output layer. A mapping between

the input and output data could be established by assigning weights to the input

numerical data during training. More complicated MLPs which are commonly used

consist of some hidden layers in addition to the input and output layers. These hidden

layers enable the MLP to extract higher order statistics from a set of given data and

hence, capture the complex relationship between input-output data. Therefore, MLPs

commonly consist of an input layer for which the number of nodes are defined by

size of input vector, one or more hidden layers which can have variable number of

nodes depending on the application and an output layer which has one or more nodes

depending on the number of output classes. Connections between these layers are

defined by weights which are assigned in a supervised learning process so that the

neural network would respond correctly to new data. This can be done via a training

algorithm, in which a cost function is computed by comparing the network's output

and the desired output and is then minimized with respect to the network parameters

[21]. The neural network classification process consists of two steps: training and

testing. The classification accuracy depends on the training [23]. The training

requires a series of input and associated output

vectors. During the training, the network is repeatedly presented with the training

data, and the weights and thresholds in the network are adjusted from time to time until

the desired input-output mapping is achieved [22]. Training is done on known examples

and testing is done on unknown samples. The training procedure itself consists of

two processes: feed-forwarding the input data, followed by back-propagation

of the error, in which the weights are adjusted to reduce the error on each training epoch

[24]. The following research paper presents the effectiveness of MLP for diagnosis and

prognosis of breast cancer-

An expert system for detection of breast cancer based on association rules

and neural network, Author: M. Karabatak and M. C. Ince [93]

This paper presents an automatic diagnosis system for detecting breast cancer based

on association rules (AR) and neural network (NN). In this study, AR is used for

reducing the dimension of breast cancer database and NN is used for intelligent

classification. The proposed AR + NN system performance is compared with NN


model. The dimension of input feature space is reduced from nine to four by using

AR. In the test stage, a 3-fold cross validation method was applied to the Wisconsin breast

cancer database to evaluate the proposed system's performance. The correct

classification rate of the proposed system is 95.6%. This research demonstrated that the

AR can be used for reducing the dimension of feature space and proposed AR + NN

model can be used to obtain fast automatic diagnostic systems for other diseases.

Cross Validation Evaluation for Breast Cancer Prediction Using Multilayer

Perceptron Neural Networks, Author: Shirin A. Mojarad, Satnam S. Dlay, Wai L.

Woo and Gajanan V. Sherbet [25]

Abstract:

The aim of this study is to investigate the effectiveness of a Multilayer Perceptron

(MLP) for predicting breast cancer progression using a set of four biomarkers of

breast tumors. The biomarkers include DNA ploidy, cell cycle distribution

(G0G1/G2M), steroid receptors (ER/PR) and S-Phase Fraction (SPF). A further

objective of the study is to explore the predictive potential of these markers in

defining the state of nodal involvement in breast cancer. Two methods of outcome

evaluation viz. stratified and simple k-fold Cross Validation (CV) are studied in order

to assess their accuracy and reliability for neural network validation. Criteria such as

output accuracy, sensitivity and specificity are used for selecting the best validation

technique besides evaluating the network outcome for different combinations of

markers.

Comments:

The presence of metastasis in the regional lymph nodes is the most important factor

in predicting prognosis in breast cancer. Many biomarkers have been identified that

appear to relate to the aggressive behaviour of cancer. However, the nonlinear

relation of these markers to nodal status and also the existence of complex interaction

between markers have prohibited an accurate prognosis. The results show that

stratified 2-fold CV is more accurate and reliable compared to simple k-fold CV as it

obtains a higher accuracy and specificity and also provides a more stable network

validation in terms of sensitivity. Best prediction results are obtained by using an

individual marker, SPF, which obtains an accuracy of 65%. The authors suggest that


MLP-based analysis provides an accurate and reliable platform for breast cancer

prediction given that an appropriate design and validation method is employed.
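The difference between the two validation schemes compared in this paper can be sketched without any libraries; the round-robin fold assignment below is an illustrative simplification, not the authors' exact procedure:

```python
from collections import defaultdict

def kfold_indices(n, k):
    # simple k-fold: deal sample indices into k folds in turn,
    # ignoring the class labels entirely
    folds = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds

def stratified_kfold_indices(labels, k):
    # stratified k-fold: distribute each class's samples evenly
    # across the folds so every fold keeps the class proportions
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds

# toy labels: 8 benign (0) and 4 malignant (1) cases, k = 2
labels = [0] * 8 + [1] * 4
for fold in stratified_kfold_indices(labels, 2):
    counts = [sum(labels[i] == c for i in fold) for c in (0, 1)]
    print(counts)  # each fold keeps the 2:1 class ratio -> [4, 2]
```

With simple k-fold, by contrast, a fold can end up with a distorted benign/malignant ratio, which is one source of the instability the authors report.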

WBCD breast cancer database classification applying artificial

metaplasticity neural network, Author: A. Marcano-Cedeño, J. Quintanilla-

Domínguez and D. Andina [26]

Abstract:

The correct diagnosis of breast cancer is one of the major problems in the medical

field. From the literature it has been found that different pattern recognition

techniques can improve diagnosis in this domain. These techniques can help

doctors form a second opinion and make a better diagnosis. In this paper we present a

novel improvement in neural network training for pattern classification. The

proposed training algorithm is inspired by the biological metaplasticity property of

neurons and Shannon's information theory. During the training phase the Artificial

metaplasticity Multilayer Perceptron (AMMLP) algorithm gives priority to updating

the weights for the less frequent activations over the more frequent ones. In this way

metaplasticity is modeled artificially. AMMLP achieves a more efficient training,

while maintaining MLP performance. To test the proposed algorithm we used the

Wisconsin Breast Cancer Database (WBCD). AMMLP performance is tested using

classification accuracy, sensitivity and specificity analysis, and confusion matrix.

The obtained AMMLP classification accuracy of 99.26% is a very promising result

compared to the Backpropagation Algorithm (BPA) and recent classification

techniques applied to the same database.

Comments:

In this study, an artificial neural network for breast cancer classification based on

the biological metaplasticity property was presented. The proposed AMMLP

algorithm was compared with the classic MLP with Backpropagation, applied to the

Wisconsin Breast Cancer Database. The AMMLP classifier shows great

performance, obtaining the following average results over 100 networks: 97.89%

specificity, 100% sensitivity and a total classification accuracy of 99.26%. The

ROC curve shows the superiority of AMMLP over the classic MLP with

Backpropagation. Finally, the AUC was 0.989 for AMMLP and 0.928 for BP, which

indicates once more the superiority of AMMLP over BP in this particular case.

From the above results, we conclude that the AMMLP obtains very promising results

in classifying possible breast cancer. We believe that the proposed system can be

very helpful to physicians as a second opinion for their final decision. By

using such an efficient tool, they can make very accurate decisions. Our AMMLP,

proved to be equal or superior to the state-of-the-art algorithms applied to the

Wisconsin Breast Cancer Database, and shows that it can be an interesting

alternative.

Classification of breast cancer by comparing back propagation training

algorithms Author: F. Paulin and A. Santhakumaran [27]

Abstract:

Breast cancer diagnosis has been approached by various machine learning techniques

for many years. This paper presents a study on classification of Breast cancer using

Feed Forward Artificial Neural Networks. Back propagation algorithm is used to

train this network. The performance of the network is evaluated using Wisconsin

breast cancer data set for various training algorithms. The highest accuracy of

99.28% is achieved when using the Levenberg-Marquardt algorithm.

Comments:

The Back-propagation algorithm and supervised training method are used in this

project. The aim of training is to adjust the weights until the error measured between

the desired output and the actual output is reduced. The training stops when this

reaches a sufficiently low value. To analyze the data neural network tool box which

is available in MATLAB software is used. In this research a feed forward neural

network is constructed and the Back propagation algorithm is used to train the

network. The proposed algorithm is tested on a real life problem, the Wisconsin

Breast Cancer Diagnosis problem. In this paper six training algorithms are used,

among these six methods, the Levenberg-Marquardt method gave the best result of

99.28%. Preprocessing using min-max normalization is used in this diagnosis.

Further work is needed to increase the accuracy of classification of breast cancer

diagnosis.
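Min-max normalization, the preprocessing step used in this paper, linearly rescales each feature into a fixed range; a minimal sketch (the sample values are invented for illustration):

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    # linearly rescale one feature column into [new_min, new_max]
    lo, hi = min(column), max(column)
    return [new_min + (x - lo) * (new_max - new_min) / (hi - lo)
            for x in column]

raw = [1, 4, 10]                 # e.g. raw cell-size scores
print(min_max_normalize(raw))    # [0.0, 0.3333333333333333, 1.0]
```

Scaling all features to a common range prevents attributes with large numeric ranges from dominating the weighted sums during training.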


Radial Basis Function Neural Network (RBFNN)

RBFNN is trained to perform a mapping from an m-dimensional input space to an n-

dimensional output space. An RBFNN consists of the m-dimensional input x being

passed directly to a hidden layer. Suppose there are c neurons in the hidden layer.

Each of the c neurons in the hidden layer applies an activation function, which is a

function of the Euclidean distance between the input and an m-dimensional prototype

vector. Each hidden neuron contains its own prototype vector as a parameter. The

output of each hidden neuron is then weighted and passed to the output layer. The

outputs of the network consist of sums of the weighted hidden layer neurons [28].

The transformation from the input space to the hidden-unit space is nonlinear where

as the transformation from the hidden-unit space to the output space is linear [29].

The performance of an RBFNN depends on the number and location (in the input

space) of the centers, the shape of the radial basis functions at the hidden neurons, and

the method used for determining the network weights. Some researchers have trained

RBFNNs by selecting the centers randomly from the training data [30].
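A minimal sketch of this forward computation, with Gaussian hidden units and centers drawn at random from the training data as in [30]; the layer sizes and the spread value are illustrative assumptions:

```python
import numpy as np

def rbf_forward(X, centers, sigma, W):
    # hidden layer: Gaussian of the Euclidean distance between each
    # input and each prototype (center) vector -- a nonlinear mapping
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * sigma ** 2))   # (n_samples, n_centers)
    # output layer: linear weighted sum of the hidden responses
    return H @ W                           # (n_samples, n_outputs)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 9))         # e.g. nine-feature inputs
centers = X_train[rng.choice(20, size=5, replace=False)]  # random centers
W = rng.normal(size=(5, 1))                # hidden-to-output weights
y = rbf_forward(X_train, centers, sigma=1.0, W=W)
print(y.shape)  # (20, 1)
```

Note the two-stage structure described above: the input-to-hidden transformation is nonlinear, while the hidden-to-output transformation is linear.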

Following research paper describes the application of RBFNN in breast cancer

prediction-

Breast Cancer Detection using Recursive Least Square and Modified Radial

Basis Functional Neural Network, Author: M. R. Senapati, P. K. Routray, P. K.

Dask [31]

Abstract:

A new approach for classification has been presented in this paper. The proposed

technique, Modified Radial Basis Functional Neural Network (MRBFNN) consists of

assigning weights between the input layer and the hidden layer of Radial Basis

functional Neural Network (RBFNN). The centers of MRBFNN are initialized using

Particle Swarm Optimization (PSO), the variance and centers are updated using back

propagation, and both sets of weights are updated using Recursive Least Square

(RLS). Our simulation is carried out on the Wisconsin Breast Cancer (WBC) data

set. The results are compared with RBFNN, where the variance and centers are

updated using back propagation and weights are updated using Recursive Least


Square (RLS) and Kalman Filter. It is found that the proposed method provides more

accurate results and better classification.

Comments:

The Modified Radial Basis Functional Neural Network is the same as the RBFNN, with the

exception that weights are assigned between neurons in the input layer and the

neurons in the hidden layer. An efficient Pattern Recognition and rule extraction

technique using Recursive Least square approximation and Modified Radial Basis

Functional Neural Networks (MRBFNN) is presented in this paper. The weights

between input layer and the hidden layer as well as hidden layer and output layer of

the RBFNN classifier can be trained using the linear recursive least square (RLS)

algorithm. The RLS has a much faster rate of convergence compared to gradient

search and least mean square (LMS) algorithms.

Probabilistic Neural Networks (PNN):

PNN is a kind of RBFNN suitable for classification problems. It has three layers. The

network contains an input layer, which has as many elements as there are separable

parameters needed to describe the objects to be classified. It has a pattern layer,

which organizes the training set such that an individual processing element represents

each input vector. And finally, the network contains an output layer, called the

summation layer, which has as many processing elements as there are classes to be

recognized [32]. For detection of breast cancer the output layer should have two neurons

(one for the benign class and another for the malignant class). Each element in this layer

combines via processing elements within the pattern layer which relate to the same

class and prepares that category for output [32].

Fig. 2.2 Probabilistic neural network for breast cancer diagnosis

The PNN used in [33] has a multilayer structure consisting of a single RBF hidden layer

of locally tuned units which are fully interconnected to an output layer (competitive


layer) of two units, as shown in Fig. 2.2. In this system, the real-valued input vector is

the feature vector, and the two outputs are the indices of the two classes. All hidden units

simultaneously receive the eight-dimensional real-valued input vector. The input

vector to the network is passed to the hidden layer nodes via unit connection weights.

The hidden layer consists of a set of radial basis functions. Associated with the jth

hidden unit is a parameter vector C_j, called a center. The hidden layer node

calculates the Euclidean distance between the center and the network input vector

and then passes the result to the radial basis function. All the radial basis functions

are of Gaussian type. The equations used in the neural network model are as

follows:

X_j = ‖f − C_j‖ · b^ih                                        (2.1)

φ(X) = exp(−X^2)                                              (2.2)

b^ih = 0.833/s                                                (2.3)

S_i = Σ_{j=1}^{h} W_ji^ho · φ(X_j)                            (2.4)

Y_i = 1, if S_i ≥ max{S_1, S_2}; 0, otherwise                 (2.5)

where i = 1, 2, j = 1, 2, . . . , h, Y_i is the ith output (classification index), f is the

eight-dimensional real-valued input vector, W_ji^ho is the weight between the jth

hidden node and the ith output node, C_j is the center vector of the jth hidden

node, s is a real constant known as the spread factor, b^ih is the biasing term of the radial

basis layer, and φ(·) is the nonlinear RBF (Gaussian). PNN provides a general

solution to pattern classification problems by following an approach developed in

statistics, called Bayesian classifiers [34][35]. PNN combines the Bayes decision

strategy with the Parzen non-parametric estimator of the probability density functions


of different classes [36]. Following research papers present the application of PNN in

breast cancer diagnosis and prognosis-

The Wisconsin Breast Cancer Problem: Diagnosis and DFS time prognosis

using probabilistic and generalised regression neural classifiers Author: Ioannis

Anagnostopoulos, Christos Anagnostopoulos, Angelos Rouskas, George

Kormentzas and Dimitrios Vergados [37].

Abstract:

This paper deals with the breast cancer diagnosis and prognosis problem employing

two proposed neural network architectures over the Wisconsin Diagnostic and

Prognostic Breast Cancer (WDBC/WPBC) datasets. A probabilistic approach is

dedicated to solve the diagnosis problem, detecting malignancy among instances

derived from the Fine Needle Aspirate (FNA) test, while the second architecture

estimates the time interval that possibly contains the right end-point of the patient's

Disease-Free Survival (DFS) time. The accuracy of the neural classifiers reaches

nearly 98% for the diagnosis and 92% for the prognosis problem. Furthermore, the

prognostic recurrence predictions were further evaluated using survival analysis

through the Kaplan-Meier approximation method and compared with other

techniques from the literature.

Comments:

In this paper PNN is used to solve the diagnosis problem because this kind of

network presents high generalization ability and does not require a large amount of

training data. PNN is used to detect malignancy among instances derived from the

Fine Needle Aspirate (FNA) test. The accuracy of the neural classifiers reaches

nearly 98%.
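The hidden-layer and competitive-layer computation of Eqs. 2.1-2.5 can be sketched directly; the number of hidden nodes and the random centers and weights below are illustrative assumptions, not parameters from the cited system:

```python
import numpy as np

def pnn_classify(f, centers, W, s):
    b_ih = 0.833 / s                                # Eq. 2.3: bias term
    X = np.linalg.norm(f - centers, axis=1) * b_ih  # Eq. 2.1: scaled distances
    phi = np.exp(-X ** 2)                           # Eq. 2.2: Gaussian RBF
    S = W @ phi                                     # Eq. 2.4: class sums
    return (S >= S.max()).astype(int)               # Eq. 2.5: winner takes all

rng = np.random.default_rng(2)
centers = rng.normal(size=(6, 8))  # h = 6 hidden nodes, 8-dim features
W = rng.uniform(size=(2, 6))       # weights to the two class outputs
f = rng.normal(size=8)             # one feature vector
Y = pnn_classify(f, centers, W, s=1.0)
print(Y.sum())  # 1: exactly one class wins
```

The two summation units compete, so the output vector Y contains a single 1 marking the winning class (benign or malignant).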

Generalized Regression Neural Networks (GRNN):

GRNN is the paradigm of RBFNN, often used for function approximations [38].

GRNN consists of four layers: The first layer is responsible for reception of

information, the input neurons present the data to the second layer (pattern neurons),

the output of the pattern neurons are forwarded to the third layer (summation

neurons), summation neurons are sent to the fourth layer (output neuron)[39]. If f(x)


is the probability density function of the vector random variable x and its scalar

random variable z, then the GRNN calculates the conditional mean E(z|x) of the

output vector. The joint probability density function f(x, z) is required to compute the

above conditional mean. GRNN approximates the probability density function from

the training vectors using Parzen windows estimation [40]. GRNNs do not require

iterative training; the hidden- to-output weights are just the target values tk, so the

output y(x), is simply a weighted average of the target values tk of training cases xk

close to the given input case x. It can be viewed as a normalized RBF network in

which there is a hidden unit centered at every training case. These RBF units are

called kernels and are usually probability density functions such as the Gaussians.

The only weights that need to be learned are the widths of the RBF units h. These

widths (often a single width is used) are called smoothing parameters or bandwidths

and are usually chosen by cross validation [38]. Following research paper gives

breast cancer diagnosis and prognosis results by GRNN-

The Wisconsin Breast Cancer Problem: Diagnosis and DFS time prognosis

using probabilistic and generalised regression neural classifiers, Author: Ioannis

Anagnostopoulos and Christos Anagnostopoulos, Angelos Rouskas, George

Kormentzas, and Dimitrios Vergados [37].

Abstract:

This paper deals with the breast cancer diagnosis and prognosis

employing two proposed neural network architectures over the Wisconsin Diagnostic

and Prognostic Breast Cancer (WDBC/WPBC) datasets. A probabilistic approach is

dedicated to solve the diagnosis problem, detecting malignancy among instances

derived from the Fine Needle Aspirate (FNA) test, while the second architecture

estimates the time interval that possibly contains the right end-point of the patient's

Disease-Free Survival (DFS) time. The accuracy of the neural classifiers reaches

nearly 98% for the diagnosis and 92% for the prognosis problem. Furthermore, the

prognostic recurrence predictions were further evaluated using survival analysis

through the Kaplan-Meier approximation method and compared with other

techniques from the literature.

Comments:


Generalised Regression Neural Network architecture (GRNNs) is used for

breast cancer prognosis in this paper. These neural networks have the special ability

to deal with sparse and non-stationary data where non-linear relationships exist

among inputs and outputs. In the problem addressed, the network calculates a time

interval that corresponds to a possible right end-point of the patient's disease-free

survival time. Thus, if f(x,z) is the probability density function of the vector random

variable x and its scalar random variable z, then the GRNN calculates the conditional

mean E(z|x) of the output vector. The joint probability density function f(x,z) is

required to compute the above conditional mean. GRNN approximates the pdf from

the training vectors using Parzen windows estimation, which is a non-parametric

technique approximating a function by constructing it out of many simple parametric

probability density functions. Parzen windows are considered as Gaussian functions

with a constant diagonal covariance matrix. The accuracy of the neural classifiers

reaches 92% for prognosis problem.
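The estimate described above, a Parzen-window (Gaussian-kernel) weighted average of the training targets with one kernel centered at every training case, reduces to a few lines; the one-dimensional toy data and the bandwidth h are illustrative assumptions:

```python
import numpy as np

def grnn_predict(x, X_train, t_train, h):
    # one Gaussian kernel ("Parzen window") centered at every training
    # case; the bandwidth h is the only parameter to be chosen
    d2 = ((X_train - x) ** 2).sum(axis=1)
    k = np.exp(-d2 / (2.0 * h ** 2))
    # output: kernel-weighted average of the training targets t_k
    return (k * t_train).sum() / k.sum()

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
t_train = np.array([0.0, 1.0, 4.0, 9.0])   # toy target values
print(round(grnn_predict(np.array([1.0]), X_train, t_train, h=0.3), 3))
# 1.008: dominated by the nearest training case
```

No iterative training is needed; in practice only h would be tuned, typically by cross validation as noted above.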

Fuzzy- Neuro System:

A fuzzy-neuro system uses a learning procedure to find a set of fuzzy membership

functions which can be expressed in the form of if-then rules [41]-[43]. A fuzzy

inference system uses fuzzy logic, rather than Boolean logic, to reason about data

[44]. Its basic structure includes four main components- a fuzzifier, which translates

crisp (real-valued) inputs into fuzzy values; an inference engine that applies a fuzzy

reasoning mechanism to obtain a fuzzy output; a defuzzifier, which translates this

latter output into a crisp value; and a knowledge base, which contains both an

ensemble of fuzzy rules, known as the rule base, and an ensemble of membership

functions, known as the database. The decision-making process is performed by the

inference engine using the rules contained in the rule base [45]. The fuzzy logic

procedure can be summarized in following steps: Determination of the input and

output variables that describe the observed phenomenon together with the selection

of their variation interval, defining a set of linguistic values together with their

associated membership functions that map/cover the numerical range of the fuzzy

variable, and defining a set of fuzzy inference rules between input and output fuzzy


variables [46]. The following research papers use a fuzzy logic approach for breast cancer

diagnosis-

A fuzzy-genetic approach to breast cancer diagnosis, Author: Carlos

Andres Pena-Reyes, Moshe Sipper [47].

Abstract:

The automatic diagnosis of breast cancer is an important, real-world medical

problem. In this paper we focus on the Wisconsin breast cancer diagnosis (WBCD)

problem, combining two methodologies, fuzzy systems and evolutionary

algorithms, so as to automatically produce diagnostic systems. We find that our

fuzzy-genetic approach produces systems exhibiting two prime characteristics: first,

they attain high classification performance (the best shown to date), with the

possibility of attributing a confidence measure to the output diagnosis; second, the

resulting systems involve a few simple rules, and are therefore (human-)

interpretable.

Comments:

A good computerized diagnostic tool should possess two characteristics, which are

often in conflict. First, the tool must attain the highest possible performance, i.e.

diagnose the presented cases correctly as being either benign or malignant.

Moreover, it would be highly desirable to be in possession of a so-called degree of

confidence: the system not only provides a binary diagnosis (benign or malignant),

but also outputs a numeric value that represents the degree to which the system is

confident about its response. Second, it would be highly beneficial for such a

diagnostic system to be human-friendly, exhibiting so-called interpretability. This

means that the physician is not faced with a black box that simply spouts answers

(albeit correct) with no explanation; rather, we would like for the system to provide

some insight as to how it derives its outputs. In this paper we combine two

methodologiesfuzzy systems and evolutionary algorithmsso as to automatically

produce systems for breast cancer diagnosis. The major advantage of fuzzy systems

is that they favour interpretability, however, finding good fuzzy systems can be quite

an arduous task. This is where evolutionary algorithms step in, enabling the

automatic production of fuzzy systems, based on a database of training cases.
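The fuzzifier, rule base and defuzzifier components described earlier can be sketched for a single input variable; the "cell size" feature, the membership-function breakpoints and the two rules are invented purely for illustration and are not the evolved rules of [47]:

```python
def tri(x, a, b, c):
    # triangular membership function over [a, c], peaking at b
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def diagnose(cell_size):
    # fuzzifier: crisp feature -> degrees of "small" / "large"
    # (the breakpoints below are made-up illustrative values)
    small = tri(cell_size, 0.0, 1.0, 6.0)
    large = tri(cell_size, 4.0, 10.0, 16.0)
    # rule base: IF size is small THEN benign;
    #            IF size is large THEN malignant
    benign, malignant = small, large
    # defuzzifier: weighted average -> crisp confidence in [0, 1]
    if benign + malignant == 0.0:
        return 0.5
    return malignant / (benign + malignant)

print(diagnose(2.0) < 0.5, diagnose(9.0) > 0.5)  # True True
```

The crisp output doubles as the degree of confidence discussed above, and the two if-then rules remain human-readable, which is the interpretability argument made for fuzzy systems.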


Cancer Diagnosis Using Modified Fuzzy Network, Author: Essam Al-

Daoud [48]

Abstract:

In this study, a modified fuzzy c-means (MFCM) radial basis function (RBF)

network is proposed. The main purposes of the suggested model are to diagnose the

cancer diseases by using fuzzy rules with relatively small number of linguistic labels,

reduce the similarity of the membership functions and preserve the meaning of the

linguistic labels. The modified model is implemented and compared with adaptive

neuro-fuzzy inference system (ANFIS). The both models are applied on "Wisconsin

Breast Cancer" data set. Three rules are needed to obtain the classification rate 97%

by using the modified model (3 out of 114 is classified wrongly). On the contrary,

more rules are needed to get the same accuracy by using ANFIS. Moreover, the

results indicate that the new model is more accurate than the state-of-the-art prediction

methods. The suggested neuro-fuzzy inference system can be re-applied to many

applications such as data approximation, human behavior representation, forecasting

urban water demand and identifying DNA splice sites.

Comments:

ANFIS works with different activation functions and uses un-weighted connections

in each layer. ANFIS consists of five layers and can be adapted by a supervised

learning algorithm. In this paper ANFIS and the modified Fuzzy RBF (MFRBF) are

applied on Wisconsin Breast Cancer data set. The main purposes of the suggested

model are to diagnose the cancer diseases by using fuzzy rules with relatively small

number of linguistic labels, reduce the similarity of the membership functions and

preserve the meaning of the linguistic labels. The standard fuzzy c-means has various

well-known problems, namely the number of clusters must be specified in

advance, the output membership functions have high similarity, and FCM is an

unsupervised method and cannot preserve the meaning of the linguistic labels. On the

contrary, the grid partition method solves some of the previous matters, but it has a

very high number of output clusters. The basic idea of the suggested MFCM

algorithm is to combine the advantages of the two methods, such that, if more than

one cluster's center exists in one partition then the centers are merged and the

membership values are calculated again, but if there is no cluster's center in a

partition then it is deleted and the other clusters are redefined. The experimental

results show that MFRBF can be used to get high accuracy with fewer and

unambiguous rules. The classification rate is 97% by using only three rules. On the

contrary, more rules are needed to get the same accuracy by using ANFIS.

Moreover, the features' projected partition in ANFIS is ambiguous and can't

preserve the meaning of the linguistic labels.

Genetic Algorithm (GA):

The standard GA proceeds as follows: an initial population of individuals is

generated at random or heuristically. At every evolutionary step, known as a generation,

the individuals in the current population are decoded and evaluated according to

some predefined quality criterion. To form a new population (the next generation),

individuals are selected according to their fitness. Many selection procedures are

currently in use, one of the simplest being fitness-proportionate selection, where

individuals are selected with a probability proportional to their relative fitness. This

ensures that the expected number of times an individual is chosen is approximately

proportional to its relative performance in the population. Thus, high-fitness or good

individuals stand a better chance of reproducing, while low-fitness ones are more

likely to disappear [45]. Genetic algorithms can be used to determine the

interconnecting weights of the ANN. During training of the network, the BP requires

approximately two ANN evaluations (i.e., one forward propagation and one

backward error propagation) for each iteration, while the GA requires only one ANN

evaluation (i.e., forward propagation) for each generation and each chromosome. In

comparison to the conventional BP training algorithm, the GA has been shown to provide

some benefit in evolving the inter-connecting weights for the ANNs. In [49], although

the GA-trained ANN did not outperform the BP-trained ANN at all numbers of ANN

evaluations in the test set, the GA-trained ANN was found to converge faster than the

BP-trained ANN in the training set.
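The loop described above, fitness-proportionate selection over chromosomes that encode the inter-connecting weights, with a single forward evaluation per chromosome per generation, can be sketched as follows; the one-neuron network, toy data, population size and mutation scale are all illustrative assumptions:

```python
import random

def forward(w, x):
    # one ANN evaluation per chromosome: a single forward pass only
    # (the GA needs no backward error propagation)
    return 1.0 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0.0

def fitness(w, data):
    # fraction of training cases the chromosome classifies correctly
    return sum(forward(w, x) == y for x, y in data) / len(data)

def evolve(data, n_weights=3, pop=20, gens=30, seed=0):
    rnd = random.Random(seed)
    population = [[rnd.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop)]
    for _ in range(gens):
        scores = [fitness(w, data) + 1e-9 for w in population]
        # fitness-proportionate selection: fitter chromosomes are
        # chosen to reproduce with proportionally higher probability
        parents = rnd.choices(population, weights=scores, k=pop)
        # reproduction with small Gaussian mutation of each weight
        population = [[wi + rnd.gauss(0, 0.1) for wi in w]
                      for w in parents]
    return max(population, key=lambda w: fitness(w, data))

# toy separable data: label 1 iff x0 + x1 > 1 (third input is a bias of 1)
data = [((x0, x1, 1.0), 1.0 if x0 + x1 > 1 else 0.0)
        for x0 in (0.0, 0.5, 1.0) for x1 in (0.0, 0.5, 1.0)]
best = evolve(data)
print(fitness(best, data))  # accuracy of the best evolved chromosome
```

Each generation here costs one forward pass per chromosome, which is the evaluation-count contrast with backpropagation drawn in the paragraph above.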

Computer-aided diagnosis of breast cancer using artificial neural networks:

Comparison of Back propagation and Genetic Algorithms Author: Yuan-Hsiang

Chang, Bin Zheng, Xiao-Hui Wang, and Walter F. Good [49].

Abstract:


The authors investigated computer-aided diagnosis (CAD) schemes to determine the

probability for the presence of breast cancer using artificial neural networks (ANN)

that were trained by a Backpropagation (BP) algorithm or by a Genetic Algorithm

(GA). A clinical database of 418 previously verified patient cases was employed and

randomly partitioned into two independent sets for CAD training and testing. During

training, the BP and the GA were independently applied to optimize, or to evolve, the

inter-connecting weights of the ANN. Both the BP-trained and the GA-trained CAD

performances were then compared using receiver-operating characteristics (ROC)

analysis. In the training set, the BP-trained and the GA-trained CAD schemes yielded

the areas under ROC curves of 0.91 and 0.93, respectively. In the testing set, both the

BP-trained and the GA-trained ANN yielded areas under ROC curves of

approximately 0.83. These results demonstrated that the GA performed slightly

better, although not significantly, than BP for the training of the CAD schemes.

Comments:

In this paper it is found that although the GA-trained ANN did not outperform the BP-

trained ANN at all numbers of ANN evaluations in the test set, the GA-trained ANN

was found to converge faster than the BP-trained ANN in the training set.

2.3 Comparison of neural network techniques for breast cancer diagnosis and

prognosis

NN techniques for breast cancer diagnosis are compared for WBC data. It is

concluded that MLP, RBFNN, PNN, GRNN, GA, fuzzy-neuro systems, SANE,

IGANIFS, the Xcyct system, ANFIS and SIANN may be used for the classification problem.

Almost all intelligent computational learning algorithms use supervised learning. The

accuracy of different methods is compared in table 2.1.

Table 2.1 Accuracy comparison for test data classification

Type of Network                                            Accuracy    References
Radial Basis Function Neural Network (RBFNN)               96.18%      [18]
Probabilistic Neural Network (PNN)                         97.0%       [18]
Multilayer Perceptron (MLP)                                95.74%      [18]
Generalized Regression Neural Network (GRNN)               98.8%       [18]
Symbiotic Adaptive Neuro-Evolution (SANE)                  98.7%       [50]
Information Gain and Adaptive Neuro-Fuzzy
Inference System (IGANIFS)                                 98.24%      [51]
Xcyct system using leave-one-out method                    90 to 91%   [52]
Adaptive Neuro-Fuzzy Inference System (ANFIS)              59.90%      [53]
Fuzzy                                                      96.71%      [54]
Shunting Inhibitory Artificial Neural Networks (SIANN)     100%        [55]


Chapter 4

Matlab

4.1 Introduction

MATLAB is a powerful computing system for handling the calculations involved in

scientific and engineering problems. The name MATLAB stands for MATrix

LABoratory, because the system was designed to make matrix computations

particularly easy [87]. MATLAB program and script files always have filenames ending

with ".m". Script files contain a sequence of usual MATLAB commands that are

executed (in order) once the script is called within MATLAB. In MATLAB almost

every data object is assumed to be an array. A good source of information related to

MATLAB, the creator company The MathWorks Inc. and their other products

is their Web Page at www.mathworks.com [88]. There are two essential requirements

for successful MATLAB programming [87]-

a) We need to learn the exact rules for writing MATLAB statements.

b) We need to develop a logical plan of attack for solving particular problems.

The MATLAB program implements the MATLAB programming language, and

provides a very extensive library of predefined functions to make technical

programming tasks easier and more efficient.


4.2 Advantages of MATLAB [89]

MATLAB has many advantages compared to conventional computer languages for

technical problem solving. Among them are-

1. Ease of use:

Because MATLAB is an interpreted language, like Basic, it is very easy to use. Programs may

be easily written and modified with the built-in integrated development environment

and debugged with the MATLAB debugger. Because the language is so easy to use,

it is ideal for the rapid prototyping of new programs. Many program development

tools are provided to make the program easy to use. They include an integrated

editor/debugger, on-line documentation and manuals, a workspace browser, and

extensive demos.

2. Platform Independence:

MATLAB programs written on any platform will run on all of the other platforms,

and data files written on any platform may be read transparently on any other

platform. As a result, programs written in MATLAB can migrate to new platforms

when the needs of the user change.

3. Predefined functions:

MATLAB has an extensive library of predefined functions that provide tested and pre-packaged solutions to many basic technical tasks. There are many special purpose toolboxes available to solve complex problems in specific areas. Toolboxes are libraries of MATLAB functions used to customize MATLAB for solving a particular class of problems. Toolboxes are the result of work by some of the world's top researchers in specialized fields. They are equivalent to pre-packaged off-the-shelf software for a particular class of problems. They are collections of special files called M-files that extend the functionality of the base program. Such files are called M-files because they must have the filename extension ".m". This extension is required in order for these files to be interpreted by MATLAB. Each toolbox is purchased separately. If an evaluation license is requested, the MathWorks sales department requires detailed information about the project for which MATLAB is to be evaluated. Overall the process of acquiring a license is expensive in terms of money and time. If granted (which it often is), the evaluation license is valid for two to four weeks. The various toolboxes are-

a. Control Systems

b. Signal Processing

c. Communications

d. System Identification

e. Robust Control

f. Simulink

g. Image Processing

h. Neural Networks

i. Fuzzy Logic

j. Analysis

k. Optimization

l. Spline

m. Symbolic

n. User Interface Utilities

4. Device-independent plotting:

MATLAB has many integral plotting and imaging commands. The plots and images

can be displayed on any graphical output device supported by the computer on which

MATLAB is running. This capability makes MATLAB an outstanding tool for

visualizing technical data.

5. Graphical User Interface:

MATLAB include tools that allow a programmer to interactively construct a

graphical user interface (GUI) for his/her own program. With this capability, the

programmer can design sophisticated data-analysis programs that can be operated by

relatively inexperienced users.

6. MATLAB Compiler:


MATLAB code is interpreted rather than compiled, but a separate compiler is available. This compiler can compile a MATLAB program into true executable code that runs faster than the interpreted code. It is a great way to convert a prototype MATLAB program into an executable suitable for sale and distribution to users.

MATLAB is an efficient tool to develop applications based on neural networks. Therefore it is used in the proposed work for breast cancer diagnosis and prognosis using a polynomial neural network.

4.3 Limitations of MATLAB [89]

Following are some limitations of using MATLAB-

1. It is an interpreted language and therefore can execute more slowly than compiled languages. This problem can be mitigated by properly structuring the MATLAB program and by using the MATLAB compiler to compile the final MATLAB program before distribution and general use.

2. A full copy of MATLAB is five to ten times more expensive than a conventional C or FORTRAN compiler. There is also an inexpensive student edition of MATLAB, which is a great tool for students. The student edition of MATLAB is essentially identical to the full edition.

4.4 Neural Network Toolbox [90]

The Neural Network Toolbox is equivalent to pre-packaged off-the-shelf software for the neural network class of problems. The Neural Network Toolbox software uses the network object to store all of the information that defines a neural network. There are four different levels at which the Neural Network Toolbox software can be used-

1. The first level is represented by the GUIs that are described in Getting

Started with Neural Network Toolbox. These provide a quick way to access the

power of the toolbox for many problems of function fitting, pattern recognition,

clustering and time series analysis.


2. The second level of toolbox use is through basic command-line operations.

The command-line functions use simple argument lists with intelligent default

settings for function parameters. (You can override all of the default settings, for

increased functionality.) This topic, and the ones that follow, concentrate on

command-line operations. The GUIs described in Getting Started can automatically

generate MATLAB code files with the command-line implementation of the GUI

operations. This provides a nice introduction to the use of the command-line

functionality.

3. A third level of toolbox use is customization of the toolbox. This advanced

capability allows you to create your own custom neural networks, while still having

access to the full functionality of the toolbox.

4. The fourth level of toolbox usage is the ability to modify any of the M-files

contained in the toolbox. Every computational component is written in MATLAB

code and is fully accessible.

4.5 Neural Network Design using Neural Network Toolbox [90]

The multilayer feed forward neural network is the workhorse of the Neural Network

Toolbox software. It can be used for both function fitting and pattern recognition

problems. With the addition of a tapped delay line, it can also be used for prediction

problems. The work flow for the neural network design process has seven primary

steps:

1. Collecting the data

2. Creating the network

3. Configuring the network

4. Initializing the weights and biases

5. Training the network

6. Validating the network

7. Using the network


The first step might happen outside the framework of Neural Network Toolbox

software, but this step is critical to the success of the design process.

4.5.1 Collecting the data

We need to collect and prepare sample data that cover the range of inputs for which

the network will be used. After the data have been collected, there are two steps that

need to be performed before the data are used to train the network: the data need to

be pre-processed, and they need to be divided into subsets.

4.5.1.1 Pre-processing and post-processing the data

The most common pre-processing routines are provided automatically when we

create a network, and they become part of the network object, so that whenever the

network is used, the data coming into the network is pre-processed in the same way.

It is easiest to think of the neural network as having a pre-processing block that

appears between the input and the first layer of the network and a post-processing

block that appears between the last layer of the network and the output, as shown in

the fig. 4.1.


Fig 4.1 Pre-processing and post-processing

Most of the network creation functions in the toolbox, including the multilayer

network creation functions, such as feedforwardnet, automatically assign processing

functions to network inputs and outputs. These functions transform the input and

target values you provide into values that are better suited for network training. Some

common pre-processing and post-processing functions are shown in table 4.1.

Table 4.1 Pre-processing and post-processing functions

Function Algorithm

mapminmax Normalize inputs/targets to fall in the range [-1, 1]

processpca Extract principal components from the input vector

fixunknowns Process unknown inputs

Generally, the normalization step is applied to both the input vectors and the target

vectors in the data set. In this way, the network output always falls into a normalized

range. The network output can then be reverse transformed back into the units of the

original target data when the network is put to use in the field.
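This normalize-train-reverse workflow can be sketched as follows (the function names are from the toolbox; the data values here are made up):

```matlab
% Normalize inputs and targets to [-1, 1], then recover original units.
p = [1 5 9; 2 4 8];            % made-up input vectors (one column per sample)
t = [10 20 30];                % made-up target values
[pn, ps] = mapminmax(p);       % normalize inputs; ps stores the settings
[tn, ts] = mapminmax(t);       % normalize targets the same way
% ... the network would be trained on pn and tn ...
an = tn;                       % stand-in for the network's normalized output
a  = mapminmax('reverse', an, ts);  % transform the output back to original units
```

Because the settings structures ps and ts are stored, new field data are always transformed exactly as the training data were.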

4.5.1.2 Representing Unknown or Don't Care Targets

Unknown or don't care targets can be represented with NaN values. All the performance functions of the toolbox will ignore those targets for purposes of calculating performance and derivatives of performance.

4.5.1.3 Dividing the Data

When training multilayer networks, the general practice is to first divide the data into three subsets- training, validation and testing. The function dividerand is the default function that divides the data randomly into the three subsets.

4.5.2 Creating and configuring the network

Basic components of a neural network are created and stored in the network object. As an example, the dataset file contains a predefined set of input and target vectors. We load the dataset using the load command. Loading the dataset file creates two variables: the input matrix and the target matrix. The function feedforwardnet creates a multilayer feedforward network. The resulting network can then be configured with the configure command.

4.5.3 Initializing weights and biases

The configure command automatically initializes the weights, but we might want to reinitialize them. This is done with the init command. This function takes a network object as input and returns a network object with all weights and biases initialized.
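Putting the creation, configuration and initialization steps together, a minimal sketch looks like this (the file dataset.mat and the variable names inputs and targets are assumed, not part of the toolbox):

```matlab
load dataset.mat                        % assumed file; creates inputs and targets
net = feedforwardnet([2 2]);            % two hidden layers with 2 neurons each
net = configure(net, inputs, targets);  % set input/output sizes from the data
net = init(net);                        % reinitialize all weights and biases
```

The hidden-layer sizes [2 2] match the PNN architectures used in chapter 5, but any sizes can be passed to feedforwardnet.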

4.5.4 Train the network

Once the network weights and biases are initialized, the network is ready for training. The multilayer feed forward network can be trained for function approximation (nonlinear regression) or pattern recognition. The training process requires a set of examples of proper network behaviour: network inputs p and target outputs t. The process of training a neural network involves tuning the values of the weights and biases of the network to optimize network performance, as defined by the network performance function net.performFcn. The default performance function for feed forward networks is mean square error (mse). For training multilayer feedforward networks, any standard numerical optimization algorithm, such as gradient descent, can be used to optimize the performance function. The gradient descent algorithm updates the network weights and biases in the direction in which the performance function decreases most rapidly, the negative of the gradient. The training function traingd is used for the gradient descent algorithm. The gradient is calculated using a technique called the back propagation algorithm, which involves performing computations backward through the network. Properly trained multilayer networks tend to give reasonable answers when presented with inputs that they have never seen. This property is called generalization. The default generalization feature for the multilayer feed forward network is early stopping. Data are automatically divided into training, validation and test sets. The error on the validation set is monitored during training, and the training is stopped when the validation error increases over net.trainParam.max_fail iterations.
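This training step can be sketched as follows, continuing the configured net, inputs and targets from above (the parameter values are only illustrative):

```matlab
net.trainFcn = 'traingd';          % gradient descent training function
net.performFcn = 'mse';            % mean square error performance function
net.divideFcn = 'dividerand';      % random train/validation/test split
net.trainParam.epochs = 500;       % illustrative maximum number of epochs
net.trainParam.max_fail = 6;       % early stopping: allowed validation failures
[net, tr] = train(net, inputs, targets);  % tr is the training record
```

The training record tr returned as the second output is the starting point for the validation checks in section 4.5.5.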

4.5.5 Validation of the network

When the training is complete, we check the network performance and determine if any changes need to be made to the training process, the network architecture or the data sets. The first thing to do is to check the training record, tr, which was the second argument returned from the training function. For example, tr.trainInd, tr.valInd and tr.testInd contain the indices of the data points that were used in the training, validation and test sets, respectively. If we want to retrain the network using the same division of data, we can set net.divideFcn to 'divideind', net.divideParam.trainInd to tr.trainInd, net.divideParam.valInd to tr.valInd, and net.divideParam.testInd to tr.testInd. We can use the training record to plot the performance progress by using the plotperf command. The next step in validating the network is to create a regression plot, which shows the relationship between the outputs of the network and the targets. If the training were perfect, the network outputs and the targets would be exactly equal, but the relationship is rarely perfect in practice. If the network is not sufficiently accurate, we can try initializing the network and training again. Each time we initialize a feed forward network, the network parameters are different and might produce different solutions.
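These validation checks can be sketched with the training record tr returned by train (outputs here is simply the simulated network response):

```matlab
plotperf(tr)                       % plot training/validation/test performance
outputs = net(inputs);             % network response on the full input set
plotregression(targets, outputs)   % regression plot of outputs vs. targets

% Reuse the same data division when retraining:
net.divideFcn = 'divideind';
net.divideParam.trainInd = tr.trainInd;
net.divideParam.valInd   = tr.valInd;
net.divideParam.testInd  = tr.testInd;
```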

4.5.6 Use the network

After the network is trained and validated, the network object can be used to

calculate the network response to any input.


Chapter 5

Simulation and Results

5.1 Introduction

For simulation three different datasets, namely the Wisconsin Breast Cancer original (WBC) dataset, the Wisconsin Diagnosis Breast Cancer (WDBC) dataset and the Wisconsin Prognosis Breast Cancer (WPBC) dataset, are downloaded from the UCI Machine Learning Repository website [91] and saved as text files. A brief description of the Wisconsin datasets is given in table 5.1. A detailed description of the datasets is provided in the next section.

Table 5.1 A brief description of Breast Cancer datasets

Dataset name No of attributes No of instances No. of classes

Wisconsin Breast Cancer (WBC) 11 699 2

Wisconsin Diagnosis Breast Cancer (WDBC) 32 569 2

Wisconsin Prognosis Breast Cancer (WPBC) 34 198 2


After downloading we have three separate files, one for each dataset. These files are then imported into Excel spreadsheets and the values are saved with the corresponding attributes as column headers. The ID of the patient cases does not contribute to the classifier performance; hence it is removed, and the outcome attribute defines the target or dependent variable. We preprocessed the data using principal component analysis described in chapter 3 [34]. After pre-processing, the WBC data is applied to the PNN described in chapter 3 [29-31], which classifies the data into two sets. The overall classification involves training and testing as shown in fig. 5.1. Implementation is done with the help of MATLAB 7.0 using the neural network toolbox described in chapter 4 [40-41].

Fig. 5.1 Flow chart of ANN process

5.2 Description of dataset

Detailed description of the three datasets used in the proposed research is as follows

[83]-

Wisconsin Breast Cancer (WBC) Dataset :

This database has 699 instances and 10 attributes including the class attribute. Attributes 1 through 9 are used to represent instances. Each instance has one of two possible classes: benign or malignant. According to the class distribution, 458 instances (65.5%) are benign and 241 instances (34.5%) are malignant. Table 5.2 provides the attribute information.

Table 5.2 Attribute information of WBC dataset

S.no Attribute Domain

1 Clump thickness 1-10

2 Uniformity of cell size 1-10


3 Uniformity of cell shape 1-10

4 Marginal adhesion 1-10

5 Single epithelial cell size 1-10

6 Bare nuclei 1-10

7 Bland chromatin 1-10

8 Normal nucleoli 1-10

9 Mitosis 1-10

Class 2 for benign, 4 for malignant

In the clump thickness, benign cells tend to be grouped in monolayers, while cancerous cells are often grouped in multilayers. In the uniformity of cell size/shape, cancer cells tend to vary in size and shape, which is why these parameters are valuable in determining whether the cells are cancerous or not. In the case of marginal adhesion, normal cells tend to stick together, whereas cancer cells tend to lose this ability, so loss of adhesion is a sign of malignancy. The single epithelial cell size is related to the uniformity mentioned above; epithelial cells that are significantly enlarged may be malignant cells. Bare nuclei is a term used for nuclei that are not surrounded by cytoplasm (the rest of the cell); those are typically seen in benign tumors. Bland chromatin describes a uniform texture of the nucleus seen in benign cells; in cancer cells the chromatin tends to be coarser. Normal nucleoli are small structures seen in the nucleus. In normal cells the nucleolus is usually very small if visible; in cancer cells the nucleoli become more prominent, and sometimes there are more of them. Finally, mitosis is nuclear division plus cytokinesis, producing two identical daughter cells; it is the process in which the cell divides and replicates. Pathologists can determine the grade of cancer by counting the number of mitoses.

Wisconsin Diagnosis Breast Cancer (WDBC) Dataset :

This database has 569 instances and 32 attributes including the class attribute. Attribute 2 is the class attribute; the other attributes are used to represent instances. Each instance has one of two possible classes: benign or malignant. According to the class distribution, 357 instances are benign and 212 are malignant. Table 5.3 provides the attribute information of the WDBC dataset.

Table 5.3 Attribute information of WDBC dataset

Attribute name Significance Attribute ID

ID Unique ID of patient 1

Outcome Diagnosis (B- Benign / M- Malignant) 2

Radius 1,2,3 Mean of distances from centre to points on the perimeter 3, 13, 23

Texture 1,2,3 Standard deviation of gray scale values 4, 14, 24

Perimeter 1,2,3 Perimeter of the cell nucleus 5, 15, 25

Area 1,2,3 Area of the cell nucleus 6, 16, 26

Smoothness 1,2,3 Local variation in radius lengths 7, 17, 27

Compactness 1,2,3 Perimeter^2 / area - 1.0 8, 18, 28

Concavity 1,2,3 Severity of concave portions of the contour 9, 19, 29

Concave points 1,2,3 Number of concave portions of the contour 10, 20, 30

Symmetry 1,2,3 Symmetry of the cell nuclei 11, 21, 31

Fractal dimension 1,2,3 "Coastline approximation" - 1 12, 22, 32

The details of the attributes found in the WDBC dataset are: ID number, diagnosis (M = malignant, B = benign), and ten real-valued features computed for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension [92]. These features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. The radius of an individual nucleus is measured by averaging the length of the radial line segments defined by the centroid of the snake and the individual snake points. The total distance between consecutive snake points constitutes the nuclear perimeter. The area is measured by counting the number of pixels on the interior of the snake and adding one-half of the pixels on the perimeter. The perimeter and area are combined to give a measure of the compactness of the cell nuclei using the formula perimeter^2/area. Smoothness is quantified by measuring the difference between the length of a radial line and the mean length of the lines surrounding it. This is similar to the curvature energy computation in the snakes. Concavity is captured by measuring the size of the indentations (concavities) in the boundary of the cell nucleus. Chords between non-adjacent snake points are drawn, and the extent to which the actual boundary of the nucleus lies on the inside of each chord is measured. The concave points feature is similar to concavity, but counts only the number of boundary points lying on the concave regions of the boundary. In order to measure symmetry, the major axis, or longest chord through the center, is found. Then the length difference between lines perpendicular to the major axis to the nuclear boundary in both directions is measured. The fractal dimension of a nuclear boundary is approximated using the coastline approximation described by Mandelbrot. The perimeter of the nucleus is measured using increasingly larger rulers. As the ruler size increases, decreasing the precision of the measurement, the observed perimeter decreases. Plotting the log of the observed perimeter against the log of the ruler size and measuring the downward slope gives (the negative of) an approximation to the fractal dimension. With all the shape features, a higher value corresponds to a less regular contour and thus to a higher probability of malignancy. The texture of the cell nucleus is measured by finding the variance of the gray scale intensities in the component pixels.

Wisconsin Prognosis Breast Cancer (WPBC) Dataset:

This database has 198 instances and 35 attributes including the class attribute. Attribute 2 is the class attribute; the other attributes are used to represent instances. Each instance has one of two possible classes: R (recur) or N (non-recur). According to the class distribution, 151 instances belong to the non-recur class and 47 instances belong to the recur class. Table 5.4 provides the attribute information of the WPBC dataset.

Table 5.4 Attribute information of WPBC dataset

Attribute name Significance Attribute ID

ID Unique ID of patient 1

Outcome Nature of the case (R- Recurrent / N- Non-recurrent) 2

Time TTR (Time to recur) / DFS (Disease-free survival) 3

Radius 1,2,3 Mean of distances from centre to points on the perimeter 4, 14, 24

Texture 1,2,3 Standard deviation of gray scale values 5, 15, 25

Perimeter 1,2,3 Perimeter of the cell nucleus 6, 16, 26

Area 1,2,3 Area of the cell nucleus 7, 17, 27

Smoothness 1,2,3 Local variation in radius lengths 8, 18, 28

Compactness 1,2,3 Perimeter^2 / area - 1.0 9, 19, 29

Concavity 1,2,3 Severity of concave portions of the contour 10, 20, 30

Concave points 1,2,3 Number of concave portions of the contour 11, 21, 31

Symmetry 1,2,3 Symmetry of the cell nuclei 12, 22, 32

Fractal dimension 1,2,3 "Coastline approximation" - 1 13, 23, 33

Tumour size Size of the tumour 34

Lymph node status Status of the lymph nodes 35

The details of the attributes found in the WPBC dataset are: ID number, outcome (R = recur, N = non-recur), and time (R => recurrence time, N => disease-free time); from attribute 4 to 33, ten real-valued features are computed for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension. These features are computed for each cell nucleus from a digitized image of a fine needle aspirate (FNA) of a breast mass. The mean, standard error, and/or largest (worst case: mean of the three largest values) of these features were computed for each image, resulting in 30 features. The thirty-fourth attribute is the tumor size and the thirty-fifth is the lymph node status. As seen above, diagnosis and prognosis share the same features, but prognosis has two additional features, as follows. Tumor size is the diameter of the excised tumor in centimeters. Tumor size is divided into four classes: T-1 is from 0-2 centimeters, T-2 is from 2-5 cm, T-3 is greater than 5 cm, and T-4 is a tumor of any size that has broken through (ulcerated) the skin, or is attached to the chest wall. Lymph node status is the number of positive axillary lymph nodes observed at the time of surgery. The lymph nodes in the armpit (the axillary lymph nodes) are the first place breast cancer is likely to spread, and lymph node status is highly related to prognosis. Lymph node-negative means the lymph nodes do not contain cancer, and lymph node-positive means the lymph nodes contain cancer.

According to the attributes in the WDBC and WPBC datasets, each of these attributes has the following 3 values, stored in 3 columns in the data set.

The mean, calculated as:

Mean = (1/n) Σ_{i=1}^{n} x_i    (5.1)

The standard error, calculated as:

S_e = s / √n    (5.2)

where S_e refers to the standard error, s refers to the standard deviation and n refers to the sample size.

The worst or largest mean (the mean of the three largest values).
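As a sketch, these three summary values can be computed in MATLAB for a vector of per-nucleus measurements (the data values here are made up):

```matlab
% Mean, standard error and "worst" value for one feature across nuclei.
x = [10.2 11.5 9.8 14.1 13.3 12.7];   % made-up radius measurements
n = numel(x);
m  = sum(x) / n;                      % mean, equation (5.1)
se = std(x) / sqrt(n);                % standard error, equation (5.2)
xs = sort(x, 'descend');
worst = mean(xs(1:3));                % "worst": mean of the three largest values
```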

5.3 Results and Discussion

5.3.1 Diagnosis Results

Diagnosis results divide the whole data into two sets: malignant (cancerous) and benign (non-cancerous). The WBC and WDBC databases are used for training and testing the neural network. The results obtained are shown below-

Result using WBC dataset:

PCA data preprocessing is used on this dataset to deal with missing data. The PNN architecture has 10 input nodes, 2 nodes in hidden layer 1, 2 nodes in hidden layer 2 and 1 output node. The MSE performance for normalized WBC data and WBC data after applying PCA is compared in table 5.5. We observed that the MSE is substantially reduced for PCA-processed data even when 400 instances are used for training. The testing error is compared in table 5.6.

Table 5.5 Training performance for WBC dataset

Number of Training Patterns MSE for Normalization MSE for PCA

100 0.0050 0.0011

200 0.0025 6.6408e-04

300 0.0017 4.1121e-04

400 0.0013 3.0468e-04

Table 5.6 Testing performance for WBC dataset

Number of Testing Patterns MSE for Normalization MSE for PCA

100 0.0076 3.3319e-04

200 9.7617e-05 4.1995e-04

300 3.6278e-04 1.5611e-04

400 1.4996e-04 3.0486e-04

500 1.6873e-04 9.5419e-05

600 1.0297e-04 3.8735e-05

699 1.0078e-05 2.2117e-05

Results using WDBC dataset:

WDBC data do not have missing values, so there is no need to preprocess the data. The PNN architecture has 31 input nodes, 2 nodes in hidden layer 1, 2 nodes in hidden layer 2 and 1 output node. The MSE performance for normalized WDBC data is given in table 5.7. The testing error is compared in table 5.8.

Table 5.7 Training performance for WDBC dataset

Number of Training Patterns MSE for Normalization


100 0.0050

200 0.0026

300 0.0017

400 0.0013

Table 5.8 Testing performance for WDBC dataset

Number of Testing Patterns MSE for Normalization

100 0.0340

200 0.0016

300 7.6359e-04

400 5.7702e-04

500 1.9828e-04

569 8.8298e-05

5.3.2 Prognosis Results

Prognosis results divide the whole data into two sets: recurred and non-recurred. The WPBC dataset is used for prognosis prediction. PCA data preprocessing is used on this dataset to deal with missing data. The MSE for normalized and PCA-preprocessed WPBC data are compared in tables 5.9 and 5.10. The corresponding graphs are given in fig. 5.3 (a) and 5.3 (b). We observed that the MSE is substantially reduced for PCA-processed data.

Table 5.9 Training performance for WPBC dataset

Number of Training Patterns MSE for Normalization MSE for PCA

100 0.0050 3.5910e-08

198 0.0025 2.1737e-08

Table 5.10 Testing performance for WPBC dataset

Number of Testing Patterns MSE for Normalization MSE for PCA


100 0.0022 6.9075e-08

198 0.0019 4.6077e-11

Fig 5.2 Comparison of the convergence performance for WPBC dataset

(50 iterations)

Fig 5.3 (a) Testing error for normalization and PCA data for WPBC dataset over 100

data

Fig 5. 3(b) Testing error for normalization and PCA for WPBC dataset over 198 data.

Chapter 3

Artificial Neural Network

and

Principal Component Analysis

3.1 Overview of Artificial Neural Network


Neural networks are an emergent technology with an increasing number of real-world applications [56]. Neural networks are a form of artificial intelligence that have found application in a wide range of problems [57]-[59] and have given, in many cases, superior results to standard statistical models [60]. Artificial Neural Networks perform various tasks such as pattern matching and classification, function optimization and data clustering. These tasks are very difficult for traditional computers, which are faster at algorithmic computational tasks and precise arithmetic operations [61]. Originally inspired by biological models of mammalian brains, ANNs have emerged as a powerful technique for data analysis [62]. A neural network is able to solve highly complex problems due to the nonlinear processing capabilities of its neurons. In addition, the inherent modularity of the neural network structure makes it adaptable to a wide range of applications [63]. Following are the main characteristics of ANNs [64]-

NNs exhibit mapping capabilities; that is, they can map input patterns to their associated output patterns.

NNs learn by examples. Thus NN architectures can be trained with known examples of a problem before they are tested for their inference capability on unknown instances of the problem. They can, therefore, identify objects on which they have not previously been trained.

NNs possess the capability to generalize. Thus they can predict new outcomes from past trends.

NNs are robust systems and are fault tolerant. They can, therefore, recall full patterns from incomplete, partial or noisy patterns.

NNs can process information in parallel, at high speed, and in a distributed manner.

3.2 Basic concepts of ANN

The terminology of ANNs has developed from a biological model of the brain [64]. There are three aspects involved in the construction of a neural network [63]-

Structure: the architecture and topology of the neural network.

Encoding: the method of changing weights (training).

Recall: the method and capacity to retrieve information.

A NN consists of a set of connected cells: the neurons. A neuron or unit processes the inputs of the NN to create an output [64]. The network consists of a number of input units and one or more output units, together with internal units. The outputs of the network correspond to the variables we require to predict; the inputs, to the variables on which we base the prediction. Adjustable weights are associated with the interconnections between the units [65]. Fig 3.1 [64] shows the structure of a single neuron. An artificial neuron performs the following: it receives signals from other neurons, multiplies each signal by the corresponding connection strength (that is, weight), sums up the weighted signals, passes them through an activation function and feeds the output to other neurons [66].

Fig 3.1 A single neuron

3.3 Feed Forward Neural Network with Back propagation

Various neural network models exist; among them is the feed forward neural network. The feed forward neural network model, besides being popular and simple, is easy to implement and appropriate for classification applications [63]. The feed forward backpropagation network does not have feedback connections, but the errors are back propagated during training. Fig. 3.2 shows the feed forward NN for breast cancer diagnosis. The network consists of an input layer, one or more hidden layers and an output layer. It takes the predictive attributes as input and produces as output the class attribute (benign or malignant).

Fig 3.2: Feed Forward NN model for Breast Cancer diagnosis

The neurons present in the hidden and output layers have biases, which are connections from units whose activation is always 1. The bias terms also act as weights. Backpropagation learning consists of two passes through the different layers


of the network: a forward pass and a backward pass. In the forward pass, an input vector is applied to the sensory nodes of the network and its effect propagates through the network layer by layer. Finally, a set of outputs is produced as the actual response of the network. During the forward pass the synaptic weights of the network are all fixed. During the backward pass, the synaptic weights are all adjusted in accordance with an error correction rule. The actual response of the network is subtracted from a desired (target) response to produce an error signal. This error signal is then backpropagated through the network, against the direction of synaptic connections [67]. Propagation of errors is done beginning at the output layer, through the hidden layer, and so on, to the input layer, in the backward direction. The weights are therefore updated at each layer, beginning at the output layer. The changes in weights are proportional to the derivative of the errors with respect to incoming weights [68]. For a given set of training input-output pairs, a BP learning algorithm provides a procedure for changing the weights in a BPNN to classify the given input patterns correctly. The error is the difference between the actual (calculated) and desired (target) output [69]. The input and output of neuron i (except for the input layer) according to the BP algorithm [70] are formulated in equations 3.1 and 3.2:

Input: X_i = Σ_j W_ij O_j + b_i    (3.1)

Output: O_i = f(X_i)    (3.2)

where W_ij is the weight of the connection from neuron j to neuron i, b_i is a numerical value called the bias and f is the activation function. The sum in (3.1) is over all neurons, j, in the previous layer. The output function is a nonlinear function, which allows a network to solve problems that a linear network cannot [71]. The training algorithm and various parameters used for training the BPNN are as follows [61]-

A. Various Parameters:

Input training vector x = (x_1, …, x_i, …, x_n)

Output target vector t = (t_1, …, t_k, …, t_m)

δ_k = error at output unit y_k

δ_j = error at hidden unit z_j

α = learning rate

v_oj = bias on hidden unit j

z_j = hidden unit j

w_ok = bias on output unit k

y_k = output unit k

B. Training Algorithm:

Step 1: Initialize weights to small random values.

Step 2: While the stopping condition is false, do Steps 3-10.

Step 3: For each training pair, do Steps 4-9.

Step 4: Each input unit receives the input signal x_i and transmits it to all units in the hidden layer.

Step 5: Each hidden unit (z_j, j = 1 … p) sums its weighted input signals z_inj = v_oj + Σ_{i=1…n} x_i v_ij, applies the activation function z_j = f(z_inj), and sends this signal to all units in the output layer.

Step 6: Each output unit (y_k, k = 1 … m) sums its weighted input signals y_ink = w_ok + Σ_{j=1…p} z_j w_jk and applies its activation function to calculate the output signal y_k = f(y_ink).

Step 7: Each output unit (y_k, k = 1 … m) receives a target pattern corresponding to an input pattern. The error information term is calculated as δ_k = (t_k − y_k) f′(y_ink).

Step 8: Each hidden unit (z_j, j = 1 … p) sums its delta inputs from the units in the layer above, δ_inj = Σ_{k=1…m} δ_k w_jk. The error information term is calculated as δ_j = δ_inj f′(z_inj).

Step 9: Each output unit (y_k, k = 1 … m) updates its bias and weights (j = 0 … p): the weight correction term is Δw_jk = α δ_k z_j and the bias correction term is Δw_ok = α δ_k. Therefore, w_jk(new) = w_jk(old) + Δw_jk and w_ok(new) = w_ok(old) + Δw_ok. Each hidden unit (z_j, j = 1 … p) updates its bias and weights (i = 0 … n): the weight correction term is Δv_ij = α δ_j x_i and the bias correction term is Δv_oj = α δ_j. Therefore, v_ij(new) = v_ij(old) + Δv_ij and v_oj(new) = v_oj(old) + Δv_oj.

Step 10: Test the stopping condition.
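The steps above can be sketched as a short Python program. This is a minimal illustration only: the network sizes, the logistic activation, the learning rate, and the toy XOR training pairs are assumptions for demonstration, not the configuration used in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                       # logistic activation function (assumed choice)
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(x):                 # derivative of the logistic function
    s = f(x)
    return s * (1.0 - s)

def predict(X):
    return f(w0 + f(v0 + X @ v) @ w)

# Step 1: initialize weights and biases to small random values
n, p, m = 2, 4, 1               # input, hidden, output units (assumed sizes)
v = rng.normal(0, 0.5, (n, p)); v0 = np.zeros(p)    # input -> hidden
w = rng.normal(0, 0.5, (p, m)); w0 = np.zeros(m)    # hidden -> output
alpha = 0.5                     # learning rate

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)   # toy pairs (XOR)
T = np.array([[0], [1], [1], [0]], float)

mse0 = float(np.mean((predict(X) - T) ** 2))   # error before training

for epoch in range(5000):       # Step 2: repeat until stopping condition
    for x, t in zip(X, T):      # Step 3: for each training pair (Steps 4-9)
        z_in = v0 + x @ v       # Step 5: hidden-layer net input
        z = f(z_in)
        y_in = w0 + z @ w       # Step 6: output-layer net input
        y = f(y_in)
        d_k = (t - y) * f_prime(y_in)      # Step 7: delta_k at the output
        d_j = (d_k @ w.T) * f_prime(z_in)  # Step 8: delta_j at the hidden layer
        w += alpha * np.outer(z, d_k)      # Step 9: weight and bias corrections
        w0 += alpha * d_k
        v += alpha * np.outer(x, d_j)
        v0 += alpha * d_j

mse = float(np.mean((predict(X) - T) ** 2))    # Step 10: check the error
print(mse0, mse)
```

With these settings the mean square error typically falls well below its initial value, which is the kind of stopping criterion Step 10 tests.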

Steps 1 to 3 initialize the weights, steps 4-6 are called the feed-forward steps, steps 7-8 are called the BP steps, step 9 updates the weights and biases, and finally step 10 tests the stopping condition, which may be minimization of the error, a maximum number of epochs, etc. The backpropagation algorithm can be improved by considering momentum and a variable learning rate. Momentum allows a network to respond not only to the local

variable learning rate. Momentum allows a network to respond not only to the local

gradient, but also to the recent trends in error surface. Acting like a low pass filter,

momentum allows the network to ignore small features in the error surface. Without momentum, a network may get stuck in a shallow local minimum. In

backpropagation with momentum, the weight change is in a direction that is a

combination of the current and previous gradients. This is a modification of gradient

descent whose advantages arise chiefly when some training data are very different

from the majority of the data. Convergence is sometimes faster if a momentum term

is added to the weight update formulas. The performance of algorithm is very

sensitive to the proper setting of the learning rate. If the learning rate is set too high, the algorithm may oscillate and become unstable. If the learning rate is too small, the algorithm will take too long to converge. It is not practical to determine the optimal

setting for the learning rate before training and in fact the optimal learning rate

changes during the training process, as the algorithm moves across the performance

surface. The performance of backpropagation can be improved by allowing the

learning rate to change during the training process. An adaptive learning rate will

attempt to keep the learning step size as large as possible while keeping the learning

process stable. The learning rate is made responsive to the complexity of the local

error surface [63].
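The interplay of momentum and an adaptive learning rate can be sketched on a toy problem. Everything in this sketch is an illustrative assumption rather than the thesis's configuration: the quadratic error surface, the momentum factor, and the grow/shrink factors of the adaptive rule.

```python
import numpy as np

def E(w):                        # toy quadratic error surface (an assumption)
    return 0.5 * w @ A @ w

def grad(w):                     # its gradient
    return A @ w

A = np.diag([1.0, 25.0])         # ill-conditioned surface: slow for plain descent
w = np.array([1.0, 1.0])
alpha, mu = 0.02, 0.9            # learning rate and momentum factor (assumed)
dw = np.zeros_like(w)
err = E(w)

for step in range(200):
    # momentum: the weight change blends the current and previous gradients
    dw = -alpha * grad(w) + mu * dw
    w_trial = w + dw
    e_trial = E(w_trial)
    if e_trial < err:            # adaptive rule: error fell -> accept, grow rate
        w, err = w_trial, e_trial
        alpha *= 1.05
    else:                        # error rose -> reject the step, shrink the rate
        alpha *= 0.7
        dw[:] = 0.0
print(err)
```

The acceptance test keeps the learning step as large as possible while the error keeps falling, and backs off whenever a step would destabilize the descent, which is exactly the behavior described above.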

3.4 Higher order or polynomial neural network

Higher order or polynomial neural networks (PNNs) were first introduced by [72]

and further analyzed by [73] who referred to them as 'tensor networks' and regarded


them as a special case of his functional-link models. PNNs use joint activations

between inputs, thus removing the task of establishing relationships between them

during training. PNN is faster to train and execute when compared to other neural

networks [73]. An error back propagation based learning algorithm using a norm-squared error function is described as follows [74]. The aggregation function is considered as a product of linear functions in different dimensions of the space. A bipolar sigmoidal activation function is used at each node. Such a neuron looks complex at first instance, but when used to solve a complicated problem it needs fewer parameters than the existing conventional models. A PNN is a type of feed-forward NN. Fig. 3.3 [75] shows a schematic diagram of a generalized single multiplicative or polynomial neuron; its aggregation operator is a multiplicative operation, as in (3.3) and (3.4). Fig. 3.4 shows the architecture of a PNN.

Figure 3.3 Node Structure of PNN

Fig. 3.4 Polynomial Neural Network

Aggregation u before applying the activation function is given by

u = Π_{i=1…n} (w_i x_i + b_i)                       (3.3)

f(u) = (1 − e^(−u)) / (1 + e^(−u))                  (3.4)

The output y at the node is given by

y = f(u)                                            (3.5)

The mean square error is given by

E = (1/N) Σ_{p=1…N} (y_p^d − y_p)²                  (3.6)

where N is the number of input patterns. The weight update equation for the split complex back propagation algorithm is given by

w_i(n+1) = w_i(n) − η ∂E/∂w_i                       (3.7)


where η is the learning rate and y^d is the desired signal. The bias is updated as

b_i(n+1) = b_i(n) − η ∂E/∂b_i                       (3.8)

with the gradients of the product aggregation

∂E/∂w_i = −(1/N) Σ_p (y_p^d − y_p) f′(u) (u / (w_i x_i + b_i)) x_i    (3.9)

∂E/∂b_i = −(1/N) Σ_p (y_p^d − y_p) f′(u) (u / (w_i x_i + b_i))        (3.10)

The weights are updated after the entire training sequence has been presented to the

network once. This is called learning by epoch. The algorithm is extended to train

PNN.
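As a hedged illustration of the single multiplicative (polynomial) neuron, the sketch below implements product aggregation, the bipolar sigmoid, and gradient-descent learning by epoch. The two-dimensional toy data, learning rate, and epoch count are assumptions, and the real-valued update is shown rather than the split complex variant.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(u):                        # bipolar sigmoidal activation
    return (1.0 - np.exp(-u)) / (1.0 + np.exp(-u))

def f_prime(u):                  # derivative of the bipolar sigmoid
    return 0.5 * (1.0 - f(u) ** 2)

def forward(x, w, b):
    terms = w * x + b            # a linear function in each input dimension
    return np.prod(terms), terms # multiplicative aggregation u

def mse(w, b):
    return float(np.mean([(d - f(forward(x, w, b)[0])) ** 2
                          for x, d in zip(X, D)]))

# toy two-class data (assumed, not from the thesis datasets)
X = np.vstack([rng.normal(1.0, 0.3, (20, 2)), rng.normal(-1.0, 0.3, (20, 2))])
D = np.array([1.0] * 20 + [-1.0] * 20)       # bipolar targets

w = rng.normal(0, 0.5, 2)
b = rng.normal(0, 0.5, 2)
eta = 0.05                                   # learning rate (assumed)

mse0 = mse(w, b)
for epoch in range(200):                     # learning by epoch: batch update
    gw = np.zeros_like(w); gb = np.zeros_like(b)
    for x, d in zip(X, D):
        u, terms = forward(x, w, b)
        e = d - f(u)                         # error against the desired signal
        # du/dw_i = x_i * product of the other linear terms
        others = np.array([np.prod(np.delete(terms, i))
                           for i in range(len(terms))])
        gw += e * f_prime(u) * others * x
        gb += e * f_prime(u) * others
    w += eta * gw / len(X)                   # update once per epoch
    b += eta * gb / len(X)

print(mse0, mse(w, b))
```

Accumulating the gradient over the full training sequence and applying it once is the "learning by epoch" scheme described above.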

3.5 Advantages of neural network

Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. Advantages of neural networks include their high tolerance to noisy data as well as their ability to classify patterns on which they have not been trained. Other advantages of working with ANNs include [61]:

Adaptive Learning: An ANN is endowed with the ability to learn how to do

tasks based on the data given for training or initial experience.

Self-Organization: An ANN can create its own organization or representation

of the information it receives during learning time.

Real time operation: ANN computation may be carried out in parallel. Special

hardware devices are being designed and manufactured to take advantage of this

capability of ANNs.

Fault tolerance: Partial destruction of a neural network leads to a corresponding degradation of performance. However, some network capabilities may be retained even after major network damage.

3.6 Medical applications

Baxt [76] demonstrated the predictive reliability of an artificial neural network

model in medical diagnosis. In this case, we utilise the ability of neural networks to

recognise complex and highly non-linear relationships, such as are likely to


characterise medical circumstances. Owing to their wide range of applicability and their ability to learn complex and nonlinear relationships, including noisy or less precise information, neural networks are very well suited to solving problems in biomedical engineering. By their nature, neural networks are capable of high-speed

parallel signal processing in real time. They have an advantage over conventional

technologies because they can solve problems that are too complex: problems that do

not have an algorithmic solution or for which an algorithmic solution is too complex.

Neural Networks are trained by examples instead of rules and are automated. This is

one of the major advantages of neural networks over traditional expert systems [77]-

[78]. When NNs are used in medical diagnosis, they are not affected by factors such as

human fatigue, emotional states and habituation. They are capable of rapid

identification, analysis of conditions, and diagnosis in real time. With the spread of neural networks to almost all fields of science and engineering, they have found extensive application in the biomedical engineering field as well. The applications of neural

networks in biomedical computing are numerous. Various applications of ANN

techniques in medical field are [79]-

Cardiology

Oncology

Neurology

Radiology

Pathology

Genetics

Clinical chemistry

Biochemistry etc.

3.7 Overview of data preprocessing

Pre-processing is the process of transforming data into a simpler, more effective form in accordance with user needs [80]. If a training data set contains irrelevant attributes, classification analysis may produce less accurate results. Missing data pose difficulties in the analysis and decision-making processes, so missing values are replaced before the data are applied to the NN model. Without this pre-processing, training


the neural network would be very slow. Data preprocessing is required to

improve the predictive accuracy. It can be used to scale the data in the same range of

values for each input feature in order to minimize bias within the neural network for

one feature to another. Data pre-processing speeds up training time by starting the

training process for each feature within the same scale. It is especially useful for

modeling applications where the inputs are generally on widely different scales.

Therefore, neural networks learn faster and give better performance if the input

variables are pre-processed before being used to train the network. Preprocessing for

neural network involves feature selection and feature extraction.
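A common way to put every input feature on the same scale before training is min-max normalization. The sketch below is one illustrative choice: the [0, 1] target range and the sample values are assumptions, not taken from the thesis.

```python
import numpy as np

def min_max_scale(X, lo=0.0, hi=1.0):
    """Rescale each feature (column) of X to the range [lo, hi]."""
    X = np.asarray(X, dtype=float)
    mn, mx = X.min(axis=0), X.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)   # guard against constant columns
    return lo + (X - mn) / span * (hi - lo)

# two features on widely different scales (hypothetical values)
X = np.array([[1.0, 150.0],
              [5.0, 900.0],
              [10.0, 400.0]])
Xs = min_max_scale(X)
print(Xs)
```

After scaling, both columns span [0, 1], so neither feature dominates the weight updates merely because of its units.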

3.7.1 Feature selection

Feature selection is the process of finding a subset of the original variables, with the

aim of reducing dimensionality and eliminating noisy dimensions. The main idea of feature selection

is to choose a subset of input variables by eliminating features with little or no

predictive information. Feature selection can significantly improve the

comprehensibility of the resulting classifier models and often build a model that

generalizes better to unseen points [81].

3.7.2 Feature extraction

Feature extraction is a technique to transform high-dimensional data into lower

dimensions. When the input data to an algorithm is too large to be processed and it is

suspected to be notoriously redundant (much data, but not much information) then

the input data will be transformed into a reduced representation set of features (also called a feature vector). If the extracted features are carefully chosen, it is expected

that the features set will extract the relevant information from the input data in order

to perform the desired task using this reduced representation instead of the full size

input. By reducing the dimensionality of the input set, correlated information is

eliminated at the cost of a loss of accuracy. Dimensionality reduction can be

achieved by either eliminating data closely related with other data in the set, or

combining data to make a smaller set of features. The identification of a reduced set

of features that are predictive of outcomes can be very useful from a knowledge

discovery perspective. For many learning algorithms, the training and/or classification time increases directly with the number of features and is efficiently reduced by dimension reduction methods. Noisy or irrelevant features can have the same influence on classification as predictive features, so they impact negatively on accuracy; by using dimension reduction methods, such noisy or irrelevant features can also be removed [82]. Two popular methods for feature extraction are linear

discriminant analysis (LDA) and principal component analysis (PCA). LDA is a supervised learning algorithm, while PCA is an unsupervised learning algorithm, both used for data preprocessing [80]. LDA is a widely used technique

for pattern classification. It seeks the linear projection of the data to a low

dimensional subspace where the data features can be modeled with maximal

discriminative power. The main computation involved in LDA is the dot product between the LDA basis vectors and the data, which requires costly element-wise floating-point multiplications [86].

3.8 Principal Component Analysis

PCA uses an orthogonal transformation to convert a set of observations of possibly

correlated variables into a set of values of linearly uncorrelated variables called

principal components. The number of principal components is less than or equal to

the number of original variables [83]. PCA finds linear transformations of data that

retain the maximal amount of variance [84]. PCA can be used as a feature-reduction transformation that combines a set of correlated features [83]. Fig. 3.5 shows data pre-processing using PCA. PCA reduces the dimensions of the data while retaining as much as possible of the variation present in the original dataset. The best low-dimensional space is determined by the best eigenvectors of the covariance matrix of the data. The eigenvectors corresponding to the highest eigenvalues are also called principal components.

Fig 3.5 Data Pre-processing using PCA

3.8.1 Dimensionality reduction

The goal of PCA is to reduce the dimensionality of the data while retaining as much

as possible of the variation present in the original dataset.



PCA allows us to compute a linear transformation that maps data from a high

dimensional space to a lower dimensional space.

y = Tx, where T is the K × N transformation matrix

T = [ T_11  T_12  …  T_1N
      T_21  T_22  …  T_2N
       ⋮
      T_K1  T_K2  …  T_KN ]

3.8.2 Lower dimensionality basis

Approximate the vectors by finding a basis in an appropriate lower-dimensional space.

(1) Higher-dimensional space representation:

x = a_1 v_1 + a_2 v_2 + … + a_N v_N

where v_1, v_2, …, v_N is a basis of the N-dimensional space.

(2) Lower-dimensional space representation:

x = b_1 u_1 + b_2 u_2 + … + b_K u_K

where u_1, u_2, …, u_K is a basis of the K-dimensional space.

3.8.3 Selection of principal components

To choose K, use the following criterion.

(Σ_{i=1…K} λ_i) / (Σ_{i=1…N} λ_i) > Threshold (e.g., 0.9 or 0.95)

3.8.4 Selecting best lower dimensional space

The best low-dimensional space can be determined by the "best" eigenvectors of the

covariance matrix of x (i.e., the eigenvectors corresponding to the "largest"

eigenvalues also called "principal components").

3.8.5 Linear transformation implied

The linear transformation R^N → R^K that performs the dimensionality reduction is:

[b_1, b_2, …, b_K]^T = [u_1^T; u_2^T; …; u_K^T] (x − x̄) = U^T (x − x̄)

where x̄ is the mean of the data.
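The whole PCA pipeline of Sections 3.8.1-3.8.5 can be sketched in a few lines: form the covariance matrix, take its eigenvectors, pick K by the retained-variance criterion, and project. The synthetic correlated data and the 0.95 threshold are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic data (assumed): N = 3 features, the second strongly correlated
# with the first, 200 samples
z = rng.normal(size=(200, 1))
X = np.hstack([z,
               2 * z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

x_bar = X.mean(axis=0)
C = np.cov(X - x_bar, rowvar=False)        # covariance matrix of the data

lam, U = np.linalg.eigh(C)                 # eigh: eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]             # sort so largest eigenvalues come first

# choose K by the retained-variance criterion of Section 3.8.3
ratio = np.cumsum(lam) / np.sum(lam)
K = int(np.searchsorted(ratio, 0.95) + 1)  # smallest K whose ratio exceeds 0.95

Y = (X - x_bar) @ U[:, :K]                 # projection b = U^T (x - x_bar)
print(K, Y.shape)
```

Here the first two principal components absorb nearly all the variance (the first two features are almost one direction), so the criterion selects K = 2 and the 3-dimensional data are reduced to 2 dimensions.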

3.9 Advantages of Principal Component Analysis

Principal components capture most of the variability in the data using fewer dimensions than those in which the data exist. Hence the principal components lie in the same space as the data.

The principal eigenvectors are orthogonal and represent the directions in which the signals have maximum variation. This property speeds up the convergence of model training and improves system performance [85].

The reduced feature space contains only the features that truly contribute to classification, which cuts pre-processing costs [85].


Chapter 6

Conclusion and

Future Work

6.1 Conclusion

The last decade has witnessed major advancements in the methods of diagnosis and prognosis of breast cancer. Soft computing techniques can be used for breast cancer diagnosis and prognosis. The use of ANNs increases the accuracy of most methods and decreases the need for human experts. The neural network based clinical support system proposed in this research provides medical experts with a second opinion, thus reducing the need for biopsy and excision and avoiding unnecessary expenditure. We believe that the results presented here are interesting and will lead to further research on how the technique can be used more efficiently for the diagnosis of other diseases. The performance of a neural network depends on the data on which it is trained; the more data the network is trained on, the more intelligent it becomes. From the diagnosis results, a determination can be made whether a woman has a cancerous tumor or not. Prognosis results will help in taking treatment decisions for women with cancerous tumors.

6.2 Future work

For future work, neural network classification combined with a fuzzy inference system can be applied to the tasks of diagnosis and prognosis, so that results can be given with a percentage of confidence that a woman has a cancerous or non-cancerous breast tumor. Prognosis results can also be given with a confidence measure of whether the cancer has recurred or not. More accurate learning methods may be evaluated. It is believed that a fuzzy system along with a polynomial neural network can be very helpful to physicians in their final decisions for the diagnosis and prognosis of their patients. Physicians can perform very accurately and with confidence by using such an efficient tool. It can assist in the diagnosis and prognosis of breast cancer.

We can also use a genetic algorithm to generate the optimum weights for our network. There are three ways GA can be used with an NN. The first is a threshold hybrid, where the fitness function stops when the standard deviation is less than 0.0001. The second is a basic hybrid of GA and BP, stopping when the iteration count is less than 2000 or the error rate is less than 0.01. The third is an adaptive hybrid of GA and BP, where the square root of the error decreases by 30% until it reaches 0.01. All three methods help keep the NN from becoming trapped in a local minimum. In addition, since GA is a stochastic method, it works well with a BPNN, as the generated weights may be changed several times during the learning process.


List of Publications

International Journal

1. Shweta Saxena, Kavita Burse, "A Survey on Neural Network Techniques for Classification of Breast Cancer Data," Int. J. of Eng. and Adv. Tech. (IJEAT), ISSN 2249-8958, vol. 2, no. 1, pp. 234-237, October 30, 2012.

International Conferences

1. Shweta Saxena, Kavita Burse, "Classification of Wisconsin Breast Cancer (Original) Database and Wisconsin Breast Cancer (Diagnostic) Database using Polynomial Neural Network," in Proc. of Int. Conf. of Elect. and Electron. Eng., December 9, 2012, Bhopal, India.

2. Shweta Saxena, Vishnu Pratap Sing Kirar, Kavita Burse, "A Polynomial Neural Network Model for Prognostic Breast Cancer Prediction," in Proc. of Int. Conf. on Advances in Comput. Sci. and Eng., Jan 2013, Hyderabad, India.


References


1. Cancer Facts and Figures 2010 [online]. Available:

http://www.cancer.org/Research/ CancerFactsFig-ures/cancer-facts-and-figures-

2010.

2. Tawam's 2012 Breast Cancer Awareness Campaign [online]. Available:http://

www.ameinfo.co-m/ tawams-2012-breast-cancer-awareness-campaign-312992.

3. Male Breast Cancer[online]. Available:http://www.nlm.nih.gov/medlineplus

/malebreastcancer.html.

4. G. I. Salama, M. B. Abdelhalim, and M. A. Zeid, Breast Cancer Diagnosis

on Three Different Datasets Using Multi-Classifiers, Int. J. of Comput. and Inform.

Technology, vol. 1, no. 1, ISSN 2277-0764, Sept. 2012.

5. http://www.beliefnet.com/healthandhealing/getcontent.aspx?cid=21322

6. Research Activities January 2012[online]. Available: http://www.ahrq.gov

/research/jan12/0112RA-20.htm.

7. Merck Manual of Diagnosis and Therapy (February 2003). "Breast Disorders:

Breast Cancer". retrieved 2008-02-05.

8. Watson M (2008). "Assessment of suspected cancer". InnovAiT 1 (2): 94-107. doi:10.1093/innovait/inn001.

9. National Cancer Institute (27 June, 2005). "Paget's Disease of the Nipple:

Questions and Answers". Retrieved 2008-02-06.


10. Breast Cancer [online]. Available: http://www.womenshealth.gov/ breast-

cancer/what-is-breast-cancer.

11. O.L. Mangasarian, W. N. Street, W. H. Wolberg, Breast cancer diagnosis and

prognosis via linear programming. Mathematical Programming Technical report,

43(4), pages 94-10, Dec 1994.

12. S. W. Fletcher, W. Black, R. Harris, B. K. Rimer, and S. Shapiro. Report of

the international work shop on screening for breast cancer. Journal of the National

Cancer Institute, 85: 1644-1656,1993.

13. R. W. M. Giard and J. Hermans. The value of aspiration cytologic

examination of the breast. A statistical review of the medical literature. Cancer,

69:2104-2110, 1992.

14. Patrick Pantel, Breast Cancer Diagnosis and Prognosis.

15. P. L. Choong and C. J. S. deSilva, Breast Cancer Prognosis using the EMN

Architecture, 1994 IEEE International Conference on Neural Networks, Orlando,

Florida, 1994.

16. E. T. Lee. Statistical Methods for Survival Data Analysis. John Wiley and

Sons, New York, 1992.

17. M. M. Beg and M. Jain, An Analysis of the methods employed for breast

cancer diagnosis, Int. J. of Research in Comput. Sci., vol. 2, no. 3, 2012, pp. 25-

29.

18. T. Kiyan, T Yildirim , Breast cancer diagnosis using statistical neural

networks, J. of Elect. & Electron. Eng., vol .4-2, 2004, pp. 1149- 1153.

19. G. Schwarzer, W. Vach and M. Schumacher, 2000. On the misuses of

artificial neural networks for prognostic and diagnostic classification in oncology.

Stat. Med., 19: 541-561. PMID: 10694735

20. S. S. Haykin, Neural Networks and Learning Machines. 3rd Edn., Prentice

Hall, New York, 2009, pp. 906.

21. S. A. Mojarad, S. S. Dlay, W. L. Woo and, G. V. Sherbet Cross Validation

Evaluation for Breast Cancer Prediction Using Multilayer Perceptron Neural


Networks, American J. of Engineering and Applied Sciences, vol. 5, no. 1, pp. 42-

51, 2012.

22. F. Paulin and A. Santhakumaran, Classification of breast cancer by

comparing back propagation training algorithms, Int. J. on Comput. Sci. and Eng.

(IJCSE), vol. 3 no. 1, Jan 2011.

23. R. Nithya and B. Santhi, Classification of Normal and Abnormal Patterns in

Digital Mammograms for Diagnosis of Breast Cancer Int. J. of Comput. Applicat.,

vol. 28, no. 6, ISSN 0975-8887, Aug. 2011.

24. J. H. Song, S. S. Venkatesh, E. F. Conant, T. W. Cary, P. H. Arger, C. M.

Sehgal, Artificial neural network to aid differentiation of malignant and benign

breast masses by ultrasound imaging.

25. Shirin A. Mojarad, Satnam S. Dlay, Wai L. Woo and, Gajanan V. Sherbet

Cross Validation Evaluation for Breast Cancer Prediction Using Multilayer

Perceptron Neural Networks American J. of Engineering and Applied Sciences 5 (1)

: 42-51, 2012 ISSN 1941-7020.

26. Marcano-Cedeo , J. Quintanilla-Domnguez, D. Andina WBCD breast

cancer database classification applying artificial metaplasticity neural network,

Expert Systems with Applications, vol. 38, 2011, pp. 9573-9579.

27. F. Paulin and A. Santhakumaran, Classification of breast cancer by

comparing back propagation training algorithms, Int. J. on Comput. Sci. and Eng.

(IJCSE), vol. 3 no. 1, Jan 2011.

28. M. R. Senapati, P. K .Routray, P. K. Dask Breast Cancer Detection using

Recursive Least Square and Modified Radial Basis Functional Neural Network, Int.

Conf. [ICCT-2010], IJCCT vol. 2, no. 2, 3, 4, 3rd-5th December 2010.

29. S. Haykin, Neural Networks: A Comprehensive Foundation, Mac Millan

College Publishing Company, 1994.

30. D. Broomhead and D.Lowe, Multivariable Functional Interpolation and

adaptive networks, Complex Systems, vol. 2, pp. 321-355, 1988.


31. M. R. Senapati, P. K .Routray, P. K. Dask Breast Cancer Detection using

Recursive Least Square and Modified Radial Basis Functional Neural Network, Int.

Conf. [ICCT-2010], IJCCT vol. 2, no. 2, 3, 4, 3rd-5th December 2010.

32. Y. Shan, R. Zhao, G. Xu, H.M. Liebich, Y. Zhang, Application of

probabilistic neural network in the clinical diagnosis of cancers based on clinical

chemistry data, Analytica Chimica Acta, vol. 471, no. 1, pp. 77-86, Oct. 23, 2002.

33. H. Temurtas, N. Yumusak , F. Temurtas , A comparative study on diabetes

disease diagnosis using neural networks, Expert Systems with Applications, vol. 36,

no. 4, pp. 8610-8615, May 2009.

34. G.S. Gill and J. S. Sohal, Battlefield Decision Making: A Neural Network

Approach, J. of Theoretical and Applied Inform. Technology, vol. 4, no. 8, pp. 697-699, 2009.

35. S. N. Sivandam, S. Sumathi, and S. N. Deepa, Introduction to Neural

Networks using Matlab 6.0, TMH Company Ltd., 2006.

36. D. F. Specht, Probabilistic Neural Networks, Neural Networks, vol. 3, no.

1, pp. 109-118, 1990.

37. Ioannis Anagnostopoulos, Christos Anagnostopoulos, Angelos Rouskas,

George Kormentzas, Dimitrios Vergados The Wisconsin Breast Cancer Problem:

Diagnosis and DFS time prognosis using probabilistic and generalised regression

neural classifiers, Draft version of paper to appear at the Oncology Reports, special

issue Computational Analysis and Decision Support Systems in Oncology, last

quarter 2005.

38. C. Lu, J. De Brabanter, S. V. Huffel, I. Vergote, D. Timmerman, Using

artificial neural networks to predict malignancy of ovarian tumors, in Proc. 23rd

Annual Int. Conf. of the IEEE Eng. in Medicine and Biology Society, 2001, vol. 2,

pp. 1637-1640.

39. R. G. Ahangar, M. Yahyazadehfar, and H. Pournaghshband, The

comparison of methods artificial neural network with linear regression using specific

variables for prediction stock price in Tehran stock exchange, Int. J. of Comput. Sci.

and Inform. Security (IJCSIS), vol. 7, no. 2, Feb. 2010.


40. I. Anagnostopoulos, C. Anagnostopoulos, A. Rouskas, G. Kormentzas and

D. Vergados The wisconsin breast cancer problem: diagnosis and DFS time

prognosis using probabilistic and generalised regression neural classifiers Draft

version of paper to appear at the Oncology Reports, special issue Computational

Analysis and Decision Support Systems in Oncology, last quarter 2005.

41. K. Rahul, S. Anupam and T. Ritu, Fuzzy Neuro Systems for Machine

Learning for Large Data Sets, Proc. of the IEEE Int. Adv. Computing Conf. 6-7,

Patiala, India, pp.541-545, 2009.

42. C. Juang, R. Huang and W. Cheng, An interval type-2 fuzzy-neural network

with support-vector regression for noisy regression problems, IEEE Trans. on Fuzzy

Systems, vol. 18, no. 4, pp. 686-699, 2010.

43. C. Juang, Y. Lin and C. Tu, Recurrent self-evolving fuzzy neural network

with local feedbacks and its application to dynamic system processing, Fuzzy Sets

and Systems, vol. 161, no. 19, pp. 2552-2562, 2010.

44. RR Yager, LA. Zadeh, Fuzzy Sets, Neural Networks, and Soft Computing.

New York: Van Nostrand Reinhold, 1994.

45. Carlos Andres Pena-Reyes, Moshe Sipper A fuzzy-genetic approach to

breast cancer diagnosis, Artificial Intelligence in Medicine, vol. 1, no 2, pp. 131-

155, Oct. 1999.

46. I. Dumitrache, Ingineria Reglarii Automate (Automatic Control Engineering),

Bucuresti, Politehnica Press, 2005.

47. Carlos Andres Pena-Reyes, Moshe Sipper A fuzzy-genetic approach to

breast cancer diagnosis, Artificial Intelligence in Medicine, vol. 1, no 2, pp. 131-

155, Oct. 1999.

48. Essam Al-Daoud Cancer Diagnosis Using Modified Fuzzy Network,

Universal Journal of Computer Science and Engineering Technology 1 (2), 73-78,

Nov. 2010. 2010 UniCSE, ISSN: 2219-2158.

49. Yuan-Hsiang Chang, BinZheng, Xiao-Hui Wang, Walter F. Good

Computer-aided diagnosis of breast cancer using artificial neural networks:

Comparison of Backpropagation and Genetic Algorithms, in Int. Joint Conf Neural

Networks UCNN 99, 1999, vol. 5, pp. 3674-3679.


50. R. R. Janghel, A. Shukla, R. Tiwari, and R. Kala, Breast cancer diagnostic

system using symbiotic adaptive neuro-evolution (SANE) in Proc. Int. conf. of Soft

Computing and Pattern Recognition 2010 (SoCPaR-2010), ABV-IIITM, Gwalior,

7th-10th Dec., pp. 326-329.

51. M. Ashraf, L. Kim and H. Xu., Information gain and adaptive neuro-fuzzy

inference system for breast cancer diagnoses, in Proc. Comput. Sci. Convergence

Inform. Tech. 2010 (ICCIT-2010), IEEE, Seoul, 30th Nov.-2nd Dec., pp. 911-915.

52. V. Bevilacqua, G. Mastronardi, F. Menolascina Hybrid data analysis

methods and artificial neural network design in breast cancer diagnosis: IDEST

experience, in Proc. Int. Conf. on Intelligent Agents, Web Technologies and Internet

Commerce and Int. Conf. on Computational Intelligence for Modeling, Control

Automation 2005 (CIMCA-2005), 28th-30th Nov., IEEE, Vienna, pp. 373-378.

53. W. Land, and E. Veheggen, Experiments using an evolutionary programmed

neural network with adaptive boosting for computer aided Diagnosis of breast

cancer, in Proc. IEEE Int. Workshop on Soft Computing in Ind. Application, 2003

(SMCia-2003), Finland, 23rd-25th June, pp. 167-172.

54. P. Meesad and G. Yen., Combined numerical and linguistic knowledge

representation and Its application to medical diagnosis, IEEE transactions on Syst.,

Man and Cybernetics, Part A: Syst. and Humans, vol. 3, no.2, pp. 206-222.

55. G. Arulampalam, and A. Bouzerdoum, Application of shunting inhibitory

artificial neural networks to medical diagnosis, in Proc. 7th Australian and New

Zealand Intelligent Inform. System Conf. 2001, IEEE, Perth, 18th-21st Nov., 2001,

pp. 89-94.

56. A. Ismail Training and optimization of product unit neural networks, M.S.

Thesis, University of Pretoria, Pretoria, 2001

57. Gorman RP, Sejnowski TJ. Analysis of hidden units in a layered network

trained to classify sonar targets. Neural Networks 1988;1:75-89.

58. ONeill M. Training back-propagation neural networks to define and detect

DNA-binding sites. Nucl Acids Res 1991;19:3138.

59. Qian N, Sejnowski TJ. Predicting the secondary structure of globular proteins

using neural network models. J Mol Biol 1988;202:865-84.


60. White H. Learning in artificial neural networks: a statistical approach. Neural

Comput 1989;1:425-64.

61. Principles of Soft Computing, Book by Dr. S.N. Sivanandam, S.N. Deepa,

First Indian Edition, Wiley Publication, 2008.

62. R. Rojas. Neural Networks: a systematic introduction. Springer-Verlag, 1996.

63. Dr. K. U. Rani Parallel Approach for Diagnosis of Breast Cancer using

Neural Network Technique," International Journal of Computer Applications (0975-8887), vol. 10, no. 3, November 2010.

64. G. K. Jha, Artificial neural networks and its applications [online]. Available:

www.iasri.res.in/ebook /EBADAT/5.../5-ANN_GKJHA_2007.pdf

65. Ruth M. Ripley, Neural network models for breast cancer prognosis, Ph. D.

Thesis, Dept. of Eng. and Sci., St. Cross College, Univ. Of Oxford, 1998.

66. F. Paulin and A. Santhakumaran, Classification of breast cancer by

comparing back propagation training algorithms, Int. J. on Comput. Sci. and Eng.

(IJCSE), vol. 3 no. 1, Jan 2011.

67. Simon Haykin. Neural Networks A Comprehensive Foundation. Pearson

Education, 2001.

68. Lucila Ohno-Machado Medical applications of Artificial Neural Networks:

connectionist models of survival, Ph.D. dissertation, submitted to the program in

medical information sciences and the committee on graduate studies, Stanford

University, March 1996.

69. G. I. Salama, M. B. Abdelhalim, and M. A. Zeid , Breast Cancer Diagnosis

on Three Different Datasets Using Multi- Classifiers, Int. J. of Comput. and

Inform. Technology, vol. 1, no. 1, ISSN 2277-0764, Sept. 2012.

70. Y. H. Pao, Adaptive Pattern Recognition and Neural Network, Addison-

Wesley Publishing Company, 1989.

71. P. Heermann and N. Khazenie,Classifcation of multispectral remote

sensing data using a back- propagation neural network, IEEE Trans. on Geoscience

and Remote Sensing, vol. 30, pp. 81-88 , 1992.


72. L. Giles and T. Maxwell, Learning, invariance and generalization in

high-order neural networks, in Applied Optics, vol. 26, no. 23, Optical Society of

America, Washington DC, Pages 4972-4978, 1987.

73. Y. Pao. Adaptive Pattern Recognition and Neural Networks. Addison-

Wesley, USA, 1989. ISBN: 0 201012584-6

74. K. Burse, R. N. Yadav, S. C. Srivastava, V. P. S. Kirar, A Compact Pi

Network for Reducing Bit Error Rate in Dispersive FIR Channel Noise Model, Int.

J. of Elect. and Comput. Eng., vol. 3, no. 3, 2009, psp. 150-153.

75. R.N. Yadav, P.K. Kalra, and J. John, Time series prediction with single

multiplicative neuron model, Applied Soft Computing, vol. 7, pp. 1157-1163, 2007.

76. Baxt WG. Application of neural networks to clinical medicine. Lancet

1995;346:11358.

77. K. Anil Jain, Jianchang Mao and K.M. Mohiuddin. Artificial Neural

Networks: A Tutorial. IEEE Computers, 1996, pp.31-44.

78. George Cybenko. Neural Networks in Computational Science and

Engineering. IEEE Computational Science and Engineering, 1996, pp.36-42.

79. K. Papik et al. (1998), Application of neural networks in medicine- a review

[Online].Available:uran.donetsk.ua/~masters/2006/kita/zbykovsky/library/nninmed.p

df.

80. R. W. Sembiring and J. M. Zain The Design of Pre-Processing

Multidimensional Data Based on Component Analysis, Comput. and Inform. Sci.,

vol. 4, no. 3, pp. 106-115, May 2011.

81. P. Cunningham, Dimension Reduction, August 2007.

82. M. Dash and H. Liu, Dimensionality Reduction, Wiley Encyclopedia of

Computer Science and Engineering, 2008.

83. G. I. Salama, M. B. Abdelhalim, and M. A. Zeid , Breast Cancer Diagnosis

on Three Different Datasets Using Multi- Classifiers, Int. J. of Comput. and

Inform. Technology, vol. 1, no. 1, pp. 2277 0764 , Sept. 2012.

84. A. Ilin and T. Raiko, "Practical approaches to principal component analysis in the presence of missing values," J. of Machine Learning Research, vol. 11, pp. 1957-2000, 2010.

85. A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.

86. F. Tang and H. Tao, "Fast linear discriminant analysis using binary bases," Proc. of the 18th International Conference on Pattern Recognition (ICPR'06), 2006.

87. B. D. Hahn and D. T. Valentine, Essential MATLAB for Engineers and Scientists, 3rd ed., Elsevier, India, 2007.

88. C. Xenophontos, "A Beginner's Guide to MATLAB" [Online], Clarkson University, 1999. Available: web2.clarkson.edu/class/ma571/Xeno-MATLAB_guide.pdf

89. S. J. Chapman, Essentials of MATLAB Programming, 2nd ed., Cengage Learning, 2004.

90. M. H. Beale, M. T. Hagan, and H. B. Demuth (1992), Neural Network Toolbox User's Guide [Online]. Available: www.mathworks.in/help/pdf_doc/nnet/nnet_ug.pdf

91. A. Frank and A. Asuncion (2010), UCI Machine Learning Repository [Online]. Available: http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science.

92. W. N. Street, W. H. Wolberg, and O. L. Mangasarian, "Nuclear feature extraction for breast tumor diagnosis," Proc. IS&T/SPIE Int. Symp. on Electron. Imaging: Sci. and Technology, vol. 1905, pp. 861-870, 1993.

93. M. Karabatak and M. C. Ince, "An expert system for detection of breast cancer based on association rules and neural network," Expert Systems with Applications, vol. 36, pp. 3465-3469, 2009.