
DIAGNOSIS OF LIVER DISEASES USING MACHINE LEARNING

Abstract

Liver diseases account for over 2.4% of deaths in India per annum. Liver disease is also difficult
to diagnose in its early stages owing to subtle symptoms; often the symptoms become apparent
only when it is too late. This paper aims to improve the diagnosis of liver diseases by exploring
two methods of identification: patient parameters and genome expression. The paper also
discusses the computational algorithms that can be used in the aforementioned methodologies,
lists their demerits, and proposes methods to improve the efficiency of these algorithms.
INTRODUCTION

Liver disease is tricky to diagnose given the subtlety of its symptoms in the early stages.
Problems are often not discovered until it is too late, as the liver continues to function even
when partially damaged. Early diagnosis can potentially be life-saving: although not obvious
even to an experienced medical practitioner, the early symptoms of these diseases can be
detected, and an early diagnosis can increase a patient's life span substantially. Thus the results
of this study are important from the point of view of both the computer scientist and the medical
professional. This paper aims to compare two methods of computer-aided medical diagnosis. The
first is a symptomatic approach: an Artificial Neural Network is trained to respond to several
patient parameters such as age, Bilirubin, Alkaline Phosphatase, Alanine Aminotransferase, and
Aspartate Aminotransferase, among others. The Neural Network classifies patients according to
whether or not they suffer from a chronic liver disease, that is, diseased or healthy. The second
method studied in this paper involves a genetic approach to diagnosis: the application of
Artificial Neural Networks and Multi-Layer Perceptrons to Micro-Array Analysis.
EXISTING SYSTEM

A. Micro Array Analysis

Some of the most influential work in Micro-Array Analysis can be attributed to Rifkin et al. [2].
They used a Support Vector Machine to predict, with about 80% accuracy, the origin of tumors
collected from samples obtained at Massachusetts General Hospital and other medical
institutions.

Kun-Hong Liu and De-Shuang Huang also addressed the problem of cancer origin identification
using Micro-Array analysis. Several other technologies for Micro-Array analysis have been
developed over the last decade; the most common, discussed in this paper, are spotted cDNA and
oligonucleotide microarrays. Pioneers in the field include researchers from Brown and Stanford
(Duggan et al., Chipping Forecast, 1999), where cDNA samples were hybridized to glass slides
onto which the corresponding genes of interest had been robotically deposited.

B. SVM and Neural Networks

Akin Ozcift and Arif Gulten constructed a rotation forest ensemble classifier that was tested with
success on Parkinson's disease, heart disease, and diabetes. Some of the most useful work was
done by Bendi Venkata Ramana et al., who compared various machine learning algorithms on
the basis of Accuracy, Precision, Sensitivity, and Specificity when classifying this very liver
patient data set. They proposed the use of Bayesian classification combined with Bagging and
Boosting for improved accuracy; Bayesian classification is a simple yet powerful algorithm that
works on the assumption that all variables are independent of one another. They also proposed
ANOVA and MANOVA (Analysis of Variance and Multivariate Analysis of Variance) for a
population comparison between the ILPD and UCI datasets.
SYSTEM ARCHITECTURE

Dataset → Pre-process → Apply ML algorithm

ANN algorithm: Initialize network → Forward propagate → Back propagate error → Train
network → Predicted result

SVM algorithm: Train model → Predict result

Figure: Overall Architecture

The figure above represents the system architecture of the proposed work: liver disease
diagnosis using the Neural Network and SVM machine learning algorithms, with the main
processing steps involved. The final predicted result is compared against the original dataset
labels to obtain accuracy and error levels.
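As a sketch, this final analysis step amounts to a simple accuracy computation over the predicted and original labels (the function name is illustrative, not from the paper):

```python
def accuracy_metric(actual, predicted):
    """Percentage of predicted labels that match the original dataset's
    labels -- the final analysis step described above."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual) * 100.0

# Example: two of four predictions match the original labels.
print(accuracy_metric([1, 0, 1, 1], [1, 1, 1, 0]))  # 50.0
```

The error level is simply the complement (100 minus the accuracy).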
IMPLEMENTATION DETAILS

DATASET

The Indian Liver Patient Dataset (ILPD) is used for the implementation. It contains the
following columns:

'age', 'gender', 'total_bilirubin', 'direct_bilirubin', 'alkaline_phosphotase',
'alamine_aminotransferase', 'aspartate_aminotransferase', 'total_protiens', 'albumin',
'ratio_albumin_and_globulin_ratio', 'liver_res'
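A minimal sketch of loading this CSV and encoding the one non-numeric column ('gender') so that every feature is numeric, as both algorithms require. The two sample rows below are illustrative stand-ins for the real file:

```python
import csv
import io

# Header matching the ILPD columns listed above; the data rows are
# illustrative only.
SAMPLE = """age,gender,total_bilirubin,direct_bilirubin,alkaline_phosphotase,\
alamine_aminotransferase,aspartate_aminotransferase,total_protiens,albumin,\
ratio_albumin_and_globulin_ratio,liver_res
65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
"""

def load_rows(text):
    """Read the CSV and map 'gender' to a number so every feature
    is numeric before training."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rec['gender'] = 1.0 if rec['gender'] == 'Male' else 0.0
        rows.append([float(rec[k]) for k in rec])
    return rows

rows = load_rows(SAMPLE)
print(len(rows), len(rows[0]))  # 2 11
```

In practice the same loader would be pointed at the downloaded ILPD file rather than an in-memory string.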

ARTIFICIAL NEURAL NETWORK

Backpropagation is a supervised learning method for multilayer feed-forward networks from the
field of Artificial Neural Networks. The approach models a given function by modifying the
internal weightings of input signals to produce an expected output signal. The system is trained
using a supervised learning method, where the error between the system’s output and a known
expected output is presented to the system and used to modify its internal state.

In the proposed system, we apply 5-fold cross-validation, and expected outputs are transformed
into numeric values from 0 to 1.
Figure: Back propagation Algorithm
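The two preparation steps above — splitting the data into 5 folds and rescaling values into the range 0 to 1 — might be sketched as follows (function names are illustrative, not from the paper):

```python
import random

def cross_validation_split(dataset, n_folds=5):
    """Randomly partition the rows into n_folds equal-size folds,
    for the 5-fold evaluation described above."""
    dataset_copy = list(dataset)
    fold_size = len(dataset) // n_folds
    rng = random.Random(1)  # fixed seed for reproducibility
    folds = []
    for _ in range(n_folds):
        fold = [dataset_copy.pop(rng.randrange(len(dataset_copy)))
                for _ in range(fold_size)]
        folds.append(fold)
    return folds

def normalize(dataset):
    """Min-max rescale every column into the range 0..1."""
    cols = list(zip(*dataset))
    stats = [(min(c), max(c)) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, stats)] for row in dataset]

data = [[float(i), float(i * 2)] for i in range(10)]
folds = cross_validation_split(data, 5)
print(len(folds), len(folds[0]))  # 5 2
```

Each fold is held out in turn as the test set while the network trains on the remaining four.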

The algorithm is implemented as the following modules:

Initialize Network

Forward Propagate

Back Propagate Error

Train Network and Predict values

Initialize network

Each neuron maintains a set of weights. The input layer is a row from our training dataset. The
first real layer is the hidden layer, followed by the output layer, which has one neuron for each
class value. The network weights are initialized to small random numbers in the range 0 to 1.
The function initialize_network() creates a new neural network from three input parameters:
the number of inputs, the number of neurons in the hidden layer, and the number of outputs.
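A minimal sketch of this module, representing each neuron as a dictionary of weights (one weight per input plus a trailing bias), as described above:

```python
import random

def initialize_network(n_inputs, n_hidden, n_outputs, seed=1):
    """Create a network as a list of layers; each neuron holds one
    weight per input plus a bias, drawn uniformly from 0..1."""
    rng = random.Random(seed)
    hidden = [{'weights': [rng.random() for _ in range(n_inputs + 1)]}
              for _ in range(n_hidden)]
    output = [{'weights': [rng.random() for _ in range(n_hidden + 1)]}
              for _ in range(n_outputs)]
    return [hidden, output]

net = initialize_network(2, 1, 2)
print(len(net))  # 2 layers: hidden and output
```

The input layer needs no weights of its own, so only the hidden and output layers are stored.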

Forward Propagate

We can calculate an output from a neural network by propagating an input signal through each
layer until the output layer emits its values. This is the technique we need to generate
predictions during training (which are then corrected), and the method we need, after the
network is trained, to make predictions on new data.

Forward propagation is done in three steps: Neuron Activation, Neuron Transfer, and Forward
Propagation.
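The three steps can be sketched as follows — a weighted-sum activation, a sigmoid transfer, and a layer-by-layer forward pass (the fixed example weights are illustrative only):

```python
import math

def activate(weights, inputs):
    # Neuron Activation: weighted sum of inputs plus the bias (last weight).
    activation = weights[-1]
    for w, x in zip(weights[:-1], inputs):
        activation += w * x
    return activation

def transfer(activation):
    # Neuron Transfer: the sigmoid squashes the activation into 0..1.
    return 1.0 / (1.0 + math.exp(-activation))

def forward_propagate(network, row):
    # Forward Propagation: each layer's outputs become the next
    # layer's inputs, until the output layer emits its values.
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

network = [[{'weights': [0.5, 0.5, 0.5]}],                      # hidden layer
           [{'weights': [0.5, 0.5]}, {'weights': [0.5, 0.5]}]]  # output layer
out = forward_propagate(network, [1.0, 0.0])
print(out)
```

Each neuron's output is cached on the neuron itself, since backpropagation needs it later.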

Back Propagate Error

The backpropagation algorithm is named for the way in which weights are trained. Error is
calculated between the expected outputs and the outputs forward-propagated from the network.
These errors are then propagated backward through the network, from the output layer to the
hidden layer, assigning blame for the error and updating weights. This is done in two steps:
Transfer Derivative and Error Backpropagation.
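The two steps might be sketched as below: the derivative of the sigmoid transfer function, and a reverse walk over the layers that computes each neuron's error signal ('delta'). The small fixed network is illustrative only:

```python
def transfer_derivative(output):
    # Transfer Derivative: slope of the sigmoid at a neuron's output.
    return output * (1.0 - output)

def backward_propagate_error(network, expected):
    # Error Backpropagation: walk the layers in reverse, computing each
    # neuron's delta from the output error and passing blame back
    # through the connecting weights.
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = []
        if i == len(network) - 1:           # output layer: expected - output
            for j, neuron in enumerate(layer):
                errors.append(expected[j] - neuron['output'])
        else:                               # hidden layer: weighted deltas
            for j in range(len(layer)):
                errors.append(sum(n['weights'][j] * n['delta']
                                  for n in network[i + 1]))
        for neuron, err in zip(layer, errors):
            neuron['delta'] = err * transfer_derivative(neuron['output'])

network = [[{'weights': [0.3, 0.7, 0.1], 'output': 0.7}],
           [{'weights': [0.6, 0.4], 'output': 0.8},
            {'weights': [0.2, 0.5], 'output': 0.3}]]
backward_propagate_error(network, [1, 0])
print(network[1][0]['delta'])  # 0.032
```

The stored deltas are then used together with the neuron inputs to update the weights during training.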

Train Network and Predict values

The network is trained using the newly generated weights. A function named predict()
implements prediction: it returns the index in the network output that has the largest
probability, assuming that class values have been converted to integers starting at 0.
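A sketch of predict(), with a compact forward pass included so the snippet is self-contained; the fixed example network and its weights are illustrative, not trained values from the paper:

```python
import math

def forward_propagate(network, row):
    # Minimal forward pass (sigmoid neurons, bias as the last weight).
    inputs = row
    for layer in network:
        inputs = [1.0 / (1.0 + math.exp(-(n['weights'][-1] +
                  sum(w * x for w, x in zip(n['weights'][:-1], inputs)))))
                  for n in layer]
    return inputs

def predict(network, row):
    # The class is the index of the largest output neuron's value,
    # assuming class values were converted to integers starting at 0.
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Tiny fixed network: 2 inputs -> 1 hidden neuron -> 2 output classes.
network = [[{'weights': [-1.5, 1.0, 0.5]}],
           [{'weights': [2.0, -1.0]}, {'weights': [-2.0, 1.0]}]]
print(predict(network, [0.2, 0.9]))  # 0
```

For the liver dataset, the two output neurons correspond to the two values of the 'liver_res' label.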

SUPPORT VECTOR MACHINES ALGORITHM

A Support Vector Machine is a supervised learning algorithm. An SVM models the data into k
categories, performing classification by forming an N-dimensional hyperplane; these models are
closely related to neural networks. The model was proposed by Vapnik [6]. Consider a dataset of
N dimensions: the SVM plots the training data in an N-dimensional space, and the training points
are divided into k different regions, according to their labels, by hyperplanes. After training is
complete, the test points are plotted in the same N-dimensional space and are classified according
to the region in which they fall.
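The idea can be sketched with a tiny linear SVM trained by subgradient descent on the hinge loss. This is only an illustration of the separating-hyperplane concept under stated assumptions (labels in -1/+1, linearly separable toy data standing in for the liver features); a practical system would use an optimized library implementation:

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=500):
    """Minimal linear SVM: subgradient descent on the hinge loss with
    L2 regularization. Labels must be -1 or +1. A sketch only."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # inside the margin: pull hyperplane toward point
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # correctly classified: only apply regularization
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def svm_predict(w, b, x):
    # The side of the separating hyperplane determines the class.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Linearly separable toy data standing in for two patient classes.
X = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 5.0], [7.0, 8.0], [8.0, 8.0]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in X])
```

Nonlinear boundaries, as in the general formulation above, are obtained by replacing the dot product with a kernel function.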
DATA FLOW DIAGRAM

Level 0

User → Load dataset → Apply ML algorithm → Predicted result → Analyze

Figure: Data Flow diagram – Level 0


Level 1

User → Load dataset → Initialize network → Forward propagate → Back propagate error →
Train network → Result prediction

Figure: Data Flow diagram – Level 1


Level 2

User → Load dataset → Split train & test sets; Train set → SVM algorithm → Train model;
Test set → Result prediction → Analyze

Figure: Data Flow diagram – Level 2


USE CASE DIAGRAM

Load dataset → Apply ML algorithm → ANN (find/update weights → predict values) or
SVM (predict values) → Find accuracy

Figure: Use case Diagram


CLASS DIAGRAM
SYSTEM ANALYSIS AND DESIGN

Introduction

Computer Aided Diagnosis is a rapidly growing, dynamic area of research in the medical
industry. Recent research in machine learning promises improved accuracy in the perception
and diagnosis of disease. Here, computers are enabled to think by developing intelligence
through learning. There are many types of Machine Learning techniques, which are used to
classify data sets.

Requirement Analysis

The Software Requirement Specification (SRS) is the starting point of the software development
activity. As systems grew more complex, it became evident that the goals of the entire system
could not be easily comprehended; hence the need for the requirement phase arose. A software
project is initiated by the client's needs. The SRS is the means of translating the ideas in the
minds of the clients (the input) into a formal document (the output of the requirement phase).

Under requirement specification, the focus is on specifying what has been found during
analysis; representation, specification languages and tools, and checking of the specifications
are addressed during this activity.

The requirement phase terminates with the production of the validated SRS document.
Producing the SRS document is the basic goal of this phase.

The purpose of the Software Requirement Specification is to reduce the communication gap
between the clients and the developers. The SRS is the medium through which the client and
user needs are accurately specified, and it forms the basis of software development. A good SRS
should satisfy all the parties involved in the system.

Functional Requirements

The proposed application should be able to identify liver disease for a given test input. The
neural network backpropagation algorithm is executed with 5-fold cross-validation to obtain the
prediction values. SVM classification is also used for the prediction of liver disease.


4.2.1.1. Product Perspective

The application is developed in such a way that any future enhancement can be easily
implemented, and the project requires minimal maintenance. The software used is open source
and easy to install, and the application itself should be easy to install and use.

4.2.1.2. Product features

The application is developed in such a way that liver disease can be predicted using SVM
classification and neural network.

The dataset is taken from UCI machine learning data.

We can compare the accuracy for the implemented algorithms.

4.2.1.3. User characteristics

The application is developed so that it is:

 Easy to use

 Error free

 Usable with minimal or no training

 Suitable for regular patient monitoring

4.2.1.4. Assumption & Dependencies

It is assumed that the dataset taken fulfils all the requirements.

4.2.1.5. Domain Requirements

This document is the only one that describes the requirements of the system. It is meant for use
by the developers, and will also be the basis for validating the final delivered system. Any
changes made to the requirements in the future will have to go through a formal change approval
process.
4.2.1.6. User Requirements

User can decide on the prediction accuracy to decide on which algorithm can be used in real-time
predictions.

4.2.2. Non Functional Requirements

 Dataset collected should be in the CSV format

 The column values should be numerical values

 Training set and test set are stored as CSV files

 Error rates can be calculated for prediction algorithms

4.2.2.1. Product Requirements

4.2.2.1.1. Efficiency: Less time for detection and more space for data storage

4.2.2.1.2. Reliability: Maturity, fault tolerance and recoverability

4.2.2.1.3. Portability: Can the software easily be transferred to another environment, including
installability?

4.2.2.1.4. Usability: How easy it is to understand, learn and operate the software system

4.2.2.2. Organizational Requirements:

The required ports should not be blocked by the Windows firewall, and an internet connection
should be available.

4.2.2.2.1. Implementation Requirements

The UCI machine learning dataset for input; an internet connection to install the related libraries.

4.2.2.2.2. Engineering Standard Requirements

User Interfaces

No graphical interface is used for the application; it is executed through the command prompt.


Hardware Interfaces

Ethernet

Ethernet on the AS/400 supports TCP/IP, Advanced Peer-to-Peer Networking (APPN) and
advanced program-to-program communications (APPC).

ISDN

The AS/400 can be connected to an Integrated Services Digital Network (ISDN) for faster, more
accurate data transmission. An ISDN is a public or private digital communications network that
can support data, fax, image, and other services over the same physical interface. Other
protocols, such as IDLC and X.25, can also be used on ISDN.

Software Interfaces

No specific software interface is used.

4.2.2.3. Operational Requirements

• Economic

The power supply unit provides the required power to the microcontroller.

A temperature sensor senses the patient's body temperature, a heart rate sensor monitors the
patient's heart rate (heart beats per second), and blood pressure is sensed by ECG.
All sensor information is displayed on an LCD and updated to the doctor's PC.

• Environmental

Statements of fact and assumptions that define the expectations of the system in terms of mission
objectives, environment, constraints, and measures of effectiveness and suitability (MOE/MOS).
The customers are those that perform the eight primary functions of systems engineering, with
special emphasis on the operator as the key customer.

• Social

Anyone who benefits from the system (functional, political, financial and social beneficiaries)

• Political
Anyone who benefits from the system (functional, political, financial and social beneficiaries)

• Ethical

• Health and Safety

The software may be safety-critical. If so, there are issues associated with its integrity level. The
software may not be safety-critical although it forms part of a safety-critical system. For
example, software may simply log transactions. If a system must be of a high integrity level and
if the software is shown to be of that integrity level, then the hardware must be at least of the
same integrity level. There is little point in producing 'perfect' code in some language if hardware
and system software (in widest sense) are not reliable. If a computer system is to run software of
a high integrity level then that system should not at the same time accommodate software of a
lower integrity level. Systems with different requirements for safety levels must be separated.
Otherwise, the highest level of integrity required must be applied to all systems in the same
environment.

4.2.3. System Requirements

4.2.3.1. H/W Requirements

Processor : Any Processor above 500 MHz.

Ram : 4 GB

Hard Disk : 4 GB

Input device : Standard Keyboard and Mouse.

Output device : VGA and High Resolution Monitor.

4.2.3.2 S/W Requirements


Operating System : Windows 7 or higher

Programming : Python 3.6 and related libraries

TESTING:
Introduction:

After finishing the development of any computer-based system, the next
complicated, time-consuming process is system testing. Only during testing can
the development company know how far the user requirements have been met,
and so on.

Software testing is an important element of software quality assurance
and represents the ultimate review of specification, design and coding. The
increasing visibility of software as a system element and the costs associated
with software failures are motivating forces for well-planned, thorough testing.

Testing Objectives

Several rules can serve as testing objectives:

 Testing is a process of executing a program with the intent of finding
an error.
 A good test case is one that has a high probability of finding an
undiscovered error.

Testing procedures for the project are done in the following sequence:

 System testing is done to check the server names of the machines
connected between the customer and the executive.
 The product information provided by the company to the executive is
tested against validation with the centralized data store.
 System testing is also done to check the executive's availability to
connect to the server.
 The server name authentication and availability to the customer are
checked.
 Proper communication chat line viability is tested so that the chat
system functions properly.
 Mail functions are tested against user concurrency and customer mail
data validation.

The following are some of the testing methods applied to this project:

SOURCE CODE TESTING:

This examines the logic of the system. If we get the output that is
required by the user, then we can say that the logic is correct.

SPECIFICATION TESTING:

We specify what the program should do and how it should perform under
various conditions. This testing is a comparative study of the evaluation of
system performance against system requirements.

MODULE LEVEL TESTING:

In this, errors are found in each individual module; it encourages the
programmer to find and rectify errors without affecting the other modules.

UNIT TESTING:

Unit testing focuses verification effort on the smallest unit of software:
the module. The local data structure is examined to ensure that data stored
temporarily maintains its integrity during all steps in the algorithm’s execution.
Boundary conditions are tested to ensure that the module operates properly at
the boundaries established to limit or restrict processing.
INTEGRATION TESTING:

Data can be lost across an interface, and one module can have an inadvertent,
adverse effect on another. Integration testing is a systematic technique for
constructing a program structure while conducting tests to uncover errors
associated with interfacing.

VALIDATION TESTING:

It begins after integration testing is successfully completed. Validation
succeeds when the software functions in a manner that can be reasonably accepted
by the client. Here, the majority of the validation is done during data entry,
where there is the maximum possibility of entering wrong data. Other validation
is performed in all processes where correct details and data must be entered to
get the required results.

RECOVERY TESTING:

Recovery testing forces the software to fail in a variety of ways and
verifies that recovery is properly performed. If recovery is automatic,
re-initialization and data recovery are each evaluated for correctness.

SECURITY TESTING:

Security testing attempts to verify that the protection mechanisms built into a
system will in fact protect it from improper penetration. The tester may attempt
to acquire passwords through external clerical means, may attack the system with
custom software designed to break down any defenses, and may purposely
cause errors.

PERFORMANCE TESTING:
Performance testing is used to test the runtime performance of software within
the context of an integrated system. Performance tests are often coupled with
stress testing and usually require both hardware and software instrumentation.

BLACKBOX TESTING:

Black-box testing focuses on the functional requirements of software. It enables
the tester to derive sets of input conditions that will fully exercise all
functional requirements for a program. Black-box testing attempts to find errors
in the following categories:

 Incorrect or missing functions
 Interface errors
 Errors in data structures or external database access, and performance errors

OUTPUT TESTING:

After validation testing, the next step is output testing of the
proposed system, since no system can be termed useful unless it produces
the required output in the specified format. The output format is considered in
two ways: the screen format and the printer format.

USER ACCEPTANCE TESTING:

User acceptance testing is a key factor in the success of any system. The
system under consideration is tested for user acceptance by constantly keeping
in touch with prospective system users during development and making changes
whenever required.

TEST CASES
Sl. No | Test Case Name     | Test Procedure                    | Pre-Condition                          | Expected Result                     | Passed/Failed
1      | Dataset validation | Enter/click the neural batch file | Use a dataset with text column values  | Alert "Dataset should be numeric"   | Passed
2      | Dataset validation | Enter/click the SVM batch file    | Use a dataset with text column values  | Alert "Dataset should be numeric"   | Passed
The scatter plot for the SVM algorithm is plotted below, followed by the liver disease
predictions made by the SVM algorithm.

Figure: Accuracy for the SVM algorithm


RESULTS AND DISCUSSION

The figure below represents the ANN backpropagation algorithm's accuracy for 5 different
folds. The overall accuracy achieved using the backpropagation model is around 75%;
backpropagation brings higher accuracy for liver disease prediction on the given dataset.

Figure: Accuracy for different folds of ANN algorithm


The figure below represents the accuracy achieved by the SVM classification algorithm, which
is around 30%. From this study, it is visible that the ANN outperforms the SVM in terms of
accuracy.

Figure: Accuracy for SVM Classification algorithm


CONCLUSION

This study explores two methodologies for chronic liver disease prediction: ANN and SVM.
Liver disease is especially difficult to diagnose given the subtle nature of its symptoms. Of the
2,626,418 deaths reported in the United States for 2014, chronic liver disease accounted for
nearly 38,170. Prediction by means of computers will continue to grow in importance, and this
paper explored two machine learning models that can improve predictive power. The molecular
biology approach is often affected by diet, age, and ethnicity; the chemical approach is a surer
method of prediction. In any eventuality, however, research in the direction of molecular biology
can help unravel the secrets of human anatomy, which will help save lives.
REFERENCES

[1] Rong-Ho Lin, "An Intelligent Model for Liver Disease Diagnosis", Artificial Intelligence in
Medicine, 2009

[2] Ryan Rifkin, Sridhar Ramaswamy, Pablo Tamayo, Sayan Mukherjee, Chen-Hsiang Yeang,
Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P. Mesirov, Tomaso Poggio,
William Gerald, Massimo Loda, Eric S. Lander, Todd R. Golub, "An Analytical Method for
Multiclass Molecular Cancer Classification", 2003

[3] Akin Ozcift and Arif Gulten, "Classifier Ensemble Construction with Rotation Forest to
Improve Medical Diagnosis Performance of Machine Learning Algorithms", 2011

[4] Kun-Hong Liu and De-Shuang Huang, "Cancer classification using Rotation Forest",
Computers in Biology and Medicine, 2008

[5] Bendi Venkata Ramana, M. Surendra Prasad Babu and N. B. Venkateswarlu, "A Critical
Study of Selected Classification Algorithms for Liver Disease Diagnosis", International Journal
of Engineering Research and Development, 2012

[6] V. N. Vapnik, "Statistical Learning Theory", Wiley, 1998

[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Delving Deep into Rectifiers:
Surpassing Human-Level Performance on ImageNet Classification", 2015

[8] Beilharz TH, Preiss T, "Translational profiling: the genome-wide measure of the nascent
proteome", Briefings in Functional Genomics and Proteomics, 2009

[9] Gros F, "From the messenger RNA saga to the transcriptome era", C R Biol, 2003, 326:
893-900

[10] Shackel NA, Gorrell MD, McCaughan GW, "Gene array analysis and the liver", Hepatology,
2002, 36: 1313-1325