Final Report

How likely Am I to have Diabetes

04/28/2015

FINAL REPORT
WEB MINING

Ashwin Kumar Pitchai (akp73) & Suchait Mattoo (sm925)

Contents
Description
Questions
Data Dictionary
Sample Data
Outlier Detection
Normalized Data
Association Rules
Performance Models
  Naive Bayes
  Neural Network
  SVM
  Logistic Regression
  KNN
Evaluation Models
  Decision Tree
  Naive Bayes
  Neural Net
  SVM
  Logistic Regression
  K Nearest Neighbor
Answers
References

Description:

Our dataset records whether a patient shows signs of diabetes
according to World Health Organization criteria (i.e., if the
2-hour post-load plasma glucose was at least 200 mg/dl at any
survey examination, or if diabetes was found during routine
medical care). The population lives near Phoenix, Arizona, USA.
Several constraints were placed on the selection of these
instances from a larger database. In particular, all patients here
are females at least 21 years old of Pima Indian heritage.
Each instance represents an individual patient, with her medical
attributes and a diabetes classification attribute.
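
The glucose threshold in the criterion above can be written as a simple check. This is a minimal sketch: the function and constant names are hypothetical, and it covers only the 2-hour post-load glucose threshold, not a diagnosis made during routine medical care.

```python
# Threshold from the description: 2-hour post-load plasma glucose, in mg/dl.
GLUCOSE_THRESHOLD_MG_DL = 200.0


def meets_glucose_criterion(two_hour_glucose_mg_dl: float) -> bool:
    """Return True if a single reading is at least 200 mg/dl."""
    return two_hour_glucose_mg_dl >= GLUCOSE_THRESHOLD_MG_DL


def any_exam_meets_criterion(readings_mg_dl) -> bool:
    """The criterion is met if ANY survey examination reaches the threshold."""
    return any(meets_glucose_criterion(r) for r in readings_mg_dl)
```

For example, `any_exam_meets_criterion([150, 210])` is true because the second reading reaches the threshold.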

Questions:

How likely is a particular patient to be affected by
diabetes, given her medical parameters?

Data Dictionary

Attribute         Description                                                   Data Type       Range
Pregnancies       Number of pregnancies                                         Numeric(17)     0-17
PG Concentration  Plasma glucose at 2 hours in an oral glucose tolerance test   Numeric(199)    0-199
Diastolic BP      Diastolic blood pressure (mm Hg)                              Numeric(122)    0-122
Tri Fold Thick    Triceps skin fold thickness (mm)                              Numeric(52)     0-52
Serums Ins        2-hour serum insulin (mu U/ml)                                Numeric(846)    0-846
BMI               Body mass index (weight in kg / (height in m)^2)              Decimal(53.2)   0-53.2
DP Function       Diabetes pedigree function                                    Decimal(1.353)  0.088-1.353
Age               Age (years)                                                   Numeric(66)     21-66
Diagnosis         Is the patient Sick or Healthy?                               Varchar(7)      Healthy or Sick
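
The ranges in the data dictionary can be used to sanity-check a record before it is passed to a model. The following is a minimal sketch, not the procedure used in this report: the key names and the function `out_of_range_fields` are hypothetical, and the 0-52 range is assumed to belong to the triceps skin fold attribute.

```python
# Ranges copied from the data dictionary; key names are hypothetical.
ATTRIBUTE_RANGES = {
    "pregnancies": (0, 17),
    "pg_concentration": (0, 199),
    "diastolic_bp": (0, 122),
    "tri_fold_thick": (0, 52),  # assumed to be the triceps skin fold range
    "serum_ins": (0, 846),
    "bmi": (0.0, 53.2),
    "dp_function": (0.088, 1.353),
    "age": (21, 66),
}
DIAGNOSIS_VALUES = {"Healthy", "Sick"}


def out_of_range_fields(record: dict) -> list:
    """Return the names of fields that are missing or outside their range."""
    problems = []
    for name, (low, high) in ATTRIBUTE_RANGES.items():
        value = record.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    if record.get("diagnosis") not in DIAGNOSIS_VALUES:
        problems.append("diagnosis")
    return problems


# A made-up record for illustration only; it is not taken from the dataset.
example = {
    "pregnancies": 2, "pg_concentration": 120, "diastolic_bp": 70,
    "tri_fold_thick": 30, "serum_ins": 100, "bmi": 28.5,
    "dp_function": 0.5, "age": 35, "diagnosis": "Healthy",
}
print(out_of_range_fields(example))
```

A record with an age below 21 or a diagnosis other than Healthy or Sick would be reported by name, which makes it easy to see which attribute violates the dictionary.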

Sample Data:

Outlier Detection:

Normalized Data:

Association Rules:

Performance Models:

Decision Tree:

Naive Bayes:

Neural Network:

SVM:

Logistic Regression:

KNN:

Performance Summary:

Model Name           Accuracy (%)
Decision Tree        73.18
Naive Bayes          76.17
Neural Network       79.95
SVM                  77.73
Logistic Regression  76.43
KNN                  100

Evaluation Models:
Decision Tree

Naive Bayes

Neural Net:

SVM

Logistic Regression

K Nearest Neighbor

Evaluation Summary:

Model Name           Accuracy (%)
Decision Tree        71.86
Naive Bayes          75.51
Neural Net           74.74
Logistic Regression  75.65
SVM                  76.95
K Nearest Neighbor   68.24
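
The two summary tables can be compared directly. The sketch below ranks the models by evaluation accuracy and computes how far each model's accuracy falls between the performance summary and the evaluation summary. It assumes that "Neural Net", "Regression" and "K Nearest Neighbor" in the evaluation summary refer to the same models as "Neural Network", "Logistic Regression" and "KNN" in the performance summary. The function names are hypothetical.

```python
# Accuracy (%) copied from the two summary tables in this report.
performance = {
    "Decision Tree": 73.18,
    "Naive Bayes": 76.17,
    "Neural Network": 79.95,
    "SVM": 77.73,
    "Logistic Regression": 76.43,
    "KNN": 100.0,
}
evaluation = {
    "Decision Tree": 71.86,
    "Naive Bayes": 75.51,
    "Neural Network": 74.74,
    "Logistic Regression": 75.65,
    "SVM": 76.95,
    "KNN": 68.24,
}


def rank_by_accuracy(scores):
    """Return (model, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def accuracy_drop(perf, evaluated):
    """Return performance accuracy minus evaluation accuracy for each model."""
    return {name: round(perf[name] - evaluated[name], 2) for name in perf}


best_model, best_accuracy = rank_by_accuracy(evaluation)[0]
drops = accuracy_drop(performance, evaluation)
largest_drop_model = max(drops, key=drops.get)

print(best_model, best_accuracy)
print(largest_drop_model, drops[largest_drop_model])
```

On these figures SVM has the highest evaluation accuracy, and KNN shows by far the largest gap between the two tables. A gap of that size would be consistent with the performance figure being measured on the same data the model was built from, although the report does not state how either figure was measured.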

Answers:

The result above was generated using the SVM model, which had
the highest evaluation accuracy, 76.95%.

Therefore, given the medical records of any patient with the
above attributes, our model can diagnose the patient for
diabetes with an accuracy of about 77%.

References:

Professor's class notes
YouTube
Wikipedia
WHO's dataset
