
“STUDY ON PIMA INDIAN DIABETES”

DATA ANALYSIS REPORT SUBMITTED IN PARTIAL FULFILMENT OF THE
REQUIREMENT FOR THE AWARD OF
MASTERS IN MANAGEMENT STUDIES
BANGALORE UNIVERSITY

SUBMITTED BY
SANDESH.V.S
REGISTRATION NO: 17SBCMD058

UNDER GUIDANCE OF

PROF: P.C. SAHU

ADMINISTRATIVE MANAGEMENT COLLEGE


18TH KM, BANNERGHATTA ROAD, BANGALORE -560083.

(2018-2019)

DECLARATION

I hereby declare that the case study titled “PIMA INDIAN DIABETES” is an original work carried out
and submitted by me to the Department of Management, Bangalore University, in partial fulfillment of
the requirements of the MBA program. The work has been carried out by me under the guidance of P.C.SAHU.
The matter embodied in this research is genuine work and has not previously been submitted to any
other university or institution for the award of any degree, diploma or certificate, or published at any
time before.

DATE:

PLACE: BANGALORE
GUIDE CERTIFICATE

This is to certify that the project titled “PIMA INDIAN DIABETES” is the original work of the
student and is being submitted by SANDESH.V.S, Registration No: 17SBCMD058, to Bangalore
University for the award of Degree of MASTERS OF BUSINESS ADMINISTRATION and is a
record of the work carried out by him under my guidance.

PLACE: BANGALORE (P.C.SAHU)

DATE: SIGNATURE

ACKNOWLEDGEMENT

Every work of research and innovation needs the effort of a number of people. It is my prime duty to
thank all those people who directly or indirectly helped me in successfully carrying out the project.

The completion of this research has been possible due to the inspiration and valuable suggestions
and guidance from my guide P.C.SAHU. From the very outset of this endeavor, my respected guide
has stood by my side to help me out whenever required.

I take this opportunity to thank Dr PRAKASH.B.NAIK, PRINCIPAL OF ADMINISTRATIVE
MANAGEMENT COLLEGE, Bangalore, for his encouragement, guidance and many valuable ideas
imparted to me for my project.

I would also like to thank all my respected teachers of MBA Department, AMC College, who have
always been very kind and supportive to me.

My heartfelt thanks to my parents and friends for going out of their way to see that I successfully
implemented and completed my project. Their words of wisdom and patience were much more than
a blessing.

PLACE : BANGALORE

NAME : SANDESH.V.S

REGNO : 17SBCMD058

DATE :
STUDY ON PIMA INDIAN DIABETES

NAME: SANDESH.V.S
REG NO: 17SBCMD058
COLLEGE: ADMINISTRATIVE MANAGEMENT COLLEGE
(A.M.C)

ABSTRACT:
Diabetes is a dangerous disease whose risk and severity are often worsened by neglect of one’s own health. It is a
group of metabolic disorders in which there are high blood sugar levels over a
prolonged period. Symptoms of high blood sugar include frequent urination,
increased thirst, and increased hunger. If left untreated, diabetes can cause many
complications. Acute complications can include diabetic ketoacidosis, hyperosmolar
hyperglycemic state, or death. Serious long-term complications include
cardiovascular disease, stroke, chronic kidney disease, foot ulcers, and damage to the
eyes.

This dataset is originally from the National Institute of Diabetes and Digestive and
Kidney Diseases. The objective of the dataset is to diagnostically predict whether or
not a patient has diabetes, based on certain diagnostic measurements included in the
dataset. In particular, all the patients taken for the study are females of Pima
Indian heritage who are at least 21 years old. The dataset consists of several medical
predictor variables and one target variable, Outcome. Predictor variables include the
number of pregnancies the patient has had, their BMI, glucose level, insulin level,
blood pressure, diabetes pedigree function, skin thickness and age. Outcome is
encoded as a class variable, 0 or 1.

• Pregnancies: Number of times pregnant
• Glucose: Plasma glucose concentration two hours into an oral glucose tolerance test
• Blood Pressure: Diastolic blood pressure (mm Hg)
• Skin Thickness: Triceps skin fold thickness (mm)
• Insulin: Two-hour serum insulin (mu U/ml)
• BMI: Body mass index (weight in kg/(height in m)^2)
• Diabetes Pedigree Function: Diabetes pedigree function
• Age: Age (years)
• Outcome: Class variable (0 or 1)
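
BMI is the only variable in the list above that is defined by a formula. As a minimal sketch (the function name bmi and the sample weight and height are illustrative assumptions, not values taken from the dataset), it could be computed as follows:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical example values, chosen only to illustrate the formula.
example_bmi = bmi(70.0, 1.60)
print(round(example_bmi, 1))  # 27.3
```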

ISSUES OF THE STUDY:

• To build a machine learning model to accurately predict the diabetes rate.

• To predict whether the patients in the PIMA heritage have diabetes or not.

• Prediction has to be done using the given data based on their pregnancies, glucose
level, blood pressure, skin thickness, insulin level, BMI, Diabetes Pedigree Function
and age.

SIGNATURE OF THE GUIDE: SIGNATURE OF THE STUDENT:

CONTENT TABLE

SL.NO   CHAPTERS                        PAGE NO
1       INTRODUCTION                    9-13
2       BACKGROUND OF THE STUDY         14-15
3       TRAINING METHODS                16-24
4       LEARNING OUTCOME                25-39
5       DATA ANALYSIS                   40-45
6       KEY FINDINGS AND CONCLUSION     46-47
        REFERENCES
        APPENDIX

LIST OF TABLES AND GRAPHS

TABLE NO   LIST OF GRAPHS                                              PAGE NO
1.         LEVEL OF INSULIN WITH REFERENCE TO THE NUMBER OF            40
           RECORDS, PREGNANCIES, AGE AND BMI.
2.         GRAPH RELATING TO THE VALUES OF VARIOUS INDICATORS.         41
3.         GRAPH RELATING TO DIABETES PEDIGREE FUNCTION,               42
           INSULIN LEVEL AND OUTCOME.
4.         GRAPH INDICATING THE GLUCOSE LEVELS.                        43
5.         GRAPH OF OUTCOME AND VALUE WITH RESPECT TO GLUCOSE.         44
6.         GRAPH RELATING TO THE SUM OF DIABETES PEDIGREE FUNCTION.    45

CHAPTER-I

INTRODUCTION OF THE STUDY

PIMA INDIAN DIABETES:


Diabetes is a dangerous disease whose risk and severity are often worsened by neglect of one’s own health. It
is a group of metabolic disorders in which there are high blood sugar levels over a
prolonged period. Symptoms of high blood sugar include frequent urination,
increased thirst, and increased hunger. If left untreated, diabetes can cause many
complications. Acute complications can include diabetic ketoacidosis, hyperosmolar
hyperglycemic state, or death. Serious long-term complications include
cardiovascular disease, stroke, chronic kidney disease, foot ulcers, and damage to the
eyes.
This dataset is originally from the National Institute of Diabetes and Digestive and
Kidney Diseases. The objective of the dataset is to diagnostically predict whether or
not a patient has diabetes, based on certain diagnostic measurements included in the
dataset. In particular, all the patients taken for the study are females of Pima
Indian heritage who are at least 21 years old. The dataset consists of several medical
predictor variables and one target variable, Outcome. Predictor variables include the
number of pregnancies the patient has had, their BMI, glucose level, insulin level,
blood pressure, diabetes pedigree function, skin thickness and age. Outcome is
encoded as a class variable, 0 or 1.

Diabetes is a disease that occurs when your blood glucose, also called blood
sugar, is too high. Blood glucose is your main source of energy and comes from the
food you eat. Insulin, a hormone made by the pancreas, helps glucose from food get
into your cells to be used for energy. Sometimes your body doesn’t make enough insulin, or
any at all, or doesn’t use it well; glucose then stays in the blood and doesn’t reach the cells.
Over time, having too much glucose in your blood can cause health problems.
Sometimes people call diabetes “a touch of sugar” or “borderline diabetes”. These
terms suggest that someone doesn’t really have diabetes or has a less serious case,
but every case of diabetes is serious.

According to the World Health Organization, diabetes is a chronic disease
caused by inherited or acquired deficiency in production of insulin by the pancreas,
or by the ineffectiveness of the insulin produced. Such a deficiency results in
increased concentrations of glucose in the blood, which in turn damage many of the
body's systems, in particular the blood vessels and nerves.

THE TWO PRINCIPAL FORMS OF DIABETES:


1. TYPE 1: In Type 1 diabetes the pancreas fails to produce the insulin which is
essential for survival. This form develops most frequently in children and
adolescents, but is being increasingly noted later in life.
2. TYPE 2: Type 2 diabetes is much more common and accounts for around 90% of all
diabetes cases worldwide. It occurs most frequently in adults, but is being noted
increasingly in adolescents as well.

CAUSES OF TYPE 1 DIABETES:


• Due to viral or bacterial infection.
• Due to chemical toxins in food.

CAUSES OF TYPE 2 DIABETES:


• If there is a history of diabetes in your family.
• If you are middle-aged or older.
• If you are overweight or obese.
• If you are middle-aged and have high blood pressure.
• If you have insufficient physical activity, a poor diet or extra weight carried around the waist.

OBJECTIVES OF THE STUDY:
• To know the diabetes rate among the Pima Indian patients in the dataset.
• To create awareness about diabetes and its effects.
• To motivate the patients to improve their physical fitness.
• To explain to them the deadliness of the disease.

SCOPE OF THE STUDY:


The objective of the study is to diagnostically predict whether or not a patient has
diabetes, based on certain diagnostic measurements included in the dataset. In
particular, all the patients taken for the study are females of Pima Indian
heritage who are at least 21 years old. The dataset consists of several medical predictor
variables and one target variable, Outcome. Predictor variables include the number
of pregnancies the patient has had, their BMI, glucose level, insulin level, blood
pressure, diabetes pedigree function, skin thickness and age. Outcome is encoded as a
class variable, 0 or 1.

SIGNIFICANCE OF THE STUDY:
• This project has been prepared with the intention of analyzing and understanding
the rate of diabetes among female patients of Pima Indian heritage who are at least
21 years of age.
• This project is also prepared to spread awareness about diabetes in the Pima
heritage.
• It also encourages the patients to maintain their physical fitness and improve their
health conditions.

PERIOD OF THE STUDY:


The analysis for this study took 4 weeks. The data for the
research has been taken from the year 2016-2017.

METHODOLOGY:
This research is purely based on the secondary data from www.kaggle.com
website and the internet. In this research Tableau has been used to analyze the data.

LIMITATIONS:
• All the medical predictor variables must be correct to get an accurate outcome.
• Female patients below 21 years of age are not counted in the study.
• Male patients are not included in the study.
• The rate of diabetes can be obtained only for female patients of Pima Indian heritage
who are at least 21 years of age.

SUMMARY:
Diabetes is a dangerous disease whose risk and severity are often worsened by neglect of one’s own health. It
is a group of metabolic disorders in which there are high blood sugar levels over a
prolonged period. In this study we are going to analyze and predict whether the
patients of PIMA Indian Heritage have diabetes or not with the help of several
medical predictor variables and one target variable, Outcome. Predictor variables
include the number of pregnancies the patient has had, their BMI, glucose level,
insulin level, blood pressure, diabetes pedigree function, skin thickness and age.
Outcome is encoded as a class variable, 0 or 1. The patients taken for the study are
female and are at least 21 years of age. This helps us to know and
understand the rate of diabetes in female patients of this heritage, and it also spreads
awareness about diabetes in this heritage. It even encourages the patients to maintain their
physique and to consume healthy and hygienic food.

CHAPTER-II

BACKGROUND OF THE STUDY

Python was created in the early 1990s by Guido van Rossum at Centrum
Wiskunde & Informatica (CWI) in the Netherlands as a successor to a language
called ABC. Guido is Python's principal author, although it includes many
contributions from others. The last version released from CWI was Python 1.2, in
1995. Guido continued his work on Python at the Corporation for National Research
Initiatives (CNRI) in Reston, Virginia where he released several versions of the
software. Python 1.6 was the last of the versions released by CNRI. In 2000, Guido
and the Python core development team moved to BeOpen.com to form the BeOpen
PythonLabs team. Python 2.0 was the first and only release from BeOpen.com.
Following the release of Python 1.6, and after Guido van Rossum left CNRI to work
with commercial software developers, it became clear that the ability to use Python
with software available under the GNU General Public License (GPL) was very desirable.
CNRI and the Free Software Foundation (FSF) interacted to develop enabling
wording changes to the Python license. Python 1.6.1 is essentially the same as
Python 1.6, with a few minor bug fixes, and with a different license that enables later
versions to be GPL-compatible. Python 2.0.1 is a derivative work of Python 1.6.1, as
well as of Python 2.0.
Python is a relatively simple programming language that includes a rich set of
supporting libraries. This approach keeps the language simple and reliable, while
providing specialized feature sets as separate extensions.
Python has an easy-to-use syntax focused on the programmer who must type in the
program, read what was typed and provide formal documentation for the program.
Many languages have syntax focused on developing a simple, fast compiler; but
those languages may sacrifice readability and write-ability. Python strikes a good
balance between fast compilation, readability and write-ability.
Python is implemented in C, and relies on the extensive, well understood portable C
libraries. It fits seamlessly with UNIX, Linux and POSIX environments. Since these

standard C libraries are widely available for the various MS-Windows variants, and
other non-POSIX operating systems, Python runs similarly in all environments.
Python reflects a number of growing trends in software development. It is a very
simple language surrounded by a vast library of add-on modules. It is an open source
project, supported by dozens of individuals. It is an object-oriented language. It is a
platform-independent, scripted language, with complete access to operating system
APIs. It supports integration of complex solutions from pre-built components. It is a
dynamic language, allowing more run-time flexibility than statically compiled
languages.
Additionally, Python is a scripting language with full access to Operating System
(OS) services. Consequently, Python can create high level solutions built up from
other complete programs. This allows someone to integrate applications seamlessly
creating high-powered and highly-focused meta-applications. This kind of very-high-
level programming is often attempted with shell scripting tools. However, the
programming power in most shell script languages is severely limited. Python is a
complete programming language in its own right, allowing a powerful mixture of
existing application programs and unique processing to be combined.
Python is an interpreted, object-oriented programming language similar to Perl that
has gained popularity because of its clear syntax and readability. Python is said to be
relatively easy to learn and portable, meaning its statements can be interpreted in a
number of operating systems, including UNIX-based systems, Mac OS, MS-DOS,
OS/2, and various versions of Microsoft Windows. Python was created by Guido
van Rossum, a former resident of the Netherlands, whose favorite comedy group at
the time was Monty Python's Flying Circus. The source code is freely available and
open for modification and reuse. Python has a significant number of users.
A notable feature of Python is its indenting of source statements to make the code
easier to read. Python offers dynamic data types, ready-made classes, and interfaces to
many system calls and libraries. It can be extended using the C or C++ language.
Python can be used as the script in Microsoft's Active Server Page (ASP)
technology. The scoreboard system for the Melbourne (Australia) Cricket Ground is
written in Python. Z Object Publishing Environment (Zope), a popular Web application
server, is also written in the Python language.

CHAPTER-III

TRAINING METHODS

Python is a general-purpose programming language that is often applied in scripting
roles; it is also called an interpreted language. Python uses an interpreter,
and using it we can not only write complete programs but also work with the
interpreter in a statement-by-statement mode, enabling us to experiment quite easily.
Python is especially good for our purposes, in that it does not have a lot of
“overhead” before getting started. It is easy to jump in and experiment in an
interactive fashion.
Guido van Rossum described his reasons for creating the language as follows:
“My original motivation for creating Python was the perceived need for a higher level
language in the Amoeba project.
I realized that the development of system administration utilities in C was taking too
long. Moreover, doing these things in the Bourne shell wouldn’t work for a variety of
reasons.”
Python is interpreted – Python is processed at runtime by the interpreter. You do not
need to compile your program before executing it.
Python is interactive – you can actually sit at a python prompt and interact with the
interpreter directly to write your programs.
Python is object oriented – python supports object oriented style or technique of
programming that encapsulates code within objects.
Python is a beginner’s language – python is a great language for the beginner level
programmers and supports the development of a wide range of applications from
simple text processing to WWW browsers to games.

DATA TYPES IN PYTHON:

1. Python Numbers.
2. Python List.
3. Python Tuple.
4. Python Strings.
5. Python Set.
6. Python Dictionary.
7. Conversion between data types.

1. Python Numbers:
Integers, floating point numbers and complex numbers fall under the Python numbers
category. They are defined as the int, float and complex classes in Python.
We can use the type() function to know which class a variable or a value belongs to
and the isinstance() function to check if an object belongs to a particular class.
Complex numbers are written in the form x + yj, where x is the real part and y is the
imaginary part.
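
The behaviour of type() and isinstance() described above can be tried with a short sketch; the variable names and values are illustrative assumptions only:

```python
count = 6          # an int
bmi_value = 33.6   # a float
z = 3 + 4j         # a complex number: real part 3, imaginary part 4

print(type(count))             # <class 'int'>
print(type(bmi_value))         # <class 'float'>
print(isinstance(z, complex))  # True
print(z.real, z.imag)          # 3.0 4.0
```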

2. Python list:
A list is an ordered sequence of items. It is one of the most used data types in Python and
is very flexible. The items in a list do not all need to be of the same type. Declaring a
list is straightforward: items separated by commas are enclosed within brackets [].
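
As a small illustration of the points above (the values are made up for the example), a list can hold mixed types and can be changed after creation:

```python
record = [6, 148.0, "positive"]  # items of different types in one list
record.append(50)                # a list can grow
record[2] = "negative"           # and items can be replaced in place
first_two = record[:2]           # slicing keeps the original order

print(record)     # [6, 148.0, 'negative', 50]
print(first_two)  # [6, 148.0]
```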

3. Python Tuple:
A tuple is an ordered sequence of items, just like a list. The only difference is that tuples are
immutable: once created, a tuple cannot be modified.
Tuples are used to write-protect data and are usually faster than lists, as they cannot
change dynamically. A tuple is defined within parentheses (), where items are separated by
commas.
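
The immutability described above can be checked with a minimal sketch (illustrative values; the flag name modified is an assumption for the example):

```python
point = (33.6, 50)  # a tuple of two items

try:
    point[0] = 40.0   # attempting to change an item raises TypeError
    modified = True
except TypeError:
    modified = False

print(modified)  # False
print(point)     # (33.6, 50)
```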

4. Python Strings:
A string is a sequence of Unicode characters. We can use single quotes or double quotes
to represent strings. Multi-line strings can be denoted using triple quotes, ''' or """.
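
A short sketch of the three quoting styles, using illustrative text:

```python
single = 'Pima'
double = "Indian"
multi = """Diabetes
dataset"""  # a triple-quoted string may span several lines

line_count = len(multi.splitlines())
print(single + " " + double)  # Pima Indian
print(line_count)             # 2
```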

5. Python set:
A set is an unordered collection of unique items. A set is defined by values separated by
commas inside braces {}. We can perform set
operations such as union and intersection on two sets. Because the values in a set are unique,
a set eliminates duplicates.
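
The duplicate elimination and the set operations mentioned above can be sketched as follows (illustrative values; sorted() is used for printing because a set has no guaranteed order):

```python
a = {1, 2, 2, 3}  # the duplicate 2 is stored only once
b = {3, 4}

union = a | b     # items that are in either set
common = a & b    # items that are in both sets

print(sorted(a))       # [1, 2, 3]
print(sorted(union))   # [1, 2, 3, 4]
print(sorted(common))  # [3]
```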

6. Python Dictionary:
A dictionary is a collection of key-value pairs (since Python 3.7 it keeps the order in which
keys were inserted). Dictionaries are optimized for retrieving data, which makes them
useful when we have a large amount of data. We
must know the key to retrieve the value. In Python, dictionaries are defined within
braces {}, with each item being a pair in the form key: value. Keys must be of a hashable
(immutable) type, while values can be of any type.
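
A minimal sketch of storing and retrieving values by key; the keys and values are illustrative assumptions:

```python
patient = {"Glucose": 148, "BMI": 33.6}  # key: value pairs inside braces
patient["Age"] = 50                      # add a new pair

glucose = patient["Glucose"]      # retrieve a value by its key
insulin = patient.get("Insulin")  # get() returns None for a missing key

print(glucose)        # 148
print(insulin)        # None
print(list(patient))  # ['Glucose', 'BMI', 'Age']
```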

7. Conversion between data types:


We can convert between different data types by using different type conversion
functions like int(), float(), str() etc. Conversion from float to int will truncate the
value (make it closer to zero).
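
The conversion functions named above, including the truncation toward zero, can be illustrated with a few arbitrary values:

```python
truncated_pos = int(2.9)   # the fractional part is dropped
truncated_neg = int(-2.9)  # truncation moves the value toward zero
as_float = float(5)
as_text = str(25)
from_text = int("10")

print(truncated_pos, truncated_neg)  # 2 -2
print(as_float)                      # 5.0
print(as_text + "!")                 # 25!
print(from_text + 1)                 # 11
```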

DATA
Data is measured, collected, reported and analyzed, whereupon it can be
visualized using graphs, images or other analysis tools. Data as a general concept
refers to the fact that some existing information or knowledge is represented or
coded in some form suitable for better usage or processing. Raw data (“unprocessed
data”) is a collection of numbers or characters before it has been “cleaned” and corrected by
researchers. Raw data needs to be corrected to remove outliers or obvious instrument
or data entry errors. Data processing commonly occurs in stages, and the “processed
data” from one stage may be considered the “raw data” of the next stage. Data are
quantities, characters, or symbols on which operations are performed by a computer,
and which may be stored and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.

DATA OBJECTIVES:
A database should act as a medium to collect and store the incoming data in
an organized way, and should allow for
efficient retrieval of stored data as per the user’s requirements. A database should be
implemented with security features that ensure a high level of
integrity, i.e., that give users trust in the data stored in the database.
A database should be highly scalable as the amount of data increases over time.
A database should be highly adaptable to changes in business needs. A
database should keep its data consistent under the transactions operating on it.
Further, it should also be highly durable, so that it prevents loss of data despite a
loss of power.

METHODS OF CREATING DIFFERENT TYPE OF TABLE:


PANDAS:
Pandas is a Python package providing fast, flexible, and expressive data structures
designed to make working with “relational” or “labeled” data both easy and intuitive.
It aims to be the fundamental high-level building block for doing practical, real
world data analysis in Python. Additionally, it has the broader goal of becoming the
most powerful and flexible open source data analysis / manipulation tool
available in any language. It is already well on its way toward this goal.
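
As a minimal sketch of how pandas structures labeled data, the block below reads a few rows laid out in the nine columns of the diabetes dataset. The rows are illustrative only and are not the full file; the real file would be loaded with a call such as pd.read_csv("diabetes.csv"), where the file name is an assumption.

```python
import io

import pandas as pd

# Illustrative rows in the dataset's column layout (not the full file).
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
"""

df = pd.read_csv(io.StringIO(csv_text))  # each column becomes a labeled Series

print(df.shape)                      # (4, 9)
print(df["Outcome"].value_counts())  # number of rows in each class
print(df[["Glucose", "BMI"]].describe())
```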

NUMPY:
NumPy is an open source Python package for scientific computing. NumPy supports
large, multidimensional arrays and matrices. NumPy is written in Python and C.
Operations on NumPy arrays are typically faster than the equivalent operations on Python lists. But NumPy
arrays are not as flexible as Python lists; all the elements of an array share a single data type.
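
The single data type per array and the element-wise arithmetic can be seen in a short sketch; the numbers are illustrative assumptions:

```python
import numpy as np

glucose = np.array([148, 85, 183, 89])
scaled = glucose / 100  # arithmetic is applied to every element at once

mixed = np.array([1, 2.5, 3])        # the ints are converted so all items share one type
matrix = np.arange(6).reshape(2, 3)  # a two-dimensional array with 2 rows and 3 columns

print(scaled)
print(mixed.dtype)   # float64
print(matrix.shape)  # (2, 3)
```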

OVERVIEW OF DATA MANIPULATION:

Data manipulation attacks are generally an indirect type of sabotage. Although
altering data may not directly compromise a project, decisions based on bad data
have the potential to cause great damage later. For an example of how a discrepancy in
data can cause the failure of an entire operation, we can look at the loss of the Mars
Climate Orbiter in 1999.

In this particular case, a difference in the units of navigational data between the
two teams involved, one using metric units and the other using English
units, caused the craft to go 15 miles further into the atmosphere of Mars
than was planned. Instead of orbiting Mars, the Orbiter is believed to have pushed
straight through the atmosphere and out the other side, travelling off into space.
To clarify, nothing was mechanically or otherwise wrong with the entire project; there was just
a lack of communication on how to interpret the data. When data manipulation is
carried out as a deliberate attack, the results can range from common attacks such as
identity theft to potential loss of life, depending on the system in question and the
data being altered.

GROUP FUNCTION:
The groupby() function returns a GroupBy object, which essentially describes how the
rows of the original data set have been split. The groups attribute of the GroupBy object is a
dictionary whose keys are the computed unique groups and whose corresponding values
are the axis labels belonging to each group.
Functions like max(), min(), mean(), first() and last() can be quickly applied to the
GroupBy object to obtain summary statistics for each group, which is immensely
useful. This functionality is similar to the dplyr and plyr libraries for R. Different
variables can be excluded from or included in each summary as required.
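
A minimal sketch of the grouping described above, assuming a small illustrative table with an Outcome column to group on (the rows are not the full dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "Outcome": [1, 0, 1, 0],
    "Glucose": [148, 85, 183, 89],
    "BMI": [33.6, 26.6, 23.3, 28.1],
})

grouped = df.groupby("Outcome")         # a GroupBy object
print(grouped.groups)                   # group key -> row labels in that group

means = grouped.mean()                  # mean of every column, per group
glucose_max = grouped["Glucose"].max()  # summary of one selected column

print(means)
print(glucose_max)
```

Selecting a column before calling the summary function, as in grouped["Glucose"], is one way to include or exclude variables from a summary.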

DATABASE MODEL:
There are many kinds of data models. Some of the most common ones include:
• Hierarchical database model
• Relational model
• Network model
• Object-oriented database model

Hierarchical model:
The hierarchical model organizes data into a tree-like structure, where each record
has a single parent or root. Sibling records are sorted in a particular order. That order
is used as the physical order for storing the database. This model is good for
describing many real-world relationships.

Relational model:
The most common model, the relational model, sorts data into tables, also known as
relations, each of which consists of columns and rows. Each column lists an attribute
of the entity in question, such as price, zip code, or birth date. The set of values
permitted for an attribute is called its domain. A particular attribute or combination of
attributes is chosen as a primary key that can be referred to in other tables, where it is
called a foreign key.

Network model:
The network model builds on the hierarchical model by allowing many-to-many
relationships between linked records, implying multiple parent records. Based on
mathematical set theory, the model is constructed with sets of related records. Each
set consists of one owner or parent record and one or more member or child records.
A record can be a member or child in multiple sets, allowing this model to convey
complex relationships.

Object-oriented database model:
This model defines a database as a collection of objects, or reusable software
elements, with associated features and methods. There are several kinds of object-
oriented databases.
A multimedia database incorporates media, such as images, that could not be stored
in a relational database. A hypertext database allows any object to link to any other
object. It’s useful for organizing lots of disparate data, but it’s not ideal for numerical
analysis.

Displaying Data from Multiple Tables:

The related tables of a large database are linked through the use of foreign and
primary keys, which are often referred to as common columns. The ability to
join tables enables you to add more meaning to the result table that is produced. For
'n' tables to be joined in a query, a minimum of (n-1) join conditions are
necessary. Based on the join conditions, the following types of join are distinguished:

NATURAL JOIN:
The NATURAL keyword can simplify the syntax of an equijoin. A NATURAL
JOIN is possible whenever two (or more) tables have columns with the same name,
and the columns are join compatible, i.e., the columns have a shared domain of
values. The join operation joins rows from the tables that have equal column values
for the same named columns.
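
The natural join described above can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite is used here as a stand-in database, and the table names, column names and rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES ('ASHA', 10), ('RAVI', 20);
INSERT INTO departments VALUES (10, 'SALES'), (20, 'HR'), (30, 'IT');
""")

# dept_id is the only column name the two tables share,
# so NATURAL JOIN matches rows on equal dept_id values.
rows = conn.execute("""
SELECT emp_name, dept_name
FROM employees NATURAL JOIN departments
ORDER BY emp_name
""").fetchall()
conn.close()

print(rows)  # [('ASHA', 'SALES'), ('RAVI', 'HR')]
```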

USING CLAUSE:
With natural joins, Oracle implicitly identifies the columns that form the basis of the join.
Many situations require explicit declaration of join conditions. In such cases, we use the
USING clause to specify the joining criteria. Since the USING clause joins the tables
based on equality of columns, it is also known as an equijoin. Such joins are also known as
inner joins or simple joins.
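
The difference between an implicit natural join and an explicit USING clause shows up when two tables share more than one column name. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle, and all table names, column names and rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_name TEXT, dept_id INTEGER, city TEXT);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT, city TEXT);
INSERT INTO employees VALUES ('ASHA', 10, 'Bangalore'), ('RAVI', 20, 'Mysore');
INSERT INTO departments VALUES (10, 'SALES', 'Bangalore'), (20, 'HR', 'Bangalore');
""")

# USING names the join column explicitly: only dept_id has to match.
using_rows = conn.execute("""
SELECT emp_name, dept_name
FROM employees JOIN departments USING (dept_id)
ORDER BY emp_name
""").fetchall()

# NATURAL JOIN matches on every shared column name (dept_id and city).
natural_rows = conn.execute("""
SELECT emp_name, dept_name
FROM employees NATURAL JOIN departments
ORDER BY emp_name
""").fetchall()
conn.close()

print(using_rows)    # [('ASHA', 'SALES'), ('RAVI', 'HR')]
print(natural_rows)  # [('ASHA', 'SALES')]
```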

SELF JOIN:
A SELF JOIN operation produces a result table when the relationship of interest
exists among rows that are stored within a single table. In other words, when a table
is joined to itself, the join is known as a self join.
Consider the EMPLOYEES table, which contains employees and their reporting
managers. Finding the manager's name for an employee requires a join of the EMPLOYEES
table with itself. This is a typical candidate for a self join.
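
A minimal sketch of such a self join, using Python's sqlite3 module; the column names and the three sample employees are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER, emp_name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES (1, 'KING', NULL), (2, 'BLAKE', 1), (3, 'ALLEN', 2);
""")

# The same table appears twice under two aliases:
# e stands for the employee, m for that employee's manager.
rows = conn.execute("""
SELECT e.emp_name, m.emp_name
FROM employees e JOIN employees m ON e.manager_id = m.emp_id
ORDER BY e.emp_id
""").fetchall()
conn.close()

print(rows)  # [('BLAKE', 'KING'), ('ALLEN', 'BLAKE')]
```

KING has no manager (NULL), so an inner self join leaves that row out.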

Non EQUIJOIN:
A non-equality join is used when the related columns can't be joined with an equal
sign, meaning there are no equivalent rows in the tables to be joined. A non-equality
join enables you to store a range's minimum value in one column of a record and the
maximum value in another column. So instead of finding a column-to-column match,
you can use a non-equality join to determine whether, for example, an item being shipped falls
between the minimum and maximum ranges in the columns. If the join does find a
matching range for the item, the corresponding shipping fee can be returned in the
results. As with the traditional method of equality joins, a non-equality join can be
performed in a WHERE clause. In addition, the JOIN keyword can be used with the
ON clause to specify relevant columns for the join.
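
The shipping example above can be sketched with Python's sqlite3 module; the table names, weight ranges and fees are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item TEXT, weight INTEGER);
CREATE TABLE shipping (min_weight INTEGER, max_weight INTEGER, fee INTEGER);
INSERT INTO items VALUES ('BOOK', 2), ('LAMP', 7), ('SOFA', 60);
INSERT INTO shipping VALUES (0, 5, 50), (6, 20, 120);
""")

# The join condition is a range test (BETWEEN) instead of an equality.
rows = conn.execute("""
SELECT i.item, s.fee
FROM items i JOIN shipping s
  ON i.weight BETWEEN s.min_weight AND s.max_weight
ORDER BY i.item
""").fetchall()
conn.close()

print(rows)  # [('BOOK', 50), ('LAMP', 120)]
```

SOFA falls outside every range in this sample, so no fee row is returned for it.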

OUTER JOINS:
An outer join is used to identify situations where rows in one table do not match
rows in a second table, even though the two tables are related.
There are three types of outer joins: the LEFT, RIGHT, and FULL OUTER JOIN.
They all begin with an INNER JOIN, and then they add back some of the rows that
have been dropped. A LEFT OUTER JOIN adds back all the rows that are dropped
from the first (left) table in the join condition, and output columns from the second
(right) table are set to NULL. A RIGHT OUTER JOIN adds back all the rows that
are dropped from the second (right) table in the join condition, and output columns
from the first (left) table are set to NULL. The FULL OUTER JOIN adds back all
the rows that are dropped from both the tables.

RIGHT OUTER JOIN:

A RIGHT OUTER JOIN adds back all the rows that are dropped from the second
(right) table in the join condition, and output columns from the first (left) table are
set to NULL. For example, in a query that lists the employees and their corresponding
departments, a department 30 to which no employee has been assigned still appears in
the result, with the employee columns set to NULL.

FULL OUTER JOIN:

The FULL OUTER JOIN adds back all the rows that are dropped from both the
tables. For example, in a query that lists the employees and their departments,
employee 'MAN', who has not been assigned any department so far (it is NULL), and
department 30, which is not assigned to any employee, both appear in the result.
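
The outer joins above can be sketched with Python's sqlite3 module. Employee 'MAN' and department 30 come from the description; the other rows and all table and column names are illustrative assumptions. Because some SQLite versions do not support RIGHT and FULL OUTER JOIN directly, this sketch emulates the right join by swapping the tables in a LEFT OUTER JOIN, and the full join by combining both results in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
INSERT INTO employees VALUES ('ASHA', 10), ('RAVI', 20), ('MAN', NULL);
INSERT INTO departments VALUES (10, 'SALES'), (20, 'HR'), (30, 'IT');
""")

# LEFT OUTER JOIN: every employee is kept, even without a department.
left_rows = set(conn.execute("""
SELECT e.emp_name, d.dept_id
FROM employees e LEFT OUTER JOIN departments d ON e.dept_id = d.dept_id
""").fetchall())

# Right outer join, emulated by swapping the tables: every department is kept.
right_rows = set(conn.execute("""
SELECT e.emp_name, d.dept_id
FROM departments d LEFT OUTER JOIN employees e ON e.dept_id = d.dept_id
""").fetchall())
conn.close()

# Full outer join, emulated as the union of the two results.
full_rows = left_rows | right_rows

print(("MAN", None) in left_rows)  # True
print((None, 30) in right_rows)    # True
print(len(full_rows))              # 4
```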

DESCRIBE WHAT KIND OF PROBLEMS CAN BE
SOLVED WITH THE HELP OF PYTHON:
Python is widely used in industry. If you learn or master it and want a software engineering
job, this is one step. Google and YouTube both have sections of their back-end
software written in Python. There is a large community of developers in Python.
Sharing, commenting on and looking at others’ code is exactly the way to learn more (see
educational criterion number 4 from above). There are scores of libraries written in
Python to solve many tasks. New tools and libraries are being developed every
day, so you can automate almost any kind of task you can think of in Python. The
fuzzywuzzy library implements fuzzy search; the youtube-dl package lets you
download YouTube videos; the this module shows the text of “The Zen of Python” on
the screen; etc. Python reads like English: if there is any language whose code you
can look at and just read, it is Python.

CHAPTER-IV

LEARNING OUTCOME

OVERALL LEARNING:

• Have an intermediate skill level in Python programming.
• Use the Jupyter Notebook environment.
• Use the NumPy library to create and manipulate arrays.
• Use the pandas module with Python to create and structure data.
• Learn how to work with various data formats, such as Tableau worksheets and
HTML.
• Have a portfolio of various data analysis projects.
• Extraction of data.
• Data visualization using Tableau.

LEARNING OUTCOMES:

• Ability to extract data from different sources by using SQL and an RDBMS.
• Conversion of the data extracted with SQL into a Tableau format.
• Analyze the data using Python with NumPy.
• Visualize data using Tableau for creating the report.

INTRODUCTION:

THE OVERVIEW OF TABLEAU:

• Artificial Intelligence.
• Business Intelligence.
• Data Visualization.
• Tableau.

WHAT IS TABLEAU?
• One of the most popular and fastest-evolving tools used for Data Visualization and Business
Intelligence.
• Used to create reports, graphs, dashboards and maps.
• Works through simple interactions (just dragging and dropping).
• Performs end-to-end analytics for a wide range of data.

TOOLS OFFERED BY TABLEAU:

• Tableau Desktop
• Tableau Online
• Tableau Server
• Tableau Reader
• Tableau Public

CONNECTING TO VARIOUS SOURCES:
• Excel spreadsheets.
• Databases.
• Big data.
• Data warehouses.
• Cloud applications.
• 50+ different databases.
• Compatible with desktops, tablets and mobile phones.

FEATURES AND ADVANTAGES OF TABLEAU:


• Speed and accuracy.
• Helps users see and understand the data better.
• Suits all kinds of needs and organizations.
• Leverages the power of the database.
• Easy publishing and sharing.
• Beautiful and interactive dashboards.
• User-friendly.

TABLEAU CALCULATION:
• Creating a calculated field.
• Using the calculated field.
• Creating formulae.
• Aggregate calculations.
• Quick table calculations.

FORMATTING:
• Formatting the axes.
• Changing the font.
• Changing the shade and alignment.
• Formatting the borders.
INSTALLATION OF TABLEAU:

CONNECTING TO A DATA SOURCE:

WORKING ON TABLEAU:

REPORT GENERATION:

CHARTS IN TABLEAU:

FORECASTING:

TREND LINES:

TABLEAU DASHBOARD:

CHAPTER-V

DATA ANALYSIS:
LEVEL OF INSULIN WITH REFERENCE TO THE NUMBER OF RECORDS, PREGNANCIES,
AGE AND BMI.

INTERPRETATION:

The graph shows the level of insulin in relation to the number of records, pregnancies, age and
BMI. It indicates that insulin levels affect all the categories shown in the graph.

GRAPH RELATING TO THE VALUES OF VARIOUS INDICATORS.

INTERPRETATION:
The graph shows the various indicators of diabetes with respect to their value, skin thickness and
number of records. By value, glucose is the highest indicator and diabetes pedigree function is the
lowest. The value of skin thickness is above 15k. The number of records is between 600 and 800.

GRAPH RELATING TO DIABETES PEDIGREE FUNCTION, INSULIN LEVEL AND
OUTCOME.

INTERPRETATION:
The graph shows that diabetes pedigree function is above 300, insulin is above 60k and outcome is
above 200. It thus relates the diabetes pedigree function and the insulin level to the outcome.
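The magnitudes quoted in the interpretations (skin thickness above 15k, insulin above 60k, outcome above 200) suggest that the charts plot column totals over all records, which is how Tableau aggregates a measure by default. Assuming that reading, the sketch below shows how such totals could be reproduced with Pandas. Only the first five appendix records are used, so the totals are far smaller than those in the charts, and the variable names are illustrative.

```python
import pandas as pd

# First five appendix records; column names follow the appendix header
# (ST = skin thickness, DPF = diabetes pedigree function).
sample = pd.DataFrame(
    {
        "ST": [35, 29, 0, 23, 35],
        "Insulin": [0, 0, 0, 94, 168],
        "DPF": [0.627, 0.351, 0.672, 0.167, 2.288],
        "Outcome": [1, 0, 1, 0, 1],
    }
)

totals = sample.sum()    # one total per column
n_records = len(sample)  # the number of records

print(totals)
print(n_records)
```

Because Outcome is coded 0 or 1, its total is simply the number of records marked as diabetic (3 of the 5 records in this sample).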
GRAPH INDICATING THE GLUCOSE LEVELS.

INTERPRETATION:
The graph shows the distribution of glucose levels. The highest count, 130 to 140 records, falls in
the glucose range 100 to 110, and the lowest count, 0 to 10 records, falls in the glucose range 200
to 210.
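The counts per glucose range described above can be reproduced by grouping the readings into bins of width 10. The sketch below does this with NumPy for the first ten glucose values in the appendix; the bin width is inferred from the ranges quoted in the text, and the variable names are illustrative.

```python
import numpy as np

# Glucose readings of the first ten appendix records.
glucose = np.array([148, 85, 183, 89, 137, 116, 78, 115, 197, 125])

BIN_WIDTH = 10  # assumed from the ranges quoted in the interpretation

# Lower edge of the bin each reading falls into, e.g. 148 -> 140.
bin_starts = (glucose // BIN_WIDTH) * BIN_WIDTH

# Distinct bins (sorted) and the number of readings in each.
edges, counts = np.unique(bin_starts, return_counts=True)
histogram = dict(zip(edges.tolist(), counts.tolist()))

for start, count in histogram.items():
    print(f"{start}-{start + BIN_WIDTH}: {count}")
```

For these ten readings the bins starting at 80 and 110 each hold two values and every other occupied bin holds one.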

GRAPH OF OUTCOME AND VALUE WITH RESPECT TO GLUCOSE.

INTERPRETATION:

The graph shows the pattern of glucose with respect to the outcome and the value. The highest
value with respect to outcome is 40, at a glucose level of 120, and the lowest is 0, at glucose
levels of 50 and 60.

GRAPH RELATING TO THE SUM OF DIABETES PEDIGREE FUNCTION.

INTERPRETATION:
The graph shows the sum of the diabetes pedigree function.

CHAPTER-VI

KEY FINDINGS AND CONCLUSION

KEY FINDINGS:
• Diabetes is a deadly disease.
• The number of people with diabetes has nearly quadrupled since 1980.
• Diabetes is one of the leading causes of death in the world.
• There are two major forms of diabetes, type 1 and type 2.
• A third type of diabetes is gestational diabetes.
• Type 2 diabetes is much more common than type 1 diabetes.
• Type 2 diabetes can be prevented.
• People with diabetes can live long and healthy lives when their diabetes is detected and
well managed.
• Early diagnosis and intervention are the starting point for living well with diabetes.
• Diabetes is an important cause of blindness, amputation and kidney failure.

CONCLUSION:
Diabetes is a dangerous disease whose risk is often increased by neglect of one’s
own health. It is a group of metabolic disorders in which blood sugar levels remain
high over a prolonged period. This study analyzes and predicts whether patients of
Pima Indian heritage have diabetes, using several medical predictor variables and
one target variable, Outcome. The predictor variables are the number of pregnancies
the patient has had, BMI, glucose level, insulin level, blood pressure, diabetes
pedigree function, skin thickness and age. Outcome is a class variable that takes the
value 0 or 1. All patients in the study are female and at least 21 years of age. The
study helps us understand the rate of diabetes among female patients of this heritage
and spreads awareness about diabetes in this community. It also encourages patients
to maintain their physical health and to eat healthy, hygienic food.

REFERENCES

• www.kaggle.com
• Training material.
• The web.

APPENDIX

Pregnancies Glucose BP ST Insulin BMI DPF Age Outcome
(BP = blood pressure, ST = skin thickness, DPF = diabetes pedigree function)


6 148 72 35 0 33.6 0.627 50 1
1 85 66 29 0 26.6 0.351 31 0
8 183 64 0 0 23.3 0.672 32 1
1 89 66 23 94 28.1 0.167 21 0
0 137 40 35 168 43.1 2.288 33 1
5 116 74 0 0 25.6 0.201 30 0
3 78 50 32 88 31 0.248 26 1
10 115 0 0 0 35.3 0.134 29 0
2 197 70 45 543 30.5 0.158 53 1
8 125 96 0 0 0 0.232 54 1
4 110 92 0 0 37.6 0.191 30 0
10 168 74 0 0 38 0.537 34 1
10 139 80 0 0 27.1 1.441 57 0
1 189 60 23 846 30.1 0.398 59 1
5 166 72 19 175 25.8 0.587 51 1
7 100 0 0 0 30 0.484 32 1
0 118 84 47 230 45.8 0.551 31 1
7 107 74 0 0 29.6 0.254 31 1
1 103 30 38 83 43.3 0.183 33 0
1 115 70 30 96 34.6 0.529 32 1
3 126 88 41 235 39.3 0.704 27 0
8 99 84 0 0 35.4 0.388 50 0
7 196 90 0 0 39.8 0.451 41 1
9 119 80 35 0 29 0.263 29 1
11 143 94 33 146 36.6 0.254 51 1
10 125 70 26 115 31.1 0.205 41 1
7 147 76 0 0 39.4 0.257 43 1
1 97 66 15 140 23.2 0.487 22 0
13 145 82 19 110 22.2 0.245 57 0
5 117 92 0 0 34.1 0.337 38 0
5 109 75 26 0 36 0.546 60 0
3 158 76 36 245 31.6 0.851 28 1
3 88 58 11 54 24.8 0.267 22 0
6 92 92 0 0 19.9 0.188 28 0

10 122 78 31 0 27.6 0.512 45 0
4 103 60 33 192 24 0.966 33 0
11 138 76 0 0 33.2 0.42 35 0
9 102 76 37 0 32.9 0.665 46 1
2 90 68 42 0 38.2 0.503 27 1
4 111 72 47 207 37.1 1.39 56 1
3 180 64 25 70 34 0.271 26 0
7 133 84 0 0 40.2 0.696 37 0
7 106 92 18 0 22.7 0.235 48 0
9 171 110 24 240 45.4 0.721 54 1
7 159 64 0 0 27.4 0.294 40 0
0 180 66 39 0 42 1.893 25 1
1 146 56 0 0 29.7 0.564 29 0
2 71 70 27 0 28 0.586 22 0
7 103 66 32 0 39.1 0.344 31 1
7 105 0 0 0 0 0.305 24 0
1 103 80 11 82 19.4 0.491 22 0
1 101 50 15 36 24.2 0.526 26 0
5 88 66 21 23 24.4 0.342 30 0
8 176 90 34 300 33.7 0.467 58 1
7 150 66 42 342 34.7 0.718 42 0
1 73 50 10 0 23 0.248 21 0
7 187 68 39 304 37.7 0.254 41 1
0 100 88 60 110 46.8 0.962 31 0
0 146 82 0 0 40.5 1.781 44 0
0 105 64 41 142 41.5 0.173 22 0
2 84 0 0 0 0 0.304 21 0
8 133 72 0 0 32.9 0.27 39 1
5 44 62 0 0 25 0.587 36 0
2 141 58 34 128 25.4 0.699 24 0
7 114 66 0 0 32.8 0.258 42 1
5 99 74 27 0 29 0.203 32 0
0 109 88 30 0 32.5 0.855 38 1
2 109 92 0 0 42.7 0.845 54 0
1 95 66 13 38 19.6 0.334 25 0
4 146 85 27 100 28.9 0.189 27 0
2 100 66 20 90 32.9 0.867 28 1
5 139 64 35 140 28.6 0.411 26 0
13 126 90 0 0 43.4 0.583 42 1

4 129 86 20 270 35.1 0.231 23 0
1 79 75 30 0 32 0.396 22 0
1 0 48 20 0 24.7 0.14 22 0
7 62 78 0 0 32.6 0.391 41 0
5 95 72 33 0 37.7 0.37 27 0
0 131 0 0 0 43.2 0.27 26 1
2 112 66 22 0 25 0.307 24 0
3 113 44 13 0 22.4 0.14 22 0
2 74 0 0 0 0 0.102 22 0
7 83 78 26 71 29.3 0.767 36 0
0 101 65 28 0 24.6 0.237 22 0
5 137 108 0 0 48.8 0.227 37 1
2 110 74 29 125 32.4 0.698 27 0
13 106 72 54 0 36.6 0.178 45 0
2 100 68 25 71 38.5 0.324 26 0
15 136 70 32 110 37.1 0.153 43 1
1 107 68 19 0 26.5 0.165 24 0
1 80 55 0 0 19.1 0.258 21 0
4 123 80 15 176 32 0.443 34 0
7 81 78 40 48 46.7 0.261 42 0
4 134 72 0 0 23.8 0.277 60 1
2 142 82 18 64 24.7 0.761 21 0
6 144 72 27 228 33.9 0.255 40 0
2 92 62 28 0 31.6 0.13 24 0
1 71 48 18 76 20.4 0.323 22 0
6 93 50 30 64 28.7 0.356 23 0
