
PRACTICAL MACHINE LEARNING WITH PYTHON AND SCIKIT-LEARN


Andrea Grandi
@andreagrandi
WHO AM I

• Andrea Grandi
• I live in London (UK)
• Software Developer at Government Digital Service
• Python/Django developer
• Passionate about microcontrollers, IoT, Arduino, Golang and machine learning
WHO I AM NOT
WHY THIS TALK?
WHAT YOU ARE GOING TO LEARN

• Approach the problem
• Study the available data
• Find the “best” algorithm
• Make predictions

… in a simple way!
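
The four steps above can be sketched in a few lines of scikit-learn. This is only a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification`, and the three candidate models and the names `candidates`, `scores` and `best_name` are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT a real dataset): 768 rows, 8 numeric features,
# binary target. Replace with real features and labels in practice.
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

# Hold out a test set so the final check uses data the models never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# A few candidate algorithms (an arbitrary selection for illustration).
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k_nearest_neighbours": make_pipeline(
        StandardScaler(), KNeighborsClassifier()
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# "Find the best algorithm": mean 5-fold cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

# "Make predictions": refit the winner on all training data, then predict.
best_model = candidates[best_name].fit(X_train, y_train)
predictions = best_model.predict(X_test)
test_accuracy = best_model.score(X_test, y_test)
```

Cross-validation is used for the comparison so that the held-out test set is touched only once, for the final accuracy estimate.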
WHAT IS MACHINE LEARNING?
“an application of artificial
intelligence (AI) that provides
systems the ability to automatically
learn and improve from experience
without being explicitly
programmed”
TYPES OF LEARNING
SUPERVISED LEARNING
WHAT YOU NEED

• Find an interesting problem: you can get a free dataset from http://archive.ics.uci.edu/ml
• Python 3 + virtualenv
• Requirements: jupyter, matplotlib, pandas, numpy, scikit-learn, seaborn
(see requirements.txt for details)
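
Once the virtualenv is set up, a quick way to confirm that the listed requirements are installed is to query their versions from Python. This is a small sketch using only the standard library (Python 3.8+); the helper name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as listed on the slide.
REQUIRED = ["jupyter", "matplotlib", "pandas", "numpy", "scikit-learn", "seaborn"]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found
```

Any package that maps to `None` still needs to be installed in the active environment.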
INTRODUCING THE DATASET
PIMA INDIANS
WHO

The Pima Indians are a group of Native Americans living in southern Arizona
WHAT HAPPENED?

A group of them moved to a different area and started developing an increased rate of type 2 diabetes
WHY?

Similar genetic characteristics but different habits. When these people moved, they adopted a different lifestyle and diet.
CONCLUSION

Type 2 diabetes is largely preventable


DATASET
768 women with 8 characteristics

• Number of times pregnant
• Plasma glucose concentration at 2 hours in an oral glucose tolerance test
• Diastolic blood pressure (mm Hg)
• Triceps skin fold thickness (mm)
• 2-Hour serum insulin (mu U/ml)
• Body mass index (weight in kg/(height in m)^2)
• Diabetes pedigree function
• Age (years)

The last column indicates if the person is affected (1) by diabetes or not (0).
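
As a rough sketch, the dataset described above can be loaded with pandas and split into the eight features and the outcome column. The short column names below are hypothetical labels chosen for this example, and the code assumes the CSV has no header row and lists its columns in the order shown on the slide.

```python
import pandas as pd

# Hypothetical short names, in the order the features are listed above;
# the last column is the diabetes outcome (1 = affected, 0 = not affected).
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "outcome",
]

# Raw CSV location taken from the talk's references.
DATA_URL = (
    "https://raw.githubusercontent.com/jbrownlee/Datasets/master/"
    "pima-indians-diabetes.data.csv"
)


def load_pima(source):
    """Read a headerless CSV (path, URL or file-like) into a named DataFrame."""
    return pd.read_csv(source, header=None, names=COLUMNS)


def split_features_target(df):
    """Separate the eight feature columns from the outcome column."""
    X = df.drop(columns="outcome")
    y = df["outcome"]
    return X, y
```

Calling `load_pima(DATA_URL)` fetches the file over the network; `df.describe()` and `df["outcome"].value_counts()` are then a reasonable first look at the value ranges and the class balance.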
BEFORE WE START...
JUPYTER NOTEBOOK

https://github.com/andreagrandi/ml-pima-notebook
WHAT’S NEXT?
References

• https://github.com/andreagrandi/ml-pima-notebook
• https://machinelearningmastery.com
• http://archive.ics.uci.edu/ml
• https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
• https://www.kaggle.com/uciml/pima-indians-diabetes-database/data
Credits

• Dr. Daniela Ceccarelli (my wife): for validating the medical part of what I say
• Dr. Jason Brownlee: for his amazing blog
• my colleagues who had the patience to listen to my talk
How to stay in touch

• blog: https://www.andreagrandi.it
• Twitter: @andreagrandi
• GitHub: https://github.com/andreagrandi
• email: a.grandi@gmail.com
• IRC: Andy80 on FreeNode (#python, #django, #python-uk)
• PGP: 7D4C 4090 DB50 1693 4614 F6FC 6206 9DE9 2240 402E
Thanks!
