Python for Probability, Statistics, and Machine Learning

Ebook567 pages3 hours

Python for Probability, Statistics, and Machine Learning

Name: Python for Probability, Statistics, and Machine Learning
Author: José Unpingco
ISBN: 9783319307176

By José Unpingco

Rating: 0 out of 5 stars

()

Read preview

About this ebook

This book, fully updated for Python version 3.6+, covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas. All the figures and numerical results are reproducible using the Python codes provided. The author develops key intuitions in machine learning by working meaningful examples using multiple analytical methods and Python codes, thereby connecting theoretical concepts to concrete implementations. Detailed proofs for certain important results are also provided. Modern Python modules like Pandas, Sympy, Scikit-learn, Tensorflow, and Keras are applied to simulate and visualize important machine learning concepts like the bias/variance trade-off, cross-validation, and regularization. Many abstract mathematical ideas, such as convergence in probability theory, are developed and illustrated with numerical examples.
This updated edition now includes the Fisher Exact Test and the Mann-Whitney-Wilcoxon Test. A new section on survival analysis has been included as well as substantial development of Generalized Linear Models. The new deep learning section for image processing includes an in-depth discussion of gradient descent methods that underpin all deep learning algorithms. As with the prior edition, there are new and updated *Programming Tips* that the illustrate effective Python modules and methods for scientific programming and machine learning. There are 445 run-able code blocks with corresponding outputs that have been tested for accuracy. Over 158 graphical visualizations (almost all generated using Python) illustrate the concepts that are developed both in code and in mathematics. We also discuss and use key Python modules such as Numpy, Scikit-learn, Sympy, Scipy, Lifelines, CvxPy, Theano, Matplotlib, Pandas, Tensorflow, Statsmodels, and Keras.
This book is suitable for anyone with an undergraduate-level exposure to probability, statistics, or machine learning and with rudimentary knowledge of Python programming.

Skip carousel

LanguageEnglish

PublisherSpringer

Release dateMar 16, 2016

ISBN9783319307176

Author

José Unpingco

Related authors

Skip carousel

Related to Python for Probability, Statistics, and Machine Learning

Related ebooks

Skip carousel

Numerical Python: A Practical Techniques Approach for Industry
Ebook
Numerical Python: A Practical Techniques Approach for Industry
byRobert Johansson
Rating: 0 out of 5 stars
0 ratings
Python for Marketing Research and Analytics
Ebook
Python for Marketing Research and Analytics
byJason S. Schwarz
Rating: 0 out of 5 stars
0 ratings
Applied Natural Language Processing with Python: Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing
Ebook
Applied Natural Language Processing with Python: Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing
byTaweh Beysolow II
Rating: 0 out of 5 stars
0 ratings
MATLAB Machine Learning Recipes: A Problem-Solution Approach
Ebook
MATLAB Machine Learning Recipes: A Problem-Solution Approach
byMichael Paluszek
Rating: 0 out of 5 stars
0 ratings
Practical Data Science with Python 3: Synthesizing Actionable Insights from Data
Ebook
Practical Data Science with Python 3: Synthesizing Actionable Insights from Data
byErvin Varga
Rating: 0 out of 5 stars
0 ratings
Mastering Python: a Comprehensive Guide
Ebook
Mastering Python: a Comprehensive Guide
byAmérico Moreira
Rating: 0 out of 5 stars
0 ratings
Learn Java with Math: Using Fun Projects and Games
Ebook
Learn Java with Math: Using Fun Projects and Games
byRon Dai
Rating: 0 out of 5 stars
0 ratings
Embedded Software Design and Programming of Multiprocessor System-on-Chip: Simulink and System C Case Studies
Ebook
Embedded Software Design and Programming of Multiprocessor System-on-Chip: Simulink and System C Case Studies
byKatalin Popovici
Rating: 0 out of 5 stars
0 ratings
MATLAB Optimization Techniques
Ebook
MATLAB Optimization Techniques
byCesar Lopez
Rating: 0 out of 5 stars
0 ratings
Automated Theorem Proving in Software Engineering
Ebook
Automated Theorem Proving in Software Engineering
byJohann M. Schumann
Rating: 0 out of 5 stars
0 ratings
Handbook of Human Centric Visualization
Ebook
Handbook of Human Centric Visualization
byWeidong Huang
Rating: 0 out of 5 stars
0 ratings
Pro ASP.NET 4.5 in C#
Ebook
Pro ASP.NET 4.5 in C#
byAdam Freeman
Rating: 0 out of 5 stars
0 ratings
Mathematical Modeling, Simulations, and AI for Emergent Pandemic Diseases: Lessons Learned From COVID-19
Ebook
Mathematical Modeling, Simulations, and AI for Emergent Pandemic Diseases: Lessons Learned From COVID-19
byEsteban A. Hernandez-Vargas
Rating: 0 out of 5 stars
0 ratings
Tensor Analysis and Elementary Differential Geometry for Physicists and Engineers
Ebook
Tensor Analysis and Elementary Differential Geometry for Physicists and Engineers
byHung Nguyen-Schäfer
Rating: 0 out of 5 stars
0 ratings
.NET IL Assembler
Ebook
.NET IL Assembler
bySerge Lidin
Rating: 0 out of 5 stars
0 ratings
Complex Binary Number System: Algorithms and Circuits
Ebook
Complex Binary Number System: Algorithms and Circuits
byTariq Jamil
Rating: 0 out of 5 stars
0 ratings
Assessing and Improving Prediction and Classification: Theory and Algorithms in C++
Ebook
Assessing and Improving Prediction and Classification: Theory and Algorithms in C++
byTimothy Masters
Rating: 0 out of 5 stars
0 ratings
.NET DevOps for Azure: A Developer's Guide to DevOps Architecture the Right Way
Ebook
.NET DevOps for Azure: A Developer's Guide to DevOps Architecture the Right Way
byJeffrey Palermo
Rating: 0 out of 5 stars
0 ratings
Beginning T-SQL
Ebook
Beginning T-SQL
byKathi Kellenberger
Rating: 0 out of 5 stars
0 ratings
Beginning Android C++ Game Development
Ebook
Beginning Android C++ Game Development
byBruce Sutherland
Rating: 0 out of 5 stars
0 ratings
C# Deconstructed: Discover how C# works on the .NET Framework
Ebook
C# Deconstructed: Discover how C# works on the .NET Framework
byMohammad Rahman
Rating: 0 out of 5 stars
0 ratings
Beginning Backbone.js
Ebook
Beginning Backbone.js
byJames Sugrue
Rating: 3 out of 5 stars
3/5
Pro PowerShell for Amazon Web Services: DevOps for the AWS Cloud
Ebook
Pro PowerShell for Amazon Web Services: DevOps for the AWS Cloud
byBrian Beach
Rating: 0 out of 5 stars
0 ratings
State Space Systems With Time-Delays Analysis, Identification, and Applications
Ebook
State Space Systems With Time-Delays Analysis, Identification, and Applications
byYa Gu
Rating: 0 out of 5 stars
0 ratings
How Transistor Area Shrank by 1 Million Fold
Ebook
How Transistor Area Shrank by 1 Million Fold
byHoward Tigelaar
Rating: 0 out of 5 stars
0 ratings
Java EE 7 Recipes: A Problem-Solution Approach
Ebook
Java EE 7 Recipes: A Problem-Solution Approach
byJosh Juneau
Rating: 0 out of 5 stars
0 ratings
Writing for Computer Science
Ebook
Writing for Computer Science
byJustin Zobel
Rating: 4 out of 5 stars
4/5
Singular Spectrum Analysis for Time Series
Ebook
Singular Spectrum Analysis for Time Series
byNina Golyandina
Rating: 0 out of 5 stars
0 ratings
Jump Start Sass: Get Up to Speed With Sass in a Weekend
Ebook
Jump Start Sass: Get Up to Speed With Sass in a Weekend
byHugo Giraudel
Rating: 0 out of 5 stars
0 ratings
Requirements Engineering
Ebook
Requirements Engineering
byJeremy Dick
Rating: 2 out of 5 stars
2/5

Telecommunications For You

Skip carousel

Math Proofs Demystified
Ebook
Math Proofs Demystified
byStan Gibilisco
Rating: 5 out of 5 stars
5/5
12 Ways Your Phone Is Changing You
Ebook
12 Ways Your Phone Is Changing You
byTony Reinke
Rating: 4 out of 5 stars
4/5
Make Your Smartphone 007 Smart
Ebook
Make Your Smartphone 007 Smart
byConrad Jaeger
Rating: 4 out of 5 stars
4/5
15 Dangerously Mad Projects for the Evil Genius
Ebook
15 Dangerously Mad Projects for the Evil Genius
bySimon Monk
Rating: 4 out of 5 stars
4/5
Android App Development For Dummies
Ebook
Android App Development For Dummies
byMichael Burton
Rating: 0 out of 5 stars
0 ratings
Tor and the Dark Art of Anonymity
Ebook
Tor and the Dark Art of Anonymity
byLance Henderson
Rating: 5 out of 5 stars
5/5
The Deal of the Century: The Breakup of AT&T
Ebook
The Deal of the Century: The Breakup of AT&T
bySteve Coll
Rating: 4 out of 5 stars
4/5
The Hello Girls: America’s First Women Soldiers
Ebook
The Hello Girls: America’s First Women Soldiers
byElizabeth Cobbs
Rating: 4 out of 5 stars
4/5
22 Radio and Receiver Projects for the Evil Genius
Ebook
22 Radio and Receiver Projects for the Evil Genius
byThomas Petruzzellis
Rating: 0 out of 5 stars
0 ratings
Wireless and Mobile Hacking and Sniffing Techniques
Ebook
Wireless and Mobile Hacking and Sniffing Techniques
byDr. Hidaia Mahmood Alassouli
Rating: 0 out of 5 stars
0 ratings
Codes and Ciphers - A History of Cryptography
Ebook
Codes and Ciphers - A History of Cryptography
byAlexander D'Agapeyeff
Rating: 4 out of 5 stars
4/5
Codes and Ciphers
Ebook
Codes and Ciphers
byHarperCollins UK
Rating: 5 out of 5 stars
5/5
Making Things Move DIY Mechanisms for Inventors, Hobbyists, and Artists
Ebook
Making Things Move DIY Mechanisms for Inventors, Hobbyists, and Artists
byDustyn Roberts
Rating: 0 out of 5 stars
0 ratings
Medical Charting Demystified
Ebook
Medical Charting Demystified
byJoan Richards
Rating: 2 out of 5 stars
2/5
Digital Filmmaking for Beginners A Practical Guide to Video Production
Ebook
Digital Filmmaking for Beginners A Practical Guide to Video Production
byMichael K. Hughes
Rating: 0 out of 5 stars
0 ratings
A Beginner's Guide to Ham Radio
Ebook
A Beginner's Guide to Ham Radio
byGeorge Freeman
Rating: 0 out of 5 stars
0 ratings
The Great U.S.-China Tech War
Ebook
The Great U.S.-China Tech War
byGordon G. Chang
Rating: 4 out of 5 stars
4/5
130 Work from Home Ideas
Ebook
130 Work from Home Ideas
byMichael A. Hudson
Rating: 3 out of 5 stars
3/5
iPhone Unlocked for the Non-Tech Savvy
Ebook
iPhone Unlocked for the Non-Tech Savvy
byKevin Pitch
Rating: 0 out of 5 stars
0 ratings
The TAB Guide to DIY Welding: Hands-on Projects for Hobbyists, Handymen, and Artists
Ebook
The TAB Guide to DIY Welding: Hands-on Projects for Hobbyists, Handymen, and Artists
byJackson Morley
Rating: 0 out of 5 stars
0 ratings
Tubes: A Journey to the Center of the Internet
Ebook
Tubes: A Journey to the Center of the Internet
byAndrew Blum
Rating: 4 out of 5 stars
4/5
VoIP For Dummies
Ebook
VoIP For Dummies
byTimothy V. Kelly
Rating: 0 out of 5 stars
0 ratings
The Fast Track to Your General Class Ham Radio License: Comprehensive Preparation for All FCC General Class Exam Questions July 1, 2023 through June 30, 2027
Ebook
The Fast Track to Your General Class Ham Radio License: Comprehensive Preparation for All FCC General Class Exam Questions July 1, 2023 through June 30, 2027
byMichael Burnette, AF7KB
Rating: 0 out of 5 stars
0 ratings
MORE Electronic Gadgets for the Evil Genius: 40 NEW Build-it-Yourself Projects
Ebook
MORE Electronic Gadgets for the Evil Genius: 40 NEW Build-it-Yourself Projects
byRobert E. Iannini
Rating: 4 out of 5 stars
4/5
Making Everyday Electronics Work: A Do-It-Yourself Guide: A Do-It-Yourself Guide
Ebook
Making Everyday Electronics Work: A Do-It-Yourself Guide: A Do-It-Yourself Guide
byStan Gibilisco
Rating: 4 out of 5 stars
4/5
Virtual Selling: How to Build Relationships, Differentiate, and Win Sales Remotely
Ebook
Virtual Selling: How to Build Relationships, Differentiate, and Win Sales Remotely
byMike Schultz
Rating: 4 out of 5 stars
4/5
Nurse Management Demystified
Ebook
Nurse Management Demystified
byIrene McEachen
Rating: 0 out of 5 stars
0 ratings
Trigonometry Demystified 2/E
Ebook
Trigonometry Demystified 2/E
byStan Gibilisco
Rating: 4 out of 5 stars
4/5
Pharmacology Demystified
Ebook
Pharmacology Demystified
byMary Kamienski
Rating: 4 out of 5 stars
4/5
Going iPad (Third Edition): Making the iPad Your Only Computer
Ebook
Going iPad (Third Edition): Making the iPad Your Only Computer
byBrian Schell
Rating: 5 out of 5 stars
5/5

Related podcast episodes

Skip carousel

Working with Code: How Does a Coder at NASA Do His Job?
Podcast episode
Working with Code: How Does a Coder at NASA Do His Job?
byWorking
0 ratings
0% found this document useful
Unlocking The Power of Data Lineage In Your Platform with OpenLineage: An interview with Julien Le Dem about the OpenLineage specification and the opportunity that it offers for simplifying the tracking and analysis of data lineage across your data platform.
Podcast episode
Unlocking The Power of Data Lineage In Your Platform with OpenLineage: An interview with Julien Le Dem about the OpenLineage specification and the opportunity that it offers for simplifying the tracking and analysis of data lineage across your data platform.
byData Engineering Podcast
0 ratings
0% found this document useful
Throwing Houlihans at MongoDB with Rick Houlihan: A year or so before the pandemic hit Corey traveled to Australia for a keynote speech. There he crossed paths with the closing keynote which was delivered by Rick Houlihan. Rick, Director Developer Relations for Strategic Accounts at MongoDB, put Corey’s
Podcast episode
Throwing Houlihans at MongoDB with Rick Houlihan: A year or so before the pandemic hit Corey traveled to Australia for a keynote speech. There he crossed paths with the closing keynote which was delivered by Rick Houlihan. Rick, Director Developer Relations for Strategic Accounts at MongoDB, put Corey’s
byScreaming in the Cloud
0 ratings
0% found this document useful
Episode 233: Manual unit testing and WFH demotivation
Podcast episode
Episode 233: Manual unit testing and WFH demotivation
bySoft Skills Engineering
100%
100% found this document useful
Jobs of Tomorrow: Windows Insider Podcast Episode 17
Podcast episode
Jobs of Tomorrow: Windows Insider Podcast Episode 17
byWindows Insider Podcast
100%
100% found this document useful
Every commit is a gift: celebrating Maintainer Week with Brett Cannon
Podcast episode
Every commit is a gift: celebrating Maintainer Week with Brett Cannon
byThe Changelog: Software Development, Open Source
0 ratings
0% found this document useful
Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase: An interview with Matt Jaffee about FeatureBase, an open source bitmap database that allows you to query and analyze massive data sets at interactive speeds and the work they have done to simplify integration with the rest of your data platform.
Podcast episode
Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase: An interview with Matt Jaffee about FeatureBase, an open source bitmap database that allows you to query and analyze massive data sets at interactive speeds and the work they have done to simplify integration with the rest of your data platform.
byData Engineering Podcast
0 ratings
0% found this document useful
TypeScript Fundamentals: In this episode of Syntax, Scott and Wes talk about TypeScript fundamentals — what it is, how you use it, why people love it so much, and more! Sanity - Sponsor is a real-time headless CMS with a fully customizable Content Studio built in...
Podcast episode
TypeScript Fundamentals: In this episode of Syntax, Scott and Wes talk about TypeScript fundamentals — what it is, how you use it, why people love it so much, and more! Sanity - Sponsor is a real-time headless CMS with a fully customizable Content Studio built in...
bySyntax - Tasty Web Development Treats
0 ratings
0% found this document useful
How ChatGPT Changes Tech + The End of Remote Work? — With Aaron Levie
Podcast episode
How ChatGPT Changes Tech + The End of Remote Work? — With Aaron Levie
byBig Technology Podcast
100%
100% found this document useful
Run your home on a Raspberry Pi: with Mike Riley, author of Portable Python Projects
Podcast episode
Run your home on a Raspberry Pi: with Mike Riley, author of Portable Python Projects
byThe Changelog: Software Development, Open Source
0 ratings
0% found this document useful
Blazor brings .NET to Web Assembly with Steve Sanderson: The Blazor project aims to bring .NET to the open Web using Web Assembly. Scott talks to Steve Sanderson about this experiment and it's future plans. How are they compiling C# and .NET to Web Assembly in a way that works everywhere? How does Mono and .NET Standard fit in?
Podcast episode
Blazor brings .NET to Web Assembly with Steve Sanderson: The Blazor project aims to bring .NET to the open Web using Web Assembly. Scott talks to Steve Sanderson about this experiment and it's future plans. How are they compiling C# and .NET to Web Assembly in a way that works everywhere? How does Mono and .NET Standard fit in?
byHanselminutes with Scott Hanselman
0 ratings
0% found this document useful
#103: Learning and Teaching Early Math: An Interview With Dr. Doug Clements: In this episode we speak with author, early years math curriculum designer Doug Clements. Doug is a professor at the university of Denver. He’s published over 130 research studies, authored 23 books on early years mathematics. He’s served on the...
Podcast episode
#103: Learning and Teaching Early Math: An Interview With Dr. Doug Clements: In this episode we speak with author, early years math curriculum designer Doug Clements. Doug is a professor at the university of Denver. He’s published over 130 research studies, authored 23 books on early years mathematics. He’s served on the...
byMaking Math Moments That Matter
100%
100% found this document useful
#11 Data Science at BuzzFeed and the Digital Media Landscape: How does data science help Buzzfeed achieve online virality? What type of mass online experiments do data scientists at BuzzFeed run for this purpose? What products do they develop to make all of this easy and intuitive for content producers? Find out ...
Podcast episode
#11 Data Science at BuzzFeed and the Digital Media Landscape: How does data science help Buzzfeed achieve online virality? What type of mass online experiments do data scientists at BuzzFeed run for this purpose? What products do they develop to make all of this easy and intuitive for content producers? Find out ...
byDataFramed
0 ratings
0% found this document useful
Massively Parallel Data Processing In Python Without The Effort Using Bodo: An interview about how Bodo converts standard Python code to native MPI automatically for massive speed ups in data processing workloads
Podcast episode
Massively Parallel Data Processing In Python Without The Effort Using Bodo: An interview about how Bodo converts standard Python code to native MPI automatically for massive speed ups in data processing workloads
byData Engineering Podcast
0 ratings
0% found this document useful
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Podcast episode
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
byDataFramed
0 ratings
0% found this document useful
Renee M. P. Teate, "SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis" (John Wiley & Sons, 2021): An interview with Renee M. P. Teate
Podcast episode
Renee M. P. Teate, "SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis" (John Wiley & Sons, 2021): An interview with Renee M. P. Teate
byNew Books in Science, Technology, and Society
0 ratings
0% found this document useful
Past, Present and Future of C++ with Bjarne Stroustrup: Rob and Jason are joined by Bjarne Stroustrup, designer and original implementer of C++ to discuss the current state of C++, his vision for the future as well as some discussion of the past. Bjarne Stroustrup is the designer and original implementer...
Podcast episode
Past, Present and Future of C++ with Bjarne Stroustrup: Rob and Jason are joined by Bjarne Stroustrup, designer and original implementer of C++ to discuss the current state of C++, his vision for the future as well as some discussion of the past. Bjarne Stroustrup is the designer and original implementer...
byCppCast
0 ratings
0% found this document useful
639: Software Club: Greg Pierce and Drafts: In the inaugural meeting of the Software Club, David and Stephen talk about Drafts and their use of the application. Then they are joined by Drafts developer Greg Pierce to talk about the app's community of users and more.
Podcast episode
639: Software Club: Greg Pierce and Drafts: In the inaugural meeting of the Software Club, David and Stephen talk about Drafts and their use of the application. Then they are joined by Drafts developer Greg Pierce to talk about the app's community of users and more.
byMac Power Users
0 ratings
0% found this document useful
Getting Started With Reverse Engineering Hardware - PSW #802: In our first segment: the PSW hosts drop valuable insight on how to start your own journey into reverse engineering hardware! Resources we mentioned: The Hardware Hackers Handbook is a great start Do a badge challenge: Take some classes Do some...
Podcast episode
Getting Started With Reverse Engineering Hardware - PSW #802: In our first segment: the PSW hosts drop valuable insight on how to start your own journey into reverse engineering hardware! Resources we mentioned: The Hardware Hackers Handbook is a great start Do a badge challenge: Take some classes Do some...
bySecurity Weekly Podcast Network (Audio)
0 ratings
0% found this document useful
Thomas Haigh and Paul E. Ceruzzi, "A New History of Modern Computing" (MIT Press, 2021): An interview with Thomas Haigh
Podcast episode
Thomas Haigh and Paul E. Ceruzzi, "A New History of Modern Computing" (MIT Press, 2021): An interview with Thomas Haigh
byNew Books in Mathematics
0 ratings
0% found this document useful
How Google Tracks Hackers: Tracking hacking groups has become a booming business. Dozens of so-called “threat intelligence” companies keep tabs on them and sell subscriptions to feeds where they provide customers with up to date information on what the most advanced cyber crimin...
Podcast episode
How Google Tracks Hackers: Tracking hacking groups has become a booming business. Dozens of so-called “threat intelligence” companies keep tabs on them and sell subscriptions to feeds where they provide customers with up to date information on what the most advanced cyber crimin...
byCYBER
0 ratings
0% found this document useful
#122 How Organizations Can Bridge the Data Literacy Gap
Podcast episode
#122 How Organizations Can Bridge the Data Literacy Gap
byDataFramed
0 ratings
0% found this document useful
Accelerated data science with a Kaggle grandmaster: featuring Christof Henkel
Podcast episode
Accelerated data science with a Kaggle grandmaster: featuring Christof Henkel
byPractical AI: Machine Learning, Data Science
0 ratings
0% found this document useful
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle: The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that you can use to ground your work and avoid getting caught in the hype cycles.
Podcast episode
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle: The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that you can use to ground your work and avoid getting caught in the hype cycles.
byData Engineering Podcast
0 ratings
0% found this document useful
This vs That × map vs reduce, forEach vs for in, and more!: In this Hasty Treat, Scott and Wes do a little this vs that with map vs reduce, forEach vs for in, .hasOwnProperty() vs in vs .hasOwn(), CSS absolute + left/right/top/bottom vs transform, and more. Prismic - Sponsor Prismic is a Headless CMS that...
Podcast episode
This vs That × map vs reduce, forEach vs for in, and more!: In this Hasty Treat, Scott and Wes do a little this vs that with map vs reduce, forEach vs for in, .hasOwnProperty() vs in vs .hasOwn(), CSS absolute + left/right/top/bottom vs transform, and more. Prismic - Sponsor Prismic is a Headless CMS that...
bySyntax - Tasty Web Development Treats
0 ratings
0% found this document useful
MLA 005 Shapes & Sizes: Dimensions, size, and shape of Numpy ndarrays / TensorFlow tensors, and methods for transforming those.
Podcast episode
MLA 005 Shapes & Sizes: Dimensions, size, and shape of Numpy ndarrays / TensorFlow tensors, and methods for transforming those.
byMachine Learning Guide
0 ratings
0% found this document useful
Intelligent Automation in Finance with Workday: Learn about intelligent automation in finance and how AI and ML can improve modern financial processes. The discussion was recorded live at Workday Rising in San Francisco, between analyst Michael Krigsman and John Hugo, VP of Financial Products &...
Podcast episode
Intelligent Automation in Finance with Workday: Learn about intelligent automation in finance and how AI and ML can improve modern financial processes. The discussion was recorded live at Workday Rising in San Francisco, between analyst Michael Krigsman and John Hugo, VP of Financial Products &...
byCXOTalk
0 ratings
0% found this document useful
#20 Kaggle and the Future of Data Science
Podcast episode
#20 Kaggle and the Future of Data Science
byDataFramed
0 ratings
0% found this document useful
167 | Visualization and Statistics with Andrew Gelman and Jessica Hullman
Podcast episode
167 | Visualization and Statistics with Andrew Gelman and Jessica Hullman
byData Stories
0 ratings
0% found this document useful
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
Podcast episode
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
byCoder Radio
0 ratings
0% found this document useful

Skip carousel

Micro-controllers
Linux Format
Article
Micro-controllers
Jan 12, 2021
If you want to use JavaScript and interface directly with Arduino and other system-on-chip boards, consider these simpler systems. They’re designed to interface directly to a large amount of boards, including the Arduino, Raspberry Pi and other small
1 min read
What Is The Future Of Game Streaming Now That Stadia Is Dead?
APC
Article
What Is The Future Of Game Streaming Now That Stadia Is Dead?
Oct 31, 2022
Once hyped as being ‘the future of gaming’, the Google Stadia game streaming service was officially, just three years after launch and before even making it to Australian shores. When game streaming first launched we did have some apprehension about
2 min read
HOW TO… Revive an old PC with ChromeOS Flex
Computeractive
Article
HOW TO… Revive an old PC with ChromeOS Flex
Aug 17, 2022
7 min read
Good-bye, Windows 8…
APC
Article
Good-bye, Windows 8…
Sep 5, 2022
4 min read
Design 3D Parts With Computer Aided Design
Linux Format
Article
Design 3D Parts With Computer Aided Design
May 31, 2022
10 min read
Rokoko Studio 2.0
3D World
Article
Rokoko Studio 2.0
Feb 23, 2021
1 min read
Rise Of The Robots
Linux Format
Article
Rise Of The Robots
Jan 12, 2021
7 min read
User Interface And Experience
Linux Format
Article
User Interface And Experience
Sep 21, 2021
3 min read
The Nerds Who Conquered Silicon Valley
MoneyWeek
Article
The Nerds Who Conquered Silicon Valley
Apr 2, 2021
2 min read
VisionFive V1 RISC-V SBC on sale
Linux Format
Article
VisionFive V1 RISC-V SBC on sale
May 3, 2022
1 min read
Meet Your New Mac
MacFormat
Article
Meet Your New Mac
Aug 22, 2023
22 min read
Create A Custom Windows Installer
Maximum PC
Article
Create A Custom Windows Installer
May 28, 2019
8 min read
Graph Theory In A Nutshell
Linux Format
Article
Graph Theory In A Nutshell
May 5, 2020
A graph G(V,E) is a finite, non-empty set of vertices V (or nodes) and a set of edges E. If the edges E that have the form of (a,b) are an ordered pairs of vertices, then we have a directed graph. In contrast, if the edges E are unordered pairs of ve
1 min read
Extract And Organise Files More Quickly
Computeractive
Article
Extract And Organise Files More Quickly
Feb 28, 2024
3 min read
‘Morpheus’ Chip Turns Hacking Into An Impossible Rubik’s Cube
Futurity
Article
‘Morpheus’ Chip Turns Hacking Into An Impossible Rubik’s Cube
May 6, 2019
2 min read
Up Your Game
Business Today
Article
Up Your Game
Jun 24, 2019
4 min read
MapReduce: The ‘Big Data’ Idea Inside Your Android Phone
APC
Article
MapReduce: The ‘Big Data’ Idea Inside Your Android Phone
Dec 2, 2019
4 min read
If We Can’t Solve Global Conflicts, We Can Control Local Excesses
PC Pro Magazine
Article
If We Can’t Solve Global Conflicts, We Can Control Local Excesses
Apr 10, 2022
2 min read
PC Builder’s Manual
TechLife
Article
PC Builder’s Manual
Jun 1, 2020
18 min read
Hands-on: The ROG Ally Is A Steam Deck Competitor With More Power And Options
PCWorld
Article
Hands-on: The ROG Ally Is A Steam Deck Competitor With More Power And Options
Jun 6, 2023
7 min read
67 AMAZING iPHONE SECRETS
MacLife
Article
67 AMAZING iPHONE SECRETS
Feb 27, 2024
1 min read
Add a Server to Your Home
MacLife
Article
Add a Server to Your Home
Mar 30, 2017
6 min read
Geforce Rtx 40-series Shapes Up
Maximum PC
Article
Geforce Rtx 40-series Shapes Up
Jun 21, 2022
1 min read
Set Up Your First Database
Linux Format
Article
Set Up Your First Database
Aug 25, 2020
1 min read
Stream Live Data To Custom Web Pages
APC
Article
Stream Live Data To Custom Web Pages
Oct 31, 2022
5 min read
The Ultimate Raw Quiz
Digital Camera World
Article
The Ultimate Raw Quiz
Aug 10, 2017
16 min read
Teenage Engineering Rig
Maximum PC
Article
Teenage Engineering Rig
Aug 16, 2022
LENGTH OF TIME: 2-3 HOURS LEVEL OF DIFFICULTY: MEDIUM LOOKS AREN’T EVERYTHING, but they do play a big part in this build. We can’t really start without first mentioning how cool this mini DIY PC setup is. A huge part of building computers is the cust
11 min read
Access Your Mac Anywhere
MacLife
Article
Access Your Mac Anywhere
Nov 8, 2022
2 min read
Exclusive Downloads
APC
Article
Exclusive Downloads
Feb 21, 2022
1 min read
Run An Apple II On Your Linux System
Linux Format
Article
Run An Apple II On Your Linux System
Apr 6, 2021
7 min read

Related categories

Skip carousel

Reviews for Python for Probability, Statistics, and Machine Learning

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Python for Probability, Statistics, and Machine Learning - José Unpingco

J. UnpingcoPython for Probability, Statistics, and Machine Learninghttps://doi.org/10.1007/978-3-319-30717-6_1

1. Getting Started with Scientific Python

José Unpingco¹

(1)

West Health Institute, San Diego, California, USA

José Unpingco

Email: unpingco@gmail.com

Keywords

ScipyPandasMatplotlibJupyterIPythonScikit-LearnSeabornStatsmodels

Python went mainstream years ago. It is now part of many undergraduate curricula in engineering and computer science. Great books and interactive on-line tutorials are easy to find. In particular, Python is well-established in web programming with frameworks such as Django and CherryPy, and is the back-end platform for many high-traffic sites.

Beyond web programming, there is an ever-expanding list of third-party extensions that reach across many scientific disciplines, from linear algebra to visualization to machine learning. For these applications, Python is the software glue that permits easy exchange of methods and data across core routines typically written in Fortran or C. Scientific Python has been fundamental for almost two decades in government, academia, and industry. For example, NASA’s Jet Propulsion Laboratory uses it for interfacing Fortran/C++ libraries for planning and visualization of spacecraft trajectories. The Lawrence Livermore National Laboratory uses scientific Python for a wide variety of computing tasks, some involving routine text processing, and others involving advanced visualization of vast data sets (e.g. VISIT [1]). Shell Research, Boeing, Industrial Light and Magic, Sony Entertainment, and Procter & Gamble use scientific Python on a daily basis for data processing and analysis. Python is thus well-established and continues to extend into many different fields.

Python is a language geared towards scientists and engineers who may not have formal software development training. It is used to prototype, design, simulate, and test without getting in the way because Python provides an inherently easy and incremental development cycle, interoperability with existing codes, access to a large base of reliable open source codes, and a hierarchical compartmentalized design philosophy. It is known that productivity is strongly influenced by the workflow of the user, (e.g., time spent running versus time spent programming) [2]. Therefore, Python can dramatically enhance user-productivity.

Python is an interpreted language. This means that Python codes run on a Python virtual machine that provides a layer of abstraction between the code and the platform it runs on, thus making codes portable across different platforms. For example, the same script that runs on a Windows laptop can also run on a Linux-based supercomputer or on a mobile phone. This makes programming easier because the virtual machine handles the low-level details of implementing the business logic of the script on the underlying platform.

Python is a dynamically typed language, which means that the interpreter itself figures out the representative types (e.g., floats, integers) interactively or at run-time. This is in contrast to a language like Fortran that have compilers that study the code from beginning to end, perform many compiler-level optimizations, link intimately with the existing libraries on a specific platform, and then create an executable that is henceforth liberated from the compiler. As you may guess, the compiler’s access to the details of the underlying platform means that it can utilize optimizations that exploit chip-specific features and cache memory. Because the virtual machine abstracts away these details, it means that the Python language does not have programmable access to these kinds of optimizations. So, where is the balance between the ease of programming the virtual machine and these key numerical optimizations that are crucial for scientific work?

The balance comes from Python’s native ability to bind to compiled Fortran and C libraries. This means that you can send intensive computations to compiled libraries directly from the interpreter. This approach has two primary advantages. First, it give you the fun of programming in Python, with its expressive syntax and lack of visual clutter. This is a particular boon to scientists who typically want to use software as a tool as opposed to developing software as a product. The second advantage is that you can mix-and-match different compiled libraries from diverse research areas that were not otherwise designed to work together. This works because Python makes it easy to allocate and fill memory in the interpreter, pass it as input to compiled libraries, and then retrieve the output back at the interpreter.

Moreover, Python provides a multiplatform solution for scientific codes. As an open-source project, Python itself is available anywhere you can build it, even though it typically comes standard nowadays, as part of many operating systems. This means that once you have written your code in Python, you can just transfer the script to another platform and run it, as long as the compiled libraries are also available there. What if the compiled libraries are absent? Building and configuring compiled libraries across multiple systems used to be a painstaking job, but as scientific Python has matured, a wide range of libraries have now become available across all of the major platforms (i.e., Windows, MacOS, Linux, Unix) as prepackaged distributions.

Finally, scientific Python facilitates maintainability of scientific codes because Python syntax is clean, free of semi-colon litter and other visual distractions that makes code hard to read and easy to obfuscate. Python has many built-in testing, documentation, and development tools that ease maintenance. Scientific codes are usually written by scientists unschooled in software development, so having solid software development tools built into the language itself is a particular boon.

1.1 Installation and Setup

The easiest way to get started is to download the freely available Anaconda distribution provided by Continuum Analytics (continuum.io), which is available for all of the major platforms. On Linux, even though most of the toolchain is available via the built-in Linux package manager, it is still better to install the Anaconda distribution because it provides its own powerful package manager (i.e., conda) that can keep track of changes in the software dependencies of the packages that it supports. Note that if you do not have administrator privileges, there is also a corresponding miniconda distribution that does not require these privileges.

Regardless of your platform, we recommend Python version 2.7. Python 2.7 is the last of the Python 2.x series and guarantees backwards compatibility with legacy codes. Python 3.x makes no such guarantees. Although all of the key components of scientific Python are available in version 3.x, the safest bet is to stick with version 2.7. Alternatively, one compromise is to write in a hybrid dialect of Python that is the intersection of elements of versions 2.7 and 3.x. The six module enables this transition by providing utility functions for 2.5 and newer codes. There is also a Python 2.7 to 3.x converter available as the 2to3 module but it may be hard to debug or maintain the so-converted code; nonetheless, this might be a good option for small, self-contained libraries that do not require further development or maintenance.

You may have encountered other Python variants on the web, such as IronPython (Python implemented in C#) and Jython (Python implemented in Java). In this text, we focus on the C-implementation of Python (i.e., known as CPython), which is, by far, the most popular implementation. These other Python variants permit specialized, native interaction with libraries in C# or Java (respectively), which is still possible (but clunky) using the CPython. Even more Python variants exist that implement the low-level machinery of Python differently for various reasons, beyond interacting with native libraries in other languages. Most notable of these is Pypy that implements a just-in-time compiler (JIT) and other powerful optimizations that can substantially speed up pure Python codes. The downside of Pypy is that its coverage of some popular scientific modules (e.g., Matplotlib, Scipy) is limited or non-existent which means that you cannot use those modules in code meant for Pypy.

You may later want to use a Python module that is not maintained by Anaconda’s conda manager. Because Anaconda comes with the pip package manager, which is the main one used outside of scientific Python, you can simply do

../images/331882_1_En_1_Chapter/331882_1_En_1_Figa_HTML.png

and pip will run out to the web and download the package you want and its dependencies and install them in the existing Anaconda directory tree. This works beautifully in the case where the package in question is pure-Python, without any system-specific dependencies. Otherwise, this can be a real nightmare, especially on Windows, which lacks freely available Fortran compilers. If the module in question is a C-library, one way to cope is to install the freely available Visual Studio Community Edition, which usually has enough to compile many C-codes. This platform dependency is the problem that conda was designed to solve by making the binary dependencies of the various platforms available instead of attempting to compile them. On a Windows system, if you installed Anaconda and registered it as the default Python installation (it asks during the install process), then you can use the high-quality Python wheel files on Christoph Gohlke’s laboratory site at the University of California, Irvine where he kindly makes a long list of scientific modules available.¹ Failing this, you can try the binstar.org site, which is a community-powered repository of modules that conda is capable of installing, but which are not formally supported by Anaconda. Note that binstar allows you to share scientific Python configurations with your remote colleagues using authentication so that you can be sure that you are downloading and running code from users you trust.

Again, if you are on Windows, and none of the above works, then you may want to consider installing a full virtual machine solution, as provided by VMWare’s Player or Oracle’s VirtualBox (both freely available under liberal terms). Using either of these, you can set up a Linux machine running on top of Windows, which should cure these problems entirely! The great part of this approach is that you can share directories between the virtual machine and the Windows system so that you don’t have to maintain duplicate data files. Anaconda Linux images are also available on the cloud by IAAS providers like Amazon Web Services and Microsoft Azure. Note that for the vast majority of users, especially newcomers to Python, the Anaconda distribution should be more than enough on any platform. It is just worth highlighting the Windows-specific issues and associated workarounds early on. Note that there are other well-maintained scientific Python Windows installers like WinPython and PythonXY. These provide the spyder integrated development environment, which is very Matlab-like environment for transitioning Matlab users.

1.2 Numpy

As we touched upon earlier, to use a compiled scientific library, the memory allocated in the Python interpreter must somehow reach this library as input. Furthermore, the output from these libraries must likewise return to the Python interpreter. This two-way exchange of memory is essentially the core function of the Numpy (numerical arrays in Python) module. Numpy is the de-facto standard for numerical arrays in Python. It arose as an effort by Travis Oliphant and others to unify the numerical arrays in Python. In this section, we provide an overview and some tips for using Numpy effectively, but for much more detail, Travis’ book [3] is a great place to start and is available for free online.

Numpy provides specification of byte-sized arrays in Python. For example, below we create an array of three numbers, each of four-bytes long (32 bits at 8 bits per byte) as shown by the itemsize property. The first line imports Numpy as np, which is the recommended convention. The next line creates an array of 32 bit floating point numbers. The itemize property shows the number of bytes per item.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figb_HTML.png

In addition to providing uniform containers for numbers, Numpy provides a comprehensive set of unary functions (i.e., ufuncs) that process arrays element-wise without additional looping semantics. Below, we show how to compute the element-wise sine using Numpy,

../images/331882_1_En_1_Chapter/331882_1_En_1_Figc_HTML.png

This computes the sine of the input array [1,2,3], using Numpy’s unary function, np.sin. There is another sine function in the built-in math module, but the Numpy version is faster because it does not require explicit looping (i.e., using a for loop) over each of the elements in the array. That looping happens in the compiled np.sin function itself. Otherwise, we would have to do looping explicitly as in the following:

../images/331882_1_En_1_Chapter/331882_1_En_1_Figd_HTML.png

Numpy uses common-sense casting rules to resolve the output types. For example, if the inputs had been an integer-type, the output would still have been a floating point type. In this example, we provided a Numpy array as input to the sine function. We could have also used a plain Python list instead and Numpy would have built the intermediate Numpy array (e.g., np.sin([1,1,1])). The Numpy documentation provides a comprehensive (and very long) list of available ufuncs.

Numpy arrays come in many dimensions. For example, the following shows a two-dimensional 2 $$\times $$ 3 array constructed from two conforming Python lists.

../images/331882_1_En_1_Chapter/331882_1_En_1_Fige_HTML.png

Note that Numpy is limited to 32 dimensions unless you build it for more.² Numpy arrays follow the usual Python slicing rules in multiple dimensions as shown below where the : colon character selects all elements along a particular axis.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figf_HTML.png

You can also select sub-sections of arrays by using slicing as shown below.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figg_HTML.png

1.2.1 Numpy Arrays and Memory

Some interpreted languages implicitly allocate memory. For example, in Matlab, you can extend a matrix by simply tacking on another dimension as in the following Matlab session:

../images/331882_1_En_1_Chapter/331882_1_En_1_Figh_HTML.png

This works because Matlab arrays use pass-by-value semantics so that slice operations actually copy parts of the array as needed. By contrast, Numpy uses pass-by-reference semantics so that slice operations are views into the array without implicit copying. This is particularly helpful with large arrays that already strain available memory. In Numpy terminology, slicing creates views (no copying) and advanced indexing creates copies. Let’s start with advanced indexing.

If the indexing object (i.e., the item between the brackets) is a non-tuple sequence object, another Numpy array (of type integer or boolean), or a tuple with at least one sequence object or Numpy array, then indexing creates copies. For the above example, to accomplish the same array extension in Numpy, you have to do something like the following

../images/331882_1_En_1_Chapter/331882_1_En_1_Figi_HTML.png

Because of advanced indexing, the variable y has its own memory because the relevant parts of x were copied. To prove it, we assign a new element to x and see that y is not updated.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figj_HTML.png

However, if we start over and construct y by slicing (which makes it a view) as shown below, then the change we made does affect y because a view is just a window into the same memory.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figk_HTML.png

Note that if you want to explicitly force a copy without any indexing tricks, you can do y=x.copy(). The code below works through another example of advanced indexing versus slicing.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figl_HTML.png

In this example, y is a copy, not a view, because it was created using advanced indexing whereas z was created using slicing. Thus, even though y and z have the same entries, only z is affected by changes to x. Note that the flags.ownsdata property of Numpy arrays can help sort this out until you get used to it.

Manipulating memory using views is particularly powerful for signal and image processing algorithms that require overlapping fragments of memory. The following is an example of how to use advanced Numpy to create overlapping blocks that do not actually consume additional memory,

../images/331882_1_En_1_Chapter/331882_1_En_1_Figm_HTML.png

The above code creates a range of integers and then overlaps the entries to create a 7 $$\times $$ 4 Numpy array. The final argument in the as_strided function are the strides, which are the steps in bytes to move in the row and column dimensions, respectively. Thus, the resulting array steps four bytes in the column dimension and eight bytes in the row dimension. Because the integer elements in the Numpy array are four bytes, this is equivalent to moving by one element in the column dimension and by two elements in the row dimension. The second row in the Numpy array starts at eight bytes (two elements) from the first entry (i.e., 2) and then proceeds by four bytes (by one element) in the column dimension (i.e., 2,3,4,5). The important part is that memory is re-used in the resulting 7 $$\times $$ 4 Numpy array. The code below demonstrates this by reassigning elements in the original x array. The changes show up in the y array because they point at the same allocated memory.

../images/331882_1_En_1_Chapter/331882_1_En_1_Fign_HTML.png

Bear in mind that as_strided does not check that you stay within memory block bounds. So, if the size of the target matrix is not filled by the available data, the remaining elements will come from whatever bytes are at that memory location. In other words, there is no default filling by zeros or other strategy that defends memory block bounds. One defense is to explicitly control the dimensions as in the following code,

../images/331882_1_En_1_Chapter/331882_1_En_1_Figo_HTML.png

1.2.2 Numpy Matrices

Matrices in Numpy are similar to Numpy arrays but they can only have two dimensions. They implement row-column matrix multiplication as opposed to element-wise multiplication. If you have two matrices you want to multiply, you can either create them directly or convert them from Numpy arrays. For example, the following shows how to create two matrices and multiply them.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figp_HTML.png

This can also be done using arrays as shown below,

../images/331882_1_En_1_Chapter/331882_1_En_1_Figq_HTML.png

Numpy arrays support elementwise multiplication, not row-column multiplication. You must use Numpy matrices for this kind of multiplication unless use the inner product np.dot, which also works in multiple dimensions (see np.tensordot for more general dot products).

It is unnecessary to cast everything to matrices for multiplication. In the next example, everything until last line is a Numpy array and thereafter we cast the array as a matrix with np.matrix which then uses row-column multiplication. Note that it is unnecessary to cast the x variable as a matrix because the left-to-right order of the evaluation takes care of that automatically. If we need to use A as a matrix elsewhere in the code then we should bind it to another variable instead of re-casting it every time. If you find yourself casting back and forth for large arrays, passing the copy=False flag to matrix avoids the expense of making a copy.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figr_HTML.png../images/331882_1_En_1_Chapter/331882_1_En_1_Figs_HTML.png

1.2.3 Numpy Broadcasting

Numpy broadcasting is a powerful way to make implicit multidimensional grids for expressions. It is probably the single most powerful feature of Numpy and the most difficult to grasp. Proceeding by example, consider the vertices of a two-dimensional unit square as shown below,

../images/331882_1_En_1_Chapter/331882_1_En_1_Figt_HTML.png

Numpy’s meshgrid creates two-dimensional grids. The X and Y arrays have corresponding entries match the coordinates of the vertices of the unit square (e.g., (0, 0), (0, 1), (1, 0), (1, 1)). To add the x and y-coordinates, we could use X and Y as in X+Y shown below, The output is the sum of the vertex coordinates of the unit square.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figu_HTML.png

Because the two arrays have compatible shapes, they can be added together element-wise. It turns out we can skip a step here and not bother with meshgrid to implicitly obtain the vertex coordinates by using broadcasting as shown below

../images/331882_1_En_1_Chapter/331882_1_En_1_Figv_HTML.png

On line 7 the None Python singleton tells Numpy to make copies of y along this dimension to create a conformable calculation. Note that np.newaxis can be used instead of None to be more explicit. The following lines show that we obtain the same output as when we used the X+Y Numpy arrays. Note that without broadcasting x+y=array([0, 2]) which is not what we are trying to compute. Let’s continue with a more complicated example where we have differing array shapes.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figw_HTML.png

In this example, the array shapes are different, so the addition of x and y is not possible without Numpy broadcasting. The last line shows that broadcasting generates the same output as using the compatible array generated by meshgrid. This shows that broadcasting works with different array shapes. For the sake of comparison, on line 3, meshgrid creates two conformable arrays, X and Y. On the last line, x+y[:,None] produces the same output as X+Y without the meshgrid. We can also put the None dimension on the x array as x[:,None]+y which would give the transpose of the result.

Broadcasting works in multiple dimensions also. The output shown has shape (4,3,2). On the last line, the x+y[:,None] produces a two-dimensional array which is then broadcast against z[:,None,None], which duplicates itself along the two added dimensions to accommodate the two-dimensional result on its left (i.e., x + y[:,None]). The caveat about broadcasting is that it can potentially create large, memory-consuming, intermediate arrays. There are methods for controlling this by re-using previously allocated memory but that is beyond our scope here. Formulas in physics that evaluate functions on the vertices of high dimensional grids are great use-cases for broadcasting.

../images/331882_1_En_1_Chapter/331882_1_En_1_Figx_HTML.png

1.2.4 Numpy Masked Arrays

Numpy provides a powerful method to temporarily hide array elements without changing the shape of the array itself,

../images/331882_1_En_1_Chapter/331882_1_En_1_Figy_HTML.png

Note that the elements in the array for which the logical condition (x< 5) is true are masked, but the size of the array remains the same. This is particularly useful in plotting categorical data, where you may only want those values that correspond to a given category for part of the plot. Another common use is for image processing, wherein parts of the image may need to be excluded from subsequent processing. Note that creating a masked array does not force an implicit copy operation unless copy=True argument is used. For example, changing an element in x does change the corresponding element in y, even though y is a masked array,

../images/331882_1_En_1_Chapter/331882_1_En_1_Figz_HTML.png

1.2.5 Numpy Optimizations and Prospectus

The scientific Python community continues to push the frontier of scientific computing. Several important extensions to Numpy are under active development. First, Numba is a compiler that generates optimized machine code from pure Python code using the LLVM compiler infrastructure. LLVM started as a research project at the University of Illinois to provide a target-independent compilation strategy for arbitrary programming languages and is now a well-established technology. The combination of LLVM and Python via Numba means that accelerating a block of Python code can be as easy as putting a @numba.jit decorator above the function definition, but this doesn’t work for all situations. Numba can target general graphics processing units (GPGPUs) also.

Blaze is considered the next generation of Numpy and generalizes the semantics of Numpy for very large data sets that exist on a variety of backend filesystems. This means that Blaze is designed to handle out-of-core (i.e., too big to fit in a single workstation’s RAM) data manipulations and computations using the familiar operations from Numpy. Further, Blaze offers tight integration with Pandas (see Sect. 1.6) dataframes. Roughly speaking, Blaze understands how to unpack Python expressions and translate them for a variety of distributed backend data services upon which the computing will actually happen (i.e., using blaze.compute). This means that Blaze separates the expression of the computation from the particular implementation on a given backend.

Building on his amazing work on PyTables, Francesc Alted has been working on the bcolz module which is a compressed columnar data container. Also motivated by out-of-core data and computing, bcolz tries to relieve the stress of the memory subsystem by compressing data in memory and then interleaving computations on the compressed data in an intelligent way. This approach takes

Enjoying the preview?

Page 1 of 1

Python for Probability, Statistics, and Machine Learning

About this ebook

José Unpingco

Related authors

Related to Python for Probability, Statistics, and Machine Learning

Related ebooks

Telecommunications For You

Related podcast episodes

Related articles

Related categories

Reviews for Python for Probability, Statistics, and Machine Learning

What did you think?

Book preview

Python for Probability, Statistics, and Machine Learning - José Unpingco

1. Getting Started with Scientific Python

Keywords

1.1 Installation and Setup

1.2 Numpy

1.2.1 Numpy Arrays and Memory

1.2.2 Numpy Matrices

1.2.3 Numpy Broadcasting

1.2.4 Numpy Masked Arrays

1.2.5 Numpy Optimizations and Prospectus