How to Learn Python for Data Science, The Self-Starter Way

October 23, 2016


Do you want to learn Python for data science, but don’t want to take a slow, expensive course?

Most courses are just rehashed versions of the excellent free content out there. Here are

resources for self-starters to acquire this valuable skill at their own pace!

At its heart, data science is about problem solving, exploration, and extracting valuable

information from data. To do so effectively, you'll need to be able to wrangle datasets,

implement statistical models, write programs, and much more.

Therefore, developing sharp programming skills is critical to your success. It's like learning

how to ride a bike in a crowded city. Not only will you reach your destinations faster, but you'll

also have the freedom to visit areas you could never reach on foot.

Plus, your chosen programming tool will become your trusty sidekick in this journey. For most

aspiring data scientists, we strongly recommend starting with Python. Then, you should learn R

after you become fluent with Python.

Python is one of the most widespread languages in the world, and it has a passionate community

of users.

Within the data science community, Python is even more popular. Here's why...

Some people judge the quality of a programming language by the simplicity of its "hello, world!"

program. Python does pretty well by this standard:

Python

print("hello, world!")

Java

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("hello, world!");
    }
}

Great, case closed! See you back here after you've mastered Python, sound good?

...

Okay, okay... but in all seriousness... simplicity is definitely one of Python's biggest strengths.

Thanks to its precise and efficient syntax, Python can often accomplish the same tasks with much

less code compared to other languages. This makes implementing solutions refreshingly fast.
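As a small illustration of that brevity, here is a minimal sketch that counts word frequencies using only the standard library. The sample sentence and the variable names are made up for this example.

```python
from collections import Counter

# Hypothetical sample text, invented for this example.
text = "the quick brown fox jumps over the lazy dog the end"

# Split on whitespace and tally every word in a single expression.
word_counts = Counter(text.split())

# Show the most frequent word together with its count.
print(word_counts.most_common(1))
```

The same task in many other languages would need an explicit loop and a hand-managed dictionary or map.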

In addition, Python's vibrant data science community means you'll be able to find plenty of

tutorials, code snippets, and fixes to common bugs, plus people to commiserate with. Stack Overflow

will be one of your best friends.

Finally, Python has an all-star lineup of libraries (a.k.a. packages) for numeric and scientific

computing, all of which will make your life much easier. More on this later.

We believe in a hyper-practical, action-centric approach to learning Python for data science as

quickly as possible, but you must be a self-starter to succeed with this strategy.

The reason is that we're going to completely cut out "classroom" study. You'll learn just enough

of the fundamentals to jump into real-world problems, and then gradually build mastery over

time by "just doing shit." (not the formal term)

You'll also have a ton of fun using this method because it's the fastest way to gain the essential

programming skills required to start doing data science.

However, you must first build a rock-solid foundation of core programming concepts. This is

the one place where you cannot take any shortcuts because you'll need to know how to translate

solutions in your head into instructions for a computer. Effective programming is not about

memorizing syntax, but rather mastering a new way of thinking.

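We recommend learning Python for data science through the following 3 reliable steps:

1. Build a rock-solid foundation of core programming concepts.

2. Practice with drills and challenges.

3. Equip the tools needed for data science.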

After completing these 3 steps, you'll be ready to dive into projects and analyses while

continuing to learn as you go.

There are many ways to install Python on your computer, but we recommend installing it

through the Anaconda bundle, which includes many of the libraries you'll need for data science.

Here's a quick tutorial on installing Python using Anaconda.

Python 2.7 or 3.0+? Use Python 3, plain and simple. Python 2.7 reached end of life in January

2020, and current releases of NumPy, pandas, SciPy, and matplotlib support only Python 3.

The amount of time you spend at this step depends on how much previous programming

experience you have and whether you can work on this full-time or part-time, but it typically

ranges from 1 week to 6 weeks.

If you are completely new to programming, be prepared to spend at least 1 month on this step.

You'll want the time to absorb these rich concepts. They form the base needed to learn Python

for data science quickly.

Among all the courses, tutorials, and guides out there, we've found the following two resources

to be the best for self-starters. They are both self-paced, hands-on, and comprehensive (and free).

How to Think Like a Computer Scientist is a fantastic interactive online book that

takes a whirlwind tour through key programming concepts (with Python). If you're new to

programming, we suggest starting here, as it's like a condensed "Computer Science 101" course.

Learn Python the Hard Way is an excellent online book for people with some previous exposure

to programming concepts. The "hard way" simply refers to learning through instructive

exercises. Through 52 short exercises, you'll start with setting up Python and incrementally work

your way up to writing multi-file programs.

If you want to learn Python for data science well, then don't skip this step.

After you grasp the core programming concepts, spend a week or two solidifying them

by completing drills and challenges.

If you try to jump into a real project right away, you'll be overwhelmed by the number of moving

parts. It's easy for our brains to trick us into believing we know something after reading about it

in a book, but it takes concentrated practice to really learn the skills.

Think about it this way. Professional basketball players cannot just play games all the time if

they want to improve. They must also spend hours every day practicing specific shots from

different parts of the court.

When you take your newfound programming skills and hone them through short, targeted drills

and challenges, you'll improve much faster than jumping into projects immediately.

Here's what we recommend:

Code Fights is a platform with many short coding challenges that can be completed in 5-minute

chunks (although it's so fun that you might find yourself playing through it for hours at a time).

You'll gain points along the way and unlock new levels, making it a nice way to track your

progression as well.

Solve a mystery...

The Python Challenge is one of the coolest puzzles on the web, so don't be put off by its 1990s

graphics. You can complete all 33 levels with the help of Python scripts. One user called it "an

addictive way to learn the ins and outs of Python..." We agree!

Consider alternative solutions...

PracticePython.org is a collection of short practice problems in Python. It's updated almost every

week with a new problem. What's really nice is that the author includes multiple user-submitted

solutions for each problem so you can see alternative ways of solving them.

Now you're almost ready to dive into real data science projects!

First, we built a strong foundation of core concepts. Then, we practiced pure Python through

drills and challenges. Now, we're going to focus on the "for data science" part of "how to learn

Python for data science."

As we mentioned earlier, Python has an all-star lineup of libraries that are essential for data

science. To begin, we recommend acquiring a working knowledge of NumPy, pandas,

SciPy and matplotlib, while using them in the IPython notebook environment. This is the core

stack of tools you'll need for data analysis.

Other libraries (for web scraping, for example) can be picked up later, when you need them for specific use cases.

NumPy - NumPy is the grand-daddy of all data science libraries. It allows easy and

efficient numeric computation, and many other machine learning libraries are built on top

of it.

Pandas - Pandas is a high-performance library for data structures and exploratory analysis.

Matplotlib - Flexible plotting and visualization library.

IPython - Interactive shell for Python that makes it much easier to explore data and debug

errors. Makes it much more enjoyable to learn Python for data science.

SciPy - Extends NumPy with more functionality, such as calculating integrals, linear

algebra, and statistics.
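To give a feel for the "easy and efficient numeric computation" NumPy offers, here is a minimal sketch. The temperature readings and variable names are invented for illustration; only standard NumPy array operations are used.

```python
import numpy as np

# Hypothetical daily temperatures in Celsius, invented for illustration.
celsius = np.array([18.5, 21.0, 19.5, 24.0, 22.5])

# Vectorized arithmetic: the conversion applies to every element at once,
# with no explicit Python loop.
fahrenheit = celsius * 9 / 5 + 32

# Built-in reductions give quick summary statistics.
mean_c = celsius.mean()
max_f = fahrenheit.max()

print(fahrenheit)
print(mean_c, max_f)
```

Operating on whole arrays like this is the habit that carries over to pandas and SciPy, which build on NumPy arrays.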

Training Videos

scientific computing with NumPy.

Introduction to Pandas and Exploratory Data Analysis (Video) - Pandas, IPython, and

matplotlib for exploratory data analysis.

More Resources

How to Learn Statistics for Data Science, The Self-Starter Way

How to Learn Math for Data Science, The Self-Starter Way

Supercharge Your Data Science Career: 88 Free Resources

How to Learn Math for Data Science, The Self-Starter Way

October 30, 2016


Do you need to have a math Ph.D to become a data scientist? Absolutely not! This guide will

show you how to learn math for data science and machine learning without taking slow,

expensive courses.

How much math you'll do on a daily basis as a data scientist varies a lot depending on your role.

Keep reading to find out which concepts you'll need to master to reach your goals.

To complete this guide, you'll need at least basic Python* programming skills. We'll be learning

math in an applied, hands-on way.

Check out our guide, How to Learn Python for Data Science, The Self-Starter Way, for the

fastest way to get up to speed with Python. We recommend at least completing up to Step 2 in

that guide.

*note: other languages are fine too, but the examples will be in Python.

The amount of math you'll need depends on the role. First, every data scientist needs to know

some statistics and probability theory. We have a guide for that:

How to Learn Statistics for Data Science, The Self-Starter Way.

What about other types of math? Well, here's where the answer is more nuanced... it depends on

how much original machine learning research you'll be doing.

In practice, especially in entry-level roles, you'll often be using out-of-the-box ML

implementations. There are robust libraries of common algorithms in many programming

languages. You don't need to reinvent the wheel.

Even so, interviewers may still test your basic linear algebra and multivariable calculus. Why

do they do this?

Well, at some point, your team may still need to build custom implementations of ML

algorithms. For example, you may need to adapt one to your tech stack or to expand its base

functionality. To do so, you must be able to peel back ML algorithms and work with their

innards.

Other roles need much more original ML research and development. You may need to translate

algorithms from academic papers into working code. Or, you might research enhancements

based on your business's unique challenges.

In other words, you'll be implementing algorithms from scratch much more often.

For these positions, mastery of both linear algebra and multivariable calculus is a must.

The self-starter way to learning math for data science is to learn by "doing shit." So we're

going to tackle linear algebra and calculus by using them in real algorithms!

Even so, you'll want to learn or review the underlying theory up front. You don't need to read a

whole textbook, but you'll want to learn the key concepts first.

Here are the 3 steps to learning the math required for data science and machine learning:

1. Linear Algebra for Data Science

2. Calculus for Data Science

3. Gradient Descent from Scratch

Many machine learning concepts are tied to linear algebra. For example, PCA requires

eigenvalues and regression requires matrix multiplication.

Also, most ML applications deal with high dimensional data (data with many variables). This

type of data is best represented by matrices.
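Both claims can be sketched in a few lines of NumPy. The example below is a minimal illustration under assumed toy data: it fits a regression by solving the normal equations with matrix multiplication, then computes the eigenvalues of a covariance matrix, which is the quantity PCA works with. All data and variable names are invented for this example.

```python
import numpy as np

# Hypothetical data that follows y = 1 + 2*x exactly, invented for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Least-squares regression via the normal equations: (X^T X) beta = X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# PCA ingredient: eigenvalues of the covariance matrix of a small 2-D dataset.
data = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])
cov = np.cov(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)  # eigvalsh is meant for symmetric matrices

print(beta)
print(eigenvalues)
```

Here `beta` holds the intercept and slope, and the eigenvalues measure how much variance lies along each principal direction.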

Here are a few of the best free resources we've found for learning linear algebra for data science:

Khan Academy has short, practical linear algebra lessons. They cover the most important topics.

MIT OpenCourseWare offers a rigorous linear algebra class. The video lectures and course

materials are all included.

Linear Algebra Review for Machine Learning (Video Series) - These are the optional

linear algebra review videos for Andrew Ng's machine learning course. The entire 6-part

series can be watched in under 1 hour. Recommended if you've taken linear algebra

before and just need a quick review.

The Matrix Cookbook (PDF) - Excellent reference resource for matrix algebra.

Calculus is important for several key ML applications. For example, you'll need to be able to

calculate derivatives and gradients for optimization.
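A gradient is simply the vector of partial derivatives. As a minimal sketch, the code below approximates one numerically with central differences and compares it against the known analytic answer. The function `f` and the helper `numerical_gradient` are hypothetical names chosen for this example.

```python
def f(x, y):
    """Hypothetical example function: f(x, y) = x^2 + 3*y^2."""
    return x ** 2 + 3 * y ** 2


def numerical_gradient(func, x, y, h=1e-5):
    """Approximate the gradient of func at (x, y) with central differences."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


# The analytic gradient is (2x, 6y), so at (1, 2) we expect roughly (2, 12).
grad = numerical_gradient(f, 1.0, 2.0)
print(grad)
```

Checking a hand-derived gradient against a numerical one like this is a common sanity test before plugging it into an optimizer.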

Here are some of the best resources for learning calculus for data science:

Khan Academy has short, practical multivariable calculus lessons. They cover the most

important concepts.

For R&D-heavy roles...

MIT OpenCourseWare offers a rigorous multivariable calculus class. The video lectures and

course materials are all included.

in the format of solving practice problems. Recommended if you've taken multivariable

calculus before and just need a quick review.

Congratulations! You've got the theory out of the way. Now it's time for the really fun part.

One of the best ways to learn math for data science and machine learning is to build a simple

neural network from scratch.

You'll use linear algebra to represent the network and calculus to optimize it. Specifically, you'll

code up gradient descent from scratch.
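Before tackling a full neural network, it can help to see gradient descent on the simplest possible model. The following is a minimal sketch, assuming toy data and a hand-picked learning rate, that fits a straight line by repeatedly stepping against the gradient of the mean squared error. It is not taken from any of the tutorials listed in this guide.

```python
# Hypothetical data generated from y = 2x + 1, invented for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0          # start from an arbitrary guess
learning_rate = 0.05     # step size (an assumed value; tune it for other data)
n = len(xs)

for _ in range(5000):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Step downhill, against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

The loop recovers a slope near 2 and an intercept near 1. A neural network uses the same update rule, just with many more parameters and gradients computed by backpropagation.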

Don't worry too much about the nuances of neural networks for now. It's ok if you're just

following instructions and writing code. We'll cover machine learning in depth in another guide,

as this is for targeted math practice.

Follow along with the tutorials, and review theory as you go along. Plus, you'll have a cool

project to add to your portfolio afterward.

Neural Network in Python, Part 2 - This is an incredible tutorial that takes you through a

simple neural network from end to end. It's packed with helpful illustrations, and you'll

learn about how gradient descent fits in.

Neural Nets to Recognize Handwritten Digits - We love this resource! This is a free

online book that walks you through a famous application of neural networks. It explains

ideas very intuitively, and it's the most in-depth tutorial in this list.

Implementing a Neural Network from Scratch - A shorter tutorial that also takes you

through the process step by step.

How to Learn Statistics for Data Science, The Self-Starter Way

October 23, 2016


Do you want to learn statistics for data science without taking a slow and expensive course?

Good news... You can master the core concepts, probability, Bayesian thinking, and even

statistical machine learning using only free online resources. Here are the best resources for self-

starters!

By the way... you don't need a math degree to succeed with this approach. Yet, if you do have a

math background, you'll definitely enjoy this fun, hands-on method too.

This guide will equip you with the tools of statistical thinking needed for data science. It will arm

you with a huge advantage over other aspiring data scientists who try to get by without it.

You see, it can be tempting to jump directly into using machine learning packages once you've

learned how to program... And you know what? It's ok if you want to initially get the ball rolling

with real projects.

But, you should never, ever completely skip learning statistics and probability theory. It's

essential to progressing your career as a data scientist.

Here's why...

To complete this guide, you'll need at least basic Python* programming skills. We'll be learning

statistics in an applied, hands-on way.

Check out our guide, How to Learn Python for Data Science, The Self-Starter Way, for the

fastest way to get up to speed with Python. We recommend at least completing up to Step 2 in

that guide.

*note: other languages are fine too, but the examples will be in Python.

Statistics is a broad field with applications in many industries.

Wikipedia defines it as the study of the collection, analysis, interpretation, presentation, and

organization of data. Therefore, it shouldn't be a surprise that data scientists need to

know statistics.

For example, data analysis requires descriptive statistics and probability theory, at a minimum.

These concepts will help you make better business decisions from data.

Key concepts include probability distributions, statistical significance, hypothesis testing,

and regression.

Bayesian thinking is the process of updating beliefs as additional data is collected, and it's the engine behind many

machine learning models.

Key concepts include conditional probability, priors and posteriors, and maximum

likelihood.

If those terms sound like mumbo jumbo to you, don't worry. This will all make sense once you

roll up your sleeves and start learning.

By now, you've probably noticed that one common theme in "the self-starter way to learning X"

is to skip classroom instruction and learn by "doing shit."

In fact, we're going to tackle key statistical concepts by programming them with code! Trust us...

this will be super fun.

If you do not have formal math training, you'll find this approach much more intuitive

than trying to decipher complicated formulas. It allows you to think through the logical steps of

each calculation.

If you do have a formal math background, this approach will help you translate theory into

practice and give you some fun programming challenges.

Here are the 3 steps to learning the statistics and probability required for data science:

1. Core Statistics Concepts

2. Bayesian Thinking

3. Intro to Statistical Machine Learning

After completing these 3 steps, you'll be ready to attack more difficult machine learning

problems and common real-world applications of data science.

To know how to learn statistics for data science, it's helpful to start by looking at how it will be

used.

Let's take a look at some examples of real analyses or applications you might need to

implement as a data scientist:

1. Experimental design: Your company is rolling out a new product line, but it sells

through offline retail stores. You need to design an A/B test that controls for differences

across geographies. You also need to estimate how many stores to pilot in for statistically

significant results.

2. Regression modeling: Your company needs to better predict the demand of individual

product lines in its stores. Under-stocking and over-stocking are both expensive. You

consider building a series of regularized regression models.

3. Data transformation: You have multiple machine learning model candidates

you're testing. Several of them assume specific probability distributions of input data, and

you need to be able to identify them and either transform the input data appropriately or

know when underlying assumptions can be relaxed.
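To make the data transformation example concrete, here is a minimal sketch showing how a log transform can pull in a long right tail. The data, and the choice of skewness as the diagnostic, are assumptions made for this illustration.

```python
import math


def skewness(values):
    """Population skewness: mean cubed deviation divided by the cubed std."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Hypothetical right-skewed data (for example, order values), invented here.
raw = [1, 2, 3, 4, 5, 10, 20, 50, 100, 1000]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(skew_raw, skew_logged)
```

The logged values are noticeably less skewed than the raw ones, which matters for models that assume roughly symmetric inputs.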

A data scientist makes hundreds of decisions every day. They range from small ones like how to

tune a model all the way up to big ones like the team's R&D strategy.

Many of these decisions require a strong foundation in statistics and probability theory.

For example, data scientists often need to decide which results are believable and which are

likely due to randomness. Plus, they need to know if there are pockets of interest that

should be explored further.

These are central skills in analytical decision making (knowing how to calculate p-values is only

scratching the surface).
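One way to build intuition for "believable versus random" is a permutation test, which estimates a p-value by shuffling group labels instead of relying on a formula. The sketch below is a minimal illustration with invented measurements. The function name and its parameters are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Count shuffles at least as extreme as what we actually observed.
        if abs(mean_a - mean_b) >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical measurements for two variants, invented for illustration.
variant_a = [12.1, 11.8, 12.5, 12.9, 12.3, 12.7]
variant_b = [10.2, 10.9, 10.5, 11.0, 10.4, 10.8]

p_value = permutation_p_value(variant_a, variant_b)
print(p_value)
```

A small p-value says that random relabeling rarely produces a gap as large as the observed one, so the difference is unlikely to be pure chance.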

Here's one of the best resources we've found for learning basic statistics as a self-starter:

Think like a statistician...

Think Stats is an excellent book (with free PDF version) introducing all the key concepts. The

premise of the book? If you know how to program, then you can use that skill to teach yourself

statistics. We've found this approach to be very effective, even for those with formal math

backgrounds.

One of the philosophical debates in statistics is between Bayesians and frequentists. The

Bayesian side is more relevant when learning statistics for data science.

In a nutshell, frequentists use probability only to model sampling processes. This means they

only assign probabilities to describe data they've already collected.

On the other hand, Bayesians use probability to model sampling processes and to quantify

uncertainty before collecting data. If you'd like to learn more about this divide, check out this

Quora post: For a non-expert, what's the difference between Bayesian and frequentist

approaches?

In Bayesian thinking, the level of uncertainty before collecting data is called the prior

probability. It's then updated to a posterior probability after data is collected. This is a central

concept to many machine learning models, so it's important to master.
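The prior-to-posterior update can be sketched in a few lines. The coin scenario, the probabilities, and the `bayes_update` helper below are all invented for illustration.

```python
def bayes_update(priors, likelihoods):
    """Return posterior probabilities given priors and P(data | hypothesis)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical setup: a coin is either fair or biased toward heads (80%).
beliefs = {"fair": 0.5, "biased": 0.5}           # prior: no idea which
heads_likelihood = {"fair": 0.5, "biased": 0.8}  # P(heads | hypothesis)

# Observe three heads in a row, updating the beliefs after each flip.
for _ in range(3):
    beliefs = bayes_update(beliefs, heads_likelihood)

print(beliefs)
```

After three heads, the belief in the biased coin rises from 50% to roughly 80%. Each posterior becomes the prior for the next observation.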

Again, all of these concepts will make sense once you implement them.

Here's one of the best resources we've found for learning Bayesian thinking as a self-starter:

Think like a Bayesian...

Think Bayes is the follow-up (with free PDF version) to Think Stats. It's all about Bayesian

thinking, and it uses the same approach of using programming to teach yourself statistics. This

approach is fun and intuitive, and you'll learn each concept's underlying mechanics well since

you'll be implementing them.

If you want to learn statistics for data science, there's no better way than playing with statistical

machine learning models after you've learned core concepts and Bayesian thinking.

The statistics and machine learning fields are closely linked, and "statistical" machine learning is

the main approach to modern machine learning.

In this step, you'll be implementing a few machine learning models from scratch. This will help

you unlock true understanding of their underlying mechanics.

This helps you break open the black box of machine learning while solidifying your

understanding of the applied statistics required for data science.

The following models were chosen because they illustrate several of the key concepts from

earlier.

Linear Regression

Linear Regression from Scratch in Python

Next, we have an embarrassingly simple model that works pretty darn well...

Multi-Armed Bandits

And finally, we have the famous "20 lines of code that beat any A/B test!"
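To show the flavor of a bandit algorithm, here is a minimal sketch of epsilon-greedy, one common strategy. It is an illustration under assumed conversion rates, not necessarily the code from the article referenced above. The function name and parameters are hypothetical.

```python
import random


def run_epsilon_greedy(true_rates, n_rounds=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit; return how often each arm was pulled."""
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    rewards = [0.0] * len(true_rates)
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the arm with the best observed average reward.
            # Untried arms get an optimistic estimate of 1.0 so each is tried.
            averages = [rewards[i] / pulls[i] if pulls[i] else 1.0
                        for i in range(len(true_rates))]
            arm = averages.index(max(averages))
        # Simulated Bernoulli reward (for example, a click or a conversion).
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward
    return pulls


# Hypothetical conversion rates for three page variants, invented here.
pulls = run_epsilon_greedy([0.05, 0.10, 0.20])
print(pulls)
```

Unlike a fixed A/B split, the bandit shifts traffic toward the better-performing variant while it is still learning.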

If you're hungry for more, we recommend the following resource. We'll also be coming out with

a detailed guide for learning machine learning the self-starter way, so stay tuned.

An Introduction to Statistical Learning is a wonderful textbook (with free PDF version)

that you can use as a reference. The examples are in R, and the book covers a much broader

range of topics, making this a valuable tool as you progress into more work in machine learning.

More Resources

How to Learn Math for Data Science, The Self-Starter Way

6 Fun Machine Learning Projects for Beginners

Supercharge Your Data Science Career: 88 Free Resources
