
Introduction to Python Data Analytics
Course Outline

• Session 1: Basic Concepts


• Session 2: Demonstration of Python Data Analytics
• Session 3: Hands-On Practice
Goal & Scope of This Course

We’re going to cover only the key concepts in Python data analytics

• Python basics for data analytics


• Python data analytics libraries
• Jupyter Notebook
Quick Survey on Prior Experience

• Python
• I have experience with Python
• I have experience with programming, but not with Python
• I have no experience with programming

• Data analytics
• I have experience with data analytics
• I have no experience with data analytics
What Is Data Analytics?

Data analytics is the process and methodology of analyzing data to draw meaningful insight from the data
Why Is It So Popular?

We now see the limitless potential for gaining critical insight by applying data analytics
Typical Process of Data Analytics

[Figure: process flow diagram. Stage labels: Requirement Understanding, Data Understanding, Data Preparation, Data Exploration, Modeling & Evaluation, Deployment, Insight Development, Decision Making. Problem labels: Decision Making Problem, Modeling Problem. Callouts: "The most time-consuming part" and "The most exciting part".]
Types of Data Analytics

• Descriptive Analytics
• What has happened or is happening?
• “How has the population been changing?”

• Predictive Analytics
• What could happen in the future?
• “How will the population change over the next ten years?”

• Prescriptive Analytics
• What should we do to make that happen or not happen?
• “What actions should be taken in order to avoid the demographic cliff?”
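
The difference between the descriptive and predictive questions above can be sketched in a few lines. This is only an illustration: the population figures are made up, and a straight-line fit with np.polyfit is just one simple stand-in for a predictive model, not a recommended forecasting method.

```python
import numpy as np

# Made-up population figures (in millions), for illustration only.
years = np.array([2000, 2010, 2020])
population = np.array([100.0, 110.0, 120.0])

# Descriptive: how has the population been changing?
change_per_decade = np.diff(population)
print("Change per decade:", change_per_decade)

# Predictive: how might the population change in the future?
# Fit a straight line to the observed points and extrapolate it.
slope, intercept = np.polyfit(years, population, deg=1)
forecast_2030 = slope * 2030 + intercept
print("Straight-line forecast for 2030:", round(forecast_2030, 1))
```

Prescriptive analytics would go one step further and compare candidate actions, which needs a model of how those actions affect the outcome and is beyond a short sketch.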
Confusion – Machine Learning vs. Data Analytics

[Figure: diagram showing AI, Machine Learning, and Deep Learning beside Data Analytics]

Data analytics depends heavily on machine learning
Confusion – AI vs. Data Analytics

[Figure: diagram showing AI, Machine Learning, and Deep Learning beside Data Analytics]

The goals are different!
• AI: intelligence
• Data analytics: insight
Python as a Programming Language

Python is a general-purpose, high-level programming language used for
• Web development
• Networking
• Scientific computing
• Data analytics
• …
Python as a Data Analytics Tool

The nature of Python makes it a good fit for data analytics
• Easy to learn
• Readable
• Scalable
• Extensive set of libraries
• Easy integration with other apps
• Active community & ecosystem
Popular Python Data Analytics Libraries

Library                               Usage
numpy, scipy                          Scientific & technical computing
pandas                                Data manipulation & aggregation
mlpy, scikit-learn                    Machine learning
theano, tensorflow, keras             Deep learning
statsmodels                           Statistical analysis
nltk, gensim                          Text processing
networkx                              Network analysis & visualization
bokeh, matplotlib, seaborn, plotly    Visualization
beautifulsoup, scrapy                 Web scraping
IPython & Jupyter Notebook

IPython is a Python command shell for interactive computing

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive data analysis environment that runs Python code through IPython
Comparison – R vs. Python
• The comparison between R and Python has long been one of the hottest
topics in data science communities

R came from the statistics community,
whereas Python came from the computer science community

Python is often described as a challenger to R, but in general it’s a tie

It’s up to you to choose the one that best fits your needs
For detailed comparison, refer to https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
Start Jupyter Notebook

Run the following command in a terminal:

jupyter notebook

Loading Python Libraries

# Import Python libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns

Press Shift+Enter to execute the Jupyter cell.

Reading data using pandas

In [ ]: #Read csv file


df = pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/Salaries.csv")

Note: pd.read_csv() has many optional arguments for fine-tuning the data import.

There are a number of pandas commands for reading other data formats:

pd.read_excel('myfile.xlsx', sheet_name='Sheet1', index_col=None, na_values=['NA'])
pd.read_stata('myfile.dta')
pd.read_sas('myfile.sas7bdat')
pd.read_hdf('myfile.h5', 'df')
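
As a rough sketch of what "fine-tuning the data import" can look like, the example below passes a few of the optional read_csv arguments (usecols, na_values, nrows). The CSV text and its column names are made up for illustration and are not the course dataset; io.StringIO stands in for a file so the snippet runs without any download.

```python
import io

import pandas as pd

# Hypothetical CSV content, made up for illustration (not the course dataset).
csv_text = """name,dept,years,salary
Ann,A,5,50000
Bob,B,NA,62000
Cho,A,12,missing
Dee,B,3,48000
"""

df = pd.read_csv(
    io.StringIO(csv_text),                # a file path or URL works here too
    usecols=["name", "years", "salary"],  # load only these columns
    na_values=["NA", "missing"],          # treat these strings as missing values
    nrows=3,                              # read only the first 3 data rows
)

print(df)
print(df.dtypes)
```

Columns that contain missing values are read as floating-point numbers, because NaN is a float.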

Exploring data frames

In [3]: #List first 5 records


df.head()

Out[3]:

Data Frame Methods

Unlike attributes, Python methods are called with parentheses.

All attributes and methods of a data frame can be listed with the dir() function: dir(df)
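
A small sketch of the attribute/method distinction, using a tiny made-up data frame (the names and values are hypothetical):

```python
import pandas as pd

# Tiny made-up data frame, for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cho"],
    "salary": [50000, 62000, 58000],
})

# Attributes: accessed without parentheses.
n_rows, n_cols = df.shape
column_names = list(df.columns)

# Methods: called with parentheses, optionally with arguments.
first_two = df.head(2)

print(n_rows, n_cols, column_names)
print(first_two)

# dir(df) lists attribute and method names together.
names = dir(df)
print("shape" in names, "head" in names)
```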

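df.method()             Description
head([n]), tail([n])    first/last n rows (default 5)
describe()              generate descriptive statistics (numeric columns only by default)
max(), min()            return max/min values for all numeric columns
mean(), median()        return mean/median values for all numeric columns
std()                   return the standard deviation of all numeric columns
sample([n])             return a random sample of n rows (default 1)
dropna()                drop all records (rows) with missing values

Note: in pandas 2.0 and later, mean(), median(), and std() raise a TypeError on data frames that contain non-numeric columns unless called with numeric_only=True.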
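
The methods in the table can be tried on a small data frame. The data below is made up for illustration, and numeric_only=True is passed to mean() so the call also works on pandas versions that do not skip text columns automatically.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value, for illustration only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cho", "Dee"],
    "years": [5, 7, np.nan, 3],
    "salary": [50000, 62000, 58000, 48000],
})

summary = df.describe()                 # statistics for the numeric columns
means = df.mean(numeric_only=True)      # column means; NaN is skipped
median_years = df["years"].median()     # median of a single column
complete = df.dropna()                  # rows without missing values
sampled = df.sample(2, random_state=0)  # 2 random rows, reproducible

print(summary)
print(means)
print(median_years)
print(complete)
print(sampled)
```

Passing random_state makes sample() return the same rows on every run, which helps when sharing a notebook.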


Summary

• Typical Python data analytics process for beginners


1. Identify the dataset of interest from a file/database/web
2. Load the dataset into a pandas DataFrame
3. Check the column names and see the first few rows
4. Derive additional columns if needed and handle missing data
5. Do analysis with visualization or apply advanced data analytics techniques
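
The steps above can be sketched end to end on a tiny dataset. Everything here is hypothetical: the inline CSV text stands in for a real file, database, or web source, and the column names and the derived salary_per_year column are invented for the example.

```python
import io

import pandas as pd

# Step 1: identify the dataset (inline made-up CSV standing in for a real file).
csv_text = """name,dept,years,salary
Ann,A,5,50000
Bob,B,10,62000
Cho,A,,58000
Dee,B,4,48000
"""

# Step 2: load the dataset into a pandas DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Step 3: check the column names and look at the first few rows.
print(df.columns.tolist())
print(df.head())

# Step 4: handle missing data, then derive an additional column.
df = df.dropna(subset=["years"]).copy()
df["salary_per_year"] = df["salary"] / df["years"]

# Step 5: a simple analysis, here the mean salary per department.
mean_by_dept = df.groupby("dept")["salary"].mean()
print(mean_by_dept)
```

Dropping rows is only one way to handle missing data; filling in values with fillna() is a common alternative when rows are too valuable to lose.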

Any questions?

Thank you!
