
DATA SCIENCE AND ANALYTICS COURSES

TRAINING PROSPECTUS 2017

www.dataseer.com

Contents

About
    Become a data ninja
    Our training clients
    Our faculty
    Meet the lead trainer: Isaac Reyes
    What is data science?
    The skills of the data scientist

Our Courses
Data Storytelling for Business
    Course Overview
    Course Outline Day 1
    Course Outline Day 2
Advanced Visualization and Dashboard Design
    Course Overview
    Course Outline Day 1
Introduction to R Programming for Business Applications
    Course Overview
    Course Outline Day 1 & Day 2
    Course Outline Day 3
Introduction to Data Science and Machine Learning in R and Azure
    Course Overview
    Course Outline Day 1
    Course Outline Day 2
    Course Outline Day 3
Predictive Analytics and Advanced Machine Learning in R and Azure
    Course Overview
    Course Outline Day 1
    Course Outline Day 2
    Course Outline Day 3

For more information, visit our website at www.dataseer.com



Become a data ninja.
Data is useless without the skill to analyse it.
Data alone is merely a commodity. It's data scientists and analysts who
breathe life into this data and create value, advantage and impact. And
the business world agrees: McKinsey predicts that the United States alone
faces a shortage of 140,000-190,000 people with deep analytical skills.

We train the region's analytics talent so that they are prepared to face the
challenges and opportunities posed by the new data environment.

Our difference - real business datasets.


Computer science and statistics courses from the University sector do not
create professionals who are prepared for the rigors of commercial data.
Real business data is often large (millions of rows), high dimensional
(hundreds of variables), unstructured and high velocity. It is also rarely clean,
awash with missing values, data breaks and outliers.

All of our courses utilise real commercial datasets that will prepare you for
the information you will encounter in your next role as a data scientist or
analyst.

"You cannot give me too much data. I see big data as storytelling, whether it
is through information graphics or other visual aids that explain it in a way
that allows others to understand across sectors. I always push for the full
scope of the data over averages and aggregations and I like to go to the raw
data because of the possibilities of things you can do with it."
- Mike Cavaretta
Data Scientist and Manager, Ford Motor Company

Trusted by industry
Data science and analytics are revolutionizing business across all industry verticals.
Since 2015, we've trained over 100 companies, government departments and NGOs in
fundamental data science skills. From banking to telcos and retail to real estate: we've
trained people in your field.

OUR FACULTY
Learn from thought leaders in the field
DataSeer is an analytics and data science training provider that has been offering innovative public and
private training courses since 2015.

ISAAC REYES, Data Scientist


Isaac is concurrently lead trainer at DataSeer and
Head of Data Science at Altis, Australia's largest
information management consultancy. At Altis,
Isaac leads a team of data scientists who design
analytics and machine learning solutions for
enterprise clients throughout AU/NZ.
A former university lecturer in statistics at the
Australian National University, Isaac is also a
TEDx speaker and regular keynote at big data
conferences.
Isaac holds a Master's Degree in Statistics from
the Australian National University and a Bachelor's
Degree in Actuarial Science from Macquarie
University.

JAY MANAHAN, Data Storytelling Expert


A data storytelling expert, Jay is concurrently a trainer at
DataSeer and Head of Operations at Magpie.IM, an online
payments startup.
In his prior role, Jay was the Head of the Manila Shared
Services Center for Kforce (Nasdaq: KFRC) and a Business
Development Director at analytics company, Sencor.
Jay holds an MBA and B.S. in Mathematics from Ateneo de
Manila University.

ALAN WHITE, Data Management Expert


Alan is a leading data management expert with over 15
years' experience in the data management and analytics
industry. A published data management thought leader,
he has delivered successful data management solutions
to multiple Fortune 500 clients including Pfizer, AIG and
ADP. Alan holds a Mini-MBA from the Wharton School of
the University of Pennsylvania.

Meet our lead trainer: Isaac Reyes

"I live and breathe it."

This is how Isaac Reyes describes his decade-long relationship with data. And with
over 3,000 hours of data science training experience at the world's leading
institutions, the numbers certainly add up.

"Teaching is definitely a passion," says Isaac. "I've always kept one foot in the
education sector and one foot in the commercial sector. A trainer who is unfamiliar
with the commercial application of his methods risks becoming too esoteric in his
teaching. On the other hand, a practitioner who doesn't teach misses out on the peer
review process that occurs when presenting to a smart audience."

A thought leader in the field, Isaac was the keynote speaker at last year's Big Data
Analytics Conference 2016, where he spoke about bringing analytics projects from
conceptualization through to productionization.

Responsible for the design and delivery of the DataSeer training curriculum, Isaac is
a big believer in what he calls "data driven education": "We like to practice what we
preach, so DataSeer's whole training process is heavily data driven. We are big
believers in pre and post training assessments that allow us to measure whether our
clients are getting the specific training outcomes they need. We also encourage
validation of the effectiveness of our courses by measuring the ROI or quality of
analytics projects both before and after our training."

Yet another method is used to validate DataSeer training outcomes: global data science
competitions such as Kaggle. "I couldn't be more proud of our graduates who have gone
on to succeed in international data science competitions, like Kaggle," Isaac says.
"Back in 2015, our first batch of DataSeer bootcamp graduates ranked in the top 2.6%
of Data Scientists worldwide. Since then, DataSeer graduates have continued to post
top 5% finishes in the world's most competitive machine learning competitions,
including those issued by industrial equipment giant, Caterpillar, and the large
European retailer, Rossmann."

More recently, Isaac shared his vision for Data Science with perhaps the biggest stage
of them all: TEDx. "Speaking about the intersection of data science and world issues
at a TED event was something that I've always wanted to do. My TED talk focused on how
we can use data science to measure how much we really care about the issues that
matter."

So what does Isaac have in store for DataSeer training in 2017?

"2017 is the year we implement all of the feedback we collected from our course
attendees. We plan on creating more realistic workshop problems around commercial
datasets that reflect the digital, high throughput and unstructured data environment
of 2017. We are also set to deliver domain specific custom trainings that provide our
corporate partners with the exact outcomes they need for their specific industry
vertical or departmental needs.

"Finally, the vision is for our clients and course attendees to keep winning. Our
training attendees will continue to win because they build analytical skills that
increase their value in the labor market. Our clients will continue to win because
they end up with staff capable of playing at the highest levels of the analytics value
chain."

Isaac holds a Bachelor's Degree in Actuarial Science from Macquarie University and a
Master's Degree in Statistics from the Australian National University. He was
previously a Data Scientist at Quantium, a Biostatistician at Datapharm and an
Actuarial Analyst at PricewaterhouseCoopers.

What is data science?
Data Science is one of the fastest growing disciplines
in the business sector today. New findings from MIT
research show that companies with data-driven decision
making environments had 4% higher productivity and 6%
higher profits than other businesses.

In 2008, Dr DJ Patil and Jeff Hammerbacher, heads of analytics and data at LinkedIn
and Facebook respectively, coined the term "data science" to describe the emerging
field of study that focused on teasing out the hidden value in the data that was being
collected from touchpoints all over the retail and business sectors.

Data Science is now the umbrella term used for a discipline that spans Programming,
Statistics, Data Mining, Artificial Intelligence, Networking, Analytics, Business
Intelligence, Visualisation and a host of other subject areas. The science is
constantly changing and evolving, as it moves to keep abreast of technology and
business practices alike. Data Science has applications not only in business
decisions, but also across a wide range of verticals including biostatistics,
astronomy and molecular biology. Wherever you find large amounts of information,
you'll find an application for data science.

You Can't Hide From Data
The combination of distributed processing power in the cloud, ultra-fast internet and
cheap storage has made one thing clear: data is here to stay. Unprecedented amounts of
data are now being collected, saved, and stored safely in the cloud. As exabyte upon
exabyte is stored, a new discipline grows to tunnel through the mountain of datasets
to find the nuggets of gold: actionable insights that can change the way you do
business.

"Without big data, companies are blind and deaf, wandering out
onto the web like deer on a freeway." - Geoffrey Moore

Data Scientist: The Sexiest Job of the 21st Century

[Figure: Venn diagram of data science skills. Labels: Coding Skills, Math & Statistics
Knowledge, Business Expertise, Machine Learning, Traditional Research, Danger Zone!,
Data Science.]

The Big Three Skills: Coding, Statistics and Business

Never before have these three been so closely aligned: Coding to query and manipulate
large datasets. Statistics to run robust analyses. Business expertise to know how to
ask the right questions and create useable insights. But data science isn't just a
static flowchart - it's a conglomeration of skills in individuals who can use data to
let companies know how to move forward and along which vertices.

1 Coding Skills

Every good data scientist knows that the quality of insights is dependent on the
quality of data input. The first task of any analytics project is to extract data,
whether that data is stored within an on premise data warehouse or housed alongside
terabytes in the cloud. Coding skills in languages like SQL, R, Spark and Python are
required to extract, clean and prepare data for analysis.

2 Math and Statistics

While statistics is hardly a new field, today's data scientists have experienced a
paradigm shift in statistical application. Where once the field of statistics
concentrated on achieving valid results with small samples, today, with a torrent of
information, modern data scientists face the challenge of separating the signal from
the noise. Judicious application of statistical methods, coupled with rigorous
mathematical theory, allows data scientists to create models that power actionable
insights.

3 Business: Analysing the Results

Data Scientists are a rare class among their technical brethren: they need to have
excellent client facing and human interfacing skills to complement their technical
skills. Often the point of contact between the C-suite and analytics teams, data
scientists must have a firm grasp on core business processes, costs, project
management methodologies, production systems and corporate culture. The creation of
actionable, positive ROI recommendations, backed by solid analysis and good data, is
the end game, and is the primary reason the profession has grown to be one of the most
desirable skillsets in corporate circles today.

"Consumer data will be the biggest differentiator in the next two to three years.
Whoever unlocks the reams of data and uses it strategically will win."
- Angela Ahrendts, Apple

COURSE ONE
DATA STORYTELLING FOR BUSINESS

COURSE DURATION: 2 DAYS
PREREQUISITES: None.
LAPTOP SPECS: Minimum required specs of Intel i3 processor, 2GB RAM. Either Mac or
Windows operating system.
REQUIRED SOFTWARE: Any data visualization software package (e.g. Excel, Tableau,
PowerBI, Qlik, R, Python) and PowerPoint.

Data Storytelling is predicted to be the top business skill of the next 5 years.

Well told data stories are change drivers within the modern organisation.
But how do we find the most important insights in our business data and
communicate them in a compelling way? How do we connect the data that we
have to the key underlying business issue?

This course takes students from the fundamentals (what should we be
measuring and why?) through to the elements of good visualisation design
(what does a good chart look like?) through to proficiency in data storytelling.
By the end of the course, participants will know how to produce engaging,
cohesive and memorable data stories using Excel and PowerPoint. The course
also teaches attendees the importance of producing statistically robust
visualisations and insights.

Suitable For
This is our most popular course. It's suited to any professional who
works with data and charts. If you need to tell better stories with your data,
then this course is for you.
DATA STORYTELLING FOR BUSINESS

Course Outline Day 1


I. Introductions, Ice Breaker (9:00am - 9:15am)
II. Overview of the Four Keys to Data Storytelling (9:15am - 9:30am)
    Knowing your audience
    Preparing your data
    Choosing the right visual and designing it well
    Telling the story
III. Preparing your data: Exploratory Data Analysis in the Business Setting (9:30am - 10:15am)
    Step 1 - Know the story behind your data
    Step 2 - Variable classification
    Step 3 - Handle missingness
    Step 4 - Sanity check
    Step 5 - Univariate EDA
    Step 6 - Bivariate EDA
IV. Q&A / Break (10:15am - 10:30am)
V. Tables Versus Charts Versus Single Metrics - What to Use and When? (10:30am - 11:15am)
    Choosing between tables, charts and single headline metrics - guidelines
    Visualisation is the fastest bandwidth channel for transferring high dimensional information into the human brain
    Visualisation separates data structure from data noise
    Visualisation uncovers hidden patterns
    Visualisation grabs attention
    Visualisation uncovers cause and effect relationships
    When to not use graphs - Recognizing situations where a table is most appropriate
    When to not use graphs - Recognizing situations where a single headline metric is appropriate
VI. Q&A / Break (11:15am - 11:30am)
VII. The Visualisation Arsenal (11:15am - 12:00pm)
    The Histogram - The most underutilized visualization in business
    The Bar Chart - The king of flexibility, guidelines on vertical and horizontal variations
    The Case for and Against Stacked Bar Charts
    The Pie Chart - Theory and controversy, smack down with bar charts
    The Scatter Plot - Theory and guidelines for large datasets
    The Line Chart - Theory, comparison with clustered bar charts, discussion on dual axis line charts
    Bubble, Waterfall and Area Charts - Quick opinions
VIII. LUNCH (12:00pm - 1:00pm)
IX. Recent Developments in Data Visualization Media (1:15pm - 1:45pm)
    Virtual Reality Data Visualization Demo
    Interactivity and animation, d3.js
    Macros for more efficient and consistent designs
    Histograms in Excel 2016 - An Applied Walkthrough
X. Workshop: Team Activity (1:45pm - 4:15pm)
XI. Group Work Submission Deadline (4:15pm)
XII. Group Presentations, Feedback and Day 1 Wrap Up (4:15pm - 5:00pm)
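
As an illustration of Steps 3 and 5 in session III (handling missingness and univariate EDA), here is a minimal sketch in Python. The course itself works in Excel and PowerPoint; Python is used only for brevity. The records, the variable names (`age`, `monthly_spend`) and both helper functions are hypothetical and are not taken from the course dataset.

```python
import statistics

# Hypothetical customer records (made up for illustration); None marks a missing value.
records = [
    {"age": 34, "monthly_spend": 120.0},
    {"age": None, "monthly_spend": 80.0},
    {"age": 51, "monthly_spend": None},
    {"age": 29, "monthly_spend": 100.0},
]


def missing_counts(rows):
    """Count missing (None) values per variable."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


def univariate_summary(rows, name):
    """Summarise one numeric variable, ignoring missing values."""
    values = [row[name] for row in rows if row[name] is not None]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


missing = missing_counts(records)
spend = univariate_summary(records, "monthly_spend")
print(missing)
print(spend)
```

Counting missing values before summarising makes it explicit how many rows each statistic is actually based on, which matters when a headline metric is later quoted in a data story.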


Course Outline Day 2


I. Ice Breaker Exercise - Let's Tell Stories as a Group (9:00am - 9:15am)
II. The Elements of Data Visualisation Design (9:15am - 10:00am)
    Above all else, show the data
    Tufte's war on chart-junk
    Tufte's data-ink ratio
    Using color to focus attention
    Dimension, perspective and 3D
    The Gestalt principles of visual perception
        Proximity
        Similarity
        Closure
        Continuity
        Connectedness
        Enclosure
III. Q&A / Break (10:00am - 10:15am)
IV. The Elements of Data Storytelling (10:15am - 11:00am)
    Knowing your audience
    Designing your visuals and narrative around "The Big Takeaway"
    Delivering insights
    Creating memorable soundbites
    Structuring your data story - What is an appropriate story flow?
    From reporting to strategy - Is your data story actionable?
V. Q&A / Break (11:00am - 11:15am)
VI. Examples of good data stories (11:15am - 12:00pm)
    The Apathy Gap - Real life replay of Isaac's TEDx talk
    200 Countries, 200 Years, 4 Minutes - Hans Rosling's Animated Take on Global Health
    Examples of data stories from the top management consulting firms
VII. LUNCH (12:00pm - 1:00pm)
VIII. The Statistics Behind Good Data Storytelling (1:00pm - 1:30pm)
    Sample size and inference - Why it's important
    Correlation and causation - Applied examples
IX. Workshop: Team Activity and Presentation (1:30pm - 4:15pm)
X. Group Work Submission Deadline (4:15pm)
XI. Group Feedback, Course Wrap Up, Awarding of Certificates of Completion (4:15pm - 5:00pm)

Dataset
This course utilises a 50,000 row, 70 variable Customer Relationship Management (CRM) dataset as a
learning tool.

Data Fields
The dataset includes over 25 customer behavior variables including information about customer spend,
customer complaints, customer retention and purchase frequency. The dataset also features over 20
customer demographic variables including age, occupation and marital status.

Data Format
The data is provided to participants in unstructured .dat format. Participants are taught how to import
the dataset into Excel and convert the .dat file into an .xlsx file.


COURSE TWO
ADVANCED VISUALIZATION AND DASHBOARD DESIGN

COURSE DURATION: 1 DAY
PREREQUISITES: None.
LAPTOP SPECS: Intel i3 processor, 2GB RAM. Either Mac or Windows operating system.
REQUIRED SOFTWARE: Any data visualization software package (e.g. Excel, Tableau,
PowerBI, Qlik, R, Python) and PowerPoint.

Take your visualization and dashboard skills to the next level.

Advanced Visualization and Dashboard Design is aimed at the professional
who already possesses fundamental data visualization and data storytelling
skills. A natural continuation point from Data Storytelling for Business, this
course provides participants with the skills needed to produce stunning,
understandable business dashboards and graphs. Taught using a variety of
visualization tools, the course covers the keys to designing for interactivity and
drill down effects. The course also covers less commonly used but valuable
visualization methods, including methods for visualizing networks and flows.
Dashboard design is covered in detail, with participants creating a dashboard
makeover during the class practical workshop.

Suitable For
This course is suited to any professional who wants to improve their data
visualization and dashboard skills.
ADVANCED VISUALIZATION AND DASHBOARD DESIGN

Course Outline Day 1


I. Introductions, Ice Breaker (9:00am - 9:15am)
II. Advanced Visualization Design (9:15am - 10:00am)
    Interactivity - "Overview first, zoom and filter, details on demand" - Shneiderman (1996)
    Taxonomy of interactive dynamics for visual analysis in Heer & Shneiderman (2012)
    Guidelines for annotation layers: rollovers, highlights, auto-summaries
    Tools for adding interactivity and annotations - from Excel to d3.js
III. Q&A / Break (10:00am - 10:15am)
IV. Extremely Useful Charts That You Won't Find in Excel (10:15am - 11:00am)
    Tree maps
    Mosaic plots
    Trellis displays
    Chord diagrams
    Sankey diagrams
V. Q&A / Break (11:00am - 11:15am)
VI. Good Dashboard Design (11:15am - 12:00pm)
    The unique challenges and opportunities posed by the dashboard layout
    Dashboard variations
    To label or not to label
    Common dashboard features such as the speedometer
    The characteristics of well designed dashboards
VII. LUNCH (12:00pm - 1:00pm)
VIII. Walkthrough - Let's Give a Poor Dashboard a Makeover (1:00pm - 1:30pm)
IX. Workshop: Team Activity - Let's Create Good Dashboards Together (1:30pm - 4:15pm)
X. Workshop Feedback, Presentation from Winning Model, Awarding of Certificates and Course Wrap Up (4:15pm - 5:00pm)


COURSE THREE
INTRODUCTION TO R PROGRAMMING FOR BUSINESS APPLICATIONS

COURSE DURATION: 3 DAYS
PREREQUISITES: None.
LAPTOP SPECS: Intel i3 processor, 4GB RAM. Windows operating system. Unrestricted PC
that has install permissions.
REQUIRED SOFTWARE: Base R or Microsoft R Open; RStudio; Microsoft account (for Jupyter
via Azure ML Studio or Azure

R is the world's leading data science and statistics programming language.

In this introduction to R, you will master the basics of this beautiful open
source language, including factors, lists and data frames. After completing
the course, you will be ready to undertake your very own end-to-end data
analysis projects using the world's most sophisticated data analysis tool. R
itself is completely free and can be used to extend the capabilities of data
warehousing software such as SQL Server 2016 and Microsoft Azure ML
Studio! Working on business datasets in class, you will leverage the power of R
to inform business decision making and analyses. Join millions of R users
worldwide in a user community that is growing by 40% every year!

Suitable For
This course is suited for quants and IT professionals who want a crash course
in an end-to-end data science workflow that is completely implemented in R. It
is also suitable for professionals who seek to understand the ecosystem and
community behind R and make it a powerful and cost-effective application for
their enterprise.
INTRODUCTION TO R PROGRAMMING FOR BUSINESS APPLICATIONS

Course Outline Day 1


I. Introductions, Ice Breaker (9:00am -- 9:15am)
II. R overview: open-source statistical programming language (9:15am -- 9:45am)
III. R core: R Development Core Team and enhanced distros (9:45am -- 10:15am)

Q&A/Break (10:15am -- 10:30am)

IV. Extending R: user-contributed packages and repositories (10:30am -- 11:15am)

V. R communities: journal, online fora, and blogs (11:15am -- 12:00pm)

Lunch (12:00pm -- 1:00pm)

VI. Using stock R: shell and RGUI (1:00pm -- 1:15pm)

VII. R notebooks: Jupyter (1:15pm -- 1:45pm)

VIII. De facto R IDE: RStudio (1:45pm -- 2:30pm)

Q&A/Break (2:30pm -- 2:45pm)


IX. R integration: R in-database and R in the cloud (2:45pm -- 3:30pm)

X. R connections: samples of R APIs and bindings with other languages (3:30pm -- 4:15pm)

XI. Day 1 wrap-up (4:15pm -- 5:00pm)

Course Outline Day 2


I. From spreadsheets to prompts: intro to interactive programming in R (9:00am -- 9:45am)
II. R data objects: modes, classes, and coercion (9:45am -- 10:30am)
Q&A/Break (10:30am -- 10:45am)
III. Workshop 1 (10:45am -- 11:30am)
Lunch (11:30am -- 12:30pm)

IV. Special values in R: missing values, nulls, infinite values, and NaNs (12:30pm -- 1:00pm)

V. Functions: class-specific behaviour and user-defined functions (1:00pm -- 1:15pm)

VI. R packages: Installing packages and exposing libraries (1:15pm -- 2:00pm)

Q&A/Break (2:00pm -- 2:15pm)

VII. Loops and conditionals: basic programming in R (2:15pm -- 3:00pm)

VIII. Vectorisation: *apply() and do.call() (3:00pm -- 3:30pm)


IX. Workshop 2 (3:30pm -- 4:30pm)

X. Workshop feedback & Day 2 wrap-up (4:30pm -- 5:00pm)


Course Outline Day 3


I. Chaining R commands: magrittr package (9:00am -- 9:30am)
II. Workshop 1 (9:30am -- 10:00am)
Q&A/Break (10:00am -- 10:15am)
III. Reading and writing data: readxl and readr packages (10:15am -- 10:45am)

IV. Data wrangling: dplyr and reshape2 packages (10:45am -- 11:30am)

Lunch (11:30am -- 12:30pm)

V. Workshop 2 (12:30pm -- 1:15pm)

VI. Introduction to modelling in R: OLS regression (1:15pm -- 2:00pm)

VII. Workshop 3 (2:00pm -- 2:45pm)

Q&A/Break (2:45pm -- 3:00pm)

VIII. Introduction to visualization in R: ggplot2 package (3:00pm -- 3:45pm)

IX. Workshop 4 (3:45pm -- 4:30pm)

X. Workshop feedback & course wrap-up (4:30pm -- 5:00pm)


COURSE FOUR
INTRODUCTION TO DATA SCIENCE AND MACHINE LEARNING IN R AND AZURE

COURSE DURATION: 3 DAYS
PREREQUISITES: It is recommended that participants have completed an introductory
R programming course or MOOC and at least one introductory statistics unit at the
university level.
LAPTOP SPECS: Intel i3 processor, 4GB RAM. Windows operating system. Unrestricted PC
that has install permissions.
REQUIRED SOFTWARE: Excel 2010, 2013 or 2016; R or RStudio latest version; a free trial
or paid subscription to Microsoft Azure ML Studio.

Learn the fundamentals of data science and analytics, from problem
formulation through to model building and interpretation of results.

Introduction to Data Science and Machine Learning in R and Azure is aimed at the
professional who wants an understanding of data science fundamentals with
a strong focus on business applications. By the end of the course, participants
will be capable of building, tuning and deploying regression and classification
models for a variety of business problems. Participants will also gain an
understanding of unsupervised learning techniques and big data architecture.

Taught using a variety of open source and cloud technologies, the course
teaches techniques for handling, manipulating and analyzing high volume
(millions of rows), high dimension (thousands of variables) business data. Real
world projects from the DataSeer analytics consulting team are extensively
used to illustrate how each model is used in the real world.

Suitable For
This course is suitable for any person who wants to acquire fundamental data
science skills.
INTRODUCTION TO DATA SCIENCE AND MACHINE LEARNING IN R AND AZURE

Course Outline Day 1


I. Introductions, Ice Breaker (9:00am - 9:15am)
II. Introduction to Data Science, Big Data and Analytics (9:15am - 10:00am)
    What is data science? What does a data scientist do?
    What is analytics? What is predictive analytics?
    The analytics value chain - Myth and reality
    Mapping business problems to data science problems
    The 5 Vs of big data
    The current state and future of machine learning and AI
III. Q&A / Break (10:00am - 10:15am)
IV. The Data Science Process (10:15am - 11:00am)
    Ask an interesting question
    Get the data
    Explore the data
    Model the data (including comparison of regression and classification problems)
    Communicate and visualize the results
    Walkthrough of process using a real world DataSeer data science consulting project
    Technology overview - from Azure to Hadoop to RStudio
V. Q&A / Break (11:00am - 11:15am)
VI. Linear Regression I - Breaking Open the Blackbox (11:15am - 12:00nn)
    Recognizing regression model applications in business
    Why is it called ordinary least squares (OLS) regression?
    Comparison of OLS with Least Absolute Deviations (LAD) regression
    A simple linear regression model calculated from first principles using Solver
    A multiple linear regression model calculated from first principles using Solver
    Caution on extrapolating outside the range of provided data
    Using a linear regression model to make predictions
VII. LUNCH (12:00nn - 1:00pm)
VIII. Workshop: Team Activity - Fitting a Regression Model to Business Data (1:00pm - 2:15pm)
IX. Machine Learning Fundamentals and Linear Regression II (2:15pm - 3:00pm)
    What is a machine learning model?
    What is a test and training dataset?
    Polynomial regression
    Variable transformations
    Feature selection and feature engineering
    Overfitting and underfitting
X. Q&A / Break (3:00pm - 3:15pm)
XI. Workshop: Apply Your Regression Skills in a Kaggle Style Competition (3:15pm - 5:00pm)
    In this workshop, small groups use their new regression skills to build models on real business
    data with the aim of achieving the lowest possible root mean square error on a hold out test
    dataset
XII. Workshop Feedback, Presentation from Winning Model and Day 1 Wrap Up (5:00pm - 5:15pm)
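
The first-principles regression sessions and the hold-out workshop above can be illustrated with a minimal sketch. It assumes a single predictor and uses the closed-form least squares solution, not the Excel Solver approach taught in class. The data and the function names `fit_simple_ols` and `rmse` are made up for illustration.

```python
import math


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical training data lying exactly on y = 1 + 2x.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# Hypothetical hold-out data, deliberately off the fitted line.
test_x = [5.0, 6.0]
test_y = [11.5, 12.5]

intercept, slope = fit_simple_ols(train_x, train_y)
test_pred = [intercept + slope * x for x in test_x]
holdout_rmse = rmse(test_y, test_pred)
print(intercept, slope, holdout_rmse)
```

Scoring on data the model never saw during fitting is what makes the competition metric a fair comparison between groups. A model that merely memorises the training rows would score well on them and poorly here.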


Course Outline Day 2


I. Introduction to Azure ML Studio (9:00am - 9:30am)
    Azure ML Studio - Overview, capabilities, limitations
    Business considerations - data center locations, data confidentiality, cost
    Connecting ML Studio to a data source
    Exploring and pre-processing data in ML Studio
    Running experiments and setting up workflows
    Supervised and unsupervised learning in ML Studio
    Deploying and productionizing an ML Studio model as a service
II. Workshop: Team Activity - Fitting Yesterday's Regression Model in Azure ML Studio and Comparing Results (9:30am - 10:30am)
III. Q&A / Break (10:30am - 10:45am)
IV. Overview of Other Regression Models and Fitting in ML Studio (11:15am - 12:00nn)
    Decision trees
    Random forest
    Neural networks
    Parametric and non-parametric models
    Comparing models in Azure ML Studio
V. LUNCH (12:00nn - 1:00pm)
VI. Logistic Regression I - Breaking Open the Blackbox (1:00pm - 1:45pm)
    Recognizing classification model applications in business
    True positives, false positives, true negatives and false negatives
    Walkthrough of a classification model case study from business
    What does "maximum likelihood" mean?
    A logistic regression model calculated from first principles using Solver
    Using a logistic regression model to make probability predictions
    Converting probability predictions into labelled predictions
VII. Q&A / Break (1:45pm - 2:00pm)
VIII. Comparing Classification Models and Logistic Regression II (2:00pm - 2:30pm)
    Assessing model accuracy
    True positives, false positives, true negatives and false negatives
    The Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC)
    A brief introduction to other classification models - Decision trees, support vector machines,
    gradient boosting and neural nets
IX. Workshop: Apply Your Classification Skills in a Kaggle Style Competition (2:30pm - 5:00pm)
    In this workshop, small groups use their new classification skills to build models on real
    business data in Azure ML Studio with the aim of achieving the lowest possible classification
    error on a hold out test dataset
X. Workshop Feedback, Presentation From Winning Model and Day 2 Wrap Up (5:00pm - 5:15pm)
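
Converting probability predictions into labelled predictions, and then scoring them on a hold-out set, can be sketched as follows. This is an illustration only: the coefficients are assumed to be already fitted, the data are made up, the 0.5 cut-off is one common choice among many, and all function names are hypothetical.

```python
import math


def predict_probability(intercept, slope, x):
    """Logistic regression prediction P(y = 1 | x) for a single feature."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def to_label(probability, threshold=0.5):
    """Convert a probability into a 0/1 label using a cut-off."""
    return 1 if probability >= threshold else 0


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts


# Hypothetical fitted coefficients and hold-out data.
intercept, slope = -4.0, 1.0
holdout_x = [1.0, 3.0, 5.0, 6.0, 2.0]
holdout_y = [0, 1, 1, 1, 0]

probabilities = [predict_probability(intercept, slope, x) for x in holdout_x]
labels = [to_label(p) for p in probabilities]
counts = confusion_counts(holdout_y, labels)

# Classification error: share of hold-out rows that were labelled incorrectly.
error_rate = (counts["fp"] + counts["fn"]) / len(holdout_y)
print(labels, counts, error_rate)
```

Lowering the threshold would turn the one false negative here into a true positive, at the risk of creating false positives elsewhere. The ROC curve summarises that trade-off across all thresholds.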


Course Outline Day 3


I. Decision Trees (9:00am - 9:45am)
    Introduction to decision trees
    Advantages and disadvantages compared to statistical models
    Decision tree algorithms
    Prediction using decision trees
    Example using business data
II. Q&A / Break (10:15 - 10:30am)
III. Workshop: Team Activity - Decision Trees (10:00am - 11:15am)
IV. Unsupervised Learning and PCA (11:15am - 12:00nn)
    Unsupervised learning models
    Introduction to Principal Components Analysis
    Visualizing PCA results
    Interpreting PCA results
V. LUNCH (12:00nn - 1:00pm)
VI. Workshop: Team Activity - PCA with a Customer Demographics Dataset (1:00pm - 2:45pm)
VII. Fundamentals of Big Data Engineering (2:45pm - 3:30pm)
    Introduction to distributed computing
    MapReduce
    Hadoop and HDFS
    Hive
    Mahout
    Spark
    Event ingestion and stream processing
    How will things change as IoT ramps up?
VIII. Q&A / Break (3:30pm - 3:45pm)
IX. Workshop: Big Data Engineering (3:45pm - 4:45pm)
X. Workshop Feedback, Awarding of Certificates and Course Wrap Up (4:45pm - 5:15pm)

"The topics were handled very well. Statistical theories were explained in a way
that it can be grasped by a participant with a non-statistical background."
- Celina, Indra Philippines Inc.

COURSE FIVE
PREDICTIVE ANALYTICS AND ADVANCED MACHINE LEARNING IN R AND AZURE

COURSE DURATION: 3 DAYS
PREREQUISITES: It is recommended that participants have completed an introductory
R programming course or MOOC and at least one 2nd year statistics unit at the
university level.
LAPTOP SPECS: Intel i3 processor, 4GB RAM. Windows operating system. Unrestricted PC
that has install permissions.
REQUIRED SOFTWARE: Excel 2010, 2013 or 2016; R or RStudio latest version; a free trial
or paid subscription to Microsoft Azure ML Studio.

Use R and Azure ML Studio to build and tune advanced machine learning models.

Predictive analytics and machine learning techniques are revolutionizing
business and government. Predictive Analytics and Machine Learning in R &
Azure is aimed at the person who wants to have a better understanding of the
mechanics behind the models and how these models are realistically applied
in the business setting. In addition to covering advanced machine learning
techniques in depth, the course covers the management of stakeholder
expectations during predictive analytics projects and analytics project
management. Advanced machine learning methods are discussed in depth,
including those used to win global data science competitions.

Suitable For
This course is suited to any professional who already understands
analytics and machine learning basics and is ready to progress to higher
levels of sophistication. It is also suitable for any professional who is
interested in how predictive analytics projects are conceptualized, scoped
and project managed.
PREDICTIVE ANALYTICS AND ADVANCED MACHINE LEARNING IN R & AZURE

Course Outline Day 1


I. Introductions, Ice Breaker (9:00am - 9:30am)
II. Dimensionality, Parsimony, Testing Accuracy (9:15am - 10:00am)
The curse of dimensionality
The principle of parsimony
Testing model accuracy
John Elder's Target Shuffling
Lift charts
Bootstrap sampling
III. Q&A / Break (10:00am - 10:15am)
IV. Shrinkage - More Than What Happens in the Pool (10:15am - 11:00am)
How shrinkage methods depart from traditional statistical methods
Ridge regression
The LASSO method
How does the LASSO method help perform variable selection?
Sparsity
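
To illustrate the variable selection and sparsity topics above, here is a minimal sketch in Python (the course itself works in R and Azure ML Studio). It assumes the simplified textbook case of an orthonormal design, where ridge rescales each least-squares coefficient while the LASSO soft-thresholds it. The coefficient values and the penalty `lam` are made up for illustration, and the exact threshold depends on how the penalty is scaled in the objective.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design: scale every coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta_ols]


def lasso_shrink(beta_ols, lam):
    """LASSO under an orthonormal design: soft-threshold each coefficient at lam."""
    shrunk = []
    for b in beta_ols:
        magnitude = abs(b) - lam
        if magnitude <= 0:
            # Small coefficients are set exactly to zero (variable dropped).
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if b > 0 else -magnitude)
    return shrunk


beta_ols = [3.0, -1.5, 0.4, -0.2]  # hypothetical least-squares coefficients
lam = 0.5                          # hypothetical penalty strength

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)

print("ridge:", ridge)
print("lasso:", lasso)
print("coefficients zeroed by ridge:", sum(1 for b in ridge if b == 0.0))
print("coefficients zeroed by lasso:", sum(1 for b in lasso if b == 0.0))
```

Ridge pulls every coefficient toward zero but keeps all of them in the model, whereas the LASSO removes the two smallest ones entirely, which is the sense in which it performs variable selection.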
V. Q&A / Break (11:00am - 11:15am)
VI. Workshop: Team Activity - Let's compare LASSO and ridge regression (11:15am - 12:00nn)
VI. LUNCH (12:00nn - 1:00pm)
VI. Workshop: Team Activity (cont.) - Let's compare LASSO and ridge regression (1:00pm - 1:30pm)
VII. Cross Validation, Bagging and Ensembling (1:30pm - 2:15pm)
Bootstrap aggregation
K-fold cross validation
Model ensembling
Choosing weights for ensemble models
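
As a rough illustration of the K-fold cross validation topic above, the following Python sketch (the course itself works in R and Azure ML Studio) splits rows into k folds, fits a deliberately trivial "predict the training mean" model on k-1 folds, and scores it on the held-out fold. The function names, the toy response values and the stand-in model are all assumptions made for this example.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(y, k=5, seed=0):
    """Average held-out squared error of a 'predict the training mean' model."""
    folds = k_fold_indices(len(y), k, seed)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        # Fit on every row that is NOT in the held-out fold.
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = sum(train) / len(train)
        # Score only on the held-out rows.
        mse = sum((y[i] - prediction) ** 2 for i in held_out) / len(held_out)
        fold_errors.append(mse)
    return sum(fold_errors) / k


y = [float(v) for v in range(1, 21)]  # toy response values, not course data
print("5-fold CV estimate of MSE:", cross_validate(y, k=5))
```

Each row is held out exactly once, so the averaged error is an estimate of out-of-sample performance rather than of fit to the training data.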
VIII. Q&A / Break (2:15pm - 2:30pm)
IX. Workshop: Let's bag, ensemble and cross validate! (2:30pm - 4:45pm)
X. Workshop Feedback, Presentation from Winning Model and Day 1 Wrap Up (4:45pm - 5:00pm)


Course Outline Day 2


I. Artificial Neural Networks (9:00am - 10:00am)
A gentle introduction to ANNs using colors
What is deep learning?
What is forward and back propagation?
How many hidden layers should we use?
ANN and linear regression smack down in Azure ML Studio
II. Q&A / Break (10:00am - 10:15am)
III. Workshop: Team Activity - Let's build and tune neural nets (10:15am - 12:00nn)
IV. LUNCH (12:00nn - 1:00pm)
V. Predictive Analytics in Practice - Managing Analytics Projects and Teams (1:00pm - 1:45pm)
Where should the analytics team be situated in the corporate structure? Research findings.
Managing stakeholder expectations in analytics projects
The importance of having analytics champions
Project management for analytics projects - how does it differ from regular IT projects?
VI. Q&A / Break (1:45pm - 2:00pm)
VII. Support Vector Machines (2:00pm - 3:00pm)
The maximal margin classifier
The support vector classifier
Kernels and SVMs
Performance comparison to other classification methods
VIII. Q&A / Break (3:00pm - 3:15pm)
IX. Workshop: Let's build and tune SVMs! (3:15pm - 4:45pm)
X. Workshop Feedback, Presentation from Winning Model and Day 2 Wrap Up (4:45pm - 5:00pm)

Dataset
This course utilises the following datasets as learning tools:
A 50,000 row, 70 variable Customer Relationship Management (CRM) dataset
A 750,000 row, 30 variable digital marketing dataset from the insurance sector
A 227,000 row, 21 variable airlines dataset

Data Fields
The CRM dataset includes over 25 customer behavior variables including information about customer spend,
customer complaints, customer retention and purchase frequency. The dataset also features over 20
customer demographic variables including age, occupation and marital status.

The digital marketing dataset includes information about customer demographics, product category
purchased and the digital marketing channel the customer engaged with at each respective online
touchpoint. The airlines dataset includes information on domestic US flights that departed Houston in
2011. The fields include departure time, arrival time, flight number and destination location (alongside
17 other fields).

Data Format
The data is provided to participants in unstructured .dat format. Participants are taught how to import
the dataset into Excel and convert the .dat file into an .xlsx file.


Course Outline Day 3


I. Market Basket Analysis and Affinity Analysis (9:00am - 9:45am)
What is association rule mining?
What is the business case for market basket analysis?
Support, lift and confidence
Visualizing market basket results
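
The support, lift and confidence measures listed above can be sketched for a single association rule as follows. This is a hand-rolled Python illustration under stated assumptions, not the arules workflow used in the workshop: the `rule_metrics` helper and the five toy baskets are hypothetical, and the helper assumes both the antecedent and the consequent appear in at least one basket.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))           # baskets containing the antecedent
    n_c = sum(1 for t in transactions if c <= set(t))           # baskets containing the consequent
    n_both = sum(1 for t in transactions if (a | c) <= set(t))  # baskets containing both

    support = n_both / n             # how often the full rule occurs
    confidence = n_both / n_a        # P(consequent | antecedent)
    lift = confidence / (n_c / n)    # confidence relative to the consequent's base rate
    return support, confidence, lift


# Toy baskets for illustration only, not the supermarket data used in the course.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

support, confidence, lift = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than their individual popularity would predict, while a lift near 1 suggests no association.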
II. Q&A / Break (9:45am - 10:00am)
III. Workshop: Let's use arules to perform MBA on supermarket data (10:00am - 11:30am)
IV. Introduction to Kaggle Competitions (11:30am - 12:15pm)
Kaggle overview
Kaggle competition strategies
Private and public leaderboards (LB)
Team merging
V. LUNCH (12:15pm - 1:15pm)
VI. Workshop: Team Activity - Let's Kaggle! (1:15pm - 4:45pm)
During this capstone team activity, course participants will enrol in a live Kaggle competition.
With the aim of achieving a top 50% leaderboard ranking by the end of the day, the full data
science process will be implemented. Toward the end of the task, a strategy for continued
learning and success in the competition will be discussed.

VII. Workshop Feedback, Awarding of Certificates and Course Wrap-up (4:45pm - 5:00pm)

"I like that I have a better handle on the backend workings of regression, instead of just automatically generating it using a tool."
- JP, ABS-CBN Corporation


How can we help?


Contact us today for enrollment and inquiries.

DataSeer
www.dataseer.com
info@dataseer.com

111 North Bridge Rd #08-18 Peninsula Plaza, Singapore 179098

PH: +632 908 2565 or +632 908 2566 (Business Hours)
+639176773825 (After Hours)

SG: +65 3152 6845
