
3 Clicks to Data Science

… an overview of Rattle

“Science … resolves the whole into parts, the organism into organs, the obscure into the known. … Science gives us knowledge, but only philosophy can give us wisdom … [to] synthesize knowledge to resolve the obscure into the known.”
After the Philosopher Durant.

Graham Williams
Director of Data Science, Asia Pacific, Microsoft


Contributions
RevoScaleR in Rattle
- Surendra Tipparaju
- Durga Prasad Chappidi
- Dinesh Manyam Venkata
- Mrinal Chakraborty

dplyrXdf – Dr Hong Ooi

XGBoost – Dr Zhou Fang

AzureDSVM – Dr Zhang Le

Machine Learning Asks the Questions ...

1. How much / how many? (Regression)
2. Is this A or B? (Classification)
3. How is this organised? (Clustering)
4. Is this weird? (Anomalies)
5. What to do next? (Recommender)


Machine Learning: Decision Trees
Identify patterns in the data – associated with the outcome of interest – Rain Tomorrow?

Recursive Partitioning
… aka Divide and Conquer
… aka Map and Reduce

One of the earliest algorithms and still going very strong!

Deep learning with Random Forests, using massive data and massive compute, characterises the current AI/ML surge.
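Recursive partitioning can be sketched in a few lines. The Python below is a toy illustration, not Rattle's implementation (Rattle's Tree model is built in R with the rpart package): it greedily picks the variable and threshold that most reduce Gini impurity, divides the data in two, and conquers each part by recursion until a node is pure or a maximum depth is reached. The data, the two columns (standing in for afternoon humidity and pressure), and the helper names `gini`, `best_split`, `build_tree` and `predict` are all invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 0.0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every variable and observed threshold; keep the split with lowest weighted impurity."""
    best = None  # (weighted_impurity, variable_index, threshold)
    for var in range(len(rows[0])):
        for threshold in sorted({row[var] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[var] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[var] > threshold]
            if not left or not right:
                continue  # a split must put observations on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, var, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Divide the data on the best split, then conquer each part the same way."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority  # leaf: predict the majority class
    split = best_split(rows, labels)
    if split is None or split[0] >= gini(labels):
        return majority  # no split improves purity
    _, var, threshold = split
    left = [(row, y) for row, y in zip(rows, labels) if row[var] <= threshold]
    right = [(row, y) for row, y in zip(rows, labels) if row[var] > threshold]
    return {
        "var": var,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the split tests from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["var"]] <= tree["threshold"] else tree["right"]
    return tree


# Toy data (invented): (humidity, pressure) -> Rain Tomorrow?
rows = [(30, 1020), (45, 1018), (50, 1015), (80, 1008), (85, 1005), (90, 1010)]
labels = ["No", "No", "No", "Yes", "Yes", "Yes"]
tree = build_tree(rows, labels)
print(tree)                       # {'var': 0, 'threshold': 50, 'left': 'No', 'right': 'Yes'}
print(predict(tree, (70, 1012)))  # Yes
```

Each candidate variable is scored independently inside `best_split`, so that search is one natural place to spread work across processors, loosely matching the "map" step on the slide; this sketch simply runs it sequentially.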

Artificial Intelligence

knowledge representation

sense the world

discover knowledge

reason with that knowledge

make decisions autonomously

4 Clicks to Data Science

First Model in 4 Clicks


mywac01.southeastasia.cloudapp.azure.com

What is R?

Language & Platform
• Most popular statistical programming language.
• Data visualisation tool.
• Free (as in Libre) open source software (FLOSS).

Community
• More than 3 million users.
• Taught in most universities.
• Ecosystem of use cases shared openly.
• Thriving user groups worldwide.

Ecosystem
• 10,851 contributed packages.
• Rich application & platform integration.

Rattle for Data Science

Using
– Glade point-and-click GUI builder (XML)
– RGtk2 bindings for the cross-platform GUI
– R to implement all the callbacks

– More than 20,000 downloads per month

Log tab collects documented, formatted R scripts as a starting point for real work in R.
The Data Scientist’s Toolkit – DSVM on Azure
• Specialized VM image on Azure.
• Data science and Azure tools and SDKs.
• Pre-configured and ready to use.
• Pay for cloud hardware usage only.
• No separate software charges!
• Windows and Linux versions.
• Up and running quickly (5 minutes).

$200 USD credit for 12 months: http://aka.ms/dsvmfree
Requires a credit card but no charge.

Microsoft Machine Learning & Data Sciences Conference, 8 & 9 Aug, 2016

Linux Data Science Virtual Machine - Azure
https://aka.ms/linuxdsvm

Vowpal Wabbit, CNTK, Rattle

Hands On – Simplifying the Cloud with R and AzureDSVM

Deploying a Virtual Machine


Deploying a Cluster of Servers
Distributing an R Workload When Required

mywac01.southeastasia.cloudapp.azure.com

Hands On – Programming With R

R Scripts to Process Data and Build Models

The Rattle Log Tab


Open Source R but …
• In-Memory Operation
• Data Movement & Duplication
• Lack of Parallelism

Rattle with MRS – Version 5.0.16

Now supports Microsoft R Server – Big Data

No limit on dataset sizes (data no longer has to fit in memory)

Parallel data processing and model building
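The contrast between in-memory R and chunked processing can be illustrated with a small sketch. The Python below is not RevoScaleR or Microsoft R Server code; it only shows the general external-memory idea, assuming the data arrive as one number per line: read a bounded chunk, reduce it to a small summary (map), and merge the summaries (reduce), so memory use depends on the chunk size rather than the dataset size. The function names `read_chunks`, `summarise`, `merge` and `chunked_mean_var` are hypothetical.

```python
import io
from itertools import islice


def read_chunks(stream, size):
    """Yield lists of at most `size` parsed values; only one chunk is held in memory."""
    while True:
        block = [float(line) for line in islice(stream, size)]
        if not block:
            return
        yield block


def summarise(block):
    """Map step: (count, mean, sum of squared deviations) for one chunk."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return n, mean, m2


def merge(a, b):
    """Reduce step: combine two partial summaries with the pairwise mean/variance update."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return n, mean, m2


def chunked_mean_var(stream, size=1000):
    """Mean and sample variance of a stream (assumes at least two values)."""
    state = (0, 0.0, 0.0)
    for block in read_chunks(stream, size):
        state = merge(state, summarise(block))
    n, mean, m2 = state
    return mean, m2 / (n - 1)


# Stand-in for a large file: the numbers 1..100, one per line.
data = io.StringIO("\n".join(str(i) for i in range(1, 101)))
mean, variance = chunked_mean_var(data, size=7)
print(mean, variance)
```

Each `summarise` call depends only on its own chunk, so chunks could be summarised on different cores or servers and merged afterwards. Merging means and squared deviations pairwise also avoids the loss of precision that comes from subtracting two large sums of squares.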


Rattle with Microsoft R Server

myubu01.southeastasia.cloudapp.azure.com

Future Options

MicrosoftML integration

Deep Forest and Deep NN

LightGBM to replace XGBoost

????

Resources

Overview of the Linux Data Science Virtual Machine

https://aka.ms/linuxdsvm

Essentials Guide to Setting Up a Linux DSVM with R, RStudio and Rattle

https://aka.ms/ldsvm

Rattle Home Page

https://rattle.togaware.com