
1. FUNDAMENTALS
Matrices & Linear Algebra Fundamentals
Hash Functions, Binary Tree, O(n)
Relational Algebra, DB Basics
Inner, Outer, Cross, Theta Join
Tabular Data
Sharding
OLAP
Multidimensional Data Model
ETL
Reporting Vs BI Vs Analytics
JSON & XML
NoSQL
Regex
Vendor Landscape
Env Setup

6. VISUALIZATION
Data Exploration in R (Hist, Boxplot, etc)
Uni, Bi & Multivariate Viz
ggplot2
Histogram & Pie (Uni)
Tree & Tree Map
Scatter Plot (Bi)
Line Charts (Bi)
Spatial Charts
Survey Plot
Timeline
Decision Tree
D3.js
InfoVis
IBM ManyEyes
Tableau

2. STATISTICS
Pick a Dataset (UCI Repo)
Descriptive Statistics (mean, median, range, SD, Var)
Exploratory Data Analysis
Histograms
Percentiles & Outliers
Probability Theory
Bayes Theorem
Random Variables
Cumul Dist Fn (CDF)
Continuous Distributions (Normal, Poisson, Gaussian)
Skewness
ANOVA
Prob Den Fn (PDF)
Central Limit Theorem
Monte Carlo Method
Hypothesis Testing
p-Value
Chi-Squared Test
Estimation
Confid Int (CI)
MLE
Kernel Density Estimate
Regression
Covariance
Correlation
Pearson Coeff
Causation
Least Squares Fit
Euclidean Distance

7. BIG DATA
Map Reduce Fundamentals
Hadoop Components
Data Replication Principles
Setup Hadoop (IBM / Cloudera / Hortonworks)
Name & Data Nodes
Job & Task Tracker
M/R Programming
Sqoop: Loading Data in HDFS
Flume, Scribe: For Unstruct Data
SQL with Pig
DWH with Hive
Scribe, Chukwa for Weblog
Using Mahout
Zookeeper, Avro
Storm: Hadoop Realtime
RHadoop, RHIPE
rmr
Cassandra
MongoDB, Neo4j

3. PROGRAMMING
Python Basics
Working in Excel
R Setup RStudio
R Basics
Expressions
IBM SPSS
Variables
RapidMiner
Vectors
Matrices
Arrays
Factors
Lists
Data Frames
Reading CSV Data
Reading Raw Data
Subsetting Data
Manipulate Data Frames
Functions
Factor Analysis
Install Pkgs

4. MACHINE LEARNING
What is ML?
Numerical Var
Categorical Var
Supervised Learning
Unsupervised Learning
Concepts, Inputs & Attributes
Training & Test Data
Classifier
Prediction
Lift
Overfitting
Bias & Variance
Trees & Classification
Classification: Classification Rate
Classification: Decision Trees
Classification: Boosting
Classification: Naive Bayes Classifiers
Classification: K-Nearest Neighbor
Classification: Logistic Regression
Regression: Ranking
Regression: Linear Regression
Regression: Perceptron
Clustering: Hierarchical Clustering
Clustering: K-means Clustering
Neural Networks
Sentiment Analysis
Collaborative Filtering
Tagging

8. DATA INGESTION
Summary of Data Formats
Data Discovery
Data Sources & Acquisition
Data Integration
Data Fusion
Transformation & Enrichment
Data Survey
Google OpenRefine
How much Data?
Using ETL

9. DATA MUNGING
Dimensionality & Numerosity Reduction
Normalization
Data Scrubbing
Handling Missing Values
Unbiased Estimators
Binning Sparse Values
Feature Extraction
Denoising
Sampling
Stratified Sampling
Principal Component Analysis

5. TEXT MINING / NLP
Corpus
Named Entity Recognition
Text Analysis
UIMA
Term Document Matrix
Term Frequency & Weight
Support Vector Machines
Association Rules
Market Basket Analysis
Feature Extraction
Using Mahout
Using Weka
Using NLTK
Classify Text
Vocabulary Mapping

10. TOOLBOX
MS Excel w/ Analysis ToolPak
Java, Python
R, RStudio, Rattle
Weka, KNIME, RapidMiner
Hadoop Dist of Choice
Spark, Storm
Flume, Scribe, Chukwa
Nutch, Talend, Scraperwiki
Webscraper, Flume, Sqoop
tm, RWeka, NLTK
RHIPE
D3.js, ggplot2, Shiny
IBM Languageware
Cassandra, MongoDB

References
https://www.experfy.com/blog/become-data-scientist/

LEVEL | NUMBER | COURSE
01. FUNDAMENTALS | 1 | Matrices & Linear Algebra Fundamentals
01. FUNDAMENTALS | 2 | Hash Functions, Binary Tree, O(n)
01. FUNDAMENTALS | 3 | Relational Algebra, DB Basics
01. FUNDAMENTALS | 4 | Inner, Outer, Cross, Theta Join
01. FUNDAMENTALS | 5 | Tabular Data
01. FUNDAMENTALS | 6 | Sharding
01. FUNDAMENTALS | 7 | OLAP
01. FUNDAMENTALS | 8 | Multidimensional Data Model
01. FUNDAMENTALS | 9 | ETL
01. FUNDAMENTALS | 10 | Reporting Vs BI Vs Analytics
01. FUNDAMENTALS | 11 | JSON & XML
01. FUNDAMENTALS | 12 | NoSQL
01. FUNDAMENTALS | 13 | Regex
01. FUNDAMENTALS | 14 | Vendor Landscape
01. FUNDAMENTALS | 15 | Env Setup
02. STATISTICS | 1 | Pick a Dataset (UCI Repo)
02. STATISTICS | 2 | Descriptive Statistics (mean, median, range, SD, Var)
02. STATISTICS | 3 | Exploratory Data Analysis
02. STATISTICS | 4 | Histograms
02. STATISTICS | 5 | Percentiles & Outliers
02. STATISTICS | 6 | Probability Theory
02. STATISTICS | 7 | Bayes Theorem
02. STATISTICS | 8 | Random Variables
02. STATISTICS | 9 | Cumul Dist Fn (CDF)
02. STATISTICS | 10 | Continuous Distributions (Normal, Poisson, Gaussian)
02. STATISTICS | 11 | Skewness
02. STATISTICS | 12 | ANOVA
02. STATISTICS | 13 | Prob Den Fn (PDF)
02. STATISTICS | 14 | Central Limit Theorem
02. STATISTICS | 15 | Monte Carlo Method
02. STATISTICS | 16 | Hypothesis Testing
02. STATISTICS | 17 | p-Value
02. STATISTICS | 18 | Chi-Squared Test
02. STATISTICS | 19 | Estimation
02. STATISTICS | 20 | Confid Int (CI)
02. STATISTICS | 21 | MLE
02. STATISTICS | 22 | Kernel Density Estimate
02. STATISTICS | 23 | Regression
02. STATISTICS | 24 | Covariance
02. STATISTICS | 25 | Correlation
02. STATISTICS | 26 | Pearson Coeff
02. STATISTICS | 27 | Causation
02. STATISTICS | 28 | Least Squares Fit
02. STATISTICS | 29 | Euclidean Distance
03. PROGRAMMING | 1 | Python Basics
03. PROGRAMMING | 2 | Working in Excel
03. PROGRAMMING | 3 | R Setup RStudio
03. PROGRAMMING | 4 | R Basics
03. PROGRAMMING | 5 | Expressions
03. PROGRAMMING | 6 | IBM SPSS
03. PROGRAMMING | 7 | Variables
03. PROGRAMMING | 8 | RapidMiner
03. PROGRAMMING | 9 | Vectors
03. PROGRAMMING | 10 | Matrices
03. PROGRAMMING | 11 | Arrays
03. PROGRAMMING | 12 | Factors
03. PROGRAMMING | 13 | Lists
03. PROGRAMMING | 14 | Data Frames
03. PROGRAMMING | 15 | Reading CSV Data
03. PROGRAMMING | 16 | Reading Raw Data
03. PROGRAMMING | 17 | Subsetting Data
03. PROGRAMMING | 18 | Manipulate Data Frames
03. PROGRAMMING | 19 | Functions
03. PROGRAMMING | 20 | Factor Analysis
03. PROGRAMMING | 21 | Install Pkgs
04. MACHINE LEARNING | 1 | What is ML?
04. MACHINE LEARNING | 2 | Numerical Var
04. MACHINE LEARNING | 3 | Categorical Var
04. MACHINE LEARNING | 4 | Supervised Learning
04. MACHINE LEARNING | 5 | Unsupervised Learning
04. MACHINE LEARNING | 6 | Concepts, Inputs & Attributes
04. MACHINE LEARNING | 7 | Training & Test Data
04. MACHINE LEARNING | 8 | Classifier
04. MACHINE LEARNING | 9 | Prediction
04. MACHINE LEARNING | 10 | Lift
04. MACHINE LEARNING | 11 | Overfitting
04. MACHINE LEARNING | 12 | Bias & Variance
04. MACHINE LEARNING | 13 | Trees & Classification
04. MACHINE LEARNING | 14 | Classification: Classification Rate
04. MACHINE LEARNING | 15 | Classification: Decision Trees
04. MACHINE LEARNING | 16 | Classification: Boosting
04. MACHINE LEARNING | 17 | Classification: Naive Bayes Classifiers
04. MACHINE LEARNING | 18 | Classification: K-Nearest Neighbor
04. MACHINE LEARNING | 19 | Classification: Logistic Regression
04. MACHINE LEARNING | 20 | Regression: Ranking
04. MACHINE LEARNING | 21 | Regression: Linear Regression
04. MACHINE LEARNING | 22 | Regression: Perceptron
04. MACHINE LEARNING | 23 | Clustering: Hierarchical Clustering
04. MACHINE LEARNING | 24 | Clustering: K-means Clustering
04. MACHINE LEARNING | 25 | Neural Networks
04. MACHINE LEARNING | 26 | Sentiment Analysis
04. MACHINE LEARNING | 27 | Collaborative Filtering
04. MACHINE LEARNING | 28 | Tagging
05. TEXT MINING / NLP | 1 | Corpus
05. TEXT MINING / NLP | 2 | Named Entity Recognition
05. TEXT MINING / NLP | 3 | Text Analysis
05. TEXT MINING / NLP | 4 | UIMA
05. TEXT MINING / NLP | 5 | Term Document Matrix
05. TEXT MINING / NLP | 6 | Term Frequency & Weight
05. TEXT MINING / NLP | 7 | Support Vector Machines
05. TEXT MINING / NLP | 8 | Association Rules
05. TEXT MINING / NLP | 9 | Market Basket Analysis
05. TEXT MINING / NLP | 10 | Feature Extraction
05. TEXT MINING / NLP | 11 | Using Mahout
05. TEXT MINING / NLP | 12 | Using Weka
05. TEXT MINING / NLP | 13 | Using NLTK
05. TEXT MINING / NLP | 14 | Classify Text
05. TEXT MINING / NLP | 15 | Vocabulary Mapping
06. VISUALIZATION | 1 | Data Exploration in R (Hist, Boxplot, etc)
06. VISUALIZATION | 2 | Uni, Bi & Multivariate Viz
06. VISUALIZATION | 3 | ggplot2
06. VISUALIZATION | 4 | Histogram & Pie (Uni)
06. VISUALIZATION | 5 | Tree & Tree Map
06. VISUALIZATION | 6 | Scatter Plot (Bi)
06. VISUALIZATION | 7 | Line Charts (Bi)
06. VISUALIZATION | 8 | Spatial Charts
06. VISUALIZATION | 9 | Survey Plot
06. VISUALIZATION | 10 | Timeline
06. VISUALIZATION | 11 | Decision Tree
06. VISUALIZATION | 12 | D3.js
06. VISUALIZATION | 13 | InfoVis
06. VISUALIZATION | 14 | IBM ManyEyes
06. VISUALIZATION | 15 | Tableau
07. BIG DATA | 1 | Map Reduce Fundamentals
07. BIG DATA | 2 | Hadoop Components
07. BIG DATA | 3 | Data Replication Principles
07. BIG DATA | 4 | Setup Hadoop (IBM / Cloudera / Hortonworks)
07. BIG DATA | 5 | Name & Data Nodes
07. BIG DATA | 6 | Job & Task Tracker
07. BIG DATA | 7 | M/R Programming
07. BIG DATA | 8 | Sqoop: Loading Data in HDFS
07. BIG DATA | 9 | Flume, Scribe: For Unstruct Data
07. BIG DATA | 10 | SQL with Pig
07. BIG DATA | 11 | DWH with Hive
07. BIG DATA | 12 | Scribe, Chukwa for Weblog
07. BIG DATA | 13 | Using Mahout
07. BIG DATA | 14 | Zookeeper, Avro
07. BIG DATA | 15 | Storm: Hadoop Realtime
07. BIG DATA | 16 | RHadoop, RHIPE
07. BIG DATA | 17 | rmr
07. BIG DATA | 18 | Cassandra
07. BIG DATA | 19 | MongoDB, Neo4j
08. DATA INGESTION | 1 | Summary of Data Formats
08. DATA INGESTION | 2 | Data Discovery
08. DATA INGESTION | 3 | Data Sources & Acquisition
08. DATA INGESTION | 4 | Data Integration
08. DATA INGESTION | 5 | Data Fusion
08. DATA INGESTION | 6 | Transformation & Enrichment
08. DATA INGESTION | 7 | Data Survey
08. DATA INGESTION | 8 | Google OpenRefine
08. DATA INGESTION | 9 | How much Data?
08. DATA INGESTION | 10 | Using ETL
09. DATA MUNGING | 1 | Dimensionality & Numerosity Reduction
09. DATA MUNGING | 2 | Normalization
09. DATA MUNGING | 3 | Data Scrubbing
09. DATA MUNGING | 4 | Handling Missing Values
09. DATA MUNGING | 5 | Unbiased Estimators
09. DATA MUNGING | 6 | Binning Sparse Values
09. DATA MUNGING | 7 | Feature Extraction
09. DATA MUNGING | 8 | Denoising
09. DATA MUNGING | 9 | Sampling
09. DATA MUNGING | 10 | Stratified Sampling
09. DATA MUNGING | 11 | Principal Component Analysis
10. TOOLBOX | 1 | MS Excel w/ Analysis ToolPak
10. TOOLBOX | 2 | Java, Python
10. TOOLBOX | 3 | R, RStudio, Rattle
10. TOOLBOX | 4 | Weka, KNIME, RapidMiner
10. TOOLBOX | 5 | Hadoop Dist of Choice
10. TOOLBOX | 6 | Spark, Storm
10. TOOLBOX | 7 | Flume, Scribe, Chukwa
10. TOOLBOX | 8 | Nutch, Talend, Scraperwiki
10. TOOLBOX | 9 | Webscraper, Flume, Sqoop
10. TOOLBOX | 10 | tm, RWeka, NLTK
10. TOOLBOX | 11 | RHIPE
10. TOOLBOX | 12 | D3.js, ggplot2, Shiny
10. TOOLBOX | 13 | IBM Languageware
10. TOOLBOX | 14 | Cassandra, MongoDB

VIDEO
https://www.youtube.com/watch?v=vIu2vu2UqfM&index=3&list=PLWbnIo7XnOkz1hBLdv_0LwcdwUjvoZL5s
https://www.youtube.com/watch?v=xyAuNHPsq-g&list=PLFD0EB975BA0CC1E0
https://www.youtube.com/watch?v=eyX8BXDSE5I
https://www.youtube.com/watch?v=92S4zgXN17o&list=PL2_aWCzGMAwI3W_JlcBbtYTwiQSsOTa6P
https://www.youtube.com/watch?v=V6mKVRU1evU
https://www.youtube.com/watch?v=N5_sTEhnZKM
https://www.youtube.com/watch?v=P00xJgWzz2c&list=PL89B61F78B552C1AB
https://www.youtube.com/watch?v=cVQUvZh64Y8
https://www.youtube.com/watch?v=bVg442ixI6k
https://www.youtube.com/watch?v=SYIHyXJG29M
https://www.youtube.com/watch?v=qdHqidsGfNU
https://www.youtube.com/watch?v=r_h9yBnNh0U
https://www.youtube.com/watch?v=TroKfazVhwM
https://www.youtube.com/watch?v=VXRhir7GQpQ
https://www.youtube.com/channel/UC5ZAemhQUQuNqW3c9Jkw8ug/videos?flow=list&view=0&sort=dd&live_view=500
https://www.youtube.com/channel/UCjkGzGfgvX_Zd8kxs4ldhFw/videos
https://www.youtube.com/watch?v=nhqOAuARzOM&list=PL1MJdy9N8XJKW4HYg_cOsz1-xRzD2zfsT
https://www.youtube.com/watch?v=ClcqCB4sEEs
https://www.youtube.com/watch?v=wz6XnW9nk4w
https://www.youtube.com/watch?v=bfbD8owP_Bs
https://www.youtube.com/watch?v=VhGIC5Nqd4g
https://www.youtube.com/watch?v=e7Pr1VgPK4w&list=PL_c9BZzLwBRK0Pc28IdvPQizD2mJlgoID
https://www.youtube.com/watch?v=iluA_w0tcqw
https://www.youtube.com/watch?v=xAhfQNTIeOM&list=PLx5CT0AzDJCnO9k98RsrPY9WGAXj8yeKL
https://www.youtube.com/watch?v=r6WO5BUN66k
https://www.youtube.com/watch?v=rhGG-vFZBy0
https://www.youtube.com/watch?v=Pgza1vS1lic
https://www.youtube.com/watch?v=JV85enI7jXs
https://www.youtube.com/watch?v=kyGVhx5LwXw
https://www.youtube.com/watch?v=4Z9KEBexzcM&list=PL1LIXLIF50uXWJ9alDSXClzNCMynac38g
