1. FUNDAMENTALS
Matrices & Linear Algebra Fundamentals
Hash Functions, Binary Tree, O(n)
Relational Algebra, DB Basics
Inner, Outer, Cross, Theta Join
Tabular Data
Sharding
OLAP
Multidimensional Data Model
ETL
Reporting Vs BI Vs Analytics
JSON & XML
NoSQL
Regex
Vendor Landscape
Env Setup
2. STATISTICS
Pick a Dataset (UCI Repo)
Descriptive Statistics (mean, median, range, SD, Var)
Exploratory Data Analysis
Histograms
Percentiles & Outliers
Probability Theory
Bayes Theorem
Random Variables
Cumul Dist Fn (CDF)
Continuous Distributions (Normal, Poisson, Gaussian)
Skewness
ANOVA
Prob Den Fn (PDF)
Central Limit Theorem
Monte Carlo Method
Hypothesis Testing
p-Value
Chi-Squared Test
Estimation
Confid Int (CI)
MLE
Kernel Density Estimate
Regression
Covariance
Correlation
Pearson Coeff
Causation
Least Squares Fit
Euclidean Distance
3. PROGRAMMING
Python Basics
Working in Excel
R Setup / RStudio
R Basics
Expressions
IBM SPSS
Variables
RapidMiner
Vectors
Matrices
Arrays
Factors
Lists
Data Frames
Subsetting Data
Functions
Factor Analysis
Install Pkgs
4. MACHINE LEARNING
What is ML?
Numerical Var
Categorical Var
Supervised Learning
Unsupervised Learning
Concepts, Inputs & Attributes
Training & Test Data
Classifier
Prediction
Lift
Overfitting
Bias & Variance
Trees & Classification
Classification Rate
Classification: Decision Trees
Classification: Boosting
Classification: Naive Bayes Classifiers
Classification: K-Nearest Neighbor
Classification: Logistic Regression
Regression: Ranking
Regression: Linear Regression
Regression: Perceptron
Clustering: Hierarchical Clustering
Clustering: K-means Clustering
Neural Networks
Sentiment Analysis
Collaborative Filtering
Tagging
6. VISUALIZATION
Data Exploration in R (Hist, Boxplot, etc)
Uni, Bi & Multivariate Viz
ggplot2
Histogram & Pie (Uni)
Tree & Tree Map
Scatter Plot (Bi)
Line Charts (Bi)
Spatial Charts
Survey Plot
Timeline
Decision Tree
D3.js
InfoVis
IBM ManyEyes
Tableau
7. BIG DATA
Map Reduce Fundamentals
Hadoop Components
Data Replication Principles
Setup Hadoop (IBM / Cloudera / HortonWorks)
Name & Data Nodes
Job & Task Tracker
M/R Programming
Sqoop: Loading Data in HDFS
Flume, Scribe: For Unstruct Data
SQL with Pig
DWH with Hive
Scribe, Chukwa for Weblog
Using Mahout
Zookeeper, Avro
Storm: Hadoop Realtime
RHadoop, RHIPE
rmr
Cassandra
MongoDB, Neo4j
8. DATA INGESTION
Data Discovery
Data Integration
Data Fusion
Transformation & Enrichment
Data Survey
Google OpenRefine
Using ETL
9. DATA MUNGING
Dimensionality & Numerosity Reduction
Normalization
Data Scrubbing
Handling Missing Values
Unbiased Estimators
Binning Sparse Values
Feature Extraction
Denoising
Sampling
Stratified Sampling
Principal Component Analysis
10. TOOLBOX
MS Excel w/ Analysis ToolPak
Java, Python
R, RStudio, Rattle
Weka, KNIME, RapidMiner
Hadoop Dist of Choice
Spark, Storm
Flume, Scribe, Chukwa
Nutch, Talend, Scraperwiki
Webscraper, Flume, Sqoop
tm, RWeka, NLTK
RHIPE
D3.js, ggplot2, Shiny
IBM Languageware
Cassandra, MongoDB
References
LEVEL, NUMBER
01. FUNDAMENTALS, 1-15
02. STATISTICS, 1-29
03. PROGRAMMING, 1-21
04. MACHINE LEARNING, 1-28
05. TEXT MINING / NLP, 1-15
06. VISUALIZATION, 1-15
07. BIG DATA, 1-19
08. DATA INGESTION, 1-10
09. DATA MUNGING, 1-11
10. TOOLBOX, 1-14
https://www.experfy.com/blog/become-data-scientist/
COURSE
Matrices & Linear Algebra Fundamentals
Tabular Data
Sharding
OLAP
Multidimensional Data Model
ETL
Reporting Vs BI Vs Analytics
JSON & XML
NoSQL
Regex
Vendor Landscape
Env Setup
Pick a Dataset (UCI Repo)
Descriptive Statistics (mean, median, range, SD, Var)
Exploratory Data Analysis
Histograms
Percentiles & Outliers
Probability Theory
Bayes Theorem
Random Variables
Cumul Dist Fn (CDF)
Continuous Distributions (Normal, Poisson, Gaussian)
Skewness
ANOVA
Regression: Ranking
Regression: Linear Regression
Regression: Perceptron
Clustering: Hierarchical Clustering
Clustering: K-means Clustering
Neural Networks
Sentiment Analysis
Collaborative Filtering
Tagging
Corpus
Named Entity Recognition
Text Analysis
UIMA
Term Document Matrix
Term Frequency & Weight
Support Vector Machines
Association Rules
Market Basket Analysis
Feature Extraction
Using Mahout
Using Weka
Using NLTK
Classify Text
Vocabulary Mapping
Data Exploration in R (Hist, Boxplot, etc)
Uni, Bi & Multivariate Viz
ggplot2
Histogram & Pie (Uni)
Tree & Tree Map
Scatter Plot (Bi)
Line Charts (Bi)
Spatial Charts
Survey Plot
Timeline
Decision Tree
D3.js
InfoVis
IBM ManyEyes
Tableau
Map Reduce Fundamentals
Hadoop Components
Data Replication Principles
Setup Hadoop (IBM / Cloudera / HortonWorks)
Name & Data Nodes
Job & Task Tracker
M/R Programming
Sqoop: Loading Data in HDFS
Flume, Scribe: For Unstruct Data
SQL with Pig
DWH with Hive
Scribe, Chukwa for Weblog
Using Mahout
Zookeeper, Avro
Storm: Hadoop Realtime
RHadoop, RHIPE
rmr
Cassandra
MongoDB, Neo4j
Summary of Data Formats
Data Discovery
Data Sources & Acquisition
Data Integration
Data Fusion
Transformation & Enrichment
Data Survey
Google OpenRefine
How much Data?
Using ETL
Dimensionality & Numerosity Reduction
Normalization
Data Scrubbing
Handling Missing Values
Unbiased Estimators
Binning Sparse Values
Feature Extraction
Denoising
Sampling
Stratified Sampling
Principal Component Analysis
MS Excel w/ Analysis ToolPak
Java, Python
R, RStudio, Rattle
Weka, KNIME, RapidMiner
Hadoop Dist of Choice
Spark, Storm
Flume, Scribe, Chukwa
Nutch, Talend, Scraperwiki
Webscraper, Flume, Sqoop
tm, RWeka, NLTK
RHIPE
D3.js, ggplot2, Shiny
IBM Languageware
Cassandra, MongoDB
VIDEO
https://www.youtube.com/watch?v=vIu2vu2UqfM&index=3&list=PLWbnIo7XnOkz1hBLdv_0LwcdwUjvoZL5s
https://www.youtube.com/watch?v=xyAuNHPsq-g&list=PLFD0EB975BA0CC1E0
https://www.youtube.com/watch?v=eyX8BXDSE5I
https://www.youtube.com/watch?v=92S4zgXN17o&list=PL2_aWCzGMAwI3W_JlcBbtYTwiQSsOTa6P
https://www.youtube.com/watch?v=V6mKVRU1evU
https://www.youtube.com/watch?v=N5_sTEhnZKM
https://www.youtube.com/watch?v=P00xJgWzz2c&list=PL89B61F78B552C1AB
https://www.youtube.com/watch?v=cVQUvZh64Y8
https://www.youtube.com/watch?v=bVg442ixI6k
https://www.youtube.com/watch?v=SYIHyXJG29M
https://www.youtube.com/watch?v=qdHqidsGfNU
https://www.youtube.com/watch?v=r_h9yBnNh0U
https://www.youtube.com/watch?v=TroKfazVhwM
https://www.youtube.com/watch?v=VXRhir7GQpQ
https://www.youtube.com/channel/UC5ZAemhQUQuNqW3c9Jkw8ug/videos?flow=list&view=0&sort=dd&live_view=500
https://www.youtube.com/channel/UCjkGzGfgvX_Zd8kxs4ldhFw/videos
https://www.youtube.com/watch?v=nhqOAuARzOM&list=PL1MJdy9N8XJKW4HYg_cOsz1-xRzD2zfsT
https://www.youtube.com/watch?v=ClcqCB4sEEs
https://www.youtube.com/watch?v=wz6XnW9nk4w
https://www.youtube.com/watch?v=bfbD8owP_Bs
https://www.youtube.com/watch?v=VhGIC5Nqd4g
https://www.youtube.com/watch?v=e7Pr1VgPK4w&list=PL_c9BZzLwBRK0Pc28IdvPQizD2mJlgoID
https://www.youtube.com/watch?v=iluA_w0tcqw
https://www.youtube.com/watch?v=xAhfQNTIeOM&list=PLx5CT0AzDJCnO9k98RsrPY9WGAXj8yeKL
https://www.youtube.com/watch?v=r6WO5BUN66k
https://www.youtube.com/watch?v=rhGG-vFZBy0
https://www.youtube.com/watch?v=Pgza1vS1lic
https://www.youtube.com/watch?v=JV85enI7jXs
https://www.youtube.com/watch?v=kyGVhx5LwXw
https://www.youtube.com/watch?v=4Z9KEBexzcM&list=PL1LIXLIF50uXWJ9alDSXClzNCMynac38g