
6/12/2014

Data Science Ontology


k-Nearest Neighbor (kNN)
Learning Vector Quantization (LVQ)
Naive Bayes (NB)
Bayesian Belief Network (BBN)
C4.5/5.0
Classification and Regression Tree (CART)
M5' Algorithm
Chi-squared Automatic Interaction Detection (CHAID)
Decision Stump
Multivariate Adaptive Regression Splines (MARS)
Gradient Boosting Machines (GBM)
Incremental Reduced Error Pruning (IREP)
Repeated Incremental Pruning to Produce Error Reduction (RIPPER)
Multiple Linear Regression
Logistic Regression
Stepwise Regression
Ridge Regression
Least Absolute Shrinkage and Selection Operator (LASSO)
Multivariate Adaptive Regression Splines (MARS)
Locally Estimated Scatterplot Smoothing (LOESS)
Multilayer Feedforward Network
Perceptron
Back-Propagation
Hopfield Network
Self-Organizing Map (SOM)
Restricted Boltzmann Machine (RBM)
Deep Belief Networks (DBN)
Convolutional Network
Stacked Auto-encoders
Support Vector Machines
Radial Basis Function (RBF)
Linear Discriminant Analysis (LDA)
Apriori Algorithm
Eclat Algorithm
Single Linkage
Hierarchical Clustering
Complete Linkage
Average Linkage
k-Means
k-Medoids
k-Medians
K-Means++
Fuzzy C-Means
Expectation-Maximization (EM) Algorithm
DBSCAN
OPTICS
EnDBSCAN
Mean-shift
CLIQUE
BIRCH
CLARANS
BSAS
Davies-Bouldin index
Dunn index
Silhouette coefficient
Rand Measure
F-measure
Jaccard index
Fowlkes-Mallows index
Mutual Information
Bagging
Boosting
Random Forests

Instance-Based Methods
Probabilistic Classifiers
Decision Trees
Rule Learners
Regression
Learning Algorithms

Artificial Neural Networks


Deep Learning Neural Nets
Kernel Methods
Association Rules

Clustering

Model Validation
Cluster Validity
Model Performance

Meta Learning

Data Visualization
Relational
Databases
NoSQL

Data Science

Paradigms
Platforms
Resource Managers
Production

Architectures/Frameworks
Agnostic Specifications
Distributed ML Libraries
Cloud

Programming Languages
Data Cleaning
Data Preparation

Dimensionality Reduction

Statistics
Web Frameworks
Development
Business Acumen

http://www.datascienceontology.com/

Visualization
Version Control

Connectivity Models
Centroid Models
Distribution Models
Density Models
Subspace Models
Confusion Matrix
Performant
Kappa Statistic
Sensitivity and Specificity
Precision and Recall
ROC Curves
Internal Evaluation
Cross Validation
Bootstrap Sampling
External Evaluation
Automated Parameter Tuning
Customized Tuning
Regularization
Stacked Generalization
Gradient Boosting Machines (GBM)
Line Chart
Bar Chart
Histogram
Scatterplot
Boxplot
Pareto Chart
Pie Chart
Area Chart
Control Chart
Run Chart
Stem-and-Leaf Display
Cartogram
Sparkline
Table
Microsoft SQL Server
MySQL
SQLite
PostgreSQL
Netezza
SQL Azure
EnterpriseDB
DB2
Oracle
Key-Value Store
Document Store
Column Family Stores
Graph
Batch
Real-Time/Streaming
Mixed

R
Python
Scala
Julia
Java
Clojure
Type Conversion
Character Manipulation
Character Encoding
Missing Values
Special Values
Outliers
Inconsistencies
Error Localization
Transformation
Deductive Correction
Imputation
Minimal Value Adjustment
Feature Selection
Feature Engineering
Hypothesis Testing
P-Value
Effect Size
Confidence Interval
Meta Analysis
Heteroskedasticity
Benford's Law
Multiple Hypothesis Testing
Familywise Error Rate
False Discovery Rate
Covariance
Correlation
Frequentist Approaches
Bayesian Approaches
R
Java
PHP
Ruby
Python
Javascript
CSS

MapReduce
Dataflow
Pig
Hive
YARN

PMML
Mahout
MLlib
GraphLab
AWS/EC2
GCE

D3
GitHub
Strong Communication
Simplification of Complex Concepts
Alignment of Algorithms to Pain Points
Augmenting Organizational Values
Boosting Existing Employee Skills
Helping Build Data Culture

DynamoDB
Riak
Redis
Berkeley DB
Voldemort
MemcacheDB
ArangoDB
MongoDB
CouchDB
RavenDB
RaptorDB
HBase
Cassandra
Hypertable
Accumulo
Neo4J
Infinite Graph
HyperGraphDB
OrientDB
Hadoop
Spark
Storm
Apache Kafka
Lambda
Naiad
Apache Samza

Shiny
Foundation Framework
Twitter Bootstrap
Yahoo Pure

Sean McClure
Data Scientist, ThoughtWorks
