
Inspire…Educate…Transform.

Machine Learning – Review and Summary

Dr. Manoj Chinnakotla


Mentor, INSOFE
Senior Applied Scientist, Microsoft
Adjunct Professor, IIIT Hyderabad

The best place for students to learn Applied Engineering http://www.insofe.edu.in


The Model Building Process



Basic Classification of ML Problems

Machine Learning
• Supervised (training data with labels)
– Classification
– Regression
• Unsupervised (data without labels)
– Clustering
– Dimensionality Reduction
• Sequence Learning



The Model Building Process
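• Data with known results is split into a training set, a validation set and a final test set
• The Model Builder learns a model from the training set
• The model's predictions on the validation set are evaluated; if performance is not acceptable (N), the parameters are tuned and the model is rebuilt, otherwise (Y) the model is accepted
• Validation set performance is used for tuning parameters!
• Final Evaluation: the final model is evaluated on the final test set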
The test data shouldn’t be used for any parameter tuning!
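To make the training/validation/test discipline concrete, here is a minimal Python sketch. The function names (`train_val_test_split`, `accuracy`), the 60/20/20 split and the one-parameter threshold "model" are hypothetical choices for illustration only; the point is that the parameter is chosen on the validation set and the test set is touched exactly once.

```python
import random

def train_val_test_split(records, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle once, then carve out disjoint test, validation and training sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def accuracy(threshold, data):
    # Hypothetical one-parameter model: predict label 1 when x >= threshold
    return sum((x >= threshold) == bool(y) for x, y in data) / len(data)

data = [(x, int(x >= 50)) for x in range(100)]   # toy labelled data
train, val, test = train_val_test_split(data)

# Model building: candidate thresholds come from the training set only
candidates = sorted({x for x, _ in train})
# Parameter tuning: pick the candidate with the best validation accuracy
best_t = max(candidates, key=lambda t: accuracy(t, val))
# Final evaluation: the test set is used once, after tuning is finished
test_acc = accuracy(best_t, test)
```

If the test accuracy were fed back into choosing `best_t`, it would stop being an unbiased estimate of performance on unseen data.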



Clustering
• Belongs to the paradigm of “Unsupervised Learning”
– The task of discovering intrinsic patterns from data
without any supervision
• Depending on the objective to be optimized and the
assumptions made about the data, many clustering
algorithms have been proposed in the literature
• Some clustering algorithms we discussed:
– K-Means
– Agglomerative Clustering
– Expectation Maximization (EM)
– Spectral Clustering (can find non-spherical clusters)
• We studied practical issues while using the above
algorithms
• We studied the notion of cluster evaluation
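As a reminder of how the first algorithm in the list works, here is a minimal K-Means sketch (Lloyd's algorithm) in Python with NumPy. The random initialisation from data points, the convergence test and the toy data are illustrative assumptions; practical implementations add smarter seeding (e.g. k-means++) and multiple restarts.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy blobs
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
```

Because K-Means minimises within-cluster squared distance, it favours roughly spherical clusters, which is one reason spectral clustering is listed separately.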



Decision Trees
• Decision Trees are very intuitive and explainable
models
• Used for both classification and regression
tasks
• Work well in practice for many applications
• Issues while learning Decision Trees
– Splits – binary, multi-way
– Split criteria – entropy, information gain, Gini index, …
– Missing value treatment
– Pruning
– Rule extraction from trees
• Both C4.5 and CART are robust tools
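The split criteria mentioned above can be computed in a few lines. This is a minimal Python sketch; the function names and the toy 9-vs-5 label split are illustrative assumptions, not code from C4.5 or CART.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum_c p_c * log2(p_c) over the class proportions p_c."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index = 1 - sum_c p_c^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 9 + ["no"] * 5
children = [["yes"] * 2 + ["no"] * 3, ["yes"] * 4, ["yes"] * 3 + ["no"] * 2]
print(round(entropy(parent), 3))                      # 0.94
print(round(gini(parent), 3))                         # 0.459
print(round(information_gain(parent, children), 3))   # 0.247
```

A tree learner evaluates such a score for every candidate split and greedily keeps the best one; C4.5 uses a normalised variant (gain ratio), CART uses the Gini index.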
Support Vector Machines (SVMs)

• SVMs work very well in practice for a large class of
classification problems
• SVMs learn a maximum-margin hyperplane, which tends to
result in good generalization
• The basic linear SVM formulation can be extended (soft margin)
to handle noisy and non-separable data
• The Kernel Trick can be used to learn complex non-linear
patterns
• For better performance, tune the SVM parameters, such as C
and the kernel parameters, on a validation set
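To illustrate the soft-margin idea, here is a minimal linear SVM trained by full-batch subgradient descent on the hinge loss, in Python with NumPy. This is an assumed teaching sketch, not how libraries such as LIBSVM solve the problem (they solve the dual and support kernels). The regulariser `lam` corresponds to 1/(nC) for the usual C-parameterised objective, so a smaller C means a larger `lam`.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1            # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
```

Only points with margin below 1 contribute to the gradient, which mirrors the fact that the SVM solution depends only on the support vectors. A kernel SVM would replace the dot product `X @ w` with kernel evaluations.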



Rule Learning
• Association rule mining has been extensively
studied in the data mining community.
• There are many efficient algorithms and model
variations.
• Based on the concept of support and confidence
• The Apriori algorithm is one of the best-known
algorithms
– Finds all itemsets with a minimum support
• The single minimum support threshold can be
extended to multiple minimum item supports
• The Frequent-Pattern Tree (FP-Tree) data structure
allows efficient mining of frequent patterns
without explicit candidate generation
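The level-wise search behind Apriori, together with support and confidence, can be sketched in plain Python. This is a minimal illustration assuming in-memory transactions and a single minimum support; the function names and the five toy baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_only(candidates):
        # support(X) = fraction of transactions containing every item of X
        supports = {c: sum(c <= t for t in transactions) / n for c in candidates}
        return {c: s for c, s in supports.items() if s >= min_support}

    current = frequent_only({frozenset([item]) for t in transactions for item in t})
    frequent, k = {}, 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets that yield a k-itemset
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = frequent_only(candidates)
    return frequent

def confidence(frequent, antecedent, consequent):
    """conf(A -> B) = support(A ∪ B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
freq = apriori(baskets, min_support=0.6)
```

The prune step is what the "without explicit candidate generation" remark about FP-Tree refers to: Apriori must generate and count candidates at every level, while FP-Growth avoids this.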
KNNs
• kNN is an example of “Instance Based Learning”
• Conceptually simple, yet able to solve complex
problems
• Can work with relatively little information
• Learning is simple (no learning at all!)
• Suffers from the curse of dimensionality
– Sensitive to representation
– Feature selection and weighting extremely important
• Practical applications need data structures
(e.g., k-d trees) to speed up retrieval of “close”
neighbours
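A brute-force kNN classifier fits in a few lines of Python, which also shows why "learning" is trivial and all the cost moves to query time. The function name, Euclidean distance and majority vote are illustrative assumptions; a k-d tree or similar index would replace the full sort in practice.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just storing the data; every query scans all stored points.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))   # a
print(knn_predict(train_X, train_y, (8.5, 8.5)))   # b
```

Because the distance treats every coordinate equally, rescaling or weighting features changes the neighbours, which is the sensitivity to representation noted above.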
Collaborative Filtering (CF)
• The most prominent approach to generate recommendations
• Basic Idea
– Use the "wisdom of the crowd" to recommend items
– An application of kNN, where closeness is computed between users or
between items
• Some variations of CF models
– User based vs. Item based
– Model based vs. Memory based
• Practical issues to deal with include cold start, choosing rating scales, etc.
• Offline evaluation is usually done using MAE and RMSE
• Online evaluation is based on business-specific objectives – click-through rates,
ad conversion rates, user dwell time, etc.
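A memory-based, user-based CF predictor and the two offline metrics can be sketched in plain Python. Everything here is an illustrative assumption: the tiny rating dictionary, cosine similarity computed over co-rated items only, and a similarity-weighted average of the k most similar users who rated the item.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data on a 1-5 scale
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 4, "i4": 3},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings for `item`."""
    neighbours = sorted(
        ((cosine(ratings[user], r), r[item])
         for other, r in ratings.items() if other != user and item in r),
        reverse=True,
    )[:k]
    den = sum(abs(sim) for sim, _ in neighbours)
    return sum(sim * r for sim, r in neighbours) / den if den else None

def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

print(predict("alice", "i4"))   # between bob's 3 and carol's 2, closer to 3
```

Returning `None` when no neighbour rated the item is a crude stand-in for the cold-start problem; an item-based variant would compute similarities between item columns instead of user rows.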



Artificial Neural Networks (ANNs)
• An ANN is a computational model inspired by the
workings of the human brain
• Although a single perceptron can represent only linear
functions, multiple layers of perceptrons can
approximate arbitrarily complex functions
• The Back Propagation algorithm can be used to learn
the parameters of a multi-layered feed-forward neural
network
• The settings of a feed-forward ANN, such as the
learning rate, the number of hidden layers and the initial
weights, need to be chosen carefully
• An ANN can learn deep feature representations
from the original training data
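The XOR function is the classic case a single perceptron cannot represent but a network with one hidden layer can. Below is a minimal NumPy sketch of back propagation for a 2-layer feed-forward network; the tanh hidden layer, sigmoid output, cross-entropy loss, 8 hidden units, learning rate 0.5 and 5000 epochs are all illustrative choices, and other settings may train more slowly or not at all, which is why these settings need care.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)     # hidden layer activations
    p = sigmoid(h @ W2 + b2)     # output probability
    return h, p

def train(X, y, hidden=8, lr=0.5, epochs=5000, seed=0):
    """Two-layer feed-forward net trained with back propagation (full-batch gradient descent)."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(size=(X.shape[1], hidden)), np.zeros(hidden),
              rng.normal(size=(hidden, 1)), np.zeros(1)]
    losses = []
    for _ in range(epochs):
        W1, b1, W2, b2 = params
        h, p = forward(params, X)
        q = np.clip(p, 1e-12, 1 - 1e-12)   # avoid log(0) in the reported loss
        losses.append(float(-np.mean(y * np.log(q) + (1 - y) * np.log(1 - q))))
        # Backward pass: for sigmoid + cross-entropy, dLoss/dlogit = (p - y) / n
        d_out = (p - y) / len(X)
        d_hidden = (d_out @ W2.T) * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
        grads = [X.T @ d_hidden, d_hidden.sum(axis=0), h.T @ d_out, d_out.sum(axis=0)]
        params = [w - lr * g for w, g in zip(params, grads)]
    return params, losses

# XOR: not linearly separable, so no single perceptron can fit it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train(X, y)
_, probs = forward(params, X)
```

The backward pass is just the chain rule applied layer by layer: the output error `d_out` is pushed back through `W2` and scaled by the tanh derivative to obtain the hidden-layer error.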



Ensemble Models
• Ensemble models often improve accuracy and robustness over single-model
methods
• They can reduce bias, variance, or both, compared with individual
models
• The generalization error of any model is strongly related to its bias and
variance
• Bagging
– Create ensembles by repeatedly sampling the training data at random with
replacement (bootstrap samples)
– Mainly a variance reduction technique
• Boosting
– Train weak learners sequentially, making records misclassified by earlier learners
more important, and combine them
– Gradient Boosting generalizes the boosting technique to arbitrary differentiable loss functions
– Mainly a bias reduction technique
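Boosting's reweighting of misclassified records is easiest to see in AdaBoost with decision stumps. The NumPy sketch below is a minimal illustration assuming labels in {-1, +1}; the function names and the 1-D toy data (positive, negative, positive again) are hypothetical. No single stump can fit that pattern, but three boosted stumps can.

```python
import numpy as np

def stump_predict(X, j, thr, polarity):
    """Decision stump: +1 on one side of the threshold on feature j, -1 on the other."""
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = w[stump_predict(X, j, thr, polarity) != y].sum()
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)            # start with uniform record weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # vote weight of this learner
        pred = stump_predict(X, j, thr, pol)
        w *= np.exp(-alpha * y * pred)          # misclassified records gain weight
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

X = np.arange(1, 7, dtype=float).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])
model = adaboost(X, y, n_rounds=3)
```

Each weak stump has high bias on its own; the weighted vote of several stumps fits a pattern none of them can, which is the bias-reduction effect described above.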
Dimensionality Reduction (DR)
• DR projects data onto a lower-dimensional space
to avoid the curse of dimensionality
• PCA is the most commonly used technique for
DR
• It has wide applications such as
compression, data visualization, feature
selection, etc.
• SVD factorizes any real matrix, including non-square
ones; truncating it to the largest singular values gives
the best low-rank approximation of the original matrix.
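PCA and SVD are closely linked: the principal directions are the right singular vectors of the mean-centred data matrix. The NumPy sketch below (function names and toy data are illustrative assumptions) projects data onto the top components and reconstructs a rank-k approximation.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centred data matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                 # coordinates in the lower-dimensional space
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance[:n_components].sum() / explained_variance.sum()
    return scores, components, mean, ratio

def reconstruct(scores, components, mean):
    # Rank-k approximation of the original data (truncated SVD plus the mean)
    return scores @ components + mean

t = np.arange(5, dtype=float)
X = np.column_stack([t, 2 * t + 1])   # 2-D points lying on a line: intrinsically 1-D
scores, components, mean, ratio = pca(X, 1)
```

Here one component explains essentially all the variance, so the 2-D data can be stored as a single coordinate per point and reconstructed without loss.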
The Machine Learning Matrix
(Supervised Learning)
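| Model | Sparse, low noise or outliers | Sparse, moderate noise | Sparse, noise and outliers | Dense, low noise or outliers | Dense, moderate noise | Dense, noise and outliers |
|---|---|---|---|---|---|---|
| SVM | Good | Soft margin SVM | | Good | | Not good |
| Neural net | Moderate | | | Good | | |
| K-Nearest Neighbor | | | | | | |
| Logistic regression | | | | | | |
| Decision Tree | | | | Not good if all attributes are continuous | | |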
Fill in good, moderate, not good in above table. Feel free to add comments.



The Machine Learning Matrix
(Supervised Learning)
| Model | Sparse, low noise or outliers | Sparse, moderate noise | Sparse, noise and outliers | Dense, low noise or outliers | Dense, moderate noise | Dense, noise and outliers |
|---|---|---|---|---|---|---|
| SVM | Good, only if majority data is numeric | Soft margin SVM (try holdout analysis) | Not good | Good | Soft margin SVM | Not good |
| Neural net | Moderate, only if majority data is numeric | Moderate | Moderate | Good | Good | Good |
| K-Nearest Neighbor | Good | Moderate | Depends on K | Good | Moderate | Depends on K |
| Logistic regression | Moderate | Not good | Not good | Good | Moderate | Moderate |
| Decision Tree | Good, but only if majority attributes are categorical | Not good (pruning may work; do holdout analysis) | Not good | Not good if all attributes are continuous, otherwise good | Moderate | Moderate |




Cheat Sheet for ML



International School of Engineering
2-56/2/19, Khanamet, Madhapur, Hyderabad - 500 081

For Individuals: +91-9177585755 or 040-65743991


For Corporates: +91-7893866005

Web: http://www.insofe.edu.in
Facebook: www.facebook.com/insofe/
LinkedIn: http://www.linkedin.com/groups/Big-Data-Analytics-Hadoop-Hyderabad-4488721?trk=myg_ugrp_ovr
YouTube: http://www.youtube.com/InsofeVideos
SlideShare: http://www.slideshare.net/INSOFE

This presentation may contain references to findings of various reports available in the public domain. INSOFE makes no representation as to their accuracy or that the organization subscribes to those findings.

