
Machine Learning and Data Mining

Data Mining and Machine Learning


• Data Mining
Data mining is the process of extracting interesting (non-trivial,
implicit, previously unknown and potentially useful)
patterns or knowledge from massive amounts of data.
– “mine for specific data” from a large data set.
• Machine Learning
Machine learning is the science of getting computers to learn
from a given set of data, without being explicitly programmed,
in order to achieve a desired outcome.
– a machine that learns on its own

• Statistics quantifies numbers
• Data Mining explains patterns
• Machine Learning predicts with models
• Artificial Intelligence behaves and reasons

Machine learning
• We probably use machine learning dozens of
times a day without knowing it.
• Examples: Self-driving cars, speech recognition,
web search

Why Machine Learning?
• Learning is used when:
– Human expertise does not exist (navigating on Mars)
– Humans are unable to explain their expertise (speech
recognition)
– The solution changes over time (routing on a computer
network)
– The solution needs to be adapted to particular cases (user
biometrics)
• Develop systems that can automatically adapt and
customize themselves to individual users.
• Discover new knowledge from large databases

Types of Data
• Relational Data (Tables/Transaction/Legacy
Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web, …
• Streaming Data
– Network traffic, sensor data,…

Open Data
• A wide variety of datasets made available by
governments and other organizations
– Open Data Institute (theodi.org)
– Data.gov
– Data.gov.uk
• Data includes
– Housing
– Healthcare
– Travel
– Finance
– Geography

Data-driven Discovery in Science

Large Synoptic Survey Telescope (LSST)
This telescope will produce the deepest, widest image of the
Universe:
• 27-ft (8.4-m) mirror, the width of a singles tennis court
• 3200-megapixel camera
• Each image the size of 40 full moons
• 37 billion stars and galaxies
• 10-year survey of the sky
• 10 million alerts, 1000 pairs of exposures, and
15 terabytes of data ... every night!

Square Kilometre Array (SKA)
• Exploring the Universe with the world's largest
radio telescope

The Large Hadron Collider (LHC)
• The Large Hadron Collider (LHC) is the world’s
largest and most powerful particle accelerator.

The Era of e-Science and Big Data

Machine Learning: A Definition
A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure
P, if its performance at tasks in T, as measured by P,
improves with experience E.

Advanced Computing Infrastructure
• Large scale, distributed, heterogeneous,
multicore/manycore, accelerators, deep storage
hierarchies, experimental systems ….

World's Fastest Supercomputer
• Sunway TaihuLight
• 10,649,600 CPU cores across the entire system
• 125 PF peak / 10.65 M cores
• 1.31 PB memory

Current Big Data Market
Source: Forbes.com

Resources: Datasets
• UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive:
http://kdd.ics.uci.edu/summary.data.application.html
• LIBSVM datasets:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/

Resources: Journals
• Journal of Machine Learning Research
www.jmlr.org
• Machine Learning
• Pattern Recognition Letters, Pattern Recognition
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and
Machine Intelligence
• Annals of Statistics
• Journal of the American Statistical Association
• ...

Resources: Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Computational Learning
• International Joint Conference on Artificial Intelligence (IJCAI)
• ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD)
• IEEE Int. Conf. on Data Mining (ICDM)
• Association for the Advancement of Artificial Intelligence (AAAI)
• Uncertainty in Artificial Intelligence (UAI)
• …

Techniques
• Association Mining
• Supervised Learning
– Classification,
– Regression, etc
• Unsupervised Learning
• Reinforcement Learning

Learning Association
• Market basket analysis:
P(Y | X): probability that somebody who buys
X also buys Y, where X and Y are
products/services.

Example: P(Bread | Butter) = 0.7
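The conditional probability above can be estimated by counting baskets. The following is a minimal sketch, assuming a handful of made-up transactions; the `baskets` data and the helper name `conditional_probability` are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical toy transactions; each basket is the set of items bought together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "jam"},
    {"bread", "milk"},
    {"bread", "butter", "jam"},
]


def conditional_probability(baskets, x, y):
    """Estimate P(y | x) as (#baskets with x and y) / (#baskets with x)."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        raise ValueError(f"no basket contains {x!r}")
    return sum(1 for b in with_x if y in b) / len(with_x)


p = conditional_probability(baskets, "butter", "bread")
print(f"P(bread | butter) = {p:.2f}")
```

In this toy data, four baskets contain butter and three of those also contain bread, so the estimate is 3/4.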

Supervised Learning: Uses
• Prediction of future cases: Use the rule to
predict the output of future inputs
• Knowledge extraction: The rule is easy to
understand
• Outlier detection: Exceptions not covered by
the rule, e.g., fraud

Classification
• Example: Credit scoring
• Differentiating between low-risk and high-risk
customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
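The discriminant can be written directly as a function. This is only a sketch: the threshold values `THETA1` and `THETA2` and the customer figures are hypothetical, and in practice the thresholds would be learned from labeled past customers rather than set by hand.

```python
def credit_risk(income, savings, theta1, theta2):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"


# Hypothetical thresholds (assumed for illustration, not taken from the slides).
THETA1 = 30_000  # income threshold
THETA2 = 5_000   # savings threshold

# Hypothetical customers as (income, savings) pairs.
customers = [(45_000, 12_000), (45_000, 1_000), (20_000, 9_000)]
labels = [credit_risk(income, savings, THETA1, THETA2) for income, savings in customers]
print(labels)
```

Only the first customer exceeds both thresholds, so only that one is labeled low-risk.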
Classification: Applications
• Face recognition: Pose, make-up, hair style
• Character recognition: Different handwriting
styles.
• Medical diagnosis: From symptoms to
illnesses
• Biometrics: Recognition/authentication using
physical and/or behavioral characteristics:
Face, iris, signature, etc.

Regression
• Example: Price of a used car
• x: car attributes
• y: price
• Model: y = g(x | θ), where g( ) is the model and θ its parameters
• Linear model: y = wx + w0
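For the linear model y = wx + w0 with a single attribute, the parameters can be fitted by ordinary least squares. The sketch below assumes one attribute (car age) and invented prices; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + w0 for a single input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx              # slope
    w0 = mean_y - w * mean_x   # intercept
    return w, w0


# Hypothetical data: car age in years (x) and price in thousands (y).
ages = [1, 2, 3, 4, 5]
prices = [18.0, 16.0, 14.0, 12.0, 10.0]

w, w0 = fit_line(ages, prices)
predicted = w * 6 + w0  # estimated price of a six-year-old car
print(w, w0, predicted)
```

Because the toy prices fall exactly on a line, the fit recovers a slope of -2 and an intercept of 20; real data would leave residual error.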

Unsupervised Learning
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances
• Example applications
– Customer segmentation in CRM
– Image compression: Color quantization
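Color quantization can be sketched with k-means clustering on pixel intensities. The example below is a simplified illustration that assumes one-dimensional gray levels rather than full RGB colors; the pixel values, the initial centers, and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's k-means on scalar values, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: put each value in the cluster of its nearest center.
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers


# Hypothetical gray levels of nine pixels (0 = black, 255 = white).
gray_levels = [10, 12, 14, 200, 205, 210, 100, 98, 102]
centers = kmeans_1d(gray_levels, centers=[0, 128, 255])

# Compression: replace every pixel by the center of its cluster.
quantized = [min(centers, key=lambda c: abs(v - c)) for v in gray_levels]
print(centers)
print(quantized)
```

No output labels are used anywhere: the three groups emerge from the similarity of the values alone.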

Reinforcement Learning
• It is the problem of getting an agent to act in the
world so as to maximize its rewards.
• For example, consider teaching a dog a new trick:
you cannot tell it what to do, but you can
reward/punish it if it does the right/wrong thing.
• Game playing
• Robot in a maze
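The robot-in-a-maze idea can be illustrated with tabular Q-learning, one common reinforcement learning method (the slides do not name a specific algorithm). This sketch assumes a very small "maze": a corridor of five cells with a reward only at the last cell. The learning rate, discount factor, episode count, and all names are hypothetical choices.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)      # move left or move right
ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor


def step(state, action):
    """Move within the corridor; the reward is 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore uniformly at random; Q-learning is off-policy, so it
            # can still learn the values of the greedy policy.
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q = train()
# Greedy policy: in each non-goal cell, pick the action with the larger value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told which way to go; it only receives the reward at the goal, and the learned policy ends up moving right (+1) in every cell.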

Common data mining tasks

• Description
• Estimation
• Prediction
• Classification
• Clustering
• Association

Description
• Find ways to describe patterns and trends lying
within data.
• For example, a pollster (a person who conducts
or analyses opinion polls) may uncover
evidence that those who have been laid off are
less likely to support the present prime minister
in the election.
• Decision trees provide an intuitive and human-friendly
explanation of their results.

Estimation
• Estimation is similar to classification except that the target
variable is numerical rather than categorical (divided into
groups).
• For example, we might be interested in estimating the
systolic blood pressure reading of a hospital patient, based
on the patient’s age, gender, body-mass index, and blood
sodium levels.
• The estimation model can then be applied to new cases.
• Typical methods: linear regression, neural networks

Estimation Examples
• Estimating the amount of money a randomly chosen family
of four will spend for back-to-school shopping this fall.
• Estimating the percentage decrease in rotary-movement
sustained by a National Football League running back with a
knee injury.
• Estimating the cumulative grade-point average (CGPA) of a graduate
student, based on that student’s undergraduate CGPA.

Prediction
• Prediction is similar to classification and estimation, except
that for prediction, the results lie in the future.
– Predicting the price of a stock three months into the
future
– Predicting the percentage increase in traffic deaths next
year if the speed limit is increased
– Predicting whether a particular molecule in drug
discovery will lead to a profitable new drug for a
pharmaceutical company

Classification
• In classification, there is a categorical target variable,
such as income bracket, which could for example be
partitioned into three classes or categories:
– High income,
– Middle income, and
– Low income.
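One simple way to predict such a categorical target is a k-nearest-neighbor classifier, used here only as an illustration (the slides do not prescribe a method). The sketch assumes two invented predictors, age and years of education; the records and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the majority class among the k training records nearest to query."""
    nearest = sorted(train, key=lambda record: math.dist(record[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled records: (age, years of education) -> income bracket.
train = [
    ((25, 10), "Low income"),
    ((30, 11), "Low income"),
    ((28, 12), "Low income"),
    ((40, 14), "Middle income"),
    ((45, 15), "Middle income"),
    ((38, 16), "Middle income"),
    ((50, 20), "High income"),
    ((55, 21), "High income"),
    ((60, 22), "High income"),
]

print(knn_predict(train, (27, 11)))
print(knn_predict(train, (42, 15)))
print(knn_predict(train, (57, 21)))
```

The output is always one of the three class labels, never a number, which is what distinguishes classification from estimation.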

Clustering
• Clustering refers to the grouping of records,
observations, or cases into classes of similar objects

Association
• The association task for data mining is the job of finding
which attributes “go together”
• Finding out which items in a supermarket are purchased
together and which items are never purchased together
• Apriori algorithm
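The core of the Apriori algorithm, finding itemsets that occur in at least a minimum number of transactions, can be sketched as follows. This is a compact illustration rather than an optimized implementation; the basket data, the `min_support` value, and the function name `apriori` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical supermarket baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "jam"},
    {"bread", "milk"},
    {"bread", "butter", "jam"},
]

frequent = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what makes Apriori efficient: a larger itemset is counted only if all of its smaller subsets were already found to be frequent.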

Identify the relevant data mining task
• As a dimension-reduction tool when the data set has
hundreds of attributes
– Clustering
• The Boston Celtics would like to approximate how
many points their next opponent will score against
them.
– Estimation: estimating the number of points
(numeric target).

Identify the relevant data mining task (cont’d)
• A military intelligence officer is interested in learning
about the respective proportions of Sunnis and Shias
in a particular strategic region.
– Exploratory data analysis: find similarities and
differences between the Sunni and Shia
proportions

Identify the relevant data mining task (cont’d)
• A political strategist is seeking the best groups to
canvass for donations in a county.
– Clustering: examine the profile of each
homogeneous group derived from a particular
county’s population;
– Association: discover interesting rules pertaining
to a large proportion of the population
