
Lecture Slides for

INTRODUCTION TO MACHINE LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014

alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 1:

INTRODUCTION
Machine Learning
 Data is continuously getting bigger


 Theory to process it, & turn it into knowledge
 Everyday science – astronomy to biology, everyday
life
 Scientific or personal data
 Smart people – make use of that data – useful product
or service
 Structure to data – not just numbers, character
strings – images, video, audio, documents, web
pages, click logs, graphs
Data
 Away from parametric assumptions – normality


 Dynamic – time dimension
 Multi-view observations – different sensors & modalities

 Complex & voluminous data


 Simple explanation
 Millions of customers – buy thousands of products online
or from their local supermarket
 Pattern to this data
 People do not shop at random
 Throwing a party, baby at home
 Hidden factors that explain customer behavior
 Infer this hidden model from observed data
Big Data
 Widespread use of personal computers and wireless
communication leads to “big data”
 Buy a product, rent a movie, visit a webpage, write a blog,
post on the social media, walk or drive around
 Needs to be understood, interests to be predicted

 We are both producers and consumers of data


 Data is not random, it has structure, e.g., customer
behavior
 We need “big theory” to extract that structure from
data for
(a) Understanding the process
(b) Making predictions for the future
Examples
 Supermarket chain – Each transaction


 Date, customer id, goods bought, their amount, total money
spent
 Maximize sales & profit – predict which customer is likely to
buy which product
 Customer - Set of products best matching his/her needs
 Customer behavior – time, geographic location
 Buy this ice cream flavor
 Buy next book of this author
 See this movie
 Visit this city
 Click this link
Examples
 To solve a problem on a computer, we need an algorithm – e.g., sorting


 No known algorithm for some tasks
 Predicting customer behavior
 Telling spam emails from legitimate ones
◼ Input: email – list of characters
◼ Output: yes/no – spam or not
◼ Transform input to output – changes in time, from individual to
individual
 Construct a good and useful approximation
 Not possible to identify the complete process
 Detect certain pattern or regularities
 Assume: future not much different from the past when the sample
data was collected
Data Mining
 Large volume of earth, raw material extracted from a
mine
 Processed: small amount of very precious material
 Large volume of data – construct a simple model, high
predictive accuracy
 Finance, banks – credit applications, fraud detection, stock
market
 Manufacturing – optimization, control, troubleshooting
 Medical diagnosis
 Telecommunication – network optimization, maximizing QoS
 Physics, astronomy, biology
Why “Learn” ?
 Machine learning is programming computers to optimize
a performance criterion using example data or past
experience.
 There is no need to “learn” to calculate payroll
 Learning is used when:
 Human expertise does not exist (navigating on Mars),
 Humans are unable to explain their expertise (speech
recognition)
 Solution changes in time (routing on a computer network)
 Solution needs to be adapted to particular cases (user
biometrics)
What We Talk About When We Talk About “Learning”
 Learning general models from data of particular
examples
 Data is cheap and abundant (data warehouses,
data marts); knowledge is expensive and scarce.
 Example in retail: Customer transactions to consumer
behavior:
People who bought “Blink” also bought “Outliers”
(www.amazon.com)
 Build a model that is a good and useful
approximation to the data.
Face recognition
 We do effortlessly – despite differences in pose,
lighting, hair style
 Face has structure, symmetric – eyes, nose, mouth
 Pattern recognition
 Model:
 Predictive – make predictions
 Descriptive – gain knowledge from data
 Both

 Occlusion – glasses hide the eyes & eyebrows, a beard
hides the chin
Data Mining
 Retail: Market basket analysis, Customer
relationship management (CRM)
 Finance: Credit scoring, fraud detection
 Manufacturing: Control, robotics, troubleshooting
 Medicine: Medical diagnosis
 Telecommunications: Spam filters, intrusion detection
 Bioinformatics: Motifs, alignment
 Web mining: Search engines
 ...
What is Machine Learning?
 Optimize a performance criterion using example
data or past experience.
 Role of Statistics: Inference from a sample
 Role of Computer science: Efficient algorithms to
 Solve the optimization problem
 Represent and evaluate the model for inference
Applications
 Association
 Supervised Learning
 Classification

 Regression

 Unsupervised Learning
 Reinforcement Learning
Learning Associations
 Basket analysis:
P (Y | X ) probability that somebody who buys X
also buys Y where X and Y are products/services.
Example: P(notebook|pen ) = 0.7
P(Y|X,D) – D (customer attributes – gender, age, class
studying)
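
The conditional probability above can be estimated by counting. A minimal sketch, assuming each transaction is a set of product names; the toy baskets and the function name `conditional_probability` are hypothetical, and the data is constructed so the estimate matches the slide's 0.7:

```python
from typing import Iterable, Set


def conditional_probability(transactions: Iterable[Set[str]], x: str, y: str) -> float:
    """Estimate P(y | x) as (#baskets with x and y) / (#baskets with x)."""
    with_x = [basket for basket in transactions if x in basket]
    if not with_x:
        raise ValueError(f"no transaction contains {x!r}")
    return sum(1 for basket in with_x if y in basket) / len(with_x)


# Hypothetical toy data: 10 baskets contain a pen, 7 of those also a notebook.
transactions = (
    [{"pen", "notebook"}] * 7
    + [{"pen"}] * 3
    + [{"notebook", "eraser"}] * 2
)

p = conditional_probability(transactions, "pen", "notebook")
print(p)
```

Conditioning on customer attributes D would amount to filtering the transactions to matching customers before counting.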
Classification
 Example: Credit scoring
 Differentiating between low-risk and high-risk customers
from their income and savings

Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
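
The discriminant can be written directly as a function. This is only a sketch: in a learning setting θ1 and θ2 would be fitted from labeled past customers, whereas the default thresholds and the applicants below are made-up values for illustration.

```python
def credit_risk(income: float, savings: float,
                theta1: float = 30_000.0, theta2: float = 10_000.0) -> str:
    """Apply the rule: low-risk only if both thresholds are exceeded.

    theta1 and theta2 are hypothetical hand-set values, not learned ones.
    """
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical applicants as (income, savings) pairs.
applicants = [(45_000, 20_000), (45_000, 5_000), (25_000, 50_000)]
labels = [credit_risk(income, savings) for income, savings in applicants]
print(labels)
```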
Loan
 Predict the risk associated with a loan


 Bank will make a profit
 Not inconvenience a customer with a loan over his or
her financial capacity
 Income, savings, collaterals, profession, age, past
financial history
Optical character recognition
 Multiple classes
 Handwritten characters – zip codes on envelopes,
amounts on cheques – small, large, slanted, pen,
pencil
Classification: Applications
 Aka Pattern recognition


 Face recognition: Pose, lighting, occlusion (glasses,
beard), make-up, hair style
 Character recognition: Different handwriting styles.
 Speech recognition: Temporal dependency.
 Medical diagnosis: From symptoms to illnesses
 Biometrics: Recognition/authentication using physical
and/or behavioral characteristics: Face, iris,
signature, etc
 Outlier/novelty detection
Face Recognition
Training examples of a person

Test images

ORL dataset,
AT&T Laboratories, Cambridge UK
Medical diagnosis
 Input: patient information, output: illness classes


 Age, gender, past medical history, current symptoms
 SPEECH RECOGNITION
 Age, gender, accent
 Input is temporal

 Integration of a language model

 Natural Language Processing


◼ Spam filtering, document summarizing, analyzing blogs or
posts – extract trending topics, what to advertise, machine
translation
Biometrics
 Physical characteristics – Face, fingerprint, iris, palm


 Behavioral characteristics – dynamics of signature,
voice, gait, key stroke
 Usual identification procedures: photo, printed
signature, password
 Many different (uncorrelated) inputs – forgeries
(spoofing) difficult
Knowledge Extraction
 Simple model that explains the data


 Compression
 Rules of addition – need not remember sum of every
possible pair of numbers
 OUTLIER DETECTION
 Finding instances that do not obey the general rule
(exceptions)
 Anomaly requiring attention (fraud)
 Previously unseen, but valid, instances – NOVELTY
DETECTION
Regression

 Example: Price of a used car
 x : car attributes – brand, year, engine capacity, mileage
 y : price
 Linear model: y = wx + w0
 General form: y = g(x | θ), where g( ) is the model and θ are its parameters
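
One way to obtain w and w0 for the linear model is ordinary least squares. The slide does not say how the line is fitted, so the method is an assumption, and the mileage/price numbers are hypothetical (chosen to lie exactly on a line so the result is easy to check):

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = w*x + w0 for a single input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx              # slope
    w0 = mean_y - w * mean_x   # intercept
    return w, w0


# Hypothetical used cars: mileage in thousands of km, price in thousands.
mileage = [20.0, 50.0, 80.0, 110.0, 140.0]
price = [18.0, 15.0, 12.0, 9.0, 6.0]

w, w0 = fit_line(mileage, price)
predicted = w * 100.0 + w0   # predicted price at 100 thousand km
print(w, w0, predicted)
```

Here x is a single attribute (mileage); with several attributes, w becomes a vector and the same idea applies.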

Regression Applications
 Navigating a car: Angle of the steering – video
camera, GPS
 Kinematics of a robot arm – joint angles as functions of position (x,y):
α1 = g1(x,y)
α2 = g2(x,y)

◼ Response surface design


Optimize a function
 Machine that roasts coffee


 Temperature, time, type of coffee bean
 Quality of coffee – customer satisfaction

 RESPONSE SURFACE DESIGN

 Recommendation system for movies


 Ordered list – movies ranked by how much we believe the user is likely to enjoy each
 Genre, actors, the user's ratings of movies he/she has already seen

 Learn a RANKING function


Supervised Learning: Uses
 Prediction of future cases: Use the rule to predict
the output for future inputs
 Knowledge extraction: The rule is easy to
understand
 Compression: The rule is simpler than the data it
explains
 Outlier detection: Exceptions that are not covered
by the rule, e.g., fraud
Unsupervised Learning
 Learning “what normally happens”


 No output
 Clustering: Grouping similar instances
 Example applications
 Customer segmentation in CRM
 Image compression: Color quantization

 Bioinformatics: Learning motifs
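
Color quantization by clustering can be sketched with k-means, one common clustering method (the slide does not name an algorithm, so this choice is an assumption). The example works on gray levels rather than full colors to stay short; the pixel values and starting centers are hypothetical.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], centers: Sequence[float],
              iterations: int = 10) -> List[float]:
    """Alternate between assigning values to the nearest center and re-averaging."""
    centers = list(centers)
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers


def quantize(values: Sequence[float], palette: Sequence[float]) -> List[float]:
    """Replace every value by its nearest palette entry."""
    return [min(palette, key=lambda c: abs(v - c)) for v in values]


gray = [10, 12, 11, 200, 205, 198]          # hypothetical pixel intensities
palette = kmeans_1d(gray, centers=[0, 255])  # two "main" gray levels
compressed = quantize(gray, palette)
print(palette, compressed)
```

No output labels are used anywhere: the two groups come only from the similarity of the inputs.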


Unsupervised learning
 Density estimation
 Clustering
 Customer Relationship Management (CRM) – customer
segmentation
 Image Compression
 24 bits – 16 million colors, 64 main colors – ? bits
 16 x 16 bitmap = 32 bytes, ASCII – ?

 Document clustering
 News reports – politics, sports, fashion, arts
 Bag of words
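
The two questions in the image compression bullets come down to counting bits. A small worked sketch follows; it ignores the cost of storing the palette itself, and it assumes the ASCII question is about storing a recognized character as its character code instead of its bitmap. The helper names are hypothetical.

```python
def bits_needed(num_values: int) -> int:
    """Smallest number of bits that can index num_values distinct values."""
    if num_values < 1:
        raise ValueError("num_values must be positive")
    return (num_values - 1).bit_length()


def bitmap_bytes(width: int, height: int, bits_per_pixel: int = 1) -> int:
    """Storage for an uncompressed bitmap, in bytes."""
    return width * height * bits_per_pixel // 8


full_color_bits = bits_needed(2 ** 24)   # 24 bits cover about 16 million colors
palette_bits = bits_needed(64)           # bits per pixel with 64 main colors
ratio = full_color_bits / palette_bits   # per-pixel saving factor

glyph_bytes = bitmap_bytes(16, 16)       # black-and-white 16 x 16 character image
ascii_bits = bits_needed(128)            # an ASCII code has 128 possible values

print(full_color_bits, palette_bits, ratio, glyph_bytes, ascii_bits)
```

So 64 main colors need 6 bits per pixel instead of 24, and a character stored as an ASCII code fits in a single byte instead of 32.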
Bioinformatics
 DNA – blueprint of life


 Sequence of bases – A, G, C and T
 RNA – transcribed from DNA
 Proteins are translated from RNA – sequence of amino
acids
 Alignment – matching one sequence to another
 Learning motifs – clustering, sequences of amino acids
that occur repeatedly in proteins, structural or functional
elements within the sequences they characterize
 Amino acids – letters, proteins – sentences, motifs -
words
Reinforcement Learning
 Learning a policy: A sequence of outputs (correct
actions)
 No supervised output but delayed reward
 Credit assignment problem
 Game playing
 Robot in a maze
 Multiple agents, partial observability, ...
 Team of robots playing soccer
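
Delayed reward and credit assignment can be illustrated with tabular Q-learning, one standard reinforcement learning algorithm (not named on the slide, so its use here is an assumption). The "maze" is a hypothetical five-cell corridor in which only reaching the last cell is rewarded; the learning rate, discount factor, episode count, and seed are arbitrary illustrative values.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0: step left, index 1: step right


def step(state: int, action: int):
    """Move within the corridor; reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes: int = 500, alpha: float = 0.5,
               gamma: float = 0.9, seed: int = 0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(len(ACTIONS))   # explore uniformly at random
            next_state, reward = step(state, ACTIONS[a])
            # The discounted best future value carries the delayed reward
            # back to the earlier actions that led to it.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

No cell is ever labeled with its correct action; the policy is read off the learned values afterwards.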
Resources: Datasets
 UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html


 Statlib: http://lib.stat.cmu.edu/
Resources: Journals
 Journal of Machine Learning Research www.jmlr.org


 Machine Learning
 Neural Computation
 Neural Networks
 IEEE Trans on Neural Networks and Learning Systems
 IEEE Trans on Pattern Analysis and Machine Intelligence
 Journals on Statistics/Data Mining/Signal
Processing/Natural Language
Processing/Bioinformatics/...
Resources: Conferences
 International Conference on Machine Learning (ICML)


 European Conference on Machine Learning (ECML)
 Neural Information Processing Systems (NIPS)
 Uncertainty in Artificial Intelligence (UAI)
 Computational Learning Theory (COLT)
 International Conference on Artificial Neural Networks
(ICANN)
 International Conference on AI & Statistics (AISTATS)
 International Conference on Pattern Recognition (ICPR)
 ...
