Learning
Ata Kaban
A.Kaban@cs.bham.ac.uk
School of Computer Science
University of Birmingham
Overview
Today we learn:
Bayesian classification
E.g. how to decide whether a patient is ill or healthy, based on:
a probabilistic model of the observed data
prior knowledge
Training data: examples of the form (d, h(d)),
where d are the data objects to classify (inputs)
and h(d) is the correct class label for d, h(d) ∈ {1, …, K}
Goal: given d_new, provide h(d_new)
Classification problem
Why Bayesian?
Provides practical learning algorithms
E.g. Naïve Bayes
Prior knowledge and observed data can be
combined
It is a generative (model based) approach, which
offers a useful conceptual framework
E.g. sequences could also be classified, based on
a probabilistic model specification
Any kind of objects can be classified, based on a
probabilistic model specification
Bayes Rule

P(h | d) = P(d | h) P(h) / P(d)

P(h): prior (probability of hypothesis h before seeing any data)
P(d | h): likelihood (probability of the data d if the hypothesis h is true)
P(d) = Σ_h P(d | h) P(h): data evidence (marginal probability of the data)
P(h | d): posterior (probability of hypothesis h after having seen the data d)

Marginalisation: P(X) = Σ_Y P(X, Y)
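The rule above can be sketched directly in code. This is a minimal illustration, not part of the original slides; the hypothesis names and numbers are made up for the example.

```python
# Bayes rule over a discrete hypothesis space (sketch; illustrative numbers).
def posterior(priors, likelihoods):
    """priors[h] = P(h); likelihoods[h] = P(d | h).
    Returns the posterior P(h | d) for every hypothesis h."""
    # Evidence: P(d) = sum_h P(d | h) P(h)  (marginalisation over h)
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}

post = posterior({"h1": 0.7, "h2": 0.3}, {"h1": 0.2, "h2": 0.9})
# The posteriors sum to 1 by construction; here the data favour h2
# strongly enough to overturn the prior preference for h1.
```

Note that the denominator P(d) is the same for every hypothesis, which is why it can be dropped when we only need the argmax (see the MAP hypothesis later).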
Does patient have cancer or not?
A patient takes a lab test and the result comes back
positive. It is known that the test returns a correct
positive result in only 98% of the cases and a correct
negative result in only 97% of the cases. Furthermore,
only 0.008 of the entire population has this disease.
1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?
Working:
Hypothesis space H = {hypothesis 1: 'cancer', hypothesis 2: '¬cancer'}
Data: '+' (the positive test result)

1. P(cancer | +) = P(+ | cancer) P(cancer) / P(+)
   P(+ | cancer) = 0.98, P(cancer) = 0.008
   P(+ | ¬cancer) = 0.03, P(¬cancer) = 0.992
   P(+) = P(+ | cancer) P(cancer) + P(+ | ¬cancer) P(¬cancer)
        = 0.98 × 0.008 + 0.03 × 0.992 = 0.0376
   P(cancer | +) = 0.98 × 0.008 / 0.0376 ≈ 0.21
2. P(¬cancer | +) = 0.03 × 0.992 / 0.0376 ≈ 0.79
3. Diagnosis: ¬cancer, since P(¬cancer | +) > P(cancer | +)
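The working above is easy to verify numerically. A minimal sketch, using only the quantities given in the problem statement:

```python
# Cancer diagnosis example via Bayes rule (numbers from the problem statement).
p_cancer = 0.008                    # prior: fraction of population with the disease
p_no_cancer = 1 - p_cancer          # 0.992
p_pos_given_cancer = 0.98           # test sensitivity: correct positive rate
p_pos_given_no_cancer = 1 - 0.97    # 0.03: false-positive rate (1 - specificity)

# Evidence: P(+) = P(+|cancer)P(cancer) + P(+|¬cancer)P(¬cancer)
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * p_no_cancer

p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos        # ≈ 0.21
p_no_cancer_given_pos = p_pos_given_no_cancer * p_no_cancer / p_pos  # ≈ 0.79
```

Despite the accurate-sounding test, the posterior probability of cancer is only about 0.21, because the disease is so rare: the prior dominates.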
Choosing Hypotheses

Generally we want the most probable hypothesis given the training data. This is the maximum a posteriori (MAP) hypothesis:
h_MAP = argmax_{h ∈ H} P(h | d)
Useful observation: it does not depend on the denominator P(d).

Maximum Likelihood (ML) hypothesis:
h_ML = argmax_{h ∈ H} P(d | h)

Naïve Bayes assumption (the attributes a_1, …, a_T of d are conditionally independent given the class):
P(d | h) = P(a_1, …, a_T | h) = Π_{t=1}^{T} P(a_t | h)
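The difference between the two criteria can be shown with a two-hypothesis toy example (the numbers are illustrative, not from the slides):

```python
# MAP vs ML hypothesis choice (sketch with made-up numbers).
priors = {"h1": 0.9, "h2": 0.1}       # P(h): strong prior belief in h1
likelihoods = {"h1": 0.3, "h2": 0.8}  # P(d | h): the data fit h2 better

# ML ignores the prior: argmax_h P(d | h)
h_ml = max(likelihoods, key=likelihoods.get)

# MAP weighs likelihood by prior: argmax_h P(d | h) P(h)
# (P(d) is the same for every h, so it is dropped from the argmax)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
# Here h_ml is "h2" but h_map is "h1": the prior overrides the likelihood.
```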
Example. Play Tennis data
Day Outlook Temperature Humidity Wind PlayTennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
Naïve Bayes solution
Classify any new datum instance x = (a_1, …, a_T) as:
h_NB = argmax_{h ∈ H} P(h) P(x | h) = argmax_{h ∈ H} P(h) Π_t P(a_t | h)
To do this based on training examples, we need to estimate the
parameters from the training examples:
For each target value (hypothesis) h: estimate P(h)
For each attribute value a_t of each datum instance: estimate P(a_t | h)
Based on the examples in the table, classify the following datum x:
x = (Outlook=Sunny, Temp=Cool, Humidity=High, Wind=Strong)
That means: play tennis or not?
Working:
h_NB = argmax_{h ∈ [yes, no]} P(h) Π_t P(a_t | h)
     = argmax_{h ∈ [yes, no]} P(h) P(Outlook=sunny | h) P(Temp=cool | h) P(Humidity=high | h) P(Wind=strong | h)
Estimates from the table:
P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Wind=strong | PlayTennis=yes) = 3/9 = 0.33
P(Wind=strong | PlayTennis=no) = 3/5 = 0.60
etc.
P(yes) P(sunny | yes) P(cool | yes) P(high | yes) P(strong | yes) = 0.0053
P(no) P(sunny | no) P(cool | no) P(high | no) P(strong | no) = 0.0206
Answer: PlayTennis(x) = no
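The whole calculation can be reproduced from the 14 training rows. A minimal sketch of the Naïve Bayes classifier on this table (estimates by simple counting, no smoothing):

```python
# Naive Bayes on the Play Tennis table (counts taken from the 14 rows above).
from collections import Counter

data = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny","Hot","High","Weak","No"),      ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),  ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),   ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),  ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),("Rain","Mild","High","Strong","No"),
]

class_counts = Counter(row[-1] for row in data)

def p_class(h):                 # estimate P(h) as a relative frequency
    return class_counts[h] / len(data)

def p_attr(t, a, h):            # estimate P(a_t = a | h) by counting
    return sum(1 for row in data if row[t] == a and row[-1] == h) / class_counts[h]

def nb_score(x, h):             # P(h) * prod_t P(a_t | h)
    score = p_class(h)
    for t, a in enumerate(x):
        score *= p_attr(t, a, h)
    return score

x = ("Sunny", "Cool", "High", "Strong")
scores = {h: nb_score(x, h) for h in class_counts}
prediction = max(scores, key=scores.get)
# scores["Yes"] ~ 0.0053 and scores["No"] ~ 0.0206, so the prediction is "No",
# matching the worked answer above.
```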
Learning to classify text
Learn from examples which articles are of
interest
The attributes are the words
Observe that the Naïve Bayes assumption just
means that we have a random sequence
model within each class!
NB classifiers are one of the most effective for
this task
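To make the text setting concrete, here is a minimal bag-of-words Naïve Bayes sketch. The toy corpus, class names, and the use of log-probabilities with Laplace smoothing are illustrative additions, not part of the slides:

```python
# Minimal bag-of-words Naive Bayes text classifier (sketch; toy corpus is made up).
import math
from collections import Counter

train = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]

vocab = {w for doc, _ in train for w in doc.split()}
doc_counts = Counter(c for _, c in train)
word_counts = {c: Counter() for c in doc_counts}
for doc, c in train:
    word_counts[c].update(doc.split())

def log_p_word(w, c):
    # Laplace (add-one) smoothing so an unseen word does not zero out the product
    return math.log((word_counts[c][w] + 1) /
                    (sum(word_counts[c].values()) + len(vocab)))

def classify(doc):
    # argmax_c log P(c) + sum_t log P(w_t | c): each word is one attribute
    scores = {c: math.log(doc_counts[c] / len(train)) +
                 sum(log_p_word(w, c) for w in doc.split() if w in vocab)
              for c in doc_counts}
    return max(scores, key=scores.get)
```

Working in log-space avoids numerical underflow when documents have many words, since the product of many small probabilities becomes a sum of logs.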
Resources for those interested:
Tom Mitchell: Machine Learning (book) Chapter 6.
Results on a benchmark text corpus
Remember
Bayes rule can be turned into a classifier
Maximum A Posteriori (MAP) hypothesis estimation
incorporates prior knowledge; Maximum Likelihood doesn't
Naive Bayes Classifier is a simple but effective Bayesian
classifier for vector data (i.e. data with several attributes)
that assumes that attributes are independent given the
class.
Bayesian classification is a generative approach to
classification
Resources
Textbook reading (contains details about using Naïve
Bayes for text classification):
Tom Mitchell, Machine Learning (book), Chapter 6.
Software: NB for classifying text:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
Useful reading for those interested to learn more about
NB classification, beyond the scope of this module:
http://www-2.cs.cmu.edu/~tom/NewChapters.html