
How to Construct an ROC Curve

- Use a classifier that produces a posterior probability P(+|A) for each test instance A.
- Sort the instances by P(+|A) in decreasing order.
- Apply a threshold at each unique value of P(+|A).
- Count the number of TP, FP, TN, FN at each threshold.
- TP rate: TPR = TP / (TP + FN)
- FP rate: FPR = FP / (FP + TN)

    Instance   P(+|A)   True class   FPR   TPR
    1          0.95     +            0     1/5
    2          0.93     +            0     2/5
    3          0.87     -            1/5   2/5
    4          0.85     -
    5          0.85     -
    6          0.85     +            3/5   3/5
    7          0.76     -            4/5   3/5
    8          0.53     +            4/5   4/5
    9          0.43     -            1     4/5
    10         0.25     +            1     1

(Instances 4, 5, and 6 tie at P(+|A) = 0.85, so that threshold produces the single point (3/5, 3/5).)
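
As an illustration (my own sketch in Python, not part of the original slides), the same procedure applied to the scores and labels from the table above:

    scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
    labels = ['+', '+', '-', '-', '-', '+', '-', '+', '-', '+']   # true classes

    pairs = sorted(zip(scores, labels), reverse=True)   # sort by P(+|A), decreasing
    n_pos = labels.count('+')
    n_neg = labels.count('-')

    roc_points = []
    for threshold in sorted(set(scores), reverse=True):
        # Predict '+' for every instance whose score is >= the threshold.
        tp = sum(1 for s, y in pairs if s >= threshold and y == '+')
        fp = sum(1 for s, y in pairs if s >= threshold and y == '-')
        roc_points.append((fp / n_neg, tp / n_pos))      # (FPR, TPR)

    print(roc_points)   # matches the non-blank (FPR, TPR) rows of the table, as decimals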

How to Construct an ROC Curve

A second example, in which the classifier ranks every positive instance above every negative one:

    Instance   P(+)   True class   FPR   TPR
    1          0.95   +            0     1/5
    2          0.93   +            0     2/5
    3          0.87   +            0     3/5
    4          0.85   +            0     4/5
    5          0.83   +            0     1
    6          0.80   -            1/5   1
    7          0.76   -            2/5   1
    8          0.53   -            3/5   1
    9          0.43   -            4/5   1
    10         0.25   -            1     1

How to Construct an ROC Curve

A third example, in which positive and negative instances alternate in the ranking:

    Instance   P(+)   True class   FPR   TPR
    1          0.95   +            0     1/5
    2          0.93   -            1/5   1/5
    3          0.87   +            1/5   2/5
    4          0.85   -            2/5   2/5
    5          0.83   +            2/5   3/5
    6          0.80   -            3/5   3/5
    7          0.76   +            3/5   4/5
    8          0.53   -            4/5   4/5
    9          0.43   +            4/5   1
    10         0.25   -            1     1

[ROC plot: TPR versus FPR for this table.]
How to Construct an ROC Curve

[ROC plot: TPR versus FPR.]

Model Evaluation

- Metrics for Performance Evaluation: How to evaluate the performance of a model?
- Methods for Performance Evaluation: How to obtain reliable estimates?
- Methods for Model Comparison: How to compare the relative performance among competing models?

Confidence Interval for Accuracy

- Prediction can be regarded as a Bernoulli trial.
  - A Bernoulli trial has 2 possible outcomes.
  - Possible outcomes for a prediction: correct or wrong.
  - A collection of Bernoulli trials has a binomial distribution: x ~ Bin(N, p), where x is the number of correct predictions.
  - E.g.: toss a fair coin 50 times; how many heads would turn up? Expected number of heads = N × p = 50 × 0.5 = 25.
- Given x (# of correct predictions), or equivalently acc = x/N, and N (# of test instances), can we predict p (the true accuracy of the model)?

Confidence Interval for Accuracy

- For large test sets (N > 30), acc has approximately a normal distribution with mean p and variance p(1-p)/N:

    P( -Z_{α/2} < (acc - p) / sqrt( p(1-p)/N ) < Z_{α/2} ) = 1 - α

  [Figure: standard normal density; the central area between -Z_{α/2} and Z_{α/2} equals 1 - α.]

- Equivalently, a (1-α)·100% confidence interval for p:

    acc ± Z_{α/2} · sqrt( acc (1-acc) / N )

Confidence Interval for Accuracy

- Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:
  - N = 100, acc = 0.8
  - Let 1 - α = 0.95 (95% confidence)
  - From the probability table, Z_{α/2} = 1.96

        1 - α    Z_{α/2}
        0.99     2.58
        0.98     2.33
        0.95     1.96
        0.90     1.65

- The resulting interval for different test-set sizes (acc = 0.8):

        N          50      100     500     1000    5000
        p(lower)   0.689   0.722   0.765   0.775   0.789
        p(upper)   0.911   0.878   0.835   0.825   0.811
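
A small Python sketch (my own, not from the slides) that applies the interval acc ± Z_{α/2}·sqrt(acc(1-acc)/N) and reproduces the table above:

    import math

    def accuracy_confidence_interval(acc, n, z=1.96):
        """Approximate (1 - alpha) confidence interval for the true accuracy p,
        assuming acc is normal with mean p and variance p(1-p)/N (large N)."""
        half_width = z * math.sqrt(acc * (1.0 - acc) / n)
        return acc - half_width, acc + half_width

    for n in (50, 100, 500, 1000, 5000):
        lower, upper = accuracy_confidence_interval(0.8, n)
        print(f"N={n:5d}  p(lower)={lower:.3f}  p(upper)={upper:.3f}")
    # e.g. N=  100  p(lower)=0.722  p(upper)=0.878, matching the table above
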
Comparing Performance of 2 Models

- Given two models, say M1 and M2, which is better?
- Usually the models are evaluated on the same test sample.
- Make use of the correlation between their predictions.

Errors of M1 and M2 are independent

(C1 and C2 indicate whether M1 and M2, respectively, classify a given instance correctly (1) or incorrectly (0).)

    Prob.              C2 incorrect (0)   C2 correct (1)
    C1 incorrect (0)   0.04               0.16
    C1 correct (1)     0.16               0.64

Let X = C1 - C2:

    X      -1     0      +1
    P(X)   0.16   0.68   0.16

    E(X) = 0
    VAR(X) = E((X - E(X))²) = 0.16·(-1)² + 0.68·(0)² + 0.16·(1)² = 0.32

Strong positive correlation

    Prob.              C2 incorrect (0)   C2 correct (1)
    C1 incorrect (0)   0.18               0.02
    C1 correct (1)     0.02               0.78

Let X = C1 - C2:

    X      -1     0      +1
    P(X)   0.02   0.96   0.02

    E(X) = 0
    VAR(X) = E((X - E(X))²) = 0.02·(-1)² + 0.96·(0)² + 0.02·(1)² = 0.04

Comparing Performance of 2 Models

- Larger differences are more likely if the errors are independent, and less likely if the errors are positively correlated.
- Hence, an observed difference may be regarded as significant for models with positively correlated errors but not for models with independent errors.
- Our test should reflect (make use of) this property.
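
A quick Python sketch (my own, with variable names of my choosing) that recomputes Var(X) for both joint tables above:

    def variance_of_difference(p_both_wrong, p_only_m1_wrong, p_only_m2_wrong, p_both_right):
        """Var(X) for X = C1 - C2, where the arguments are the joint probabilities
        P(C1, C2) in the order (0,0), (0,1), (1,0), (1,1)."""
        p = {-1: p_only_m1_wrong, 0: p_both_wrong + p_both_right, +1: p_only_m2_wrong}
        mean = sum(x * px for x, px in p.items())
        return sum((x - mean) ** 2 * px for x, px in p.items())

    print(variance_of_difference(0.04, 0.16, 0.16, 0.64))   # independent errors: 0.32
    print(variance_of_difference(0.18, 0.02, 0.02, 0.78))   # positively correlated: 0.04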

Comparing Performance of 2 Models

- Make a cross-table of the correct and incorrect predictions of M1 and M2:

    Count          M2 incorrect   M2 correct
    M1 incorrect   a              b
    M1 correct     c              d

- Ignore cells a and d (both incorrect, both correct).
- If the models were equally good, we would expect the counts in cells b and c to be in balance.
- Under the null hypothesis that the models have the same error rate, the count in cell b has a binomial distribution with n = n(b) + n(c) and p = 0.5.
Comparing Performance of 2 Models

We test the null hypothesis

    H0 : e1 = e2

against

    Ha : e1 ≠ e2

where e_i denotes the true error rate of model i.

Errors of M1 and M2 are independent

    Count          M2 incorrect   M2 correct
    M1 incorrect   6              14
    M1 correct     24             56

[Figure: binomial distribution with n = 38 and p = 0.5. The observed count b = 14 gives a p-value of 0.14.]

Comparing Performance of 2 Models

Errors of M1 and M2 are positively correlated

    Count          M2 incorrect   M2 correct
    M1 incorrect   18             2
    M1 correct     12             68

[Figure: binomial distribution with n = 14 and p = 0.5. The observed count b = 2 gives a p-value of 0.012.]

Comparing Performance of 2 Models

Although the difference in error rate is the same in both cases, the independent case produced a p-value of 0.14 (typically not regarded as significant), leading to the conclusion that we cannot reject the null hypothesis that both models have the same error rate.

The example with positively correlated errors produces a p-value of 0.012, leading to the conclusion that M1 has a significantly lower error rate than M2.
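
A short Python sketch (my own) of this exact binomial test; the counts 14/24 and 2/12 are the off-diagonal cells of the two tables above:

    from math import comb

    def exact_binomial_p_value(b, c):
        """Two-sided exact test of H0: b ~ Bin(n, 0.5) with n = b + c,
        where b and c are the two off-diagonal counts."""
        n = b + c
        k = min(b, c)
        one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        return min(1.0, 2 * one_tail)

    print(exact_binomial_p_value(14, 24))   # independent errors: ~0.14
    print(exact_binomial_p_value(2, 12))    # positively correlated: ~0.013 (0.012 on the slide,
                                            # depending on how the two-sided p-value is defined)
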
Data Mining
Classification: Alternative Techniques

Lecture Notes for Chapter 5
Introduction to Data Mining
by Tan, Steinbach, Kumar

Bayes (Generative) Classifier

- A probabilistic framework for solving classification problems
- Conditional probability:

    P(C | A) = P(A, C) / P(A)
    P(A | C) = P(A, C) / P(C)

- Bayes theorem:

    P(C | A) = P(A | C) P(C) / P(A)

Example of Bayes Theorem

- Given:
  - A doctor knows that meningitis causes a stiff neck 50% of the time.
  - The prior probability of any patient having meningitis is 1/50,000.
  - The prior probability of any patient having a stiff neck is 1/20.
- If a patient has a stiff neck, what is the probability that he/she has meningitis?

    P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002

Bayesian Classifiers

- Consider each attribute and the class label as random variables.
- Given a record with attributes (A1, A2, ..., An), the goal is to predict the class C.
  - Specifically, we want to find the value of C that maximizes P(C | A1, A2, ..., An).
- Can we estimate P(C | A1, A2, ..., An) directly from data?

Bayesian Classifiers

- Approach: compute the posterior probability P(C | A1, A2, ..., An) for all values of C using Bayes theorem:

    P(C | A1 A2 ... An) = P(A1 A2 ... An | C) P(C) / P(A1 A2 ... An)

- Choose the value of C that maximizes P(C | A1, A2, ..., An).
- This is equivalent to choosing the value of C that maximizes P(A1, A2, ..., An | C) P(C).
- How to estimate P(A1, A2, ..., An | C)?

Curse of dimensionality

- How to estimate P(A1, A2, ..., An | C)?
- If each attribute is discrete with, say, 5 possible values, then estimating each possible combination requires 5^n probabilities per class.
- For 10 attributes (n = 10) this is about ten million probabilities; in general, m^n probabilities per class, where m is the number of values per attribute.
- This simple approach runs into the curse of dimensionality.
- To be practical, we need to make some simplifying assumptions.
Conditional Independence

- X and Y are independent iff P(X, Y) = P(X) P(Y), or, equivalently, P(X | Y) = P(X).
  - Intuition: Y doesn't provide any information about X (and vice versa).
- X and Y are independent given Z iff P(X, Y | Z) = P(X | Z) P(Y | Z), or, equivalently, P(X | Y, Z) = P(X | Z).
  - Intuition: if we know the value of Z, then Y doesn't provide any information about X (and vice versa).

Naïve Bayes Classifier

- Assume independence among the attributes Ai when the class is given:

    P(A1, A2, ..., An | Cj) = P(A1 | Cj) P(A2 | Cj) ... P(An | Cj)

- We can estimate P(Ai | Cj) for all Ai and Cj.
- Now we only need to estimate m × n probabilities per class.
- A new point is classified as Cj if P(Cj) ∏ P(Ai | Cj) is maximal.

How to Estimate Probabilities from Data?

    Tid   Refund   Marital Status   Taxable Income   Evade
    1     Yes      Single           125K             No
    2     No       Married          100K             No
    3     No       Single           70K              No
    4     Yes      Married          120K             No
    5     No       Divorced         95K              Yes
    6     No       Married          60K              No
    7     Yes      Divorced         220K             No
    8     No       Single           85K              Yes
    9     No       Married          75K              No
    10    No       Single           90K              Yes

(Refund and Marital Status are categorical, Taxable Income is continuous, Evade is the class.)

- Class prior: P(C) = Nc / N
  - e.g., P(No) = 7/10, P(Yes) = 3/10
- For discrete attributes: P(Ai | Ck) = |Aik| / Nc, where |Aik| is the number of instances having attribute value Ai and belonging to class Ck.
  - Examples: P(Status=Married | No) = 4/7, P(Refund=Yes | Yes) = 0

How to Estimate Probabilities from Data?

- For continuous attributes:
  - Discretize the range into bins.
  - Two-way split: (A < v) or (A ≥ v); choose only one of the two splits as the new attribute.
  - Probability density estimation:
    - Assume the attribute follows a normal distribution.
    - Use the data to estimate the parameters of the distribution (i.e., mean and standard deviation).
    - Once the probability distribution is known, use it to estimate the conditional probability P(Ai | c).
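
As a sketch (my own Python code; `records` and the helper names are mine), the counting estimates above can be reproduced as follows:

    # Each record: (Refund, Marital Status, Taxable Income in K, Evade).
    records = [
        ("Yes", "Single",   125, "No"),  ("No",  "Married", 100, "No"),
        ("No",  "Single",    70, "No"),  ("Yes", "Married", 120, "No"),
        ("No",  "Divorced",  95, "Yes"), ("No",  "Married",  60, "No"),
        ("Yes", "Divorced", 220, "No"),  ("No",  "Single",   85, "Yes"),
        ("No",  "Married",   75, "No"),  ("No",  "Single",   90, "Yes"),
    ]

    def prior(cls):
        """P(C) = Nc / N."""
        return sum(1 for r in records if r[3] == cls) / len(records)

    def conditional(attr_index, value, cls):
        """P(Ai = value | Class = cls) = |Aik| / Nc."""
        n_c = sum(1 for r in records if r[3] == cls)
        n_ic = sum(1 for r in records if r[3] == cls and r[attr_index] == value)
        return n_ic / n_c

    print(prior("No"))                         # 7/10 = 0.7
    print(conditional(1, "Married", "No"))     # 4/7
    print(conditional(0, "Yes", "Yes"))        # 0.0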

How to Estimate Probabilities from Data?

- Normal distribution (one for each (Ai, cj) pair):

    P(Ai | cj) = 1 / sqrt(2π σij²) · exp( -(Ai - μij)² / (2 σij²) )

- For (Income, Class=No), using the training table above:
  - sample mean = 110
  - sample variance = 2975

    P(Income = 120 | No) = 1 / (sqrt(2π) · 54.54) · exp( -(120 - 110)² / (2 · 2975) ) = 0.0072

[Figure: estimated normal densities of Income for Class = No and Class = Yes.]
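
A matching sketch (my own) that estimates the Income parameters for Class = No and evaluates the density; it assumes the `records` list from the previous sketch:

    import math

    # Income values for Class = No, taken from the training table.
    income_no = [r[2] for r in records if r[3] == "No"]   # [125, 100, 70, 120, 60, 220, 75]

    mu = sum(income_no) / len(income_no)                                 # 110.0
    var = sum((x - mu) ** 2 for x in income_no) / (len(income_no) - 1)   # 2975.0 (sample variance)

    def normal_density(x, mu, sigma2):
        """Normal density with mean mu and variance sigma2."""
        return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

    print(normal_density(120, mu, var))   # ≈ 0.0072
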
Example of Naïve Bayes Classifier

Given a test record:

    X = (Refund = No, Married, Income = 120K)

Naïve Bayes classifier estimates:

    P(Refund=Yes | No) = 3/7                  P(Refund=Yes | Yes) = 0
    P(Refund=No | No) = 4/7                   P(Refund=No | Yes) = 1
    P(Marital Status=Single | No) = 2/7       P(Marital Status=Single | Yes) = 2/3
    P(Marital Status=Divorced | No) = 1/7     P(Marital Status=Divorced | Yes) = 1/3
    P(Marital Status=Married | No) = 4/7      P(Marital Status=Married | Yes) = 0

    For Taxable Income:
    If class = No:  sample mean = 110, sample variance = 2975
    If class = Yes: sample mean = 90,  sample variance = 25

- P(X | Class=No) = P(Refund=No | Class=No) × P(Married | Class=No) × P(Income=120K | Class=No)
                  = 4/7 × 4/7 × 0.0072 = 0.0024
- P(X | Class=Yes) = P(Refund=No | Class=Yes) × P(Married | Class=Yes) × P(Income=120K | Class=Yes)
                   = 1 × 0 × 1.2 × 10^-9 = 0
- Since P(X | No) P(No) > P(X | Yes) P(Yes), we have P(No | X) > P(Yes | X)
  => Class = No

Naïve Bayes Classifier

- If one of the conditional probabilities is zero, then the entire expression becomes zero.
- Probability estimation:

    Original:  P(Ai | C) = Nic / Nc
    Laplace:   P(Ai | C) = (Nic + 1) / (Nc + a)        (a: number of values of Ai)
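
Putting the pieces together, a hedged end-to-end sketch (my own code, reusing `records`, `prior`, `conditional`, and `normal_density` from the sketches above) that classifies the test record:

    def income_params(cls):
        """Sample mean and variance of Taxable Income within class cls."""
        values = [r[2] for r in records if r[3] == cls]
        mu = sum(values) / len(values)
        var = sum((x - mu) ** 2 for x in values) / (len(values) - 1)
        return mu, var

    # Classify X = (Refund=No, Marital Status=Married, Income=120K).
    class_scores = {}
    for cls in ("No", "Yes"):
        mu, var = income_params(cls)
        likelihood = (conditional(0, "No", cls)          # P(Refund=No | cls)
                      * conditional(1, "Married", cls)   # P(Marital Status=Married | cls)
                      * normal_density(120, mu, var))    # P(Income=120 | cls)
        class_scores[cls] = likelihood * prior(cls)      # P(X | cls) * P(cls)

    print(class_scores)                          # ≈ 0.0016 for 'No', exactly 0.0 for 'Yes'
    print(max(class_scores, key=class_scores.get))   # 'No'

With the Laplace estimate from the slide, P(Married | Yes) would become (0 + 1) / (3 + 3) = 1/6 instead of 0 (a = 3 values of Marital Status), so the score for class Yes would no longer collapse to zero.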

Example of Naïve Bayes Classifier

    Name            Give Birth   Can Fly   Live in Water   Have Legs   Class
    human           yes          no        no              yes         mammals
    python          no           no        no              no          non-mammals
    salmon          no           no        yes             no          non-mammals
    whale           yes          no        yes             no          mammals
    frog            no           no        sometimes       yes         non-mammals
    komodo          no           no        no              yes         non-mammals
    bat             yes          yes       no              yes         mammals
    pigeon          no           yes       no              yes         non-mammals
    cat             yes          no        no              yes         mammals
    leopard shark   yes          no        yes             no          non-mammals
    turtle          no           no        sometimes       yes         non-mammals
    penguin         no           no        sometimes       yes         non-mammals
    porcupine       yes          no        no              yes         mammals
    eel             no           no        yes             no          non-mammals
    salamander      no           no        sometimes       yes         non-mammals
    gila monster    no           no        no              yes         non-mammals
    platypus        no           no        no              yes         mammals
    owl             no           yes       no              yes         non-mammals
    dolphin         yes          no        yes             no          mammals
    eagle           no           yes       no              yes         non-mammals

Test record (A: attributes, M: mammals, N: non-mammals):

    Give Birth   Can Fly   Live in Water   Have Legs   Class
    yes          no        yes             no          ?

    P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
    P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

    P(A | M) P(M) = 0.06 × 7/20 = 0.021
    P(A | N) P(N) = 0.0042 × 13/20 ≈ 0.0027

    P(A | M) P(M) > P(A | N) P(N)  =>  Mammals

Naïve Bayes (Summary)

- Robust to isolated noise points.
- Handles missing values by ignoring the instance during probability estimate calculations.
- Robust to irrelevant attributes.
- The independence assumption may not hold for some attributes:
  - Use other techniques such as Bayesian Belief Networks (BBN).

Naïve Bayes (Summary)

- The independence assumption may not hold for (some) attributes, but:
  - If we evaluate on error rate, then all that matters, in the binary case, is whether the probability estimate is on the right side of 0.5.
  - With more than two classes similar reasoning applies, but the margin of error becomes smaller.
  - For the ROC curve, what matters is that we get the probabilities in the right order.

Example

- Suppose P(Yes | A1=a1, ..., An=an) = 0.7 is the true probability of class Yes for a given attribute vector.
- To minimize the error rate we should classify this attribute vector as Yes.
- As long as the estimate P̂(Yes | A1=a1, ..., An=an) > 0.5, we will assign it to the optimal class.
- The probability estimate itself may be way off!
- If we evaluate on likelihood, this doesn't fly!
Example

Joint distribution of A1 and A2 within each class:

    C = 0    A2 = 0   A2 = 1   P(A1)
    A1 = 0   0.3      0.1      0.4
    A1 = 1   0.1      0.5      0.6
    P(A2)    0.4      0.6      1

    C = 1    A2 = 0   A2 = 1   P(A1)
    A1 = 0   0.6      0.1      0.7
    A1 = 1   0.1      0.2      0.3
    P(A2)    0.7      0.3      1

P(C=0) = 1/2, P(C=1) = 1/2.

Using the true joint distribution:

    P(C=0 | A1=1, A2=1) = P(A1=1, A2=1 | C=0) P(C=0) / [ P(A1=1, A2=1 | C=0) P(C=0) + P(A1=1, A2=1 | C=1) P(C=1) ]
                        = (0.5 × 0.5) / (0.5 × 0.5 + 0.2 × 0.5) ≈ 0.71

With Naïve Bayes:

    P(C=0 | A1=1, A2=1) = P(A1=1 | C=0) P(A2=1 | C=0) P(C=0) / [ P(A1=1 | C=0) P(A2=1 | C=0) P(C=0) + P(A1=1 | C=1) P(A2=1 | C=1) P(C=1) ]
                        = (0.6 × 0.6 × 0.5) / (0.6 × 0.6 × 0.5 + 0.3 × 0.3 × 0.5) = 0.8
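
A tiny sketch (my own, with variable names of my choosing) that recomputes both posteriors from the joint tables:

    # Joint class-conditional distributions P(A1, A2 | C), keyed by (a1, a2).
    joint = {
        0: {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.5},
        1: {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2},
    }
    class_prior = {0: 0.5, 1: 0.5}

    # Posterior P(C=0 | A1=1, A2=1) from the true joint distribution.
    num = joint[0][(1, 1)] * class_prior[0]
    den = sum(joint[c][(1, 1)] * class_prior[c] for c in (0, 1))
    print(num / den)                      # ≈ 0.714

    # Naive Bayes replaces the joint term by the product of marginals P(A1=1 | C) P(A2=1 | C).
    def marginal(c, axis, value):
        return sum(p for cell, p in joint[c].items() if cell[axis] == value)

    nb_num = marginal(0, 0, 1) * marginal(0, 1, 1) * class_prior[0]
    nb_den = sum(marginal(c, 0, 1) * marginal(c, 1, 1) * class_prior[c] for c in (0, 1))
    print(nb_num / nb_den)                # = 0.8, overstating the true posterior of ≈ 0.71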
