
CLASSIFICATION : MODEL EVALUATION Confusion Matrix | CAP curve | ROC curve

A confusion matrix is a table used to describe the performance of a classification model on a
set of test data for which the true values are known. It shows the Predicted class on the horizontal
axis vs. the Actual class on the vertical axis.
 True positives and True negatives are the observations that are correctly predicted
 False positives and False negatives need to be minimized
o FP : when the actual class is no but the predicted class is yes
o FN : when the actual class is yes but the predicted class is no
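A minimal sketch of building a confusion matrix with scikit-learn; the labels below are made-up placeholders, not data from these notes:

```python
# Minimal sketch: confusion matrix with scikit-learn (toy labels, 1 = yes, 0 = no)
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted classes

# scikit-learn returns rows = actual, columns = predicted:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)
```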
ACCURACY
 ratio of correctly predicted observations to the total observations
 e.g. an accuracy of 0.803 means the model is approx. 80% accurate
 accuracy is a good measure only for symmetric datasets, i.e. where FP and FN counts are almost the same

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN} \qquad \text{Error rate} = \dfrac{FP + FN}{TP + TN + FP + FN} = 1 - \text{Accuracy}$
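A quick sketch of the accuracy and error-rate calculation; the counts are illustrative assumptions, not the 0.803 example above:

```python
# Assumed confusion-matrix counts (illustrative only)
tp, tn, fp, fn = 80, 85, 20, 15

accuracy = (tp + tn) / (tp + tn + fp + fn)   # (80 + 85) / 200 = 0.825
error_rate = 1 - accuracy                    # equivalently (fp + fn) / total
print(accuracy, error_rate)
```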

PRECISION (LOW FALSE POSITIVES) When we predict positives, how often are they right?
 ratio of correctly predicted positive observations to the total predicted positive observations
 E.g. of all passengers that labelled as survived, how many actually survived
 How many of those who we labelled as diabetic are actually diabetic?
 Precision is a good evaluation metric to use when the cost of a false positive is very high and
the cost of a false negative is low.
o E.g. a restaurant owner endorsing a product as very good, but customers may not like it

$\text{Precision} = \dfrac{\text{True Positives}}{\text{Total Predicted Positives}} = \dfrac{TP}{TP + FP}$
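In code, precision from assumed counts (same illustrative numbers as the accuracy sketch):

```python
tp, fp = 80, 20
precision = tp / (tp + fp)   # TP / (TP + FP) = 0.80
```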

RECALL (SENSITIVITY) Of all actual positives, how many did we get right?
 ratio of correctly predicted positive observations to all observations in the actual class – yes
 E.g. Of all the passengers that truly survived, how many did we label as survived?
 E.g. Of all the people who are diabetic, how many did we correctly predict?
 When the cost of false negatives is high, it is better to use recall as an evaluation metric
o For example, you should use recall when looking to predict whether a credit card
charge is fraudulent or not.
$TPR = \text{Recall} = \dfrac{\text{True Positives}}{\text{Actual Positives}} = \dfrac{TP}{TP + FN}$
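Recall from the same assumed counts:

```python
tp, fn = 80, 15
recall = tp / (tp + fn)   # TP / (TP + FN) ≈ 0.842
```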
False Positive Rate: $FPR = \dfrac{FP}{FP + TN} = 1 - TNR$
False Negative Rate: $FNR = \dfrac{FN}{FN + TP}$
True Negative Rate (Specificity): $TNR = \dfrac{TN}{TN + FP}$
Error Rate = 1 − Accuracy
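The three rates side by side, using the same assumed counts:

```python
tn, fp, fn, tp = 85, 20, 15, 80           # assumed counts (illustrative only)
fpr = fp / (fp + tn)                       # False Positive Rate ≈ 0.190
fnr = fn / (fn + tp)                       # False Negative Rate ≈ 0.158
tnr = tn / (tn + fp)                       # True Negative Rate (specificity) ≈ 0.810
assert abs(fpr - (1 - tnr)) < 1e-12        # check: FPR = 1 - TNR
```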
F-Score The F1 score is the harmonic mean of Precision and Recall; it takes into account both FP & FN
 Better measure than accuracy in case of uneven class distribution (asymmetric FP & FN)
 Accuracy works best if false positives and false negatives have similar cost
$F_1 = 2 \cdot \dfrac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
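F1 from the precision and recall values computed in the sketches above (assumed values); scikit-learn's f1_score gives the same result directly from labels:

```python
precision, recall = 0.80, 0.842                       # assumed values from above
f1 = 2 * precision * recall / (precision + recall)    # ≈ 0.820
```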
Model Evaluation for Binary classification models
 Confusion Matrix
 CAP curve
 ROC curve
ROC - Receiver Operating Characteristic curve: the intent is to analyse the performance of any classifier
model
 ROC curves plot the true-positive rate against the false-positive rate of classification
o ROC Curves summarize the trade-off between the TP rate and FP rate for a
predictive model using different cut-off probability (thresholds).
 ROC Evaluation :
o More the area under the curve (AUC), better is the model at distinguishing between
classes
o Classifiers that give curves closer to the top-left corner indicate a better performance
o AUC is equivalent to the probability that a randomly chosen positive instance is
ranked higher than a randomly chosen negative instance

As a baseline, a random classifier is expected to give points lying along the diagonal (FPR = TPR).
The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.

TPR = True Positives / Actual positives (Recall)


FPR = False Positives / Actual Negatives (1 – Specificity)

 Helps choose the optimal cut-off value


 To select the best model amongst multiple models based on area under the curve (see the sketch below)
 Desirable to be in the north-west corner of the curve
 Attempt to maximize the area under the curve in order to get the best-performing model (high
TPR and low FPR)
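A sketch of plotting an ROC curve and computing AUC with scikit-learn and matplotlib; y_true and y_score are illustrative placeholders, not data from these notes:

```python
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                      # actual labels (toy data)
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.55]    # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)       # sweeps cut-off thresholds
auc = roc_auc_score(y_true, y_score)                    # area under the ROC curve

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], "--", label="random classifier")  # 45-degree baseline
plt.xlabel("False Positive Rate (1 - Specificity)")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()
```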

Plotting and analysis process for an ROC curve (a worked sketch follows the formulas below):
1: Choose a cut-off probability value for a given model
2: Calculate the True Positive Rate, also known as Recall or Sensitivity
3: Calculate the False Positive Rate, which is expressed as 1 – Specificity
4: Vary the cut-off probability value to arrive at the curve

$TPR = \text{Recall} = \text{Sensitivity} = \dfrac{TP}{\text{Actual Positives}} = \dfrac{TP}{TP + FN}$

$FPR = \dfrac{FP}{\text{Actual Negatives}} = \dfrac{FP}{TN + FP}$

$FPR = 1 - \text{Specificity} \qquad \text{Specificity} = \dfrac{TN}{TN + FP}$
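A sketch of the four-step procedure above done by hand with NumPy; the data and variable names are illustrative assumptions:

```python
import numpy as np

y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0])                    # toy actual labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.55])  # predicted probabilities

points = []
for cutoff in np.linspace(0.0, 1.0, 11):           # steps 1 & 4: choose and vary the cut-off
    y_pred = (y_score >= cutoff).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    tpr = tp / (tp + fn)                            # step 2: TPR = Recall = Sensitivity
    fpr = fp / (fp + tn)                            # step 3: FPR = 1 - Specificity
    points.append((fpr, tpr))

print(points)   # plotting these (FPR, TPR) pairs traces the ROC curve
```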
