An overview of Binary classification metrics

METU and HAVELSAN, Ankara, Turkey (gcanbek@havelsan.com.tr)
Gazi University, Ankara, Turkey (ss@gazi.edu.tr)
METU, Ankara, Turkey (ttemizel@metu.edu.tr)
METU, Ankara, Turkey (baykal@metu.edu.tr)

Abstract: Binary classification is one of the most frequently studied problems in applied machine learning, in domains ranging from medicine and biology to meteorology and malware analysis. Researchers commonly report the success of their classification studies through performance metrics. However, the literature shows widespread confusion about the terminology and ignorance of the fundamental aspects behind metrics. This paper clarifies the confusing terminology, suggests formal rules to distinguish between measures and metrics for the first time, and proposes a new comprehensive visualized roadmap in a leveled structure covering 22 measures and 22 metrics for exploring binary classification performance. Additionally, we introduce novel concepts such as canonical notation, duality, and complementation for measures/metrics, and suggest two new canonical base measures that simplify the equations. We expect this study to guide other studies toward a standardized approach to performance metrics for machine learning based solutions.

Keywords: binary classification; classification performance; metrics; measures; machine learning; visualization; ontology

I. INTRODUCTION

Machine learning classification performance, an important subject in several domains, states how well a classifier implementing a specific machine learning algorithm or model distinguishes between classes. The most basic and most studied classification type is binary (two-class) classification, which separates a given input into two opposite classes such as 'presence' vs. 'absence' of a disease or a condition, 'respond' vs. 'no respond' for a treatment [1], 'spam' vs. 'non-spam' for an e-mail, and 'malign' vs. 'benign' for software.

Stating or comparing a classification performance with only the 4 base measures is neither practical nor understandable. Therefore, several metrics have been proposed for evaluating classification performance. Area Under the (ROC) Curve (AUC), which has its origins in signal detection theory in the 1970s, is considered one of the best metrics to state performance [2], but there are other combined metrics that are useful for indicating the successful and unsuccessful aspects of a classification model. Some represent performance from one specific point of view while ignoring the others, and many researchers who design a classification model report only narrow metrics, causing misperceptions.

In this study, our approach to performance metrics is holistic, covering the wide range of the subject. This is important for emerging domains such as malware classification and other new machine learning classification applications that focus on implementation details and are acquainted with only a few potentially misleading metrics such as Accuracy (ACC), True Positive Rate (TPR), or F-measures to claim their success. Researchers who want to improve their machine learning algorithms on different domain problems and compare their test results with others have difficulty understanding performance metrics and selecting the most proper ones from the wide set of possibilities. For this reason, we designed an original, visually enhanced performance metrics roadmap as a chart based on the confusion matrix to help these researchers.

The proposed comprehensive roadmap shows the complete set of primary metrics: not only the common ones such as ACC, TPR, True Negative Rate (TNR), False Positive Rate (FPR), Positive Predictive Value (PPV), False Negative Rate (FNR), and F1 score, but also others such as Prevalence, (Label) Bias, INFORM (informedness), MARK (markedness), BACC (balanced accuracy, also known as strength), Gm (G-mean), Cohen's Kappa (CK), and Matthews Correlation Coefficient (MCC).

The roadmap is domain independent and useful in all data mining, machine learning, and statistics studies. We aim for this study to also serve as a reference covering all the primary measures and metrics with their equations, specifically arranged in a binary classification context. We also reviewed metric names used interchangeably in academic and online resources and included them here to suit different naming conventions. The corresponding terminology in other domains such as meteorology, medicine, or statistics is provided to show the synonyms of the measures and metrics.

The aim of our study is to review, summarize, and clarify the large number of binary classification primary performance measures and metrics from a structural point of view so that the individual measures/metrics and their dependencies and relations become visible and can be understood easily in a natural way. The study makes the following contributions:

- Reviewed and clarified performance measure/metric terminology,
- Established the difference between measures and metrics and other confused terms such as scores, indicators, criteria, factors, or indices,
- Introduced canonical forms for performance equations,
- Proposed formal rules to decide whether an equation is a measure or a metric,
- Provided a systematic approach in which the performance measures and metrics are defined within a leveled structure showing dependencies and relations,
- Discovered duality and complementation in measures and metrics and presented notations for expressing them,
- Suggested the diagonal and off-diagonal totals as new additional measures: True Classification (TC) and False Classification (FC),
- Provided equations to calculate measures and metrics in binary classification notation as a reference study,
- Transformed the related equations into a new simplified canonical form for easy interpretation, and
- Designed a roadmap visualizing 22 measures and 22 metrics in an all-in-one style so that the designated new geometry and leveling approach provide a better understanding of them with their relations and dependencies.

The rest of the paper is organized as follows. Section II surveys the literature on performance metrics; it is short due to lack of space but provides many resources examining different aspects of performance metrics. Section III provides a clear distinction among the confused terminologies: metrics, measures, and indicators. Section IV presents and describes our comprehensive visualized roadmap for binary classification performance metrics and describes the base and 1st-level measures, including the two new measures we propose. Section V introduces our new canonical form approach to measure and metric equations and suggests formal rules to define a given equation as a measure or a metric. Section VI introduces duality and complementation in measures and metrics with a proposed notation to state them. Section VII describes the 2nd-level measures. Section VIII describes the base and conventional 1st-level metrics with column/row geometry and examines Accuracy, the most used performance metric in studies, along with the other base metrics; it also addresses the complicated usage of the Accuracy metric. Section IX briefs the 1st- and 2nd-level metrics that are rarely used in binary classification studies in machine learning. The final section summarizes the goals and outlines the contributions.

II. LITERATURE REVIEW

Performance measures and metrics are historically based on binary similarity, distance, or association measures and metrics, and several studies in several domains examine them [3]-[6]. There are hundreds of measures and metrics defined in the literature [7]; however, many of them are duplicates, including the ones we provide in this study [7], [8]. Relatively new works have concentrated on performance measures and metrics and their properties related to classification (skewness effect [9], accuracy paradox [10], multi-class [11], chance correction [12], usage cost [13], constraints [14], chronology [7], equation patterns [3], requirements and recommendations [15]). Lavesson and Davidsson examine the conventional performance metrics from quality attributes and metrics in the software engineering domain (robustness and complexity along with performance, which includes accuracy, space, and time sub-attributes) [16]. Similar to performance metrics, the combinations of binary association measures are practically limitless. Paradowski proposed generalized measures/metrics based on coefficients [6], whereas Koyejo et al. introduced a generalized performance metric [17].

III. THE CLARIFICATION OF TERMINOLOGY

Classification performance is calculated, presented, and compared by 'performance metrics', which are also called 'performance measures', 'evaluation measures', and 'prediction scores' in supervised learning, where a training set is provided with known inputs along with the correct labels (outcomes, i.e. the corresponding class for given inputs). Classification performance is named differently in other domains, such as diagnostic (test) accuracy in medicine [18] or skill score or forecast skill in meteorology (forecast vs. observation) [19]. Categorization is commonly used in philosophy and statistics instead of classification [2].

Although metrics, scores, measures, indicators, criteria, factors, and indices seem the same and are used interchangeably to state classification performance, there are slight but important differences. Surprisingly, the confusion over such terms is widespread in the literature; even studies related to classification performance use incorrect terminology. To the best of our knowledge, this study is the first one that clarifies the terminology semantically in this section and formally as described in Section V.

As specifically examined from a general perspective by Texel [20], measures, at the bottom of the pyramid hierarchy of concepts, are numerical values with little or no context; metrics, above measures, possess a collection of measures in context; and indicators, at the top, are comparisons of metrics to a baseline. Score has a similar meaning to measure. Therefore, the correct usage in classification is "performance metrics".

Another terminological clarification that we address concerns the four types of direct results of binary classification, namely the (numbers of) True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN), which are explained in Section IV.A. These values, which are displayed in the 2 rows by 2 columns of the confusion matrix (or contingency table) for binary classification, should be stated as classification measures, not metrics. Performance metrics are calculated based on these base measures, and some other measures are derived from those base measures. With these terminology clarifications (i.e. measure vs. metric) in mind, one sees that many past and even current classification studies in the literature use incorrect terminology and intermingle the terms with one another.
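To make the measure/metric distinction concrete: the four base measures are plain counts over labeled outcomes, and every metric discussed later is computed from them. A minimal sketch (ours, not from the paper; the 1/0 label encoding and function name are our assumptions):

```python
# Count the four base measures (TP, FP, FN, TN) of a binary classifier.
# Labels are encoded as 1 (Positive) and 0 (Negative); the sample data
# below is illustrative only.

def confusion_counts(actual, predicted):
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1          # prediction/reality match on Positive
        elif a == 0 and p == 1:
            fp += 1          # type I error (false alarm)
        elif a == 1 and p == 0:
            fn += 1          # type II error (miss)
        else:
            tn += 1          # prediction/reality match on Negative
    return tp, fp, fn, tn

actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)  # 3, 1, 1, 3
```

The four counts are context-free numerical values, which is exactly why, in Texel's hierarchy, they sit at the measure level rather than the metric level.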

IV. THE PROPOSED COMPREHENSIVE VISUALIZED ROADMAP FOR BINARY CLASSIFICATION PERFORMANCE

We designed Figure 1 to provide an original visualized roadmap for binary classification performance measures and metrics in an all-in-one style. The 22 measures and 22 metrics described in the next sections are shown in one chart, like the periodic table of elements. Gray colored cells correspond to measures and orange colored ones correspond to metrics; they are grouped in a leveled structure and positioned according to similarities and row/column geometries. The measures and metrics on which each equation depends are also shown in the roadmap. Measures and metrics are built upon a total of 4 and 3 levels, respectively. The roadmap provides a neat distinction between measures and metrics and describes measures, metrics, their similarities, dependencies, and types via geometrical and visualization techniques.

The roadmap also presents useful information as explained in the legend. The names of measures and metrics that have no upper limit are written in bold. The numbering for measures is written in italic. The measures and metrics to the left of the confusion matrix square are row type (depending solely upon the base measures, Sample Size (Sn), and OP or ON), whereas the ones above the confusion matrix square are column type (the same as row type, but with OP or ON replaced by the number of condition Positives (P) and Negatives (N)). Row types are related to testing or prediction, whereas column types are related to reality, as described further below. With these details, the roadmap is worthy of the all-in-one attribution. Note that the AUC metric is out of the scope of this study.

In this section and the next sections, we provide all the performance measure and metric equations in binary classification specific terms and give equivalent or derived canonical equations to assist in interpreting them easily. The measures that represent all combinations of a classification are stated by combinations of two letter groups: T (True, matches) vs. F (False, non-matches) for classification correctness, and P (Positive) vs. N (Negative) for representing the two classes. They are presented in a 2x2 confusion matrix or contingency table.

A. Confusion Matrix, The Four Base Performance Measures

As stated in Section III, we call the four direct outputs of classification performance (TP, FP, FN, TN) base measures in binary classification with a supervised learning approach. True classification results, or prediction/reality matches (TP and TN), are located on the diagonal, whereas False classification results, non-matches or errors (FP and FN), are off-diagonal in the confusion matrix, as seen in Figure 1.

In critical engineering and medicine practices, type II errors (False Negatives) can be more serious than type I errors (False Positives), but the proper approach depends on the domain and its specific application. For example, in malware analysis, it can be better to mistakenly label a benign software as malign than to miss a malign software by incorrectly labeling it as benign (labeled malware is prioritized, and an expert can go through further manual malware analysis to eliminate the false positives afterwards [21]). From a law or social perspective, the opposite is likely to be valid to ensure the presumption of innocence. Likewise, an anti-malware product should be designed or configured to decrease False Positives (or False Alarms) to avoid annoying interruptions due to excessive malware warnings, whereas precautionary logic in criminal justice focuses more on underestimates (False Negatives) than on overestimates (False Positives) [22].

B. The Proposed Performance Measures (TC, FC) and The 1st Level Measures (P, N, OP, ON, Sn)

The 1st-level performance measures are one level above the base performance measures. The condition Positive (P) and condition Negative (N) measures, the column totals (also known as marginal totals in probability theory) of the confusion matrix, represent the real or actual values of the two classes (i.e. the real labels); they correspond to the reality, observed values, or ground truth. The OP and ON measures, the row totals of the confusion matrix, represent the predicted (test or classification) results for the two classes (i.e. the given labels); they correspond to the prediction or estimate (classification output).

When we examined many measures and metrics, we saw that some of them contain specific elements that are the totals of the diagonal base measures (TP and TN) and of the off-diagonal ones (FP and FN). For the first time, we introduce and name those totals True Classification (TC) and False Classification (FC), respectively. Substituting those totals significantly simplifies the metric equations and their interpretation. The Sn measure, the sum of the four base performance measures, can be stated as P plus N, as OP plus ON, and as TC plus FC.

Before going into the details of the performance measures and metrics based on the measures explained above, two of their attributes, namely the duality and complement concepts, are introduced and described here. The literature has not addressed metric duality and complements sufficiently. We also introduce the canonical transformation of the equations and propose formal rules to define classification performance measures and metrics. In the abundance of metrics, these attributes that we present in this study make measures and metrics more comprehensible.

V. A NEW FORMAL VIEW: CANONICAL FORMS AND DEFINITIONS OF MEASURES AND METRICS

In this study, we suggest canonical forms of measure and metric equations, defined as follows:

Definition: Canonical Forms
The forms where equations are stated by the 11 base and 1st-level measures (i.e. TP, FP, FN, TN, P, N, OP, ON, TC, FC, and Sn).

We have clarified the difference between metrics and measures semantically in Section III. Determining whether a given equation is a performance measure or a metric has also not been studied before. Because binary classification performance is related to the TP, FP, FN, TN base measures, metrics should depend on at least one of them. The following proposed rules are valid for measures; otherwise, the given canonical form is a metric.
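The 1st-level measures, including the proposed TC and FC totals, can be sketched in a few lines (our illustration, not from the paper; the numbers are made up for the example):

```python
# Illustrative sketch (ours): deriving the 1st-level measures, including
# the proposed TC and FC totals, from the four base measures.

def first_level(tp, fp, fn, tn):
    m = {
        "P":  tp + fn,   # condition Positive (column total: reality)
        "N":  fp + tn,   # condition Negative
        "OP": tp + fp,   # outcome Positive (row total: prediction)
        "ON": fn + tn,   # outcome Negative
        "TC": tp + tn,   # True Classification (diagonal total)
        "FC": fp + fn,   # False Classification (off-diagonal total)
        "Sn": tp + fp + fn + tn,  # sample size
    }
    # Sn can be stated three equivalent ways, as noted in the text:
    assert m["Sn"] == m["P"] + m["N"] == m["OP"] + m["ON"] == m["TC"] + m["FC"]
    return m

measures = first_level(tp=40, fp=5, fn=10, tn=45)
```

Note how the duality of Section VI shows up here: swapping the roles of rows and columns exchanges P with OP and N with ON, while TC, FC, and Sn are unchanged.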

Fig. 1. Visualized roadmap for binary classification performance measures/metrics (22 measures and 22 metrics providing names, abbreviations, other namings, levels, dependencies, row/column geometries, equations with new canonical forms, and special notes such as ranges apart from (0, 1), duals, complements, etc.). See the legend for details. Visit http://bitly.com/metricsroadmap for future updates.

Definition: Measure
- The canonical form includes only P, N, OP, ON, or Sn
- Otherwise, the possible values have no lower (-inf) and/or upper limit (+inf)

VI. THE DUALITY AND COMPLEMENTATION REVEALED IN CLASSIFICATION PERFORMANCE MEASURES/METRICS

Duality is basically the transformation of one concept into another concept in a bilateral approach. We introduce the duality concept into classification performance metrics in this study. The proper transformation for performance metric duality lies behind the column/row approach to the confusion matrix that we described above: the row-versus-column transformation corresponds to prediction versus reality, formally, in every metric or measure.

We propose to adapt the duality notation used in vector spaces, which states the duality of a vector space V to its dual vector space with an asterisk superscript (*) as V*. In this notation, we can express the duality among performance measures and metrics. For example, the following are the dualities for the 1st-level measures defined in the previous subheadings:

P* = OP, N* = ON, OP* = P, ON* = N (TC, FC, and Sn are self-dual).

The symmetry, or involution, is always valid for the duality of performance measures and metrics. The duality is especially important for transforming a mapping known in one concept into the same mapping in the dual concept. For example, a mapping that exists in one definition of a metric can be transformed into, or sought in, the corresponding dual metric.

Because all the performance measures and metrics are for binary classes and normalized in the range (0, 1) mostly, and (-1, 1) rarely, the ratios can be complemented. Therefore, all the measures and metrics have complements. The complements can be written with this adapted notation, denoting the complement of a measure/metric X as X':

X' = 1 - X, e.g. TPR' = 1 - TPR = FNR, PPV' = 1 - PPV = FDR, ACC' = 1 - ACC = MCR.

In contrast with duality, having both a measure/metric and its complement can be used to simplify equations or to switch the primary point of view, such as switching from a positive-condition-based view (e.g. TPR or PPV) to a negative one (e.g. FNR or FDR), or focusing on errors (i.e. Misclassification Rate, MCR) instead of correctness (i.e. ACC), as illustrated in Figure 1.

VII. THE 2ND LEVEL MEASURES

Prevalence is the ratio of the Positive condition size (P) to the total number of conditions (P + N), or sample size (Sn). Prevalence (PREV), which is called the Pretest Probability of Admission in medicine, is related to the reality. (Label) Bias (BIAS) is the ratio of the Outcome Positive size (OP) to the total test size (OP + ON), or sample size (Sn). Bias is related to the prediction (classification output); it is also known as Detection Prevalence or Warning Rate. Prevalence and Bias are dual measures. (Class) skew, the ratio of the majority class to the minority class (usually N to P), is an important measure along with Prevalence in classification [9].

VIII. BASE AND CONVENTIONAL 1ST LEVEL METRICS

The base metrics are calculated from the measures within the confusion matrix columns (vertically) or rows (horizontally), as seen in Figure 1. The column-type base metrics (TPR, TNR, FPR, FNR) are the ratios of each base measure to the corresponding condition (Positive or Negative). The row-type base metrics are the ratios of each base measure to the corresponding test output. The first-level metrics are the most preferred and best-known metrics for expressing binary classification performance as a single value.

A. Accuracy (ACC)

Accuracy is the ratio of the total number of correct classifications to the sample size. It is a diagonal metric covering neither the rows nor the columns of the confusion matrix. It was defined as a similarity measure, called the matching coefficient, between two individuals characterized by a number of binary attributes by Sokal and Michener in 1958 [23]. Accuracy is the most reported and perhaps the most abused metric in binary classification performance reports, as described below.

B. Minimum Expected Performance and Other Measures

Although the ultimate goal of a classification is achieving the highest accuracy possible, another performance criterion may be disregarded: what is the minimum performance that should be expected of a binary classification? Two measures answer this question, namely the Null Error Rate (NER) and the No Information Rate (NIR). NER specifies the minimum successful classification rate if we do not have a classification model and always label a given instance Negative. Any classification should perform better than this limit. NIR, an improved version of NER, specifies the minimum performance by counting the larger of the Positive and Negative conditions. Eventually, at least the following conditions should be achieved for a better classification: ACC > NER and ACC > NIR. Note that a classification may have a performance close to the NER and NIR measures; this case is called the Accuracy Paradox [10], [24]. Therefore, providing Accuracy as a single performance metric is not sufficient.

CKc, the Kappa Chance coefficient, which is also called Random Agreement, consists of column and row totals. Going further in the geometry seen in Figure 1, the likelihood ratios (LRP, LRN) are row-based measures that have no upper limit (i.e. they are not in the range (0, 1) as other metrics are). The Odds Ratio (OR) is a 3rd-level measure: it is the ratio of the Positive Likelihood Ratio to the Negative Likelihood Ratio 2nd-level measures. Positives have OR times the odds of being positive compared to Negatives [25]. Discriminant Power (DP) and OR, which differ from all the other metrics and measures, are in the (-inf, +inf) range.

IX. BEYOND THE CONTROVERSIAL ACCURACY METRIC

A. Informedness, Markedness

Informedness and Markedness are dual metrics; the former is about reality, whereas the latter is about prediction. Informedness represents the consolidated true classification capability per condition, so the performance view is about reality; it is equal to the Peirce Skill Score (e.g. in weather forecasting [19]). Markedness, also known as the Difference in Proportions (= PPV + NPV - 1), represents the consolidated true classification capability per test outcome, so the performance view is about prediction; it is equal to the Clayton Skill Score [19].

B. More TPR-TNR Combinations

BACC and Gm are alternatives to ACC. They are actually means of the correct classification rates: the former is the arithmetic and the latter the geometric mean of TPR and TNR. BACC is also known as Strength; Gm is also known as the Fowlkes-Mallows index.

C. F-measures (F1, F0.5, F2, Fbeta), Cohen's Kappa (CK), Matthews Correlation Coefficient (MCC)

F-measures are metrics covering two base metrics, one from a column (TPR) and one from a row (PPV). F-measures, which are harmonic means of those two metrics, are actually parametric metrics defining the weights on TPR and PPV; they are insensitive to TN [26]. F1 is the harmonic mean of precision and recall with equal weights, whereas F0.5 puts more emphasis on PPV and F2 puts more emphasis on TPR. CK, a bidirectional metric in the (-1, 1) range, is related to Accuracy [9] (it is equal to the Heidke Skill Score [19]). MCC is also bidirectional in (-1, 1). F-measures, CK, and MCC are the unconventional metrics that we suggest researchers take into account in addition to Accuracy and the other metrics.

X. DISCUSSION AND CONCLUSION

Performance metrics are critical instruments for assessing and expressing the success of a binary classification study. Although many metrics are available in the literature, only a few
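The relations above can be sketched in a few lines (our illustration, not the paper's code; the function name and example numbers are ours): complements such as FNR = 1 - TPR, the NER/NIR baselines that any classifier's ACC should beat, and the TPR-TNR combinations BACC and Gm.

```python
import math

# Illustrative sketch (ours, not from the paper): a few metrics in
# canonical form, their complements, and the NER/NIR baselines.
# Inputs are the four base measures of the confusion matrix.

def metrics(tp, fp, fn, tn):
    p, n = tp + fn, fp + tn            # condition totals (reality)
    op = tp + fp                       # outcome Positive (prediction)
    sn = p + n                         # sample size
    tpr = tp / p                       # True Positive Rate (recall)
    tnr = tn / n                       # True Negative Rate
    ppv = tp / op                      # Positive Predictive Value (precision)
    acc = (tp + tn) / sn               # Accuracy = TC / Sn
    return {
        "TPR": tpr, "FNR": 1 - tpr,    # complement pair: FNR = TPR'
        "ACC": acc, "MCR": 1 - acc,    # complement pair: MCR = ACC'
        "NER": n / sn,                 # always-Negative baseline (per the text)
        "NIR": max(p, n) / sn,         # larger-condition baseline
        "BACC": (tpr + tnr) / 2,       # arithmetic mean of TPR and TNR
        "Gm": math.sqrt(tpr * tnr),    # geometric mean of TPR and TNR
        "F1": 2 * ppv * tpr / (ppv + tpr),  # harmonic mean of PPV and TPR
    }

m = metrics(tp=40, fp=5, fn=10, tn=45)
# A useful classifier should at least satisfy ACC > NER and ACC > NIR.
```

Here ACC = 0.85 against NER = NIR = 0.5, so the baseline conditions hold; on a heavily skewed sample, NIR rises toward 1 and Accuracy alone becomes misleading, which is the Accuracy Paradox discussed above.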

permission metrics are reported in the conducted studies. As Association Measures, Int. J. Appl. Math. Comput. Sci., vol. 25, no. 3,

explained in Accuracy Paradox, these accustomed metrics pp. 645657, 2015.

have some complications. Using the proper and sufficient [7] C. Seung-Seok, C. Sung-Hyuk, and C. C. Tappert, A Survey of Binary

Similarity and Distance Measures., J. Syst. Cybern. Informatics, vol. 8,

number of metrics while comparing different binary no. 1, pp. 4348, 2010.

classification approaches leads to a more objective assessment. [8] Z. Hublek, Coefficients of Association and Similarity, Based on

We have seen that there are many studies in the literature on Binary (Presence-Absence) Data: an Evaluation, Biol. Rev., vol. 57, no.

4, pp. 669689, 1982.

several domains (botanical, meteorology, chemistry, biology,

[9] S. Straube and M. M. Krell, How to evaluate an agents behavior to

medicine, economics, malware analysis, etc.) in which different infrequent events? Reliable performance estimation insensitive to class

metrics and measures are proposed. Because of domain distribution, Frontiers in Computational Neuroscience, vol. 8, no.

independence of performance measures, those studies may April, pp. 16, 2014.

provide researchers new alternative resources in other domains [10] F. J. Valverde-Albacete and C. Pelez-Moreno, 100% classification

to explore so that knowledge transfer would be possible. accuracy considered harmful: The normalized information transfer factor

explains the accuracy paradox, PLoS One, vol. 9, no. 1, 2014.

We also have seen that there are many different notation [11] M. Sokolova and G. Lapalme, A systematic analysis of performance

adaptations in the related studies and established different measures for classification tasks, Inf. Process. Manag., vol. 45, no. 4,

notation conventions for the same metrics on different domains pp. 427437, 2009.

(e.g. recall vs. sensitivity for TPR). Nevertheless, we suggested [12] V. Labatut and H. Cherifi, Evaluation of Performance Measures for

our specific naming and notation that still reflects the majority Classifiers Comparison, Ubiquitous Comput. Commun. J., vol. 6, pp.

2134, 2011.

of common conventions in machine learning classification

literature. We preferred abbreviations of metrics that are more [13] B.-G. Hu and W.-M. Dong, A study on cost behaviors of binary

classification measures in class-imbalanced problems, Comput. Res.

explicit for binary classification. Repos., vol. abs/1403.7, 2014.

In this study, we have uncovered the semantic and formal [14] A. Forbes, Classification-algorithm evaluation: five performance

distinction between performance measure and metric. Although measures based on confusion matrices, J. Clin. Monit. Comput., vol. 11,

no. 3, pp. 189206, 1995.

classification performance measures/metrics are indispensable

[15] R. E. Tulloss, Assessment of Similarity Indices for Undesirable

instruments of classification experiments, the confusing Properties and a new Tripartite Similarity Index Based on Cost

terminology that is widespread in even academic studies has not Functions, in Mycology in Sustainable Development: Expanding

been clarified before. We have suggested formal rules to Concepts, Vanishing Borders., 1997, pp. 122143.

determine a given equation as a measure or metric as well as [16] N. Lavesson and P. Davidsson, Analysis of Multi-Criteria Metrics for

establishing the terminology. We have provided 44 measures Classifier and Algorithm Evaluation, in Proceedings of the 24th

and metrics with their equations and introduced new concepts AnnualWorkshop of the Swedish Artificial Intelligence Society, 2007, pp.

1122.

on handling them such as canonical forms to use in definitions,

[17] O. O. Koyejo, N. Natarajan, P. K. Ravikumar, and I. S. Dhillon,

relating different measures/metrics by duality/complementation. Consistent Binary Classification with Generalized Performance

In addition, we have designed a comprehensive, visualized, row/column-structured, leveled, all-in-one-style roadmap, shown in Figure 1, that could be considered the periodic table of elements of binary classification performance in machine learning. Our new basing and leveling approach to the layout helps to order the measures/metrics logically and to present them more comprehensively. We expect that this domain-independent roadmap gives researchers a more standard, complete, and easier way to understand the performance measures/metrics, their relations, and their dependencies.

ACKNOWLEDGMENT

G.C. thanks HAVELSAN for supporting this study.

REFERENCES

[1] A. Shaar, T. Abdessalem, and O. Segard, "Pessimistic Uplift Modeling," in 22nd SIGKDD Conference on Knowledge Discovery and Data Mining (ACM SIGKDD), 2016.

[2] C. Sammut and G. I. Webb, Eds., Encyclopedia of Machine Learning. New York: Springer, 2011.

[3] M. J. Warrens, "Similarity Coefficients for Binary Data," Leiden University, 2008.

[4] M. R. Anderberg, "Measures of Association among Variables," in Cluster Analysis for Applications: Probability and Mathematical Statistics: A Series of Monographs and Textbooks. Academic Press, 1973, pp. 82–92.

[5] M. M. Deza and E. Deza, Encyclopedia of Distances. Springer, 2009.

[6] M. Paradowski, "On the Order Equivalence Relation of Binary

Metrics," Adv. Neural Inf. Process. Syst. 27 Annu. Conf. Neural Inf. Process. Syst. 2014, December 8–13, 2014, Montreal, Quebec, Canada, pp. 2744–2752, 2014.

[18] K. J. van Stralen, V. S. Stel, J. B. Reitsma, F. W. Dekker, C. Zoccali, and K. J. Jager, "Diagnostic methods I: sensitivity, specificity, and other measures of accuracy," Kidney Int., vol. 75, no. 12, pp. 1257–1263, 2009.

[19] D. S. Wilks, Statistical Methods in the Atmospheric Sciences, 2nd ed., vol. 59. Elsevier, 2006.

[20] P. P. Texel, "Measure, metric, and indicator: An object-oriented approach for consistent terminology," in Proceedings of IEEE Southeastcon, 2013.

[21] G. McWilliams, S. Sezer, and S. Y. Yerima, "Analysis of Bayesian classification-based approaches for Android malware detection," IET Inf. Secur., vol. 8, no. 1, pp. 25–36, 2014.

[22] H. M. Lomell, "Punishing the uncommitted crime: Prevention, pre-emption, precaution and the transformation of criminal law," in Justice and Security in the 21st Century: Risks, Rights and the Rule of Law, 1st ed., B. Hudson and S. Ugelvik, Eds. Abingdon, Oxon, United Kingdom: Routledge, 2012.

[23] D. W. Goodall, "The distribution of the matching coefficient," Biometrics, vol. 23, no. 4, pp. 647–656, 1967.

[24] T. Bruckhaus, "The business impact of predictive analytics," in Knowledge Discovery and Data Mining: Challenges and Realities, X. Zhu and I. Davidson, Eds. Information Science Reference, 2007, pp. 114–138.

[25] M. Szumilas, "Explaining odds ratios," J. Can. Acad. Child Adolesc. Psychiatry, vol. 19, no. 3, pp. 227–229, 2010.

[26] D. M. W. Powers, "What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes," KIT-14-001, 2015.
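As a minimal illustration of the distinction drawn in this paper between base measures (raw confusion-matrix counts) and the metrics derived from them, the following sketch may help; it is an example added for this overview, not code from the original paper, and the function names are our own:

```python
def confusion_counts(y_true, y_pred):
    """Base measures: the four raw counts of the binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def derived_metrics(tp, tn, fp, fn):
    """Derived metrics: ratios computed from the base measures."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0    # Accuracy (ACC)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # Sensitivity / Recall
    tnr = tn / (tn + fp) if (tn + fp) else 0.0   # Specificity
    ppv = tp / (tp + fp) if (tp + fp) else 0.0   # Precision
    f1 = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0
    return {"ACC": acc, "TPR": tpr, "TNR": tnr, "PPV": ppv, "F1": f1}

# Example: a skewed class distribution, where ACC alone is misleading.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # classifier that always predicts "negative"
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
m = derived_metrics(tp, tn, fp, fn)
# ACC is 0.9 here even though the single positive case was missed (TPR = 0.0),
# which is the kind of misperception from narrow metrics that the paper warns about.
```

The example deliberately uses an imbalanced sample: it shows how reporting only Accuracy, as the abstract cautions, can hide a complete failure on the positive class, and why a roadmap covering the full family of measures/metrics is useful.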
