Visualize of Machine Learning Decisions2

Page 1, print date: 28/04/2020, 16:00
Visualize and Validate Machine Learning Data in VS Code
Subhead 1: Explainable models create trust
Code visualization has seen some interesting and promising innovations in recent years. These
include, in particular, the ongoing integration of dedicated machine learning visualization
technologies, such as the Jupyter notebook format in VS Code, MS Power BI, or TensorBoard, which
TensorFlow calls to display and record training results. This illustrates how far code
visualization has already progressed, or could progress.
by Max Kleiner
However, one immediate benefit is already clear today: areas such as robotics, expert systems,
mathematical optimization, anomaly detection, feature reduction or model-based control would be
easier to explain if the model could show the features that drove a decision directly in a
corresponding graphic. This report lays the groundwork for that.
The goal is to understand as exactly as possible why and how an AI makes certain decisions. With
image recognition algorithms, for example, a colored heat map shows the regions of an image that
are particularly relevant for its classification.
We start with a simple data set for a classification system and visualize the classification
decision with a confusion matrix and the associated heat map. As an IDE, I use Visual Studio Code
with two configuration files, tasks.json and the project-specific settings.json, which include the
test units and path details. Both files are shown as listings below.
As an introduction to VS Code with Python, I can recommend the tutorial [1], which Microsoft
published for the March 2020 release (version 1.44): "Tutorials for creating Python
containers and building Data Science models".
Now we start with the imported modules in Listing 1 and call our script logregclassifier2.py [2]
or [7] as a notebook.
Listing 1
# import the modules we need
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
End
The Dataset
After several ML projects, I wanted to write this article to share my experience and perhaps
help some of you get started with classification in machine learning. The data itself is deliberately
neutral and simple so that the principle stays clear and understandable. I chose a (completely
arbitrary) data series from 0 to 9 (samples) as training data to classify a target as 0 or 1 (see
footnote 1). When the labels (target) are known, this is called supervised learning. We want to
train the system so that the low numbers are likely to be classified as 0 and the high numbers as 1:
X=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y=[0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
Listing 2
# arrays for the input (X) and output (y) values
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
End
We can use the np.arange(10) command to create an array that contains all the integers from 0
to 9. By convention, X is a two-dimensional array (matrix) and y is a one-dimensional target
(vector). reshape(-1, 1) means we have exactly one feature as a column, with the number of rows
inferred from the data. Features are the input variables that help the model find unknown patterns.
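The effect of reshape(-1, 1) can be checked directly on the array shapes. The following is a minimal sketch; the name X_flat is only introduced here for illustration and does not appear in the original script.

```python
import numpy as np

X_flat = np.arange(10)          # one-dimensional: shape (10,)
X = X_flat.reshape(-1, 1)       # -1 lets NumPy infer the row count: shape (10, 1)

print(X_flat.shape)  # (10,)
print(X.shape)       # (10, 1)
```

scikit-learn estimators expect the input in this two-dimensional form (samples in rows, features in columns), which is why the single feature is turned into a column.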
Now I define the model and train it with the fit method, which establishes a relationship
between the influencing variables (determinants) and the target:
Listing 3
# once input and output are prepared, define and fit the classification model
model = LogisticRegression(solver='liblinear', random_state=0)
model.fit(X, y)
print(model)
End
Now the model is set up, so I can use predict() to try a first classification, compute a
score, and immediately create the confusion matrix for validation. In practice, ML-based
solutions of this kind can lead to cost savings, higher predictability, and increased
availability of the systems.
Listing 4
print(model.predict(X))
print(model.score(X, y))
# One false positive prediction: the fourth observation is a zero
# that was wrongly predicted as one.
print(confusion_matrix(y, model.predict(X)))
End
Real: [0 0 0 0 1 1 1 1 1 1]
Predict: [0 0 0 1 1 1 1 1 1 1]
Score: 0.9
Confusion Matrix:
0: [[3 1] :4
1: [0 6]] :6
1 It could also be patients 0 to 9 who are taking a medical test.

And lo and behold, a false positive (false alarm) has crept in. The model mistakenly classified a
0 as 1, much like a fire alarm that goes off although nothing is burning. The confusion matrix
shows this case in its top right cell.
Ideally, with a 100% score, the matrix looks like this:
0: [[4 0] :4
1: [0 6]] :6
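The score of 0.9 reported above can be reproduced by hand from the confusion matrix. This is a small sketch in plain Python that assumes the layout used in this article (rows are actual classes, columns are predicted classes).

```python
# Confusion matrix from the output above: rows = actual class, columns = predicted class
cm = [[3, 1],
      [0, 6]]

correct = cm[0][0] + cm[1][1]          # true negatives + true positives
total = sum(sum(row) for row in cm)    # all 10 samples
accuracy = correct / total

print(accuracy)  # 0.9
```

In the ideal matrix the off-diagonal cells are zero, so the same calculation yields 1.0.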
The data set becomes an image

The next step is to prepare the matrix visually in order to show the relationship
between the actual data and the predicted data.
Listing 5
# confusion matrix of the fitted model (rows: actual, columns: predicted)
cm = confusion_matrix(y, model.predict(X))
plt.rcParams.update({'font.size': 16})
fig, ax = plt.subplots(figsize=(4, 4))
ax.imshow(cm)
ax.grid(False)
ax.xaxis.set(ticks=(0, 1), ticklabels=('Predicted 0s', 'Predicted 1s'))
ax.yaxis.set(ticks=(0, 1), ticklabels=('Actual 0s', 'Actual 1s'))
ax.set_ylim(1.5, -0.5)
# write the count into each cell
for i in range(2):
    for j in range(2):
        ax.text(j, i, cm[i, j], ha='center', va='center', color='red')
plt.show()
End
Fig. 1: Confusion matrix with Pyplot
File: logreg2cm2.png
This graphic can also be made simpler and more modern with an additional library. We need
the Python library Seaborn, which is best installed directly in VS Code with pip install
in the integrated terminal. By the way, validate and translate are two funny words.
Listing 6
import seaborn as sns
# compute the confusion matrix of the fitted model
cm = confusion_matrix(y, model.predict(X))
sns.heatmap(cm, annot=True)
plt.title('heatmap confusion matrix')
plt.show()
End
Fig. 2: Confusion matrix with Seaborn
File: heatmapconfusionmatrix.png
Class 0 has 3 correct cases (true negatives) and class 1 has 6 correct cases (true positives). The
user accuracy (precision) is reduced by the single false positive result. In the terminology of
consumer risk versus producer risk, a false positive is also called a type I error; a false
negative is then a type II error.
The heatmap() function from the seaborn library defines the type of diagram I'm using; its
arguments parameterize the appearance of the diagram. Let's take a look at the error
analysis, which is determined by the default probability threshold of 0.5. The switch from
0 to 1 took place too early, so our model classified one 0 as 1. Of course, such settings and the
so-called hyperparameters can be tuned to find a fairer distribution of the classification.
It has to be said that the effect on discrete, dichotomous variables [0,1] cannot be adequately
explained and verified with classic linear regression analysis, which is why logistic regression
is used here.
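The role of the probability threshold can be sketched as follows. The helper classify and the alternative threshold of 0.7 are assumptions made for illustration only; they are not part of the original script, and the sketch makes no claim about which threshold is best.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(X, y)

# estimated probability of class 1 for every sample
proba = model.predict_proba(X)[:, 1]

def classify(probabilities, threshold=0.5):
    """Hypothetical helper: label a sample 1 when its probability exceeds the threshold."""
    return (probabilities > threshold).astype(int)

default_labels = classify(proba)          # same cut-off that predict() applies
stricter_labels = classify(proba, 0.7)    # 0.7 is an arbitrary illustrative value

print(default_labels)
print(stricter_labels)
```

Raising the threshold can only turn predicted ones into zeros, so it trades false positives against possible false negatives.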
Hyperparameters
The current distribution with the associated classification looks like this:
Fig. 3: The first 3 samples are counted as 0 and the rest as 1.

File: class_logplot2.png
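Listing 7
sns.set(style='whitegrid')
# regplot expects one-dimensional x and y, passed as keyword arguments;
# logistic=True requires the statsmodels package
sns.regplot(x=X.ravel(), y=model.predict_proba(X)[:, 1], logistic=True,
            scatter_kws={"color": "red"}, line_kws={"color": "blue"})
# label=model.predict(X))
plt.title('Logistic Probability Plot')
plt.show()
End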
Listing 7 passes the estimated probability as the target to the regplot function. Not every
classifier exposes its internal probabilities. The naive Bayes classifier, named after the
English mathematician Thomas Bayes, is also probabilistic (see footnote 2); it is derived from
Bayes' theorem.
The corresponding decision boundary is also visually recognizable for the analysis and helps to
interpret the result or to find a better solver (see below):
2 The basic (and therefore naive) assumption of the naive Bayes classifier is that the
features used are strictly independent.
Fig. 4: Decision Boundary with the false positive (blue dot in white area)
file: classifier_decision2.png
Imagine a medical research institute proposing a screening program to test a large group of
people for the presence of a particular disease (which one does not matter for the moment). An
important counterargument against such a screening is the number of false positive results,
which we have to consider as a conditional probability:
Target  precision  recall  f1-score  support  CM
0       1.00       0.75    0.86      4        [[3 1]
1       0.86       1.00    0.92      6         [0 6]]
Table 1: Classification Report
We can see from the table that there is 1 false positive and no false negative.
This means that a positive result corresponds to an actual disease in only 86% of all cases.
The precision is calculated as follows:
Precision = True positive / (True positive + False positive) = 6 / (6 + 1) = 0.8571 ≈ 0.86
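The remaining values for class 1 in Table 1 follow from the same four counts. This is a minimal sketch in plain Python, using the standard definitions of precision, recall and F1 and treating class 1 as the positive class.

```python
# Counts read from the confusion matrix [[3 1], [0 6]], with class 1 as "positive"
tn, fp = 3, 1
fn, tp = 0, 6

precision = tp / (tp + fp)                          # 6 / 7
recall = tp / (tp + fn)                             # 6 / 6
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of both

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.86 1.0 0.92
```

Swapping the roles of the two classes (3 correct zeros, 1 zero missed) gives the row for class 0: precision 1.00, recall 0.75, F1 0.86.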
It is therefore crucial to include the false positive cases when judging the accuracy of tests
(screening). By the way, similar examples of conditional probability can be found on the website
"Lies with Statistics" [3]. I also calculated and visualized a case from the field of
mammography, where the false positives look more complex:
Fig. 5: Non-linear analysis of false positives in a hyperplane (Support Vector Machine)

File: cell_class_boundaries.png
Optimise with Optic

Now we want to bring the hyperparameters mentioned above into play. Several of them exist, and
they are part of the model evaluation.
• C is a positive floating point number (1.0 by default) that defines the relative strength of the
regularization. Smaller values indicate a stronger regularization.
• solver is a string that decides which solver is used to fit the model ('liblinear' was the
default in older scikit-learn versions; since version 0.22 it is 'lbfgs'). Other options are
'newton-cg', 'sag' and 'saga'.
• max_iter is an integer (100 by default) that defines the maximum number of iterations of
the solver during model fitting.
Listing 8
model = LogisticRegression(solver='liblinear', C=1, random_state=0).fit(X, y)
# show more model details
print(model)
Output:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, max_iter=100, multi_class='warn',
                   n_jobs=None, penalty='l2', random_state=0, solver='liblinear',
                   tol=0.0001, verbose=0, warm_start=False)
End
The actual adjustment is simple and just means using a different solver:

model = LogisticRegression(solver='lbfgs', C=1, random_state=0).fit(X, y)
print(classification_report(y, model.predict(X)))
In Listing 8 above we can see the preset model parameters, which can of course be changed.
However, the best value of a model hyperparameter for a specific problem cannot be determined
directly. You can use empirical values, copy values that worked for other problems, or
search for the best value by trial and error. I mainly tune the regularization strength C, the
kernel (for kernel-based models) or the solver.
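Searching by trial and error can be sketched as a small loop over candidate values. The candidate lists below are arbitrary assumptions for illustration, and the score is measured on the training data only, as in the listings of this article; for a real project a held-out set or cross-validation would be the more reliable yardstick.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

results = {}
for solver in ('liblinear', 'lbfgs'):
    for C in (0.1, 1.0, 10.0):          # illustrative candidate values
        model = LogisticRegression(solver=solver, C=C, random_state=0).fit(X, y)
        # accuracy on the training data for this combination
        results[(solver, C)] = model.score(X, y)

for (solver, C), score in results.items():
    print(f"solver={solver:<10} C={C:<5} score={score:.1f}")

best = max(results, key=results.get)
print("best combination on the training data:", best)
```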
The difference between parameters and hyperparameters:
Model hyperparameters have to be defined before the training and cannot be learned by the
model from the data (e.g. learning rate, number of hidden layers, regularization strength).
Model parameters are learned by the model during training and are derived from the data (e.g.
weights and biases).
Hyperparameters are the values we supply to the model, for example the number of hidden
nodes and layers, the input features, the learning rate or the activation function of a neural
network, while parameters are the values the machine learns, such as weights and biases.
In machine learning, a model M with parameters and hyperparameters can be written as
$Y \approx M_H(\Phi \mid D)$
where $\Phi$ are the parameters and $H$ are the hyperparameters, $D$ is the training data and
$Y$ is the output data (class labels in the case of a classification task).
A model hyperparameter is a configuration that is external to the model and whose value cannot be
estimated from data. Hyperparameters have the following properties:
• They are often used in processes to help estimate model parameters.
• They are often specified by the practitioner.
• They can often be set using heuristics.
• They are often tuned for a given predictive modelling problem.
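In scikit-learn the distinction is visible on the estimator object itself. The following sketch reuses the data set of this article and relies on get_params(), coef_ and intercept_, which are standard attributes of LogisticRegression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

model = LogisticRegression(solver='liblinear', C=1.0, random_state=0)

# hyperparameters: set by us before training, available without any data
hyper = model.get_params()
print(hyper['C'], hyper['solver'], hyper['max_iter'])

model.fit(X, y)

# parameters: learned from the data, only available after fit()
print(model.coef_)        # weight of the single feature, shape (1, 1)
print(model.intercept_)   # bias term, shape (1,)
```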
We cannot know the best value for a model hyperparameter on a given problem in advance. We may
use rules of thumb, copy values used on other problems, or search for the best value by trial and
error. Once we have all this information, it becomes possible to decide which modelling strategy
fits best with the available data and the desired output. For our number series, the results are
now optimal in terms of classification quality, and they also hold up in a visual comparison with
a decision tree.
There are multiple modelling strategies; taking predictive maintenance as an example domain, I
will describe the two I have worked with most, in terms of the question they aim to answer and
the kind of data they require:
1. Regression models to predict remaining useful lifetime (RUL)
2. Classification models to predict failure within a given time window
For this scenario, we need static and historical data, and every event must be labelled. Moreover,
several events of each type of failure must be part of the dataset. Ideally, we prefer to build such
models when the degradation process is linear [9].
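The two strategies differ mainly in how the target is formed. The sketch below uses invented RUL values and an assumed window of 30 days purely for illustration; neither comes from the article or from a real data set.

```python
# Hypothetical remaining-useful-lifetime values (in days) for six machines;
# the numbers are invented purely for illustration.
rul_days = [120, 45, 8, 30, 3, 75]

window = 30  # assumed time window in days

# 1. Regression target: the RUL value itself
regression_target = rul_days

# 2. Classification target: 1 if a failure is expected within the window, else 0
classification_target = [1 if rul <= window else 0 for rul in rul_days]

print(classification_target)  # [0, 0, 1, 1, 1, 0]
```

The binary target can then be fed to a classifier in the same way as y in the listings above.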
Fig. 6: Optimal decision of the classification

File: class_logplot3optsolver.png
The decision tree procedure in Listing 9 is a common option for regression or classification on a
multivariate data set. I can use the procedure, for example, to classify the solvency of customers
or to build a function that predicts false reports (see footnote 3) or fake news.
In practice, however, the procedure presents data scientists with major challenges regarding
interpretation and overfitting (memorizing the training examples), even though the tree itself
offers transparent and legible graphics. For the graphics I use an installed Graphviz 2.38 in
VS Code and an additional line in the code that appends the path directly to the OS PATH
variable. This way I can adapt to another version or platform directly in the code.
Listing 9
import os
import unittest
# project-specific import (presumably from the author's workspace)
from converter import app, request
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
# make the Graphviz and Pandoc executables available on the PATH
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
os.environ["PATH"] += os.pathsep + 'C:/Program Files/Pandoc/'
End
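Listing 9 only shows the imports and the path setup. How the tree could be fitted and exported for this article's data set is sketched below; the feature name 'sample' and the choice of options are assumptions, not the author's original code.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz

X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(tree.score(X, y))   # training accuracy

# out_file=None returns the DOT source as a string instead of writing a file
dot_source = export_graphviz(tree, out_file=None,
                             feature_names=['sample'],   # assumed feature name
                             class_names=['0', '1'],
                             filled=True)
print(dot_source[:60])
```

Turning the DOT text into an image is the step that needs the Graphviz executables on the PATH, which is what the two os.environ lines in Listing 9 prepare.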
3 Fraud detection is a knowledge-intensive activity.

Fig. 7: The confusion matrix no longer contains any misclassifications.

File: heatmapconfusionmatrix_solver.png
Important: The layout of the confusion matrix is unfortunately not standardized. In the
example, the truth ("Actual") is in the rows and the estimate ("Predicted") in the columns,
but depending on the software used, the dimensions can be reversed. It seems important to me
to start the matrix at 0, i.e. to place True Negative at the top left as a standard, see Fig. 8.
And clearly, for an N-class problem the confusion matrix is an NxN matrix, so it is not
limited to binary classification.
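The NxN case can be sketched in plain Python with the same convention (rows are actual classes, columns are predicted classes, starting at class 0). The labels below are invented for illustration only.

```python
# Hypothetical labels for a 3-class problem (values invented for illustration)
actual    = [0, 0, 1, 1, 1, 2, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 2, 2, 0, 2]

n_classes = 3
# rows = actual class, columns = predicted class, both starting at class 0
cm = [[0] * n_classes for _ in range(n_classes)]
for a, p in zip(actual, predicted):
    cm[a][p] += 1

for row in cm:
    print(row)
# [1, 1, 0]
# [0, 2, 1]
# [1, 0, 3]
```

Software that uses the reversed convention would show the transpose of this matrix, so it is worth checking which axis holds the actual classes before reading off false positives.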
Fig. 8: A standardized confusion matrix, File: cm_mock_template.png

Jupyter Notebook in VS Code
Here is a look at the integration of Jupyter [6]. Jupyter (formerly IPython Notebook) is an
open source project with which I can easily combine Markdown text and executable
Python source code on one interactive canvas, known as a notebook. Visual Studio Code supports
working with Jupyter notebooks and Python code files, and my experience with debugging and code
metrics is also good.
To work with Jupyter notebooks, an Anaconda environment or another Python environment must be
activated in VS Code, and the Jupyter package must be installed in it beforehand. This makes it
possible to integrate graphics, document code, or execute it interactively directly in VS Code:
Fig. 9: Work with Jupyter

File: vscode_jupyter_librosa_demo2.png
Fig. 10: With the terminal, images can also be controlled interactively in code!
File: vscode_jupyter_librosa_demo3.png
Listing 10 viper2\.vscode\settings.json
{
    "python.pythonPath": "C:\\Users\\Max\\AppData\\Local\\Programs\\Python\\Python37\\python.exe",
    "python.testing.pytestArgs": [
        "freshonion"
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.nosetestsEnabled": false,
    "python.testing.pytestEnabled": false,
    "python.testing.unittestArgs": [
        "-v",
        "-s",
        "./freshonion",
        "-p",
        "*test.py"
    ],
    "python.testing.promptToConfigure": false
}
End
Listing 11 \viper2\.vscode\tasks.json
{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    // build from older win8.1. to win10.2 by max
    "version": "2.0.0",
    "tasks": [
        {
            "label": "buildpython",
            "type": "shell",
            "command": "C:\\Users\\Max\\AppData\\Local\\Programs\\Python\\Python37\\python.exe",
            "args": ["${file}"],
            "presentation": { "reveal": "always" },
            "problemMatcher": [],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}
End
Max Kleiner works in the areas of machine learning, e-learning, OOP, UML and system
architecture, among other roles as a trainer, developer, consultant and publicist. His focus
is on training, IT security, databases and event-driven frameworks. Through his work as a
lecturer and consultant at a university of applied sciences and on behalf of a company,
microcontrollers and IoT have also been added to his fields. His book "Patterns in C#",
published in 2003, is still up to date with the Clean Code Initiative.
https://basta.net/speaker/max-kleiner/
Links & Literature

[1] https://code.visualstudio.com/docs/python/data-science-tutorial
[2] http://www.softwareschule.ch/examples/logregclassifier2.py.txt
[3] https://de.statista.com/statistik/lexikon/definition/8/luegen_mit_statistiken/
[4] https://sourceforge.net/projects/cai/
[5] https://maxbox4.wordpress.com/blog/
[6] https://code.visualstudio.com/docs/python/jupyter-support
[7] https://github.com/maxkleiner/maXbox/blob/master/logisticregression2.ipynb
Literature of the Free Book:

[8] https://www.oreilly.com/programming/free/python-data-for-developers.csp
[9] https://towardsdatascience.com/how-to-implement-machine-learning-for-predictive-maintenance-4633cdbe4860
Appendix Source package for MS PowerBI: PBIDesktop_x64.msi
News
Python Data for Developers

A Curated Collection of Chapters from the O'Reilly
Data and Programming Library
Get the free ebook

Data is everywhere, and not just for data scientists. Developers are increasingly seeing it enter their
realm, requiring new skills and problem solving. Python has emerged as a giant in the field,
combining an easy-to-learn language with strong libraries and a vibrant community. If you have a
programming background (in Python or otherwise), this free ebook will provide a snapshot of the
landscape for you to start exploring more deeply.
https://www.oreilly.com/
