The further development of visualization in code has brought some interesting and promising innovations in recent years. These include in particular the ongoing integration of machine-learning visualization technologies, such as support for the Jupyter notebook format in VS Code and MS Power BI, or launching TensorBoard from TensorFlow to display and record training results. This illustrates how far the optimization of code visualization has already progressed, and where it could still go.
by Max Kleiner
However, an immediate benefit is already clear today: areas such as robotics, expert systems, mathematical optimization, anomaly detection, feature reduction, and model-based control would be easier to explain if the model could show the features behind a decision directly in a corresponding graphic. The basics presented in this report serve this purpose.
The goal is to understand as exactly as possible why and how an AI makes certain decisions. With
image recognition algorithms, for example, a colored heat map shows the locations of an image that
are particularly relevant for its classification.
We start with a simple data set for a classification system and visualize the decision of the classifier with a confusion matrix and an associated heat map. As an IDE I use Visual Studio Code with the two configuration files tasks.json and the project-specific settings.json, including test units and path details. Both files are shown as listings below.
As an introduction to VS Code with Python, I can recommend the tutorial [1], which Microsoft published with the current version of March 2020 (version 1.44): "Tutorials for creating Python containers and building Data Science models".
Now we start with the modules imported in Listing 1 and run our script logregclassifier2.py [2] or [7] as a notebook.
Listing 1
# import the modules we need
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
End
The Dataset
After various ML projects, I wanted to write this article to share my experience and perhaps help some of you integrate machine learning with classification. The data itself is deliberately neutral and simple, so that the method is optimally clear and understandable. As training data I chose a (completely meaningless) series of numbers from 0 to 9 (samples) to classify a target of 0 and 1. When the labels (targets) are known, one also speaks of supervised learning. We want to train the system so that the low numbers are likely to be classified as 0 and the high numbers as 1:
X=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y=[0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
Listing 2
# arrays for the input (X) and output (y) values:
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
End
We can use the np.arange(10) command to create an array that contains all the integers from 0 to 9. By convention, I treat X as a two-dimensional array (matrix) and y as a one-dimensional target (vector). reshape(-1, 1) means we have only 1 feature as a column. Features are the carriers of information that help the model to find unknown patterns.
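A quick check confirms the shapes (a minimal sketch using the arrays from Listing 2):
# shape check of the toy data
print(X.shape)  # (10, 1) -> 10 samples, 1 feature
print(y.shape)  # (10,)   -> one label per sample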
Now I define the model, which already trains on the data with the fit method in order to create a relationship between the influencing variables (determinants) and the target:
Listing 3
# Once you have input and output prepared, define your classification model.
model = LogisticRegression(solver='liblinear', random_state=0)
model.fit(X, y)
print(model)
End
Now the model is set up, and accordingly I can use predict() to try a first classification with a score, and immediately create the confusion matrix for validation. Needless to say, the implementation of ML-based solutions can lead to major cost savings, higher predictability, and increased availability of the systems.
Listing 4
print(model.predict(X))
print(model.score(X, y))
# One false positive prediction: the fourth observation is a zero that was
# predicted as a one.
print(confusion_matrix(y, model.predict(X)))
End
Real:    [0 0 0 0 1 1 1 1 1 1]
Predict: [0 0 0 1 1 1 1 1 1 1]
Score: 0.9
Confusion Matrix (rows: 4 real zeros, 6 real ones):
[[3 1]
 [0 6]]
And lo and behold, a false positive (false alarm) has crept in. The model mistakenly classified a 0 as a 1, as if a fire alarm system were triggered in a quiet situation. The confusion matrix shows this as a false alarm (false positive).
Ideally, with a 100% score, the matrix has the following picture:
[[4 0]
 [0 6]]
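For the binary case, scikit-learn also lets us unpack the four cells directly (truth in the rows, prediction in the columns):
# unpack the binary confusion matrix into its four cells
tn, fp, fn, tp = confusion_matrix(y, model.predict(X)).ravel()
print(tn, fp, fn, tp)  # 3 1 0 6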
Listing 5
plt.rcParams.update({'font.size': 16})
cm = confusion_matrix(y, model.predict(X))
fig, ax = plt.subplots()
ax.imshow(cm)
ax.grid(False)
ax.set_ylim(1.5, -0.5)
for i in range(2):
    for j in range(2):
        # write the cell value into each field of the heat map
        ax.text(j, i, cm[i, j], ha='center', va='center', color='red')
plt.show()
End
This graphic can also be made simpler and more modern with an additional library. We need the Python library Seaborn, which is best installed directly in VS Code with pip install using the integrated command-line shell.
Listing 6
import seaborn as sns
cm = confusion_matrix(y, model.predict(X))
sns.heatmap(cm, annot=True)
plt.show()
End
Class 0 has 3 correct cases (true negatives) and class 1 has 6 correct cases (true positives). The user's accuracy also shows a single false positive result. The user's accuracy (consumer's risk versus producer's risk) relates to errors of type 1 (false alarms); errors of type 2 are then the false negatives. The .heatmap() function from the Seaborn library defines the type of diagram I am using; the following arguments parameterize the appearance of the diagram. Let's take a look at the error analysis, which is governed by the default probability threshold of 0.5. The discrimination between 0 and 1 took place too early, so that our model classified a 0 as a 1 too soon. Of course, these so-called hyperparameters can be optimized to find a fairer distribution of the classification.
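If the default cut-off does not fit, the class probabilities can also be thresholded manually (a minimal sketch; the threshold value here is purely illustrative):
# classify with a custom threshold instead of the default 0.5
proba = model.predict_proba(X)[:, 1]
threshold = 0.7  # illustrative value, not a recommendation
print((proba >= threshold).astype(int))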
It has to be said that the effect on discrete, dichotomous variables [0,1] cannot be explained and verified with classical linear regression analysis.
Hyperparameters
The current distribution with the associated classification looks like this:
Listing 7
sns.set(style='whitegrid')
# the estimated probability of class 1 serves as the target of the plot
sns.regplot(x=X.ravel(), y=model.predict_proba(X)[:, 1])
plt.show()
End
Listing 7 uses the estimated probability as the target in the regplot function. The corresponding decision boundary is also visually recognizable for the analysis and helps to interpret the result or to find a better solver (see Fig. 4 below). Not every classifier offers such internal probabilities; the Naive Bayes classifier, named after the English mathematician Thomas Bayes, is also probabilistic²; it is derived from Bayes' theorem.
² The basic assumption of the naive Bayes classifier is (hence "naive") that the features used are strictly independent of each other.
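As a minimal sketch of such a probabilistic alternative (the choice of scikit-learn's GaussianNB variant is mine, for illustration only):
from sklearn.naive_bayes import GaussianNB

# a probabilistic classifier that also exposes internal class probabilities
nb = GaussianNB().fit(X, y)
print(nb.predict_proba(X)[:, 1])  # estimated probability of class 1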
Fig. 4: Decision Boundary with the false positive (blue dot in white area)
file: classifier_decision2.png
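For a logistic regression with a single feature, the boundary can also be computed directly; it lies where the linear score w*x + b crosses zero, i.e. where the estimated probability is exactly 0.5:
# compute the decision boundary of the fitted 1-feature model
w = model.coef_[0][0]
b = model.intercept_[0]
print('boundary at x =', -b / w)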
Imagine a medical research institute proposing a screening to test a large group of people for the presence of a particular disease (which disease it is does not matter for the moment). An important counter-argument to such a screening is the false positive results, which we have to consider as a conditional probability:
We can see from the table that there is 1 false positive case and no false negative case. This means that only in about 86% of all cases does a positive result also correspond to a disease; the precision is calculated as follows: True positive / (True positive + False positive) = 6 / (6 + 1) ≈ 0.86.
It is therefore crucial to include the false positive cases in the accuracy of the tests (screening).
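The same numbers fall out of scikit-learn's metrics directly:
from sklearn.metrics import precision_score, recall_score

# precision = tp / (tp + fp), recall = tp / (tp + fn)
print(precision_score(y, model.predict(X)))  # 6 / (6 + 1) = 0.857
print(recall_score(y, model.predict(X)))     # 6 / (6 + 0) = 1.0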
Incidentally, similar examples of conditional probability can be found on the website "Lies with Statistics" [3]. Again, I calculated and visualized a case (from the field of mammography), where the false positives look more complex.
C is a positive floating-point number (1.0 by default) that defines the relative strength of the regularization; smaller values mean stronger regularization.
solver is a string ('liblinear' by default) that decides which solver is used to fit the model. Other options are 'newton-cg', 'lbfgs', 'sag' and 'saga'.
max_iter is an integer (100 by default) that defines the maximum number of iterations of the solver during model fitting.
Listing 8
model = LogisticRegression(solver='liblinear', C=1, random_state=0).fit(X, y)
print(model)
print(classification_report(y, model.predict(X)))
In Listing 8 above we can see the preset model parameters, which can of course be changed. However, I cannot directly determine the best value of a model hyperparameter for a specific problem. You can use empirical values, copy values that have worked for other problems, or search for the best value by trial and error. For optimization I mainly use the value C (the regularizer), the kernel, or the solver.
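Such a trial-and-error search can also be automated; a minimal sketch with scikit-learn's GridSearchCV (the grid values are illustrative, and cv=2 only because our toy data set is so small):
from sklearn.model_selection import GridSearchCV

# illustrative hyperparameter grid, not tuned recommendations
grid = {'C': [0.01, 0.1, 1.0, 10.0], 'solver': ['liblinear', 'lbfgs']}
search = GridSearchCV(LogisticRegression(random_state=0), grid, cv=2)
search.fit(X, y)
print(search.best_params_, search.best_score_)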
Model hyperparameters have to be defined before the training and cannot be learned by the model (e.g. learning rate, hidden layers, regularizer).
Model parameters are then learned by the model and derived from the data (e.g. word frequency, weights, bias, variance).
In other words, hyperparameters are those which we supply to the model, for example the number of hidden nodes and layers, the input features, the learning rate, and the activation function in a neural network, while parameters are those which the machine learns, such as weights and biases.
In machine learning, a model M with parameters and hyperparameters can be written as
Y ≈ M_H(Φ | D)
where Φ are the parameters, H are the hyperparameters, D is the training data, and Y is the output (class labels in the case of a classification task).
Once we have all this information, it becomes possible to decide which modelling strategy fits best with the available data and the desired output. The results are now optimal in terms of the quality of the algorithm for our number series, and they also stand up to a visual comparison with a decision tree.
There are multiple modelling strategies in the domain of predictive maintenance, for example, and we will describe two of them (the ones I have worked with the most) in terms of the question they aim to answer and the kind of data they require:
For this scenario, we need static and historical data in which every event is labelled. Moreover, several events of each type of failure must be part of the dataset. Ideally, we prefer to build such models when the degradation process is linear [9].
The decision tree procedure in Listing 9 is a common option for regression or classification on a multivariate data set. I can use the procedure, for example, to classify the solvency of customers or to build a function that predicts false reports or fake news.
In practice, however, the method presents data scientists with major challenges with regard to interpretation and overfitting (memorizing the trained examples), even though the tree itself offers transparent and legible graphics. For this I use the installed Graphviz 2.38 in VS Code and an additional line in the code that sets the path information directly in the OS path. That way I can configure adjustments for another version or platform directly in the code.
Listing 9
import unittest
import os

# make the Graphviz binaries available; the install path below is an
# example and must be adjusted to your version and platform
os.environ['PATH'] += os.pathsep + r'C:\Program Files (x86)\Graphviz2.38\bin'
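As a minimal sketch of how the tree itself could then be trained and rendered via Graphviz (assuming scikit-learn's export_graphviz and the python-graphviz package; the depth limit is illustrative):
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

# fit a small tree on the same toy data and render it as an image
tree_model = DecisionTreeClassifier(max_depth=2).fit(X, y)
dot = export_graphviz(tree_model, out_file=None, filled=True)
graphviz.Source(dot).render('decision_tree', format='png')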
Important: the orientation of the confusion matrix is unfortunately not standardized. In our example, the truth ("Real") is in the rows and the estimate ("Predict") in the columns (from actual to predicted), but depending on the software used, the dimensions can be reversed. It seems important to me to start the matrix at 0, i.e. to standardize the true negatives at the top left, see Fig. 8. And of course, for an N-class problem the confusion matrix becomes an NxN matrix, so it is not limited to binary classification.
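In scikit-learn, the orientation can be pinned down explicitly via the labels argument:
# the order of labels fixes which class comes first in rows and columns
print(confusion_matrix(y, model.predict(X), labels=[0, 1]))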
Here is a look at the integration of Jupyter [6]. Jupyter (formerly IPython Notebook) is an open-source project with which I can easily combine interactive markdown text and executable Python source code on a canvas known as a notebook. Visual Studio Code supports working with Jupyter notebooks as well as plain Python code files, and my experience with debugging and code metrics has also been good.
To work with Jupyter notebooks, an Anaconda environment in VS Code or another Python environment is required, and the jupyter package must be installed beforehand. This gives us the possibility to integrate graphics, document, and execute interactive code directly in VS Code:
Fig. 10: With the terminal, images can also be controlled interactively in code!
File: vscode_jupyter_librosa_demo3.png
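Plain .py files can also be run cell by cell in VS Code using # %% markers, a minimal sketch of this notebook-style workflow:
# %% [markdown]
# A notebook-style markdown cell in a plain .py file
# %%
import numpy as np
print(np.arange(10))  # runs in the interactive window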
Listing 10 viper2\.vscode\settings.json
{
"python.pythonPath":
"C:\\Users\\Max\\AppData\\Local\\Programs\\Python\\Python37\\python.exe",
"python.testing.pytestArgs": [
"freshonion"
],
"python.testing.unittestEnabled": false,
"python.testing.nosetestsEnabled": false,
"python.testing.pytestEnabled": false,
"python.testing.unittestArgs": [
"-v",
"-s",
"./freshonion",
"-p",
"*test.py"
],
"python.testing.promptToConfigure": false
}
End
Listing 11 \viper2\.vscode\tasks.json
{
  // See https://go.microsoft.com/fwlink/?LinkId=733558
  "version": "2.0.0",
  "tasks": [
    {
      "label": "buildpython",
      "type": "shell",
      "command":
        "C:\\Users\\Max\\AppData\\Local\\Programs\\Python\\Python37\\python.exe",
      "args": ["${file}"],
      "showOutput": "always",
      "problemMatcher": [],
      "group": {
        "kind": "build",
        "isDefault": true
      }
    }
  ]
}
End
Max Kleiner's professional environment lies in the areas of machine learning, e-learning, OOP, UML, and system architecture, where he works as a trainer, developer, consultant, and publicist. His focus is on training, IT security, databases, and frameworks that work in an event-oriented manner. Through his work as a lecturer and consultant at a university of applied sciences and on behalf of a company, microcontrollers and IoT have also been added. His book "Patterns in C#", published in 2003, is still up to date with the Clean Code initiative.
https://basta.net/speaker/max-kleiner/