
A Top Down Approach to Machine Learning

by Marty Jacobs
Contents
Introduction

Chapter 1: Foundational Learning

Chapter 2: Supervised Learning

Chapter 3: Unsupervised Learning

Chapter 4: Reinforcement Learning

Chapter 5: Intermission

Chapter 6: Machine Learning with Tensorflow


Introduction

Machine Learning (ML) has some hefty gravitational force in the Software
development world at the moment. But what exactly is it? In this post I’ll take a
top-down approach, attempting to make it crystal clear what it is and what it can
be used for in the real world. Machine Learning is a branch of Artificial
Intelligence. Fundamentally, it is Software that works like our brain, learning
from information (data), then applying it to make smart decisions. Machine
Learning algorithms can improve software (a robot, say) and its ability to solve
problems by gaining experience, somewhat like human memory.
Whether you know it or not, you’re probably already using applications that
leverage Machine Learning algorithms.
Applications might be monitoring your behaviour to give you more personalised
content. A simple example… Google uses Machine Learning in its “Search”
product to predict what you might want to search for next. Remember, too, that
the suggestions it responds with are sometimes entirely inaccurate or unhelpful.
This is the nature of a probabilistic approach: sometimes you hit, and sometimes
you miss.
More and more people are becoming interested in Machine Learning.
Companies are adopting it to gain a better understanding of their clients, which
results in better customer service. It is being used in gambling and stock market
applications to predict rises and falls in stock prices.
For Software developers in particular, demand for skills in the AI and Machine
Learning realm has become more prominent, and it doesn’t look like this trend is
slowing down anytime soon. Here is a snapshot of the world’s growing interest
in ML over the last 5 years…

Chapter 1: Foundational Learning


The term Agent is commonly used in AI to refer to a particular type of computer
program. What makes it different from other computer programs? It is a program
that gathers information on a particular environment, then takes action(s)
autonomously using the gathered information. This could be a web crawler, a stock-
trading platform, or any other program that can make informed decisions.
How do we define an Agent?
State space
The set of all possible states that the agent can be in. Example: the light switch
can only ever be “on” or “off”.
Action space
The set of all possible actions that the agent can perform. Example: the light
switch can only ever be “flicked up” or “flicked down”.
Percept space
The set of all possible things the agent can perceive in the world. Example:
Fog-of-War in a gaming context, where you can only see what is visible on the
map.
World dynamics
The change from one state to another, given a particular action. Example:
performing the light switch action “flicked up” while in the state “off” results in
a change of state to “on”.
Percept function
A change in state results in a new perception of the world. Example: in a gaming
context, moving into the enemy base will show you the enemy’s resources.
Utility function
The Utility function is used to assign a value to a state. This can be used to
ensure your agent performs the action that lands it in the best possible state.
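To make this concrete, here is a minimal sketch of the light-switch agent in Python. It is purely my own illustration of the definitions above; the names (world_dynamics, utility_function, and so on) are not from any standard library.

# A minimal, hypothetical sketch of the light-switch agent described above.
STATE_SPACE = {"on", "off"}                # every state the agent can be in
ACTION_SPACE = {"flick_up", "flick_down"}  # every action it can perform

def world_dynamics(state, action):
    """Map (state, action) -> next state."""
    return "on" if action == "flick_up" else "off"

def percept_function(state):
    """Map a state to what the agent perceives."""
    return "light visible" if state == "on" else "light not visible"

def utility_function(state):
    """Assign a value to each state; here we simply prefer the light on."""
    return 1.0 if state == "on" else 0.0

def choose_action(state):
    """Pick the action whose resulting state has the highest utility."""
    return max(ACTION_SPACE, key=lambda a: utility_function(world_dynamics(state, a)))

print(choose_action("off"))  # -> "flick_up"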
This design can be used as a ‘structure’ to work from when building an AI agent.
But you might ask, how does it even relate to Machine Learning? Machine Learning
algorithms can enhance the agent, helping it learn better and perform smarter actions.
This is achieved by providing the algorithm with data to learn from so it can make
some smart estimates/predictions. You might be thinking… but wait, can’t we
just feed it all of the data on the internet to teach it everything? It doesn’t
exactly work like that. In order for an ML algorithm to learn properly, we need
to provide it with the right combination of data in the right amount. Too much,
and we might run into an overfitting problem. Too little, and we might have a
shitty model that doesn’t provide decent predictions.
I also invite you to take a look into the Tree data structure if you aren’t familiar
with it already. These are used throughout many modern day applications.
Ok, let’s dive head first into the three major types of algorithms in the field of
Machine Learning:
Supervised learning, Unsupervised learning and Reinforcement learning.
Chapter 2: Supervised Learning


Supervised learning is the name given to ML algorithms that learn from examples.
This means that we must provide the algorithm with training data prior to running
the algorithm. An example of this was hilariously shown in the TV show
“Silicon Valley”, where a mobile app, ‘Not Hotdog’, made media headlines. In the
television show, Jian Yang had to provide training data so his ML algorithm could
learn what hot dogs look like. The goal was to classify whether an image of a
hotdog was in fact a hotdog, or it was… not a hotdog. How did he do this? In the
show, he had to manually scrape the internet for many images of hotdogs (a.k.a.
the training data).

This technique is called Boolean Classification, as the result itself is a binary
value. The algorithm can predict that the picture contains a hotdog because it has
analysed a large quantity of hotdog pictures and learnt to identify what a hotdog
looks like. Statistical Classification is used in Supervised learning where the
training data is a set of correctly labelled observations. For example, in the
hotdog scenario, the training set must only contain pictures that really are
hotdogs.
One of the simplest Supervised learning algorithms to implement is the Decision
Tree. In a Decision Tree, the leaf nodes are the results, the non-leaf nodes are the
attributes, and the edges are the attribute values. The Decision Tree analyses the
attributes and returns a result that has been filtered down through the tree.
Note: As you can see the result in the Decision Tree above can be either “Yes”
or “No”.
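As a rough sketch, here is how a tiny decision tree could be fitted in Python. Note that scikit-learn is not used elsewhere in this book, so treat the library choice and the toy “should I play outside?” data as my own illustrative assumptions.

# A hypothetical toy example: learn a decision tree from labelled attributes.
from sklearn.tree import DecisionTreeClassifier

# Attributes: [is_sunny, is_windy]; result: 1 = "Yes", 0 = "No"
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 0, 0, 0]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[1, 0]]))  # -> [1], i.e. "Yes"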
Another technique classified as ‘Supervised learning’ is regression. The
simplest form of regression is linear regression, which intuitively is just
drawing a straight line through some data to infer a trend. This could be used in
a gambling scenario, analysing a history of chosen numbers. For example,
performing regression on this history might show that numbers ‘5’ and ‘7’ are
chosen more often than numbers ‘3’ and ‘2’. As we can see below, this is an
example of performing regression analysis on some data points…
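As a minimal sketch (with made-up data points), a least-squares straight line can be fitted with NumPy like this:

# Fit a straight line y = m*x + c through some illustrative data points.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # roughly y = 2x

m, c = np.polyfit(x, y, deg=1)  # least-squares fit of a degree-1 polynomial
print("slope={:.2f}, intercept={:.2f}".format(m, c))
print("prediction at x=6:", m * 6 + c)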


Chapter 3: Unsupervised Learning


Unsupervised learning is where the results of the training data are not
known. Simply put, we can give the ML algorithm some training data and it can
respond back with what the algorithm has found. Sounds exciting! We might
receive completely new insights into the data that we would never expect to
observe. How is it done? Unsupervised learning commonly uses clustering
techniques that aim to find patterns in the data.
One common clustering technique is called “k-means clustering”, which aims to
solve clustering problems. One common clustering problem is Spam filtering.
Spam emails can sometimes be tricky to identify, and might get through to
your email inbox (instead of the junk folder). The K-means clustering approach
aims to partition N observations into K clusters. Essentially, it just moves the
spam into the spam cluster, and the real emails into the inbox cluster. Here, N
depends on the number of emails, and K on the number of clusters. Still
interested? Here is a study that has shown K-means is a better approach to take
than Support Vector Machines (SVM) in a Spam filtering context.
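Here is a minimal sketch of the idea in Python. The two-number “features” per email (say, a link count and a suspicious-word count) and the use of scikit-learn are my own illustrative assumptions, not something taken from the study above.

# A hypothetical spam-clustering sketch: partition N=4 emails into K=2 clusters.
import numpy as np
from sklearn.cluster import KMeans

emails = np.array([
    [0, 1],   # ordinary-looking email
    [1, 0],   # ordinary-looking email
    [9, 8],   # spammy-looking email
    [8, 9],   # spammy-looking email
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emails)
print(kmeans.labels_)  # e.g. [0 0 1 1]: an "inbox" cluster and a "spam" cluster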


Hybrid Supervised/Unsupervised learning
Some ML algorithms can be used for both Supervised and Unsupervised
learning. After all, the only dramatic difference between the two is just knowing
the end result. The commonly used methods that can be relevant for both cases
are Bayesian Networks, Neural Networks and …Decision Trees! Yes that’s right,
a Supervised learning algorithm can also be used for Unsupervised learning.
This is straight up magic. What we’re actually trying to do is run a Supervised
algorithm and find an Unsupervised result (a completely new / unexpected
result). To do this, we must provide the algorithm with a second group of
observations; this way it can recognise the difference between the two
observation groups. As a result, the Decision tree can find new clusters by
having additional observation groups.
Bayesian Networks utilise graphs, probability theory and statistics to model real-
world situations and infer data insights. These are very compact and can be used
for modelling a solution quite quickly. How do we create one? Well we need…

1. An acyclic graph; and
2. Conditional Probability Tables (CPTs)

Shown below is an example graph, and the CPTs given according to the node
placements on the graph.

Here we can make some simple inferences from the CPTs, such as: when it is
raining and the sprinkler is turned on, there is a 99% chance that the grass is
wet. Sure, it’s a silly example, but the point is that we can apply this to more
valuable use cases that yield greater results. These can get more complicated
when extra parent nodes are added to the equation, and in some cases the
probability values have to be estimated.
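As a rough sketch of how such an inference could be computed by hand in Python: only the 0.99 figure comes from the example above; every other CPT entry below is a made-up placeholder standing in for the tables in the figure.

# Hypothetical query on the rain/sprinkler/grass-wet network.
p_rain = 0.2
p_sprinkler_given_rain = {True: 0.01, False: 0.4}
p_wet_given = {
    (True, True): 0.99,   # raining and sprinkler on -> grass almost surely wet
    (True, False): 0.8,
    (False, True): 0.9,
    (False, False): 0.0,
}

# P(grass wet) = sum over rain and sprinkler of the joint probabilities
p_wet = 0.0
for rain in (True, False):
    p_r = p_rain if rain else 1 - p_rain
    for sprinkler in (True, False):
        p_s = p_sprinkler_given_rain[rain] if sprinkler else 1 - p_sprinkler_given_rain[rain]
        p_wet += p_r * p_s * p_wet_given[(rain, sprinkler)]

print("P(grass wet) = {:.3f}".format(p_wet))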
A Neural Network is a Machine Learning algorithm capable of simulating the
human brain. A Neural network is made up of interconnected artificial neurons.
A neuron is basically just a function applied to a linear combination of inputs.
Each input is multiplied by a weight, which is essentially a measure of how
strongly that input determines the output.
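A minimal sketch of a single neuron in NumPy (my own illustration; all values are made up):

# Y is the weighted sum of the inputs; A is the activation function applied to Y.
import numpy as np

x = np.array([0.5, 0.2, 0.1])   # inputs X_1..X_3
w = np.array([0.4, 0.3, 0.9])   # one weight per input
b = 0.1                          # bias term

Y = np.dot(w, x) + b             # linear combination of inputs
A = 1 / (1 + np.exp(-Y))         # sigmoid activation
print(Y, A)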
Y is the linear combination of the inputs, and A is the output of the activation
function. Hmm… can we input just any data into the neuron? Not really. Neural
Networks only work with numerical data, which means you cannot initialize a
variable like, e.g., X_1 = “Apple”. However, it is possible to get around this if we
were trying to make predictions on natural language. This gets complicated fast,
but the idea is that we encode the string so it can be fed into the Neural network.
Here is an example of the “Bag-of-words” model used in Natural Language
Processing to encode an array of words as numbers.
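A rough bag-of-words sketch (my own illustration with a made-up vocabulary): each sentence becomes a vector of word counts, which a network can consume as numbers.

# Map each sentence to a vector of word counts over a shared vocabulary.
sentences = ["the apple is red", "the apple is an apple"]
vocabulary = sorted({word for s in sentences for word in s.split()})

def bag_of_words(sentence):
    words = sentence.split()
    return [words.count(term) for term in vocabulary]

print(vocabulary)                  # ['an', 'apple', 'is', 'red', 'the']
print(bag_of_words(sentences[1]))  # [1, 2, 1, 0, 1]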

Pictured above is a full Neural network representation. Don’t freak out about
what everything means. Let us deduce it. The left hand side is the input range
X_1…X_n, and the “transfer function” is simply the function that combines all
the inputs. It hands the result, in our case Y, to the activation function, which
computes A.
Another aspect to take into account when building a Neural network is the
technique used for learning the weights. Calibrating the weights of the Neural
network is the “training process”. This is done by alternating between two
techniques: “Forward propagation” and “Back propagation”. Forward propagation
is how we approached the above equation: applying the weights to the input data
before computing the activation function. We receive the output and can compare
it to a real value to get a margin of error (checking whether it is what we wanted).
Backpropagation is the process of going backwards through the network to
reduce that margin of error.
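Continuing the single-neuron sketch from earlier (again, purely illustrative values), one training step might look like this: a forward pass, an error measurement, then a backward pass that nudges the weights to reduce the error.

# One hypothetical training step for the single neuron above.
import numpy as np

x = np.array([0.5, 0.2, 0.1])
w = np.array([0.4, 0.3, 0.9])
b, target, lr = 0.1, 1.0, 0.5

# Forward propagation
Y = np.dot(w, x) + b
A = 1 / (1 + np.exp(-Y))
error = A - target                 # margin of error against the real value

# Back propagation: gradient of the squared error with respect to the weights
grad_Y = error * A * (1 - A)       # chain rule through the sigmoid
w -= lr * grad_Y * x               # nudge the weights to reduce the error
b -= lr * grad_Y
print(error, w, b)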
Here we can see a more complicated Neural network with many hidden layers.
What are these extra layers even for? Well… our last example only had one
layer, which was really there to compute a specific function (task specific).
Adding extra layers to the Neural network allows us to learn much more than a
specific result; it allows us to classify things from raw data. This process is
called Feature learning. It can be used to analyse unstructured data such as
images, video and sensor data. You have probably used applications that
implement this technique before. Feature learning can be used to identify
people, places, things, you name it. There have been great technological
advancements recently in Deep learning and Feature learning, especially after
the rise of Web 2.0.
Chapter 4: Reinforcement Learning


Reinforcement learning is the learning-by-doing approach. To solve a
Reinforcement learning problem, we must have the agent perform actions in any
given situation to maximise its reward. There are two main strategies used in
Reinforcement learning, which are…

1. Model-based; and

2. Model-free


Model-based is the strategy where the agent learns the “model” in order to produce
the best action at any given time. This is done by finding the probability of landing
in the desired states, and the rewards for doing so. How is it done? Keep a record
of all the states the agent has been in when performing an action, and update
a table of probabilities for landing in each state. Ah yeah… also keep a record
of the rewards too. That’s how we determine the best action to take (the
one with the highest expected reward).
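A minimal sketch of that record-keeping (my own illustration, reusing the light-switch example): count observed transitions to estimate the probability of landing in each state, and track the rewards seen along the way.

# Estimate P(next_state | state, action) and total rewards from experience.
from collections import defaultdict

transition_counts = defaultdict(int)   # (state, action, next_state) -> count
action_counts = defaultdict(int)       # (state, action) -> count
reward_sums = defaultdict(float)       # (state, action) -> total reward seen

def record(state, action, next_state, reward):
    transition_counts[(state, action, next_state)] += 1
    action_counts[(state, action)] += 1
    reward_sums[(state, action)] += reward

def transition_probability(state, action, next_state):
    if action_counts[(state, action)] == 0:
        return 0.0
    return transition_counts[(state, action, next_state)] / action_counts[(state, action)]

record("off", "flick_up", "on", reward=1.0)
print(transition_probability("off", "flick_up", "on"))  # -> 1.0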
Model-free is the strategy where the agent learns how to make great actions
without knowing anything about the probability of landing in some state. How is
it done? Q-Learning is one way. The agent learns an action-value function, and
uses it to perform the best action at every state. Shit, sounds pretty good! The
action-value function simply assigns every action the agent can take with a
specific value, then the agent chooses the action with the highest value.
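Here is a minimal, tabular Q-learning sketch (illustrative only, again using the light-switch states): Q maps each (state, action) pair to a value, and the update rule nudges that value towards the observed reward plus the discounted value of the best next action.

# Tabular Q-learning: learn an action-value function from experience.
from collections import defaultdict

Q = defaultdict(float)     # (state, action) -> estimated value
alpha, gamma = 0.1, 0.9    # learning rate and discount factor
ACTIONS = ["flick_up", "flick_down"]

def q_update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def best_action(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

q_update("off", "flick_up", reward=1.0, next_state="on")
print(best_action("off"))  # -> "flick_up"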
Chapter 5: Intermission

Machine learning has already shown some insanely good results thus far. There
is also an increasingly large number of people flocking to the field of AI, which
should lead to better designs for agents and ML algorithms. The rise of Deep
learning has brought applications that almost have a mind of their own. If you’re
looking to use these algorithms in your application, but think it will be too
complicated… fear not. Large companies (Google, Amazon etc.) provide cloud
services with already-built ML algorithms that can be used quite easily.
There are also ML libraries out there for integrating ML algorithms into your
existing application; Tensorflow is just one. Machine Learning is incredibly
interesting and there is still so much more to come. The best thing about ML -
there are always new and exciting concepts to learn!
You have made it halfway! Get yourself a drink, and be ready for the
demonstration :)
Chapter 6: Machine Learning with Tensorflow


Let’s take a different approach, a more practical approach. This will be for
those who are keen to improve their Machine Learning skills in the real-world.
So what will we build? Hmmm.. let’s build a Convolutional Neural Network
(CNN). The Neural Network will be multi-layered, and we will use Python and
Google’s open-source library, “Tensorflow”.
We’ll be using the MNIST dataset, as we can train our model without needing
a GPU. What is MNIST? It is an image database filled with hand-written digits.
Ok… let’s build a simple two-layer convolutional neural network, with
maxpooling, dropout, and a couple of fully connected layers. We will also set up
a log directory where we can capture log data from both the training and validation
sets. This will help us monitor the performance graphically (using TensorBoard),
rather than with plain old print statements.
Preliminaries

Python version 3.6 - Python can be found here
TensorFlow version 1.1.0 - you can install Tensorflow here
Import the following libraries:

import os
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
Data Exploration

TensorFlow makes it real simple to obtain the MNIST dataset - just import
input_data and call the method read_data_sets.

# import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Let’s explore the ‘mnist’ object under the microscope and see what is inside it…
# We see that it's a Datasets type, which makes sense
type(mnist)
# Let's check the last 5 methods that can be called on this object
dir(mnist)[-5:]
dir(mnist.train)[-5:]

Images are typically stored as a two-dimensional array of pixels per channel.
The MNIST dataset has only one channel, hence why there is no colour. Below
we see that there are 55,000 images in the training set, but each image is
represented as a vector of length 784. This length represents the flattened version
of a 28x28 pixel image.

mnist.train.images.shape  # Out: (55000, 784)
To view an image, we must first convert it back into matrix form. We do this
using numpy’s reshape method. Reshape the image into its original 28x28 form,
then display the image in black and white using the cmap=’gray’ option. Notice
below the numbers and tick marks on the x and y axes, showing our notion of
the 28x28 pixel size of each image.
# Let's see an example of an image in the training set
plt.imshow(mnist.train.images[0].reshape((28, 28)), cmap='gray')


Ok, still with me? Let’s now write a function to make it easier to sample a few
images at a time, displaying them in a 3x3 grid. This makes sampling a faster
process.

def show_grid_3x3(images):
    """Display a 3x3 grid of 9 randomly sampled numpy array images.

    images: A batch of image data. Numpy array with shape (batch_size, 784)
    """
    plt.rcParams['figure.figsize'] = 6, 6
    fig, axes = plt.subplots(nrows=3, ncols=3, sharex=True, sharey=True)
    rand_idx = np.random.choice(images.shape[0], 9, replace=False)  # get 9 random indices
    images = images[rand_idx]
    for i in range(3):
        for j in range(3):
            axes[i, j].imshow(images[i + 3*j].reshape((28, 28)), cmap='gray')
    plt.tight_layout()


Cool! Now let’s call the show_grid_3x3 function on the training set.
show_grid_3x3(mnist.train.images)
TensorBoard Setup

We’ll use TensorBoard to visualize several aspects of our neural network, such
as the distribution of the weights and biases over time, the classification
accuracy of the training and validation sets, and the computational graph. Also,
we need to create a log file directory for when the neural network starts running.
Now we are going to write a function to create a directory path with a time-
stamp. We wouldn’t want TensorFlow overwriting our previous logs every time
we run the code.

# For logging
from datetime import datetime
def logdir_timestamp(root_logdir="tf_logs"):
    """Return a string with a timestamp to use as the log directory for TensorBoard."""
    now = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
    return os.path.join(root_logdir, "run-{}/".format(now))

logdir = logdir_timestamp()

We may now run TensorBoard and instruct it to monitor the directory named
tf_logs:
mkdir tf_logs
tensorboard --logdir=tf_logs
Navigate to localhost:6006 in your web browser to view the TensorBoard
console.
Feel free to have a look around, but there won’t be anything there until we use a
FileWriter to write some data to disk while the neural network is running.
Graph Construction

In Tensorflow, we must first construct a graph. At this stage, we lay down the
blueprint for our neural network, but no actual operations are being executed.
Once the graph is complete, we will create a TensorFlow session where we can
execute the operations defined in the graph.
Let’s have a look at what the graph should look like when we are done. We’ll
step through one layer at a time, starting from the bottom, where X is reshaped
and fed into the convolutional layer.

Create Data input tensors


The first step is to create placeholders for the data to feed into the graph. We’ll
create a variable X to represent a batch of images, and the variable y_ to
represent the corresponding labels for each image. Notice that we expect the
input as a flattened vector, because that is the form in which we obtained the
MNIST data. But since we are performing convolutions in this neural network,
we would like to retain the two-dimensional spatial structure in the image data,
so we reshape X and assign it to the variable X_image.
Shown below are the two methods returning placeholders for the graph:

def neural_net_image_input(image_shape):
    """Constructs a tensor for a batch of image input.

    image_shape: Shape of the images as a list-like object
    return: Tensor for image input
    """
    shape = (None, *image_shape)
    return tf.placeholder(tf.float32, shape=shape, name="X")

def neural_net_label_input(n_classes):
    """Constructs a tensor for a batch of label input.

    n_classes: Number of classes
    return: Tensor for label input
    """
    shape = (None, n_classes)
    return tf.placeholder(tf.float32, shape=shape, name="y")

Below we pass the length 784 into the Neural Network (NN); remember, this is
the length of the flattened image vector. The label placeholder, denoted y_, has a
shape of 10, as there are ten different digits to be classified in the dataset. When
creating a placeholder, we use the value None to indicate an arbitrarily sized
batch of images or labels.
X = neural_net_image_input([784])
y_ = neural_net_label_input(10)
X_image = tf.reshape(X, [-1, 28, 28, 1])  # reshaped to [batch_size, rows, cols, channels]

Create the first convolutional layer

We can now write a function to create a convolutional layer, since we’ll be
repeating this step to create another layer.
We initialize the weights by sampling from a truncated normal distribution with
a standard deviation of 0.1. A truncated normal distribution is similar to a normal
distribution, but if a weight is more than two standard deviations away from the
mean, it is dropped and re-picked. We hard-code the filter (also called a kernel) to
have a size of 5x5. See this for a visualization of how convolutional filters work.
In the first layer, the input images have a single channel, so the size_in variable
is set to 1. size_out is the number of convolutional filters we want to create; in
this case 32. The size of the filter and the number of filters are hyper-parameters
we can experiment with, in an effort to improve performance - the current values
are by no means optimal!
The image placeholder and the newly initialized weights are passed into the
tf.nn.conv2d TensorFlow library function. To learn more about strides and
padding, please refer to the TensorFlow documentation.
tf.nn.relu is another TensorFlow library function which is applied to the result of
the conv2d operation. ReLU is an abbreviation for rectified linear unit, which
returns the value of its argument or 0, whichever is greater.

def convolution_layer(inp, size_in, size_out, name="convolution"):
    """Creates a convolutional layer with filters of size 5x5, and size_out
    number of filters.

    Applies stride of [1, 1, 1, 1] with SAME padding, and applies ReLU activation.
    No downsampling within this layer - returns tensor with activation function applied only.
    """
    with tf.name_scope(name):
        # Hard-code convolutional filter of size 5x5
        W = tf.Variable(tf.truncated_normal([5, 5, size_in, size_out], stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="b")
        conv = tf.nn.conv2d(inp, W, strides=[1, 1, 1, 1], padding='SAME')
        act = tf.nn.relu(conv + b)
        tf.summary.histogram("weights", W)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        return act



Turning to the TensorFlow graph, let’s look at what is actually happening inside
the first convolutional layer. The graph appears to show a fairly straightforward
representation of the code…



Assign the output of the convolution_layer function to a variable named act1.
This will be used as the input for the next layer.
act1 = convolution_layer(X_image, 1, 32, "convolution1")

Create the first downsampling layer


The output of the convolution layer is downsampled using maxpooling with a
kernel of size 2x2. This means that the maximum value is taken for every 2x2
region of the input. This reduces the spatial size of the input, effectively reducing
the number of parameters in the network and thereby reducing computational
complexity and the propensity to overfit. We’ll return to the topic of overfitting
when we discuss the TensorBoard graphs showing the training and validation set
accuracies.

def downsample_layer(act, name="maxpooling"):
    """Creates downsampling layer by applying maxpooling with hard-coded kernel
    size [1, 2, 2, 1] and strides [1, 2, 2, 1] with SAME padding.
    """
    with tf.name_scope(name):
        return tf.nn.max_pool(act, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

Notice below how the spatial size is reduced after the maxpool
operation - from 28x28 to 14x14.


Store the output of the downsampling layer in the variable h_pool1.
h_pool1 = downsample_layer(act1, "downsample1")

Create the second convolutional layer

The structure of the second convolutional layer is identical to the first one. It
might be hard to see below, but notice the size of the tensors coming in, and the
tensors going out - 14x14x32 to 14x14x64.

This time, set the input size to 32, and create 64 convolutional filters.
act2 = convolution_layer(h_pool1, 32, 64, "convolution2")

Create the second downsampling layer

Once again, notice the shape of the outgoing tensor. We would like to flatten this
tensor into a vector, so that we can connect every single neuron together in the
dense layer, a.k.a. a fully connected layer. This is the reason for the 7*7*64 value
for the reshape operation - the input is a 7x7x64 tensor which will then be
converted into a vector of length 7*7*64=3136. The same value is then passed
into the dense_layer method to create tensors of weights and biases sized
appropriately.
h_pool2 = downsample_layer(act2, "downsample2")

Create the first dense layer

The dense layer performs a simple matrix multiplication followed by adding the
biases. This time, we do not apply an activation function within the layer. Why?
So we can apply a different activation function (softmax) to the output of the
final layer. After the first dense layer, the ReLU activation function is applied
separately outside the dense_layer function.

def dense_layer(inp, size_in, size_out, name="dense"):
    """Creates fully connected layer with size [size_in, size_out].
    Initializes weights with standard deviation 0.1.
    Returns tensor without applying any activation function.
    """
    with tf.name_scope(name):
        W = tf.Variable(tf.truncated_normal([size_in, size_out], stddev=0.1), name="W")
        b = tf.Variable(tf.constant(0.1, shape=[size_out]), name="b")
        act = tf.matmul(inp, W) + b
        tf.summary.histogram("weights", W)
        tf.summary.histogram("biases", b)
        tf.summary.histogram("activations", act)
        return act

Notice the size of the output - 1024. This will be the number of neurons in the
second fully connected layer. Before we get to the next layer, however, we apply
the dropout technique.
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(dense_layer(h_pool2_flat, 7*7*64, 1024, "dense1"))
Dropout

Dropout is a regularization technique which controls overfitting. During the
training phase, a fixed proportion of randomly selected neurons are disabled. In
this example, we use a value of 0.5 to be injected into a placeholder when the
network is running. So, in every iteration during training, half the neurons per
layer are disabled. Note that this is only done during training and not when
generating predictions on a test set.

def dropout(inp, keep_prob, name="dropout"):
    """Apply dropout with keep probability defined by placeholder tensor keep_prob."""
    with tf.name_scope(name):
        return tf.nn.dropout(inp, keep_prob)

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = dropout(h_fc1, keep_prob)

Create the second dense layer



Set the output size for the final fully connected layer to equal the number of
classes, which is 10 for the MNIST dataset.
y_conv = dense_layer(h_fc1_drop, 1024, 10, "dense2")
We want each of the 10 neurons to output a probability. We can apply the
softmax activation function to do this. In order to evaluate the model, we will
also need a cost function. For classification problems, a frequent choice is cross-
entropy. TensorFlow has a function that will perform both these operations in a
way that is numerically stable.
As in the functions we created for each of the layers, we use name scopes so
that TensorFlow groups all the ops in the with block inside the computational
graph. This helps keep the graph looking nice and clean. You can try creating a
graph without the name scopes, just to get a visual on how it looks.
with tf.name_scope("xentropy"):
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
Let’s use the Adam optimizer to minimize the loss function. You might want to
consider picking a learning rate with a smaller value, such as 1e-4. This is
another important hyperparameter to tune - a value that is too small will require
unnecessarily long training times, but a value that is too large may not achieve
an optimal local minimum for the cross-entropy loss function.
lr = 1e-2  # Learning rate
with tf.name_scope("train"):
    optimizer = tf.train.AdamOptimizer(learning_rate=lr)
    training_op = optimizer.minimize(cross_entropy)

We’ll execute the training_op variable in the TensorFlow session. We’ll also
create an operation to compute the accuracy of our model.
with tf.name_scope("accuracy"):
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, dtype=tf.float32))
    tf.summary.scalar('accuracy', accuracy)

Create some file writers to save log data for TensorBoard to use for the
visualizations.
write_op = tf.summary.merge_all()
writer_train = tf.summary.FileWriter(logdir + 'train', tf.get_default_graph())
writer_val = tf.summary.FileWriter(logdir + 'val', tf.get_default_graph())
Graph Execution

With the graph construction complete, we can now begin the execution stage.
Here we create a TensorFlow session, in which we repeatedly run training_op.
Even though we created variables earlier, they have to be initialized before we
can actually use them. Rather than individually initializing each variable, you
can use tf.global_variables_initializer(). Inside the for loop, a randomly sampled
batch of 100 images is obtained from the training and validation sets. On every
fifth iteration, TensorFlow writes information to disk via the write_op operation
we defined earlier. Notice that we feed in the placeholders with the feed_dict
argument. Once training is complete, the model is evaluated by running it on the
test set. The result is then printed out to the console.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1001):
        batch_X, batch_y = mnist.train.next_batch(100)
        val_batch_X, val_batch_y = mnist.validation.next_batch(100)
        if i % 5 == 0:
            summary_str = sess.run(write_op, feed_dict={X: batch_X, y_: batch_y, keep_prob: 1.0})
            writer_train.add_summary(summary_str, i)
            writer_train.flush()
            summary_str = sess.run(write_op, feed_dict={X: val_batch_X, y_: val_batch_y, keep_prob: 1.0})
            writer_val.add_summary(summary_str, i)
            writer_val.flush()
        training_op.run(feed_dict={X: batch_X, y_: batch_y, keep_prob: 0.5})

    test_accuracy = accuracy.eval(feed_dict={X: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
    print('Test accuracy {}'.format(test_accuracy))
TensorBoard Visualization

While the graph is executing, you can observe its progress through the
TensorBoard interface. You should see some visualizations that look something
like the following:


This is perhaps the most important graph. It shows the classification accuracy of
the training set (green) and validation set (yellow). In general, we want the
training and validation accuracies to track each other fairly closely. The gap
between the training and validation accuracy shows how much your model is
overfitting - if the training accuracy is higher than the validation accuracy, that
means your model is overfitting. On the other hand, it is possible that the model
is underfitting if the accuracies are too close - this would mean that the model is
too simple to capture the complexity of the data.
For simplicity, the accuracy here is plotted against the number of iterations, but
normally we would place the number of epochs on the x-axis. Check this out for
more info.

Other useful visualizations to look at are the distributions and histograms of the
parameters and the activations for each layer of the network. The distribution
and histogram plots essentially give you two different ways of visualizing the
same thing - the distribution of parameters evolving over time. For example, in
the top right graph above (the dense1 layer biases), you can see the variance
increasing over time, whereas the mean is decreasing, indicated by the
distribution shifting slightly to the left.
You can use these plots to diagnose problems such as an incorrect initialization
of parameters in your model. Watch out for distributions getting stuck at 0 or at
the extreme ends of the range of the activation function (in the case of bounded
activations).


Want to learn more about TensorBoard? The TensorFlow official source offers
tutorials for sharpening your skills to build predictive models. It is found here:
https://www.tensorflow.org/tutorials/
Logging Off

Machine learning has already shown some great results thus far. There is also an
increasingly large number of people flocking to the field of AI, which should
lead to better designs for agents and ML algorithms. The rise of Deep learning has
brought applications that almost have a mind of their own. If you’re looking
to use these algorithms in your application, but think it will be too
complicated… fear not. Large companies (Google, Amazon etc.) provide cloud
services with already-built ML algorithms that can be used quite easily.
There are also ML libraries out there for integrating ML algorithms into your
existing application; Tensorflow is just one. Machine Learning is incredibly
interesting and there is still so much more to come. The best thing about ML -
there are always new and exciting concepts to learn!
Thanks for reading! Please leave a review if you liked the book :)
