
INTRODUCTION

This course introduces you to deep learning: the state-of-the-art approach to building artificial
intelligence algorithms. We cover the basic components of deep learning, what it means, and how it
works, and we develop the code necessary to build various algorithms such as deep convolutional
networks, variational autoencoders, generative adversarial networks, and recurrent neural networks. A
major focus of this course will be not only understanding how to build the necessary components of
these algorithms, but also how to apply them to explore creative applications. We'll see how to train a
computer to recognize objects in an image and use this knowledge to drive new and interesting
behaviors, from understanding the similarities and differences in large datasets and using them to self-
organize, to understanding how to infinitely generate entirely new content or match the aesthetics or
contents of another image. Deep learning offers enormous potential for creative applications and in
this course we interrogate what's possible. Through practical applications and guided homework
assignments, you'll be expected to create datasets, develop and train neural networks, explore your
own media collections using existing state-of-the-art deep nets, synthesize new content from
generative algorithms, and understand deep learning's potential for creating entirely new aesthetics
and new ways of interacting with large amounts of data.

Promo
Deep learning has emerged at the forefront of nearly every major computational breakthrough in the
last 4 years. It is no wonder that it is already in many of the products we use today, from Netflix's or
Amazon's personalized recommendations; to the filters that block our spam; to the ways that we interact
with personal assistants like Apple's Siri or Microsoft's Cortana; even to the very ways our personal
health is monitored. Deep learning algorithms are certainly capable of some amazing things. But it's
not just scientific applications that are benefiting from this research.

Artists too are starting to explore how Deep Learning can be used in their own practice. Photographers
are finding new ways of working with visual media. Generative artists are writing algorithms to create
entirely new aesthetics. Filmmakers are exploring virtual worlds ripe with potential for procedural
content.

In this course, we're going straight to the state of the art. And we're going to learn it all. We'll see how
to make an algorithm paint an image, or hallucinate objects in a photograph. We'll see how to train a
computer to recognize objects in an image and use this knowledge to drive new and interesting
behaviors, from understanding the similarities and differences in large datasets and using them to self-
organize, to understanding how to infinitely generate entirely new content or match the aesthetics or
contents of other images. We'll even see how to teach a computer to read and synthesize new
phrases.

But we won't just be using other people's code to do all of this. We're going to develop everything
ourselves using Tensorflow, and I'm going to show you how to do it. This course isn't just for artists,
nor is it just for programmers. It's for people who want to learn more about how to apply deep learning
with a hands-on approach, straight into the Python console, and learn what it all means through
creative thinking and interaction.

I'm Parag Mital, artist, researcher, and Director of Machine Intelligence at Kadenze. For the last 10
years, I've been exploring creative uses of computational models, making use of machine and deep
learning, film datasets, eye-tracking, EEG, and fMRI recordings in applications such as generative film
experiences, augmented reality hallucinations, and expressive control of large audiovisual corpora.

But this course isn't just about me. It's about bringing all of you together. It's about bringing together
different backgrounds and different practices, putting all of you in the same virtual room, giving you
access to state-of-the-art methods in deep learning, some really amazing stuff, and then letting you go
wild on the Kadenze platform. We've been working very hard to build a platform that rivals anything
else out there for learning this material.

You'll be able to share your content, upload videos, comment, and exchange code and ideas, all led
by the course I've developed for us. But before we get there, we're going to have to cover a lot of
groundwork: the basics that we'll use to develop state-of-the-art algorithms in deep learning. And that's
really so we can better interrogate what's possible, ask the bigger questions, and be able to explore
just where all this is heading in more depth. With all of that in mind, let's get started!

Join me as we learn all about Creative Applications of Deep Learning with Tensorflow.

Session Overview
We're first going to talk about Deep Learning, what it is, and how it relates to other branches of learning.
We'll then talk about the major components of Deep Learning, the importance of datasets, and the
nature of representation, which is at the heart of deep learning.

If you've never used Python before, we'll be jumping straight into using libraries like numpy, matplotlib,
and scipy. Before starting this session, please check the resources section for a notebook introducing
some fundamentals of Python programming. When you feel comfortable with loading images from a
directory, resizing, cropping, changing an image's datatype from unsigned int (uint8) to float32, and
knowing what the range of each datatype should be, then come back here and pick up where you left
off. We'll then get our hands dirty with Tensorflow, Google's library for machine intelligence. We'll learn
the basic components of creating a computational graph with Tensorflow, including how to convolve
an image to detect interesting features at different scales. This groundwork will finally lead us towards
automatically learning our handcrafted features/algorithms.
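
As a quick refresher on that datatype conversion, here is a minimal sketch. It assumes an image loaded
as uint8, with values in the range 0-255, which we rescale to float32 in the range 0.0-1.0; the filename
is just a placeholder:

import numpy as np
import matplotlib.pyplot as plt

# 'some_image.jpg' is a placeholder; use any image file you have.
img = plt.imread('some_image.jpg')

# JPEGs load as uint8 with values 0-255; PNGs may already be float32.
if img.dtype == np.uint8:
    # Convert to float32 and rescale to the range 0.0-1.0.
    img = img.astype(np.float32) / 255.0

print(img.dtype, img.min(), img.max())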

LEARNING FROM DATA


Deep Learning vs. Machine Learning
So what is this term I keep using, Deep Learning, and how is it different from Machine Learning? Well,
Deep Learning is a type of Machine Learning algorithm that uses Neural Networks to learn. The
learning is "Deep" because the networks are composed of many layers. In this course we're really
going to focus on supervised and unsupervised Deep Learning. But there are many other incredibly
valuable branches of Machine Learning, such as Reinforcement Learning, Dictionary Learning,
Probabilistic Graphical Models and Bayesian Methods (Bishop), or Genetic and Evolutionary
Algorithms. And any of these branches could certainly be combined with each other or with Deep
Networks as well. We won't really be able to get into these other branches of learning in this course.
Instead, we'll focus on building "networks", short for neural networks, and how they can do some
really amazing things. Before we can get into all that, we're going to need to understand a bit more
about data and its importance in deep learning.

Invariances
Deep Learning requires data. A lot of it. This is really one of the major reasons why Deep Learning
has been so successful. Having many examples of the thing we are trying to learn is the first thing
you'll need before even thinking about Deep Learning. Often, it is the biggest blocker to learning about
something in the world. Even as children, we need a lot of experience with something before we begin
to understand it. I find I spend most of my time just finding the right data for a network to learn: getting
it from various sources, making sure it all looks right and is labeled. That is a lot of work. The rest of it
is easy, as we'll see by the end of this course.

Let's say we would like to build a network that is capable of looking at an image and saying what object
is in the image. There are so many possible ways that an object could be manifested in an image. It's
rare to ever see just a single object in isolation. In order to teach a computer about an object, we would
have to be able to give it an image of an object in every possible way that it could exist.

We generally call these ways of existing "invariances". That just means we are trying not to vary based
on some factor; we are invariant to it. For instance, an object could appear to one side of an image,
or another. We call that translation invariance. Or it could be seen from one angle or another. That's
called rotation invariance. Or it could be closer to the camera, or farther away. That would be scale
invariance. There are plenty of other types of invariances, such as perspective, brightness, or exposure,
to give a few more examples for photographic images.
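
To make those first two invariances concrete, here is a tiny sketch in numpy. It assumes img is an
image array like the ones we'll load later in this session; np.roll and slicing are just one crude way to
illustrate the idea:

import numpy as np

# Translation: shift the image 40 pixels to the right (wrapping around).
translated = np.roll(img, shift=40, axis=1)

# Scale: crop the central region, simulating the object appearing
# closer to the camera (you would then resize it back up).
h, w = img.shape[:2]
zoomed = img[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

A network that recognizes the same object in img, translated, and zoomed alike would be invariant to
translation and scale.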

Scope of Learning
With Deep Learning, you will always need a dataset that will teach the algorithm about the world. But
you aren't really teaching it everything. You are only teaching it what is in your dataset! That is a very
important distinction. If I show my algorithm only faces of people that are always centered in an image,
it will not be able to understand anything about faces that are not centered in the image! Well, at least
that's mostly true.

That's not to say that a network is incapable of transferring what it has learned to learn new concepts
more easily, or to learn things that might be necessary for it to learn other representations. For
instance, a network that has been trained to learn about birds probably knows a good bit about trees,
branches, and other bird-like hangouts, depending on the dataset. But, in general, we are limited to
learning what our dataset has access to.

So if you're thinking about creating a dataset, you're going to have to think about what it is that you
want to teach your network. What sort of images will it see? What representations do you think your
network could learn given the data you've shown it?

One of the major contributions to the success of Deep Learning algorithms is the amount of data out
there. Datasets have grown from orders of hundreds, to thousands, to many millions. The more data
you have, the more capable your network will be of achieving whatever its objective is.

Existing datasets
With that in mind, let's try to find a dataset that we can work with. There are a ton of datasets out there
that current machine learning researchers use. For instance, a quick Google search for "Deep
Learning Datasets" turns up a listing of interesting ones on deeplearning.net,
e.g. http://deeplearning.net/datasets/, including MNIST, CalTech, CelebNet, LFW, CIFAR, MS Coco,
Illustration2Vec, and there are tons more. These are primarily image-based. But if you are interested
in finding more, just do a quick search or drop a quick message on the forums if you're looking for
something in particular.
MNIST
CalTech
CelebNet
ImageNet: http://www.image-net.org/
LFW
CIFAR10
CIFAR100
MS Coco: http://mscoco.org/home/
WLFDB: http://wlfdb.stevenhoi.com/
Flickr 8k: http://nlp.cs.illinois.edu/HockenmaierGroup/Framing_Image_Description/KCCA.html
Flickr 30k

PREPROCESSING DATA
In this section, we're going to learn a bit about working with an image-based dataset. We'll see how
the dimensions of a single image are formatted and how a collection of images is represented using
a 4-d array. We'll then look at how we can perform dataset normalization. If you're comfortable with all
of this, please feel free to skip to the next video.

We're first going to load some libraries that we'll be making use of.
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
I'll be using a popular image dataset of faces called the CelebFaces dataset. I've provided some
helper functions, which you can find on the resources page, that will help us with manipulating
images and loading this dataset.
In [2]:
from libs import utils
# utils.<tab>
files = utils.get_celeb_files()
Let's take the file at index 50 in this list, read it in as an image, set the result to a variable, img, and
inspect a bit further what's going on:
In [3]:
img = plt.imread(files[50])
# img.<tab>
print(img)
[[[178 185 191]
  [182 189 195]
  [149 156 162]
  ...,
  [216 220 231]
  [220 224 233]
  [220 224 233]]

 [[177 184 190]
  [182 189 195]
  [153 160 166]
  ...,
  [215 219 230]
  [220 224 233]
  [220 224 233]]

 [[177 184 190]
  [182 189 195]
  [161 168 174]
  ...,
  [214 218 229]
  [219 223 232]
  [219 223 232]]

 ...,
 [[ 11  14  23]
  [ 11  14  23]
  [ 11  14  23]
  ...,
  [  4   7  16]
  [  5   8  17]
  [  5   8  17]]

 [[  9  12  21]
  [  9  12  21]
  [  9  12  21]
  ...,
  [  8  11  18]
  [  5   8  17]
  [  5   8  17]]

 [[  9  12  21]
  [  9  12  21]
  [  9  12  21]
  ...,
  [  8  11  18]
  [  5   8  17]
  [  5   8  17]]]
When I print out this image, I can see all the numbers that represent it. We can use the
function imshow to display those numbers as an image instead:
In [4]:
# If nothing is drawn and you are using notebook, try uncommenting the next line:
#%matplotlib inline
plt.imshow(img)
Out[4]:
<matplotlib.image.AxesImage at 0x1147f9320>

Understanding Image Shapes


Let's break this data down a bit more. We can see the dimensions of the data using
the shape accessor:
In [5]:
img.shape
# (218, 178, 3)
Out[5]:
(218, 178, 3)
This means that the image has 218 rows, 178 columns, and 3 color channels corresponding to the
Red, Green, and Blue channels of the image, or RGB. Let's try looking at just one of the color channels.
In [6]:
plt.imshow(img[:, :, 0], cmap='gray')  # red channel
plt.imshow(img[:, :, 1], cmap='gray')  # green channel
plt.imshow(img[:, :, 2], cmap='gray')  # blue channel (only the last call's image is displayed)
Out[6]:
<matplotlib.image.AxesImage at 0x1148638d0>

We use the special colon operator to say: take every value in this dimension. This is saying, give me
every row, every column, and a single index into the color channel dimension. What we're seeing is
the amount of Red, Green, or Blue contributing to the overall color image.
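
If you'd like to see all three channels side by side rather than one at a time, here is a small optional
sketch using matplotlib subplots (not part of the provided helper code):

fig, axs = plt.subplots(1, 3, figsize=(10, 3))
for i, name in enumerate(['Red', 'Green', 'Blue']):
    # Draw each channel as a grayscale intensity map.
    axs[i].imshow(img[:, :, i], cmap='gray')
    axs[i].set_title(name)
    axs[i].axis('off')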

Let's use another helper function, which will load every image file in the celeb dataset rather than just
giving us the filenames like before. By default, this will just return the first 100 images, because loading
the entire dataset is a bit cumbersome. In one of the later sessions, I'll show you how Tensorflow can
handle loading images using a pipeline so that we can load this same dataset. For now, let's stick with
this:
In [7]:
imgs = utils.get_celeb_imgs()
We now have a list containing our images. Each index of the imgs list is another image which we can
access using the square brackets:
In [8]:
plt.imshow(imgs[0])
Out[8]:
<matplotlib.image.AxesImage at 0x1151abba8>

The Batch Dimension


Remember that an image has a shape describing the height, width, channels:
In [9]:
imgs[0].shape
Out[9]:
(218, 178, 3)
It turns out we'll often use another convention for storing many images in an array, using a new
dimension called the batch dimension. Each image's shape stays exactly the same, except we add
a new dimension at the beginning, giving us: number of images x height x width x number of color
channels.

NxHxWxC

A color image should have 3 color channels: RGB.
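
For a single image, we can add that leading batch dimension ourselves with numpy; this is just a quick
illustration of the convention:

# Add a leading batch dimension: (218, 178, 3) -> (1, 218, 178, 3)
single_batch = imgs[0][np.newaxis]
print(single_batch.shape)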

We can combine all of our images to have these 4 dimensions by telling numpy to give us an array of
all the images.
In [10]:
data = np.array(imgs)
data.shape
Out[10]:
(100, 218, 178, 3)
This will only work if every image in our list is exactly the same size. So if you have a wide image, a
short image, a long image, forget about it. You'll need them all to be the same size. If you are unsure of
how to get all of your images into the same size, then please refer to the online resources for the
notebook I've provided, which shows you exactly how to take a bunch of images of different sizes,
and crop and resize them the best we can to make them all the same size.
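
As a rough sketch of that idea (the provided notebook is the authoritative version), one common
approach is to center-crop each image to a square and then resize it. This assumes scikit-image is
installed; skimage.transform.resize is just one of several ways to do the resizing:

from skimage.transform import resize

def square_and_resize(img, size=64):
    """Center-crop an image to a square, then resize to (size, size)."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]
    # resize returns a float image with values in the range [0, 1].
    return resize(crop, (size, size))

Something like [square_and_resize(i) for i in imgs] would then give a list of equally sized images
that np.array can safely stack.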

Mean/Deviation of Images
Now that we have our data in a single numpy variable, we can do a lot of cool stuff. Let's look at the
mean across the batch dimension:
In [11]:
mean_img = np.mean(data, axis=0)
plt.imshow(mean_img.astype(np.uint8))
Out[11]:
<matplotlib.image.AxesImage at 0x115387fd0>

This is the first step towards building our robot overlords. We've reduced our entire dataset down to a
single representation which describes what most of our dataset looks like. There is one other very
useful statistic which we can look at very easily:
In [12]:
std_img = np.std(data, axis=0)
plt.imshow(std_img.astype(np.uint8))
Out[12]:
<matplotlib.image.AxesImage at 0x115f1ae80>
So this is incredibly cool. We've just shown where changes are likely to be in our dataset of images.
Or put another way, we're showing where and how much variance there is in our previous mean image
representation.

We're looking at this per color channel. So we'll see variance for each color channel represented
separately, and then combined as a color image. We can try to look at the average variance over all
color channels by taking their mean:
In [13]:
plt.imshow(np.mean(std_img, axis=2).astype(np.uint8))
Out[13]:
<matplotlib.image.AxesImage at 0x11bab7c18>

This is showing us, as a heatmap, how much every color channel varies on average. The more red,
the more likely it is that our mean image is not a good representation there. The more blue, the less
likely it is that our mean image is far off from any other possible image.

Dataset Preprocessing
Think back to when I described what we're trying to accomplish when we build a model for machine
learning. We're trying to build a model that understands invariances. We need our model to be able
to express all of the things that can possibly change in our data. Well, this is the first step in
understanding what can change. If we are looking to use deep learning to learn something complex
about our data, it will often start by modeling both the mean and standard deviation of our dataset. We
can help speed things up by "preprocessing" our dataset: removing the mean and dividing out the
standard deviation. What does this mean? Subtracting the mean, and dividing by the standard
deviation. Another word for that is "normalization".
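
Here is a minimal sketch of that normalization, using the mean_img and std_img we computed above
(a tiny constant guards against dividing by zero wherever the deviation is zero):

# Subtract the per-pixel mean and divide by the per-pixel standard
# deviation, computed across the batch dimension.
data_norm = (data.astype(np.float32) - mean_img) / (std_img + 1e-10)

# The result is roughly zero-mean with unit variance.
print(data_norm.mean(), data_norm.std())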
