Multi-class Generative Adversarial Networks with
the L2 Loss Function
arXiv:1611.04076v1 [cs.CV] 13 Nov 2016

Xudong Mao∗1, Qing Li†1, Haoran Xie‡2, Raymond Y.K. Lau§3, and Zhen Wang¶4

1 Department of Computer Science, City University of Hong Kong
2 Department of Mathematics and Information Technology, The Education University of Hong Kong
3 Department of Information Systems, City University of Hong Kong
4 Qingdao University

∗ xudonmao@gmail.com  † itqli@cityu.edu.hk  ‡ hrxie2@gmail.com  § raylau@cityu.edu.hk  ¶ zhenwang0@gmail.com

Abstract
Generative adversarial networks (GANs) have achieved huge success in unsupervised learning. Most GANs treat the discriminator as a classifier with the binary sigmoid cross entropy loss function. However, we find that the sigmoid cross entropy loss function can lead to a saturation problem in GAN learning. In this work, we propose to adopt the L2 loss function for the discriminator. The properties of the L2 loss function improve the stability of GAN learning. Building on the L2 loss function, we propose multi-class generative adversarial networks for generating images with multiple classes. We evaluate the multi-class GANs on a handwritten Chinese characters dataset with 3740 classes. The experiments demonstrate that the multi-class GANs can generate high-quality images on datasets with a large number of classes. Comparison experiments between the L2 loss function and the sigmoid cross entropy loss function are also conducted, and the results demonstrate the stability of the L2 loss function.

1 Introduction
Deep learning has led to significant improvements in many computer vision tasks
such as image classification [1], object detection [2] and segmentation [3]. These
tasks all belong to supervised learning, which means that a large amount of labeled data is provided for the learning process. However, unsupervised learning tasks such as generative modeling have benefited less from deep learning than supervised learning has. Although many deep generative models have been proposed, including the RBM [4], DBM [5] and VAE [6], these models must approximate intractable functions or distributions, which limits their effectiveness. The generative adversarial network [7] is another type of deep generative model, one based on differentiable networks. One advantage of GANs is that they do not require any approximation and can be trained end-to-end through the differentiable networks. The basic idea of GANs is to train a discriminator and a generator simultaneously. The discriminator aims to distinguish whether an image comes from the training data or from the generator. On the other hand, the generator tries to generate fake images that make the discriminator believe they come from the training data. GANs have proved their effectiveness in image generation [8], image super-resolution [9] and semi-supervised learning [10].
Although GANs have achieved great success in image generation, designing GAN networks remains difficult in practice. The network architectures must be carefully designed; otherwise, GAN learning is unstable. Several well-designed networks and empirical techniques have been proposed [8, 10, 11] to improve the stability of GAN learning. We find that the instability of GAN learning is partly caused by the saturation of the discriminator. Figure 3(a) shows an example of saturated GAN learning, where the discriminator becomes saturated at iteration 2K. The gradients then become very small and almost no signal flows through the network to update the weights. Most GANs adopt the sigmoid cross entropy loss function for the discriminator. However, the sigmoid cross entropy loss function suffers from saturation when the training data are well classified. We argue that this property of the sigmoid cross entropy loss function leads to the saturation problem in GAN learning. Instead of using the sigmoid cross entropy loss function, our proposed models adopt the L2 loss function. For classification tasks, the L2 loss function performs worse than the sigmoid cross entropy loss function because it penalizes even samples that are already well classified. But we find that the classification capability of the L2 loss function is powerful enough for GAN learning. The advantage of the L2 loss function is that it suffers less from saturation than the sigmoid cross entropy loss function. The following sections analyze the properties of the two loss functions and compare their performance.
GANs are able to generate high-quality images on MNIST, which contains 10 classes of images. However, the images in MNIST are simple. We further evaluate GANs on a handwritten Chinese characters dataset. As Figure 1 shows, the generated images are difficult to recognize, and as the number of classes grows, they become increasingly distorted. The generated images seem to contain features mixed from several different classes. This phenomenon arises because a single GAN model tries to learn the features of many classes at once, which is unreasonable to expect. The situation is similar to classification: given 1000 classes of images, a classifier forced to sort them into only 2 classes will be confused by the many different features lumped into each combined class. We propose multi-class GANs, which guide the model to learn the features of each class separately. Multi-class GANs are able to generate high-quality images even when the number of classes is very large.

Figure 1: Generated images of handwritten Chinese characters by GANs. (a): 10 classes; (b): 100 classes; (c): 1000 classes.

2 Related Work
2.1 Deep generative models
Deep generative models attempt to capture the probability distributions over the given data samples. They have been applied to data generation [7], image classification [12] and multimodal learning [13]. Restricted Boltzmann Machines (RBMs) are the basis of many other hierarchical models; RBMs have been used to model the distributions of images [14] and documents [15]. Deep Belief Networks (DBNs) and Deep Boltzmann Machines (DBMs) extend the RBM. The top two layers of a DBN form an RBM, with several directed layers added beneath the top two undirected layers. The most successful application of DBNs is image classification [12], where a DBN is used to extract feature representations. Unlike the DBN, the Deep Boltzmann Machine [16] is an entirely undirected model. One important property of DBMs compared with DBNs is that the hidden variables in a layer are conditionally independent given the other layers. This makes the posterior distribution over the hidden variables simpler than that of DBNs. However, RBMs, DBNs and DBMs all suffer from an intractable partition function or an intractable posterior distribution, so approximation methods must be used to learn these models. Another important deep generative model is the Variational Autoencoder (VAE) [6]. The VAE is a directed model and can be trained with gradient-based optimization methods. However, VAEs also approximate the objective function, which introduces small errors into the model. The generative adversarial network (GAN) [7] is another type of deep generative model. One advantage of GANs is that they do not require any approximation method and can be trained through differentiable networks. GANs

2.2 Generative Adversarial Networks


GANs were first proposed in 2014 by Goodfellow et al. [7]. Follow-up works on GANs fall into two categories: those that improve the stability of GAN learning [8, 11, 17, 10] and those that apply GANs to specific tasks [9, 18]. Many works aim to improve the stability of GAN learning. Radford et al. [8] introduced the deep convolutional generative adversarial network (DCGAN) and presented some stable DCGAN architectures. DCGANs are able to generate high-quality scene images on the LSUN [19] bedroom dataset. An energy-based generative adversarial network (EBGAN) was introduced in [11], in which the discriminator is viewed as an energy function. It has the benefit that various architectures, such as autoencoders, can be incorporated into the models. Further, several techniques for reaching better convergence were introduced in [10]. One of these techniques, called feature matching, does not maximize the output of the discriminator for the generator; instead, it makes the generator produce samples that match the statistics of the real data by minimizing the mean square error on an intermediate layer of the discriminator. Conditional GANs [20] introduce extra information to both the discriminator and the generator, which makes it possible to generate images conditioned on class labels or data from other modalities. The Laplacian pyramid of generative adversarial networks (LAPGAN) [17] uses a Laplacian pyramid framework to generate high-resolution images based on a series of conditional GANs. As GANs have shown powerful capabilities on unsupervised tasks, they have been applied to many specific tasks. One successful application is the super-resolution generative adversarial network (SRGAN) [9], which combines the traditional content loss for super-resolution with the adversarial loss. Another application is semi-supervised learning [10], which achieves state-of-the-art performance on semi-supervised classification tasks. Furthermore, Reed et al. [18] proposed a method for text-to-image synthesis based on conditional GANs.

3 Approach
In this section, we first review the formulation of GANs briefly and then illustrate the limitation of the sigmoid cross entropy loss function by analyzing an example of saturated GAN learning. The feasibility and advantages of the L2 loss function are analyzed in Section 3.3. Finally, the multi-class GANs with the L2 loss function are presented in Section 3.4.
3.1 Generative Adversarial Networks
GAN learning trains a discriminator $D$ and a generator $G$ simultaneously. The target of $G$ is to learn the distribution $p_g$ over data $x$. Adopting the reparametrization trick, $G$ samples input variables $z$ from a uniform distribution $p_z(z)$ and maps them to data space as $G(z; \theta_g)$ through a differentiable network. On the other hand, $D$ is a classifier $D(x; \theta_d)$ that aims to recognize whether an image comes from the training data or from $G$. The minimax objective for GANs is defined as follows:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \tag{1}$$

Conditional generative adversarial networks introduce extra information $y$, and both the discriminator and the generator are conditioned on $y$. The minimax objective for conditional GANs is then given by Equation (2):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)))]. \tag{2}$$
We find that conditional GANs are more powerful for generating images
with multiple classes. Our proposed multi-class GANs are extended from the
conditional GANs.
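To make the objectives concrete, the following minimal NumPy sketch writes out Monte Carlo estimates of the loss terms in Equations (1) and (2). The callables `D` and `G` are hypothetical stand-ins for the discriminator and generator networks (not the authors' implementation), and a small `eps` guards the logarithms.

```python
import numpy as np

def gan_losses(D, G, x_real, z, eps=1e-8):
    """Monte Carlo estimate of the two players' losses in Eq. (1).

    D maps images to probabilities in (0, 1); G maps noise z to images.
    Both are hypothetical callables standing in for the networks.
    """
    d_real = D(x_real)        # D(x) on real samples
    d_fake = D(G(z))          # D(G(z)) on generated samples
    # The discriminator maximizes log D(x) + log(1 - D(G(z))),
    # i.e. minimizes the negative of that quantity.
    d_loss = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    # The generator minimizes log(1 - D(G(z))).
    g_loss = np.mean(np.log(1.0 - d_fake + eps))
    return d_loss, g_loss

def conditional_gan_losses(D, G, x_real, y, z, eps=1e-8):
    """Same losses with both networks conditioned on labels y, as in Eq. (2)."""
    d_real = D(x_real, y)
    d_fake = D(G(z, y), y)
    d_loss = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    g_loss = np.mean(np.log(1.0 - d_fake + eps))
    return d_loss, g_loss
```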

Figure 2: Network architectures for the comparison experiments, where BN stands for batch normalization; a layer marked BN is followed by a batch normalization layer. Left (a): the architecture of the generator. Middle (b): the architecture of the discriminator with the sigmoid cross entropy loss function. Right (c): the architecture of the discriminator with the L2 loss function.

3.2 Analysis of Saturated GAN Learning


When the network architectures of GANs are not well designed, the learning process may become unstable. For example, at the early stage of learning the generator is poor, and if the discriminator performs too well, the model may become saturated and the gradient signals vanish.

Figure 3: The output distributions of the final layer of the discriminator. (a): an example of saturated GAN learning; (b): an example of successful GAN learning. Left: the output distribution for generated images. Right: the output distribution for real images.

We design a conditional GAN to show this phenomenon. The network architecture
is shown in Figure 2(a)(b). The generator uses ReLU activations except for the last fully-connected layer, which uses a sigmoid activation. The discriminator uses LeakyReLU activations except for the last fully-connected layer, which uses a sigmoid activation. All the layers are conditioned on the label vector. We then evaluate this conditional GAN on MNIST. Figure 3(a) shows the output distribution of the final layer of the discriminator, plotted separately for generated images (Figure 3(a), left) and real images (Figure 3(a), right). At iteration 0.7K, the discriminator classifies almost all the images correctly. The model then becomes saturated and the gradient signals vanish. Another interesting phenomenon in Figure 3(a) is that the output values for both real images and fake images all become 1. The reason is that although the discriminator performs well from iteration 0.7K onward, it is heavily biased because of the poorly generated images. At iteration 2.2K, the generator finds a solution that fools the biased discriminator: Figure 4(b1) shows that it begins to generate similar images that the biased discriminator accepts. The discriminator then tries to update its weights to correct these errors, but it has saturated and can hardly do so, so it keeps outputting 1 for the generated images over a long period. Similarly, at iteration 6.5K the generator finds another image, shown in Figure 4(c1), that fools the discriminator. The model can hardly break out of the saturated state: at iteration 50K, the generator still generates similar images.

Figure 4: Generated images on MNIST at iterations 0.7K, 2.2K, 6.5K and 10K. Upper (a1-d1): images generated with the sigmoid loss function. Lower (a2-d2): images generated with the L2 loss function.

Figure 5: Left (a): the sigmoid cross entropy loss function. Right (b): the L2 loss function.

3.3 Proposed Approach


As stated in Section 3.1, the discriminators of traditional GANs act as classifiers, where the sigmoid cross entropy loss function is adopted. However, we find that the saturation problem is partially caused by the sigmoid cross entropy loss function, which is defined as:

$$L = -y \log \sigma(x) - (1 - y)\log(1 - \sigma(x)) \tag{3}$$

where $\sigma(\cdot)$ denotes the sigmoid function. We plot the sigmoid cross entropy loss function with $y = 1$ in Figure 5(a). The curve in Figure 5(a) indicates that the sigmoid cross entropy loss function saturates when $x$ is relatively large.
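The saturation is easy to check numerically. The gradient of Equation (3) with respect to the logit $x$ is $\sigma(x) - y$, so for a well-classified real sample ($y = 1$, large $x$) the gradient all but vanishes. A minimal NumPy demonstration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_ce_loss(x, y=1.0):
    """Sigmoid cross entropy loss of Eq. (3) as a function of the logit x."""
    s = sigmoid(x)
    return -y * np.log(s) - (1.0 - y) * np.log(1.0 - s)

def sigmoid_ce_grad(x, y=1.0):
    """dL/dx = sigmoid(x) - y, which goes to 0 as x grows: saturation."""
    return sigmoid(x) - y

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  loss={sigmoid_ce_loss(x):.6f}  grad={sigmoid_ce_grad(x):+.6f}")
# At x = 10 the gradient is about -4.5e-5: a confidently classified
# sample contributes almost no learning signal.
```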

Figure 3(b) shows an example of successful GAN learning. It shows that the discriminator does not perform very well and makes misclassifications throughout the learning process. Our proposed approach is motivated by this observation. We propose to use the L2 loss function for the discriminator. The L2 loss function for a classification task is defined as:

$$L = y(x - 1)^2 + (1 - y)x^2 \tag{4}$$

For classification tasks, the performance of the L2 loss function is limited because it heavily penalizes even samples that are correctly classified. But for GAN learning, the discriminator does not need to perform that well. The following experiments demonstrate that the classification capability of the L2 loss function is powerful enough for GAN learning, and its limited classification capability relieves the saturation problem. Another important property of the L2 loss function is that its saturation region is much smaller than that of the sigmoid cross entropy loss function. As Figure 5(b) shows, the L2 loss function saturates only near $x = 1$. For the saturated example introduced in Section 3.2, if the sigmoid cross entropy loss function is replaced by the L2 loss function, the model is able to converge to a good state; the following experiments discuss this in detail. These two properties of the L2 loss function make GAN learning more stable, and this stability gives us more flexibility to explore more powerful network architectures. The following proposed models all adopt the L2 loss function.
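To make the contrast concrete, a companion NumPy sketch for Equation (4) follows. Its gradient with respect to $x$ is $2y(x - 1) + 2(1 - y)x$, which for $y = 1$ is zero only at $x = 1$. The `l2_gan_losses` pairing at the end is one plausible way to plug the L2 loss into the GAN game: the paper specifies the discriminator loss, while the generator term here (pushing $D(G(z))$ toward 1) is our assumption.

```python
import numpy as np

def l2_loss(x, y=1.0):
    """L2 loss of Eq. (4): y * (x - 1)^2 + (1 - y) * x^2."""
    return y * (x - 1.0) ** 2 + (1.0 - y) * x ** 2

def l2_grad(x, y=1.0):
    """dL/dx = 2y(x - 1) + 2(1 - y)x; for y = 1 this vanishes only at x = 1."""
    return 2.0 * y * (x - 1.0) + 2.0 * (1.0 - y) * x

for x in [0.0, 0.5, 1.0, 2.0]:
    print(f"x={x:4.1f}  loss={l2_loss(x):.4f}  grad={l2_grad(x):+.4f}")

def l2_gan_losses(D, G, x_real, z):
    """GAN losses with an L2 discriminator: real outputs are pushed
    toward 1 and fake outputs toward 0. The generator term (fake
    outputs pushed toward 1) is an assumed but natural pairing."""
    d_real, d_fake = D(x_real), D(G(z))
    d_loss = np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)
    g_loss = np.mean((d_fake - 1.0) ** 2)
    return d_loss, g_loss
```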

Figure 6: Network architectures for handwritten Chinese characters. Left (a): architecture of the generator. Right (b): architecture of the discriminator.
3.4 Model Architectures
3.4.1 Multi-class GANs for Handwritten Chinese Characters
A GAN trained on MNIST is able to generate high-quality images. However, Chinese characters are more complex than digits: for Chinese characters, it is difficult for GANs to achieve good convergence, and the generated images are difficult to recognize, as shown in Figure 1. For multi-class datasets such as Chinese characters, our experimental results indicate that conditional GANs achieve better convergence. The reason is that for GANs without conditions, the generator may generate images that combine two or more characters, and the discriminator treats these images as real. For conditional GANs, in contrast, the training images are conditioned on their labels, which is equivalent to training multiple models, one per class. Another benefit of conditional GAN learning is that the images are generated together with labels, and labeled images are more useful for applications such as data augmentation. We leave GAN learning on Chinese characters for data augmentation to future work.
One-hot encoding for labels is adopted for learning conditional GANs on MNIST in [20]. However, there are many more classes for Chinese characters: for example, the handwritten Chinese characters dataset (HWDB1.0) [21] contains 3740 classes. If one-hot encoding vectors are directly concatenated to the network layers, the network becomes so large that both the GPU memory cost and the computation time are infeasible. We propose to use a linear mapping layer before concatenation, which maps the large one-hot vectors into small vectors; the linear mapping layer acts as an encoder (see the sketch after this paragraph). The experimental results show that it works well. To make the learning process stable, the L2 loss function is used for the discriminator. The network architecture for handwritten Chinese characters is shown in Figure 6. ReLU activations are used for the generator and LeakyReLU activations for the discriminator. The layers to concatenate were chosen empirically.
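A minimal sketch of the linear mapping layer, using the sizes stated in the text (3740 classes mapped to 256-dimensional codes, as detailed in Section 4.3). The weight matrix, helper names, and feature sizes are illustrative assumptions; in the actual model the projection is a trainable layer inside the networks, and convolutional layers broadcast the code spatially rather than concatenating it to a flat vector.

```python
import numpy as np

num_classes, embed_dim = 3740, 256   # sizes from the text

# Trainable projection matrix (training updates omitted in this sketch);
# initialized with the paper's stated stddev of 0.02.
W = np.random.normal(0.0, 0.02, size=(num_classes, embed_dim))

def embed_labels(labels):
    """One-hot encoding followed by a linear layer. Since one_hot @ W is
    just a row lookup, we index W directly and never materialize the
    3740-dimensional one-hot vectors."""
    return W[labels]                               # (batch, 256)

def concat_condition(features, labels):
    """Concatenate the compact label code onto a layer's activations."""
    return np.concatenate([features, embed_labels(labels)], axis=1)

# Example: condition a batch of 64 hypothetical 1024-d activations.
feats = np.random.randn(64, 1024)
labels = np.random.randint(0, num_classes, size=64)
print(concat_condition(feats, labels).shape)       # (64, 1280)
```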

3.4.2 Model for LSUN


To further demonstrate the stability of the L2 loss function, we evaluate it on the LSUN-bedroom dataset. The network architecture designed for LSUN-bedroom is shown in Figure 7. The architecture of the discriminator is identical to the one in DCGAN except for the use of the L2 loss function. The design of the generator is motivated by the VGG model [22]: compared with DCGAN, each of the top two deconvolutional layers is followed by a stride-1 deconvolutional layer. Our model is able to generate high-quality images at 112 × 112 resolution, higher than the resolution in DCGAN. As in DCGAN, ReLU activations are used for the generator and LeakyReLU activations are used for the discriminator.
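The 112 × 112 output size follows from standard transposed convolution arithmetic: with "same" padding, a transposed convolution multiplies the spatial size by its stride. The tiny helper below illustrates this; the starting size and the exact stride sequence are our assumptions for illustration, since Figure 7 is not reproduced here.

```python
def deconv_out_size(in_size, stride):
    """Output size of a 'same'-padded transposed convolution:
    the spatial size is multiplied by the stride."""
    return in_size * stride

# Illustrative stack: stride-2 layers double the size, and each of the
# top two stride-2 layers is followed by a stride-1 layer that refines
# features without changing resolution (strides are assumptions).
size = 7
for stride in [2, 2, 2, 1, 2, 1]:
    size = deconv_out_size(size, stride)
    print(size)   # 14, 28, 56, 56, 112, 112 -> 112 x 112 output
```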

Figure 7: Network architectures for LSUN-bedroom. Left (a): architecture of the generator. Right (b): architecture of the discriminator.

4 Experiments
In this section, we first present the comparison experiments between the L2 loss function and the sigmoid cross entropy loss function on MNIST. This experiment shows that the L2 loss function is more stable than the sigmoid cross entropy loss function for GAN learning. We then evaluate the proposed multi-class GANs on a handwritten Chinese characters dataset (HWDB1.0). Finally, we evaluate a deeper GAN with the L2 loss function on the LSUN-bedroom dataset.

Dataset        #Categories   #Samples
MNIST          10            70,000
HWDB1.0        3740          1,246,991
LSUN-bedroom   1             3,033,042

Table 1: Statistics of the datasets.

4.1 Datasets and Implementation Details


We evaluate our proposed models on three datasets: MNIST, HWDB1.0 and LSUN-bedroom. The details of these datasets are shown in Table 1. The implementation of our proposed models is based on a public TensorFlow [23] implementation of DCGAN (https://github.com/carpedm20/DCGAN-tensorflow). All the experiments are conducted on a GeForce GTX 1080. The batch size for our implementation is 64. The standard deviation for weight initialization is set to 0.02. The learning rates for the handwritten Chinese characters model and the LSUN model are 0.0002 and 0.001, respectively. The β1 for the Adam optimizer is set to 0.5. All the code and details of our implementation can be found at https://github.com/xudonmao/Multi-class_GAN.
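For reference, the stated training settings can be collected as below (a plain-Python sketch, not the released code; the dictionary keys and the NumPy stand-in for the 0.02-stddev Gaussian weight initialization are our own naming).

```python
import numpy as np

# Hyperparameters as stated in the text.
config = {
    "batch_size": 64,
    "init_stddev": 0.02,      # stddev for weight initialization
    "lr_hwdb": 0.0002,        # handwritten Chinese characters model
    "lr_lsun": 0.001,         # LSUN-bedroom model
    "adam_beta1": 0.5,        # beta1 for the Adam optimizer
}

def init_weights(shape, stddev=config["init_stddev"]):
    """Gaussian weight initialization with the stated stddev."""
    return np.random.normal(0.0, stddev, size=shape)

w = init_weights((5, 5, 64, 128))   # e.g. a 5x5 conv kernel bank
print(round(float(w.std()), 3))     # approximately 0.02
```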

Figure 8: The output distributions of the final layer of the discriminator with the L2 loss function. Left: the output distribution for generated images. Right: the output distribution for real images.

4.2 Comparison Experiments on MNIST


To compare the stability of the L2 loss function and the sigmoid cross entropy loss function, we design two network architectures: Figure 2(b) uses the sigmoid cross entropy loss function and Figure 2(c) uses the L2 loss function. The two architectures are otherwise identical. We evaluate both on MNIST; the generated images are shown in Figure 4. The model with the L2 loss function is able to generate high-quality images at iteration 10K, while the model with the sigmoid cross entropy loss function generates distorted images. In fact, even at iteration 100K the model with the sigmoid cross entropy loss function still cannot generate recognizable images. We also show the output distributions of the discriminator with the L2 loss function in Figure 8. Compared with Figure 3(a), the model with the L2 loss function converges normally, and the output values for real images and fake images both converge to around the optimal value 0.5. This experiment verifies our analysis in Section 3.3, and we conclude that the L2 loss function is more stable than the sigmoid cross entropy loss function for GAN learning.
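The value 0.5 can also be checked analytically (a standard calculation we add here for completeness, not spelled out in the paper): for a fixed generator, minimizing the L2 discriminator objective pointwise gives

$$D^*(x) = \arg\min_D \left[\, p_{\mathrm{data}}(x)\,(D - 1)^2 + p_g(x)\,D^2 \,\right] = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)},$$

so when the generator matches the data distribution, $p_g = p_{\mathrm{data}}$ and $D^*(x) = 1/2$ everywhere, which is exactly the value both curves in Figure 8 approach.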

4.3 Handwritten Chinese Characters


To evaluate the capabilities of the multi-class GANs, we train them on a handwritten Chinese characters dataset (HWDB1.0) that contains 3740 classes. The architectures of the multi-class GANs are shown in Figure 6. The labels are first encoded as one-hot vectors, and a linear mapping layer then maps the one-hot vectors into 256-dimensional vectors. The top 3 layers of the discriminator are concatenated with the label vectors, while only the input layer is concatenated for the generator; the layers to concatenate were chosen empirically. Figure 9 shows the images generated by the multi-class GANs along with corresponding real images from the same classes. Compared with the images generated by plain GANs in Figure 1, the images generated by the multi-class GANs are of much higher quality. The generated characters are easy to recognize even when the number of classes reaches 3740.

Figure 9: Generated images of handwritten Chinese characters by multi-class GANs. Left (a): generated images. Right (b): real images. Images in the same position belong to the same class.

4.4 LSUN
To further evaluate the performance of the L2 loss function, we designed a deep GAN, shown in Figure 7, and evaluated it on the LSUN-bedroom dataset. The architecture of the generator is motivated by the VGG model, where two stride-1 deconvolutional layers are added after the top two stride-2 deconvolutional layers. The model is able to generate high-quality images at 112 × 112 resolution, as shown in Figure 10. Compared with the images generated by DCGAN, our generated images contain more details. For example, in the image at row 4, column 2, a beach is visible through the window, and in the image at row 1, column 4, a crystal chandelier hangs from the ceiling of the bedroom.

Figure 10: Generated images on LSUN-bedroom by our proposed model.

5 Conclusions and Future Work


In this work, we propose to use the L2 loss function for the discriminator instead of the sigmoid cross entropy loss function. The comparison experiments demonstrate that the L2 loss function is more stable than the sigmoid cross entropy loss function for GAN learning. We also propose multi-class GANs for generating images with multiple classes. The experimental results on a handwritten Chinese characters dataset with 3740 classes show that the proposed model improves the quality of the generated images significantly compared with traditional GANs. In addition, we propose a deep GAN with the L2 loss function for generating high-resolution images. In future work, we plan to extend the multi-class GANs to more complex multi-class datasets such as ImageNet. Applications of the multi-class GANs, such as data augmentation for Chinese characters recognition, are also worth studying.

References
[1] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.
[2] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," ArXiv e-prints, June 2015.
[3] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," ArXiv e-prints, Nov. 2014.
[4] G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[5] R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," in Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 5, pp. 448-455, 2009.
[6] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," ArXiv e-prints, Dec. 2013.
[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds.), pp. 2672-2680, Curran Associates, Inc., 2014.
[8] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," ArXiv e-prints, Nov. 2015.
[9] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," ArXiv e-prints, Sept. 2016.
[10] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, "Improved techniques for training GANs," ArXiv e-prints, June 2016.
[11] J. Zhao, M. Mathieu, and Y. LeCun, "Energy-based generative adversarial network," ArXiv e-prints, Sept. 2016.
[12] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, July 2006.
[13] N. Srivastava and R. R. Salakhutdinov, "Multimodal learning with deep Boltzmann machines," in Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), pp. 2222-2230, Curran Associates, Inc., 2012.
[14] G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler, "Convolutional learning of spatio-temporal features," pp. 140-153, Berlin, Heidelberg: Springer Berlin Heidelberg, 2010.
[15] G. E. Hinton and R. R. Salakhutdinov, "Replicated softmax: An undirected topic model," in Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, eds.), pp. 1607-1614, Curran Associates, Inc., 2009.
[16] R. Salakhutdinov and G. Hinton, "Deep Boltzmann machines," in Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 5, pp. 448-455, 2009.
[17] E. L. Denton, S. Chintala, A. Szlam, and R. Fergus, "Deep generative image models using a Laplacian pyramid of adversarial networks," in Advances in Neural Information Processing Systems 28 (C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, eds.), pp. 1486-1494, Curran Associates, Inc., 2015.
[18] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, "Generative adversarial text-to-image synthesis," in Proceedings of The 33rd International Conference on Machine Learning, 2016.
[19] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao, "LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop," ArXiv e-prints, June 2015.
[20] M. Mirza and S. Osindero, "Conditional generative adversarial nets," ArXiv e-prints, Nov. 2014.
[21] C.-L. Liu, F. Yin, Q.-F. Wang, and D.-H. Wang, "ICDAR 2011 Chinese handwriting recognition competition," in Proceedings of the 2011 International Conference on Document Analysis and Recognition, ICDAR '11, (Washington, DC, USA), pp. 1464-1469, IEEE Computer Society, 2011.
[22] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[23] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015. Software available from tensorflow.org.
