Convolutional Adversarial Network for Generating Anime Characters

Yann LeCun’s Speech

GANs are the most interesting research I have ever done, because they let me feel the fusion of art and technology. Unlike many machine learning projects, the immediate attraction of playing with generative adversarial networks is the diversity of the datasets. I have never before collected and generated so many pictures of my favorite anime characters. When I first started manually labeling character headshots, I could not imagine that my new CNNs would be able to detect and classify the characters automatically after only a few days of training.

DCGAN for MNIST Handwriting Dataset

I will not explain what a GAN is, since this is a research report, not a tutorial. If you are not familiar with the idea, you can review the original publication or other documents.

My method for generating the dataset is described here. Thanks to ‘s excellent detector, I was able to significantly speed up my R-CNN model.

Basic DC-GAN – Generate Chitanda

This was the first test of DCGAN. One of my favorite Kyoto Animation characters, Chitanda Eru, was my first test subject. Training used my manually labeled dataset of 650 images like those in the picture below.

Part of Chitanda Dataset

The discriminator had 6 convolutional layers, with the number of channels doubling from each layer to the next. The generator consisted of one convolutional layer and 5 deconvolution layers. The noise input to the generator had dimension 64. This first structure was very simple compared to other implementations.
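A minimal PyTorch sketch of networks with this shape is given here for illustration only. The framework, the 64x64 image size, the base width of 32 channels, the 4x4 kernels, and the activation and normalization choices are all assumptions; the description above only fixes the layer counts, the channel doubling, and the 64-dimensional noise.

```python
import torch
import torch.nn as nn

NOISE_DIM = 64  # from the text
BASE = 32       # assumed base channel count (not stated in the post)


class Discriminator(nn.Module):
    """Six stride-2 conv layers; channels double until the final 1-channel logit."""

    def __init__(self, base=BASE):
        super().__init__()
        chans = [3, base, base * 2, base * 4, base * 8, base * 16]
        layers = []
        # Five layers halve the resolution (64 -> 2) while doubling the channels.
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 4, 2, 1), nn.LeakyReLU(0.2)]
        # Sixth layer maps the 2x2 feature map to a single 1x1 logit.
        layers.append(nn.Conv2d(chans[-1], 1, 4, 2, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).flatten(1)  # shape: (batch, 1)


class Generator(nn.Module):
    """Five deconvolution layers followed by one convolutional output layer."""

    def __init__(self, noise_dim=NOISE_DIM, base=BASE):
        super().__init__()
        chans = [noise_dim, base * 16, base * 8, base * 4, base * 2, base]
        layers = []
        for i, (c_in, c_out) in enumerate(zip(chans[:-1], chans[1:])):
            # First deconv expands 1x1 -> 4x4; the others double the resolution.
            stride, pad = (1, 0) if i == 0 else (2, 1)
            layers += [
                nn.ConvTranspose2d(c_in, c_out, 4, stride, pad),
                nn.BatchNorm2d(c_out),
                nn.ReLU(),
            ]
        # Single conv layer producing an RGB image in [-1, 1].
        layers += [nn.Conv2d(base, 3, 3, 1, 1), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))


if __name__ == "__main__":
    z = torch.randn(8, NOISE_DIM)
    fake = Generator()(z)
    print(fake.shape, Discriminator()(fake).shape)
```

With these assumed settings the generator goes 1x1, 4x4, 8x8, 16x16, 32x32, 64x64, and the discriminator reverses that path down to a single logit per image.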

Generated samples at Epoch 10, Epoch 50, Epoch 150, and Epoch 300

These are the results after training the networks on a Tesla K80 for 1 hour. I sampled every 10 epochs, and each sample contained 32 images generated from random noise and 16 images generated from fixed noise. The results are not good: many of the images are fuzzy and collapsed. So I improved the dataset and the network parameters in the next test.
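The sampling scheme described above (16 images from fixed noise plus 32 from fresh random noise, every 10 epochs) could be sketched roughly as follows. This assumes PyTorch and a generator that accepts a `(batch, 64)` noise tensor; the names `fixed_noise`, `SAMPLE_EVERY`, and `sample_images` are hypothetical.

```python
import torch

NOISE_DIM = 64     # noise dimension from the text
SAMPLE_EVERY = 10  # sampling frequency in epochs, from the text

# Drawn once and reused, so the same 16 latent points can be compared
# across checkpoints as training progresses.
fixed_noise = torch.randn(16, NOISE_DIM)


def sample_images(generator, epoch):
    """Return 48 images (16 fixed + 32 random) on sampling epochs, else None."""
    if epoch % SAMPLE_EVERY != 0:
        return None
    random_noise = torch.randn(32, NOISE_DIM)  # fresh noise each time
    with torch.no_grad():  # no gradients are needed for visualization
        return generator(torch.cat([fixed_noise, random_noise]))
```

The fixed-noise images show how the same latent points evolve over training, while the random-noise images give a rough picture of sample diversity at each checkpoint.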

Larger Network – Generate Misaka Mikoto

In this experiment, I manually labeled hundreds of anime faces to train the R-CNN. The dataset generator then created 20k labeled headshots from anime in 10 minutes. For this test, I increased the network parameters and trained the networks on a Tesla P100.

At first, I used the configuration from the previous test. However, the result was worse than in the first test. This was probably caused by the increase in data complexity, since Misaka is more expressive and active in Toaru Kagaku no Railgun. So I doubled the number of channels and the image size. In addition, I increased the noise dimension from 64 to 96 to make the results more diverse.
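To get a feel for what these changes cost, the arithmetic below estimates how the size of one convolution layer grows when its channels are doubled, and how the first generator layer grows with the noise dimension. The base width of 32 channels, the 4x4 kernel, and the 64 to 128 pixel image sizes are assumptions for illustration, not values from the experiment.

```python
def conv_params(c_in, c_out, kernel=4):
    """Weights plus biases of one 2-D convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


base = 32  # hypothetical base channel count

# Doubling both input and output channels roughly quadruples the layer.
small = conv_params(base, base * 2)
large = conv_params(base * 2, base * 4)
channel_ratio = large / small

# Doubling the image side length (assumed 64 -> 128) quadruples the pixel count.
pixel_ratio = (128 * 128) / (64 * 64)

# Raising the noise dimension from 64 to 96 only enlarges the first generator layer.
noise_ratio = conv_params(96, base * 16) / conv_params(64, base * 16)

print(channel_ratio, pixel_ratio, noise_ratio)
```

Under these assumptions, wider layers and larger images each cost about four times as much, while the larger noise vector adds comparatively little.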

Because of the increased dataset complexity, the result still had many collapses. Even when I changed the channel number to 128, the result did not improve. So I decided to add residual blocks to the generator, since they should let the generator gather more features from the dataset and be more ‘intelligent’. On benchmarks such as ImageNet and COCO, ResNet-based models have performed very well and set many new records in computer vision, which suggests that deep residual networks are well suited to processing difficult image features.

Generated samples at Epoch 9, Epoch 30, Epoch 100, and Epoch 300

Residual Blocks Tests

Inspired by SRGAN and my SE-ResNet experiment, I decided to add residual blocks to the networks. The rough structure of the networks is shown below. I will adjust the parameters during testing.
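For readers unfamiliar with the idea, a generic SRGAN-style residual block can be sketched in PyTorch as follows. This is an illustrative sketch, not necessarily the block used in these experiments; the framework, the channel count, and the 3x3 kernels are assumptions.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """SRGAN-style block: conv, BN, PReLU, conv, BN, plus an identity skip."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # The skip connection adds the input back, so the block only has to
        # learn a correction on top of the features it receives.
        return x + self.body(x)
```

Because the block preserves the tensor shape, several of them can be stacked between the upsampling layers of a generator without changing the rest of the architecture.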

This research continues in the next post.

Several different networks are currently being trained on a larger dataset. I will update the results when all of the training runs are complete.

