This post walks through my tests of different residual blocks and architectures for a deep convolutional generative adversarial network (DCGAN) in detail, since I ran into many problems when testing residual blocks. If you want to see more results from the basic DCGAN, please go to my last post. Click the link below.
The picture below shows the architecture of SR-GAN, which uses a GAN to generate a high-resolution photo from a blurry one. The residual blocks in the network give it a strong ability to restore features. Inspired by SR-GAN, I decided to add different residual blocks to DCGAN to fix the collapse problems from my previous tests.
Residual Block with SE-Layer
Squeeze-and-Excitation (SE) is a very interesting structure to use in residual blocks. With a squeeze step followed by two small fully connected layers, it produces a weight for every channel, giving the network the ability to ‘filter’ channels. According to the data in the original paper, it works really well on classification problems. So I had to run some experiments applying this idea to GAN, and see whether it could also solve the collapse problem.
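To make the channel weighting concrete, here is a minimal NumPy sketch of an SE layer inside a residual block. It is not the code from my experiments: the names `se_layer`, `se_residual_block`, `w1`, `w2` and the reduction ratio of 4 are assumptions for illustration, biases are left out, and the `body` argument stands in for the convolution layers of a real block.

```python
import numpy as np

def se_layer(x, w1, w2):
    """Reweight the channels of x, which has shape (N, C, H, W).

    w1: (C, C // r) weights of the reduction layer (hypothetical name).
    w2: (C // r, C) weights of the expansion layer (hypothetical name).
    """
    squeezed = x.mean(axis=(2, 3))                # squeeze: one value per channel, (N, C)
    hidden = np.maximum(squeezed @ w1, 0.0)       # reduction layer + ReLU, (N, C // r)
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # expansion layer + sigmoid, (N, C)
    return x * gates[:, :, None, None]            # excitation: scale every channel

def se_residual_block(x, body, w1, w2):
    """Skip connection around body(x) followed by the SE reweighting."""
    return x + se_layer(body(x), w1, w2)

rng = np.random.default_rng(0)
channels, reduction = 8, 4                        # assumed sizes, not from my tests
x = rng.standard_normal((2, channels, 16, 16))
w1 = rng.standard_normal((channels, channels // reduction)) * 0.1
w2 = rng.standard_normal((channels // reduction, channels)) * 0.1

# An identity body keeps the sketch short; a real block would use conv layers here.
out = se_residual_block(x, lambda t: t, w1, w2)
```

Each gate lies between 0 and 1, so a channel the layer finds unhelpful is scaled toward zero before it is added back to the skip connection.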
At first, I simply added 4 residual blocks between the deconvolution layers and the output layer of the generator, and I turned off all bias units in the networks. When I downloaded the results to my computer, I was shocked to see that the character had disappeared from the images after epoch 15, and that all the fake images had become highly similar after epoch 20. What was wrong with the network? To be sure that it was not caused by the data, I decided to run more architecture experiments on this dataset.
After many trials with different architectures, I solved this problem by following advice from various documents: adding bias units in the downsampling and upsampling layers, replacing the ReLU activation function with LeakyReLU, and adding a Tanh function at the end of the generator. I still don't know exactly what the problem was. This phenomenon is really rare. Before I adjusted the networks, it sometimes happened and sometimes did not. Either way, it was solved.
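As a small illustration of two of these changes, the NumPy sketch below shows what LeakyReLU and Tanh do to their inputs. The negative slope of 0.2 and the function names are assumptions, not values taken from my networks, and the remark about the output range assumes the training images are scaled to [-1, 1].

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    """Keep positive values and pass a small fraction of negative ones."""
    return np.where(x > 0, x, negative_slope * x)

def generator_output(pre_activation):
    """Tanh at the end of the generator bounds every pixel to [-1, 1]."""
    return np.tanh(pre_activation)

x = np.array([-2.0, -0.5, 0.0, 1.5])
hidden = leaky_relu(x)          # negative inputs are shrunk, not zeroed as with ReLU
pixels = generator_output(np.array([-10.0, 0.0, 10.0]))
```

With plain ReLU the negative entries of `x` would all become zero, so no gradient would flow back through those units.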
The Balance between Discriminator and Generator
I tried many combinations of the two networks, but most of the trials collapsed after several epochs of training. I found that in these trials the discriminator converged so fast that its loss dropped to 0.001 after only a few steps. I searched for this problem on Google. One of the results mentioned that when using your own dataset, you should take care of the balance between the two networks, keeping them in a healthy adversarial relationship.
Initially, I weakened the discriminator by decreasing the number of channels in its hidden layers. The convergence speed got slightly lower when I cut 50 percent of the channels. After several epochs, I examined the generated images and found lots of messy images, indicating that the networks had not really learned anything. So I tried another way: updating the discriminator only once every three steps, while the generator kept its original update schedule.
After many tests, this problem still bothered me. In the end, only the configurations with 1 or 2 residual blocks worked well. More blocks meant a faster collapse of the adversarial relationship. The results are shown below. The quality was slightly improved compared with the original configuration. I will continue this test and update the report after the CycleGAN tests.
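The once-in-three-steps schedule can be sketched in plain Python as follows. The function names and the `d_every` parameter are hypothetical, and the counters are placeholders for the real optimizer steps of the two networks.

```python
def should_update_discriminator(step, d_every=3):
    """Return True on the steps where the discriminator is updated."""
    return step % d_every == 0

def run_schedule(num_steps, d_every=3):
    """Count how often each network would be updated over num_steps."""
    d_updates, g_updates = 0, 0
    for step in range(num_steps):
        if should_update_discriminator(step, d_every):
            d_updates += 1  # placeholder for the real discriminator update
        g_updates += 1      # the generator is updated on every step
    return d_updates, g_updates

d_updates, g_updates = run_schedule(9)
```

Skipping discriminator updates this way slows it down without changing its architecture, unlike cutting its channels.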
TO BE CONTINUED