Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation
Soochan Lee, Junsoo Ha, Gunhee Kim
Recent advances in conditional image generation tasks, such as image-to-image translation and image inpainting, are largely attributed to the success of conditional GAN models, which are often optimized by the joint use of the GAN loss with the reconstruction loss. However, we reveal that this training recipe, shared by almost all existing methods, causes one critical side effect: lack of diversity in output samples. In order to accomplish both training stability and multimodal output generation, we propose novel training schemes with a new set of losses named moment reconstruction losses that simply replace the reconstruction loss. We show that our approach is applicable to any conditional generation task by performing thorough experiments on image-to-image translation, super-resolution and image inpainting using the Cityscapes and CelebA datasets. Quantitative evaluations also confirm that our methods achieve great diversity in outputs while retaining or even improving the visual fidelity of generated samples.
Recently, active research has led to substantial progress on conditional image generation, whose typical tasks include image-to-image translation (Isola et al., 2017), image inpainting (Pathak et al., 2016), super-resolution (Ledig et al., 2017) and video prediction (Mathieu et al., 2016). At the core of such advances is the success of conditional GANs (Mirza & Osindero, 2014), which improve GANs by allowing the generator to take an additional code or condition to control the modes of the data being generated. However, training GANs, including conditional GANs, is highly unstable and prone to collapse (Goodfellow et al., 2014). Indeed, using the GAN loss together with the reconstruction loss is synergetic: the GAN loss compensates for the weakness of the reconstruction loss, whose output samples are blurry and lack high-frequency structure, while the reconstruction loss offers the training stability required for convergence.
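The blurriness attributed to the reconstruction loss can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a single condition with two equally valid scalar targets (the values -1 and +1 and the helper `l2_loss` are hypothetical) and searches for the one prediction that minimizes a mean squared reconstruction loss.

```python
# Toy illustration (an assumption for exposition, not the paper's setup):
# one conditional input has two equally valid outputs, -1 and +1.
targets = [-1.0, 1.0]


def l2_loss(pred, targets):
    """Mean squared error of a single prediction against all valid targets."""
    return sum((pred - t) ** 2 for t in targets) / len(targets)


# Grid-search the single prediction that minimizes the reconstruction loss.
candidates = [i / 100 for i in range(-150, 151)]
best = min(candidates, key=lambda p: l2_loss(p, targets))

print(best)                    # 0.0: the average of the two modes
print(l2_loss(best, targets))  # 1.0
print(l2_loss(1.0, targets))   # 2.0: committing to one mode costs more
```

Under these assumptions the loss-minimizing prediction is the average of the two modes, which matches neither valid output. In image space such averaging would plausibly show up as blur, and it also suggests why a deterministic reconstruction target discourages diverse samples.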
In spite of its success, we argue that this joint training recipe causes one critical side effect: the reconstruction loss aggravates mode collapse, one of the notorious problems of GANs. In conditional generation tasks, which intrinsically require learning one-to-many mappings, the model is expected to generate diverse outputs from a single conditional input, depending on some stochastic variables (e.g.
Feb-25-2019