Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions

Zeyuan Allen-Zhu, Yuanzhi Li

arXiv.org Machine Learning 

In practice, by simply training a generator and a discriminator together, each a multi-layer neural network with non-linear activation functions, using local search algorithms such as stochastic gradient descent ascent (SGDA), the generator network can be trained efficiently to generate samples from highly complicated distributions (such as the distribution of images). Despite the great empirical success of GANs, they remain among the least understood models on the theory side of deep learning. Most existing theories focus on the statistical properties of GANs at the global optimum [15, 16, 20, 87]. On the training side, however, gradient descent ascent enjoys efficient convergence to a global optimum only when the loss function is convex-concave, or efficient convergence to a critical point in general settings [37, 38, 48, 53, 71, 73, 75, 77, 78]. Due to the extreme non-linearity of the networks in both the generator and the discriminator, it is highly unlikely that the training objective of GANs is convex-concave. In particular, even if the generator and the discriminator are linear functions over prescribed feature mappings, such as the neural tangent kernel (NTK) feature mappings [3, 8, 9, 17, 18, 32, 35, 40, 41, 47, 51, 54, 65, 69, 92, 97], the training objective can still be non-convex-concave.
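To make the alternating descent/ascent dynamics concrete, below is a minimal sketch of SGDA on the standard minimax GAN objective. The toy one-dimensional data distribution, network sizes, batch size, and learning rates are illustrative assumptions for exposition and are not taken from the paper.

```python
# Minimal SGDA sketch for GAN training on the standard minimax objective.
# All hyperparameters and the toy target distribution are assumptions.
import torch
import torch.nn as nn

latent_dim, data_dim, batch = 8, 1, 64

# Multi-layer generator and discriminator with non-linear activations.
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_G = torch.optim.SGD(G.parameters(), lr=1e-3)  # descent on the generator loss
opt_D = torch.optim.SGD(D.parameters(), lr=1e-3)  # ascent on the discriminator payoff
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) * 2.0 + 3.0  # samples from a toy target distribution
    z = torch.randn(batch, latent_dim)               # latent noise fed to the generator

    # Ascent step: the discriminator maximizes its real-vs-fake classification payoff
    # (equivalently, minimizes the binary cross-entropy on real/fake labels).
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(G(z).detach()), torch.zeros(batch, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Descent step: the generator minimizes the objective, i.e. pushes the
    # discriminator to label its generated samples as real.
    g_loss = bce(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```

Because each player's update is a plain stochastic gradient step on a shared objective that is non-convex-concave in the network parameters, this procedure carries no general guarantee of reaching a global optimum, which is exactly the gap between practice and existing theory highlighted above.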
