Adversarially trained generative models (GANs) have recently achieved compelling image synthesis results. But despite early successes in using GANs for unsupervised representation learning, they have since been superseded by approaches based on self-supervision. In this work we show that progress in image generation quality translates to substantially improved representation learning performance. Our approach, BigBiGAN, builds upon the state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator. We extensively evaluate the representation learning and generation capabilities of these BigBiGAN models, demonstrating that these generation-based models achieve the state of the art in unsupervised representation learning on ImageNet, as well as in unconditional image generation.
The capabilities of artificial intelligence (AI) are growing rapidly, especially in the creation of photorealistic synthetic images. In 2014, generative adversarial networks (GANs) were introduced. A few years later, bidirectional GANs (BiGANs) were created. Then came BigGANs, which outperformed the previous state-of-the-art GANs in image synthesis. But wait, there's more: Last week researchers from Alphabet Inc.'s DeepMind debuted BigBiGANs.
We find that, just as a large transformer model trained on language can generate coherent text, the same exact model trained on pixel sequences can generate coherent image completions and samples. By establishing a correlation between sample quality and image classification accuracy, we show that our best generative model also contains features competitive with top convolutional nets in the unsupervised setting. Unsupervised and self-supervised learning, or learning without human-labeled data, is a longstanding challenge of machine learning. Recently, it has seen incredible success in language, as transformer models like BERT, GPT-2, RoBERTa, T5, and other variants have achieved top performance on a wide array of language tasks. However, the same broad class of models has not been successful in producing strong features for image classification.
Uncertainty estimates help to identify ambiguous, novel, or anomalous inputs, but the reliable quantification of uncertainty has proven to be challenging for modern deep networks. To improve uncertainty estimation, we propose On-Manifold Adversarial Data Augmentation or OMADA, which specifically attempts to generate the most challenging examples by following an on-manifold adversarial attack path in the latent space of an autoencoder-based generative model that closely approximates decision boundaries between two or more classes. On a variety of datasets and for multiple network architectures, OMADA consistently yields more accurate and better calibrated classifiers than baseline models, and outperforms competing approaches such as Mixup and CutMix, as well as achieving similar performance to (at times better than) post-processing calibration methods such as temperature scaling. Variants of OMADA can employ different sampling schemes for ambiguous on-manifold examples based on the entropy of their estimated soft labels, which exhibit specific strengths for generalization, calibration of predicted uncertainty, or detection of out-of-distribution inputs.
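The idea of sampling ambiguous examples by the entropy of their soft labels can be illustrated with a minimal sketch. This is not the OMADA sampling scheme itself: the function names, the threshold value, and the toy soft labels below are all assumptions made for illustration, and the code only shows how Shannon entropy separates confident from ambiguous soft labels.

```python
import math


def soft_label_entropy(probs):
    """Shannon entropy (in nats) of one soft label, i.e. a list of class probabilities."""
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_ambiguous(soft_labels, min_entropy):
    """Return the indices of examples whose soft-label entropy is at least min_entropy.

    Hypothetical helper: the threshold rule is an illustrative choice,
    not the sampling scheme described in the source.
    """
    return [
        i
        for i, probs in enumerate(soft_labels)
        if soft_label_entropy(probs) >= min_entropy
    ]


# Toy soft labels for a two-class problem (made-up values).
soft_labels = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],  # close to the decision boundary, high entropy
    [0.50, 0.50],  # maximally ambiguous, entropy = log(2)
]

ambiguous_idx = select_ambiguous(soft_labels, min_entropy=0.6)
```

For a two-class problem the entropy peaks at log(2) ≈ 0.693 nats, so a threshold of 0.6 keeps only examples close to the decision boundary. A different threshold, or sampling with probability proportional to entropy, would give other variants.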
Deep generative modeling has attracted considerable interest as a method for data generation and representation learning. Consider the observed real data $X$ from an unknown distribution $p_r$ on $\mathcal{X} \subseteq \mathbb{R}^d$ and the latent variable $Z$ with a known prior $p_z$ on $\mathcal{Z} \subseteq \mathbb{R}^k$. In unidirectional data generation, we are interested in learning a transformation $G: \mathcal{Z} \times \mathcal{E} \to \mathcal{X}$ so that the distribution of the transformed variable $G(Z, \epsilon)$ becomes close to $p_r$, where $\epsilon \in \mathcal{E}$ is the source of randomness with a specified distribution $p_\epsilon$ and $G$ is referred to as a generator. In many applications, bidirectional generative modeling is favored for its ability to learn representations: we additionally learn a transformation $E: \mathcal{X} \times \mathcal{E} \to \mathcal{Z}$, known as an encoder. The principled formulation of bidirectional generation is to match the distributions of the two data-latent pairs $(X, E(X, \epsilon))$ and $(G(Z, \epsilon), Z)$. Classical methods, including the Variational Autoencoder (VAE) and Bidirectional Generative Adversarial Networks (BiGAN) [2,3], turn out to handle this task using one specific distance measure as the objective. In this paper, we consider the general $f$-divergence, which is a natural and broad class of distance measures. We discuss the advantages of this general formulation for several issues of concern, including unidirectional generation, mode coverage, and cycle consistency, especially for the Kullback-Leibler (KL) divergence, which is our main choice. For optimization, both VAE and BiGAN are limited to specific divergences and assumptions on the encoder and generator distributions, and hence do not apply in our general formulation.
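For reference, the joint-matching formulation above can be written out using the standard definition of an f-divergence. The names p_E and p_G are introduced here purely for illustration, and the direction of the divergence in the objective is an assumption: the text only states that the two data-latent pairs are matched under an f-divergence, with KL as the main choice.

```latex
% p_E: joint distribution of the encoder pair   (X, E(X, \epsilon))   [hypothetical name]
% p_G: joint distribution of the generator pair (G(Z, \epsilon), Z)   [hypothetical name]
\begin{align}
  % Standard f-divergence between two joint densities p and q
  D_f(p \,\|\, q)
    &= \int q(x, z)\, f\!\left(\frac{p(x, z)}{q(x, z)}\right) dx\, dz,
    \qquad f \text{ convex},\quad f(1) = 0, \\
  % Bidirectional objective: match the two data-latent joints (direction assumed)
  \min_{G,\,E}\;
    & D_f(p_E \,\|\, p_G), \\
  % Special case f(t) = t \log t recovers the KL divergence
  f(t) = t \log t
    \;&\Longrightarrow\;
    D_{\mathrm{KL}}(p \,\|\, q) = \int p(x, z) \log \frac{p(x, z)}{q(x, z)}\, dx\, dz .
\end{align}
```

Substituting f(t) = t log t into the first line gives the integral of q · (p/q) · log(p/q), which simplifies to the KL expression in the last line.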