Adversarially trained generative models (GANs) have recently achieved compelling image synthesis results. But despite early successes in using GANs for unsupervised representation learning, they have since been superseded by approaches based on self-supervision. In this work we show that progress in image generation quality translates to substantially improved representation learning performance. Our approach, BigBiGAN, builds upon the state-of-the-art BigGAN model, extending it to representation learning by adding an encoder and modifying the discriminator. We extensively evaluate the representation learning and generation capabilities of these BigBiGAN models, demonstrating that these generation-based models achieve the state of the art in unsupervised representation learning on ImageNet, as well as in unconditional image generation.
The capabilities of artificial intelligence (AI) are growing rapidly, especially in the creation of photorealistic synthetic images. In 2014, generative adversarial networks (GANs) were introduced. A few years later, bidirectional GANs (BiGANs) were created. Then came BigGANs, which outperformed the previous state-of-the-art GANs in image synthesis. But wait, there's more: Last week researchers from Alphabet Inc.'s DeepMind debuted BigBiGANs.
Uncertainty estimates help to identify ambiguous, novel, or anomalous inputs, but the reliable quantification of uncertainty has proven to be challenging for modern deep networks. To improve uncertainty estimation, we propose On-Manifold Adversarial Data Augmentation or OMADA, which specifically attempts to generate the most challenging examples by following an on-manifold adversarial attack path in the latent space of an autoencoder-based generative model that closely approximates decision boundaries between two or more classes. On a variety of datasets and for multiple network architectures, OMADA consistently yields more accurate and better calibrated classifiers than baseline models, and outperforms competing approaches such as Mixup and CutMix, as well as achieving similar performance to (at times better than) post-processing calibration methods such as temperature scaling. Variants of OMADA can employ different sampling schemes for ambiguous on-manifold examples based on the entropy of their estimated soft labels, which exhibit specific strengths for generalization, calibration of predicted uncertainty, or detection of out-of-distribution inputs.
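As a rough illustration of the entropy-based sampling mentioned above, the following sketch computes the Shannon entropy of soft labels and keeps the most ambiguous candidates. It is not the OMADA implementation: the function names, the top-k selection rule, and the toy soft labels are assumptions made for this example.

```python
import math

def entropy(soft_label):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in soft_label if p > 0.0)

def most_ambiguous(soft_labels, k):
    """Return the indices of the k soft labels with the highest entropy.

    Hypothetical selection rule, used here for illustration only.
    """
    ranked = sorted(range(len(soft_labels)),
                    key=lambda i: entropy(soft_labels[i]),
                    reverse=True)
    return ranked[:k]

# Toy soft labels for a two-class problem (made-up values).
candidates = [
    [0.99, 0.01],  # confident
    [0.50, 0.50],  # maximally ambiguous
    [0.80, 0.20],  # somewhat ambiguous
]
picked = most_ambiguous(candidates, k=2)
```

A different sampling scheme could, for instance, draw candidates with probability proportional to their entropy instead of taking a hard top-k cut.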
Deep generative modeling has attracted considerable interest as a method for data generation and representation learning. Consider the observed real data $X$ from an unknown distribution $p_r$ on $\mathcal{X} \subseteq \mathbb{R}^d$ and the latent variable $Z$ with a known prior $p_z$ on $\mathcal{Z} \subseteq \mathbb{R}^k$. In unidirectional data generation, we are interested in learning a transformation $G: \mathcal{Z} \times \mathcal{E} \to \mathcal{X}$ so that the distribution of the transformed variable $G(Z, \epsilon)$ becomes close to $p_r$, where $\epsilon \in \mathcal{E}$ is the source of randomness with a specified distribution $p_\epsilon$ and $G$ is referred to as a generator. In many applications, bidirectional generative modeling is favored due to the ability to learn representations, where we additionally learn a transformation $E: \mathcal{X} \times \mathcal{E} \to \mathcal{Z}$, known as an encoder. The principled formulation of bidirectional generation is to match the distributions of two data-latent pairs $(X, E(X, \epsilon))$ and $(G(Z, \epsilon), Z)$. Classical methods including Variational Autoencoder (VAE) and Bidirectional Generative Adversarial Networks (BiGAN) [2,3] turn out to handle this task using one specific distance measure as the objective. In this paper, we consider the more general f-divergence, which is a natural and broad class of distance measures. We discuss the advantages of this general formulation with respect to several issues of concern, including unidirectional generation, mode coverage and cycle consistency, especially for the Kullback-Leibler (KL) divergence, which is our main choice. For optimization, both VAE and BiGAN are limited to specific divergences and assumptions for the encoder and generator distributions, and hence do not apply in our general formulation.
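To make the notion of an f-divergence concrete, here is a minimal sketch for discrete distributions, using the standard definition D_f(P || Q) = sum_i q_i f(p_i / q_i) with f convex and f(1) = 0. In the bidirectional setting described above, P and Q would be the joint distributions of the two data-latent pairs; the toy probability vectors and function names below are assumptions for illustration and do not come from the paper.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete P, Q with q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    """f(t) = t log t, which yields KL(P || Q); f(0) is taken as 0."""
    return t * math.log(t) if t > 0.0 else 0.0

def reverse_kl_generator(t):
    """f(t) = -log t, which yields KL(Q || P); requires t > 0."""
    return -math.log(t)

# Toy distributions over three outcomes (made-up values).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

kl_pq = f_divergence(p, q, kl_generator)          # KL(P || Q)
kl_qp = f_divergence(p, q, reverse_kl_generator)  # KL(Q || P)
```

Swapping the generator function f is all it takes to move between members of the family, which is the sense in which the f-divergence is a broad class of distance measures.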
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
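The abstract does not spell out the contrastive loss itself. The sketch below assumes the normalized temperature-scaled cross-entropy (NT-Xent) form over cosine similarities; the pairing convention (rows 2k and 2k+1 are two augmented views of the same image), the temperature value, and the toy embeddings are assumptions for illustration, not the reference implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(embeddings, temperature=0.5):
    """Mean NT-Xent loss; rows 2k and 2k+1 are assumed to be a positive pair."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos = i + 1 if i % 2 == 0 else i - 1
        # Similarities to every other sample (the positive and all negatives).
        logits = [cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i]
        pos_logit = cosine(embeddings[i], embeddings[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / n

# Toy 2-D embeddings: positives are close in `aligned`, far apart in `shuffled`.
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_aligned = nt_xent(aligned)
loss_shuffled = nt_xent(shuffled)
```

Every other sample in the batch acts as a negative in the denominator, which is one way to read the abstract's finding that contrastive learning benefits from larger batch sizes.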