TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. While those attempts mainly study discriminative models, we explore transformers on some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs). Our goal is to conduct the first pilot study in building a GAN \textit{completely free of convolutions}, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed \textbf{TransGAN}, consists of a memory-friendly transformer-based generator that progressively increases feature resolution, and correspondingly a multi-scale discriminator that simultaneously captures semantic contexts and low-level textures. On top of them, we introduce a new grid self-attention module that further alleviates the memory bottleneck, in order to scale up TransGAN to high-resolution generation. We also develop a unique training recipe including a series of techniques that mitigate the training instability of TransGAN, such as data augmentation, modified normalization, and relative position encoding. Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs using convolutional backbones.
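The abstract names grid self-attention as the module that relieves the memory bottleneck at high resolution but does not spell out the mechanism. The sketch below is a minimal NumPy illustration under the assumption that the feature map is partitioned into non-overlapping windows and self-attention is computed only within each window; the learned Q/K/V projections of a real transformer block are omitted for brevity, so this shows only how the quadratic attention matrix shrinks.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # tokens: (n, d). Single-head attention with identity
    # projections for brevity (real blocks learn Q/K/V weights).
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)  # (n, n) score matrix
    return softmax(scores) @ tokens

def grid_self_attention(feat, grid=4):
    # feat: (H, W, d) feature map. Split it into non-overlapping
    # grid x grid windows and attend only within each window, so the
    # largest score matrix is grid^2 x grid^2 instead of HW x HW.
    H, W, d = feat.shape
    out = np.empty_like(feat)
    for i in range(0, H, grid):
        for j in range(0, W, grid):
            win = feat[i:i + grid, j:j + grid].reshape(-1, d)
            out[i:i + grid, j:j + grid] = self_attention(win).reshape(grid, grid, d)
    return out
```

For an 8x8 feature map, full self-attention builds one 64x64 score matrix (4096 entries), while 4x4 grid attention builds four 16x16 matrices (1024 entries total); the gap widens quadratically with resolution, which is the memory saving the abstract refers to.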
Appendix of "TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up". We also evaluate the effectiveness of stronger augmentation on high-resolution generative tasks (e.g., Tables 1, 2, 3, 4). For the generator architectures, "Block" represents the basic Transformer Block, and "Grid Block" denotes the Transformer Block where the standard self-attention is replaced by the proposed grid self-attention. For the discriminator architectures, we use "Layer Flatten" to represent the flattening operation. We compare the GPU memory cost of standard self-attention and grid self-attention, and evaluate the inference cost of the two architectures without computing gradients. We include more high-resolution visual examples in Figures 3 and 4.
- South America > Peru > Loreto Department (0.23)
- North America > Mexico > Gulf of Mexico (0.21)
- Europe > Poland > Pomerania Province (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Paper Explained: TransGAN -- Two Transformers can make One Strong GAN
Most NLP tasks are currently solved using the Transformer network or a variation of it. Transformers have become an integral part of the NLP ecosystem over the past few years because of their reusability. Some multi-modal tasks use the Transformer network somewhere, yet they still aren't CNN-free: any computer vision task coupled with Transformers also employs a CNN backbone for feature extraction. With TransGAN, however, a pure Transformer-based architecture is developed to train a GAN for image synthesis.
Hot papers on arXiv from the past month – February 2021
Abstract: Conceptual abstraction and analogy-making are key abilities underlying humans' capacity to learn, reason, and robustly adapt their knowledge to new domains. Despite a long history of research on constructing AI systems with these abilities, no current AI system is anywhere close to being capable of forming humanlike abstractions or analogies. This paper reviews the advantages and limitations of several approaches toward this goal, including symbolic methods, deep learning, and probabilistic program induction. The paper concludes with several proposals for designing challenge tasks and evaluation measures in order to make quantifiable and generalizable progress in this area.
- Overview (0.90)
- Research Report (0.55)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.98)
- Information Technology > Sensing and Signal Processing > Image Processing (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
Papers with Code - TransGAN: Two Transformers Can Make One Strong GAN
The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks, such as classification, detection, and segmentation. However, how much further can transformers go: are they ready to take on some more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN \textbf{completely free of convolutions}, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed \textbf{TransGAN}, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate that TransGAN notably benefits from data augmentations (more than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets.
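The generator is described as progressively increasing feature resolution while decreasing embedding dimension. One convolution-free way to realize that trade (an illustrative assumption, not necessarily the paper's exact upsampling step) is a pixel-shuffle rearrangement, which trades channels for spatial positions between transformer stages:

```python
import numpy as np

def pixelshuffle_upsample(feat, r=2):
    # feat: (H, W, d) token grid with d divisible by r*r.
    # Rearranges channels into space: (H, W, d) -> (rH, rW, d // r^2),
    # doubling resolution while quartering embedding width for r=2,
    # with no convolutions involved.
    H, W, d = feat.shape
    assert d % (r * r) == 0, "embedding dim must be divisible by r*r"
    out = feat.reshape(H, W, r, r, d // (r * r))
    out = out.transpose(0, 2, 1, 3, 4).reshape(H * r, W * r, d // (r * r))
    return out
```

Repeating this between stages yields the stated pattern: the token grid grows (e.g., 8x8 to 16x16 to 32x32) while each token's embedding shrinks, keeping the total activation size roughly constant.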