
Appendix of "TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up" We also evaluate the effectiveness of stronger augmentation on high-resolution generative tasks (E.g. Table 1, 2, 3, 4. For the generator architectures, the "Block" represents the basic Transformer Block "Grid Block" denotes the Transformer Block where the standard self-attention is replaced by the propose For the discriminator architectures, we use "Layer Flatten" to represent the process of We compare the GPU memory cost between standard self-attention and grid self-attention. We evaluate the inference cost of these two architectures, without calculating the gradient. We include more high-resolution visual examples on Figure 3,4.