not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution

Han, Seungwook, Srivastava, Akash, Hurwitz, Cole, Sattigeri, Prasanna, Cox, David D.

arXiv.org Machine Learning 

State-of-the-art models for high-resolution image generation, such as BigGAN and VQVAE-2, require an enormous amount of compute and/or time to train (512 TPU-v3 cores), putting them out of reach of the larger research community. GAN-based image super-resolution models, such as ESRGAN, on the other hand, can not only upscale images to high resolutions but are also efficient to train. Motivated by these observations, we take a two-step approach. First, we generate images in the low-frequency bands by training a sampler in the wavelet domain; wavelet-based down-sampling preserves more structural information than pixel-based methods, yielding significantly better generative quality for the low-resolution sampler (e.g., 64×64). Second, a super-resolution decoder upscales these low-resolution samples to the target resolution. Since the sampler and decoder can be trained in parallel and operate in much lower-dimensional spaces than end-to-end models, the training cost is substantially reduced. On ImageNet 512×512, our model achieves a Fréchet Inception Distance (FID) of 10.59, beating the baseline BigGAN model at half the compute (256 TPU-v3 cores).

Generative modeling of natural images has achieved great success in recent years (Kingma & Welling, 2013; Goodfellow et al., 2014; Arjovsky et al., 2017; Menick & Kalchbrenner, 2019; Zhang et al., 2018a). Yet generating high-dimensional, complex data such as ImageNet remains challenging and extremely resource-intensive.
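To make the down-sampling claim concrete, here is a minimal NumPy sketch of the low-frequency (LL) band of a single-level 2-D Haar wavelet transform. This is an illustration of the general idea, not the paper's implementation: the `haar_ll` helper and the toy inputs are assumptions for demonstration (the paper operates on RGB ImageNet images at 512×512).

```python
import numpy as np

def haar_ll(img):
    """Low-frequency (LL) sub-band of a single-level 2-D Haar DWT.

    Each coefficient is the scaled sum of a 2x2 block, so the result is a
    half-resolution image that keeps coarse structure, rather than the
    aliasing-prone result of naive pixel subsampling.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    return (a + b + c + d) / 2.0  # orthonormal Haar scaling

# Hypothetical 64x64 input, matching the sampler resolution in the text.
img = np.arange(64 * 64, dtype=float).reshape(64, 64)
ll = haar_ll(img)
print(ll.shape)  # (32, 32): one level halves each spatial dimension

# A constant image maps to a constant LL band (scaled by 2), showing the
# LL band is an energy-preserving local average of the input.
flat = np.full((4, 4), 5.0)
print(haar_ll(flat))  # every entry 10.0
```

Because the high-frequency (detail) bands are discarded rather than the pixels themselves, the LL band retains the image's global layout, which is what makes it a better training target for a low-resolution sampler than a pixel-subsampled image.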
