Vector quantization loss analysis in VQGANs: a single-GPU ablation study for image-to-image synthesis

Verma, Luv, Mohan, Varun

arXiv.org Artificial Intelligence 

The introduction of VQ-VAE models marked a turning point in image processing by introducing vector codebooks for discrete latent representation. VQ-VAEs, adept at modeling long-term dependencies, used the compressed discrete latent space to generate images, action sequences, and even meaningful speech in an unsupervised manner [Van Den Oord et al., 2017]. Building on this foundation, VQGANs combined this approach with GANs for image reconstruction, with a particular emphasis on transformer attention layers in both the encoder and decoder [Esser et al., 2021]. This development enabled high-resolution image synthesis by representing images as compositions of perceptually rich constituents. These models have been particularly successful on large image datasets, often surpassing state-of-the-art convolutional approaches [Esser et al., 2021]. The use of codebook vectors has not only reduced data requirements but also moved the modeling space from continuous to discrete. In this study, we examine how these sophisticated models behave when applied to a smaller, more constrained dataset with limited computational resources, such as a single A100 GPU [Corporation, 2020]. The Oxford 102 Flower dataset [Visual Geometry Group, 2008], with its rich variations in colors and features, serves as a suitable testing ground for this examination.
