Vector quantization loss analysis in VQGANs: a single-GPU ablation study for image-to-image synthesis

Aug-9-2023–arXiv.org Artificial Intelligence

The introduction of VQ-VAE models marked a transformative moment in image processing by introducing vector codebooks for discrete latent representation. VQ-VAEs, adept at modeling long-term dependencies, used the compressed discrete latent space to generate images, action sequences, and even meaningful speech in an unsupervised manner [Van Den Oord et al., 2017]. Building on this foundation, VQGANs combined this approach with GANs for image reconstruction, with a particular emphasis on transformer attention layers in both the encoder and the decoder [Esser et al., 2021]. This development has enabled high-resolution image synthesis by representing images as compositions of perceptually rich constituents. These models have been particularly successful on large image datasets, often surpassing state-of-the-art convolutional approaches [Esser et al., 2021]. The use of codebook vectors has not only reduced data requirements but also moved the modeling space from continuous to discrete. In this study, we sought to examine how these models behave when applied to a smaller, more constrained dataset with limited computational resources, such as a single A100 GPU [Corporation, 2020]. The Oxford 102 Flower dataset [Visual Geometry Group, 2008], with its rich variations in colors and features, serves as a suitable testing ground for this examination.
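To make the move from a continuous to a discrete modeling space concrete, the following is a minimal sketch of nearest-neighbor codebook lookup and of the two vector quantization loss terms described for VQ-VAEs [Van Den Oord et al., 2017]. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the toy codebook, and the latent values are hypothetical, and the commitment weight `beta=0.25` is an assumed setting. Plain Python cannot express the stop-gradient that distinguishes the two terms during training, so only their numeric values are computed.

```python
# Toy sketch of vector quantization; all names and values are hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    Returns (indices, quantized): the discrete codes and the codebook
    vectors those codes select.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]
    quantized = [codebook[k] for k in indices]
    return indices, quantized


def vq_loss_terms(latents, quantized, beta=0.25):
    """Values of the codebook and commitment terms, averaged over vectors.

    Both terms are built from the same squared distance; in training they
    differ only in which side the stop-gradient is applied to. beta is an
    assumed commitment weight.
    """
    mse = sum(squared_distance(z, q) for z, q in zip(latents, quantized)) / len(latents)
    return {"codebook": mse, "commitment": beta * mse}


# Three 2-D codebook vectors and three encoder outputs (made-up numbers).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

indices, quantized = quantize(latents, codebook)
losses = vq_loss_terms(latents, quantized)
print(indices)  # [1, 0, 2]
print(losses)
```

Each latent vector is replaced by a single integer index into the codebook, which is what allows a downstream model to operate on discrete tokens rather than continuous features. In practice this lookup would be done with a tensor library and automatic differentiation rather than Python lists.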

artificial intelligence, deep learning, machine learning

Country:
- North America > United States
  - Massachusetts > Suffolk County > Boston (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
