Vector Quantization using Gaussian Variational Autoencoder

Xu, Tongda, Zheng, Wendi, He, Jiajun, Hernandez-Lobato, Jose Miguel, Wang, Yan, Zhang, Ya-Qin, Tang, Jie

arXiv.org Artificial Intelligence 

Vector quantized variational autoencoder (VQ-VAE) is a discrete auto-encoder that compresses images into discrete tokens. It is difficult to train due to discretization. In this paper, we propose a simple yet effective technique, dubbed Gaussian Quant (GQ), that converts a Gaussian VAE satisfying a certain constraint into a VQ-VAE without training. GQ generates random Gaussian noise as a codebook and finds the closest noise to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic to train a Gaussian VAE for effective GQ, named target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge.

Vector-quantized variational autoencoder (Van Den Oord et al., 2017) is an autoencoder that compresses images into discrete tokens. It is fundamental to autoregressive generative models (Esser et al., 2021; Chang et al., 2022; Yu et al., 2023; Sun et al., 2024b). However, VQ-VAE is difficult to train: the encoding process of VQ-VAE is not differentiable, and challenges such as codebook collapse often emerge (Sønderby et al., 2017).
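The core GQ step described above — sampling a random Gaussian codebook and snapping the posterior mean to its nearest entry — can be sketched in a few lines. This is a minimal illustration with a hypothetical interface (the function name, codebook size, and latent dimension are assumptions, not the paper's actual implementation); the codebook must be generated from a seed shared by encoder and decoder so the index alone identifies the codeword.

```python
import numpy as np

def gaussian_quantize(mu, codebook_size=256, dim=8, seed=0):
    """Sketch of Gaussian Quant (GQ): quantize a Gaussian posterior mean
    by replacing it with the nearest entry of a random Gaussian codebook.
    Hypothetical signature; the paper's actual interface may differ.
    """
    rng = np.random.default_rng(seed)
    # Codebook of K i.i.d. standard-normal codewords, reproducible from
    # the shared seed (no codebook needs to be trained or transmitted).
    codebook = rng.standard_normal((codebook_size, dim))
    # Discrete token: index of the codeword closest (L2) to the mean.
    idx = int(np.argmin(np.linalg.norm(codebook - mu, axis=1)))
    return idx, codebook[idx]
```

Per the theoretical result quoted above, the quantization error `‖codebook[idx] - mu‖` stays small with high probability once `log2(codebook_size)` exceeds the bits-back coding rate of the Gaussian VAE's latent.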
