Vector Quantization using Gaussian Variational Autoencoder
Xu, Tongda, Zheng, Wendi, He, Jiajun, Hernandez-Lobato, Jose Miguel, Wang, Yan, Zhang, Ya-Qin, Tang, Jie
arXiv.org Artificial Intelligence
Vector quantized variational autoencoder (VQ-VAE) is a discrete autoencoder that compresses images into discrete tokens. It is difficult to train due to discretization. In this paper, we propose a simple yet effective technique, dubbed Gaussian Quant (GQ), that converts a Gaussian VAE with a certain constraint into a VQ-VAE without training. GQ generates random Gaussian noise as a codebook and finds the closest noise to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic to train a Gaussian VAE for effective GQ, named target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge.

Vector-quantized variational autoencoder (Van Den Oord et al., 2017) is an autoencoder that compresses images into discrete tokens. It is fundamental to autoregressive generative models (Esser et al., 2021; Chang et al., 2022; Yu et al., 2023; Sun et al., 2024b). However, VQ-VAE is difficult to train: the encoding process of VQ-VAE is not differentiable, and challenges such as codebook collapse often emerge (Sønderby et al., 2017).
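The core GQ step described in the abstract, drawing a random Gaussian codebook and snapping the posterior mean to the nearest entry, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function name, seed handling, and shapes are assumptions.

```python
import numpy as np

def gaussian_quantize(posterior_mean, codebook_size, seed=0):
    """Illustrative sketch of Gaussian Quant (GQ): draw a random
    Gaussian codebook and map the posterior mean to its nearest
    codeword under L2 distance. Details are assumed, not from the paper."""
    rng = np.random.default_rng(seed)
    d = posterior_mean.shape[-1]
    # Shared random codebook: since it is generated from a fixed seed,
    # encoder and decoder can both reproduce it without training or
    # storing a learned codebook.
    codebook = rng.standard_normal((codebook_size, d))
    # Nearest-neighbor search: the token is the index of the closest code.
    dists = np.linalg.norm(codebook - posterior_mean, axis=-1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

# Example: quantize a 4-dim posterior mean with a 256-entry codebook,
# i.e. log2(256) = 8 bits per token.
mu = np.array([0.5, -0.2, 1.0, 0.3])
token, code = gaussian_quantize(mu, codebook_size=256)
```

The theoretical result quoted above suggests why this works: once the codebook is large enough relative to the bits-back coding rate of the Gaussian VAE, some random codeword lies close to the posterior mean with high probability, so the quantization error stays small.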
Dec-9-2025