Scaling the Codebook Size of VQ-GAN to 100,000 with a Utilization Rate of 99%

May-26-2025, 17:14:34 GMT–Neural Information Processing Systems

In image quantization, as exemplified by VQGAN, an image is encoded into discrete tokens drawn from a codebook of predefined size. Recent advancements, particularly with LLAMA 3, reveal that enlarging the codebook significantly enhances model performance. However, VQGAN and its derivatives, such as VQGAN-FC (Factorized Codes) and VQGAN-EMA, still struggle both to expand the codebook and to improve its utilization. For instance, VQGAN-FC is restricted to a codebook of at most 16,384 entries and typically has a low utilization rate of less than 12% on ImageNet. In this work, we propose a novel image quantization model named VQGAN-LC (Large Codebook), which extends the codebook size to 100,000 while achieving a utilization rate exceeding 99%.
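To make the two quantities in the abstract concrete, the sketch below shows nearest-neighbor quantization against a toy codebook and one common way to measure utilization. It is an illustration only and not the VQGAN-LC method: the helper names `quantize` and `utilization_rate` are hypothetical, and it assumes that "utilization rate" means the fraction of codebook entries selected at least once over a set of encoded features.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Distance is squared Euclidean; ties go to the lowest index.
    """
    tokens = []
    for f in features:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(f, codebook[k])),
        )
        tokens.append(nearest)
    return tokens


def utilization_rate(tokens, codebook_size):
    """Fraction of codebook entries used at least once (assumed definition)."""
    return len(set(tokens)) / codebook_size


# Toy data: a 4-entry codebook and four 2-D feature vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
features = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.1], [0.8, -0.1]]

tokens = quantize(features, codebook)
rate = utilization_rate(tokens, len(codebook))
print(tokens, rate)
```

In this toy case only two of the four entries are ever selected, so half of the codebook is unused. That is the kind of under-utilization the abstract reports for large codebooks, although at a very different scale.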

artificial intelligence, machine learning, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.66)
  - Machine Learning > Neural Networks (0.42)
  - Vision (0.40)