An Image is Worth 32 Tokens for Reconstruction and Generation

May-27-2025, 20:18:54 GMT – Neural Information Processing Systems

Recent advancements in generative models have highlighted the crucial role of image tokenization in the efficient synthesis of high-resolution images. Tokenization, which transforms images into latent representations, reduces computational demands compared to directly processing pixels and enhances the effectiveness and efficiency of the generation process. Prior methods, such as VQGAN, typically utilize 2D latent grids with fixed downsampling factors. However, these 2D tokenizations face challenges in managing the inherent redundancies present in images, where adjacent regions frequently display similarities. To overcome this issue, we introduce the Transformer-based 1-Dimensional Tokenizer (TiTok), an innovative approach that tokenizes images into 1D latent sequences.
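
To make the contrast between a 2D latent grid and a short 1D sequence concrete, the following is a minimal arithmetic sketch, not the authors' implementation. The 256x256 image size and the downsampling factor of 16 are illustrative assumptions that do not appear in the abstract; the 32-token sequence length is taken from the title. The helper name `grid_token_count` is hypothetical.

```python
def grid_token_count(height: int, width: int, downsample: int) -> int:
    """Number of tokens in a 2D latent grid with a fixed downsampling factor."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // downsample) * (width // downsample)


# Assumed, illustrative settings (not stated in the abstract).
IMAGE_SIZE = 256
DOWNSAMPLE = 16

# 2D grid tokenization: the token count is tied to image size and factor.
grid_tokens = grid_token_count(IMAGE_SIZE, IMAGE_SIZE, DOWNSAMPLE)

# 1D tokenization: the sequence length is a free choice; 32 comes from the title.
sequence_tokens = 32

# How many times shorter the 1D sequence is under these assumptions.
reduction = grid_tokens / sequence_tokens

print(f"2D grid tokens: {grid_tokens}")
print(f"1D sequence tokens: {sequence_tokens}")
print(f"reduction factor: {reduction:.1f}x")
```

Under these assumed settings, the grid's token count is fixed by the image size and the downsampling factor, whereas the length of a 1D latent sequence can be chosen independently of the spatial layout.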

Genre:
- Research Report > Promising Solution (0.40)
- Overview > Innovation (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.60)
  - Machine Learning (0.57)