Image Understanding Makes for A Good Tokenizer for Image Generation

May-26-2025, 21:16:31 GMT–Neural Information Processing Systems

Modern image generation (IG) models have been shown to capture rich semantics valuable for image understanding (IU) tasks. However, the potential of IU models to improve IG performance remains uncharted. We address this gap using a token-based IG framework, which relies on effective tokenizers to project images into token sequences. Currently, **pixel reconstruction** (e.g., VQGAN) is the dominant training objective for image tokenizers. In contrast, our approach adopts a **feature reconstruction** objective, in which tokenizers are trained by distilling knowledge from pretrained IU encoders.
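The contrast between the two objectives can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`quantize`, `tokenizer_loss`), the identity encoder, the two-entry codebook, and the stand-in `teacher` function are all assumptions made for illustration. The only point it shows is that both objectives share the same quantization step and differ in the reconstruction target: the input pixels in one case, the output of a frozen IU encoder in the other.

```python
# Toy sketch only: every name and value here is hypothetical, not from the paper.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents, codebook):
    """Map each latent vector to the index (token) of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]

def mse(pred, target):
    """Mean squared error over two equally shaped lists of vectors."""
    total = sum(squared_distance(p, t) for p, t in zip(pred, target))
    count = sum(len(p) for p in pred)
    return total / count

def tokenizer_loss(patches, encode, codebook, decode, targets):
    """Encode, quantize to tokens, decode, and compare against the given targets."""
    latents = [encode(p) for p in patches]
    tokens = quantize(latents, codebook)
    recon = [decode(codebook[t]) for t in tokens]
    return tokens, mse(recon, targets)

# Two toy "patches" and a two-entry codebook (assumed values).
patches = [[0.0, 0.1], [0.9, 1.0]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
identity = lambda v: list(v)

# Pixel reconstruction: the decoder output is compared with the input itself.
tokens, pixel_loss = tokenizer_loss(patches, identity, codebook, identity, patches)

# Feature reconstruction: the decoder output is compared with features from a
# frozen teacher. `teacher` is a stand-in for a pretrained IU encoder.
teacher = lambda p: [p[0] + p[1]]
feature_decode = lambda z: [z[0] + z[1]]
teacher_features = [teacher(p) for p in patches]
_, feature_loss = tokenizer_loss(
    patches, identity, codebook, feature_decode, teacher_features
)
```

In a real tokenizer the encoder, codebook, and decoder would be learned networks and the teacher would be a pretrained model whose weights stay fixed; here they are fixed functions so that only the change of target is visible.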

artificial intelligence, image understanding, machine learning

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.65)
  - Vision > Image Understanding (0.65)