Image Understanding Makes for A Good Tokenizer for Image Generation

May-29-2025, 02:58:07 GMT–Neural Information Processing Systems

Modern image generation (IG) models have been shown to capture rich semantics valuable for image understanding (IU) tasks. However, the potential of IU models to improve IG performance remains uncharted. We address this issue using a token-based IG framework, which relies on effective tokenizers to map images into token sequences. Currently, pixel reconstruction (e.g., VQGAN) dominates the training objective for tokenizers. In contrast, our approach adopts the feature reconstruction objective, where tokenizers are trained by distilling knowledge from pretrained IU encoders. Comprehensive comparisons indicate that tokenizers with strong IU capabilities achieve superior IG performance across a variety of metrics, datasets, tasks, and proposal networks.
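To make the contrast between the two tokenizer objectives concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the linear stand-in decoder, the random stand-in teacher features, and the choice of mean squared error as the distance are all hypothetical, and the paper may use a different distance or architecture. The sketch shows nearest-neighbor vector quantization (images to tokens) and how the training target switches from raw pixels to features produced by a frozen pretrained IU encoder.

```python
import numpy as np


def quantize(latents, codebook):
    """Assign each latent vector to its nearest codebook entry.

    latents:  (n, d) array of encoder outputs, one row per image patch.
    codebook: (k, d) array of learnable code vectors.
    Returns the token indices (n,) and the quantized vectors (n, d).
    """
    # Squared Euclidean distance between every latent and every code.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)
    return tokens, codebook[tokens]


def pixel_reconstruction_loss(decoded_pixels, pixels):
    """VQGAN-style target: the decoder output is compared with raw pixels."""
    return float(np.mean((decoded_pixels - pixels) ** 2))


def feature_reconstruction_loss(decoded_features, teacher_features):
    """Distillation-style target: the decoder output is compared with
    features from a frozen pretrained IU encoder (the teacher).
    MSE is an assumption made for this sketch."""
    return float(np.mean((decoded_features - teacher_features) ** 2))


# Toy data standing in for real model outputs (all hypothetical).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))            # 8 codes of dimension 4
latents = rng.normal(size=(16, 4))            # 16 patch latents
decoder_weights = rng.normal(size=(4, 6))     # stand-in linear decoder
teacher_features = rng.normal(size=(16, 6))   # stand-in IU encoder features

tokens, quantized = quantize(latents, codebook)
decoded = quantized @ decoder_weights
loss = feature_reconstruction_loss(decoded, teacher_features)
print("tokens:", tokens.tolist())
print("feature reconstruction loss:", loss)
```

In this sketch the only difference between the two objectives is the target passed to the loss; the token sequence handed to the downstream proposal network comes from `quantize` in both cases.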

artificial intelligence, machine learning, tokenizer

Country:
- Asia > China (0.14)

Genre:
- Research Report > Experimental Study (0.93)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks (1.00)
    - Vision (1.00)
  - Sensing and Signal Processing > Image Processing (1.00)