CogView: Mastering Text-to-Image Generation via Transformers

Dec-24-2025, 16:01:21 GMT–Neural Information Processing Systems

Text-to-image generation in the general domain has long been an open problem, one that requires both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to address this problem. We also demonstrate finetuning strategies for various downstream tasks, e.g.

cogview, mastering text-to-image generation, transformer

Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.71)
  - Machine Learning > Neural Networks (0.46)