CogView: Mastering Text-to-Image Generation via Transformers
Neural Information Processing Systems
Text-to-image generation in the general domain has long been an open problem that requires both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to advance this problem. We also demonstrate finetuning strategies for various downstream tasks, e.g.
Dec-24-2025, 16:01:21 GMT