CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Neural Information Processing Systems 

The development of transformer-based text-to-image models is impeded by their slow generation and high computational complexity for high-resolution images. In this work, we put forward a solution based on hierarchical transformers and local parallel autoregressive generation.
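A rough way to see why local parallel autoregressive generation speeds up sampling is to count sequential forward passes. The sketch below is illustrative only. It assumes a hypothetical schedule in which the image token grid is tiled into non-overlapping `window` x `window` blocks, tokens inside each block are generated in raster order, and all blocks advance in parallel. The grid size (60 x 60), the window size (4), and the function names `local_parallel_schedule` and `sequential_steps` are assumptions, not the exact ordering or configuration used in CogView2.

```python
def local_parallel_schedule(height, width, window):
    """Return a height x width grid of generation step indices.

    Hypothetical schedule (not CogView2's exact order): the grid is tiled
    into window x window blocks, tokens within a block are produced in
    raster order, and every block advances at the same time, so a token's
    step index depends only on its position inside its block.
    """
    if height % window or width % window:
        raise ValueError("grid size must be divisible by window size")
    return [[(r % window) * window + (c % window) for c in range(width)]
            for r in range(height)]


def sequential_steps(schedule):
    # Each distinct step index costs one forward pass; tokens sharing
    # an index are sampled in parallel within that pass.
    return len({step for row in schedule for step in row})


if __name__ == "__main__":
    h = w = 60                # hypothetical token grid size
    full_ar = h * w           # plain autoregression: one token per pass
    sched = local_parallel_schedule(h, w, window=4)
    print(full_ar, sequential_steps(sched))  # prints: 3600 16
```

Under these assumptions, plain token-by-token autoregression needs one pass per token, while the windowed schedule needs only one pass per position within a window. This explains why the sequential cost no longer grows with image resolution, at the price of each token seeing only part of its context.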