Text-to-Image Generation Grounded by Fine-Grained User Attention
