Dense Text-to-Image Generation with Attention Modulation