Towards Better Text-to-Image Generation Alignment via Attention Modulation