Multimodal Large Language Models Make Text-to-Image Generative Models Align Better