On the Scalability of Diffusion-based Text-to-Image Generation