Grounding Text-To-Image Diffusion Models For Controlled High-Quality Image Generation