GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation

Neural Information Processing Systems 

Each image is generated from a text prompt and a set of bounding boxes, which are displayed in the upper-right corner of the image.