Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

Neural Information Processing Systems 

Recent controllable generation approaches such as FreeControl [24] and Diffusion Self-Guidance [7] bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules.