Customizable Image Synthesis with Multiple Subjects
–Neural Information Processing Systems
Synthesizing images with user-specified subjects has received growing attention due to its practical applications. Despite recent success in single-subject customization, existing algorithms suffer from high training cost and a low success rate as the number of subjects increases. Toward controllable image synthesis with multiple subjects as constraints, this work studies how to represent a particular subject efficiently and how to compose different subjects appropriately. We find that the text embedding of the subject token already serves as a simple yet effective representation that supports arbitrary combinations without any model tuning. By learning a residual on top of the base embedding, we robustly shift the raw subject to the customized subject under various text conditions. We then propose to employ layout, a very abstract and easy-to-obtain prior, as the spatial guidance for subject arrangement.
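The residual idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`customize_embedding`, `embed_prompt`), the two-dimensional toy vectors, and the hand-picked residual values are all assumptions made for illustration, and the real method would learn each residual by training against a text-to-image model. The sketch only shows why per-subject residuals compose freely: each one is added to its own subject token's base embedding, so several subjects can appear in one prompt without retuning anything else.

```python
from typing import Dict, List

Vector = List[float]


def customize_embedding(base: Vector, residual: Vector) -> Vector:
    """Shift a base token embedding by a per-subject residual (hypothetical helper)."""
    if len(base) != len(residual):
        raise ValueError("base and residual must have the same dimension")
    return [b + r for b, r in zip(base, residual)]


def embed_prompt(
    tokens: List[str],
    base_table: Dict[str, Vector],
    residuals: Dict[str, Vector],
) -> List[Vector]:
    """Embed a prompt, adding a residual only to tokens that name a customized subject."""
    embedded = []
    for token in tokens:
        vector = base_table[token]
        if token in residuals:
            vector = customize_embedding(vector, residuals[token])
        embedded.append(list(vector))  # copy so the base table is never mutated
    return embedded


# Toy base embeddings; a real text encoder would supply these (assumption).
base_table = {
    "a": [0.0, 0.0],
    "dog": [1.0, 2.0],
    "and": [0.5, 0.5],
    "cat": [3.0, -1.0],
}

# One residual per customized subject; hand-picked here, learned in practice (assumption).
residuals = {
    "dog": [0.25, -0.5],
    "cat": [-1.0, 1.0],
}

prompt_embeddings = embed_prompt(["a", "dog", "and", "a", "cat"], base_table, residuals)
```

In this toy setup the embeddings for "dog" and "cat" are shifted while "a" and "and" keep their base values, which mirrors the claim that subjects can be combined without any model tuning. The layout guidance mentioned in the abstract is not modeled here.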
Dec-26-2025, 14:13:47 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (0.39)
- Vision (0.62)