Neural Information Processing Systems
Each row presents an example with overlapping instances, and image captions are shown below. More examples and more detailed failure descriptions can be found in Appendix C.

in controllable image generation [Li et al., 2023b, Zhang et al., 2023]. A recent line of work proposes generating images conditioned on layouts, commonly referred to as Layout-to-Image (L2I) generation, which allows users to directly specify spatial locations [Xie et al., 2023b, Wang et al., 2024b, Li et al., 2023b] and object counts [Binyamin et al., 2024, Yang et al., 2023] in the generated outputs. While existing frameworks [Xie et al., 2023b, Wang et al., 2024b, Li et al., 2023b] achieve satisfactory spatial and numerical control over image generation, they fail to generate distinct, coherent objects when multiple bounding boxes overlap in the layout and their associated categories are semantically similar. As illustrated in Figure 2, such scenarios lead to artifacts including object blending, spatial ambiguity, and visual distortion.
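To make the overlap condition concrete, the following minimal sketch flags pairs of layout boxes that overlap, using intersection-over-union (IoU). It is an illustration only and not the paper's method: the layout representation (a list of `(category, (x0, y0, x1, y1))` tuples in normalized coordinates), the function names, and the `min_iou` threshold are all assumptions introduced here. It also does not model semantic similarity between categories.

```python
from itertools import combinations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the intersection are clamped at zero for disjoint boxes.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlapping_pairs(layout, min_iou=0.1):
    """Return (i, j, iou) for each pair of instances whose IoU is at least min_iou.

    `min_iou` is a hypothetical threshold chosen for illustration.
    """
    pairs = []
    for (i, (_, box_i)), (j, (_, box_j)) in combinations(enumerate(layout), 2):
        score = iou(box_i, box_j)
        if score >= min_iou:
            pairs.append((i, j, score))
    return pairs


# Hypothetical layout: two overlapping animals and one isolated object.
layout = [
    ("cat", (0.10, 0.20, 0.60, 0.90)),
    ("dog", (0.40, 0.25, 0.95, 0.90)),
    ("ball", (0.00, 0.00, 0.08, 0.08)),
]
pairs = overlapping_pairs(layout)
```

Here only the `cat` and `dog` boxes are flagged. A layout-level check like this identifies where overlap occurs; whether the flagged categories are semantically similar would need a separate measure.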
Jun-16-2026, 03:32:23 GMT
- Genre:
  - Research Report > Experimental Study (1.00)
- Industry:
  - Information Technology (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Representation & Reasoning (0.93)