6294a235c0b80f0a2b224375c546c750-Paper-Conference.pdf

Jun-17-2026, 19:47:49 GMT–Neural Information Processing Systems

Text-to-Image (T2I) diffusion models [11, 41, 38, 43, 8, 7, 25], trained on large-scale datasets, have achieved remarkable success in generating high-quality, semantically aligned images from natural language prompts. While language-based control offers intuitive and flexible guidance, it often lacks the precision needed for fine-grained visual control, such as specific object positions, shapes, or scene layouts. To overcome this, recent works [19, 35, 28, 58, 27, 39, 59, 53] incorporate explicit spatial signals, such as edge maps, depth maps, and segmentation masks, to control diffusion models. To enable spatial control while preserving the generative quality of pre-trained diffusion models, existing methods typically employ control adapters [58, 35, 28] that inject spatial signals into a frozen T2I model. However, these adapters are usually trained independently for each spatial control task, so each new task requires substantial computational resources and extensive labeled data. Alternatively, pre-trained multi-task adapters, reused either directly [39, 53] or with minimal updates [59], struggle to generalize to tasks that differ from their training distribution and often show poor adaptability.

diffusion model, machine learning, natural language, (18 more...)

Country:
- Europe (1.00)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)

Industry:
- Leisure & Entertainment > Sports (0.67)
- Transportation > Ground (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (1.00)
