Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function

Chenyi Zhuang

Oct-10-2025, 04:52:02 GMT–Neural Information Processing Systems

While previous studies suggest that blended text embeddings lead to improper attribute binding, few have explored this issue in depth. In this work, we critically examine the limitations of the CLIP text encoder in understanding attributes and investigate how these limitations affect diffusion models.

Keywords: diffusion model, magnet, padding

Country:
- Europe > Switzerland
  - Zürich > Zürich (0.14)
- Asia > China
  - Jiangsu Province > Nanjing (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (0.68)
  - Natural Language > Text Processing (0.46)
