Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function
Chenyi Zhuang
Neural Information Processing Systems
While previous studies suggest that blended text embeddings lead to improper attribute binding, few have explored this in depth. In this work, we critically examine the limitations of the CLIP text encoder in understanding attributes and investigate how this affects diffusion models.
Oct-10-2025, 04:52:02 GMT
- Country:
- Asia > China
- Jiangsu Province > Nanjing (0.04)
- Europe > Switzerland
- Genre:
- Research Report > Experimental Study (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.68)
- Natural Language > Text Processing (0.46)
- Vision (1.00)