Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Neural Information Processing Systems
Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner.
Oct-9-2025, 23:16:17 GMT
- Country:
- Asia > China
- Hubei Province > Wuhan (0.04)
- Europe > Switzerland
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)
- Technology: