Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Neural Information Processing Systems
Existing image-text modality alignment in Vision Language Models (VLMs) treats each text token equally in an autoregressive manner.
Feb-11-2026, 03:13:19 GMT
- Country:
- Asia > China
- Hubei Province > Wuhan (0.04)
- Europe > Switzerland
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)
- Technology: