See What You Are Told: Visual Attention Sink in Large Multimodal Models
