Voila-A: Aligning Vision-Language Models with User's Gaze Attention, Lei Ji

Neural Information Processing Systems 

In recent years, integrating vision and language understanding has driven significant advances in artificial intelligence, particularly through Vision-Language Models (VLMs).