Voila-A: Aligning Vision-Language Models with User's Gaze Attention
