Can LVLMs and Automatic Metrics Capture Underlying Preferences of Blind and Low-Vision Individuals for Navigational Aid?
Na Min An, Eunki Kim, Wan Ju Kang, Sangryul Kim, Hyunjung Shim, James Thorne
Vision is a primary means by which humans perceive the environment, but Blind and Low-Vision (BLV) people need assistance in understanding their surroundings, especially in unfamiliar environments. The emergence of semantic-based systems as assistive tools for BLV users has motivated many researchers to explore responses from Large Vision-Language Models (LVLMs). However, the preferences of BLV users regarding diverse types and styles of LVLM responses, specifically for navigational aid, have yet to be studied. To fill this gap, we first construct the Eye4B dataset, consisting of 1.1k human-validated, curated outdoor/indoor scenes with 5-10 relevant requests per scene. Then, we conduct an in-depth user study with eight BLV users to evaluate their preferences on six LVLMs from four perspectives: Afraidness, Nonactionability, Sufficiency, and Conciseness. Finally, we introduce the Eye4B benchmark for evaluating the alignment between widely used model-based image-text metrics and our collected BLV preferences. Our work can serve as a guideline for developing BLV-aware LVLMs towards a Barrier-Free AI system.
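The abstract describes evaluating how well automatic image-text metrics align with collected BLV preferences. As a minimal illustrative sketch (not the authors' code), alignment of this kind is commonly quantified with rank correlation between metric scores and human ratings; all scores below are hypothetical placeholders.

```python
# Illustrative sketch: rank-correlation alignment between an automatic
# image-text metric and human (BLV) preference ratings.
from scipy.stats import kendalltau, spearmanr

# Hypothetical per-response scores from a model-based image-text metric
# (e.g., a CLIPScore-style similarity) for a set of LVLM responses.
metric_scores = [0.71, 0.64, 0.58, 0.80, 0.45]

# Hypothetical averaged BLV preference ratings for the same responses
# (e.g., Likert-scale scores collected in a user study).
human_ratings = [4.2, 3.1, 3.4, 4.8, 2.0]

# Higher correlation means the metric ranks responses more like the users do.
tau, tau_p = kendalltau(metric_scores, human_ratings)
rho, rho_p = spearmanr(metric_scores, human_ratings)

print(f"Kendall's tau = {tau:.3f} (p = {tau_p:.3f})")
print(f"Spearman's rho = {rho:.3f} (p = {rho_p:.3f})")
```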
arXiv.org Artificial Intelligence
Feb-15-2025
- Country:
- North America > United States (0.14)
- Genre:
- Questionnaire & Opinion Survey (0.54)
- Research Report (0.81)
- Industry:
- Health & Medicine (0.68)
- Transportation > Ground (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks > Deep Learning (0.94)
    - Natural Language
      - Chatbot (0.69)
      - Large Language Model (1.00)
    - Representation & Reasoning (0.93)
    - Vision (1.00)