Can LVLMs and Automatic Metrics Capture Underlying Preferences of Blind and Low-Vision Individuals for Navigational Aid?
Na Min An, Eunki Kim, Wan Ju Kang, Sangryul Kim, Hyunjung Shim, James Thorne
Vision is a primary means by which humans perceive the environment, but Blind and Low-Vision (BLV) people need assistance in understanding their surroundings, especially in unfamiliar environments. The emergence of semantic-based systems as assistive tools for BLV users has motivated many researchers to explore responses from Large Vision-Language Models (LVLMs). However, the preferences of BLV users regarding diverse types and styles of LVLM responses, specifically for navigational aid, have yet to be studied. To fill this gap, we first construct the Eye4B dataset, consisting of 1.1k human-validated, curated outdoor/indoor scenes with 5-10 relevant requests per scene. Then, we conduct an in-depth user study with eight BLV users to evaluate their preferences on six LVLMs from four perspectives: Afraidness, Nonactionability, Sufficiency, and Conciseness. Finally, we introduce the Eye4B benchmark for evaluating the alignment between widely used model-based image-text metrics and our collected BLV preferences. Our work can serve as a guideline for developing BLV-aware LVLMs towards a Barrier-Free AI system.
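The abstract describes evaluating how well automatic image-text metrics align with collected BLV preferences. As a minimal illustrative sketch (not the authors' code), alignment of this kind is commonly quantified with rank correlation between metric scores and human ratings; all scores below are hypothetical placeholders.

```python
# Illustrative sketch: rank-correlation alignment between an automatic
# image-text metric and human (BLV) preference ratings.
from scipy.stats import kendalltau, spearmanr

# Hypothetical per-response scores from a model-based image-text metric
# (e.g., a CLIPScore-style similarity) for a set of LVLM responses.
metric_scores = [0.71, 0.64, 0.58, 0.80, 0.45]

# Hypothetical averaged BLV preference ratings for the same responses
# (e.g., Likert-scale scores collected in a user study).
human_ratings = [4.2, 3.1, 3.4, 4.8, 2.0]

# Higher correlation means the metric ranks responses more like the users do.
tau, tau_p = kendalltau(metric_scores, human_ratings)
rho, rho_p = spearmanr(metric_scores, human_ratings)

print(f"Kendall's tau = {tau:.3f} (p = {tau_p:.3f})")
print(f"Spearman's rho = {rho:.3f} (p = {rho_p:.3f})")
```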
arXiv.org Artificial Intelligence
Feb-15-2025
- Country:
- North America > United States (0.14)
- Genre:
- Questionnaire & Opinion Survey (0.54)
- Research Report (0.81)
- Industry:
- Health & Medicine (0.68)
- Transportation > Ground (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks > Deep Learning (0.94)
    - Natural Language
      - Chatbot (0.69)
      - Large Language Model (1.00)
    - Representation & Reasoning (0.93)
    - Vision (1.00)