Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals
Shruti Singh Baghel, Yash Pratap Singh Rathore, Sushovan Jena, Anurag Pradhan, Amit Shukla, Arnav Bhavsar, Pawan Goyal
Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters on two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks designed specifically for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates descriptions across spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. Additionally, we conduct a systematic evaluation of four prompt design strategies and deploy both models on a smartphone, comparing FP32 and INT8 precision variants to assess real-world performance under the constraints of resource-limited mobile devices.
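To make the setup concrete, the sketch below shows one plausible way to reproduce the core loop the abstract describes: loading a SmolVLM2 variant with the Hugging Face transformers image-text-to-text API, generating a video description from a BLV-oriented prompt that touches the four Multi-Context dimensions, and deriving an INT8 comparison point via PyTorch dynamic quantization. The model IDs follow the publicly released SmolVLM2 checkpoints, but the prompt wording, the input clip, and the quantization route are assumptions for illustration, not the authors' exact pipeline (their smartphone deployment likely uses a dedicated mobile runtime).

```python
# Minimal sketch, assuming the public SmolVLM2 checkpoints and the
# transformers image-text-to-text API; not the paper's exact pipeline.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

# 500M variant; the 2.2B variant is "HuggingFaceTB/SmolVLM2-2.2B-Instruct".
MODEL_ID = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"

processor = AutoProcessor.from_pretrained(MODEL_ID)
# FP32 baseline, matching the higher-precision variant in the comparison.
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, torch_dtype=torch.float32
)

# Illustrative BLV-oriented prompt covering the four Multi-Context
# dimensions; the actual prompt strategies in the paper may differ.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "clip.mp4"},  # hypothetical input clip
        {"type": "text", "text": (
            "Describe this video for a blind or low-vision user: "
            "spatial layout and orientation, social interactions, "
            "key actions and events, and the overall ambience."
        )},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.batch_decode(out, skip_special_tokens=True)[0])

# INT8 comparison point: dynamic quantization of the Linear layers on CPU,
# used here only as a stand-in for the on-device INT8 deployment.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```

Running the same prompt through `model` and `model_int8` gives paired FP32/INT8 descriptions whose quality can then be scored with the paper's BLV-focused evaluation frameworks.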
arXiv.org Artificial Intelligence
Nov-14-2025