"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Zhang, Ziyi; Sun, Zhen; Zhang, Zongmin; Peng, Zifan; Zhao, Yuemeng; Wang, Zichun; Luo, Zeren; Zuo, Ruiting; He, Xinlei
arXiv.org Artificial Intelligence
The visually impaired population faces significant challenges in daily activities. While prior works employ vision language models for assistance, most focus on static content and cannot address real-time perception needs in complex environments. Recent VideoLLMs enable real-time vision and speech interaction, offering promising potential for assistive tasks. In this work, we conduct the first study evaluating their effectiveness in supporting daily life for visually impaired individuals. We first conduct a user survey with visually impaired participants to design the benchmark VisAssistDaily for daily life evaluation. Using VisAssistDaily, we evaluate popular VideoLLMs and find that GPT-4o achieves the highest task success rate. We further conduct a user study, which reveals concerns about hazard perception. To address this, we propose SafeVid, an environment-awareness dataset, and fine-tune VITA-1.5, improving risk recognition accuracy from 25.00% to 76.00%. We hope this work provides valuable insights and inspiration for future research in this field.
Dec-5-2025
- Country:
  - Asia
    - China
      - Guangdong Province > Guangzhou (0.04)
      - Hong Kong (0.04)
    - Japan > Honshū
      - Kansai > Osaka Prefecture > Osaka (0.04)
    - Singapore (0.04)
  - Europe
  - North America
    - Canada > Newfoundland and Labrador
      - Newfoundland > St. John's (0.04)
    - Puerto Rico > Peñuelas
      - Peñuelas (0.04)
    - United States
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Washington > King County
        - Seattle (0.04)
- Genre:
  - Questionnaire & Opinion Survey (1.00)
  - Research Report (1.00)
- Industry:
  - Health & Medicine > Therapeutic Area
    - Ophthalmology/Optometry (0.69)
  - Information Technology > Security & Privacy (1.00)
  - Transportation
    - Ground > Road (0.46)
    - Infrastructure & Services (0.93)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (0.89)
    - Natural Language > Large Language Model (1.00)
    - Representation & Reasoning (1.00)
    - Robots (1.00)
    - Vision (1.00)