Goto

Collaborating Authors

 blind user


EgoBlind: Towards Egocentric Visual Assistance for the Blind

Neural Information Processing Systems

We present EgoBlind, the first egocentric VideoQA dataset collected from blind individuals to evaluate the assistive capabilities of contemporary multimodal large language models (MLLMs). EgoBlind comprises 1,392 first-person videos from the daily lives of blind and visually impaired individuals. It also features 5,311 questions directly posed or verified by the blind to reflect their in-situation needs for visual assistance. Each question has an average of 3 manually annotated reference answers to reduce subjectiveness. Using EgoBlind, we comprehensively evaluate 16 advanced MLLMs and find that all models struggle. The best performers achieve an accuracy near 60%, which is far behind human performance of 87.4%. To guide future advancements, we identify and summarize major limitations of existing MLLMs in egocentric visual assistance for the blind and explore heuristic solutions for improvement. With these efforts, we hope that EgoBlind will serve as a foundation for developing effective AI assistants to enhance the independence of the blind and visually impaired. Data and code are available at https://github.


EgoBlind: Towards Egocentric Visual Assistance for the Blind People

arXiv.org Artificial Intelligence

We present EgoBlind, the first egocentric VideoQA dataset collected from blind individuals to evaluate the assistive capabilities of contemporary multimodal large language models (MLLMs). EgoBlind comprises 1,210 videos that record the daily lives of real blind users from a first-person perspective. It also features 4,927 questions directly posed or generated and verified by blind individuals to reflect their needs for visual assistance under various scenarios. We provide each question with an average of 3 reference answers to alleviate subjective evaluation. Using EgoBlind, we comprehensively evaluate 15 leading MLLMs and find that all models struggle, with the best performers achieving accuracy around 56\%, far behind human performance of 87.4\%. To guide future advancements, we identify and summarize major limitations of existing MLLMs in egocentric visual assistance for the blind and provide heuristic suggestions for improvement. With these efforts, we hope EgoBlind can serve as a valuable foundation for developing more effective AI assistants to enhance the independence of the blind individuals' lives.


Understanding How Blind Users Handle Object Recognition Errors: Strategies and Challenges

arXiv.org Artificial Intelligence

Object recognition technologies hold the potential to support blind and low-vision people in navigating the world around them. However, the gap between benchmark performances and practical usability remains a significant challenge. This paper presents a study aimed at understanding blind users' interaction with object recognition systems for identifying and avoiding errors. Leveraging a pre-existing object recognition system, URCam, fine-tuned for our experiment, we conducted a user study involving 12 blind and low-vision participants. Through in-depth interviews and hands-on error identification tasks, we gained insights into users' experiences, challenges, and strategies for identifying errors in camera-based assistive technologies and object recognition systems. During interviews, many participants preferred independent error review, while expressing apprehension toward misrecognitions. In the error identification task, participants varied viewpoints, backgrounds, and object sizes in their images to avoid and overcome errors. Even after repeating the task, participants identified only half of the errors, and the proportion of errors identified did not significantly differ from their first attempts. Based on these insights, we offer implications for designing accessible interfaces tailored to the needs of blind and low-vision users in identifying object recognition errors.


Toucha11y: Making Inaccessible Public Touchscreens Accessible

arXiv.org Artificial Intelligence

Despite their growing popularity, many public kiosks with touchscreens are inaccessible to blind people. Toucha11y is a working prototype that allows blind users to use existing inaccessible touchscreen kiosks independently and with little effort. Toucha11y consists of a mechanical bot that can be instrumented to an arbitrary touchscreen kiosk by a blind user and a companion app on their smartphone. The bot, once attached to a touchscreen, will recognize its content, retrieve the corresponding information from a database, and render it on the user's smartphone. As a result, a blind person can use the smartphone's built-in accessibility features to access content and make selections. The mechanical bot will detect and activate the corresponding touchscreen interface. We present the system design of Toucha11y along with a series of technical evaluations. Through a user study, we found out that Toucha11y could help blind users operate inaccessible touchscreen devices.


AI and Accessibility

Communications of the ACM

According to the World Health Organization, more than one billion people worldwide have disabilities. The field of disability studies defines disability through a social lens; people are disabled to the extent that society creates accessibility barriers. AI technologies offer the possibility of removing many accessibility barriers; for example, computer vision might help people who are blind better sense the visual world, speech recognition and translation technologies might offer real-time captioning for people who are hard of hearing, and new robotic systems might augment the capabilities of people with limited mobility. Considering the needs of users with disabilities can help technologists identify high-impact challenges whose solutions can advance the state of AI for all users; however, ethical challenges such as inclusivity, bias, privacy, error, expectation setting, simulated data, and social acceptability must be considered. The inclusivity of AI systems refers to whether they are effective for diverse user populations.


Google's Lookout app says what it sees for blind users in the US

Engadget

Google's Lookout is now finally available for download, though it's only compatible with Pixel devices in the US set to English at the moment. The application was first announced at Google's annual I/O Conference in 2018 and was designed to help the blind and visually impaired navigate their surroundings. It comes with three modes: Explore, Shopping and Quick Read. Explore, its default mode, gives users audio cues about their environment, telling them if there's a chair or a cute dog blocking the way, for instance. Shopping can read barcodes and currency, giving users a way to, say, make sure they're truly holding a $5 bill.


Blind users can now explore photos by touch with Microsoft's Seeing AI

#artificialintelligence

Microsoft's Seeing AI is an app that lets blind and limited-vision folks convert visual data into audio feedback, and it just got a useful new feature. Users can now use touch to explore the objects and people in photos. It's powered by machine learning, of course, specifically object and scene recognition. All you need to do is take a photo or open one up in the viewer and tap anywhere on it. "This new feature enables users to tap their finger to an image on a touch-screen to hear a description of objects within an image and the spatial relationship between them," wrote Seeing AI lead Saqib Shaikh in a blog post. "The app can even describe the physical appearance of people and predict their mood."


Personalized Dynamics Models for Adaptive Assistive Navigation Interfaces

arXiv.org Machine Learning

We explore the role of personalization for assistive navigational systems (e.g., service robot, wearable system or smartphone app) that guide visually impaired users through speech, sound and haptic-based instructional guidance. Based on our analysis of real-world users, we show that the dynamics of blind users cannot be accounted for by a single universal model but instead must be learned on an individual basis. To learn personalized instructional interfaces, we propose PING (Personalized INstruction Generation agent), a model-based reinforcement learning framework which aims to quickly adapt its state transition dynamics model to match the reactions of the user using a novel end-to-end learned weighted majority-based regression algorithm. In our experiments, we show that PING learns dynamics models significantly faster compared to baseline transfer learning approaches on real-world data. We find that through better reasoning over personal mobility nuances, interaction with surrounding obstacles, and the current navigation task, PING is able to improve the performance of instructional assistive navigation at the most crucial junctions such as turns or veering paths. To enable sufficient planning time over user responses, we emphasize prediction of human motion for long horizons. Specifically, the learned dynamics models are shown to consistently improve long-term position prediction by over 1 meter on average (nearly the width of a hallway) compared to baseline approaches even when considering a prediction horizon of 20 seconds into the future.


Microsoft's AI will describe images in Word and PowerPoint for blind users

#artificialintelligence

Artificial intelligence may be making small and steady advances in general-purpose situations like digital assistants. But it's the more subtle AI accessibility features that have a more substantial impact today, especially for users with disabilities. For instance, an upcoming feature for Office apps like Microsoft Word and PowerPoint will automatically suggest image and slide deck captions, called alt-text, using AI algorithms. That way, when those files are presented to blind users, computer tools designed to translate the information onscreen into audio have text descriptions to work with. Microsoft is accomplishing this feat with its Computer Vision Cognitive Service, which uses neural networks trained with deep learning techniques to better understand and describe the contents of images.


Is It Harmful When Advisors Only Pretend to Be Honest?

AAAI Conferences

In trust systems, unfair rating attacks — where advisors provide ratings dishonestly — influence the accuracy of trust evaluation. A secure trust system should function properly under all possible unfair rating attacks; including dynamic attacks. In the literature, camouflage attacks are the most studied dynamic attacks. But an open question is whether more harmful dynamic attacks exist. We propose random processes to model and measure dynamic attacks. The harm of an attack is influenced by a user's ability to learn from the past. We consider three types of users: blind users, aware users, and general users. We found for all the three types, camouflage attacks are far from the most harmful. We identified the most harmful attacks, under which we found the ratings may still be useful to users.