Modular Deep Learning Framework for Assistive Perception: Gaze, Affect, and Speaker Identification
Akshit Pramod Anchan, Jewelith Thomas, Sritama Roy
arXiv.org Artificial Intelligence
Developing comprehensive assistive technologies requires the seamless integration of visual and auditory perception. This research evaluates the feasibility of a modular architecture inspired by core functionalities of perceptive systems like 'Smart Eye.' We propose and benchmark three independent sensing modules: a Convolutional Neural Network (CNN) for eye state detection (drowsiness/attention), a deep CNN for facial expression recognition, and a Long Short-Term Memory (LSTM) network for voice-based speaker identification. Utilizing the Eyes Image, FER2013, and customized audio datasets, our models achieved accuracies of 93.0%, 97.8%, and 96.89%, respectively. This study demonstrates that lightweight, domain-specific models can achieve high fidelity on discrete tasks, establishing a validated foundation for future real-time, multimodal integration in resource-constrained assistive devices.
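The speaker-identification module described above uses an LSTM over audio features. As a rough illustration of the recurrence such a module relies on, the following numpy sketch implements a single LSTM cell step over a sequence of MFCC-like feature frames; all dimensions, names, and the feature choice are illustrative assumptions, not the authors' implementation (in practice the final hidden state would feed a softmax over speaker classes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step over a feature frame x (e.g., an MFCC vector).
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gate order in the stacked matrices: input, forget, candidate, output."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])           # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])       # output gate
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

# Toy run: 10 frames of 13-dim features, hidden size 8 (all hypothetical)
rng = np.random.default_rng(0)
D, H, T = 13, 8, 10
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(T):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
print(h.shape)  # (8,)
```

Because the hidden state is the product of a sigmoid and a tanh, every entry stays in (-1, 1), which is part of what makes the recurrence stable over long audio sequences.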
Nov-26-2025