Pupil Diameter


SensHRPS: Sensing Comfortable Human-Robot Proxemics and Personal Space With Eye-Tracking

Kushina, Nadezhda, Watanabe, Ko, Kannan, Aarthi, Ashok, Ashita, Dengel, Andreas, Berns, Karsten

arXiv.org Artificial Intelligence

Social robots must adjust to human proxemic norms to ensure user comfort and engagement. While prior research demonstrates that eye-tracking features reliably estimate comfort in human-human interactions, their applicability to interactions with humanoid robots remains unexplored. In this study, we investigate user comfort with the robot "Ameca" across four experimentally controlled distances (0.5 m to 2.0 m) using mobile eye-tracking and subjective reporting (N=19). We evaluate multiple machine learning and deep learning models to estimate comfort based on gaze features. Contrary to previous human-human studies where Transformer models excelled, a Decision Tree classifier achieved the highest performance (F1-score = 0.73), with minimum pupil diameter identified as the most critical predictor. These findings suggest that physiological comfort thresholds in human-robot interaction differ from human-human dynamics and can be effectively modeled using interpretable logic.
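A minimal sketch of the kind of classifier the paper describes: a shallow decision tree trained on per-trial gaze features, with feature importances exposing which predictor dominates. The feature names, data shapes, and scoring below are illustrative assumptions, not the authors' pipeline.

    # Hypothetical sketch: comfort classification from aggregated gaze features.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # X: one row per trial, e.g. [min_pupil, mean_pupil, fixation_count,
    # mean_fixation_duration]; y: comfortable (1) / uncomfortable (0).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(76, 4))          # placeholder: 19 participants x 4 distances
    y = rng.integers(0, 2, size=76)       # placeholder comfort labels

    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    print(cross_val_score(clf, X, y, scoring="f1", cv=5).mean())

    # Feature importances reveal which gaze feature drives the splits,
    # mirroring the paper's finding that minimum pupil diameter dominates.
    clf.fit(X, y)
    print(dict(zip(["min_pupil", "mean_pupil", "fix_count", "fix_dur"],
                   clf.feature_importances_)))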


Eyes on Target: Gaze-Aware Object Detection in Egocentric Video

Lall, Vishakha, Liu, Yisi

arXiv.org Artificial Intelligence

Human gaze offers rich supervisory signals for understanding visual attention in complex visual environments. In this paper, we propose Eyes on Target, a novel depth-aware and gaze-guided object detection framework designed for egocentric videos. Our approach injects gaze-derived features into the attention mechanism of a Vision Transformer (ViT), effectively biasing spatial feature selection toward human-attended regions. Unlike traditional object detectors that treat all regions equally, our method emphasises viewer-prioritised areas to enhance object detection. We validate our method on an egocentric simulator dataset where human visual attention is critical for task assessment, illustrating its potential in evaluating human performance in simulation scenarios. We evaluate the effectiveness of our gaze-integrated model through extensive experiments and ablation studies, demonstrating consistent gains in detection accuracy over gaze-agnostic baselines on both the custom simulator dataset and public benchmarks, including Ego4D Ego-Motion and Ego-CH-Gaze datasets. To interpret model behaviour, we also introduce a gaze-aware attention head importance metric, revealing how gaze cues modulate transformer attention dynamics.
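The core mechanism, biasing ViT attention toward gaze-attended regions, can be sketched roughly as an additive bias on the attention logits. Where exactly the paper injects gaze features may differ; the function below is an assumed, simplified variant.

    # Assumed mechanism: add a gaze-derived bias to self-attention logits so
    # that human-attended patches attract more attention weight.
    import torch
    import torch.nn.functional as F

    def gaze_biased_attention(q, k, v, gaze_heat, alpha=1.0):
        """q, k, v: (B, heads, N, d); gaze_heat: (B, N) patch-level gaze density."""
        d = q.shape[-1]
        logits = q @ k.transpose(-2, -1) / d**0.5      # (B, heads, N, N)
        # Bias the key dimension: gaze-dense patches are easier to attend to.
        bias = alpha * gaze_heat[:, None, None, :]     # broadcast over heads/queries
        attn = F.softmax(logits + bias, dim=-1)
        return attn @ v

    B, H, N, D = 2, 8, 196, 64
    q = k = v = torch.randn(B, H, N, D)
    gaze = torch.rand(B, N)                            # e.g. from fixation density
    print(gaze_biased_attention(q, k, v, gaze).shape)  # torch.Size([2, 8, 196, 64])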


Window-Based Feature Engineering for Cognitive Workload Detection

Hallam, Andrew, Gayathri, R G, Lee, Glory, Sajjanhar, Atul

arXiv.org Artificial Intelligence

Cognitive workload is a topic of increasing interest across fields such as health, psychology, and defense. In this research, we focus on classifying cognitive workload using the COLET dataset, employing a window-based approach for feature generation and machine/deep learning techniques for classification. We apply window-based temporal partitioning to enhance features used in existing research, followed by machine learning and deep learning models to classify different levels of cognitive workload. The results demonstrate that deep learning models, particularly tabular architectures, outperformed traditional machine learning methods in precision, F1-score, and overall classification accuracy. This study highlights the effectiveness of window-based temporal feature extraction and the potential of deep learning techniques for real-time cognitive workload assessment in complex and dynamic tasks.
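Window-based temporal partitioning of this kind typically slides a fixed-length window over each eye-tracking signal and computes summary statistics per window. A hedged sketch follows; the window length, overlap, and statistics are assumptions rather than the COLET configuration.

    # Illustrative sliding-window feature extraction over a 1-D signal.
    import numpy as np
    import pandas as pd

    def window_features(signal, fs=100, win_s=5.0, overlap=0.5):
        """Slide a window over the signal and emit summary statistics per window."""
        size = int(fs * win_s)
        step = int(size * (1 - overlap))
        rows = []
        for start in range(0, len(signal) - size + 1, step):
            w = signal[start:start + size]
            rows.append({"mean": w.mean(), "std": w.std(),
                         "min": w.min(), "max": w.max(),
                         "range": w.max() - w.min()})
        return pd.DataFrame(rows)

    pupil = np.random.randn(6000)        # placeholder: 60 s pupil trace at 100 Hz
    print(window_features(pupil).head()) # one feature row per window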


Learning Human Perception Dynamics for Informative Robot Communication

Chen, Shenghui, Zhao, Ruihan, Chinchali, Sandeep, Topcu, Ufuk

arXiv.org Artificial Intelligence

Human-robot cooperative navigation is challenging in environments with incomplete information. We introduce CoNav-Maze, a simulated robotics environment where a robot navigates using local perception while a human operator provides guidance based on an inaccurate map. The robot can share its camera views to improve the operator's understanding of the environment. To enable efficient human-robot cooperation, we propose Information Gain Monte Carlo Tree Search (IG-MCTS), an online planning algorithm that balances autonomous movement and informative communication. Central to IG-MCTS is a neural human perception dynamics model that estimates how humans distill information from robot communications. We collect a dataset through a crowdsourced mapping task in CoNav-Maze and train this model using a fully convolutional architecture with data augmentation. User studies show that IG-MCTS outperforms teleoperation and instruction-following baselines, achieving comparable task performance with significantly less communication and lower human cognitive load, as evidenced by eye-tracking metrics.
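The intuition behind IG-MCTS, trading navigation value against the information a communication conveys, can be illustrated with a UCT-style score that adds an information-gain bonus. This is a conceptual sketch only; the weighting scheme and the learned perception dynamics model are placeholders, not the paper's algorithm.

    # Conceptual sketch: score actions by task value plus an info-gain bonus.
    import math
    from dataclasses import dataclass

    def entropy(p):
        return -sum(x * math.log(x + 1e-12) for x in p)

    def info_gain(belief_before, belief_after):
        """Entropy reduction of a (hypothetical) human belief over map cells."""
        return entropy(belief_before) - entropy(belief_after)

    @dataclass
    class Node:
        value: float         # running mean task reward
        ig: float            # mean info gain of the communication this action triggers
        visits: int
        parent_visits: int

    def uct_score(node, c=1.4, lam=0.5):
        # lam trades off navigation progress against informative communication.
        exploit = node.value + lam * node.ig
        explore = c * math.sqrt(math.log(node.parent_visits) / node.visits)
        return exploit + explore

    # Example: sharing a camera view that sharpens the operator's belief.
    before, after = [0.25] * 4, [0.7, 0.1, 0.1, 0.1]
    n = Node(value=0.3, ig=info_gain(before, after), visits=4, parent_visits=20)
    print(round(uct_score(n), 3))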


Real-Time Adaptive Industrial Robots: Improving Safety And Comfort In Human-Robot Collaboration

Hostettler, Damian, Mayer, Simon, Albert, Jan Liam, Jenss, Kay Erik, Hildebrand, Christian

arXiv.org Artificial Intelligence

Industrial robots are becoming increasingly prevalent, resulting in a growing need for intuitive, comfortable human-robot collaboration. We present a user-aware robotic system that adapts to operator behavior in real time while non-intrusively monitoring physiological signals to create a more responsive and empathetic environment. Our prototype dynamically adjusts robot speed and movement patterns while measuring operator pupil dilation and proximity. Our user study compares this adaptive system to a non-adaptive counterpart and demonstrates that the adaptive system significantly reduces both perceived and physiologically measured cognitive load while enhancing usability. Participants reported increased feelings of comfort, safety, and trust, and a stronger sense of collaboration when working with the adaptive robot. This highlights the potential of integrating real-time physiological data into human-robot interaction paradigms. This novel approach creates more intuitive and collaborative industrial environments where robots effectively 'read' and respond to human cognitive states, and we release all data and code for future use.
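One plausible shape for such an adaptation rule, slowing the robot as pupil dilation (a cognitive-load proxy) rises or as the operator approaches, is sketched below. The thresholds, baseline handling, and functional form are illustrative assumptions, not the study's implementation.

    # Hedged sketch of a real-time speed adaptation rule.
    def adapt_speed(base_speed, pupil_mm, pupil_baseline_mm, distance_m,
                    load_gain=0.5, min_dist_m=0.8):
        """Return an adapted speed; scale is clamped to [0.2, 1.0]."""
        # Relative pupil dilation over baseline as a rough load proxy.
        load = max(0.0, (pupil_mm - pupil_baseline_mm) / pupil_baseline_mm)
        scale = 1.0 - load_gain * load       # slow down under high load
        if distance_m < min_dist_m:          # slow down further when close
            scale *= distance_m / min_dist_m
        return base_speed * min(1.0, max(0.2, scale))

    # Dilated pupil and close proximity both reduce the commanded speed.
    print(adapt_speed(1.0, pupil_mm=4.8, pupil_baseline_mm=4.0, distance_m=0.5))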


EyeDentify: A Dataset for Pupil Diameter Estimation based on Webcam Images

Shah, Vijul, Watanabe, Ko, Moser, Brian B., Dengel, Andreas

arXiv.org Artificial Intelligence

In this work, we introduce EyeDentify, a dataset specifically designed for pupil diameter estimation from webcam images. EyeDentify addresses the lack of available datasets for pupil diameter estimation, a crucial domain for understanding physiological and psychological states that has traditionally been dominated by highly specialized sensor systems such as Tobii. Unlike these costly, specialized systems, webcams are commonly available in practice, yet deep learning models that can estimate pupil diameters from standard webcam data are scarce. By providing cropped eye images alongside corresponding pupil diameter measurements, EyeDentify enables the development and refinement of models designed for less-equipped environments, democratizing pupil diameter estimation by making it more accessible and broadly applicable. This in turn contributes to multiple domains of understanding human activity and supporting healthcare. Our dataset is available at https://vijulshah.github.io/eyedentify/.
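A minimal sketch of the kind of model EyeDentify is meant to enable: a small CNN regressing pupil diameter from a cropped webcam eye image. The architecture and input size are assumptions, not a published baseline.

    # Hypothetical pupil-diameter regressor over cropped eye images.
    import torch
    import torch.nn as nn

    class PupilRegressor(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 1),            # scalar diameter in mm
            )

        def forward(self, x):
            return self.net(x).squeeze(-1)

    model = PupilRegressor()
    eye_crops = torch.randn(8, 3, 64, 64)    # batch of webcam eye crops
    print(model(eye_crops).shape)            # torch.Size([8])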


Thelxinoë: Recognizing Human Emotions Using Pupillometry and Machine Learning

Barker, Darlene, Levkowitz, Haim

arXiv.org Artificial Intelligence

In this study, we present a method for emotion recognition in Virtual Reality (VR) using pupillometry. We analyze pupil diameter responses to both visual and auditory stimuli via a VR headset and focus on extracting key features in the time domain, frequency domain, and time-frequency domain from VR-generated data. Our approach utilizes feature selection to identify the most impactful features using Maximum Relevance Minimum Redundancy (mRMR). By applying a Gradient Boosting model, an ensemble learning technique using stacked decision trees, we achieve an accuracy of 98.8% with feature engineering, compared to 84.9% without it. This research contributes significantly to the Thelxinoë framework, aiming to enhance VR experiences by integrating multiple sensor data for realistic and emotionally resonant touch interactions. In a poetic sense, the eyes have long been regarded as the "window into the soul," offering a glimpse into the depths of human emotions and experiences [1]. In the realm of modern technology, this poetic vision transforms into a scientific reality, particularly in VR. The pupils serve as gateways not just to the brain but to the autonomic nervous system, as they subtly dilate and contract in response to our emotions [1].
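The pipeline shape, mRMR-style feature selection followed by gradient boosting, might look roughly like the sketch below, which uses mutual information for relevance and mean absolute correlation for redundancy as a simplified stand-in for the paper's exact mRMR formulation.

    # Greedy mRMR-style selection (simplified), then gradient boosting.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.model_selection import cross_val_score

    def mrmr_select(X, y, k):
        relevance = mutual_info_classif(X, y, random_state=0)
        selected = [int(np.argmax(relevance))]
        while len(selected) < k:
            scores = []
            for j in range(X.shape[1]):
                if j in selected:
                    scores.append(-np.inf)
                    continue
                # Penalize features correlated with those already selected.
                redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                      for s in selected])
                scores.append(relevance[j] - redundancy)
            selected.append(int(np.argmax(scores)))
        return selected

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))           # placeholder pupillometry features
    y = rng.integers(0, 2, size=200)         # placeholder emotion labels
    cols = mrmr_select(X, y, k=5)
    clf = GradientBoostingClassifier(random_state=0)
    print(cross_val_score(clf, X[:, cols], y, cv=5).mean())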


A Simulation Study of Passing Drivers' Responses to the Autonomous Truck-Mounted Attenuator System in Road Maintenance

Li, Yu, Wang, Bill, Li, William, Qin, Ruwen

arXiv.org Artificial Intelligence

The Autonomous Truck-Mounted Attenuator (ATMA) system is a lead-follower vehicle system based on autonomous driving and connected vehicle technologies. The lead truck performs maintenance tasks on the road, and the unmanned follower truck alerts passing vehicles about the moving work zone and protects the workers and equipment. While transportation maintenance and operations agencies have recently been testing the ATMA, a simulator-based testing capability is a useful supplement, especially when human subjects are involved. This paper aims to discover how passing drivers perceive, understand, and react to the ATMA system in road maintenance. Using the driving simulator developed for this study, we performed a simulation experiment in which a screen-based eye tracker collected sixteen subjects' gaze points and pupil diameters. Data analysis evidenced changes in subjects' visual attention patterns while passing the ATMA. On average, the ATMA began to attract subjects' attention from 500 ft behind the follower truck. Most subjects (87.50%) understood the follower truck's protective purpose, and many (60%) reasoned out the association between the two trucks. Nevertheless, nearly half of the participants (43.75%) did not recognize that the ATMA is a connected autonomous vehicle system. While all subjects safely changed lanes and passed the slow-moving ATMA, their inadequate understanding of the system is a potential risk, such as cutting into the ATMA. Results imply that transportation maintenance and operations agencies should consider this when establishing deployment guidance.


The eyes and hearts of UAV pilots: observations of physiological responses in real-life scenarios

Duval, Alexandre, Paas, Anita, Abdalwhab, Abdalwhab, St-Onge, David

arXiv.org Artificial Intelligence

The drone industry is diversifying and the number of pilots is increasing rapidly. In this context, flight schools need adapted tools to train pilots, most importantly with regard to the pilots' own awareness of their physiological and cognitive limits. In civil and military aviation, pilots can train on realistic simulators to tune their reactions and reflexes, but also to gather data on their piloting behavior and physiological states, which helps them improve their performance. Unlike cockpit scenarios, drone teleoperation is conducted outdoors in the field, so desktop simulation training offers only limited potential. This work aims to provide a solution to gather pilots' behavior out in the field and help them increase their performance. We combined advanced object detection from a frontal camera with gaze and heart-rate variability measurements, then observed pilots and analyzed their behavior over three flight challenges. We believe this tool can support pilots both in their training and in their regular flight tasks. A demonstration video is available at https://www.youtube.com/watch?v=eePhjd2qNiI
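One simple way to fuse frontal-camera detections with gaze, reporting which detected object the pilot is currently looking at, is sketched below. The box format and detector output are assumptions, not the authors' pipeline.

    # Illustrative gaze-to-detection association.
    def attended_object(gaze_xy, detections):
        """detections: list of (label, (x1, y1, x2, y2), confidence)."""
        gx, gy = gaze_xy
        hits = [(label, conf) for label, (x1, y1, x2, y2), conf in detections
                if x1 <= gx <= x2 and y1 <= gy <= y2]
        # If boxes overlap, prefer the most confident detection.
        return max(hits, key=lambda h: h[1]) if hits else None

    dets = [("drone", (300, 120, 420, 220), 0.91), ("tree", (0, 0, 200, 480), 0.88)]
    print(attended_object((350, 180), dets))  # ('drone', 0.91)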


Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning

Jin, Kaizhe, Rubio-Solis, Adrian, Naik, Ravi, Onyeogulu, Tochukwu, Islam, Amirul, Khan, Salman, Teeti, Izzeddin, Kinross, James, Leff, Daniel R, Cuzzolin, Fabio, Mylonas, George

arXiv.org Artificial Intelligence

The operating room (OR) is a dynamic and complex environment consisting of a multidisciplinary team working together in a high-stakes setting to provide safe and efficient patient care. Additionally, surgeons are frequently exposed to multiple psycho-organisational stressors that may cause negative repercussions on their immediate technical performance and long-term health. Many factors can therefore contribute to increased Cognitive Workload (CWL), such as temporal pressures, unfamiliar anatomy, or distractions in the OR. In this paper, a cascade of two machine learning approaches is suggested for the multimodal recognition of CWL in four different surgical task conditions. First, a model based on the concept of transfer learning is used to identify whether a surgeon is experiencing any CWL. Second, a Convolutional Neural Network (CNN) uses this information to identify different degrees of CWL associated with each surgical task. The suggested multimodal approach considers adjacent signals from electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS), and eye pupil diameter. Concatenating the signals allows the model to capture complex correlations in time (temporal) and across channel locations (spatial). Data collection was performed with the Multi-sensing AI Environment for Surgical Task & Role Optimisation (MAESTRO) platform developed at the Hamlyn Centre, Imperial College London. To benchmark the proposed methodology, a number of state-of-the-art machine learning techniques were implemented for comparison. The tests show that the proposed model achieves a precision of 93%.
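The multimodal concatenation idea, stacking time-aligned EEG, fNIRS, and pupil channels so a 1-D CNN can learn temporal and cross-channel correlations, might be sketched as follows. Channel counts and the architecture are assumptions, and the paper's two-stage cascade is not reproduced here.

    # Sketch: concatenate time-aligned modalities along the channel axis.
    import torch
    import torch.nn as nn

    eeg = torch.randn(8, 32, 1000)    # batch, EEG channels, time steps
    fnirs = torch.randn(8, 16, 1000)  # batch, fNIRS channels, time steps
    pupil = torch.randn(8, 1, 1000)   # batch, pupil diameter, time steps

    x = torch.cat([eeg, fnirs, pupil], dim=1)      # (8, 49, 1000)

    cnn = nn.Sequential(
        nn.Conv1d(49, 64, kernel_size=7, padding=3), nn.ReLU(),
        nn.MaxPool1d(4),
        nn.Conv1d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(64, 4),                           # four surgical task conditions
    )
    print(cnn(x).shape)                             # torch.Size([8, 4])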