Tian, Leimin
Collaborative Object Handover in a Robot Crafting Assistant
Tian, Leimin, Xu, Shiyu, He, Kerry, Love, Rachel, Cosgun, Akansel, Kulic, Dana
Robots are increasingly working alongside people, delivering food to patrons in restaurants or helping workers on assembly lines. These scenarios often involve object handovers between the person and the robot. To achieve safe and efficient human-robot collaboration (HRC), it is important to incorporate human context into a robot's handover strategies. Therefore, in this work, we develop a collaborative handover model trained on human teleoperation data collected in a naturalistic crafting task. To evaluate the performance of this model, we conduct cross-validation experiments on the training dataset as well as a user study in the same HRC crafting task. The handover episodes and user perceptions of the autonomous handover policy were compared with those of the human-teleoperated handovers. While the cross-validation experiment and user study indicate that the autonomous policy successfully achieved collaborative handovers, the comparison with human teleoperation revealed avenues for further improvement.
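As an illustration of the kind of model described above, the sketch below learns a handover policy by behaviour cloning on teleoperated state-action pairs. The observation and action dimensions, network size, and training loop are assumptions chosen for illustration, not the authors' implementation.

```python
# Minimal behaviour-cloning sketch for a handover policy (illustrative only;
# feature names, dimensions, and network size are assumptions, not the paper's).
import torch
import torch.nn as nn

class HandoverPolicy(nn.Module):
    """Maps human/task context features to a handover action."""
    def __init__(self, obs_dim=16, act_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def train(policy, demos, epochs=50, lr=1e-3):
    """demos: iterable of (obs, action) tensor pairs from teleoperated handovers."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, act in demos:
            opt.zero_grad()
            loss = loss_fn(policy(obs), act)  # imitate the teleoperator's action
            loss.backward()
            opt.step()
    return policy
```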
A Framework for Dynamic Situational Awareness in Human Robot Teams: An Interview Study
Senaratne, Hashini, Tian, Leimin, Sikka, Pavan, Williams, Jason, Howard, David, Kulić, Dana, Paris, Cécile
In human-robot teams, human situational awareness is the operator's conscious knowledge of the team's states, actions, plans, and environment. Appropriate human situational awareness is critical to successful human-robot collaboration. In human-robot teaming, it is often assumed that the best and required level of situational awareness is knowing everything at all times. This view is problematic, because what a human needs to know for optimal team performance varies with the dynamic environmental conditions, the task context, and the roles and capabilities of team members. We explore this topic by interviewing 16 participants with active and repeated experience in diverse human-robot teaming applications. Based on analysis of these interviews, we derive a framework explaining the dynamic nature of required situational awareness in human-robot teaming. In addition, we identify a range of factors affecting the dynamic nature of required and actual levels of situational awareness (i.e., dynamic situational awareness), types of situational awareness inefficiencies resulting from gaps between actual and required situational awareness, and their main consequences. We also reveal various strategies, initiated by humans and robots, that assist in maintaining the required situational awareness. Our findings inform the implementation of accurate estimates of dynamic situational awareness and the design of user-adaptive human-robot interfaces. Therefore, this work contributes to the future design of more collaborative and effective human-robot teams.
'What did the Robot do in my Absence?' Video Foundation Models to Enhance Intermittent Supervision
Katuwandeniya, Kavindie, Tian, Leimin, Kulić, Dana
This paper investigates the application of Video Foundation Models (ViFMs) for generating robot data summaries to enhance intermittent human supervision of robot teams. We propose a novel framework that produces both generic and query-driven summaries of long-duration robot vision data in three modalities: storyboards, short videos, and text. Through a user study involving 30 participants, we evaluate the efficacy of these summary methods in allowing operators to accurately retrieve the observations and actions that occurred while the robot was operating without supervision over an extended duration (40 min). Our findings reveal that query-driven summaries significantly improve retrieval accuracy compared to generic summaries or raw data, albeit with increased task duration. Storyboards are found to be the most effective presentation modality, especially for object-related queries. This work represents, to our knowledge, the first zero-shot application of ViFMs for generating multi-modal robot-to-human communication in intermittent supervision contexts, demonstrating both the promise and limitations of these models in human-robot interaction (HRI) scenarios.
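As a rough illustration of query-driven summarisation, the sketch below ranks sampled video frames against a text query with a generic vision-language model (CLIP via Hugging Face transformers) to assemble a storyboard. The model choice and scoring are assumptions; they are not the ViFMs or pipeline used in the study.

```python
# Minimal sketch of query-driven storyboard selection: rank sampled frames by
# similarity to a text query with a generic vision-language model (CLIP here);
# the specific ViFMs and summarisation pipeline in the paper differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def storyboard(frames: list[Image.Image], query: str, k: int = 6):
    """Return the k frames most relevant to the query, kept in temporal order."""
    inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    scores = out.logits_per_image.squeeze(1)            # one relevance score per frame
    top = torch.topk(scores, k=min(k, len(frames))).indices.sort().values
    return [frames[i] for i in top.tolist()]
```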
CauSkelNet: Causal Representation Learning for Human Behaviour Analysis
Gu, Xingrui, Jiang, Chuyi, Wang, Erte, Wu, Zekun, Cui, Qiang, Tian, Leimin, Wu, Lianlong, Song, Siyang, Yu, Chuang
Traditional machine learning methods for movement recognition are constrained by limited model interpretability and a lack of deep understanding of human movement. This study introduces a novel representation learning method based on causal inference to better understand human joint dynamics and complex behaviors. We propose a two-stage framework that combines the Peter-Clark (PC) algorithm and Kullback-Leibler (KL) divergence to identify and quantify causal relationships between joints. Our method effectively captures these interactions and produces interpretable, robust representations. Experiments on the EmoPain dataset show that our causal GCN outperforms traditional GCNs in accuracy, F1 score, and recall, especially in detecting protective behaviors. The model is also highly invariant to changes in data scale, enhancing its reliability in practical applications. Our approach advances human motion analysis and paves the way for more adaptive intelligent healthcare solutions.
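To make the two-stage idea concrete, the sketch below runs the PC algorithm (via the causal-learn library) over per-joint features and then weights the discovered links with a histogram-based symmetrised KL divergence. The library choice, KL estimator, and parameters are assumptions, not the paper's exact implementation.

```python
# Minimal sketch of the two-stage framework described above: (1) PC algorithm to
# discover a causal graph over joint features, (2) KL divergence to quantify the
# strength of discovered links. Illustrative assumptions throughout.
import numpy as np
from causallearn.search.ConstraintBased.PC import pc
from scipy.stats import entropy

def kl_weight(x, y, bins=32):
    """Symmetrised KL divergence between two joints' 1-D feature distributions."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(y, bins=bins, range=(lo, hi), density=True)
    p, q = p + 1e-8, q + 1e-8                      # avoid zero bins
    return 0.5 * (entropy(p, q) + entropy(q, p))

def causal_joint_graph(data, alpha=0.05):
    """data: (n_frames, n_joints) array of per-joint features."""
    cg = pc(data, alpha=alpha)                     # stage 1: causal structure discovery
    adj = cg.G.graph                               # learned graph (non-zero = edge mark)
    n = data.shape[1]
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j] != 0:          # stage 2: quantify linked joint pairs
                weights[i, j] = kl_weight(data[:, i], data[:, j])
    return adj, weights
```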
Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction
Zhong, Wangjie, Tian, Leimin, Le, Duy Tho, Rezatofighi, Hamid
Social robots often rely on visual perception to understand their users and the environment. Recent advancements in data-driven approaches for computer vision have demonstrated great potential for applying deep-learning models to enhance a social robot's visual perception. However, the high computational demands of deep-learning methods, as opposed to the more resource-efficient shallow-learning models, raise important questions regarding their effects on real-world interaction and user experience. It remains unclear how objective interaction performance and subjective user experience are influenced when a social robot adopts a deep-learning-based visual perception model. We employed state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot, and conducted a controlled lab study and an in-the-wild human-robot interaction study to evaluate this novel perception function for following a specific user while other people are present in the scene.
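For illustration, the sketch below shows one simple way a robot can keep following a designated person among multiple detections: greedy IoU matching against the previously tracked bounding box. It is detector-agnostic and purely illustrative; the perception stack evaluated in the study is more sophisticated.

```python
# Minimal sketch of following one target among multiple detected people: keep the
# detection whose bounding box best overlaps the previous one (greedy IoU matching).
# Illustrative only; not the tracking models used in the paper.
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

class TargetFollower:
    """Tracks a single designated person across frames of person detections."""
    def __init__(self, initial_box, min_iou=0.3):
        self.box = initial_box
        self.min_iou = min_iou

    def update(self, detections):
        """detections: list of person boxes from any detector for this frame."""
        if not detections:
            return self.box                      # keep the last known position
        best = max(detections, key=lambda d: iou(self.box, d))
        if iou(self.box, best) >= self.min_iou:  # only switch on a confident match
            self.box = best
        return self.box
```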
Crafting with a Robot Assistant: Use Social Cues to Inform Adaptive Handovers in Human-Robot Collaboration
Tian, Leimin, He, Kerry, Xu, Shiyu, Cosgun, Akansel, Kulić, Dana
We study human-robot handovers in a naturalistic collaboration scenario, where a mobile manipulator robot assists a person during a crafting session by providing and retrieving objects used for wooden piece assembly (functional activities) and painting (creative activities). We collect quantitative and qualitative data from 20 participants in a Wizard-of-Oz study, generating the Functional And Creative Tasks Human-Robot Collaboration dataset (the FACT HRC dataset), available to the research community. This work illustrates how social cues and task context inform the temporal-spatial coordination in human-robot handovers, and how human-robot collaboration is shaped by and in turn influences people's functional and creative activities.
I Know Your Feelings Before You Do: Predicting Future Affective Reactions in Human-Computer Dialogue
Li, Yuanchao, Inoue, Koji, Tian, Leimin, Fu, Changzeng, Ishi, Carlos, Ishiguro, Hiroshi, Kawahara, Tatsuya, Lai, Catherine
Current Spoken Dialogue Systems (SDSs) often serve as passive listeners that respond only after receiving user speech. To achieve human-like dialogue, we propose a novel future prediction architecture that allows an SDS to anticipate future affective reactions based on its current behaviors, before the user speaks. In this work, we investigate two scenarios: speech and laughter. For speech, we propose to predict the user's future emotion based on its temporal relationship with the system's current emotion and its causal relationship with the system's current Dialogue Act (DA). For laughter, we propose to predict the occurrence and type of the user's laughter using the system's laughter behaviors in the current turn. Preliminary analysis of human-robot dialogue demonstrated synchronicity in the emotions and laughter displayed by the human and robot, as well as DA-emotion causality in their dialogue. This verifies that our architecture can contribute to the development of an anticipatory SDS.
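As a toy baseline for the speech scenario, the sketch below predicts the user's next-turn emotion from the system's current emotion and dialogue act with a one-hot-encoded logistic-regression classifier. The label sets and example turns are invented for illustration; the paper proposes a dedicated future-prediction architecture, not this baseline.

```python
# Toy baseline for anticipating the user's next-turn emotion from the system's
# current emotion and dialogue act. Labels and data are illustrative only.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical turns: (system_emotion, system_dialogue_act) -> user's next emotion.
X = [("happy", "inform"), ("neutral", "question"), ("happy", "joke"),
     ("sad", "apology"), ("neutral", "inform"), ("happy", "question")]
y = ["happy", "neutral", "happy", "neutral", "neutral", "happy"]

model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),  # encode categorical (emotion, DA) features
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print(model.predict([("happy", "joke")]))    # anticipated user reaction for a new turn
```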