Mahadevan, Karthik
ImageInThat: Manipulating Images to Convey User Instructions to Robots
Mahadevan, Karthik, Lewis, Blaine, Li, Jiannan, Mutlu, Bilge, Tang, Anthony, Grossman, Tovi
Foundation models are rapidly improving the capability of robots to autonomously perform everyday tasks such as meal preparation, yet robots will still need to be instructed by humans due to limitations in model performance, the difficulty of capturing user preferences, and the need for user agency. Robots can be instructed through various methods: natural language conveys immediate instructions but can be abstract or ambiguous, whereas end-user programming supports longer-horizon tasks but its interfaces struggle to capture user intent. In this work, we propose direct manipulation of images as an alternative paradigm for instructing robots, and introduce a specific instantiation called ImageInThat, which allows users to directly manipulate images in a timeline-style interface to generate robot instructions. Through a user study, we demonstrate the efficacy of ImageInThat for instructing robots in kitchen manipulation tasks, comparing it to a text-based natural language instruction method. The results show that participants were faster with ImageInThat and preferred it over the text-based method. Supplementary material including code can be found at: https://image-in-that.github.io/.

Advances in foundation models are rapidly improving the capabilities of autonomous robots, bringing us closer to robots entering our homes where they can complete everyday tasks. However, the need for human instruction will persist, whether due to limitations in robot policies, models trained on internet-scale data that may not capture the specifics of users' environments or preferences, or simply users' desire to maintain control over their robots' actions. For instance, a robot asked to wash dishes might follow a standard cleaning routine (e.g., placing everything in the dishwasher and then putting it away in the cupboard) but may not respect a user's preferences (e.g., washing delicate glasses "by hand" or organizing cleaned dishes in a specific way), thus necessitating human intervention. We introduce a new paradigm for instructing robots through the direct manipulation of images. ImageInThat is a specific instantiation of this paradigm in which users manipulate images in a timeline-style interface to create instructions for the robot to execute. Existing methods for instructing robots range from those that command the robot for immediate execution (e.g., uttering a language instruction to wash glasses by hand [1]) to those that program the robot, such as learning from demonstration [2] or end-user robot programming [3]. However, prior methods, whether used for commanding or programming, have notable drawbacks.
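The abstract describes the core interaction (manipulating images along a timeline to produce robot instructions) only at a high level. The sketch below is one rough way to picture that idea: consecutive user-edited frames are diffed to derive pick-and-place steps. The Frame structure, the movement tolerance, and the instruction format are hypothetical placeholders for illustration, not ImageInThat's implementation.

from dataclasses import dataclass

@dataclass
class Frame:
    """One manipulated image in the timeline: object name -> (x, y) position."""
    objects: dict[str, tuple[float, float]]

def frames_to_instructions(timeline: list[Frame], tol: float = 5.0) -> list[str]:
    """Emit a pick-and-place instruction for every object that moved between frames."""
    instructions = []
    for prev, curr in zip(timeline, timeline[1:]):
        for name, (x, y) in curr.objects.items():
            px, py = prev.objects.get(name, (x, y))
            if abs(x - px) > tol or abs(y - py) > tol:
                instructions.append(
                    f"pick {name} at ({px:.0f}, {py:.0f}) and place at ({x:.0f}, {y:.0f})"
                )
    return instructions

if __name__ == "__main__":
    timeline = [
        Frame({"mug": (120, 80), "plate": (300, 200)}),
        Frame({"mug": (320, 210), "plate": (300, 200)}),  # user dragged the mug next to the plate
    ]
    for step in frames_to_instructions(timeline):
        print(step)

A timeline-diff formulation like this is only a caricature of direct manipulation on real images, but it conveys why editing the scene state can be less ambiguous than describing it in language.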
AeroHaptix: A Wearable Vibrotactile Feedback System for Enhancing Collision Avoidance in UAV Teleoperation
Huang, Bingjian, Wang, Zhecheng, Cheng, Qilong, Ren, Siyi, Cai, Hanfeng, Valdivia, Antonio Alvarez, Mahadevan, Karthik, Wigdor, Daniel
Haptic feedback enhances collision avoidance by providing directional obstacle information to operators in unmanned aerial vehicle (UAV) teleoperation. However, such feedback is often rendered via haptic joysticks, which are unfamiliar to UAV operators and limited to single-directional force feedback. Additionally, the direct coupling of the input device and the feedback method diminishes the operators' control authority and causes oscillatory movements. To overcome these limitations, we propose AeroHaptix, a wearable haptic feedback system that uses high-resolution vibrations to communicate multiple obstacle directions simultaneously. The vibrotactile actuators' layout was optimized based on a perceptual study to eliminate perceptual biases and achieve uniform spatial coverage. A novel rendering algorithm, MultiCBF, was adapted from control barrier functions to support multi-directional feedback. System evaluation showed that AeroHaptix effectively reduced collisions in complex environments, and operators reported significantly lower physical workload, improved situational awareness, and increased control authority.
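MultiCBF is only named in the abstract, so the following is a loose, hypothetical approximation of multi-directional vibrotactile rendering rather than the paper's algorithm: each obstacle contributes an urgency term that grows as it nears a safety distance, spread over the actuators aligned with its direction. The actuator layout, safety distance, and mapping are illustrative assumptions.

import numpy as np

def actuator_intensities(actuator_dirs: np.ndarray,
                         obstacle_dirs: np.ndarray,
                         obstacle_dists: np.ndarray,
                         safe_dist: float = 2.0) -> np.ndarray:
    """Return a vibration intensity in [0, 1] for each actuator direction (unit vectors)."""
    intensities = np.zeros(len(actuator_dirs))
    for direction, dist in zip(obstacle_dirs, obstacle_dists):
        # Urgency grows as the obstacle closes in on the safety distance.
        urgency = np.clip((safe_dist - dist) / safe_dist, 0.0, 1.0)
        # Spread the cue over actuators aligned with the obstacle direction.
        alignment = np.clip(actuator_dirs @ direction, 0.0, 1.0)
        intensities = np.maximum(intensities, urgency * alignment)
    return intensities

if __name__ == "__main__":
    # Four actuators around the torso: front, right, back, left (unit vectors).
    actuators = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
    obstacles = np.array([[1.0, 0.0], [0.7, 0.7]])
    obstacles /= np.linalg.norm(obstacles, axis=1, keepdims=True)
    dists = np.array([0.5, 1.5])
    print(actuator_intensities(actuators, obstacles, dists))

Taking the maximum over obstacles (rather than summing) keeps intensities bounded when several obstacles lie in the same direction; the actual control-barrier-function formulation in the paper will differ.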
Generative Expressive Robot Behaviors using Large Language Models
Mahadevan, Karthik, Chien, Jonathan, Brown, Noah, Xu, Zhuo, Parada, Carolina, Xia, Fei, Zeng, Andy, Takayama, Leila, Sadigh, Dorsa
People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation in which the robot is used. We propose to leverage the rich social context available from large language models (LLMs), and their ability to generate motion from instructions or user preferences, to produce expressive robot motion that is adaptable and composable, with behaviors building upon one another. Our approach uses few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users found to be competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.
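The abstract outlines the pipeline only in one sentence: few-shot chain-of-thought prompting that turns a social instruction into parametrized calls on the robot's skills. The snippet below illustrates that general prompting pattern under stated assumptions; the skill list, the few-shot example, and the query_llm callable are placeholders, not the authors' code or prompts.

from typing import Callable

# Hypothetical skill library exposed to the language model.
ROBOT_SKILLS = """
Available skills:
  nod(times: int)
  look_at(target: str)
  say(text: str)
  move_to(location: str)
"""

# Hypothetical few-shot example pairing a social situation with reasoning and code.
FEW_SHOT_EXAMPLE = """
Instruction: A person glances at you while you pass them in the hallway.
Reasoning: Glancing is a bid for acknowledgment; brief eye contact and a nod
is a socially appropriate, low-effort response.
Code:
look_at("person")
nod(times=1)
"""

def generate_expressive_behavior(instruction: str,
                                 query_llm: Callable[[str], str]) -> str:
    """Compose a chain-of-thought prompt and return the LLM-generated control code."""
    prompt = (
        "You control a mobile robot. Think step by step about the social "
        "context, then emit control code using only the listed skills.\n"
        f"{ROBOT_SKILLS}\n{FEW_SHOT_EXAMPLE}\n"
        f"Instruction: {instruction}\nReasoning:"
    )
    return query_llm(prompt)

if __name__ == "__main__":
    # Stub LLM for demonstration; a real system would query a language model here.
    fake_llm = lambda prompt: 'say("Excuse me")\nmove_to("side_of_corridor")'
    print(generate_expressive_behavior("Pass people in a busy corridor.", fake_llm))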