
Collaborating Authors: Cakmak, Maya


Can Large Language Models Help Developers with Robotic Finite State Machine Modification?

arXiv.org Artificial Intelligence

Finite state machines (FSMs) are widely used to manage robot behavior logic, particularly in real-world applications that require a high degree of reliability and structure. However, traditional manual FSM design and modification processes can be time-consuming and error-prone. We propose that large language models (LLMs) can assist developers in editing FSM code for real-world robotic use cases. LLMs, with their ability to use context and process natural language, offer a solution for FSM modification with high correctness, allowing developers to update complex control logic through natural language instructions. Our approach leverages few-shot prompting and language-guided code generation to reduce the amount of time it takes to edit an FSM. To validate this approach, we evaluate it on a real-world robotics dataset, demonstrating its effectiveness in practical scenarios.
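
As a concrete illustration of the few-shot, language-guided editing idea described above, the sketch below assembles a prompt from a worked before/after FSM edit plus a new natural-language instruction, then delegates the edit to an LLM. The `call_llm` callable, the example transitions, and the prompt wording are our own placeholders, not the authors' implementation.

```python
# Minimal sketch of few-shot, language-guided FSM editing; `call_llm`, the example
# edit, and the prompt wording are illustrative placeholders, not the paper's code.

FEW_SHOT_EXAMPLES = [
    {
        "instruction": "Add a 'paused' state reachable from 'moving' on a 'stop' event.",
        "before": "states = ['idle', 'moving']\n"
                  "transitions = [('idle', 'start', 'moving')]",
        "after": "states = ['idle', 'moving', 'paused']\n"
                 "transitions = [('idle', 'start', 'moving'), ('moving', 'stop', 'paused')]",
    },
]

def build_prompt(fsm_code: str, instruction: str) -> str:
    """Assemble a few-shot prompt: worked edit examples, then the new request."""
    parts = ["You edit robot finite state machine code. Return only the updated code."]
    for ex in FEW_SHOT_EXAMPLES:
        parts += [f"Instruction: {ex['instruction']}",
                  f"Before:\n{ex['before']}",
                  f"After:\n{ex['after']}"]
    parts += [f"Instruction: {instruction}", f"Before:\n{fsm_code}", "After:"]
    return "\n\n".join(parts)

def edit_fsm(fsm_code: str, instruction: str, call_llm) -> str:
    """`call_llm` is any str -> str function backed by an LLM (assumed, not specified)."""
    return call_llm(build_prompt(fsm_code, instruction))
```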


I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences

arXiv.org Artificial Intelligence

Understanding robot behaviors and experiences through natural language is crucial for developing intelligent and transparent robotic systems. Recent advances in large language models (LLMs) make it possible to translate complex, multi-modal robotic experiences into coherent, human-readable narratives. However, grounding real-world robot experiences in natural language is challenging for many reasons, such as the multi-modal nature of the data, differing sample rates, and data volume. We introduce RONAR, an LLM-based system that generates natural language narrations from robot experiences, aiding in behavior announcement, failure analysis, and human interaction for failure recovery. Evaluated across various scenarios, RONAR outperforms state-of-the-art methods and improves failure recovery efficiency. Our contributions include a multi-modal framework for robot experience narration, a comprehensive real-robot dataset, and empirical evidence of RONAR's effectiveness in enhancing user experience in system transparency and failure analysis.
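
The following sketch shows one way such a narration pipeline could be organized, under our own assumptions rather than RONAR's released design: per-modality events recorded at different rates are merged onto a single timeline and then summarized by an LLM. The `Event` fields and the `call_llm` stand-in are illustrative only.

```python
# Sketch of a multi-modal narration pipeline under our own assumptions (not the
# released RONAR implementation): merge per-modality events recorded at different
# rates onto one timeline, then ask an LLM to narrate the merged log.
from dataclasses import dataclass

@dataclass
class Event:
    t: float        # timestamp in seconds
    modality: str   # e.g. "camera", "joint_states", "planner_log"
    summary: str    # short textual summary of the raw sample

def merge_streams(*streams):
    """Interleave per-modality event lists by timestamp."""
    merged = [event for stream in streams for event in stream]
    return sorted(merged, key=lambda event: event.t)

def narrate(events, call_llm):
    """`call_llm` is a stand-in for any LLM client (str -> str)."""
    log = "\n".join(f"[{e.t:8.2f}s][{e.modality}] {e.summary}" for e in events)
    prompt = ("Narrate what the robot did, in plain language, and flag any "
              "failures with suggested recovery steps:\n" + log)
    return call_llm(prompt)
```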


Multiple Ways of Working with Users to Develop Physically Assistive Robots

arXiv.org Artificial Intelligence

Despite the growth of physically assistive robotics (PAR) research over the last decade, nearly half of PAR user studies do not involve participants with the target disabilities. There are several reasons for this -- recruitment challenges, small sample sizes, and transportation logistics -- all influenced by systemic barriers that people with disabilities face. However, it is well-established that working with end-users results in technology that better addresses their needs and integrates with their lived circumstances. In this paper, we reflect on multiple approaches we have taken to working with people with motor impairments across the design, development, and evaluation of three PAR projects: (a) assistive feeding with a robot arm; (b) assistive teleoperation with a mobile manipulator; and (c) shared control with a robot arm. We discuss these approaches to working with users along three dimensions -- individual vs. community-level insight, logistic burden on end-users vs. researchers, and benefit to researchers vs. community -- and share recommendations for how other PAR researchers can incorporate users into their work.


Fast Explicit-Input Assistance for Teleoperation in Clutter

arXiv.org Artificial Intelligence

The performance of prediction-based assistance for robot teleoperation degrades in unseen or goal-rich environments due to incorrect or quickly-changing intent inferences. Poor predictions can confuse operators or cause them to change their control input to implicitly signal their goal, resulting in unnatural movement. We present a new assistance algorithm and interface for robotic manipulation where an operator can explicitly communicate a manipulation goal by pointing the end-effector. Rapid optimization and parallel collision checking in a local region around the pointing target enable direct, interactive control over grasp and place pose candidates. We compare the explicit pointing interface to an implicit inference-based assistance scheme in a within-subjects user study (N=20) where participants teleoperate a simulated robot to complete a multi-step singulation and stacking task in cluttered environments. We find that operators prefer the explicit interface, which improved completion time, pick and place success rates, and NASA TLX scores. Our code is available at https://github.com/NVlabs/fast-explicit-teleop
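
A rough sketch of the local-candidate idea, under our own assumptions (it is not the released NVlabs code): sample candidate grasp positions in a small region around the pointed-at target, check them for collisions in parallel, and return the collision-free ones ranked by distance to the pointing target. The function names, sampling radius, and `in_collision` callable are placeholders.

```python
# Illustrative sketch, not the released NVlabs code: sample candidate grasp positions
# in a small region around the pointed-at target, check them for collisions in
# parallel, and rank the collision-free ones by distance to the pointing target.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def sample_candidates(target, n=64, radius=0.05, rng=None):
    """Candidate grasp positions sampled in a small ball around the pointing target."""
    rng = rng if rng is not None else np.random.default_rng()
    return target + rng.normal(scale=radius, size=(n, 3))

def rank_candidates(target_xyz, in_collision, n=64):
    """`in_collision`: callable position -> bool, e.g. backed by a planning-scene query."""
    target = np.asarray(target_xyz, dtype=float)
    candidates = sample_candidates(target, n)
    with ThreadPoolExecutor() as pool:              # parallel collision checks
        hits = list(pool.map(in_collision, candidates))
    free = [c for c, hit in zip(candidates, hits) if not hit]
    return sorted(free, key=lambda c: float(np.linalg.norm(c - target)))
```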


Evaluating Customization of Remote Tele-operation Interfaces for Assistive Robots

arXiv.org Artificial Intelligence

Mobile manipulator platforms, like the Stretch RE1 robot, make the promise of in-home robotic assistance feasible. For people with severe physical limitations, like those with quadriplegia, the ability to tele-operate these robots themselves means that they can perform physical tasks they could not otherwise do, thereby increasing their level of independence. In order for users with physical limitations to operate these robots, their interfaces must be accessible and cater to the specific needs of all users. As physical limitations vary amongst users, it is difficult to make a single interface that will accommodate all users. Instead, such interfaces should be customizable to each individual user. In this paper, we explore the value of customization of a browser-based interface for tele-operating the Stretch RE1 robot. More specifically, we evaluate the usability and effectiveness of a customized interface in comparison to the default interface configurations from prior work. We present a user study involving participants with motor impairments (N=10) and participants without motor impairments who could serve as caregivers (N=13), who use the robot to perform mobile manipulation tasks in a real kitchen environment. Our study demonstrates that no single interface configuration satisfies all users' needs and preferences. Users perform better when using the customized interface for navigation, but not for manipulation, due to the higher complexity of learning to manipulate through the robot. All participants are able to use the robot to complete all tasks, and participants with motor impairments believe that having the robot in their home would make them more independent.


Sketching Robot Programs On the Fly

arXiv.org Artificial Intelligence

Service robots for personal use in the home and the workplace require end-user development solutions for swiftly scripting robot tasks as the need arises. Many existing solutions preserve ease, efficiency, and convenience through simple programming interfaces or by restricting task complexity. Others facilitate meticulous task design but often do so at the expense of simplicity and efficiency. There is a need for robot programming solutions that reconcile the complexity of robotics with the on-the-fly goals of end-user development. In response to this need, we present a novel, multimodal, and on-the-fly development system, Tabula. Inspired by a formative design study with a prototype, Tabula leverages a combination of spoken language for specifying the core of a robot task and sketching for contextualizing the core. The result is that developers can script partial, sloppy versions of robot programs to be completed and refined by a program synthesizer. Lastly, we demonstrate our anticipated use cases of Tabula via a set of application scenarios.
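
To make the "partial, sloppy program" idea concrete, here is a toy representation under our own assumptions (not Tabula's actual data structures): steps transcribed from speech may leave their targets as holes, which a program synthesizer would later bind using the sketched spatial context.

```python
# Toy illustration of the "partial, sloppy program" idea under our own assumptions
# (not Tabula's actual representation): steps transcribed from speech may leave
# their targets as holes for a program synthesizer to fill from the sketch context.
from dataclasses import dataclass, field

HOLE = "?"  # placeholder the synthesizer must resolve

@dataclass
class Step:
    action: str          # e.g. "go_to", "pick", "place"
    target: str = HOLE   # may be left unresolved by the user

@dataclass
class ProgramSketch:
    steps: list = field(default_factory=list)

    def unresolved(self):
        return [s for s in self.steps if s.target == HOLE]

# Spoken input: "go to the kitchen, pick it up, and bring it back"
sketch = ProgramSketch([Step("go_to", "kitchen"), Step("pick"), Step("go_to")])
assert len(sketch.unresolved()) == 2  # the synthesizer still has two holes to bind
```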


Learning Perceptual Concepts by Bootstrapping from Human Queries

arXiv.org Artificial Intelligence

Robots need to be able to learn concepts from their users in order to adapt their capabilities to each user's unique task. But when the robot operates on high-dimensional inputs, like images or point clouds, this is impractical: the robot needs an unrealistic amount of human effort to learn the new concept. To address this challenge, we propose a new approach whereby the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space. This lets it take advantage of semantically meaningful privileged information that is only accessible at training time, like object poses and bounding boxes, and that allows for richer human interaction to speed up learning. We evaluate our approach by learning prepositional concepts that describe object state or multi-object relationships, like above, near, or aligned, which are key to user specification of task goals and execution constraints for robots. Using a simulated human, we show that our approach improves sample complexity when compared to learning concepts directly in the high-dimensional space. We also demonstrate the utility of the learned concepts in motion planning tasks on a 7-DoF Franka Panda robot.
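
A minimal sketch of the bootstrapping step as we read it (not the authors' implementation): fit a concept classifier on low-dimensional privileged features such as relative object poses, then use it to pseudo-label a much larger set of scenes whose high-dimensional observations train the final perceptual model. The toy "above" concept, feature shapes, and scikit-learn usage are illustrative assumptions.

```python
# Minimal sketch of the bootstrapping step as we read it (not the authors' code):
# fit a concept classifier on low-dimensional privileged features, then pseudo-label
# a much larger set whose high-dimensional observations train the final model.
# The toy "above" concept, feature shapes, and scikit-learn usage are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_labels(feat_labeled, y_labeled, feat_unlabeled):
    """feat_*: low-dim privileged features; returns pseudo-labels for the larger set."""
    clf = LogisticRegression().fit(feat_labeled, y_labeled)
    return clf.predict(feat_unlabeled)

rng = np.random.default_rng(0)
feat_small = rng.normal(size=(40, 1))          # a handful of human-labeled queries
y_small = (feat_small[:, 0] > 0).astype(int)   # toy "above": positive z-offset
feat_large = rng.normal(size=(5000, 1))        # cheap to generate at training time
pseudo_labels = bootstrap_labels(feat_small, y_small, feat_large)
# `pseudo_labels` would then supervise an image- or point-cloud-based concept model.
```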


Vision-and-Dialog Navigation

arXiv.org Artificial Intelligence

Robots navigating in human environments should use language to ask for assistance and be able to understand human responses. To study this challenge, we introduce Cooperative Vision-and-Dialog Navigation, a dataset of over 2k embodied, human-human dialogs situated in simulated, photorealistic home environments. The Navigator asks questions to their partner, the Oracle, who has privileged access to the best next steps the Navigator should take according to a shortest path planner. To train agents that search an environment for a goal location, we define the Navigation from Dialog History task. An agent, given a target object and a dialog history between humans cooperating to find that object, must infer navigation actions towards the goal in unexplored environments. We establish an initial, multi-modal sequence-to-sequence model and demonstrate that looking farther back in the dialog history improves performance. Source code and a live interface demo can be found at https://github.com/mmurray/cvdn
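
As an illustration of conditioning on dialog history (our own simplification, not the released CVDN code), the snippet below flattens the target object and the last k question/answer turns into a single token sequence that a sequence-to-sequence policy could consume; increasing k corresponds to looking farther back in the dialog. The separator tokens and function name are assumptions.

```python
# Our own simplification, not the released CVDN code: flatten the target object and
# the last k question/answer turns into one sequence for a seq-to-seq navigation
# policy; larger k means "looking farther back" in the dialog history.
def build_history_input(target_object, dialog, k):
    """dialog: list of (navigator_question, oracle_answer) turns, oldest first."""
    turns = dialog[-k:] if k > 0 else []
    parts = [f"<TAR> {target_object}"]
    for question, answer in turns:
        parts += [f"<NAV> {question}", f"<ORA> {answer}"]
    return " ".join(parts)

example = build_history_input(
    "plant",
    [("Should I go upstairs?", "Yes, then turn left."),
     ("Is it in the bedroom?", "No, keep going down the hall.")],
    k=2,
)
```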


Power to the People: The Role of Humans in Interactive Machine Learning

AI Magazine

Intelligent systems that learn interactively from their end-users are quickly becoming widespread. Until recently, this progress has been fueled mostly by advances in machine learning; however, more and more researchers are realizing the importance of studying users of these systems. In this article we promote this approach and demonstrate how it can result in better user experiences and more effective learning systems. We present a number of case studies that characterize the impact of interactivity, demonstrate ways in which some existing systems fail to account for the user, and explore new ways for learning systems to interact with their users. We argue that the design process for interactive machine learning systems should involve users at all stages: explorations that reveal human interaction patterns and inspire novel interaction methods, as well as refinement stages to tune details of the interface and choose among alternatives. After giving a glimpse of the progress that has been made so far, we discuss the challenges that we face in moving the field forward.