feature trace
Extended Reality System for Robotic Learning from Human Demonstration
Ngui, Isaac; McBeth, Courtney; He, Grace; Santos, André Corrêa; Soares, Luciano; Morales, Marco; Amato, Nancy M.
Figure 1: A human user interacting with a virtual UR5e robot to provide a trajectory demonstration as the robot carries a coffee mug over a table with a laptop on top.
Extended reality provides a natural setting for demonstrating robotic trajectories while bypassing safety concerns and providing a broader range of interaction modalities. […] (RADER) system, a generic extended reality interface for learning from demonstration. We additionally present its application to an existing state-of-the-art learning from demonstration approach and show comparable results between demonstrations given on a physical robot and those given using our extended reality system.
Many real-world tasks are intuitive for a human to perform, but difficult […]. In these scenarios, robotic systems can benefit from expert demonstrations, wherein human operators physically move the robot along trajectories, to learn how to perform each task. In many settings, it may be difficult or unsafe to use a physical robot to provide these demonstrations, for example, considering cooking tasks […]
Inducing Structure in Reward Learning by Learning Features
Bobu, Andreea; Wiggert, Marius; Tomlin, Claire; Dragan, Anca D.
Whether it's semi-autonomous driving (Sadigh et al. 2016), recommender systems (Ziebart et al. 2008), or household robots working in close proximity with people (Jain et al. 2015), reward learning can greatly benefit autonomous agents to generate behaviors that adapt to new situations or human preferences. Under this framework, the robot uses the person's input to learn a reward function that describes how they prefer the task to be performed. For instance, in the scenario in Fig. 1, the human wants the robot to keep the cup away from the laptop to prevent spilling liquid over it; she may communicate this preference to the robot by providing a demonstration of the task or even by directly intervening during the robot's task execution to correct it.
[…] In doing so, however, these approaches sacrifice the sample efficiency and generalizability that a well-specified feature set offers. While using an expressive function approximator to extract features and learn their reward combination at once seems advantageous, many such functions can induce policies that explain the demonstrations. Hence, to disambiguate between all these candidate functions, the robot requires a very large amount of (laborious to collect) data, and this data needs to be diverse enough to identify the true reward. For example, the human in the household robot setting in Figure 1 might want to demonstrate keeping the cup away from the laptop, but from a single demonstration the robot could find many other explanations for the person's behavior: perhaps they always happened to keep the cup upright or they really […]
Feature Expansive Reward Learning: Rethinking Human Input
Bobu, Andreea; Wiggert, Marius; Tomlin, Claire; Dragan, Anca D.
In collaborative human-robot scenarios, when a person is not satisfied with how a robot performs a task, they can intervene to correct it. Reward learning methods enable the robot to adapt its reward function online based on such human input. However, this online adaptation requires low sample complexity algorithms which rely on simple functions of handcrafted features. In practice, pre-specifying an exhaustive set of features the person might care about is impossible; what should the robot do when the human correction cannot be explained by the features it already has access to? Recent progress in deep Inverse Reinforcement Learning (IRL) suggests that the robot could fall back on demonstrations: ask the human for demonstrations of the task, and recover a reward defined over not just the known features, but also the raw state space. Our insight is that rather than implicitly learning about the missing feature(s) from task demonstrations, the robot should instead ask for data that explicitly teaches it about what it is missing. We introduce a new type of human input, in which the person guides the robot from areas of the state space where the feature she is teaching is highly expressed to states where it is not. We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function. By focusing the human input on the missing feature, our method decreases sample complexity and improves generalization of the learned reward over the above deep IRL baseline. We show this in experiments with a 7DOF robot manipulator. Finally, we discuss our method's potential implications for deep reward learning more broadly: taking a divide-and-conquer approach that focuses on important features separately before learning from demonstrations can improve generalization in tasks where such features are easy for the human to teach.
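The idea of guiding the robot from states where a feature is highly expressed to states where it is not can be illustrated with a small sketch. The abstract does not specify the learning algorithm, so everything below is an assumption made for illustration: a linear feature `phi(s) = w @ s` instead of a learned function over the raw state space, a pairwise logistic ranking loss that asks the feature value to decrease along each trace, and a reward that is a weighted sum of feature values. The names `ordered_pairs`, `fit_linear_feature`, and `reward`, as well as the toy 2-D traces, are hypothetical.

```python
import numpy as np

def ordered_pairs(trace):
    """Return (earlier, later) state pairs; the earlier state should score higher."""
    trace = np.asarray(trace, dtype=float)
    i, j = np.triu_indices(len(trace), k=1)  # all index pairs with i < j
    return trace[i], trace[j]

def fit_linear_feature(traces, lr=0.5, steps=500):
    """Fit w so that phi(s) = w @ s decreases along every trace (assumed loss)."""
    highs, lows = zip(*(ordered_pairs(t) for t in traces))
    diff = np.concatenate(highs) - np.concatenate(lows)  # s_i - s_j for i < j
    w = np.zeros(diff.shape[1])
    for _ in range(steps):
        margin = diff @ w
        # Gradient of mean(-log sigmoid(margin)) with respect to w.
        grad = -(diff * (1.0 / (1.0 + np.exp(margin)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def reward(state, theta, features):
    """Assumed linear reward: weighted sum of feature values at a state."""
    return sum(t * f(state) for t, f in zip(theta, features))

# Toy traces in a 2-D state space: each starts where the (hypothetical)
# feature is high and ends where it is low. Here the first coordinate drops.
traces = [
    [[1.0, 0.0], [0.6, 0.2], [0.2, 0.1]],
    [[0.9, 0.5], [0.5, 0.4], [0.1, 0.6]],
]
w = fit_linear_feature(traces)

# Combine a known feature (second coordinate) with the newly learned one.
features = [lambda s: s[1], lambda s: float(w @ np.asarray(s))]
theta = [1.0, -2.0]  # illustrative weights: penalize the learned feature
r = reward([0.5, 0.5], theta, features)
```

In this toy setting the learned weight vector ranks the states of each trace in the order the human visited them; a nonlinear function approximator would take the place of `w @ s` when the feature is not linear in the state.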