
Collaborating Authors

 Schubert, Ingmar


A Generalist Dynamics Model for Control

arXiv.org Artificial Intelligence

Figure 1 | Schematic overview of the data regimes for which we show experimental results. These regimes are characterized by how much data from the target environment is available to the agent, and how much (potentially generalizable) experience has been collected in other environments. The experiments demonstrate both that TDMs are capable single-environment models (marked purple) and that they generalize across environments (marked yellow). If sufficient data from the target environment is available, we can learn a single-environment specialist model (section 5.1). If there are only small amounts of data from the target environment, but more data from other environments, a generalist model can be pre-trained and then fine-tuned on the target environment (section 5.2.1). Finally, if we are able to train a generalist model on large amounts of data from different environments, we can apply this model zero-shot to our target environment without fine-tuning (section 5.2.2). We also show an example of unsuccessful generalization (no color) in section E.
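
The caption above describes data regimes rather than the model itself, but the basic use of a learned dynamics model for control can be illustrated with a minimal random-shooting MPC sketch in Python. The `dynamics_model` and `reward_fn` callables and all hyperparameters are placeholders assumed for illustration; they are not the paper's TDM architecture or planner.

```python
import numpy as np

def random_shooting_mpc(dynamics_model, reward_fn, state, action_dim,
                        horizon=10, n_candidates=256, rng=None):
    """Pick the first action of the best random action sequence.

    `dynamics_model(state, action) -> next_state` and
    `reward_fn(state, action) -> float` are placeholders for a learned
    (specialist or generalist) dynamics model and a task reward.
    """
    rng = np.random.default_rng() if rng is None else rng
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        # Sample a candidate open-loop action sequence.
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total = np.asarray(state, dtype=float), 0.0
        for a in actions:
            total += reward_fn(s, a)
            s = dynamics_model(s, a)  # roll the learned model forward
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action
```

In the specialist regime such a model would be trained only on target-environment transitions, while in the generalist regimes it would be pre-trained on other environments before fine-tuning or zero-shot use.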


Spatial Reasoning via Deep Vision Models for Robotic Sequential Manipulation

arXiv.org Artificial Intelligence

In this paper, we propose using deep neural architectures (i.e., vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems. This formulation enables predicting the subset of objects that are relevant for completing a task. Such problems are often addressed by task and motion planning (TAMP) formulations combining symbolic reasoning and continuous motion planning. In essence, action-object relationships are resolved into discrete, symbolic decisions that are then used to solve for manipulation motions (e.g., via nonlinear trajectory optimization). However, solving long-horizon tasks requires consideration of all possible action-object combinations, which limits the scalability of TAMP approaches. To overcome this combinatorial complexity, we introduce a visual perception module integrated with a TAMP solver. Given a task and an initial image of the scene, the learned model outputs the relevance of objects to accomplish the task. By incorporating the predictions of the model into a TAMP formulation as a heuristic, the size of the search space is significantly reduced. Results show that our framework finds feasible solutions more efficiently when compared to a state-of-the-art TAMP solver.
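
As a rough sketch of how such a learned relevance prediction can act as a heuristic, the snippet below prunes the object set before handing it to a TAMP solver. The `relevance_model` interface and the 0.5 threshold are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def prune_objects_for_tamp(relevance_model, scene_image, objects, task_goal,
                           threshold=0.5):
    """Keep only objects the learned model scores as relevant to the task.

    `relevance_model(scene_image, task_goal) -> scores` stands in for a
    vision model (e.g. a ViT or ResNet head) returning one score per object.
    The pruned list is then passed to the TAMP solver, so its discrete
    search only branches over relevant action-object combinations.
    """
    scores = np.asarray(relevance_model(scene_image, task_goal))
    return [obj for obj, score in zip(objects, scores) if score >= threshold]
```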


Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics

arXiv.org Artificial Intelligence

Applications of Reinforcement Learning (RL) in robotics are often limited by high data demand. On the other hand, approximate models are readily available in many robotics scenarios, making model-based approaches like planning a data-efficient alternative. Still, the performance of these methods suffers if the model is imprecise or wrong. In this sense, the respective strengths and weaknesses of RL and model-based planners are complementary. In the present work, we investigate how both approaches can be integrated into one framework that combines their strengths. We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans. In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
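
A plan-conditioned ("universal") policy can be sketched as a function of both the current state and an encoding of an approximate plan. The interfaces below (`policy`, `encode_plan`, waypoint plans) are illustrative assumptions, not the L2E implementation.

```python
import numpy as np

def plan_conditioned_action(policy, encode_plan, state, approximate_plan):
    """Query a universal, plan-conditioned policy.

    `encode_plan` turns an approximate plan (here assumed to be a sequence
    of waypoints) into a fixed-size vector, and
    `policy(state, plan_code) -> action` is a learned network; both are
    placeholders for whatever the trained L2E agent actually uses.
    """
    plan_code = encode_plan(approximate_plan)
    return policy(np.asarray(state, dtype=float), plan_code)

def encode_plan(waypoints, max_len=16):
    """Placeholder plan encoding: flatten and zero-pad a waypoint sequence."""
    flat = np.asarray(waypoints, dtype=float).ravel()
    padded = np.zeros(max_len)
    padded[:min(len(flat), max_len)] = flat[:max_len]
    return padded
```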


Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks

arXiv.org Artificial Intelligence

In high-dimensional state spaces, the usefulness of Reinforcement Learning (RL) is limited by the problem of exploration. This issue has been addressed using potential-based reward shaping (PB-RS) previously. In the present work, we introduce Final-Volume-Preserving Reward Shaping (FV-RS). FV-RS relaxes the strict optimality guarantees of PB-RS to a guarantee of preserved long-term behavior. Being less restrictive, FV-RS allows for reward shaping functions that are even better suited for improving the sample efficiency of RL algorithms. In particular, we consider settings in which the agent has access to an approximate plan. Here, we use examples of simulated robotic manipulation tasks to demonstrate that plan-based FV-RS can indeed significantly improve the sample efficiency of RL over plan-based PB-RS.

Reinforcement Learning (RL) provides a general framework for autonomous agents to learn complex behavior, adapt to changing environments, and generalize to unseen tasks and environments with little human interference or engineering effort. However, RL in high-dimensional state spaces generally suffers from a difficult exploration problem, making learning prohibitively slow and sample-inefficient for many real-world tasks with sparse rewards. A possible strategy to increase the sample efficiency of RL algorithms is reward shaping (Mataric, 1994; Randløv & Alstrøm, 1998), in particular potential-based reward shaping (PB-RS) (Ng et al., 1999). Reward shaping provides a dense reward signal to the RL agent, enabling it to converge faster to the optimal policy.
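
For reference, potential-based reward shaping adds a term F(s, a, s') = γΦ(s') − Φ(s) to the environment reward, which preserves the optimal policy (Ng et al., 1999). The sketch below uses a plan-derived potential as an illustrative assumption; FV-RS, by contrast, permits shaping functions outside this potential-based form, trading strict optimality preservation for preserved long-term behavior.

```python
import numpy as np

def potential_from_plan(state, plan_states):
    """Potential Phi(s): progress index along the plan minus the distance
    to the closest plan state. This specific plan-based potential is an
    illustrative assumption, not the paper's exact shaping function."""
    plan = np.asarray(plan_states, dtype=float)
    dists = np.linalg.norm(plan - np.asarray(state, dtype=float), axis=1)
    i = int(np.argmin(dists))
    return i - dists[i]

def pbrs_bonus(state, next_state, plan_states, gamma=0.99):
    """Potential-based shaping term F(s, a, s') = gamma * Phi(s') - Phi(s)
    (Ng et al., 1999), added to the sparse environment reward."""
    return (gamma * potential_from_plan(next_state, plan_states)
            - potential_from_plan(state, plan_states))
```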