Stulp, Freek
State- and context-dependent robotic manipulation and grasping via uncertainty-aware imitation learning
Winter, Tim R., Sundaram, Ashok M., Friedl, Werner, Roa, Maximo A., Stulp, Freek, Silvério, João
Generating context-adaptive manipulation and grasping actions is a challenging problem in robotics. Classical planning and control algorithms tend to be inflexible with regard to parameterization by external variables such as object shapes. In contrast, Learning from Demonstration (LfD) approaches, due to their nature as function approximators, allow for introducing external variables to modulate policies in response to the environment. In this paper, we utilize this property by introducing an LfD approach to acquire context-dependent grasping and manipulation strategies. We treat the problem as kernel-based function approximation, where the kernel inputs include generic context variables describing task-dependent parameters such as the object shape. We build on existing work on policy fusion with uncertainty quantification to propose a state-dependent approach that automatically returns to the demonstrations, avoiding unpredictable behavior while smoothly adapting to context changes. The approach is evaluated on the LASA handwriting dataset and on a real 7-DoF robot in two scenarios: adaptation to slippage while grasping, and manipulation of a deformable food item.
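To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of uncertainty-aware, context-conditioned kernel regression: a Gaussian process is fit on inputs that stack state and context, and its predictive uncertainty blends the learned policy with a fallback that moves back towards the demonstrations. The data, kernel, and blending rule are all assumptions for illustration.

```python
# Illustrative sketch only: kernel regression over [state, context] inputs with an
# uncertainty-based fallback towards the demonstrations. All data and parameters
# below are placeholders, not the paper's actual model.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_demo = np.random.rand(200, 3)    # hypothetical inputs: 2-D state + 1 context variable
Y_demo = np.random.randn(200, 2)   # demonstrated velocity commands

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2) + WhiteKernel(1e-3))
gp.fit(X_demo, Y_demo)

def policy(state, context, nearest_demo_state):
    """Blend the learned command with a 'return to demonstrations' term."""
    x = np.concatenate([state, context])[None, :]
    mean, std = gp.predict(x, return_std=True)
    alpha = np.exp(-std.mean())                 # high confidence near the demonstrations
    fallback = nearest_demo_state - state       # move back towards known data
    return alpha * mean[0] + (1.0 - alpha) * fallback
```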
Unknown Object Grasping for Assistive Robotics
Miller, Elle, Durner, Maximilian, Humt, Matthias, Quere, Gabriel, Boerdijk, Wout, Sundaram, Ashok M., Stulp, Freek, Vogel, Jorn
We propose a novel pipeline for unknown object grasping in shared robotic autonomy scenarios. State-of-the-art methods for fully autonomous scenarios are typically learning-based approaches, optimised for a specific end-effector, that generate grasp poses directly from sensor input. In the domain of assistive robotics, we seek instead to utilise the user's cognitive abilities for enhanced satisfaction, grasping performance, and alignment with their high-level, task-specific goals. Given a pair of stereo images, we perform unknown object instance segmentation and generate a 3D reconstruction of the object of interest. In shared control, the user then guides the robot end-effector across a virtual hemisphere centered around the object to their desired approach direction. A physics-based grasp planner finds the most stable local grasp on the reconstruction, and finally the user is guided by shared control to this grasp. In experiments on the DLR EDAN platform, we report a grasp success rate of 87% for 10 unknown objects, and demonstrate the method's capability to grasp objects in structured clutter and from shelves.
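One step of this pipeline, constraining the end-effector to a virtual hemisphere around the object while the user chooses an approach direction, can be sketched as below; the parameterization (azimuth/elevation, fixed radius) is an assumption for illustration, not the DLR EDAN implementation.

```python
# Illustrative geometry sketch: place the end-effector on a virtual hemisphere
# centered on the object and point the approach direction at the object.
import numpy as np

def hemisphere_pose(centroid, radius, azimuth, elevation):
    """Return a position on the upper hemisphere and the approach direction."""
    elevation = np.clip(elevation, 0.0, np.pi / 2)     # restrict to the upper half
    offset = radius * np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
    position = centroid + offset
    approach = (centroid - position) / np.linalg.norm(centroid - position)
    return position, approach
```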
AI-enabled Cyber-Physical In-Orbit Factory -- AI approaches based on digital twin technology for robotic small satellite production
Leutert, Florian, Bohlig, David, Kempf, Florian, Schilling, Klaus, Mühlbauer, Maximilian, Ayan, Bengisu, Hulin, Thomas, Stulp, Freek, Albu-Schäffer, Alin, Kutscher, Vladimir, Plesker, Christian, Dasbach, Thomas, Damm, Stephan, Anderl, Reiner, Schleich, Benjamin
With the ever-increasing number of active satellites in space, the rising demand for larger formations of small satellites and the commercialization of the space industry (so-called New Space), the realization of manufacturing processes in orbit comes closer to reality. Reducing launch costs and risks, allowing for faster on-demand deployment of individually configured satellites, and the prospect of possible on-orbit servicing for satellites make the idea of realizing an in-orbit factory promising. In this paper, we present a novel approach to an in-orbit factory of small satellites covering a digital process twin, AI-based fault detection, and teleoperated robot control, which are being researched as part of the "AI-enabled Cyber-Physical In-Orbit Factory" project. In addition to the integration of modern automation and Industry 4.0 production approaches, we address the question of how artificial intelligence (AI) and learning approaches can be used to make the production process more robust, fault-tolerant and autonomous. This lays the foundation for a later realization of satellite production in space in the form of an in-orbit factory. A central aspect is the development of a robotic AIT (Assembly, Integration and Testing) system in which a small satellite can be assembled by a manipulator robot from modular subsystems. Approaches developed to improve this production process with AI include employing neural networks for optical and electrical fault detection of components. Force-sensitive measurement and motion training help to deal with uncertainties and tolerances during assembly. An AI-guided teleoperated control of the robot arm allows for human intervention, while a Digital Process Twin represents process data and provides supervision during the whole production process. Approaches and results towards automated satellite production are presented in detail.
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Collaboration, Open X-Embodiment, Padalkar, Abhishek, Pooley, Acorn, Mandlekar, Ajay, Jain, Ajinkya, Tung, Albert, Bewley, Alex, Herzog, Alex, Irpan, Alex, Khazatsky, Alexander, Rai, Anant, Singh, Anikait, Garg, Animesh, Brohan, Anthony, Raffin, Antonin, Wahid, Ayzaan, Burgess-Limerick, Ben, Kim, Beomjoon, Schölkopf, Bernhard, Ichter, Brian, Lu, Cewu, Xu, Charles, Finn, Chelsea, Xu, Chenfeng, Chi, Cheng, Huang, Chenguang, Chan, Christine, Pan, Chuer, Fu, Chuyuan, Devin, Coline, Driess, Danny, Pathak, Deepak, Shah, Dhruv, Büchler, Dieter, Kalashnikov, Dmitry, Sadigh, Dorsa, Johns, Edward, Ceola, Federico, Xia, Fei, Stulp, Freek, Zhou, Gaoyue, Sukhatme, Gaurav S., Salhotra, Gautam, Yan, Ge, Schiavi, Giulio, Kahn, Gregory, Su, Hao, Fang, Hao-Shu, Shi, Haochen, Amor, Heni Ben, Christensen, Henrik I, Furuta, Hiroki, Walke, Homer, Fang, Hongjie, Mordatch, Igor, Radosavovic, Ilija, Leal, Isabel, Liang, Jacky, Abou-Chakra, Jad, Kim, Jaehyung, Peters, Jan, Schneider, Jan, Hsu, Jasmine, Bohg, Jeannette, Bingham, Jeffrey, Wu, Jiajun, Wu, Jialin, Luo, Jianlan, Gu, Jiayuan, Tan, Jie, Oh, Jihoon, Malik, Jitendra, Booher, Jonathan, Tompson, Jonathan, Yang, Jonathan, Lim, Joseph J., Silvério, João, Han, Junhyek, Rao, Kanishka, Pertsch, Karl, Hausman, Karol, Go, Keegan, Gopalakrishnan, Keerthana, Goldberg, Ken, Byrne, Kendra, Oslund, Kenneth, Kawaharazuka, Kento, Zhang, Kevin, Rana, Krishan, Srinivasan, Krishnan, Chen, Lawrence Yunliang, Pinto, Lerrel, Fei-Fei, Li, Tan, Liam, Ott, Lionel, Lee, Lisa, Tomizuka, Masayoshi, Spero, Max, Du, Maximilian, Ahn, Michael, Zhang, Mingtong, Ding, Mingyu, Srirama, Mohan Kumar, Sharma, Mohit, Kim, Moo Jin, Kanazawa, Naoaki, Hansen, Nicklas, Heess, Nicolas, Joshi, Nikhil J, Suenderhauf, Niko, Di Palo, Norman, Shafiullah, Nur Muhammad Mahi, Mees, Oier, Kroemer, Oliver, Sanketi, Pannag R, Wohlhart, Paul, Xu, Peng, Sermanet, Pierre, Sundaresan, Priya, Vuong, Quan, Rafailov, Rafael, Tian, Ran, Doshi, Ria, Martín-Martín, Roberto, Mendonca, Russell, Shah, Rutav, Hoque, Ryan, Julian, Ryan, Bustamante, Samuel, Kirmani, Sean, Levine, Sergey, Moore, Sherry, Bahl, Shikhar, Dass, Shivin, Sonawani, Shubham, Song, Shuran, Xu, Sichun, Haldar, Siddhant, Adebola, Simeon, Guist, Simon, Nasiriany, Soroush, Schaal, Stefan, Welker, Stefan, Tian, Stephen, Dasari, Sudeep, Belkhale, Suneel, Osa, Takayuki, Harada, Tatsuya, Matsushima, Tatsuya, Xiao, Ted, Yu, Tianhe, Ding, Tianli, Davchev, Todor, Zhao, Tony Z., Armstrong, Travis, Darrell, Trevor, Jain, Vidhi, Vanhoucke, Vincent, Zhan, Wei, Zhou, Wenxuan, Burgard, Wolfram, Chen, Xi, Wang, Xiaolong, Zhu, Xinghao, Li, Xuanlin, Lu, Yao, Chebotar, Yevgen, Zhou, Yifan, Zhu, Yifeng, Xu, Ying, Wang, Yixuan, Bisk, Yonatan, Cho, Yoonyoung, Lee, Youngwoon, Cui, Yuchen, Wu, Yueh-Hua, Tang, Yujin, Zhu, Yuke, Li, Yunzhu, Iwasawa, Yusuke, Matsuo, Yutaka, Xu, Zhuo, Cui, Zichen Jeff
Large, high-capacity models trained on diverse datasets have shown remarkable success in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots, collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website: https://robotics-transformer-x.github.io.
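The datasets are distributed as RLDS-style episodes loadable with tensorflow_datasets; the snippet below is a hedged sketch, and the dataset path is a placeholder (see the project website for the actual dataset names and locations).

```python
# Hedged sketch: loading one Open X-Embodiment dataset in RLDS episode format.
# The builder_dir below is a placeholder; consult robotics-transformer-x.github.io
# for the available dataset paths.
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(builder_dir="gs://gresearch/robotics/<dataset>/<version>")
episodes = builder.as_dataset(split="train[:5]")

for episode in episodes:
    for step in episode["steps"]:            # RLDS: each episode contains a dataset of steps
        obs, action = step["observation"], step["action"]
        # Observation/action structure differs per embodiment; see builder.info.features.
```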
A Simple Open-Loop Baseline for Reinforcement Learning Locomotion Tasks
Raffin, Antonin, Sigaud, Olivier, Kober, Jens, Albu-Schäffer, Alin, Silvério, João, Stulp, Freek
In search of the simplest baseline capable of competing with Deep Reinforcement Learning on locomotion tasks, we propose a biologically inspired model-free open-loop strategy. Drawing upon prior knowledge and harnessing the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by RL algorithms. Unlike RL methods, which are prone to performance degradation when exposed to sensor noise or failure, our open-loop oscillators exhibit remarkable robustness due to their lack of reliance on sensors. Furthermore, we showcase a successful transfer from simulation to reality using an elastic quadruped, all without the need for randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.
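A minimal sketch of such an open-loop baseline (the exact parameterization in the paper may differ): each joint tracks a sine wave with its own amplitude, phase and offset, all sharing one gait frequency, and no sensor feedback is used.

```python
# Minimal open-loop oscillator sketch; parameter values are illustrative only.
import numpy as np

def open_loop_action(t, freq, amplitudes, phases, offsets):
    """Desired joint positions at time t, computed without any sensor feedback."""
    return offsets + amplitudes * np.sin(2.0 * np.pi * freq * t + phases)

# Example: 8 joints with alternating phases, a trot-like pattern.
freq = 2.0                          # gait frequency in Hz
amplitudes = np.full(8, 0.3)        # joint amplitudes in rad
phases = np.tile([0.0, np.pi], 4)   # diagonal legs half a period apart
offsets = np.zeros(8)               # nominal joint positions
q_desired = open_loop_action(t=0.1, freq=freq, amplitudes=amplitudes,
                             phases=phases, offsets=offsets)
```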
Learning to Exploit Elastic Actuators for Quadruped Locomotion
Raffin, Antonin, Seidel, Daniel, Kober, Jens, Albu-Schäffer, Alin, Silvério, João, Stulp, Freek
Spring-based actuators in legged locomotion provide energy efficiency and improved performance, but increase the difficulty of controller design. While previous work has focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern generators (CPGs), whose parameters are optimized to quickly obtain an open-loop controller that achieves efficient locomotion. Then, to make this controller more robust and further improve its performance, we use reinforcement learning to close the loop and learn corrective actions on top of the CPGs. We evaluate the proposed approach on the DLR elastic quadruped bert. Our results in learning trotting and pronking gaits show that exploitation of the spring actuator dynamics emerges naturally from optimizing for dynamic motions, yielding high-performing locomotion, particularly the fastest walking gait recorded on bert, despite being model-free. The whole process takes no more than 1.5 hours on the real robot and results in natural-looking gaits.
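The controller structure described here can be sketched as a nominal CPG trajectory plus a small learned residual; the code below is an assumed structure for illustration, not the implementation used on bert.

```python
# Assumed controller structure: CPG targets plus a bounded learned correction.
import numpy as np

def cpg_targets(t, freq, amplitudes, phases, offsets):
    """Open-loop joint targets generated by simple oscillators."""
    return offsets + amplitudes * np.sin(2.0 * np.pi * freq * t + phases)

def closed_loop_action(t, observation, policy, cpg_params, correction_scale=0.1):
    """The learned policy closes the loop by adding corrective actions on top of the CPG."""
    nominal = cpg_targets(t, *cpg_params)
    correction = correction_scale * policy(observation)   # small residual from RL
    return nominal + correction
```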
Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics
Raffin, Antonin, Stulp, Freek
Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, the unstructured step-based exploration used in Deep RL -- often very successful in simulation -- leads to jerky motion patterns on real robots. Consequences of the resulting shaky behavior are poor exploration, or even damage to the robot. We address these issues by adapting state-dependent exploration (SDE) to current Deep RL algorithms. To enable this adaptation, we propose three extensions to the original SDE, which lead to a new exploration method, generalized state-dependent exploration (gSDE). We evaluate gSDE both in simulation, on PyBullet continuous control tasks, and directly on a tendon-driven elastic robot. gSDE yields competitive results in simulation but outperforms the unstructured exploration on the real robot. The code is available at https://github.com/DLR-RM/stable-baselines3/tree/sde.
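Conceptually, state-dependent exploration replaces per-step white noise with a noise term that is a function of state features, with the noise parameters resampled only occasionally; the sketch below illustrates this idea (it is not the gSDE implementation), and the trailing comment shows how gSDE is enabled in Stable-Baselines3.

```python
# Conceptual sketch of state-dependent exploration: noise = features(s) @ W,
# where W is resampled per episode (or every n steps) rather than per step.
import numpy as np

class StateDependentNoise:
    def __init__(self, feature_dim, action_dim, log_std=-2.0):
        self.std = np.exp(log_std) * np.ones((feature_dim, action_dim))
        self.resample()

    def resample(self):
        # One exploration matrix kept fixed for a whole episode: the perturbation
        # then varies smoothly with the state instead of jumping at every step.
        self.weights = np.random.randn(*self.std.shape) * self.std

    def noise(self, features):
        return features @ self.weights

# In Stable-Baselines3, gSDE is enabled via the use_sde flag, e.g.:
#   from stable_baselines3 import SAC
#   model = SAC("MlpPolicy", "Pendulum-v1", use_sde=True)
```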
Comparing Semi-Parametric Model Learning Algorithms for Dynamic Model Estimation in Robotics
Riedel, Sebastian, Stulp, Freek
Physical modeling of robotic system behavior is the foundation for controlling many robotic mechanisms to a satisfactory degree. Mechanisms are also typically designed in a way that good model accuracy can be achieved with relatively simple models and model identification strategies. If physically based models do not achieve sufficient accuracy, or become too complex, model-free methods based on machine learning techniques can help. Of particular interest to us was therefore the question of to what degree semi-parametric modeling techniques, meaning combinations of physical models with machine learning, increase the modeling accuracy of inverse dynamics models, which are typically used in robot control. To this end, we evaluated semi-parametric Gaussian process regression and a novel model-based neural network architecture, and compared their modeling accuracy to a series of naive semi-parametric, parametric-only and non-parametric-only regression methods. The comparison was carried out on three test scenarios, one involving a real test-bed and two involving simulated scenarios, with the most complex scenario targeting the modeling of a simulated robot's inverse dynamics. We found that in all but one case, semi-parametric Gaussian process regression yields the most accurate models, with little tuning required for the training procedure.
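One of the semi-parametric variants compared in such studies can be sketched as a parametric model plus a Gaussian process fitted on its residuals; the code below assumes this residual formulation and uses placeholder data, so it should be read as an illustration rather than the paper's exact method.

```python
# Sketch of a residual-style semi-parametric model (assumed formulation):
# torque = parametric model + GP fitted on the parametric model's residuals.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def parametric_model(X, params):
    # Placeholder physical model, linear in its parameters (e.g. a rigid-body regressor).
    return X @ params

X_train = np.random.rand(500, 6)       # hypothetical (q, dq, ddq) features
tau_train = np.random.randn(500)       # measured joint torques

params = np.linalg.lstsq(X_train, tau_train, rcond=None)[0]   # identify the physical part
residuals = tau_train - parametric_model(X_train, params)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-2))
gp.fit(X_train, residuals)                                    # learn what the physics misses

def predict_torque(X):
    return parametric_model(X, params) + gp.predict(X)
```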
Investigating Generalisation in Continuous Deep Reinforcement Learning
Zhao, Chenyang, Sigaud, Olivier, Stulp, Freek, Hospedales, Timothy M.
Deep Reinforcement Learning has shown great success in a variety of control tasks. However, it is unclear how close we are to the vision of putting Deep RL into practice to solve real-world problems. In particular, common practice in the field is to train policies on largely deterministic simulators and to evaluate algorithms through training performance alone, without a train/test distinction to ensure that models generalise and are not overfitted. Moreover, it is not standard practice to check for generalisation under domain shift, although robustness to such system change between training and testing would be necessary for real-world Deep RL control, for example in robotics. In this paper we study these issues by first characterising the sources of uncertainty that pose generalisation challenges in Deep RL. We then provide a new benchmark and a thorough empirical evaluation of generalisation challenges for state-of-the-art Deep RL methods. In particular, we show that, if generalisation is the goal, then the common practice of evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice. Finally, we evaluate several techniques for improving generalisation and draw conclusions about the most robust techniques to date.
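The train/test protocol argued for here amounts to training on one environment configuration and evaluating on a shifted one; the sketch below illustrates this with a standard Gymnasium task and Stable-Baselines3, both chosen here for illustration rather than being the benchmark used in the paper.

```python
# Illustrative train/test-with-domain-shift protocol; the environment and the
# shifted parameter are placeholders, not the paper's benchmark.
import gymnasium as gym
from stable_baselines3 import PPO

train_env = gym.make("Pendulum-v1")                 # nominal dynamics (g = 10.0)
model = PPO("MlpPolicy", train_env, verbose=0).learn(total_timesteps=50_000)

test_env = gym.make("Pendulum-v1", g=12.0)          # domain shift: stronger gravity
returns = []
for _ in range(20):
    obs, _ = test_env.reset()
    done, episode_return = False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = test_env.step(action)
        episode_return += reward
        done = terminated or truncated
    returns.append(episode_return)                  # compare against training returns
```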
A survey on policy search algorithms for learning robot controllers in a handful of trials
Chatzilygeroudis, Konstantinos, Vassiliades, Vassilis, Stulp, Freek, Calinon, Sylvain, Mouret, Jean-Baptiste
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term "big data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.
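The second strategy, a data-driven surrogate of the expected reward, is what Bayesian optimization provides; below is a hedged sketch using scikit-optimize, where episode_return and the parameter bounds are placeholders for real rollouts on a robot or simulator.

```python
# Hedged sketch of micro-data policy search via Bayesian optimization:
# each of the ~dozen trials updates a GP surrogate of the expected reward.
import numpy as np
from skopt import gp_minimize

def episode_return(policy_params):
    """Placeholder for one rollout on the robot/simulator with these parameters."""
    return -np.sum(np.square(np.asarray(policy_params) - 0.3))

result = gp_minimize(
    lambda p: -episode_return(p),     # skopt minimizes, so negate the reward
    dimensions=[(-1.0, 1.0)] * 4,     # e.g. 4 low-dimensional policy parameters
    n_calls=12,                       # "a handful of trials"
    random_state=0,
)
best_params, best_reward = result.x, -result.fun
```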