Collaborating Authors

 Higuera, Juan Camilo Gamboa


Learning Domain Randomization Distributions for Transfer of Locomotion Policies

arXiv.org Machine Learning

Domain randomization (DR) is a successful technique for learning robust policies for robot systems when the dynamics of the target robot system are unknown. The success of policies trained with domain randomization, however, depends heavily on selecting the randomization distribution correctly. Most success stories use real-world data to carefully select the DR distribution, or incorporate real-world trajectories to better estimate appropriate randomization distributions. In this paper, we consider the problem of finding good domain randomization parameters for simulation, without prior access to data from the target system. We explore the use of gradient-based search methods to learn a domain randomization distribution with the following properties: 1) the trained policy should be successful in environments sampled from the domain randomization distribution; 2) the domain randomization distribution should be wide enough that experience similar to that of the target robot system is observed during training, while addressing the practicality of training finite-capacity models. These two properties aim to ensure that the trajectories encountered in the target system are close to those observed during training, as existing methods in machine learning are better suited for interpolation than extrapolation. We show how adapting the domain randomization distribution while training context-conditioned policies results in improvements in jump-start and asymptotic performance when transferring a learned policy to the target environment.
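As a rough illustration of what a gradient-based search over a randomization distribution can look like, the sketch below adapts the mean and log standard deviation of a diagonal Gaussian over simulator parameters using a score-function gradient estimate, with an entropy bonus that keeps the distribution from collapsing. This is not the paper's algorithm: the Gaussian parameterization, the entropy bonus, the quadratic `toy_return` standing in for a policy's return in a sampled environment, and every name below are assumptions made for the example.

```python
import numpy as np


def toy_return(params):
    """Hypothetical stand-in for the return a policy obtains in a simulator
    configured with `params`; it peaks when every parameter equals 1.0."""
    return -np.sum((params - 1.0) ** 2, axis=-1)


def dr_gradient_step(mean, log_std, rng, n_samples=256, lr=0.05,
                     entropy_weight=0.1):
    """One score-function gradient ascent step on the parameters of a
    diagonal Gaussian randomization distribution (illustrative only)."""
    std = np.exp(log_std)
    eps = rng.standard_normal((n_samples, mean.size))
    params = mean + std * eps                 # sampled simulator parameters
    returns = toy_return(params)
    adv = returns - returns.mean()            # mean baseline reduces variance

    # Gradients of log N(params; mean, std^2) w.r.t. mean and log_std.
    grad_mean = (adv[:, None] * eps / std).mean(axis=0)
    grad_log_std = (adv[:, None] * (eps ** 2 - 1.0)).mean(axis=0)

    # The entropy of a diagonal Gaussian is sum(log_std) + const, so its
    # gradient w.r.t. each log_std entry is 1; this term favors wide distributions.
    grad_log_std = grad_log_std + entropy_weight

    return mean + lr * grad_mean, log_std + lr * grad_log_std


def fit_dr_distribution(steps=400, seed=0, dim=2):
    """Run repeated gradient steps from a standard normal initialization."""
    rng = np.random.default_rng(seed)
    mean, log_std = np.zeros(dim), np.zeros(dim)
    for _ in range(steps):
        mean, log_std = dr_gradient_step(mean, log_std, rng)
    return mean, np.exp(log_std)
```

In this toy setting the return term pulls the distribution toward parameters where the policy succeeds, while the entropy term pushes it to stay wide, which loosely mirrors the two properties listed in the abstract. In a real setup `toy_return` would be replaced by rollouts of the current policy in the sampled simulators.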


Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning

arXiv.org Artificial Intelligence

We present an algorithm for rapidly learning controllers for robotics systems. The algorithm follows the model-based reinforcement learning paradigm and improves upon existing algorithms, namely Probabilistic Inference for Learning Control (PILCO) and a sample-based version of PILCO with neural network dynamics (Deep-PILCO). We propose training a neural network dynamics model using variational dropout with truncated Log-Normal noise. This allows us to obtain a dynamics model with calibrated uncertainty, which can be used to simulate controller executions via rollouts. We also describe a set of techniques, inspired by viewing PILCO as a recurrent neural network model, that are crucial to improving the convergence of the method. We test our method on a variety of benchmark tasks, demonstrating data efficiency that is competitive with PILCO, while being able to optimize complex neural network controllers. Finally, we assess the performance of the algorithm for learning motor controllers for a six-legged autonomous underwater vehicle. This demonstrates the potential of the algorithm for scaling up the dimensionality and dataset sizes in more complex control tasks.
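To make the idea of simulating controller executions via rollouts through a stochastic dynamics model more concrete, here is a minimal sketch in which each particle draws its own multiplicative truncated log-normal noise on the hidden units of a small network and is then propagated forward under a fixed policy. It only illustrates the sampling mechanics and is not the paper's model: the untrained random weights, the noise parameters, holding the noise fixed per particle over the horizon, the residual state update, and all function names are assumptions for the example.

```python
import numpy as np


def sample_truncated_log_normal(rng, mu, sigma, low, high, size):
    """Draw exp(z) with z ~ N(mu, sigma^2) restricted to [low, high],
    using simple rejection sampling."""
    out = np.empty(size)
    flat = out.reshape(-1)          # view into `out`, filled in place
    filled = 0
    while filled < flat.size:
        z = rng.normal(mu, sigma, size=flat.size - filled)
        z = z[(z >= low) & (z <= high)]
        flat[filled:filled + z.size] = z
        filled += z.size
    return np.exp(out)


def rollout(rng, w1, w2, policy, x0, horizon, n_particles):
    """Propagate `n_particles` copies of state `x0` through a one-hidden-layer
    dynamics model with multiplicative noise on the hidden units."""
    hidden = w1.shape[1]
    # Assumption: one noise sample per particle, held fixed for the rollout.
    noise = sample_truncated_log_normal(
        rng, mu=0.0, sigma=0.3, low=-1.0, high=0.0, size=(n_particles, hidden)
    )
    x = np.tile(x0, (n_particles, 1))
    trajectory = [x]
    for _ in range(horizon):
        u = policy(x)                                  # controls, one row per particle
        h = np.tanh(np.concatenate([x, u], axis=1) @ w1) * noise
        x = x + h @ w2                                 # predicted change in state
        trajectory.append(x)
    return np.stack(trajectory)     # shape: (horizon + 1, n_particles, state_dim)


def demo(seed=0, state_dim=3, hidden=8, horizon=10, n_particles=50):
    """Roll out a hypothetical linear policy through random, untrained weights."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(state_dim + 1, hidden))
    w2 = rng.normal(scale=0.5, size=(hidden, state_dim))
    policy = lambda x: -0.5 * x[:, :1]                 # scalar control per particle
    return rollout(rng, w1, w2, policy, np.ones(state_dim), horizon, n_particles)
```

The spread of the particles at each time step gives a Monte Carlo estimate of the model's predictive uncertainty, which is the quantity a controller objective would average over in this kind of sample-based scheme.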