Goto

Collaborating Authors

 behaviour selection policy


Learning to Walk Autonomously via Reset-Free Quality-Diversity

Lim, Bryan, Reichenbach, Alexander, Cully, Antoine

arXiv.org Artificial Intelligence

Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select of behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types solutions available for the behaviour selection policy over solutions optimised with a specific objective. Videos and code available at https://sites.google.com/view/rf-qd.


Passing Through Narrow Gaps with Deep Reinforcement Learning

Tidd, Brendan, Cosgun, Akansel, Leitner, Jurgen, Hudson, Nicolas

arXiv.org Artificial Intelligence

The DARPA subterranean challenge requires teams of robots to traverse difficult and diverse underground environments. Traversing small gaps is one of the challenging scenarios that robots encounter. Imperfect sensor information makes it difficult for classical navigation methods, where behaviours require significant manual fine tuning. In this paper we present a deep reinforcement learning method for autonomously navigating through small gaps, where contact between the robot and the gap may be required. We first learn a gap behaviour policy to get through small gaps (only centimeters wider than the robot). We then learn a goal-conditioned behaviour selection policy that determines when to activate the gap behaviour policy. We train our policies in simulation and demonstrate their effectiveness with a large tracked robot in simulation and on the real platform. In simulation experiments, our approach achieves 93% success rate when the gap behaviour is activated manually by an operator, and 67% with autonomous activation using the behaviour selection policy. In real robot experiments, our approach achieves a success rate of 73% with manual activation, and 40% with autonomous behaviour selection. While we show the feasibility of our approach in simulation, the difference in performance between simulated and real world scenarios highlight the difficulty of direct sim-to-real transfer for deep reinforcement learning policies. In both the simulated and real world environments alternative methods were unable to traverse the gap.