Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning
Faldor, Maxence, Chalumeau, Félix, Flageat, Manon, Cully, Antoine
arXiv.org Artificial Intelligence
A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms and is thus limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning, which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into a single versatile policy that can execute a diversity of behaviors, and (3) we exploit the descriptor-conditioned actor by injecting it into the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.
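For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of a plain MAP-Elites loop with random Gaussian mutations, not the authors' DCG-MAP-Elites implementation. The function names (`map_elites`, `toy_evaluate`), the one-dimensional descriptor grid, and the toy fitness are all assumptions made for illustration. QD score is computed here as the plain sum of elite fitnesses and coverage as the fraction of filled cells, which is one common convention and may differ from the paper's exact definition.

```python
import random


def map_elites(evaluate, dim, bins, iterations, sigma=0.1, seed=0):
    """Illustrative MAP-Elites loop: random elite selection plus Gaussian mutation."""
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)

    def try_insert(solution):
        fitness, descriptor = evaluate(solution)
        # Discretize a descriptor in [0, 1] into one of `bins` cells.
        cell = min(int(descriptor * bins), bins - 1)
        # Keep the new solution only if the cell is empty or it beats the elite.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, solution)

    # Seed the archive with one random solution.
    try_insert([rng.uniform(0.0, 1.0) for _ in range(dim)])

    for _ in range(iterations):
        # Divergent search: pick any elite uniformly and mutate it at random.
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in parent]
        try_insert(child)
    return archive


def toy_evaluate(solution):
    # Hypothetical task: fitness is the negative squared norm,
    # and the descriptor is the first coordinate (already in [0, 1]).
    fitness = -sum(x * x for x in solution)
    return fitness, solution[0]


BINS = 10
archive = map_elites(toy_evaluate, dim=2, bins=BINS, iterations=2000)
qd_score = sum(fitness for fitness, _ in archive.values())
coverage = len(archive) / BINS
```

The random mutation step is the part that PGA-MAP-Elites and DCG-MAP-Elites complement with a gradient-based variation operator; that operator is not shown here.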
Dec-10-2023
- Country:
  - Africa > South Africa (0.04)
  - North America
    - United States > New York
      - New York County > New York City (0.05)
    - Puerto Rico > San Juan
      - San Juan (0.04)
  - Europe
    - United Kingdom > England
      - Greater London > London (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
- Genre:
  - Research Report > Promising Solution (0.34)
- Industry:
  - Leisure & Entertainment > Games (0.67)
- Technology: