Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning

Faldor, Maxence, Chalumeau, Félix, Flageat, Manon, Cully, Antoine

Dec-10-2023–arXiv.org Artificial Intelligence

A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and is thus limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation by using a gradient-based variation operator, inspired by deep reinforcement learning, which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods; (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into a single versatile policy that can execute a diversity of behaviors; (3) we exploit the descriptor-conditioned actor by injecting it into the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.
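For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of a plain MAP-Elites loop with random Gaussian mutations, together with the coverage and QD score metrics. It is not the authors' DCG-MAP-Elites, and it contains no gradient-based variation operator or critic. The toy task in `evaluate`, the function names, the grid resolution, and the `offset` used to shift fitnesses in `qd_score` are all assumptions made for illustration.

```python
import random


def evaluate(solution):
    """Toy task (assumed): fitness peaks at 0.5 in every dimension;
    the descriptor is the first two coordinates clipped to [0, 1]."""
    fitness = -sum((x - 0.5) ** 2 for x in solution)
    descriptor = tuple(min(max(x, 0.0), 1.0) for x in solution[:2])
    return fitness, descriptor


def cell_index(descriptor, bins):
    """Map a descriptor in [0, 1]^2 to a cell of a bins x bins grid."""
    return tuple(min(int(d * bins), bins - 1) for d in descriptor)


def map_elites(iterations=2000, bins=10, dim=4, sigma=0.1, seed=0):
    """Plain MAP-Elites: select an elite, mutate it randomly, and keep
    the child if its cell is empty or it beats the current occupant."""
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)
    for _ in range(iterations):
        if archive:
            # Uniform selection among the elites currently stored.
            _, parent = rng.choice(list(archive.values()))
            child = [x + rng.gauss(0.0, sigma) for x in parent]
        else:
            # Bootstrap the archive with a random solution.
            child = [rng.random() for _ in range(dim)]
        fitness, descriptor = evaluate(child)
        cell = cell_index(descriptor, bins)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, child)
    return archive


def coverage(archive, bins):
    """Fraction of grid cells that hold a solution."""
    return len(archive) / bins ** 2


def qd_score(archive, offset=0.0):
    """Sum of the elites' fitnesses, shifted by an assumed offset so
    that each filled cell can contribute a non-negative amount."""
    return sum(fitness + offset for fitness, _ in archive.values())
```

Calling `archive = map_elites(bins=10)` and then `coverage(archive, 10)` and `qd_score(archive, offset=...)` gives the two quantities the abstract reports, here for the toy task only. The random mutation on a flat parameter list is the part that scales poorly to large neural networks, which is the limitation the gradient-based operators described above are meant to address.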

archive, descriptor, variation operator, (16 more...)

Country:
- Africa > South Africa (0.04)
- North America
  - United States > New York
    - New York County > New York City (0.05)
  - Puerto Rico > San Juan
    - San Juan (0.04)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Leisure & Entertainment > Games (0.67)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks (1.00)
  - Evolutionary Systems (1.00)
  - Statistical Learning > Gradient Descent (0.34)