diverse behavior
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
Discovering Creative Behaviors through DUPLEX: Diverse Universal Features for Policy Exploration
The ability to approach the same problem from different angles is a cornerstone of human intelligence that leads to robust solutions and effective adaptation to problem variations. In contrast, current RL methodologies tend to lead to policies that settle on a single solution to a given problem, making them brittle to problem variations. Replicating human flexibility in reinforcement learning agents is the challenge that we explore in this work.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Robust Imitation of Diverse Behaviors
Deep generative models have recently shown great promise in imitation learning for motor control. Given enough data, even supervised approaches can do one-shot imitation learning; however, they are vulnerable to cascading failures when the agent trajectory diverges from the demonstrations. Compared to purely supervised methods, Generative Adversarial Imitation Learning (GAIL) can learn more robust controllers from fewer demonstrations, but is inherently mode-seeking and more difficult to train. In this paper, we show how to combine the favourable aspects of these two approaches. The base of our model is a new type of variational autoencoder on demonstration trajectories that learns semantic policy embeddings. We show that these embeddings can be learned on a 9 DoF Jaco robot arm in reaching tasks, and then smoothly interpolated with a resulting smooth interpolation of reaching behavior. Leveraging these policy representations, we develop a new version of GAIL that (1) is much more robust than the purely-supervised controller, especially with few demonstrations, and (2) avoids mode collapse, capturing many diverse behaviors when GAIL on its own does not. We demonstrate our approach on learning diverse gaits from demonstration on a 2D biped and a 62 DoF 3D humanoid in the MuJoCo physics environment.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Italy > Sardinia (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- Education (0.93)
- Leisure & Entertainment > Games (0.46)
Unsupervised Behavior Extraction via Random Intent Priors
Reward-free data is abundant and contains rich prior knowledge of human behaviors, but it is not well exploited by offline reinforcement learning (RL) algorithms. In this paper, we propose UBER, an unsupervised approach to extract useful behaviors from offline reward-free datasets via diversified rewards.
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
Offline Learning of Controllable Diverse Behaviors
Petitbois, Mathieu, Portelas, Rémy, Lamprier, Sylvain, Denoyer, Ludovic
Accepted to the Generative Models for Robot Learning Workshop at ICLR 2025. Affiliations: Mathieu Petitbois and Rémy Portelas (Ubisoft La Forge), Sylvain Lamprier (University of Angers), Ludovic Denoyer (H Company). Abstract: Imitation Learning (IL) techniques aim to replicate human behaviors in specific tasks. While IL has gained prominence due to its effectiveness and efficiency, traditional methods often focus on datasets collected from experts to produce a single efficient policy. Recently, extensions have been proposed to handle datasets of diverse behaviors, mainly by learning transition-level diverse policies or by performing entropy maximization at the trajectory level. While these methods may lead to diverse behaviors, they may not be sufficient to reproduce the actual diversity of demonstrations or to allow controlled trajectory generation. To overcome these drawbacks, we propose a different method based on two key features: a) Temporal Consistency, which ensures consistent behaviors across entire episodes and not just at the transition level, and b) Controllability, obtained by constructing a latent space of behaviors that allows users to selectively activate specific behaviors based on their requirements. We compare our approach to state-of-the-art methods over a diverse set of tasks and environments. For robotics, learning from human experts makes it possible to reach human-level performance without any controller hard-coding or expensive interaction with simulated or real environments.
Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration
Yu, Xingrui, Wan, Zhenglin, Bossens, David Mark, Lyu, Yueming, Guo, Qing, Tsang, Ivor W.
Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning are needed to solve the above challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on the challenging continuous control tasks derived from MuJoCo environments.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Singapore (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)