VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills
In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain prevalent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where relevant features may vary across downstream task domains. For evaluating skill diversity, defining what constitutes "diversity" typically requires a hard commitment to a specific notion of what it means for skills to be diverse, potentially leading to inconsistencies in how skill diversity is understood, making results across different approaches hard to compare, and leaving many forms of diversity unexplored. To address these issues, we adopt a measure of sample diversity that translates ideas from ecology to machine learning -- the Vendi Score -- allowing the user to specify and evaluate any desired form of diversity. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of diversity, which could support skill-diversity pretraining in new and richly interactive environments where optimising for various forms of diversity may be desirable.
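The Vendi Score referenced above has a compact closed form: given an n × n positive semi-definite similarity matrix K with unit self-similarity, it is the exponential of the Shannon entropy of the eigenvalues of K/n, i.e. the "effective number" of distinct samples. A minimal NumPy sketch (function name is ours):

```python
import numpy as np

def vendi_score(K: np.ndarray) -> float:
    """Vendi Score: effective number of distinct samples.

    K is an n x n similarity matrix with K[i, i] == 1; the score is
    exp(Shannon entropy of the eigenvalues of K / n).
    """
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)           # spectrum of the normalised kernel
    eigvals = eigvals[eigvals > 1e-12]            # drop numerical zeros
    entropy = -np.sum(eigvals * np.log(eigvals))  # Shannon entropy of the spectrum
    return float(np.exp(entropy))

# Identical samples (similarity 1 everywhere) give a score of 1;
# n mutually dissimilar samples give a score of n.
vs_same = vendi_score(np.ones((4, 4)))
vs_distinct = vendi_score(np.eye(4))
```

Swapping the similarity function inside K is exactly what lets the user specify which form of diversity is measured.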
Human-Aligned Skill Discovery: Balancing Behaviour Exploration and Alignment
Hussonnois, Maxence, Karimpanal, Thommen George, Rana, Santu
Unsupervised skill discovery in Reinforcement Learning aims to mimic humans' ability to autonomously discover diverse behaviors. However, existing methods are often unconstrained, making it difficult to find useful skills, especially in complex environments, where discovered skills are frequently unsafe or impractical. We address this issue by proposing Human-aligned Skill Discovery (HaSD), a framework that incorporates human feedback to discover safer, more aligned skills. HaSD simultaneously optimises skill diversity and alignment with human values. This approach ensures that alignment is maintained throughout the skill discovery process, eliminating the inefficiencies associated with exploring unaligned skills. We demonstrate its effectiveness in both 2D navigation and SafetyGymnasium environments, showing that HaSD discovers diverse, human-aligned skills that are safe and useful for downstream tasks. Finally, we extend HaSD by learning a range of configurable skills with varying diversity-alignment trade-offs that could be useful in practical scenarios.
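One simple way to read the configurable trade-off is as a mixing coefficient between a diversity reward and an alignment reward; the sketch below is an illustrative assumption of ours, not HaSD's actual objective (the function name and `beta` parameter are hypothetical):

```python
def combined_reward(r_diversity: float, r_alignment: float, beta: float) -> float:
    """Hypothetical combined intrinsic reward: beta trades off alignment
    with human feedback (beta -> 1) against behavioural diversity (beta -> 0)."""
    return (1.0 - beta) * r_diversity + beta * r_alignment

# Sweeping beta yields a range of skills with different trade-offs
rewards = [combined_reward(0.8, 0.3, b) for b in (0.0, 0.5, 1.0)]
```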
Reviews: Unsupervised Curricula for Visual Meta-Reinforcement Learning
This paper presents a method for learning a distribution of tasks to feed to an agent that's learning via meta RL, while simultaneously optimizing the agent to perform better more quickly on tasks sampled from this distribution. The task distribution is trained using an objective that maximizes mutual information between a latent task variable and the trajectories produced by the meta RL agent. The meta RL agent is trained to maximize this mutual information, more or less. The overall optimization relies on some variational lower bounds on mutual information, and on the RL² algorithm for meta RL. Experiments are provided which show that the task distributions and meta RL agents trained in this co-adaptive manner exhibit some potentially useful behaviors, e.g. an improved ability to quickly solve new tasks sampled from an "actual" task distribution -- i.e., a task distribution which is not equal to the one that's co-adapted with the agent.
Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
Celik, Onur, Taranovic, Aleksandar, Neumann, Gerhard
Reinforcement learning (RL) is a powerful approach for acquiring a well-performing policy. However, learning diverse skills is challenging in RL due to the commonly used Gaussian policy parameterization. We propose Diverse Skill Learning (Di-SkilL; videos and code are available on the project webpage: https://alrhub.github.io/di-skill-website/), an RL method for learning diverse skills using a Mixture of Experts, where each expert formalizes a skill as a contextual motion primitive. Di-SkilL optimizes each expert and its associated context distribution with respect to a maximum entropy objective that incentivizes learning diverse skills in similar contexts. The per-expert context distribution enables automatic curriculum learning, allowing each expert to focus on its best-performing sub-region of the context space. To overcome hard discontinuities and multi-modalities without any prior knowledge of the environment's unknown context probability space, we leverage energy-based models to represent the per-expert context distributions and demonstrate how we can efficiently train them using the standard policy gradient objective. We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
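One way to picture the per-expert context distributions is as normalised per-expert energies for a given context, i.e. a softmax over experts; the sketch below is an illustrative assumption of ours, not the paper's implementation:

```python
import numpy as np

def expert_responsibilities(energies: np.ndarray) -> np.ndarray:
    """Normalise per-expert energies E_o(c) for one context c into a
    categorical distribution over experts (a numerically stable softmax)."""
    e = energies - energies.max()  # subtract max for numerical stability
    p = np.exp(e)
    return p / p.sum()

# Hypothetical energies of three experts for a single context:
# the highest-energy expert claims most of the responsibility.
p = expert_responsibilities(np.array([2.0, 0.5, -1.0]))
```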
Language Guided Skill Discovery
Rho, Seungeun, Smith, Laura, Li, Tianyu, Levine, Sergey, Peng, Xue Bin, Ha, Sehoon
Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity" of skills. We hypothesize that leveraging the semantic knowledge of large language models (LLMs) can improve the semantic diversity of the resulting behaviors. To this end, we introduce Language Guided Skill Discovery (LGSD), a skill discovery framework that aims to directly maximize the semantic diversity between skills. LGSD takes user prompts as input and outputs a set of semantically distinctive skills. The prompts constrain the search space to a semantically desired subspace, and the generated LLM outputs guide the agent to visit semantically diverse states within that subspace. We demonstrate that LGSD enables legged robots to visit different user-intended areas on a plane simply by changing the prompt. Furthermore, we show that language guidance aids in discovering more diverse skills than five existing skill discovery methods in robot-arm manipulation environments. Lastly, LGSD provides a simple way of utilizing learned skills via natural language.
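A hypothetical sketch of a semantic-diversity reward in this spirit, using distances in a language-embedding space (the function name, toy embeddings, and the specific mean-distance form are all our assumptions, not LGSD's actual reward):

```python
import numpy as np

def semantic_diversity_reward(state_emb: np.ndarray,
                              other_skill_embs: np.ndarray) -> float:
    """Reward a state whose language embedding is far, on average, from
    the embeddings of states visited by other skills."""
    dists = np.linalg.norm(other_skill_embs - state_emb, axis=1)  # row-wise distances
    return float(dists.mean())

# 2-D toy embeddings: the current state vs. two other skills' states
r = semantic_diversity_reward(np.zeros(2), np.array([[3.0, 4.0], [0.0, 0.0]]))
```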
Controlled Diversity with Preference: Towards Learning a Diverse Set of Desired Skills
Hussonnois, Maxence, Karimpanal, Thommen George, Rana, Santu
Autonomously learning diverse behaviors without an extrinsic reward signal has been a problem of interest in reinforcement learning. However, the nature of learning in such mechanisms is unconstrained, often resulting in the accumulation of several unusable, unsafe or misaligned skills. In order to avoid such issues and ensure the discovery of safe and human-aligned skills, it is necessary to incorporate humans into the unsupervised training process, which remains a largely unexplored research area. In this work, we propose Controlled Diversity with Preference (CDP), a novel, collaborative human-guided mechanism for an agent to learn a set of skills that is diverse as well as desirable. The key principle is to restrict the discovery of skills to those regions deemed desirable according to a preference model trained using human preference labels on trajectory pairs. We evaluate our approach on 2D navigation and MuJoCo environments and demonstrate the ability to discover diverse, yet desirable skills.
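Preference models fit on trajectory pairs are commonly trained with a Bradley-Terry likelihood; a minimal sketch of that loss (CDP's exact model is not specified in the abstract, so the names below are ours):

```python
import numpy as np

def bradley_terry_loss(score_a: float, score_b: float, pref_a: float) -> float:
    """Cross-entropy loss for a preference model on one trajectory pair.

    score_a, score_b: predicted desirability of trajectories A and B;
    pref_a: human label, 1.0 if A was preferred, 0.0 if B was preferred.
    """
    p_a = 1.0 / (1.0 + np.exp(score_b - score_a))  # P(A preferred | scores)
    eps = 1e-12                                    # guard against log(0)
    return float(-(pref_a * np.log(p_a + eps)
                   + (1.0 - pref_a) * np.log(1.0 - p_a + eps)))

# When the model already ranks the preferred trajectory much higher,
# the loss is small; with equal scores it is log 2.
loss_confident = bradley_terry_loss(score_a=2.0, score_b=-2.0, pref_a=1.0)
loss_uninformed = bradley_terry_loss(score_a=0.0, score_b=0.0, pref_a=1.0)
```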