unsupervised reinforcement learning
Unsupervised Reinforcement Learning with Contrastive Intrinsic Control
We introduce Contrastive Intrinsic Control (CIC), an unsupervised reinforcement learning (RL) algorithm that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skill vectors to learn behaviour embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioural diversity. We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pre-training phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB.
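To make the two ingredients of the abstract concrete, here is a minimal Python sketch of an InfoNCE-style contrastive term between transition embeddings and skill vectors, plus a particle-based entropy bonus used as intrinsic reward. The random linear "encoders", temperature, and k-nearest-neighbour estimator are illustrative assumptions, not the authors' exact implementation.

# Sketch of a CIC-style contrastive objective and entropy-based intrinsic reward.
# All shapes and the simple linear encoders are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
batch, obs_dim, skill_dim, embed_dim = 64, 24, 16, 32

W_tau = rng.normal(size=(2 * obs_dim, embed_dim))   # stand-in for transition encoder g(s, s')
W_z = rng.normal(size=(skill_dim, embed_dim))       # stand-in for skill encoder q(z)

states = rng.normal(size=(batch, obs_dim))
next_states = rng.normal(size=(batch, obs_dim))
skills = rng.normal(size=(batch, skill_dim))        # latent skill vectors z

def l2_normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

tau = l2_normalize(np.concatenate([states, next_states], axis=-1) @ W_tau)
z = l2_normalize(skills @ W_z)

# InfoNCE-style lower bound on I(transition; skill): each transition should be
# most similar to the skill that generated it, relative to other skills in the batch.
temperature = 0.5
logits = tau @ z.T / temperature                     # (batch, batch) similarity matrix
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
labels = np.arange(batch)
nce_loss = -log_probs[labels, labels].mean()

# Particle-based entropy bonus over transition embeddings, used as intrinsic reward:
# distance to the k-th nearest neighbour in the batch (larger = more novel behaviour).
k = 5
dists = np.linalg.norm(tau[:, None, :] - tau[None, :, :], axis=-1)
knn_dist = np.sort(dists, axis=1)[:, k]              # index 0 is the point itself
intrinsic_reward = np.log(1.0 + knn_dist)

print(f"InfoNCE loss: {nce_loss:.3f}, mean intrinsic reward: {intrinsic_reward.mean():.3f}")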
A Mixture Of Surprises for Unsupervised Reinforcement Learning
Unsupervised reinforcement learning aims at learning a generalist policy in a reward-free manner for fast adaptation to downstream tasks. Most of the existing methods propose to provide an intrinsic reward based on surprise. Maximizing or minimizing surprise drives the agent to either explore or gain control over its environment. However, both strategies rely on a strong assumption: the entropy of the environment's dynamics is either high or low. This assumption may not always hold in real-world scenarios, where the entropy of the environment's dynamics may be unknown. Hence, choosing between the two objectives is a dilemma. We propose a novel yet simple mixture of policies to address this concern, allowing us to optimize an objective that simultaneously maximizes and minimizes the surprise. Concretely, we train one mixture component whose objective is to maximize the surprise and another whose objective is to minimize the surprise. Hence, our method does not make assumptions about the entropy of the environment's dynamics.
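The core mechanism is simple enough to sketch: two mixture components share one surprise estimate, and the sign of the intrinsic reward depends on which component is acting. In the sketch below, "surprise" is approximated by the prediction error of a learned dynamics model and components are sampled uniformly per episode; both are simplifying assumptions rather than the paper's exact estimator.

# Illustrative sketch of a mixture-of-surprises intrinsic reward: one policy
# component maximizes surprise, the other minimizes it (assumed mechanism).
import numpy as np

rng = np.random.default_rng(1)
obs_dim, act_dim = 8, 2

# Linear dynamics "model" standing in for a learned predictor f(s, a) -> s'.
W_model = rng.normal(scale=0.1, size=(obs_dim + act_dim, obs_dim))

def surprise(state, action, next_state):
    """Prediction error of the dynamics model, used here as a surprise proxy."""
    pred = np.concatenate([state, action]) @ W_model
    return float(np.sum((pred - next_state) ** 2))

def intrinsic_reward(state, action, next_state, component):
    """Component 0 explores (maximize surprise); component 1 seeks control (minimize it)."""
    s = surprise(state, action, next_state)
    return s if component == 0 else -s

# Sample which mixture component acts, e.g. once per episode.
component = rng.integers(2)
state, action, next_state = rng.normal(size=obs_dim), rng.normal(size=act_dim), rng.normal(size=obs_dim)
print(component, intrinsic_reward(state, action, next_state, component))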
POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning
Schubert, Frederik, Benjamins, Carolin, Döhler, Sebastian, Rosenhahn, Bodo, Lindauer, Marius
The goal of Unsupervised Reinforcement Learning (URL) is to find a reward-agnostic prior policy on a task domain, such that the sample-efficiency on supervised downstream tasks is improved. Although agents initialized with such a prior policy can achieve a significantly higher reward with fewer samples when finetuned on the downstream task, it is still an open question how an optimal pretrained prior policy can be achieved in practice. In this work, we present POLTER (Policy Trajectory Ensemble Regularization) - a general method to regularize the pretraining that can be applied to any URL algorithm and is especially useful on data- and knowledge-based URL algorithms. It utilizes an ensemble of policies that are discovered during pretraining and moves the policy of the URL algorithm closer to its optimal prior. Our method is based on a theoretical framework, and we analyze its practical effects on a white-box benchmark, allowing us to study POLTER with full control. In our main experiments, we evaluate POLTER on the Unsupervised Reinforcement Learning Benchmark (URLB), which consists of 12 tasks in 3 domains. We demonstrate the generality of our approach by improving the performance of a diverse set of data- and knowledge-based URL algorithms by 19% on average and up to 40% in the best case. Under a fair comparison with tuned baselines and tuned POLTER, we establish a new state-of-the-art for model-free methods on the URLB.
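As a rough illustration of the regularization idea, the sketch below keeps an ensemble of policy snapshots collected during pretraining and adds a penalty pulling the current policy's actions toward the ensemble's average behaviour. The mean-squared-error penalty, uniform snapshot weighting, and coefficient beta are assumptions for this sketch, not the paper's exact loss.

# Sketch of a POLTER-style regularizer toward an ensemble of pretraining snapshots.
import numpy as np

rng = np.random.default_rng(2)
obs_dim, act_dim, n_snapshots = 10, 4, 5

def make_policy():
    W = rng.normal(scale=0.3, size=(obs_dim, act_dim))
    return lambda s: np.tanh(s @ W)

snapshots = [make_policy() for _ in range(n_snapshots)]   # policies discovered during pretraining
current_policy = make_policy()                             # policy trained by the URL algorithm

def polter_penalty(states, beta=0.1):
    """Distance between current actions and the ensemble prior's actions (assumed form)."""
    ensemble_actions = np.mean([pi(states) for pi in snapshots], axis=0)
    current_actions = current_policy(states)
    return beta * float(np.mean((current_actions - ensemble_actions) ** 2))

states = rng.normal(size=(32, obs_dim))
print(f"regularization term added to the URL loss: {polter_penalty(states):.4f}")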
Augmenting Unsupervised Reinforcement Learning with Self-Reference
Zhao, Andrew, Zhu, Erle, Lu, Rui, Lin, Matthieu, Liu, Yong-Jin, Huang, Gao
Humans possess the ability to draw on past experiences explicitly when learning new tasks and applying them accordingly. We believe this capacity for self-referencing is especially advantageous for reinforcement learning agents in the unsupervised pretrain-then-finetune setting. During pretraining, an agent's past experiences can be explicitly utilized to mitigate the nonstationarity of intrinsic rewards. In the finetuning phase, referencing historical trajectories prevents the unlearning of valuable exploratory behaviors. Motivated by these benefits, we propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information and enhance agent performance within the pretrain-finetune paradigm. Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark for model-free methods, recording an 86% IQM and a 16% Optimality Gap. Additionally, it improves current algorithms by up to 17% IQM and reduces the Optimality Gap by 31%. Beyond performance enhancement, the Self-Reference add-on also increases sample efficiency, a crucial attribute for real-world applications.
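One plausible reading of the add-on is a retrieval module: query a buffer of past experiences with the current observation and feed a summary of the retrieved references to the policy alongside the observation. The nearest-neighbour query and mean aggregation below are hypothetical choices for illustration, not the authors' exact architecture.

# Hypothetical sketch of retrieving past experiences and augmenting the observation.
import numpy as np

rng = np.random.default_rng(3)
obs_dim, buffer_size, k = 6, 500, 8

history = rng.normal(size=(buffer_size, obs_dim))   # states gathered earlier in training

def retrieve(query, k):
    """Return the k historical states closest to the current observation."""
    dists = np.linalg.norm(history - query, axis=1)
    idx = np.argsort(dists)[:k]
    return history[idx]

def augment_observation(obs):
    """Policy input = current observation plus a summary of retrieved references."""
    references = retrieve(obs, k)
    return np.concatenate([obs, references.mean(axis=0)])

obs = rng.normal(size=obs_dim)
print(augment_observation(obs).shape)   # (2 * obs_dim,)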
All You Need to Know About Unsupervised Reinforcement Learning
Unsupervised learning can be viewed as learning from large amounts of unannotated data, while reinforcement learning learns from very limited data. Combining the two yields unsupervised reinforcement learning, which can be seen as an improvement over standard reinforcement learning. In this article, we discuss unsupervised reinforcement learning in detail, along with its distinctive features and application areas.
A summary of the keynotes at AAMAS
A virtual edition of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) was held on 9-13 May. Videos of the talks are now available for public viewing, and you can also see the sessions from the various workshops. Keynote speaker Alison is interested in how cities work and builds spatial agent-based models (ABMs) to study how people move around and how behaviour plays out in space and time. There are a number of challenges with these kinds of models, and they need to be really robust if they are to be adopted by policy makers. So, why should we be interested in modelling cities?