 Reinforcement Learning


Offline Behavior Distillation

Neural Information Processing Systems

Inspired by dataset distillation (DD) [Wang et al., 2018, Zhao et al., ... (Corollary 1). Extensive experiments on nine datasets of the D4RL benchmark [Fu et al., 2020], covering multiple environments and data qualities, show that our Av-PBC markedly improves OBD performance. Moreover, Av-PBC converges quickly, requiring only a quarter of the distillation steps needed by DBC and PBC.



A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning

Neural Information Processing Systems

We demonstrate that plasticity loss is pervasive under domain shift in this regime, and that a number of methods developed to resolve it in other settings fail, sometimes even performing worse than applying no intervention at all. In contrast, we find that a class of "regenerative" methods is able to consistently mitigate plasticity loss in a variety of contexts, including in gridworld tasks and ...


Exclusively Penalized Q-learning for Offline Reinforcement Learning

Neural Information Processing Systems

Reinforcement learning (RL) is gaining significant attention for solving complex Markov decision process (MDP) tasks.







Skill-aware Mutual Information Optimisation for Generalisation in Reinforcement Learning

Neural Information Processing Systems

Reinforcement Learning (RL) agents often learn policies that do not generalise across tasks in which the environmental features and optimal skills are different [des Combes et al., 2018, Garcin et al., 2024].