Reinforcement Learning
Offline Behavior Distillation
Inspired by dataset distillation (DD) [Wang et al., 2018, Zhao et al., (Corollary 1). Extensive experiments on nine datasets of D4RL benchmark [Fu et al., 2020] with multiple environments and data qualities illustrate that our Av-PBC remarkably promotes the OBD performance, Moreover, Av-PBC has a significant convergence speed and requires only a quarter of distillation steps compared to DBC and PBC.
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning
We demonstrate that plasticity loss is pervasive under domain shift in this regime, and that a number of methods developed to resolve it in other settings fail, sometimes even performing worse than applying no intervention at all. In contrast, we find that a class of "regenerative" methods are able to consistently mitigate plasticity loss in a variety of contexts, including in gridworld tasks and