Bag of Policies for Distributional Deep Exploration
Nachkov, Asen, Li, Luchen, Luise, Giulia, Valdettaro, Filippo, Faisal, Aldo
arXiv.org Artificial Intelligence
Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). Compared to previous Thompson sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. We develop here a general purpose approach, Bag of Policies (BoP), that can be built on top of any return distribution estimator by maintaining a population of its copies. BoP consists of an ensemble of multiple heads that are updated independently. During training, each episode is controlled by only one of the heads and the collected state-action pairs are used to update all heads off-policy, leading to distinct learning signals for each head which diversify ...

Distributional RL (DiRL) has rapidly established its place among reinforcement learning (RL) algorithms Bellemare et al. [2017] as a powerful improvement over non-distributional value-based counterparts Lyle et al. [2019]. In DiRL, the agent does not learn a single summary statistic of the return for each state-action pair, but instead learns the whole return distribution. The agent's behaviour is evaluated for multiple possible consequences, which in turn affect the policy update. While this does lead to more stable learning and better performance Lyle et al. [2019], it does not itself change the way actions are selected; as distributional extensions to value-based RL, in C51 Bellemare et al. [2017] and QR-DQN Dabney et al. [2018b] the agent still takes actions according to the mean of the estimated return distributions in each state-action pair.
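The two mechanisms described above (greedy action selection by the mean of an estimated return distribution, and one head of an ensemble controlling each episode) can be illustrated with a minimal tabular sketch. It is not the authors' implementation. The categorical representation over fixed atoms, the array shapes, the uniform choice of head, and the helper names expected_returns, greedy_action and pick_head are all assumptions made for illustration. The off-policy update of every head is left out, because the text above does not specify it.

```python
import numpy as np


def expected_returns(probs, atoms):
    """Mean of categorical return distributions: (..., n_atoms) -> (...)."""
    return probs @ atoms


def greedy_action(head_probs, atoms, state):
    """Pick the action whose return distribution has the highest mean.

    head_probs has shape (n_states, n_actions, n_atoms) and holds one
    head's estimate of the return distribution for each state-action pair.
    """
    return int(np.argmax(expected_returns(head_probs[state], atoms)))


def pick_head(rng, n_heads):
    """Choose which head controls the next episode.

    Uniform sampling is an assumption; the text only says that a single
    head controls each episode.
    """
    return int(rng.integers(n_heads))


# Hypothetical sizes: 3 heads, 2 states, 2 actions, 3 return atoms.
atoms = np.array([0.0, 1.0, 2.0])
rng = np.random.default_rng(0)
# Random categorical distributions, shape (heads, states, actions, atoms).
ensemble = rng.dirichlet(np.ones(len(atoms)), size=(3, 2, 2))

head = pick_head(rng, n_heads=ensemble.shape[0])        # fixed for the episode
action = greedy_action(ensemble[head], atoms, state=0)  # act by the mean
```

In a full training loop, the transitions gathered while this head acts would go into a shared buffer from which every head is updated, which is the part this sketch does not cover.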
Aug-3-2023
- Country:
  - North America > United States (0.04)
  - Europe
    - United Kingdom > England
      - Greater London > London (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
    - Germany > Bavaria
      - Upper Franconia > Bayreuth (0.04)
  - Asia > Middle East
    - Jordan (0.04)
- Genre:
  - Research Report (0.82)
- Technology: