Bag of Policies for Distributional Deep Exploration

Asen Nachkov, Luchen Li, Giulia Luise, Filippo Valdettaro, Aldo Faisal

Aug-3-2023–arXiv.org Artificial Intelligence

Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). Compared to previous Thompson sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. We develop here a general-purpose approach, Bag of Policies (BoP), that can be built on top of any return distribution estimator by maintaining a population of its copies. BoP consists of an ensemble of multiple heads that are updated independently. During training, each episode is controlled by only one of the heads, and the collected state-action pairs are used to update all heads off-policy, leading to distinct learning signals for each head which diversify …

Distributional RL (DiRL) has rapidly established its place among RL algorithms Bellemare et al. [2017] as a powerful improvement over non-distributional value-based counterparts Lyle et al. [2019]. In DiRL, the agent does not learn a single summary statistic of the return for each state-action pair, but instead learns the whole return distribution. The agent's behaviour is thus evaluated over multiple possible consequences, which in turn affect the policy update. While this does lead to more stable learning and better performance Lyle et al. [2019], it does not by itself change the way actions are selected: in distributional extensions of value-based RL such as C51 Bellemare et al. [2017] and QR-DQN Dabney et al. [2018b], the agent still selects actions according to the mean of the estimated return distribution for each state-action pair.
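
To make the point about action selection concrete, here is a minimal Python sketch (an illustration, not code from the paper) of mean-based greedy action selection over two common return-distribution parameterisations: QR-DQN-style equally weighted quantiles and C51-style categorical probabilities on fixed atoms. The function names are hypothetical.

```python
from statistics import fmean


def greedy_action_from_quantiles(quantiles_per_action):
    """Return the action whose quantile-based return distribution has the largest mean.

    quantiles_per_action: one list of N equally weighted quantile estimates
    per action (QR-DQN style). Ties resolve to the lowest action index.
    """
    means = [fmean(q) for q in quantiles_per_action]
    return max(range(len(means)), key=means.__getitem__)


def categorical_mean(probs, atoms):
    """Mean of a C51-style categorical distribution: sum_i p_i * z_i."""
    return sum(p * z for p, z in zip(probs, atoms))


# Action 1 has a much wider spread, but only its mean (2.0 vs 1.5) matters here.
print(greedy_action_from_quantiles([[0, 1, 2, 3], [-5, 0, 4, 9]]))  # 1
print(categorical_mean([0.25, 0.5, 0.25], [-1.0, 0.0, 1.0]))  # 0.0
```

Two actions with equal means but very different spreads are treated identically by this rule, which is why learning the full distribution does not, on its own, change how the agent explores.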

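The BoP training loop described in the abstract can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation: each head is a tabular quantile estimate of the return distribution trained with a quantile-regression TD update, the environment is a hypothetical toy chain, the controlling head is drawn uniformly per episode, and all names and hyperparameters are made up for the example. What it does follow from the text is the structure: several independently updated copies of a return distribution estimator, one head controlling each episode, and every head updated off-policy on the collected transitions.

```python
import random
from statistics import fmean

# Hypothetical toy settings (assumptions, not values from the paper).
N_STATES, N_ACTIONS, N_QUANTILES = 5, 2, 8
GAMMA, LR = 0.9, 0.1
# Quantile midpoints tau_i = (2i + 1) / (2N), as in quantile regression.
TAUS = [(2 * i + 1) / (2 * N_QUANTILES) for i in range(N_QUANTILES)]


class QuantileHead:
    """One member of the bag: a tabular quantile estimate of Z(s, a)."""

    def __init__(self, rng):
        # Random init gives each head a different starting belief.
        self.z = [[[rng.uniform(-1.0, 1.0) for _ in range(N_QUANTILES)]
                   for _ in range(N_ACTIONS)] for _ in range(N_STATES)]

    def greedy(self, s):
        # Act on the mean of the return distribution, as in C51 / QR-DQN.
        means = [fmean(self.z[s][a]) for a in range(N_ACTIONS)]
        return max(range(N_ACTIONS), key=means.__getitem__)

    def update(self, s, a, r, s_next, done):
        # Off-policy target: bootstrap from this head's own greedy action at s'.
        if done:
            targets = [r] * N_QUANTILES
        else:
            a_next = self.greedy(s_next)
            targets = [r + GAMMA * q for q in self.z[s_next][a_next]]
        for i, tau in enumerate(TAUS):
            theta = self.z[s][a][i]
            # Quantile-regression (pinball-loss) subgradient step.
            grad = sum(tau - (1.0 if t < theta else 0.0) for t in targets)
            self.z[s][a][i] = theta + LR * grad / len(targets)


def chain_step(s, a):
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def run_bag_of_policies(n_heads=4, n_episodes=50, max_steps=20, seed=0):
    rng = random.Random(seed)
    heads = [QuantileHead(rng) for _ in range(n_heads)]
    returns = []
    for _ in range(n_episodes):
        # One head (drawn uniformly here, an assumption) controls the episode.
        behaviour = rng.choice(heads)
        s, transitions, ep_return = 0, [], 0.0
        for _ in range(max_steps):
            a = behaviour.greedy(s)
            s_next, r, done = chain_step(s, a)
            transitions.append((s, a, r, s_next, done))
            ep_return += r
            s = s_next
            if done:
                break
        # Every head learns from the same data, off-policy, with its own targets.
        for head in heads:
            for tr in transitions:
                head.update(*tr)
        returns.append(ep_return)
    return heads, returns


heads, returns = run_bag_of_policies()
print(len(heads), len(returns))  # 4 50
```

Because each head bootstraps from its own greedy action and its own quantiles, the shared transitions produce a different target for each head, which is the per-head diversity of learning signals the abstract describes.
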
artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States (0.04)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Germany > Bavaria
    - Upper Franconia > Bayreuth (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
