The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

Rowland, Mark, Tang, Yunhao, Lyle, Clare, Munos, Rémi, Bellemare, Marc G., Dabney, Will

arXiv.org Artificial Intelligence 

We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task.

In this paper, however, we reach a surprising conclusion: even in the tabular setting, there are many scenarios where quantile temporal-difference learning (QTD; Dabney et al., 2018b), a distributional RL algorithm which aims to learn ...
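For readers unfamiliar with QTD, the following is a minimal sketch of one tabular QTD update used for value estimation. It is an illustration under stated assumptions, not the authors' implementation: the function names `qtd_update` and `value_estimate`, the array layout of `theta`, and the default step size and discount factor are all hypothetical choices made here. The sketch assumes m quantile estimates per state at the midpoint levels (2i - 1) / (2m), and takes the mean of those estimates as the value estimate.

```python
import numpy as np


def qtd_update(theta, x, r, x_next, alpha=0.1, gamma=0.9):
    """Apply one tabular QTD update at state x, modifying theta in place.

    theta  : float array of shape (num_states, m), quantile estimates per state
    x      : index of the current state
    r      : observed reward
    x_next : index of the sampled next state (assumed non-terminal here)
    """
    m = theta.shape[1]
    # Quantile levels at the midpoints of m equal-width bins: (2i - 1) / (2m).
    taus = (2 * np.arange(m) + 1) / (2 * m)
    # One bootstrapped target per next-state quantile estimate.
    targets = r + gamma * theta[x_next]
    # below[i, j] = 1 if target j falls strictly below the current estimate i.
    below = (targets[None, :] < theta[x][:, None]).astype(float)
    # Quantile-regression step, averaged over the m targets.
    theta[x] += alpha * (taus - below.mean(axis=1))
    return theta


def value_estimate(theta, x):
    """Estimate the value of state x as the mean of its quantile estimates."""
    return theta[x].mean()


# Usage: two states, three quantiles each, one observed transition 0 -> 1.
theta = np.zeros((2, 3))
qtd_update(theta, x=0, r=1.0, x_next=1)
print(theta[0], value_estimate(theta, 0))
```

Each coordinate of the update is bounded in magnitude by `alpha`, regardless of how large the reward is. This is one way in which QTD differs from a classical TD update, whose step scales with the size of the TD error.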
