The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Rowland, Mark; Tang, Yunhao; Lyle, Clare; Munos, Rémi; Bellemare, Marc G.; Dabney, Will
arXiv.org Artificial Intelligence
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task.
In this paper, however, we reach a surprising conclusion: Even in the tabular setting, there are many scenarios where quantile temporal-difference learning (QTD; Dabney et al., 2018b), a distributional RL algorithm which aims to learn ...
May-28-2023