Non-decreasing Quantile Function Network with Efficient Exploration for Distributional Reinforcement Learning

Fan Zhou, Zhoufan Zhu, Qi Kuang, Liwen Zhang

arXiv.org Artificial Intelligence 

Although distributional reinforcement learning (DRL) has been widely examined in the past few years, there are two open questions researchers are still trying to address. One is how to ensure the validity of the learned quantile function; the other is how to efficiently utilize the distribution information. This paper attempts to provide some new perspectives to encourage future in-depth studies in these two fields. We first propose a non-decreasing quantile function network (NDQFN) to guarantee the monotonicity of the obtained quantile estimates, and then design a general exploration framework for DRL called distributional prediction error (DPE).

The theoretical validity of QR-DQN [Dabney et al., 2018b], IQN [Dabney et al., 2018a] and FQF [Yang et al., 2019] depends heavily on the prerequisite that the approximated quantile curve is non-decreasing. Unfortunately, since no global constraint is imposed when simultaneously estimating the quantile values at multiple locations, monotonicity cannot be ensured by their network designs. At the early training stage, this crossing issue is even more severe given limited training samples. Some existing studies try to solve this problem [Zhou et al., 2020; Tang Nguyen et al., 2020]. However, their main architecture is built on a set of fixed quantile locations and is not applicable to quantile-value-based algorithms such as IQN and FQF.
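To make the crossing issue concrete: the idea behind a non-decreasing parameterization can be sketched by mapping unconstrained network outputs through a positive function and cumulatively summing them, so monotonicity holds by construction. The snippet below is an illustrative NumPy sketch of this general technique, not the paper's actual NDQFN architecture; the function names are hypothetical.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), always positive.
    return np.logaddexp(0.0, x)

def monotone_quantiles(base, deltas):
    """Build a non-decreasing sequence of quantile estimates.

    `base` is an unconstrained output for the lowest quantile location;
    `deltas` are unconstrained increments. Passing each increment through
    softplus makes it strictly positive, so the cumulative sum is
    non-decreasing regardless of the underlying network weights.
    """
    increments = softplus(np.asarray(deltas, dtype=float))
    return base + np.concatenate(([0.0], np.cumsum(increments)))

# Raw head outputs that would "cross" if used directly as quantile values.
raw = np.array([0.5, -1.2, 0.3, -0.7])
q = monotone_quantiles(raw[0], raw[1:])
assert np.all(np.diff(q) > 0)  # monotonicity holds by construction
```

Because the constraint is built into the parameterization rather than added as a penalty, the quantile curve stays valid even early in training, when limited samples would otherwise make crossing most severe.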
