Non-decreasing Quantile Function Network with Efficient Exploration for Distributional Reinforcement Learning
Zhou, Fan, Zhu, Zhoufan, Kuang, Qi, Zhang, Liwen
arXiv.org Artificial Intelligence
Although distributional reinforcement learning (DRL) has been widely examined in the past few years, two open questions are still being addressed. One is how to ensure the validity of the learned quantile function; the other is how to efficiently utilize the distribution information. This paper attempts to provide some new perspectives to encourage future in-depth studies in these two fields. We first propose a non-decreasing quantile function network (NDQFN) to guarantee the monotonicity of the obtained quantile estimates, and then design a general exploration framework called distributional prediction error (DPE) for DRL.

The theoretical validity of QR-DQN [Dabney et al., 2018b], IQN [Dabney et al., 2018a] and FQF [Yang et al., 2019] heavily depends on the prerequisite that the approximated quantile curve is non-decreasing. Unfortunately, since no global constraint is imposed when simultaneously estimating the quantile values at multiple locations, monotonicity cannot be ensured by their network designs. At the early training stage, the crossing issue is even more severe given the limited training samples. Some existing studies try to solve this problem [Zhou et al., 2020; Tang Nguyen et al., 2020]. However, their main architecture is built on a set of fixed quantile locations and is not applicable to quantile-value-based algorithms such as IQN and FQF.
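The quantile-crossing issue arises because each quantile value is predicted independently, with no constraint tying neighboring estimates together. The sketch below is not the paper's NDQFN architecture; it is a generic illustration, with hypothetical names (`monotone_quantiles`, `base`, `raw_increments`), of one common way to build a non-decreasing curve: predict a base value plus unconstrained increments, pass the increments through a positive transform, and accumulate them so the estimates cannot cross by construction.

```python
import math

def softplus(x):
    # log(1 + exp(x)): smooth map from the reals to strictly positive values
    return math.log1p(math.exp(x))

def monotone_quantiles(base, raw_increments):
    """Accumulate strictly positive increments on top of a base value so the
    resulting sequence of quantile estimates is non-decreasing regardless of
    the (unconstrained) raw network outputs."""
    values = [base]
    for r in raw_increments:
        values.append(values[-1] + softplus(r))
    return values

# Independent per-location outputs can cross (e.g. [0.3, -0.1, 0.5] is not
# monotone), whereas the cumulative parameterization never can:
curve = monotone_quantiles(0.3, [-0.1, 0.5])
```

In an actual network, `base` and `raw_increments` would be heads of the same model, and the positive transform (softplus here) is a design choice; any strictly positive mapping yields the same monotonicity guarantee.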
May-14-2021