Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network

Meng, Wenjia, Zheng, Qian, Yang, Long, Li, Pengfei, Pan, Gang

arXiv.org Artificial Intelligence 

In this paper, we focus on policy discrepancy in return-based deep Q-network (R-DQN) learning. We propose a general framework for R-DQN, with which most return-based reinforcement learning algorithms can be combined with DQN. We show that the performance of traditional DQN can be significantly improved by introducing return-based reinforcement learning. To further improve the performance of R-DQN, we present a strategy with two measurements that qualitatively measure the policy discrepancy. Moreover, we give a bound for each of these two measurements under the R-DQN framework. Algorithms with our strategy can accurately express the trace coefficient and achieve a better approximation to the return. The experiments are carried out on several representative tasks from the OpenAI Gym library. Results show that the algorithms with our strategy outperform the state-of-the-art R-DQN methods.
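To make the role of the trace coefficient concrete, the sketch below computes a generic return-based target of the form Q(x_0, a_0) + sum_t gamma^t (c_1 ... c_t) delta_t, where delta_t is the one-step TD error. This is the general form shared by return-based off-policy methods, not the framework or measurements proposed in the paper, which the abstract does not specify. The function names, the argument layout, and the Retrace-style coefficient c_t = lambda * min(1, pi/mu) are assumptions chosen for illustration.

```python
import numpy as np

def retrace_coefs(pi_probs, mu_probs, lam=1.0):
    """One well-known coefficient choice (Retrace): c_t = lam * min(1, pi/mu).

    Illustrative only; the paper's own coefficients are not given in the abstract.
    """
    ratio = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return lam * np.minimum(1.0, ratio)

def return_based_target(q_sa, rewards, exp_q_next, trace_coefs, gamma=0.99):
    """Return-based target for the first state-action pair of a trajectory.

    q_sa[t]        : Q(x_t, a_t)
    rewards[t]     : r_t
    exp_q_next[t]  : expected Q at x_{t+1} under the target policy (0 if terminal)
    trace_coefs[t] : c_t; c_0 is never used because the first TD error has weight 1
    """
    q_sa = np.asarray(q_sa, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    exp_q_next = np.asarray(exp_q_next, dtype=float)
    trace_coefs = np.asarray(trace_coefs, dtype=float)

    # One-step TD errors along the trajectory.
    td = rewards + gamma * exp_q_next - q_sa

    target = q_sa[0]
    weight = 1.0
    for t in range(len(td)):
        if t > 0:
            # Accumulate gamma^t * c_1 * ... * c_t.
            weight *= gamma * trace_coefs[t]
        target += weight * td[t]
    return target

# Hypothetical three-step trajectory.
q_sa = [0.5, 0.2, 0.1]
rewards = [1.0, 1.0, 1.0]
exp_q_next = [0.2, 0.1, 0.0]

one_step = return_based_target(q_sa, rewards, exp_q_next, np.zeros(3), gamma=0.9)
full_return = return_based_target(q_sa, rewards, exp_q_next, np.ones(3), gamma=0.9)
```

With all coefficients set to zero the target collapses to the usual one-step DQN target, and with all coefficients set to one (and matching policies) it telescopes to the discounted return. Intermediate coefficients trade off between the two, which is why the policy discrepancy matters when choosing them.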
