Qualitative Measurements of Policy Discrepancy for Return-based Deep Q-Network

Meng, Wenjia, Zheng, Qian, Yang, Long, Li, Pengfei, Pan, Gang

Jun-14-2018–arXiv.org Artificial Intelligence

In this paper, we focus on policy discrepancy in return-based deep Q-network (R-DQN) learning. We propose a general framework for R-DQN with which most return-based reinforcement learning algorithms can be combined with DQN. We show that the performance of traditional DQN can be significantly improved by introducing return-based reinforcement learning. To further improve the performance of R-DQN, we present a strategy with two measurements that qualitatively measure the policy discrepancy. Moreover, we give two bounds for these two measurements under the R-DQN framework. Algorithms with our strategy can accurately express the trace coefficient and achieve a better approximation to the return. Experiments are carried out on several representative tasks from the OpenAI Gym library. Results show that algorithms with our strategy outperform state-of-the-art R-DQN methods.
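To make the role of the trace coefficient concrete, the following is a minimal sketch of how a return-based target might be computed over a sampled trajectory segment. It is not the paper's strategy or its two measurements: the function names, argument layout, and the Retrace(λ)-style coefficient used as an example are assumptions chosen for illustration only.

```python
import numpy as np


def retrace_coefficients(pi_probs, mu_probs, lam=1.0):
    """Example trace coefficient in the style of Retrace(lambda) (an assumption,
    not the paper's strategy): c_t = lam * min(1, pi(a_t|x_t) / mu(a_t|x_t))."""
    ratio = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return lam * np.minimum(1.0, ratio)


def return_based_targets(rewards, q_taken, v_next, trace_coeffs, gamma=0.99):
    """Backward recursion for a generic return-based target.

    rewards[t]      : reward r_t
    q_taken[t]      : Q(x_t, a_t) for the action actually taken
    v_next[t]       : expected Q under the target policy at x_{t+1}
                      (use 0 where x_{t+1} is terminal)
    trace_coeffs[t] : trace coefficient c_t in [0, 1]; c_0 is never used

    Recursion: G_t = r_t + gamma * v_next[t]
                     + gamma * c_{t+1} * (G_{t+1} - Q(x_{t+1}, a_{t+1})),
    with the correction term set to zero past the end of the segment.
    """
    num_steps = len(rewards)
    targets = np.zeros(num_steps)
    correction = 0.0  # holds c_{t+1} * (G_{t+1} - Q(x_{t+1}, a_{t+1}))
    for t in reversed(range(num_steps)):
        targets[t] = rewards[t] + gamma * v_next[t] + gamma * correction
        correction = trace_coeffs[t] * (targets[t] - q_taken[t])
    return targets
```

With all trace coefficients set to zero the recursion reduces to the usual one-step target, while larger coefficients let more of the sampled return flow into the target. How large they should be depends on how far the behavior policy is from the target policy, which is the policy discrepancy the abstract refers to.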

artificial intelligence, machine learning, reinforcement learning

Genre:
- Research Report (0.69)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)
