Finite-Time Analysis of Simultaneous Double Q-learning

Hyunjun Na, Donghwan Lee

arXiv.org Artificial Intelligence 

Q-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, its update rule is prone to overestimation bias. To address this issue, double Q-learning employs two independent Q-estimators, one of which is randomly selected and updated at each step of the learning process. This paper proposes a modified double Q-learning algorithm, called simultaneous double Q-learning (SDQ), together with its finite-time analysis. SDQ eliminates the need for random selection between the two Q-estimators, and this modification allows us to analyze double Q-learning through the lens of a novel switching system framework, which facilitates efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double Q-learning while retaining the ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
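
The contrast between randomly selecting one estimator and updating both at once can be illustrated with a small tabular sketch. The abstract does not state the exact SDQ update rule, so the simultaneous variant below is an assumption: it applies the usual double Q-learning target (pick the greedy action with one table, evaluate it with the other) to both tables in the same step. The function names, the list-of-lists Q-table layout, and the omission of terminal-state handling are likewise illustrative choices, not taken from the paper.

```python
import random


def argmax(values):
    """Return the index of the largest entry (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_q_update(qa, qb, s, a, r, s_next, alpha, gamma, rng=random):
    """Standard double Q-learning step: update ONE randomly chosen table."""
    if rng.random() < 0.5:
        learner, evaluator = qa, qb
    else:
        learner, evaluator = qb, qa
    # The updated table picks the greedy action; the other table scores it.
    best = argmax(learner[s_next])
    target = r + gamma * evaluator[s_next][best]
    learner[s][a] += alpha * (target - learner[s][a])


def simultaneous_update(qa, qb, s, a, r, s_next, alpha, gamma):
    """Assumed simultaneous variant: update BOTH tables, no random choice.

    This is an illustrative guess at the idea, not the paper's stated rule.
    """
    best_a = argmax(qa[s_next])
    best_b = argmax(qb[s_next])
    # Compute both targets from the old values before writing either table.
    target_a = r + gamma * qb[s_next][best_a]
    target_b = r + gamma * qa[s_next][best_b]
    qa[s][a] += alpha * (target_a - qa[s][a])
    qb[s][a] += alpha * (target_b - qb[s][a])


# Usage: two states, two actions, one observed transition (s=0, a=1, r=1, s'=1).
qa = [[0.0, 0.0], [0.0, 0.0]]
qb = [[0.0, 0.0], [0.0, 0.0]]
simultaneous_update(qa, qb, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(qa[0][1], qb[0][1])
```

In this sketch every transition moves both tables, whereas the standard step leaves one table untouched each time, which is one intuitive reason a simultaneous scheme might use samples more efficiently.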
