On the Estimation Bias in Double Q-Learning
Neural Information Processing Systems
One phenomenon of interest is that Q-learning (Watkins, 1989) is known to suffer from overestimation, since it applies a maximum operator over a set of estimated action-values.
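The effect of the maximum operator can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes every action has the same true value and that each estimate is the true value plus zero-mean Gaussian noise. The function names, the noise level, and the number of trials are all illustrative choices. Even though each individual estimate is unbiased, the average of the maximum over the estimates comes out above the true maximum.

```python
import random
import statistics


def max_of_noisy_estimates(true_values, noise_std, rng):
    """Max over unbiased, noisy estimates of each action-value (hypothetical setup)."""
    estimates = [q + rng.gauss(0.0, noise_std) for q in true_values]
    return max(estimates)


def average_max_estimate(true_values, noise_std=1.0, trials=20000, seed=0):
    """Monte Carlo average of the max-over-estimates target."""
    rng = random.Random(seed)
    return statistics.fmean(
        max_of_noisy_estimates(true_values, noise_std, rng) for _ in range(trials)
    )


# Assumption: five actions that are all equally good, so the true maximum is 0.
true_values = [0.0] * 5
bias = average_max_estimate(true_values) - max(true_values)
print(f"estimated bias of the max operator: {bias:.3f}")
```

With the noise removed (`noise_std=0.0`) the bias vanishes, which suggests that the overestimation in this toy setting comes from taking the maximum over noisy estimates rather than from the estimates themselves.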