The Mean-Squared Error of Double Q-Learning
Neural Information Processing Systems
In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and that of Q-learning. Our result builds upon an analysis of linear stochastic approximation based on Lyapunov equations, and it applies both to the tabular setting and to the setting with linear function approximation, provided that the optimal policy is unique and the algorithms converge.
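For readers unfamiliar with the two algorithms being compared, the following is a minimal sketch of their standard tabular update rules. The abstract does not state these updates, so the function names, argument names, and the list-of-lists table layout are all illustrative assumptions, and the sketch does not reproduce the paper's step-size choices or its linear function approximation setting.

```python
import random


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """One tabular Q-learning step (illustrative names, not from the paper).

    A single table both selects and evaluates the greedy next action.
    """
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def double_q_update(QA, QB, s, a, r, s_next, alpha, gamma, rng=random):
    """One tabular Double Q-learning step (illustrative names, not from the paper).

    A fair coin picks which table to update; that table selects the greedy
    next action, and the other table supplies the value of that action.
    """
    if rng.random() < 0.5:
        update, evaluate = QA, QB
    else:
        update, evaluate = QB, QA
    row = update[s_next]
    best = max(range(len(row)), key=row.__getitem__)  # argmax over next actions
    target = r + gamma * evaluate[s_next][best]
    update[s][a] += alpha * (target - update[s][a])


if __name__ == "__main__":
    # Two states, two actions; the values are arbitrary toy numbers.
    Q = [[0.0, 0.0], [1.0, 3.0]]
    q_learning_update(Q, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
    print(Q[0][0])
```

The only structural difference is that Double Q-learning decouples action selection from action evaluation by keeping two estimators, each updated on roughly half of the samples.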
Oct-2-2025, 21:02:21 GMT