The Mean-Squared Error of Double Q-Learning
Neural Information Processing Systems
Our result builds upon an analysis of linear stochastic approximation based on Lyapunov equations and applies to both the tabular setting and the setting with linear function approximation, provided that the optimal policy is unique and the algorithms converge.
Feb-8-2026, 08:45:52 GMT