The Mean-Squared Error of Double Q-Learning

Neural Information Processing Systems 

Our result builds upon an analysis of linear stochastic approximation based on Lyapunov equations. It applies both to the tabular setting and to the setting with linear function approximation, provided that the optimal policy is unique and the algorithms converge.
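The Lyapunov-equation approach mentioned above can be illustrated on a generic linear stochastic approximation recursion. The following is a minimal sketch, not the paper's derivation for double Q-learning. It assumes the recursion θ_{n+1} = θ_n + (g/n)(A θ_n + b + W_{n+1}), with martingale-difference noise W of covariance Γ, and uses the standard result that, under suitable conditions, √n(θ_n − θ*) is asymptotically Gaussian with covariance Σ solving (gA + I/2)Σ + Σ(gA + I/2)ᵀ + g²Γ = 0. Under further standard conditions, the mean-squared error E‖θ_n − θ*‖² then behaves like trace(Σ)/n. The symbols A, Γ, g and the function name asymptotic_covariance are hypothetical labels chosen for this illustration.

```python
import numpy as np


def asymptotic_covariance(A, Gamma, g=1.0):
    """Solve the Lyapunov equation
        (g*A + I/2) Sigma + Sigma (g*A + I/2)^T + g^2 * Gamma = 0
    for Sigma, the asymptotic covariance of sqrt(n) * (theta_n - theta_star).

    Assumes (hypothetically) the linear recursion
        theta_{n+1} = theta_n + (g/n) * (A theta_n + b + noise),
    where the noise has covariance Gamma.
    """
    A = np.asarray(A, dtype=float)
    Gamma = np.asarray(Gamma, dtype=float)
    d = A.shape[0]
    I = np.eye(d)
    M = g * A + 0.5 * I

    # A unique solution exists only if M is Hurwitz
    # (every eigenvalue has a strictly negative real part).
    if np.any(np.linalg.eigvals(M).real >= 0):
        raise ValueError("g*A + I/2 must be Hurwitz")

    # Vectorize M Sigma + Sigma M^T = -g^2 Gamma into one linear system:
    # (I kron M + M kron I) vec(Sigma) = -g^2 vec(Gamma).
    K = np.kron(I, M) + np.kron(M, I)
    sigma = np.linalg.solve(K, -(g ** 2) * Gamma.reshape(-1))
    Sigma = sigma.reshape(d, d)

    # Symmetrize to remove floating-point round-off.
    return 0.5 * (Sigma + Sigma.T)
```

As a sanity check, the scalar case A = -1, Γ = 1 is the running sample mean of unit-variance noise: with g = 1 the solver returns Σ = 1 (the sample mean has variance 1/n), and with g = 2 it returns the classic g²/(2g − 1) = 4/3. If A = -0.25 and g = 1, then gA + I/2 is not Hurwitz and the function raises ValueError, reflecting that the 1/n rate is not attained in that regime.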
