A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms

Neural Information Processing Systems 

However, its application to Q-learning has been limited by the max operator, which makes the associated ODE model a complex nonlinear system. In contrast, the ODE associated with TD learning for policy evaluation is a linear system, and its asymptotic stability is, in general, much easier to analyze.
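The contrast can be illustrated with a minimal sketch. The setup is tabular and uses a hypothetical random MDP (3 states, 2 actions, gamma = 0.9). A diagonal matrix D of positive state-action weights stands in for the sampling distribution. With these assumptions, one common way to write the Q-learning ODE is dq/dt = D(R + gamma P Pi_{pi_q} q - q), where pi_q is greedy with respect to q. The ODE for TD evaluation of a fixed policy pi is dq/dt = D(R + gamma P Pi_pi q - q); here it is written on Q-values rather than state values for symmetry. In this form the max operator acts as a switching signal. The current greedy policy selects the system matrix A_pi = D(gamma P Pi_pi - I), so the Q-learning ODE is a piecewise-affine switching system, while the TD ODE keeps one fixed matrix. All names (`selector`, `mode_matrix`, `euler`, and the MDP itself) are illustrative assumptions, not the paper's code.

```python
import itertools
import numpy as np

# Hypothetical toy MDP (not from the paper); (s, a) pairs are flattened to s * n_a + a.
rng = np.random.default_rng(0)
n_s, n_a, gamma = 3, 2, 0.9
n = n_s * n_a
P = rng.random((n, n_s))
P /= P.sum(axis=1, keepdims=True)       # P[(s, a), s'] = transition probability
R = rng.random(n)                        # expected reward for each (s, a)
D = np.diag(rng.uniform(0.5, 1.0, n))    # assumed positive state-action weights
I = np.eye(n)

def selector(policy):
    """Pi_pi, shape (n_s, n): (Pi_pi @ q)[s] = q[s, policy[s]]."""
    Pi = np.zeros((n_s, n))
    Pi[np.arange(n_s), np.arange(n_s) * n_a + policy] = 1.0
    return Pi

def mode_matrix(policy):
    """System matrix A_pi = D (gamma P Pi_pi - I) of one linear mode."""
    return D @ (gamma * P @ selector(policy) - I)

def q_learning_field(q):
    # The greedy policy (the active mode) changes with q: the max operator
    # appears as a state-dependent switching signal.
    greedy = q.reshape(n_s, n_a).argmax(axis=1)
    return mode_matrix(greedy) @ q + D @ R

pi_eval = np.zeros(n_s, dtype=int)       # fixed policy to evaluate: always action 0
A_td = mode_matrix(pi_eval)

def td_field(q):
    # One fixed matrix: an affine, time-invariant linear ODE.
    return A_td @ q + D @ R

def euler(field, q0, dt=0.1, steps=10_000):
    """Forward-Euler integration of dq/dt = field(q)."""
    q = q0.copy()
    for _ in range(steps):
        q = q + dt * field(q)
    return q

q_ql = euler(q_learning_field, np.zeros(n))
q_td = euler(td_field, np.zeros(n))

# Reference equilibria: value iteration for Q*, a linear solve for Q^pi.
q_star = np.zeros(n)
for _ in range(1000):
    q_star = R + gamma * P @ q_star.reshape(n_s, n_a).max(axis=1)
q_pi = np.linalg.solve(I - gamma * P @ selector(pi_eval), R)

# Every individual mode matrix is Hurwitz (all eigenvalues in the open left half-plane).
all_modes_hurwitz = all(
    np.linalg.eigvals(mode_matrix(np.array(p))).real.max() < 0
    for p in itertools.product(range(n_a), repeat=n_s)
)

print("Q-learning ODE error vs Q*:", np.max(np.abs(q_ql - q_star)))
print("TD ODE error vs Q^pi:", np.max(np.abs(q_td - q_pi)))
print("all modes Hurwitz:", all_modes_hurwitz)
```

In this toy run each mode matrix is Hurwitz. For a switched system under state-dependent switching, stable modes alone do not in general guarantee stability. The convergence of the Q-learning ODE observed here is therefore a property of this example, not a general proof. The TD case needs only the single fixed matrix `A_td` to be Hurwitz.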
