AUnifiedSwitchingSystemPerspectiveand ConvergenceAnalysisofQ-LearningAlgorithms
–Neural Information Processing Systems
However, its application to Q-learning has been limited due to the presence of the max-operator, which makes the associated ODE model a complex nonlinear system. In contrast, the associated ODE of TD learning for policy evaluation is a linear system, whose asymptotic stability is much easier to analyze in general.
Neural Information Processing Systems
Feb-9-2026, 21:46:11 GMT
- Country:
- Technology: