 Reinforcement Learning







A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms

Neural Information Processing Systems

However, its application to Q-learning has been limited due to the presence of the max-operator, which makes the associated ODE model a complex nonlinear system. In contrast, the associated ODE of TD learning for policy evaluation is a linear system, whose asymptotic stability is much easier to analyze in general.
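The contrast between the two update rules can be sketched numerically. The example below is a minimal illustration, not the paper's analysis: the 2-state, 2-action MDP, its numbers, and the helper names (`td_operator`, `q_operator`, `affine_gap`) are all hypothetical. It checks whether each Bellman operator commutes with averaging two inputs. The policy-evaluation operator does, because it is affine; the Q-learning operator does not, because of the max.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
GAMMA = 0.9
# P[s][a][t] = probability of moving from state s to state t under action a
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
# R[s][a] = expected immediate reward
R = [[1.0, 0.0],
     [0.0, 2.0]]
# Fixed policy to evaluate: PI[s][a] = probability of action a in state s
PI = [[0.5, 0.5],
      [0.5, 0.5]]


def td_operator(v):
    """Bellman expectation operator for PI on state values v[s] (affine in v)."""
    return [
        sum(
            PI[s][a] * (R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(2)))
            for a in range(2)
        )
        for s in range(2)
    ]


def q_operator(q):
    """Bellman optimality operator on Q-values stored flat as q[2*s + a]."""
    return [
        R[s][a]
        + GAMMA * sum(P[s][a][t] * max(q[2 * t], q[2 * t + 1]) for t in range(2))
        for s in range(2)
        for a in range(2)
    ]


def mix(x, y, w):
    """Element-wise convex combination w*x + (1-w)*y."""
    return [w * a + (1 - w) * b for a, b in zip(x, y)]


def affine_gap(op, x, y, w=0.5):
    """Largest entry of |op(mix(x, y)) - mix(op(x), op(y))|; zero for affine maps."""
    lhs = op(mix(x, y, w))
    rhs = mix(op(x), op(y), w)
    return max(abs(l - r) for l, r in zip(lhs, rhs))


td_gap = affine_gap(td_operator, [1.0, 0.0], [0.0, 1.0])
q_gap = affine_gap(q_operator, [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0])
print("TD (policy evaluation) gap:", td_gap)
print("Q-learning (max) gap:", q_gap)
```

In this toy setup the policy-evaluation gap vanishes up to floating-point error, while the Q-learning gap is strictly positive. This is only a check of linearity on one example; it says nothing by itself about the stability of the associated ODE.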



… $\sqrt{|S|^{3}|A|K}\,)$, 0; OptPess-PrimalDual: $O(H$ …

Neural Information Processing Systems

We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process.