Reinforcement Learning Methods for Continuous-Time Markov Decision Problems

Neural Information Processing Systems 

Semi-Markov Decision Problems are continuous time generaliza(cid:173) tions of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approxima(cid:173) tion. Among these are TD(,x), Q-Iearning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose al(cid:173) gorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal con(cid:173) trol for a simple queueing system.