Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes

Pareigis, Stephan

Neural Information Processing Systems 

The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton Jacobi-Bellman (HJB-) type. Numerical analysis provides multigrid methodsfor this kind of equation. In the case of Learning Control, however,the systems of equations on the various grid-levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward thetype of time and space discretization during the observation. Analgorithm for multi-grid observation is proposed. The multi-grid algorithm is demonstrated on a simple queuing problem. 1 Introduction Controlled Diffusion Processes (CDP) are the analogy to Markov Decision Problems in continuous state space and continuous time.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found