

Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes

S. Pareigis. Neural Information Processing Systems

Reinforcement learning methods for discrete and semi-Markov decision problems such as Real-Time Dynamic Programming can be generalized for Controlled Diffusion Processes. The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB-) type. Numerical analysis provides multi-grid methods for this kind of equation. In the case of Learning Control, however, the systems of equations on the various grid-levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward the type of time and space discretization during the observation.
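For orientation, a generic form of the boundary value problem mentioned above can be written as follows. This is a sketch in assumed notation, not the paper's own formulation: the symbols $b$, $\sigma$, $c$, $g$, $U$ and $\Omega$ are illustrative, and the paper may use a different cost structure (for example, discounting).

```latex
% Assumed notation (not taken from the source):
%   x in a bounded domain \Omega, control u in a set U,
%   drift b, diffusion \sigma, running cost c, boundary cost g.
\begin{align}
  dX_t &= b(X_t, u_t)\,dt + \sigma(X_t, u_t)\,dW_t, \\
  0 &= \min_{u \in U}\Big[\, c(x,u) + b(x,u)\cdot\nabla V(x)
       + \tfrac{1}{2}\operatorname{tr}\!\big(\sigma(x,u)\sigma(x,u)^{\top}\,\nabla^{2}V(x)\big) \Big],
       \quad x \in \Omega, \\
  V(x) &= g(x), \quad x \in \partial\Omega .
\end{align}
```

The equation is second-order through the Hessian term and fully nonlinear when the minimization over $u$ also acts on the diffusion coefficient.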



A CDP can always be discretized in state space and time and thus reduced to a Markov Decision Problem. Algorithms like Q-learning and RTDP as described in [1] can then be applied to produce controls or optimal value functions for a fixed discretization. Problems arise when the discretization needs to be refined, or when multi-grid information needs to be extracted to accelerate the algorithm. The relation of time to state space discretization parameters is crucial in both cases. Therefore, a mathematical model of the discretized process is introduced, which reflects the properties of the converged empirical process.
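To illustrate why the time step has to be tied to the grid spacing, here is a minimal sketch that uses the standard upwind Markov chain approximation from the numerical stochastic control literature. It is not the observation-based scheme of the paper. The test problem (minimum expected exit time from the interval [0, 1], controls u in {-1, +1} acting as drift, constant noise level `sigma`) and all function names are assumptions made for this example.

```python
def chain_step(h, drift, sigma):
    """Transition probabilities and time step for grid spacing h.

    Chosen so that the chain is locally consistent with
    dx = drift dt + sigma dW: mean step = drift * dt, and the
    second moment of a step is sigma^2 * dt + O(h * dt).
    Requires sigma > 0.
    """
    q = sigma * sigma + h * abs(drift)
    p_up = (0.5 * sigma * sigma + h * max(drift, 0.0)) / q
    p_down = (0.5 * sigma * sigma + h * max(-drift, 0.0)) / q
    dt = h * h / q  # the time step is dictated by the space step
    return p_up, p_down, dt


def solve_exit_time(n, sigma=0.5, controls=(-1.0, 1.0), init=None, tol=1e-10):
    """Gauss-Seidel value iteration on a grid with n intervals on [0, 1]."""
    h = 1.0 / n
    v = list(init) if init is not None else [0.0] * (n + 1)
    v[0] = v[n] = 0.0  # zero cost once the boundary is reached
    steps = [chain_step(h, u, sigma) for u in controls]
    for sweep in range(1, 200001):
        change = 0.0
        for i in range(1, n):
            # running cost 1 per unit time, so one step costs dt
            best = min(dt + pu * v[i + 1] + pd * v[i - 1] for pu, pd, dt in steps)
            change = max(change, abs(best - v[i]))
            v[i] = best
        if change < tol:
            break
    return v, sweep


def prolong(v):
    """Linear interpolation of a coarse-grid value function to the next finer grid."""
    fine = []
    for a, b in zip(v, v[1:]):
        fine += [a, 0.5 * (a + b)]
    return fine + [v[-1]]


# Solve on a coarse grid, then use the result as the initial guess on a finer grid.
v_coarse, sweeps_coarse = solve_exit_time(16)
v_fine, sweeps_fine = solve_exit_time(32, init=prolong(v_coarse))
```

Because `dt` is computed from `h` inside `chain_step`, halving the grid spacing changes the time step as well; fixing `dt` independently of `h` would, in general, break the consistency between the two grid levels.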





The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB-) type. Numerical analysis provides multi-grid methods for this kind of equation. In the case of Learning Control, however, the systems of equations on the various grid-levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward the type of time and space discretization during the observation. An algorithm for multi-grid observation is proposed.