Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes
A controlled diffusion process (CDP) can always be discretized in state space and time and thus reduced to a Markov Decision Problem. Algorithms such as Q-learning and RTDP as described in [1] can then be applied to produce controls or optimal value functions for a fixed discretization. Problems arise when the discretization needs to be refined, or when multi-grid information needs to be extracted to accelerate the algorithm. The relation of the time discretization parameter to the state space discretization parameter is crucial in both cases. Therefore a mathematical model of the discretized process is introduced, which reflects the properties of the converged empirical process.
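As an illustration of the kind of scheme the abstract refers to, the sketch below runs tabular Q-learning on a grid discretization of a hypothetical 1-D controlled diffusion. The drift b(x,u), the running cost, the control set, and the specific values of the grid spacing h and time step k are assumptions made for the example, not the paper's choices; the sketch only makes concrete how both h and k enter the discretized process.

```python
import numpy as np

# Hypothetical 1-D controlled diffusion dX = b(X,u) dt + sigma dW,
# discretized on a uniform grid with spacing h and time step k.
# All model and parameter choices here are illustrative assumptions.
h, k, sigma = 0.1, 0.005, 0.5
states = np.arange(-1.0, 1.0 + h, h)      # grid points in [-1, 1]
actions = np.array([-1.0, 0.0, 1.0])      # hypothetical control set
n_s, n_a = len(states), len(actions)

def step(i, u, rng):
    """One Euler-Maruyama step, snapped to the nearest grid point."""
    x = states[i]
    drift = u * x                          # assumed drift b(x,u) = u*x
    x_next = x + drift * k + sigma * np.sqrt(k) * rng.standard_normal()
    j = int(np.clip(round((x_next - states[0]) / h), 0, n_s - 1))
    reward = -x * x * k                    # assumed running cost, scaled by k
    return j, reward

rng = np.random.default_rng(0)
Q = np.zeros((n_s, n_a))
alpha = 0.1
gamma = np.exp(-0.1 * k)                   # per-step discount tied to k
for episode in range(200):
    i = rng.integers(n_s)
    for _ in range(400):
        # epsilon-greedy action selection
        a = rng.integers(n_a) if rng.random() < 0.1 else int(Q[i].argmax())
        j, r = step(i, actions[a], rng)
        # standard tabular Q-learning update on the discretized MDP
        Q[i, a] += alpha * (r + gamma * Q[j].max() - Q[i, a])
        i = j
```

Note that refining the grid (smaller h) while keeping k fixed changes the transition statistics of the snapped process, which is one way to see why the relation between the two discretization parameters matters.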
Neural Information Processing Systems
Dec-31-1997