Statistical guarantees for continuous-time policy evaluation: blessing of ellipticity and new tradeoffs
arXiv.org Artificial Intelligence
Similar to the Markov decision process (MDP) framework in discrete time, continuous-time controlled diffusion processes provide a natural framework for modeling continuous-time decision-making problems. Given discrete-time observations of the continuous-time dynamics, the continuous-time reinforcement learning (RL) problem can be viewed as a discrete-time MDP, which allows standard techniques to be applied. In particular, model-free RL algorithms offer flexibility in function approximation. By fitting the value function and/or the control policy with powerful statistical learning models, including neural networks, one can efficiently learn optimal decisions in high-dimensional and complex environments. Despite this empirical success, the theoretical understanding of continuous-time RL algorithms is still in its infancy. In particular, when value learning algorithms are applied to continuous-time diffusion processes, their statistical guarantees are largely unknown. This theoretical gap also leads to practical limitations, because the fundamental tradeoffs among the choice of function approximation, the discretization step size, and the trajectory length remain elusive. In this work, we aim to bridge this gap by providing sharp statistical guarantees for value function estimation in continuous-time diffusion processes.
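To make the discretization idea concrete, the following is a minimal sketch, not the paper's method. It assumes a toy uncontrolled Ornstein-Uhlenbeck diffusion dX = -theta X dt + sigma dW, a running reward r(x) = x^2, and a discount rate beta; all of these, along with the function names, the feature map (1, x^2), and the parameter values, are illustrative assumptions. Observing the path at step size h yields a discrete-time Markov chain with discount factor exp(-beta h) and per-step reward h r(x), to which plain least-squares temporal difference (LSTD) can be applied. For this toy model the continuous-time value function is known in closed form, V(x) = x^2 / (beta + 2 theta) + sigma^2 / (beta (beta + 2 theta)), so the fitted weights can be compared against it.

```python
import math
import random


def simulate_ou(theta, sigma, h, n_steps, seed=0, x0=0.0):
    """Euler-Maruyama samples of dX = -theta*X dt + sigma dW on a grid of step h."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - theta * x * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0))
    return xs


def lstd_quadratic(xs, h, beta):
    """LSTD on the discretized chain with features (1, x^2) and reward x^2.

    Solves A w = b, where A = sum phi (phi - gamma * phi_next)^T and
    b = sum phi * (h * reward), with gamma = exp(-beta * h).
    """
    gamma = math.exp(-beta * h)
    a11 = a12 = a21 = a22 = b1 = b2 = 0.0
    for x, x_next in zip(xs[:-1], xs[1:]):
        p2, q2 = x * x, x_next * x_next        # x^2 feature at current / next state
        d1, d2 = 1.0 - gamma, p2 - gamma * q2  # phi - gamma * phi_next
        r = h * p2                             # reward accumulated over one step
        a11 += d1
        a12 += d2
        a21 += p2 * d1
        a22 += p2 * d2
        b1 += r
        b2 += p2 * r
    det = a11 * a22 - a12 * a21
    w0 = (b1 * a22 - a12 * b2) / det  # weight on the constant feature
    w1 = (a11 * b2 - a21 * b1) / det  # weight on the x^2 feature
    return w0, w1


# Hypothetical parameter values chosen only for illustration.
theta, sigma, beta, h = 1.0, 1.0, 1.0, 0.01
xs = simulate_ou(theta, sigma, h, n_steps=200_000)
w0, w1 = lstd_quadratic(xs, h, beta)

# Closed-form continuous-time value function coefficients for this toy model.
w0_exact = sigma**2 / (beta * (beta + 2.0 * theta))
w1_exact = 1.0 / (beta + 2.0 * theta)
print("estimated:", (w0, w1), "closed form:", (w0_exact, w1_exact))
```

Even in this sketch the three quantities named above interact: a smaller h reduces the discretization bias, a longer trajectory (n_steps times h) reduces the statistical error, and the feature map determines whether the true value function can be represented at all.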
Feb-6-2025