Planning in entropy-regularized Markov decision processes and games
Neural Information Processing Systems
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order \tilde{\mathcal{O}}(1/\epsilon^4) for a desired accuracy \epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
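To illustrate the smoothness the abstract refers to, here is a minimal sketch that assumes the common log-sum-exp form of entropy regularization with temperature `lam`, and deterministic transitions for brevity. The function names `soft_max` and `soft_bellman_backup` are hypothetical and are not taken from the paper; this is not an implementation of SmoothCruiser itself.

```python
import math


def soft_max(values, lam):
    """Smooth replacement for max: lam * log(sum(exp(v / lam))).

    Assumes lam > 0. Subtracting the largest value first keeps the
    exponentials from overflowing.
    """
    m = max(values)
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def soft_bellman_backup(rewards, next_values, gamma, lam):
    """One regularized backup at a single state.

    rewards[a] and next_values[a] are the reward and successor value for
    action a (deterministic transitions assumed for illustration).
    """
    q_values = [r + gamma * v for r, v in zip(rewards, next_values)]
    return soft_max(q_values, lam)


if __name__ == "__main__":
    q = [0.5, 2.0, 1.0]
    for lam in (1.0, 0.1, 0.001):
        print(lam, soft_max(q, lam))
```

Unlike the hard `max`, this operator is differentiable in its inputs, and it lies between `max(values)` and `max(values) + lam * log(len(values))`, so it approaches the unregularized backup as `lam` shrinks.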
entropy-regularized Markov decision process, Markov decision process and game, sample complexity
Oct-10-2024, 00:24:04 GMT