Planning in entropy-regularized Markov decision processes and games

Grill, Jean-Bastien, Domingues, Omar Darwiche, Menard, Pierre, Munos, Remi, Valko, Michal

Mar-19-2020, 01:46:13 GMT – Neural Information Processing Systems

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case. Papers published at the Neural Information Processing Systems Conference.
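To illustrate what "smoothness of the Bellman operator promoted by the regularization" can mean, here is a minimal Python sketch. It is not SmoothCruiser itself. It assumes the common log-sum-exp form of an entropy-regularized backup, with a hypothetical temperature parameter `lam` and made-up action values `q`. The function names `soft_max_value` and `soft_policy` are illustrative only and do not come from the paper.

```python
import math


def soft_max_value(q_values, lam):
    """Assumed entropy-regularized backup: lam * log(sum(exp(q / lam))).

    This replaces the hard max over actions with a smooth function
    that approaches max(q_values) as lam goes to 0.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    m = max(q_values)
    # Subtract the max before exponentiating for numerical stability.
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def soft_policy(q_values, lam):
    """Softmax policy, which is the gradient of soft_max_value in q_values."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical action values for a single state (illustration only).
q = [1.0, 0.5, -0.2]
for lam in (1.0, 0.1, 0.01):
    v = soft_max_value(q, lam)
    pi = soft_policy(q, lam)
    print(f"lam={lam}: soft value={v:.4f}, policy={[round(p, 4) for p in pi]}")
```

Under this assumed form, the regularized value always lies between `max(q)` and `max(q) + lam * log(K)` for `K` actions, and its gradient (the softmax policy) changes continuously with the action values, unlike the gradient of a hard max.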

entropy-regularized markov decision process, markov decision process and game, sample complexity

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)