Reviews: Planning in entropy-regularized Markov decision processes and games

Jan-23-2025, 15:44:33 GMT–Neural Information Processing Systems

This theoretical paper considers the problem of computing optimal value function in entropy-regularized MDPs and two-player games. It shows that the smoothness property of the Bellman operator in the presence of entropy regularized policies (and possibly other forms of regularization), can be used to derive a sample complexity which is polynomial of order O((1/ε) {4 c}), with c being a problem independent constant and ε the precision of the value function estimate. The proof is built upon the proposed algorithm, SmoothCruiser, an algorithm motivated in the sparse sampling algorithm of Kearns et al that recursively estimates V through samples and subsequently aggregates the results. This sampling dynamic programming is done up to a depth when the required number of samples is no longer polynomial. The paper is very well written and provides a solid result.

algorithm, entropy-regularized markov decision process, markov decision process and game, (5 more...)

Neural Information Processing Systems

Jan-23-2025, 15:44:33 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.38)
  - Machine Learning > Learning Graphical Models
    - Undirected Networks > Markov Models (0.40)