Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Dec-24-2025, 20:13:05 GMT–Neural Information Processing Systems

We study reinforcement learning for discounted Markov Decision Processes (MDPs) in the tabular setting. We propose a model-based algorithm, UCBVI-$\gamma$, which combines the optimism-in-the-face-of-uncertainty principle with a Bernstein-type exploration bonus.
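The abstract names two ingredients: optimism in the face of uncertainty and a Bernstein-type bonus. As a rough illustration only, the sketch below runs optimistic value iteration on an empirical tabular model, adding a Bernstein-style bonus (a variance-dependent term plus a lower-order 1/n term) to each backup. The function name `optimistic_value_iteration`, the known-reward assumption, the confidence parameter `delta`, and the bonus constants are hypothetical choices made for this sketch; they are not the bonus or algorithm from the paper.

```python
import math
from collections import defaultdict


def optimistic_value_iteration(transitions, rewards, n_states, n_actions,
                               gamma=0.9, delta=0.05, iters=200):
    """Hypothetical sketch: optimistic value iteration with a Bernstein-style bonus.

    transitions: list of observed (s, a, s_next) tuples.
    rewards[s][a]: reward in [0, 1], assumed known for simplicity.
    The bonus constants are illustrative, not those used by UCBVI-gamma.
    """
    v_max = 1.0 / (1.0 - gamma)        # upper bound on any discounted value
    log_term = math.log(1.0 / delta)   # confidence term (assumed form)

    # Empirical visit counts N(s, a) and N(s, a, s').
    n_sa = defaultdict(int)
    n_sas = defaultdict(int)
    for s, a, s_next in transitions:
        n_sa[(s, a)] += 1
        n_sas[(s, a, s_next)] += 1

    v = [v_max] * n_states  # optimistic initialization
    q = [[v_max] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                n = n_sa[(s, a)]
                if n == 0:
                    q[s][a] = v_max  # unvisited pairs stay maximally optimistic
                    continue
                p_hat = [n_sas[(s, a, t)] / n for t in range(n_states)]
                mean_v = sum(p * v[t] for t, p in enumerate(p_hat))
                var_v = sum(p * (v[t] - mean_v) ** 2 for t, p in enumerate(p_hat))
                # Bernstein-type bonus: variance term plus a lower-order 1/n term.
                bonus = math.sqrt(2.0 * var_v * log_term / n) + v_max * log_term / n
                q[s][a] = min(v_max, rewards[s][a] + gamma * mean_v + bonus)
        v = [max(q[s]) for s in range(n_states)]
    return q, v


# Usage: a 2-state chain where state 1 is absorbing with reward 1.
data = [(0, 0, 1)] * 50 + [(1, 0, 1)] * 50
q, v = optimistic_value_iteration(data, [[0.0], [1.0]], n_states=2, n_actions=1, gamma=0.5)
print(v)  # optimistic estimates; the true values are [1.0, 2.0]
```

In this sketch the bonus shrinks as visit counts grow, so the optimistic estimates approach the empirical Bellman backup, while never-visited state-action pairs keep the maximal value 1/(1-γ), which is what pushes the greedy policy to explore them.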

