Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Neural Information Processing Systems
We study reinforcement learning for discounted Markov Decision Processes (MDPs) in the tabular setting. We propose a model-based algorithm named UCBVI-$\gamma$, which is based on the principle of \emph{optimism in the face of uncertainty} and a Bernstein-type bonus.
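To make the idea of an optimistic, model-based update with a Bernstein-type bonus concrete, the following is a minimal sketch and not the UCBVI-$\gamma$ algorithm itself. It assumes rewards in [0, 1], a table of visit counts, and a generic empirical-Bernstein form for the bonus; the function names `bernstein_bonus` and `optimistic_values`, the confidence parameter `delta`, and all constants are illustrative assumptions rather than choices taken from the paper.

```python
import math


def bernstein_bonus(variance, n, gamma, delta=0.05):
    """Generic Bernstein-style bonus (illustrative constants, not the paper's).

    Combines a variance-dependent term of order sqrt(variance / n) with a
    range-dependent term of order 1 / n. Unvisited pairs get the largest
    possible value, 1 / (1 - gamma).
    """
    value_range = 1.0 / (1.0 - gamma)
    if n == 0:
        return value_range
    log_term = math.log(1.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + value_range * log_term / (3.0 * n)


def optimistic_values(counts, rewards, gamma, sweeps=200, delta=0.05):
    """Optimistic value iteration on the empirical model.

    counts[s][a][s2] is the number of observed transitions s --a--> s2,
    rewards[s][a] is a reward in [0, 1]. Returns optimistic state values.
    """
    num_states = len(counts)
    num_actions = len(counts[0])
    v_max = 1.0 / (1.0 - gamma)
    values = [v_max] * num_states  # optimistic initialization

    for _ in range(sweeps):
        new_values = []
        for s in range(num_states):
            q_values = []
            for a in range(num_actions):
                n = sum(counts[s][a])
                if n == 0:
                    # No data: stay fully optimistic.
                    q_values.append(v_max)
                    continue
                p_hat = [c / n for c in counts[s][a]]
                mean = sum(p * v for p, v in zip(p_hat, values))
                var = sum(p * (v - mean) ** 2 for p, v in zip(p_hat, values))
                bonus = bernstein_bonus(var, n, gamma, delta)
                q = rewards[s][a] + gamma * mean + gamma * bonus
                q_values.append(min(q, v_max))  # clip to the feasible range
            new_values.append(max(q_values))
        values = new_values
    return values
```

In this sketch the bonus shrinks as a state-action pair is visited more often, so rarely tried actions keep inflated value estimates and remain attractive to a greedy policy; that is the exploration mechanism the optimism principle refers to.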
Dec-24-2025, 20:13:05 GMT