Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
Neural Information Processing Systems
We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) in the tabular setting. We propose a model-based algorithm named UCBVI-γ, which is based on the optimism-in-the-face-of-uncertainty principle and a Bernstein-type bonus.
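To make the idea of optimism with a Bernstein-type bonus concrete, here is a minimal Python sketch of optimistic value iteration on an empirical tabular model. It is not the UCBVI-γ algorithm as specified in the paper: the function names (`bernstein_bonus`, `optimistic_q`), the constants inside the bonus, the logarithmic factor, and the assumption that rewards are known and lie in [0, 1] are all illustrative choices made for this example.

```python
import math


def bernstein_bonus(probs, values, n, gamma, log_term):
    """Illustrative Bernstein-type bonus (constants are assumptions).

    Combines a term driven by the empirical variance of the next-state
    value with a lower-order term that decays as 1/n.
    """
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return math.sqrt(2.0 * var * log_term / n) + log_term / ((1.0 - gamma) * n)


def optimistic_q(counts, rewards, gamma, delta=0.1, iters=200):
    """Optimistic Q-values from transition counts.

    counts[s][a][s2] is how often (s, a) led to s2;
    rewards[s][a] is a known reward, assumed to lie in [0, 1].
    """
    n_states, n_actions = len(counts), len(counts[0])
    v_max = 1.0 / (1.0 - gamma)  # largest possible discounted return
    log_term = math.log(max(2.0, n_states * n_actions / delta))
    q = [[v_max] * n_actions for _ in range(n_states)]  # optimistic start
    for _ in range(iters):
        v = [max(row) for row in q]
        new_q = []
        for s in range(n_states):
            row = []
            for a in range(n_actions):
                n = sum(counts[s][a])
                if n == 0:
                    row.append(v_max)  # unvisited pair stays maximally optimistic
                    continue
                probs = [c / n for c in counts[s][a]]
                backup = rewards[s][a] + gamma * sum(p * x for p, x in zip(probs, v))
                bonus = bernstein_bonus(probs, v, n, gamma, log_term)
                row.append(min(v_max, backup + bonus))  # clip at v_max
            new_q.append(row)
        q = new_q
    return q
```

An agent following the optimism principle would act greedily with respect to the returned values, so rarely visited state-action pairs, whose bonus is large, are tried more often.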