Achieving $\tilde{O}(1/\epsilon)$ Sample Complexity for Constrained Markov Decision Process

Neural Information Processing Systems 

We consider the reinforcement learning problem for the constrained Markov decision process (CMDP), which plays a central role in satisfying safety or resource constraints in sequential learning and decision-making. In this problem, we are given finite resources and an MDP with unknown transition probabilities. At each stage, we take an action, collecting a reward and consuming some resources, all of which are unknown and must be learned over time. In this work, we take the first step towards deriving optimal problem-dependent guarantees for the CMDP problem. We derive a logarithmic regret bound, which translates into an $O\left(\frac{1}{\Delta\cdot\epsilon}\cdot\log^2(1/\epsilon)\right)$ sample complexity bound, with $\Delta$ being a problem-dependent parameter that is independent of $\epsilon$.
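
To see how a logarithmic regret bound translates into this sample complexity, here is a minimal sketch; the constant $C$ and the exact form of the regret bound below are illustrative assumptions, not statements of the paper's actual bound. Suppose the cumulative regret after $T$ episodes satisfies $\mathrm{Regret}(T) \le \frac{C}{\Delta}\log^2 T$ for some problem-dependent constant $C$. Requiring the average regret to fall below $\epsilon$ and solving for $T$ gives
\[
  \frac{\mathrm{Regret}(T)}{T} \;\le\; \frac{C}{\Delta}\cdot\frac{\log^2 T}{T} \;\le\; \epsilon
  \quad\Longleftarrow\quad
  T \;=\; O\!\left(\frac{1}{\Delta\cdot\epsilon}\,\log^2(1/\epsilon)\right)
  \;=\; \tilde{O}\!\left(\frac{1}{\Delta\cdot\epsilon}\right),
\]
since $\log^2 T = O(\log^2(1/\epsilon))$ up to $\Delta$-dependent factors when $\Delta$ is independent of $\epsilon$.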