Polynomial-Time Approximability of Constrained Reinforcement Learning
–arXiv.org Artificial Intelligence
Constrained Reinforcement Learning (CRL) is growing increasingly crucial for managing complex, real-world applications such as medicine [13, 29, 22], disaster relief [14, 38, 34], and resource management [25, 24, 31, 5]. Various constraints, including expectation [2], chance [39], almost-sure [9], and anytime constraints [28], were each proposed to address new challenges. Despite the richness of the literature, most works focus on stochastic, expectation-constrained policies, leaving many popular settings with longstanding open problems. Even chance constraints, arguably a close second in popularity, still lack any polynomial-time, even approximate, algorithms despite being introduced over a decade ago [39]. Other settings for which polynomial-time algorithms are open include deterministic policies under multiple expectation constraints, policies under nonhomogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes. Consequently, we study the computational complexity of general constrained problems to resolve many of these fundamental open questions. Formally, we study the solution of Constrained Markov Decision Processes (CMDPs). Here, we define a CMDP through three fundamental parts: (1) a MDP M that accumulates both rewards and costs, (2) a general cost criterion C, and (3) a budget vector B. Additionally, we allow the agent to specify whether they require their policy to be
arXiv.org Artificial Intelligence
Feb-11-2025
- Country:
- North America > United States
- Wisconsin > Dane County
- Madison (0.04)
- New York > New York County
- New York City (0.04)
- Wisconsin > Dane County
- North America > United States
- Genre:
- Research Report (0.50)
- Industry:
- Health & Medicine (0.46)