Polynomial-Time Approximability of Constrained Reinforcement Learning

Feb-11-2025–arXiv.org Artificial Intelligence

Constrained Reinforcement Learning (CRL) is increasingly crucial for managing complex, real-world applications such as medicine [13, 29, 22], disaster relief [14, 38, 34], and resource management [25, 24, 31, 5]. Various constraints, including expectation [2], chance [39], almost-sure [9], and anytime constraints [28], were each proposed to address new challenges. Despite the richness of the literature, most works focus on stochastic, expectation-constrained policies, leaving longstanding open problems in many popular settings. Even chance constraints, arguably a close second in popularity, still lack any polynomial-time algorithm, exact or approximate, despite being introduced over a decade ago [39]. Other settings for which polynomial-time algorithms remain open include deterministic policies under multiple expectation constraints, policies under nonhomogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes. Consequently, we study the computational complexity of general constrained problems to resolve many of these fundamental open questions. Formally, we study the solution of Constrained Markov Decision Processes (CMDPs). Here, we define a CMDP through three fundamental parts: (1) an MDP M that accumulates both rewards and costs, (2) a general cost criterion C, and (3) a budget vector B. Additionally, we allow the agent to specify whether they require their policy to be
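As a rough illustration of the three-part definition above, the following Python sketch encodes a small tabular, finite-horizon CMDP and checks one particular cost criterion, the expectation constraint, for a deterministic policy. It is a sketch under stated assumptions, not the paper's formalism or algorithm: the class `CMDP`, its field names, the finite-horizon tabular encoding, and the helpers `expected_totals` and `satisfies_expectation_constraint` are all hypothetical names introduced here, and the general criterion C is replaced by the expectation case only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = int
Action = int
Policy = Callable[[int, State], Action]  # deterministic: (time, state) -> action


@dataclass
class CMDP:
    """Hypothetical tabular, finite-horizon encoding of (M, C, B).

    M is given by horizon, start, transitions, reward, and cost.
    B is the budget vector. The criterion C is not stored here; this
    sketch hard-codes the expectation criterion in the checker below.
    """
    horizon: int
    start: State
    # transitions[(s, a)] is a list of (next_state, probability) pairs
    transitions: Dict[Tuple[State, Action], List[Tuple[State, float]]]
    reward: Dict[Tuple[State, Action], float]
    cost: Dict[Tuple[State, Action], Tuple[float, ...]]  # one entry per constraint
    budget: Tuple[float, ...]


def expected_totals(m: CMDP, policy: Policy) -> Tuple[float, Tuple[float, ...]]:
    """Expected cumulative reward and cost vector, via forward propagation
    of the state distribution over the horizon."""
    dist: Dict[State, float] = {m.start: 1.0}
    total_reward = 0.0
    total_cost = [0.0] * len(m.budget)
    for t in range(m.horizon):
        next_dist: Dict[State, float] = {}
        for s, p in dist.items():
            a = policy(t, s)
            total_reward += p * m.reward[(s, a)]
            for i, c in enumerate(m.cost[(s, a)]):
                total_cost[i] += p * c
            for s2, q in m.transitions[(s, a)]:
                next_dist[s2] = next_dist.get(s2, 0.0) + p * q
        dist = next_dist
    return total_reward, tuple(total_cost)


def satisfies_expectation_constraint(m: CMDP, policy: Policy) -> bool:
    """True if every expected cumulative cost is within its budget entry."""
    _, costs = expected_totals(m, policy)
    return all(c <= b for c, b in zip(costs, m.budget))


# Toy instance (made up for illustration): one state, two actions.
# Action 1 earns reward 1 at cost 1; action 0 earns nothing and is free.
toy = CMDP(
    horizon=2,
    start=0,
    transitions={(0, 0): [(0, 1.0)], (0, 1): [(0, 1.0)]},
    reward={(0, 0): 0.0, (0, 1): 1.0},
    cost={(0, 0): (0.0,), (0, 1): (1.0,)},
    budget=(1.0,),
)


def greedy(t: int, s: State) -> Action:
    return 1  # always take the costly action


def cautious(t: int, s: State) -> Action:
    return 1 if t == 0 else 0  # spend the budget once, then stop
```

In this toy instance, `greedy` accumulates an expected cost of 2 against a budget of 1 and is therefore infeasible, while `cautious` meets the budget exactly. Other criteria named in the abstract (chance, almost-sure, anytime) would need a different checker, since they depend on the distribution of the cumulative cost rather than only on its mean.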



