Deterministic Policies for Constrained Reinforcement Learning in Polynomial-Time
arXiv.org Artificial Intelligence
Constrained Reinforcement Learning (CRL) traditionally produces stochastic, expectation-constrained policies that can behave undesirably: imagine a self-driving car that randomly changes lanes or runs out of fuel. Yet artificial decision-making systems must be predictable, trustworthy, and robust. One way to ensure these qualities is to focus on deterministic policies, which are inherently predictable and trustworthy. Deterministic policies are also easy to implement [10], reliable for autonomous vehicles [16, 12], and effective for multi-agent coordination [23]. Similarly, almost sure and anytime constraints [21] provide inherent trustworthiness and robustness, which are essential for applications in medicine [6, 22, 18], disaster relief [9, 29, 27], and resource management [20, 19, 24, 4]. Despite these advantages, computing deterministic policies under such stricter constraints remains an open challenge in CRL. Our research aims to address this challenge by studying the computational complexity of computing deterministic policies for a wide range of constraint types.

Consider a constrained Markov Decision Process (cMDP) denoted by M. Let C represent an arbitrary cost criterion and B the available budget.
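To make the difference between the constraint types mentioned above concrete, here is a minimal brute-force sketch in Python. Everything in it is hypothetical and for illustration only: the `ToyCMDP` encoding, the `cost_profile` and `satisfies` helpers, and the fuel-themed example (costs are net fuel used, refueling has negative cost, and the budget B plays the role of tank capacity). It follows the standard definitions as we understand them: an expectation constraint bounds the expected total cost, an almost sure constraint bounds the total cost on every trajectory with positive probability, and an anytime constraint bounds the accumulated cost at every time step of every such trajectory. Enumerating trajectories is exponential in the horizon, so this is not the paper's polynomial-time method; it only evaluates one fixed deterministic policy.

```python
from dataclasses import dataclass


@dataclass
class ToyCMDP:
    """Hypothetical finite-horizon cMDP encoding, used only for illustration.

    transitions[s][a] -> list of (probability, next_state) pairs
    costs[s][a]       -> immediate cost of taking action a in state s
    """
    transitions: dict
    costs: dict
    horizon: int
    start: str


def cost_profile(m, policy):
    """Evaluate a deterministic, time-dependent policy (policy[t][s] -> action).

    Returns (expected total cost, worst total cost, worst running cost),
    where the worst cases range over trajectories with positive probability.
    """
    expected = 0.0
    worst_total = float("-inf")
    worst_running = float("-inf")

    def walk(t, s, prob, total, running_max):
        nonlocal expected, worst_total, worst_running
        if t == m.horizon:
            expected += prob * total
            worst_total = max(worst_total, total)
            worst_running = max(worst_running, running_max)
            return
        a = policy[t][s]  # deterministic: exactly one action per (t, s)
        total += m.costs[s][a]
        running_max = max(running_max, total)
        for p, s_next in m.transitions[s][a]:
            if p > 0:  # zero-probability branches cannot violate a.s. constraints
                walk(t + 1, s_next, prob * p, total, running_max)

    # The empty prefix has accumulated cost 0, so the running max starts at 0.
    walk(0, m.start, 1.0, 0.0, 0.0)
    return expected, worst_total, worst_running


def satisfies(m, policy, budget, kind):
    """Check one constraint type: 'expectation', 'almost_sure', or 'anytime'."""
    expected, worst_total, worst_running = cost_profile(m, policy)
    value = {"expectation": expected,
             "almost_sure": worst_total,
             "anytime": worst_running}[kind]
    return value <= budget


# Hypothetical fuel example: cost = net fuel used, refueling has negative cost.
car = ToyCMDP(
    transitions={
        "city":    {"drive": [(0.5, "highway"), (0.5, "detour")]},
        "highway": {"drive": [(1.0, "end")]},
        "detour":  {"drive": [(1.0, "station")]},
        "station": {"refuel": [(1.0, "end")]},
        "end":     {"stay": [(1.0, "end")]},
    },
    costs={
        "city": {"drive": 2}, "highway": {"drive": 1}, "detour": {"drive": 4},
        "station": {"refuel": -5}, "end": {"stay": 0},
    },
    horizon=3,
    start="city",
)
rule = {"city": "drive", "highway": "drive", "detour": "drive",
        "station": "refuel", "end": "stay"}
policy = [rule] * car.horizon  # same rule at every step, still deterministic

print(cost_profile(car, policy))  # prints (2.0, 3.0, 6.0)
for kind in ("expectation", "almost_sure", "anytime"):
    print(kind, satisfies(car, policy, budget=4, kind=kind))
```

With a budget of 4, this policy satisfies the expectation and almost sure constraints but violates the anytime constraint: on the detour branch the car uses 6 units of fuel before reaching the station, even though its final tally is only 1. Note that the policy is deterministic while the transitions remain stochastic, which is why the worst case is taken over trajectories.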
May-23-2024