Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time
Neural Information Processing Systems
We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criterion. A TSR criterion requires the cost of a policy to be computable recursively over both time and (state) space, a class that includes classical expectation, almost-sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research, establishing that polynomial-time approximability is possible for (1) anytime-constrained policies, (2) almost-sure-constrained policies, and (3) deterministic expectation-constrained policies.
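The abstract only names the three ingredients, so the sketch below is a rough, hypothetical illustration of the general flavor of budget-augmented dynamic programming, not the paper's actual construction: each state of a toy anytime-constrained MDP is augmented with its remaining cost budget, budgets are rounded to a coarse grid (loosely echoing time-space rounding), and backward induction runs over the augmented space. All dynamics, costs, and the grid size are invented for illustration, and the toy uses deterministic transitions, whereas the paper's value-demand augmentation and rounding handle the general stochastic case with approximation guarantees.

```python
# Hypothetical sketch: backward dynamic programming over a state space
# augmented with the remaining cost budget, for a toy anytime-constrained
# MDP (cumulative cost may never exceed BUDGET at any step). This is an
# illustration of the general idea only, not the paper's algorithm.
import itertools

STATES = [0, 1]
ACTIONS = [0, 1]
HORIZON = 3
BUDGET = 2.0
GRID = 0.5  # rounding granularity for budgets (invented for this toy)

def reward(s, a):
    return 1.0 if (s, a) == (1, 1) else 0.5 * a

def cost(s, a):
    return 1.0 if a == 1 else 0.0

def step(s, a):
    return (s + a) % 2  # deterministic toy dynamics

def snap(b):
    return GRID * int(b / GRID)  # round a budget down onto the grid

def solve():
    budgets = [GRID * k for k in range(int(BUDGET / GRID) + 1)]
    # V[t][(s, b)]: best reward-to-go from state s at time t with budget b left.
    V = {HORIZON: {(s, b): 0.0 for s, b in itertools.product(STATES, budgets)}}
    policy = {}
    for t in reversed(range(HORIZON)):
        V[t] = {}
        for s, b in itertools.product(STATES, budgets):
            best, best_a = float("-inf"), None
            for a in ACTIONS:
                c = cost(s, a)
                if c > b:
                    continue  # action would violate the anytime budget
                # Pass the leftover budget (the "demand" on the future) along.
                v = reward(s, a) + V[t + 1][(step(s, a), snap(b - c))]
                if v > best:
                    best, best_a = v, a
            V[t][(s, b)] = best
            policy[(t, s, b)] = best_a  # deterministic choice per augmented state
    return V, policy

V, policy = solve()
print("optimal value with full budget:", V[0][(0, snap(BUDGET))])
```

Because the augmented DP selects a single best action per (time, state, budget) triple, the resulting policy is deterministic over the augmented space; the grid keeps the number of budget levels, and hence the table size, polynomial in the rounding granularity.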
May-31-2025, 22:42:00 GMT