Reviews: Constrained Reinforcement Learning Has Zero Duality Gap

Jan-26-2025, 21:16:33 GMT–Neural Information Processing Systems

The paper studies a form of constrained reinforcement learning in which the constraints are bounds on the value functions of auxiliary rewards. This formulation is more expressive than the common approach of defining the reward as a linear combination of multiple objectives. The authors show that, under certain conditions, the constrained optimization problem has zero duality gap, implying that a solution can be found by solving the dual optimization problem, which is convex. The authors also extend this analysis to the case in which the policy is parameterized. Theorem 1 assumes that Slater's condition holds, which is problematic for two reasons. Slater's condition is usually defined for convex constraints, but the authors specifically state that the constraints in PI are non-convex.
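
For readers unfamiliar with the setting, the problem discussed in the review can be sketched roughly as follows. The notation is assumed here for illustration and may not match the paper's: $V_0$ is the value function of the main reward, $V_1, \dots, V_m$ are those of the auxiliary rewards, $c_i$ are the constraint thresholds, $\gamma$ is a discount factor, and $\lambda$ is the vector of dual variables.

```latex
\begin{align}
  % Value of policy \pi under reward r_i (assumed discounted form)
  V_i(\pi) &= \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t) \right],
    \qquad i = 0, 1, \dots, m \\
  % Primal: maximize the main value subject to bounds on the auxiliary values
  P^\star &= \max_{\pi} \; V_0(\pi)
    \quad \text{s.t.} \quad V_i(\pi) \ge c_i, \quad i = 1, \dots, m \\
  % Lagrangian and dual problem
  L(\pi, \lambda) &= V_0(\pi) + \sum_{i=1}^{m} \lambda_i \bigl( V_i(\pi) - c_i \bigr) \\
  D^\star &= \min_{\lambda \ge 0} \; d(\lambda),
    \qquad d(\lambda) = \max_{\pi} \; L(\pi, \lambda)
\end{align}
```

In this sketch, weak duality gives $D^\star \ge P^\star$ in general, and a zero duality gap is the statement $D^\star = P^\star$. The dual function $d(\lambda)$ is a pointwise maximum of functions that are affine in $\lambda$, so it is convex even when the primal problem is not convex in $\pi$. In this notation, Slater's condition would ask for a strictly feasible policy, that is, some $\pi$ with $V_i(\pi) > c_i$ for every $i$.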
