Reviews: Constrained Reinforcement Learning Has Zero Duality Gap
–Neural Information Processing Systems
The paper studies a form of constrained reinforcement learning in which the constraints are bounds on the value functions for auxiliary rewards. This allows a more expressive formulation than the common approach of defining the reward as a linear combination of multiple objectives. The authors show that under certain conditions, the constraint optimization problem has zero duality gap, implying that a solution can be found by solving the dual optimization problem, which is convex. The authors also extend this analysis to the case for which the policy is parameterized. Theorem 1 assumes that Slater's condition holds, which is problematic for two reasons. Slater's condition is usually defined for convex constraints, but the authors specifically state that the constraints in PI are non-convex.
Neural Information Processing Systems
Jan-26-2025, 21:16:33 GMT
- Technology: