Reviews: A Lyapunov-based Approach to Safe Reinforcement Learning

Neural Information Processing Systems 

The focus is safe reinforcement learning under the constrained Markov decision process (CMDP) framework, where safety is expressed as policy-dependent constraints. Two key assumptions are that (i) we have access to a safe baseline policy, and (ii) this baseline policy is close enough to the unknown optimal policy in total variation distance (Assumption 1 in the paper). A key technical insight is to augment the constraint cost with a cost-shaping function so that the resulting constraint value function becomes a Lyapunov function with respect to the baseline policy. Just as identifying a Lyapunov function is nontrivial in general, this cost-shaping function is difficult to compute exactly, so the authors propose several approximations, including solving an LP at each iteration of their policy iteration and value iteration procedures.
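To make the Lyapunov construction concrete, here is a minimal sketch on a made-up 3-state CMDP under a fixed baseline policy. All numbers (transition matrix `P_b`, constraint cost `d`, budget `d0`) are illustrative assumptions; the shaping term is taken as a constant `eps` for simplicity, whereas the paper derives a state-dependent one via an LP. The sketch checks the defining Lyapunov property: applying the baseline policy's Bellman operator for the unshaped cost does not increase the candidate function.

```python
import numpy as np

# Tiny 3-state CMDP under a fixed baseline policy (illustrative numbers,
# not from the paper).
gamma = 0.9
P_b = np.array([[0.8, 0.2, 0.0],   # transitions induced by the baseline policy
                [0.1, 0.7, 0.2],
                [0.0, 0.3, 0.7]])
d = np.array([0.0, 0.5, 1.0])      # immediate constraint cost d(x)
d0 = 5.0                           # constraint budget at the initial state x0 = 0
eps = 0.1                          # constant cost-shaping term (a simplification;
                                   # the paper computes a state-dependent one)

# Lyapunov candidate: discounted value of the shaped cost d + eps under the
# baseline policy, i.e. the solution of L = (d + eps) + gamma * P_b @ L.
L = np.linalg.solve(np.eye(3) - gamma * P_b, d + eps)

# Lyapunov property w.r.t. the baseline policy: the Bellman operator for the
# unshaped cost d must not increase L (here T_b L = L - eps <= L since eps >= 0).
T_b_L = d + gamma * P_b @ L
assert np.all(T_b_L <= L + 1e-9)

# Feasibility at the initial state: L bounds the baseline policy's constraint
# value from above, so L[0] <= d0 certifies the baseline policy is safe.
assert L[0] <= d0
```

The safe-policy-improvement step then searches for policies whose Bellman operator also keeps `L` non-increasing, which guarantees the constraint stays satisfied during learning.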