A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning

Kihyuk Hong, Yuhang Li, Ambuj Tewari

arXiv.org Machine Learning 

Offline constrained reinforcement learning (RL) aims to learn a decision-making policy that performs well while satisfying safety constraints, given a dataset of trajectories collected from historical experiments. It enjoys the benefits of offline RL [22]: not requiring interaction with the environment enables real-world applications where collecting interaction data is expensive (e.g., robotics [18, 23]) or dangerous (e.g., healthcare [30]). It also enjoys the benefits of constrained RL [1]: being able to specify constraints on the behavior of the agent enables real-world applications with safety concerns (e.g., smart grid [31], robotics [14]). Offline constrained RL with function approximation (e.g., neural networks) is of particular interest because function approximation can encode inductive biases that allow sample-efficient learning in large state spaces. As is the case for offline unconstrained RL [29, 32], offline constrained RL with function approximation requires two classes of assumptions for sample-efficient learning.
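As a concrete reference point, the setting described above is typically formalized as a constrained MDP solved from a fixed dataset; the following is a standard formulation written in generic notation (the paper's own symbols for the value functions, threshold, and dataset may differ):

$$\max_{\pi}\; V_r^{\pi}(\mu) \quad \text{subject to} \quad V_c^{\pi}(\mu) \le \tau,$$

where $V_r^{\pi}(\mu)$ and $V_c^{\pi}(\mu)$ denote the expected cumulative reward and cost of policy $\pi$ from initial state distribution $\mu$, $\tau$ is the constraint threshold, and the policy must be learned solely from an offline dataset $\mathcal{D} = \{(s_i, a_i, r_i, c_i, s_i')\}_{i=1}^{n}$ collected by a behavior policy, without further interaction with the environment.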
