A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
Kihyuk Hong, Yuhang Li, Ambuj Tewari
Offline constrained reinforcement learning (RL) aims to learn a decision-making policy that performs well while satisfying safety constraints, given a dataset of trajectories collected from historical experiments. It enjoys the benefits of offline RL [22]: because no interaction with the environment is required, it suits real-world applications where collecting interaction data is expensive (e.g., robotics [18, 23]) or dangerous (e.g., healthcare [30]). It also enjoys the benefits of constrained RL [1]: being able to specify constraints on the agent's behavior enables real-world applications with safety concerns (e.g., smart grid [31], robotics [14]). Offline constrained RL with function approximation (e.g., neural networks) is of particular interest because function approximation can encode inductive biases that allow sample-efficient learning in large state spaces. As is the case for offline unconstrained RL [29, 32], offline constrained RL with function approximation requires two classes of assumptions for sample-efficient learning.
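For concreteness, a standard formulation of the offline constrained RL problem is sketched below. This is the generic constrained MDP objective, not notation taken from the paper itself; the cost threshold $\tau$, discount factor $\gamma$, and behavior policy $\mu$ are assumptions introduced here for illustration.

$$
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le \tau,
$$

where the optimization must be carried out using only a fixed dataset $\mathcal{D} = \{(s_i, a_i, r_i, c_i, s_i')\}_{i=1}^{n}$ of transitions collected by a behavior policy $\mu$, with no further interaction with the environment.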
Oct-19-2023