Safe Offline Reinforcement Learning with Real-Time Budget Constraints
Lin, Qian, Tang, Bo, Wu, Zifan, Yu, Chao, Mao, Shangqin, Xie, Qianlong, Wang, Xingxing, Wang, Dong
–arXiv.org Artificial Intelligence
Aiming at promoting the safe real-world deployment of Reinforcement Learning (RL), research on safe RL has made significant progress in recent years. However, most existing works in the literature still focus on the online setting, where risky violations of the safety budget are likely to be incurred during training. Besides, in many real-world applications, the learned policy is required to respond to dynamically determined safety budgets (i.e., constraint thresholds) in real time. In this paper, we target the above real-time budget constraint problem under the offline setting.

Many safe RL approaches have been proposed in the past few years (Achiam et al., 2017; Zhang et al., 2020; Sootla et al., 2022; Liu et al., 2022a). Unfortunately, most existing approaches only target the online setting, where potentially risky constraint violations can be incurred during interactions with the real environment. As a data-driven method, offline RL (Levine et al., 2020) aims to derive a policy from offline data without further real-world exploration, and is thus particularly suitable for safety-critical applications. Despite the recent progress in the offline RL literature (Fujimoto et al., 2019; Kumar et al., 2020; Fujimoto & Gu, 2021), however, there are still limited works focusing on attaining a safe policy under the offline setting.
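To make the problem setting concrete, the real-time budget constraint can be sketched as a constrained-MDP-style objective in which the cost threshold is supplied only at decision time; this is a standard formulation consistent with the abstract's description rather than the paper's own notation, and the symbols r, c, b, and the budget-conditioned policy \pi(\cdot \mid s, b) are assumptions introduced here for illustration:

\[
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi(\cdot \mid b)} \Big[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi(\cdot \mid b)} \Big[ \sum_{t=0}^{T} c(s_t, a_t) \Big] \le b,
\]

where b is the safety budget specified at deployment rather than fixed during training, so a single learned policy must satisfy the constraint for any admissible b; under the offline setting, this objective must additionally be optimized from a fixed dataset of previously collected trajectories, without further interaction with the environment.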
Jun-1-2023