Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning

Lunet Yifru, Ali Baheri

arXiv.org Artificial Intelligence 

Reinforcement learning (RL) has emerged as a powerful computational approach for training agents to achieve complex objectives through interactions within stochastic environments (Sutton and Barto 2018). RL algorithms have demonstrated significant success in a wide range of applications and domains (Singh, Kumar, and Singh 2022; Razzaghi et al. 2022). However, when deploying RL policies in real-world scenarios, particularly those involving safety-critical operations, ensuring the safety of the learning process becomes a paramount concern.

Another direction in safe RL is risk-sensitive RL, which aims to balance the trade-off between exploration, exploitation, and risk management (Mihatsch and Neuneier 2002). Risk-sensitive RL algorithms incorporate risk measures, such as conditional value-at-risk (CVaR) (Tamar, Glassner, and Mannor 2014) and the risk envelope (Majumdar et al. 2017), to guide the learning process. An additional approach to ensure safety in RL is shielding, which intervenes in the agent's actions when they might violate a safety constraint.
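To make the two mechanisms named above concrete, here is a minimal Python sketch, not taken from the paper: it shows a CVaR estimate over sampled returns and a simple shield that replaces a proposed action with a fallback when it would violate a safety predicate. The function names, the toy 1-D constraint, and the thresholds are all illustrative assumptions.

```python
# Illustrative sketch of two safety mechanisms mentioned above:
# (1) CVaR over sampled returns, (2) a shield that overrides unsafe actions.
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns."""
    sorted_returns = np.sort(np.asarray(returns))
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()

def shield(state, proposed_action, is_safe, fallback_action):
    """Return the proposed action unless it violates the safety predicate."""
    return proposed_action if is_safe(state, proposed_action) else fallback_action

# Toy usage: returns from rollouts, and a 1-D "stay inside [-1, 1]" constraint.
rollout_returns = np.random.default_rng(0).normal(loc=1.0, scale=0.5, size=1000)
print("CVaR_0.1 of returns:", cvar(rollout_returns, alpha=0.1))

is_safe = lambda s, a: abs(s + a) <= 1.0
print("shielded action:", shield(0.9, 0.5, is_safe, fallback_action=0.0))
```

In a risk-sensitive learner, an estimate like the CVaR above would feed into the objective or the policy update; in a shielded learner, the shield sits between the policy and the environment at every step.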
