Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning
arXiv.org Artificial Intelligence
RL has emerged as a powerful computational approach for training agents to achieve complex objectives through interactions within stochastic environments (Sutton and Barto 2018). RL algorithms have demonstrated significant success in a wide range of applications and domains (Singh, Kumar, and Singh 2022; Razzaghi et al. 2022). However, when deploying RL policies in real-world scenarios, particularly those involving safety-critical operations, ensuring the safety of the learning process becomes a paramount concern.

Another direction in safe RL is risk-sensitive RL, which aims to balance the trade-off between exploration, exploitation, and risk management (Mihatsch and Neuneier 2002). Risk-sensitive RL algorithms incorporate risk measures, such as conditional value-at-risk (CVaR) (Tamar, Glassner, and Mannor 2014) and risk envelope (Majumdar et al. 2017), to guide the learning process. An additional approach to ensure safety in RL is through shielding, which intervenes in the agent's actions when it might violate safety constraints.
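The two mechanisms named above can be made concrete with a minimal sketch. The function names (`cvar`, `shielded_action`) and the numbers are illustrative, not taken from the paper: CVaR at level alpha is the mean of the worst alpha-fraction of episode returns, and a shield simply substitutes a known-safe fallback whenever a safety monitor rejects the agent's proposed action.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Conditional value-at-risk at level alpha: the mean of the worst
    alpha-fraction of episode returns. A lower CVaR signals a heavier
    downside tail, so a risk-sensitive objective maximizes it."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the worst tail
    return float(returns[:k].mean())

def shielded_action(state, proposed_action, is_safe, fallback):
    """Shielding sketch: pass the agent's action through only if a safety
    monitor approves it; otherwise substitute a known-safe fallback.
    `is_safe` and `fallback` are hypothetical callables supplied by the user."""
    return proposed_action if is_safe(state, proposed_action) else fallback(state)

# Ten episode returns with two rare, catastrophic outcomes.
rets = [10.0, 9.0, 8.0, -5.0, 11.0, 7.0, -20.0, 9.5, 10.5, 8.5]
print(cvar(rets, alpha=0.2))  # mean of the two worst returns: -12.5
```

Averaging only the tail is what distinguishes CVaR from a plain mean: the policy is judged by its bad episodes, not its typical ones.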
Apr-30-2023