Regret Bounds for Risk-Sensitive Reinforcement Learning
Bastani, O., Ma, Y. J., Shen, E., Xu, W.
arXiv.org Artificial Intelligence
In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
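For readers unfamiliar with the CVaR objective named above, the following is a minimal sketch of how CVaR can be estimated from a finite sample of returns. It is not the paper's algorithm or its characterization of CVaR. It uses the standard variational form CVaR_alpha(Z) = max over rho of { rho - E[(rho - Z)^+] / alpha }, and it assumes the lower-tail convention (the average of the worst alpha-fraction of returns). The function name `empirical_cvar` and the sample data are hypothetical, chosen only for illustration.

```python
def empirical_cvar(returns, alpha):
    """Estimate lower-tail CVaR at level alpha from sampled returns.

    Assumption (not from the source): CVaR is the expected return over the
    worst alpha-fraction of outcomes, computed with the variational formula
    max_rho { rho - E[(rho - Z)^+] / alpha } on the empirical distribution.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(returns) == 0:
        raise ValueError("returns must be non-empty")

    n = len(returns)

    def objective(rho):
        # Average shortfall of the samples below the threshold rho.
        shortfall = sum(max(rho - z, 0.0) for z in returns) / n
        return rho - shortfall / alpha

    # The objective is concave and piecewise linear in rho, with breakpoints
    # at the sample values, so the maximum is attained at one of the samples.
    return max(objective(rho) for rho in returns)


# Hypothetical sample: ten episode returns, evaluated at alpha = 0.2.
sample_returns = [float(i) for i in range(10)]
print(empirical_cvar(sample_returns, 0.2))
print(empirical_cvar(sample_returns, 1.0))
```

With alpha = 0.2 on this sample, the estimate is the mean of the two worst returns, 0.5 up to floating-point rounding. With alpha = 1.0 it reduces to the ordinary sample mean, which is the expected-reward objective that the abstract contrasts with risk-sensitive ones.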
Oct-11-2022
- Country:
- North America > United States
- California > Santa Clara County
- Palo Alto (0.04)
- Pennsylvania (0.04)
- Genre:
- Research Report (0.50)
- Workflow (0.46)
- Industry:
- Health & Medicine (0.34)
- Technology: