Budgeted Reinforcement Learning in Continuous State Space
Nicolas Carrara, SequeL team, INRIA Lille - Nord Europe
Neural Information Processing Systems
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications that require safety constraints. It relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs.
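To make the fixed-point idea concrete, here is a minimal, deliberately simplified sketch, not the paper's exact operator: it assumes a finite set of candidate (action, next-budget) pairs with reward-value estimates `qr` and cost-value estimates `qc`, picks the highest-return pair whose cost estimate fits the remaining budget, and bootstraps both value estimates through that choice. The function names (`budgeted_greedy_action`, `budgeted_backup`) and the deterministic selection are illustrative assumptions; the operator in the paper acts on stochastic policies over an augmented state that carries the budget.

```python
import numpy as np

def budgeted_greedy_action(qr, qc, budget):
    """Pick the (action, next-budget) pair with the highest reward-value
    estimate among those whose cost-value estimate stays within the budget.

    qr, qc : arrays of shape (n_pairs,), reward-value and cost-value
             estimates for each candidate (action, next-budget) pair.
    budget : scalar cost budget available in the current augmented state.
    """
    admissible = np.flatnonzero(qc <= budget)
    if admissible.size == 0:
        # No pair respects the budget: fall back to the least costly one.
        return int(np.argmin(qc))
    # Among admissible pairs, maximise the estimated return.
    return int(admissible[np.argmax(qr[admissible])])

def budgeted_backup(reward, cost, qr_next, qc_next, budget_next, gamma=0.9):
    """One sample-based budgeted Bellman backup: bootstrap both the reward
    and the cost value through the budgeted greedy choice at the next
    augmented state (s', budget_next)."""
    a_star = budgeted_greedy_action(qr_next, qc_next, budget_next)
    target_r = reward + gamma * qr_next[a_star]
    target_c = cost + gamma * qc_next[a_star]
    return target_r, target_c

# Toy usage: three candidate (action, next-budget) pairs at the next state.
qr_next = np.array([1.0, 2.0, 3.0])   # estimated returns
qc_next = np.array([0.1, 0.3, 0.9])   # estimated cumulative costs
print(budgeted_backup(reward=0.5, cost=0.05,
                      qr_next=qr_next, qc_next=qc_next,
                      budget_next=0.4))
```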