Budgeted Reinforcement Learning in Continuous State Space
Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin
Neural Information Processing Systems
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as an upper bound on a constraint violation signal that, importantly, can be modified in real time. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is the fixed point of a novel Budgeted Bellman Optimality operator.
Mar-19-2020, 00:17:59 GMT