Scaling up budgeted reinforcement learning
Carrara, Nicolas, Leurent, Edouard, Laroche, Romain, Urvoy, Tanguy, Maillard, Odalric-Ambrym, Pietquin, Olivier
Can we learn a control policy able to adapt its behaviour in real time so as to take any desired amount of risk? The general Reinforcement Learning framework solely aims at optimising a total reward in expectation, which may not be desirable in critical applications. In stark contrast, the Budgeted Markov Decision Process (BMDP) framework is a formalism in which the notion of risk is implemented as a hard constraint on a failure signal. Existing algorithms solving BMDPs rely on strong assumptions and have so far only been applied to toy examples. In this work, we relax some of these assumptions and demonstrate the scalability of our approach on two practical problems: a spoken dialogue system and an autonomous driving task. On both examples, we reach performance similar to that of Lagrangian Relaxation methods, with a significant improvement in sample and memory efficiency.
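As a rough sketch of the constrained objective the abstract alludes to (the symbols below are assumed for illustration and are not taken from the paper): a BMDP augments the usual expected-return maximisation with a hard constraint on the expected cumulative failure signal, bounded by a budget that can be chosen at run time.

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\, c(s_t, a_t)\right] \le \beta,
\]

where \(r\) is the reward, \(c\) the failure (cost) signal, \(\gamma\) the discount factor, and \(\beta\) the risk budget; a single policy conditioned on \(\beta\) can then trade reward against risk on demand, in contrast to Lagrangian relaxation, which softens the constraint into a penalty with a fixed multiplier.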
arXiv.org Artificial Intelligence
Mar-6-2019
- Country:
- North America > United States (0.46)
- Genre:
- Research Report (0.64)
- Industry:
- Automobiles & Trucks (0.66)
- Government (0.46)
- Information Technology > Robotics & Automation (0.34)
 - Transportation > Ground > Road (0.48)