From STL Rulebooks to Rewards
Aguilar, Edgar A., Berducci, Luigi, Brunnbauer, Axel, Grosu, Radu, Ničković, Dejan
arXiv.org Artificial Intelligence
The automatic synthesis of neural-network controllers for autonomous agents through reinforcement learning has to simultaneously optimize many, possibly conflicting, objectives of varying importance. This multi-objective optimization task is reflected in the shape of the reward function, which is most often the product of ad hoc, handcrafted design. In this paper we propose a principled approach to shaping rewards for reinforcement learning from multiple objectives that are given as a partially ordered set of signal temporal logic (STL) rules. To this end, we first equip STL with a novel quantitative semantics that allows individual requirements to be evaluated automatically. We then develop a method for systematically combining the evaluations of multiple requirements into a single reward that takes into account the priorities defined by the partial order. Finally, we evaluate our approach on several case studies, demonstrating its practical applicability.
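To make the two ingredients of the abstract concrete, the following is a minimal illustrative sketch, not the paper's method. It uses the classical min/max robustness of STL for two simple rule shapes in place of the novel quantitative semantics the authors propose, and it assumes the priorities form a totally ordered chain of levels, which is only a special case of a partial order. All function names (`always_at_least`, `eventually_at_least`, `prioritized_score`) and the example traces are hypothetical.

```python
from typing import List, Sequence


def always_at_least(signal: Sequence[float], threshold: float) -> float:
    """Classical robustness of 'always (x >= threshold)' on a sampled trace:
    the worst-case margin over all samples (positive means satisfied)."""
    return min(x - threshold for x in signal)


def eventually_at_least(signal: Sequence[float], threshold: float) -> float:
    """Classical robustness of 'eventually (x >= threshold)':
    the best-case margin over all samples."""
    return max(x - threshold for x in signal)


def prioritized_score(levels: List[List[float]]) -> int:
    """Combine rule evaluations into one scalar (assumed scheme, not the paper's).

    levels[0] holds the robustness values of the highest-priority rules,
    levels[-1] those of the lowest-priority rules. The number of satisfied
    rules per level is treated as one digit of a base-`base` number, so
    satisfying one more high-priority rule always outweighs any change
    at lower priority levels.
    """
    base = max(len(level) for level in levels) + 1
    score = 0
    for level in levels:
        satisfied = sum(1 for rho in level if rho > 0)
        score = score * base + satisfied
    return score


# Hypothetical episode: a distance-to-obstacle trace and a progress trace.
distance = [5.0, 3.0, 2.5, 4.0]
progress = [0.0, 0.4, 0.7, 0.9]

safety = always_at_least(distance, 1.0)     # high priority: keep distance >= 1
goal = eventually_at_least(progress, 1.0)   # low priority: eventually reach 1

# Safe but goal not reached scores higher than unsafe but goal reached.
safe_episode = prioritized_score([[safety], [goal]])
unsafe_episode = prioritized_score([[-0.5], [0.3]])
```

The counting scheme discards the magnitude of each robustness value; it is used here only because it makes the dominance of higher-priority rules easy to verify by hand.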
Oct-6-2021
- Genre:
- Research Report (0.50)