From STL Rulebooks to Rewards

Aguilar, Edgar A., Berducci, Luigi, Brunnbauer, Axel, Grosu, Radu, Ničković, Dejan

Oct-6-2021–arXiv.org Artificial Intelligence

The automatic synthesis of neural-network controllers for autonomous agents through reinforcement learning must simultaneously optimize many, possibly conflicting, objectives of varying importance. This multi-objective optimization task is reflected in the shape of the reward function, which is most often the result of an ad hoc, craft-like process. In this paper we propose a principled approach to shaping rewards for reinforcement learning from multiple objectives that are given as a partially ordered set of signal temporal logic (STL) rules. To this end, we first equip STL with a novel quantitative semantics that allows individual requirements to be evaluated automatically. We then develop a method for systematically combining the evaluations of multiple requirements into a single reward that takes into account the priorities defined by the partial order. Finally, we evaluate our approach on several case studies, demonstrating its practical applicability.
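The two steps named in the abstract (scoring each requirement quantitatively, then merging the scores according to priority) can be illustrated with a minimal Python sketch. It is not the paper's method: it assumes the classic min/max robustness semantics of STL rather than the novel semantics proposed here, and it replaces the partial order with a simplified list of priority levels. The function names, the weighting scheme, and the lander-style traces are all hypothetical.

```python
from typing import List, Sequence


def robustness_always_gt(values: Sequence[float], threshold: float) -> float:
    """Classic robustness of 'always (value > threshold)': the worst margin."""
    return min(v - threshold for v in values)


def robustness_eventually_lt(values: Sequence[float], threshold: float) -> float:
    """Classic robustness of 'eventually (value < threshold)': the best margin."""
    return max(threshold - v for v in values)


def rank_weighted_reward(levels: List[List[float]]) -> int:
    """Combine per-rule robustness values into one scalar (hypothetical scheme).

    levels[0] holds the highest-priority rules. A rule counts as satisfied
    when its robustness is positive. Each level's weight exceeds the total
    weight of all lower levels, so one satisfied higher-priority rule always
    outweighs every lower-priority rule combined.
    """
    base = max(len(level) for level in levels) + 1
    reward = 0
    for depth, level in enumerate(levels):
        weight = base ** (len(levels) - 1 - depth)
        satisfied = sum(1 for rho in level if rho > 0)
        reward += weight * satisfied
    return reward


# Hypothetical traces for a lander-like agent (illustrative numbers only).
obstacle_distance = [3.0, 2.0, 1.5, 2.5]
altitude = [10.0, 6.0, 2.0, 0.5]
speed = [1.0, 2.0, 3.0, 2.0]

safety = robustness_always_gt(obstacle_distance, 1.0)   # stay clear of the obstacle
target = robustness_eventually_lt(altitude, 1.0)        # eventually get close to the ground
comfort = -robustness_always_gt(speed, 2.5)             # crude proxy for 'speed mostly below 2.5'

reward = rank_weighted_reward([[safety], [target, comfort]])
print(safety, target, comfort, reward)
```

With these weights an episode that satisfies only the safety rule scores higher than one that violates safety but satisfies both lower-priority rules, which is the kind of priority-respecting behaviour the abstract describes. A faithful implementation would follow the semantics and the combination method defined in the paper itself.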

lander, obstacle, requirement


Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)