Assume-Guarantee Reinforcement Learning
Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez
arXiv.org Artificial Intelligence
We present a modular approach to reinforcement learning (RL) in environments consisting of simpler components evolving in parallel. A monolithic view of such modular environments may be prohibitively large to learn, or may require unrealizable communication between the components in the form of a centralized controller. Our proposed approach is based on the assume-guarantee paradigm, where the optimal control for the individual components is synthesized in isolation by making assumptions about the behaviors of neighboring components and providing guarantees about their own behavior. We express these assume-guarantee contracts as regular languages and provide automatic translations to scalar rewards to be used in RL. By combining local probabilities of satisfaction for each component, we provide a lower bound on the probability of satisfaction of the complete system. By solving a Markov game for each component, RL can produce a controller for each component that maximizes this lower bound. The controller utilizes the information it receives through communication, observations, and any knowledge of a coarse model of other agents. We experimentally demonstrate the efficiency of the proposed approach on a variety of case studies.
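The translation from a regular-language contract to a scalar reward can be illustrated with a minimal sketch. Here the contract is monitored by a deterministic finite automaton (DFA), and a finite trace earns reward 1 exactly when the DFA accepts it. The class names, the particular toy contract, and the reward scheme below are illustrative assumptions, not the paper's actual construction:

```python
# Illustrative sketch: a regular-language contract as a DFA monitor,
# with a scalar reward for satisfying traces. Assumed design, not the
# paper's exact translation.

class DFA:
    """Deterministic finite automaton over a finite alphabet."""
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # dict: (state, symbol) -> state
        self.start = start
        self.accepting = accepting      # set of accepting states

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting

def contract_reward(dfa, trace):
    """Reward 1.0 if the finite trace satisfies the contract, else 0.0."""
    return 1.0 if dfa.accepts(trace) else 0.0

# Toy contract: "eventually emit 'done' and never emit 'fail'".
# States: 0 = waiting, 1 = done seen, 2 = failed (absorbing sink).
transitions = {
    (0, "step"): 0, (0, "done"): 1, (0, "fail"): 2,
    (1, "step"): 1, (1, "done"): 1, (1, "fail"): 2,
    (2, "step"): 2, (2, "done"): 2, (2, "fail"): 2,
}
contract = DFA(transitions, start=0, accepting={1})

print(contract_reward(contract, ["step", "step", "done"]))  # 1.0
print(contract_reward(contract, ["step", "fail", "done"]))  # 0.0
```

In the paper's setting, each component would maximize the expected value of such a reward under assumptions on its neighbors, and the local satisfaction probabilities combine into the system-level lower bound.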
Dec-15-2023