Synthesis of Hierarchical Controllers Based on Deep Reinforcement Learning Policies
Delgrange, Florent, Avni, Guy, Lukina, Anna, Schilling, Christian, Nowé, Ann, Pérez, Guillermo A.
arXiv.org Artificial Intelligence
We propose a novel approach to the problem of controller design for environments modeled as Markov decision processes (MDPs). Specifically, we consider a hierarchical MDP: a graph with each vertex populated by an MDP called a "room." We first apply deep reinforcement learning (DRL) to obtain low-level policies for each room, scaling to large rooms of unknown structure. We then apply reactive synthesis to obtain a high-level planner that chooses which low-level policy to execute in each room. The central challenge in synthesizing the planner is the need to model the rooms. We address this challenge by developing a DRL procedure to train concise "latent" policies together with PAC guarantees on their performance. Unlike previous approaches, ours circumvents a model distillation step. Our approach combats sparse rewards in DRL and enables reusability of low-level policies. We demonstrate feasibility in a case study involving agent navigation amid moving obstacles.
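To make the hierarchical structure concrete, the following is a minimal, hypothetical sketch (not the authors' code): rooms form a graph, each edge between adjacent rooms carries a pre-trained low-level policy driving the agent to the connecting exit, and a high-level planner chooses which low-level policy to run. BFS over the room graph stands in for the reactive-synthesis planner; the stub `make_room_policy` stands in for a DRL-trained latent policy. All names here are illustrative assumptions.

```python
# Hypothetical sketch of a hierarchical controller: a high-level planner
# selects which low-level "room" policy to execute, as in the abstract.

def make_room_policy(exit_cell):
    """Stand-in for a DRL-trained low-level policy: greedily steps
    the agent's (x, y) position toward the room's target exit cell."""
    def policy(state):
        x, y = state
        ex, ey = exit_cell
        if x < ex:
            return "right"
        if x > ex:
            return "left"
        if y < ey:
            return "up"
        return "down"
    return policy

class HierarchicalController:
    """Rooms form a graph; the planner picks the next room, and hence
    the low-level policy that drives the agent toward that room's exit."""

    def __init__(self, room_graph, exits, goal_room):
        self.room_graph = room_graph  # room -> list of adjacent rooms
        self.exits = exits            # (room, next_room) -> exit cell
        self.goal_room = goal_room
        self.policies = {edge: make_room_policy(cell)
                         for edge, cell in exits.items()}

    def plan(self, room):
        """BFS over the room graph: a simple stand-in for the
        reactive-synthesis high-level planner."""
        frontier, parent = [room], {room: None}
        while frontier:
            r = frontier.pop(0)
            if r == self.goal_room:
                path = [r]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return list(reversed(path))
            for nxt in self.room_graph[r]:
                if nxt not in parent:
                    parent[nxt] = r
                    frontier.append(nxt)
        return None  # goal room unreachable

    def act(self, room, state):
        """One control step: plan a room-level route, then delegate to
        the low-level policy for the first edge on that route."""
        path = self.plan(room)
        if path is None or len(path) < 2:
            return "stay"
        return self.policies[(room, path[1])](state)

# Three rooms in a line: A - B - C, goal in room C.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
exits = {("A", "B"): (4, 2), ("B", "A"): (0, 2),
         ("B", "C"): (4, 2), ("C", "B"): (0, 2)}
ctrl = HierarchicalController(graph, exits, goal_room="C")
print(ctrl.plan("A"))         # room-level route to the goal
print(ctrl.act("A", (1, 2)))  # low-level action toward the A->B exit
```

The separation mirrors the abstract's design: low-level policies can be retrained or reused per room without touching the planner, and the planner reasons only over the small room graph rather than the full product MDP.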
Feb-21-2024