Adjacency constraint for efficient hierarchical reinforcement learning

Zhang, Tianren, Guo, Shangqi, Tan, Tian, Hu, Xiaolin, Chen, Feng

Aug-22-2022–arXiv.org Artificial Intelligence

Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency because the action space of the high-level policy, i.e., the goal space, is large. Searching in a large goal space makes both high-level subgoal generation and low-level policy learning difficult. In this paper, we show that this problem can be effectively alleviated by using an adjacency constraint to restrict the high-level action space from the whole goal space to a $k$-step adjacent region of the current state. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks, including challenging simulated robot locomotion and manipulation tasks, show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
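To make the idea of a $k$-step adjacent region concrete, the following is a minimal sketch on a toy deterministic grid world. It is not the authors' implementation: the abstract describes a learned adjacency network, whereas this sketch assumes a small tabular environment where adjacency can be computed exactly by breadth-first search. The grid layout, the function names `k_step_adjacent` and `constrain_subgoal`, and the Manhattan-distance fallback for rejected subgoals are all hypothetical choices made for illustration.

```python
from collections import deque

# Hypothetical 4-connected grid world: 1 marks a wall, 0 a free cell.
GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def k_step_adjacent(grid, start, k):
    """Map each free cell reachable from `start` in at most k steps
    to its shortest transition distance (breadth-first search)."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if dist[(r, c)] == k:
            continue  # do not expand beyond the k-step frontier
        for dr, dc in MOVES:
            nxt = (r + dr, c + dc)
            in_bounds = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if in_bounds and grid[nxt[0]][nxt[1]] == 0 and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


def constrain_subgoal(grid, state, proposed, k):
    """Keep `proposed` if it lies in the k-step adjacent region of `state`.
    Otherwise fall back to the adjacent cell closest to it in Manhattan
    distance (an illustrative assumption, not the paper's mechanism)."""
    adjacent = k_step_adjacent(grid, state, k)
    if proposed in adjacent:
        return proposed
    return min(
        adjacent,
        key=lambda s: (abs(s[0] - proposed[0]) + abs(s[1] - proposed[1]), s),
    )


if __name__ == "__main__":
    # A far-away subgoal is replaced by one inside the 2-step region.
    print(constrain_subgoal(GRID, (0, 0), (2, 3), k=2))
    # With a larger k, a reachable subgoal is accepted unchanged.
    print(constrain_subgoal(GRID, (0, 0), (2, 2), k=4))
```

In continuous or large state spaces exact search of this kind is not available, which is presumably why the abstract proposes a trained adjacency network to discriminate adjacent from non-adjacent subgoals.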

artificial intelligence, machine learning, reinforcement learning

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.34)
