Offline Hierarchical Reinforcement Learning via Inverse Optimization

Schmidt, Carolin; Gammelli, Daniele; Harrison, James; Pavone, Marco; Rodrigues, Filipe

Oct-10-2024, arXiv.org Artificial Intelligence

Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose OHIO: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the inverse problem, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both in direct deployment of policies trained offline and when online fine-tuning is performed.
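The inverse problem described in the abstract can be illustrated with a toy case. The sketch below is not the authors' implementation: it assumes a hypothetical low-level controller that is a known linear feedback law, `action = gain @ (goal - state)`, where the goal plays the role of the unobserved high-level action. Given logged states and low-level actions, the goal is recovered by least squares, and the relabeled dataset could then be passed to an off-the-shelf offline RL algorithm. The names `low_level_controller`, `recover_high_level_action`, `relabel_dataset`, and the `gain` matrix are all assumptions made for illustration.

```python
import numpy as np


def low_level_controller(state, goal, gain):
    """Hypothetical known low-level policy: proportional control toward a goal."""
    return gain @ (goal - state)


def recover_high_level_action(state, action, gain):
    """Invert the controller for one transition.

    Finds the goal g that minimizes ||gain @ (g - state) - action||^2,
    i.e. the high-level action that best explains the observed low-level action.
    """
    delta, *_ = np.linalg.lstsq(gain, action, rcond=None)
    return state + delta


def relabel_dataset(states, actions, gain):
    """Replace logged low-level actions with recovered high-level actions."""
    return np.stack(
        [recover_high_level_action(s, a, gain) for s, a in zip(states, actions)]
    )


# Synthetic check: generate data from known goals, then try to recover them.
rng = np.random.default_rng(0)
gain = np.array([[2.0, 0.0], [0.5, 1.5]])  # assumed controller gain (invertible)
states = rng.normal(size=(5, 2))
true_goals = rng.normal(size=(5, 2))
actions = np.stack(
    [low_level_controller(s, g, gain) for s, g in zip(states, true_goals)]
)

recovered_goals = relabel_dataset(states, actions, gain)
max_error = float(np.max(np.abs(recovered_goals - true_goals)))
```

In this noiseless linear setting the recovery is exact up to floating-point error. With noisy actions or a nonlinear controller, the same idea would require a numerical optimizer in place of `np.linalg.lstsq`, and the recovered high-level actions would only be estimates.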

artificial intelligence, machine learning, reinforcement learning

Country:
- Asia > China
  - Guangdong Province > Shenzhen (0.04)
- Europe > Denmark (0.04)
- North America > United States
  - California > Santa Clara County
    - Palo Alto (0.04)
  - New York (0.04)
  - Ohio (0.29)
- Oceania > Australia
  - Queensland > Brisbane (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Transportation
  - Ground > Road (0.92)
  - Passenger (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.45)
  - Neural Networks > Deep Learning (0.67)
  - Reinforcement Learning (1.00)