Hierarchical Programmatic Option Framework
Yu-An Lin, Chen-Tao Lee, Chih-Han Yang, Guan-Ting Liu
Neural Information Processing Systems
Deep reinforcement learning (RL) aims to learn deep neural network policies for large-scale decision-making problems. However, approximating policies with deep neural networks makes the learned decision-making process difficult to interpret. To address this issue, prior works [10, 46, 74] proposed using human-readable programs as policies to increase the interpretability of the decision-making pipeline. Nevertheless, programmatic policies generated by these methods struggle to solve long and repetitive RL tasks effectively and cannot generalize to even longer horizons during testing. To address these limitations, we propose the Hierarchical Programmatic Option framework (HIPO), which aims to solve long and repetitive RL problems with human-readable programs as options (low-level policies). Specifically, we propose a method that retrieves a set of effective, diverse, and compatible programs as options. We then learn a high-level policy that reuses these programmatic options to solve recurring subtasks. Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks. Ablation studies justify the effectiveness of our proposed search algorithm for retrieving a set of programmatic options.
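The hierarchical control loop the abstract describes can be pictured as a high-level policy that repeatedly selects one retrieved program, runs it to completion, and then selects again, so a recurring subtask is handled by reusing the same option. The following is a minimal, hypothetical Python sketch of that loop; `DummyEnv`, `ProgramOption`, and `high_level_policy` are illustrative stand-ins, not the paper's actual API, and the learned policy is replaced by a random choice.

```python
import random

class DummyEnv:
    """Toy environment standing in for a long-horizon task (hypothetical)."""
    def __init__(self, horizon=20):
        self.horizon = horizon
        self.t = 0
        self.done = False

    def reset(self):
        self.t = 0
        self.done = False
        return self.t  # trivial state

    def step(self, action):
        self.t += 1
        self.done = self.t >= self.horizon
        return self.t, 0.0, self.done  # (state, reward, done)

class ProgramOption:
    """A human-readable program used as a low-level option.
    Simplified here to a flat action sequence; real programmatic
    policies also contain control flow (loops, conditionals)."""
    def __init__(self, name, actions):
        self.name = name
        self.actions = actions

    def execute(self, env):
        state = None
        for action in self.actions:  # run the program to completion
            state, _, done = env.step(action)
            if done:
                break
        return state

def high_level_policy(state, options):
    """Stand-in for the learned high-level policy over option indices."""
    return random.randrange(len(options))

env = DummyEnv()
options = [ProgramOption("pickMarkers", ["move", "pick"]),
           ProgramOption("patrol", ["move", "turnLeft", "move"])]
state = env.reset()
while not env.done:
    chosen = options[high_level_policy(state, options)]
    state = chosen.execute(env)  # reuse one option per recurring subtask
```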
- Genre:
- Research Report > Experimental Study (0.93)
- Workflow (0.67)
- Industry:
- Education (0.92)
- Information Technology (0.92)
- Leisure & Entertainment > Games (0.67)