Review for NeurIPS paper: On Efficiency in Hierarchical Reinforcement Learning
–Neural Information Processing Systems
The justification relies on the Bayesian regret bounds to show that hierarchical decomposition can lead to statistically efficient learning by comparing the bounds for the "flat" MDP to the decomposed MDP, and thus deriving the following conditions for beneficial decomposition: either the subMDPs must all have a small state space or the original MDP is able to be decomposed into a small number of equivalent MDPs. The paper then goes on to discuss the computational complexity of planning with hierarchical structures with the Planning with Exit Profiles (PEP) algorithm. The authors derive a bound on the computational complexity of the PEP algorithm, which leads to the following properties being required for efficient learning: all subMDPs must be small, with a small number of exit profiles and total exit states. Finally, the paper also presents a bound on the performance of the PEP algorithm, and discusses the conditions for finding high-quality exit profiles, which is a requirement for the near-optimal performance of PEP.
Neural Information Processing Systems
Jan-24-2025, 03:34:31 GMT
- Technology: