Reviews: DAC: The Double Actor-Critic Architecture for Learning Options

Neural Information Processing Systems 

Post-rebuttal update: I have read the rebuttal. Thanks for the clarification regarding the type of experiments where there is a larger gap between DAC and the baselines, as well as the clarification on PPO OC/IOPG.

The paper proposes a new method for learning options in a hierarchical reinforcement learning (HRL) set-up. The method works by decomposing the original problem into two MDPs that can each be solved using conventional policy-based methods. This allows new state-of-the-art methods to be easily 'dropped in' to improve HRL.