Reviews: DAC: The Double Actor-Critic Architecture for Learning Options

Neural Information Processing Systems 

The paper introduces a double actor-critic architecture for learning options. The authors define two augmented MDPs, one for learning the option-selection policy and one for learning the options themselves. With this MDP formulation, off-the-shelf policy learning algorithms can be used to learn both the option-selection policy and the option policies, which was not possible with previous algorithms. The reviews for this paper are borderline. Most reviewers appreciated the intuitive idea and the promising results reported in the paper.