DAC: The Double Actor-Critic Architecture for Learning Options

Apr-30-2019–arXiv.org Artificial Intelligence

Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master policy over options. We apply an actor-critic algorithm on each augmented MDP, yielding the Double Actor-Critic (DAC) architecture. Furthermore, we show that, when state-value functions are used as critics, one critic can be expressed in terms of the other, and hence only one critic is necessary. Our experiments on challenging robot simulation tasks demonstrate that DAC outperforms previous gradient-based option learning algorithms by a large margin and significantly outperforms its hierarchy-free counterparts in a transfer learning setting.

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Artificial Intelligence

Apr-30-2019

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Neural Networks (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found