Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

Zhou, Wei, Li, Yiying, Yang, Yongxin, Wang, Huaimin, Hospedales, Timothy M.

Mar-11-2020–arXiv.org Machine Learning

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to the vanilla critic, the meta-critic network is explicitly trained to accelerate the learning process; and compared to existing meta-learning algorithms, meta-critic is rapidly learned online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic framework is designed for off-policy based learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning leads to improvements in avariety of continuous control environments when combined with contemporary Off-PAC methods DDPG, TD3 and the state-of-the-art SAC.

learning, online meta-critic learning, time step, (14 more...)

arXiv.org Machine Learning

Mar-11-2020

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Hunan Province
    - Changsha (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks (0.94)
  - Statistical Learning (0.68)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found