Meta-Q-Learning
Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola
This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-reinforcement learning (meta-RL). MQL builds on three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if it is given access to a context variable that represents the past trajectory. Second, using a multi-task objective that maximizes the average reward across the training tasks is an effective way to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy to a new task using off-policy updates. MQL draws on ideas from propensity estimation to do so, thereby amplifying the amount of data available for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with state-of-the-art meta-RL algorithms.
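The abstract names propensity estimation as the tool for recycling replay-buffer data but does not spell out the mechanics. What follows is a minimal sketch of the general idea under stated assumptions; it does not reproduce MQL's exact procedure. A logistic classifier is trained to tell new-task samples apart from meta-training replay-buffer samples. Its odds, corrected for class sizes, estimate the density ratio p_new(x) / p_old(x), which can serve as an importance weight on old data. The normalized effective sample size (ESS) then summarizes how much of the reweighted buffer is effectively usable. Several details are illustrative assumptions, not taken from the paper: the toy 1-D features, the helper names `fit_logistic`, `propensity_weights`, and `normalized_ess`, the plain gradient-descent fit, and the class-prior correction. In a real system the features would be context representations of trajectories, and the weights would multiply the per-sample Q-learning loss.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit P(y=1 | x) = sigmoid(w.x + b) by full-batch gradient descent on the log-loss.

    Assumption: any calibrated binary classifier would do; plain gradient
    descent keeps this sketch dependency-free.
    """
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(dot(w, x) + b) - y  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity_weights(old, new):
    """Estimate p_new(x) / p_old(x) for each replay-buffer sample in `old` (hypothetical helper).

    The classifier labels new-task samples 1 and buffer samples 0. Its odds
    exp(w.x + b) = P(1|x) / P(0|x) equal (n_new * p_new(x)) / (n_old * p_old(x)),
    so multiplying by n_old / n_new removes the class-size imbalance.
    """
    xs = old + new
    ys = [0] * len(old) + [1] * len(new)
    w, b = fit_logistic(xs, ys)
    prior_correction = len(old) / len(new)
    return [prior_correction * math.exp(dot(w, x) + b) for x in old]


def normalized_ess(weights):
    """Normalized effective sample size (sum w)^2 / (n * sum w^2), which lies in (0, 1]."""
    total = sum(weights)
    total_sq = sum(wt * wt for wt in weights)
    return total * total / (len(weights) * total_sq)


# Toy usage: 1-D "contexts"; the new task is shifted relative to the replay buffer.
random.seed(0)
old = [[random.gauss(0.0, 1.0)] for _ in range(400)]
new = [[random.gauss(1.0, 1.0)] for _ in range(100)]
weights = propensity_weights(old, new)
ess = normalized_ess(weights)
print(f"mean weight = {sum(weights) / len(weights):.2f}, normalized ESS = {ess:.2f}")
```

Buffer samples that resemble the new task receive weights above 1, and unrelated samples are down-weighted toward 0. Density-ratio weights like these are often clipped to limit variance. A low ESS signals that only a small fraction of the buffer is informative for the new task, which one could use to decide how strongly to trust the reweighted data. That last use is an assumption of this sketch.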
Sep-30-2019