Improving Context-Based Meta-Reinforcement Learning with Self-Supervised Trajectory Contrastive Learning

Bernie Wang, Simon Xu, Kurt Keutzer, Yang Gao, Bichen Wu

Mar-10-2021–arXiv.org Artificial Intelligence

Meta-reinforcement learning typically requires orders of magnitude more samples than single-task reinforcement learning methods. This is because meta-training must handle more diverse task distributions and train extra components such as context encoders. To address this, we propose a novel self-supervised learning task, which we name Trajectory Contrastive Learning (TCL), to improve meta-training. TCL adopts contrastive learning and trains a context encoder to predict whether two transition windows are sampled from the same trajectory. TCL leverages the natural hierarchical structure of context-based meta-RL and makes minimal assumptions, allowing it to be generally applicable to context-based meta-RL algorithms. It accelerates the training of context encoders and improves meta-training overall. Experiments show that TCL performs better than or comparably to a strong meta-RL baseline in most environments on both the meta-RL MuJoCo (5 of 6) and Meta-World (44 of 50) benchmarks.
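
The contrastive objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only states that the context encoder learns to tell whether two transition windows come from the same trajectory, so the InfoNCE-style loss, the dot-product similarity, the mean-pooling `encode` stand-in, and every function name here (`sample_window`, `encode`, `info_nce_loss`, `tcl_batch`) are assumptions made for the example.

```python
import math
import random


def sample_window(trajectory, window_len, rng):
    """Return a contiguous slice of `window_len` transitions from one trajectory."""
    start = rng.randrange(len(trajectory) - window_len + 1)
    return trajectory[start:start + window_len]


def encode(window):
    """Stand-in context encoder (assumption): mean of the transition vectors.

    A real context encoder would be a learned network; mean pooling is used
    here only so that the sketch runs without any dependencies.
    """
    dim = len(window[0])
    return [sum(t[i] for t in window) / len(window) for i in range(dim)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(queries, keys, temperature=1.0):
    """InfoNCE-style loss (assumption): keys[i] is the positive for queries[i].

    Every other key in the batch acts as a negative, i.e. a window drawn
    from a different trajectory.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]
    return total / len(queries)


def tcl_batch(trajectories, window_len, rng):
    """Draw two windows per trajectory and encode them as a (query, key) pair."""
    queries, keys = [], []
    for traj in trajectories:
        queries.append(encode(sample_window(traj, window_len, rng)))
        keys.append(encode(sample_window(traj, window_len, rng)))
    return queries, keys


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (hypothetical): trajectory k has transitions centred on (k, -k).
    trajectories = [
        [[k + rng.gauss(0, 0.1), -k + rng.gauss(0, 0.1)] for _ in range(20)]
        for k in range(4)
    ]
    queries, keys = tcl_batch(trajectories, window_len=5, rng=rng)
    print("contrastive loss:", info_nce_loss(queries, keys))
```

In this sketch, windows from the same trajectory form the positive pair and windows from the other trajectories in the batch serve as negatives, so minimizing the loss pushes the encoder toward embeddings that identify the originating trajectory.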

artificial intelligence, reinforcement learning, tcl-pearl

Country:
- Europe > Austria
  - Vienna (0.14)
- North America > United States
  - California (0.14)

Genre:
- Research Report (0.82)

Industry:
- Education (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.46)
  - Reinforcement Learning (1.00)
