R3M: A Universal Visual Representation for Robot Manipulation

Nair, Suraj, Rajeswaran, Aravind, Kumar, Vikash, Finn, Chelsea, Gupta, Abhinav

Nov-18-2022–arXiv.org Artificial Intelligence

We study how visual representations pre-trained on diverse human video data can enable data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a visual representation using the Ego4D human video dataset using a combination of time-contrastive learning, video-language alignment, and an L1 penalty to encourage sparse and compact representations. The resulting representation, R3M, can be used as a frozen perception module for downstream policy learning. Across a suite of 12 simulated robot manipulation tasks, we find that R3M improves task success by over 20% compared to training from scratch and by over 10% compared to state-of-the-art visual representations like CLIP and MoCo. Furthermore, R3M enables a Franka Emika Panda arm to learn a range of manipulation tasks in a real, cluttered apartment given just 20 demonstrations. Code and pre-trained models are available at https://tinyurl.com/robotr3m.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

Nov-18-2022

arXiv.org PDF

Add feedback

Country:
- Oceania > New Zealand
  - North Island > Auckland Region > Auckland (0.04)
- North America > United States
  - Oregon (0.04)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Robots > Manipulation (0.71)
  - Machine Learning
    - Neural Networks (0.68)
    - Reinforcement Learning (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found