From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

Cui, Zichen Jeff, Wang, Yibin, Shafiullah, Nur Muhammad Mahi, Pinto, Lerrel

Dec-15-2022–arXiv.org Artificial Intelligence

While large-scale sequence modeling from offline data has led to impressive performance gains in natural language and image generation, directly translating such ideas to robotics has been challenging. One critical reason for this is that uncurated robot demonstration data, i.e. play data, collected from non-expert human demonstrators are often noisy, diverse, and distributionally multi-modal. This makes extracting useful, task-centric behaviors from such data a difficult generative modeling problem. In this work, we present Conditional Behavior Transformers (C-BeT), a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification. On a suite of simulated benchmark tasks, we find that C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%. Further, we demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data without any task labels or reward information. Robot videos are best viewed on our project website: play-to-policy.github.io. Machine Learning is undergoing a Cambrian explosion in large generative models for applications across vision (Ramesh et al., 2022) and language (Brown et al., 2020). A shared property across these models is that they are trained on large and uncurated data, often scraped from the internet.

arxiv preprint arxiv, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

Dec-15-2022

arXiv.org PDF

Add feedback

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Robots (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found