On-Robot Reinforcement Learning with Goal-Contrastive Rewards
Ondrej Biza, Thomas Weng, Lingfeng Sun, Karl Schmeckpeper, Tarik Kelestemur, Yecheng Jason Ma, Robert Platt, Jan-Willem van de Meent, Lawson L. S. Wong
arXiv.org Artificial Intelligence
Reinforcement Learning (RL) has the potential to enable robots to learn from their own actions in the real world. Unfortunately, RL can be prohibitively expensive in terms of on-robot runtime, because exploration is inefficient when learning from a sparse reward signal. Designing dense reward functions is labour-intensive and requires domain expertise. In our work, we propose GCR (Goal-Contrastive Rewards), a dense reward function learning method that can be trained on passive video demonstrations. By using videos without actions, our method is easier to scale, as we can use arbitrary videos. GCR combines two loss functions: an implicit value loss function that models how the reward increases when traversing a successful trajectory, and a goal-contrastive loss that discriminates between successful and failed trajectories. We perform experiments in simulated manipulation environments across RoboMimic and MimicGen tasks, as well as in the real world using a Franka arm and a Spot quadruped. We find that GCR leads to more sample-efficient RL, enabling model-free RL to solve about twice as many tasks as our baseline reward learning methods. We also demonstrate positive cross-embodiment transfer from videos of people and of other robots performing a task. Appendix: https://tinyurl.com/gcr-appendix-2
Oct-25-2024
- Country:
- Africa
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Rwanda > Kigali
- Kigali (0.04)
- Asia
- Japan > Honshū
- Kansai > Osaka Prefecture
- Osaka (0.04)
- Kantō > Kanagawa Prefecture
- Yokohama (0.04)
- Middle East > Israel
- Tel Aviv District > Tel Aviv (0.04)
- Europe
- Austria (0.04)
- Czechia > Prague (0.04)
- France > Île-de-France
- Germany > Baden-Württemberg
- Freiburg (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- Slovenia > Upper Carniola
- Municipality of Bled > Bled (0.04)
- Spain
- Andalusia > Granada Province
- Granada (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- North America
- Canada
- Alberta > Census Division No. 15
- Improvement District No. 9 > Banff (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Quebec > Montreal (0.14)
- United States
- Georgia > Fulton County
- Atlanta (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.14)
- Utah > Salt Lake County
- Salt Lake City (0.04)
- California > Santa Clara County
- Stanford (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.05)
- Oregon (0.04)
- New York
- Bronx County > New York City (0.04)
- Kings County > New York City (0.04)
- New York County > New York City (0.04)
- Queens County > New York City (0.04)
- Richmond County > New York City (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Maryland > Baltimore (0.04)
- Oceania
- Australia > Queensland
- Brisbane (0.04)
- New Zealand > North Island
- Auckland Region > Auckland (0.04)
- Genre:
- Research Report (0.82)
- Technology: