MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos

Xianghui Wang, Xinming Zhang, Yanjun Chen, Xiaoyu Shen, Wei Zhang

May 14, 2025, arXiv.org Artificial Intelligence

Vision-language models (VLMs) have demonstrated excellent high-level planning capabilities, enabling locomotion skill learning from video demonstrations without meticulous manual reward design. However, improper frame sampling and low training efficiency in current methods remain critical bottlenecks, incurring substantial computational overhead and time costs. To address these limitations, we propose Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos (MA-ROESL). MA-ROESL integrates a motion-aware frame selection method that implicitly improves the quality of VLM-generated reward functions. It further employs a hybrid three-phase training pipeline that improves training efficiency via rapid reward optimization and derives the final policy through online fine-tuning. Experimental results demonstrate that MA-ROESL significantly improves training efficiency while faithfully reproducing locomotion skills in both simulated and real-world settings, underscoring its potential as a robust and scalable framework for efficient robot locomotion skill learning from video demonstrations.
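
To make the idea of motion-aware frame selection concrete, the following is a minimal sketch and not the authors' method. The abstract does not say how motion is measured or how frames are chosen, so this sketch assumes that motion is the mean absolute pixel difference between consecutive grayscale frames and that frames are picked at equal steps of cumulative motion rather than equal steps of time. The names `motion_scores` and `select_frames` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence

# Assumption: each frame is a grayscale image flattened to pixel intensities.
Frame = Sequence[float]


def motion_scores(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute pixel difference between each pair of consecutive frames."""
    scores = []
    for prev, curr in zip(frames, frames[1:]):
        scores.append(sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr))
    return scores


def select_frames(frames: Sequence[Frame], k: int) -> List[int]:
    """Pick up to k frame indices spaced evenly in cumulative motion, not in time."""
    if k < 2 or len(frames) < 2:
        return [0] if frames else []

    # cumulative[i] is the total motion observed up to frame i.
    cumulative = [0.0]
    for score in motion_scores(frames):
        cumulative.append(cumulative[-1] + score)
    total = cumulative[-1]

    if total == 0:
        # Static clip: fall back to uniform sampling in time.
        step = (len(frames) - 1) / (k - 1)
        return sorted({round(i * step) for i in range(k)})

    last = len(frames) - 1
    picked = set()
    for i in range(k):
        target = total * i / (k - 1)
        # First frame whose cumulative motion reaches the target (clamped).
        picked.add(min(bisect_left(cumulative, target), last))
    return sorted(picked)


if __name__ == "__main__":
    # Toy clip of one-pixel frames: still at first, then moving, then still.
    clip = [[0], [0], [0], [10], [20], [20]]
    print(select_frames(clip, 3))
```

On the toy clip, the selected indices cluster around the frames where the pixel value changes, whereas uniform sampling would also spend a frame on the static tail. Duplicate picks are merged, so fewer than `k` indices can be returned.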

large language model, machine learning, reinforcement learning


