ViReSkill: Vision-Grounded Replanning with Skill Memory for LLM-Based Planning in Lifelong Robot Learning
Tomoyuki Kagaya, Subramanian Lakshmi, Anbang Ye, Thong Jing Yuan, Jayashree Karlekar, Sugiri Pranata, Natsuki Murakami, Akira Kinose, Yang You
arXiv.org Artificial Intelligence
Robots trained via Reinforcement Learning (RL) or Imitation Learning (IL) often adapt slowly to new tasks, whereas recent Large Language Models (LLMs) and Vision-Language Models (VLMs) promise knowledge-rich planning from minimal data. Deploying LLMs/VLMs for motion planning, however, faces two key obstacles: (i) symbolic plans are rarely grounded in scene geometry and object physics, and (ii) model outputs can vary for identical prompts, undermining execution reliability. We propose ViReSkill, a framework that pairs vision-grounded replanning with a skill memory for accumulation and reuse. When a failure occurs, the replanner generates a new action sequence conditioned on the current scene, tailored to the observed state. On success, the executed plan is stored as a reusable skill and replayed in future encounters without additional calls to LLMs/VLMs. This feedback loop enables autonomous continual learning: each attempt immediately expands the skill set and stabilizes subsequent executions. We evaluate ViReSkill on simulators such as LIBERO and RLBench as well as on a physical robot. Across all settings, it consistently outperforms conventional baselines in task success rate, demonstrating robust sim-to-real generalization.
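The abstract describes a simple feedback loop: reuse a stored skill when one exists, otherwise ask the LLM/VLM for a scene-conditioned plan, replan on failure, and cache the plan on success. The sketch below illustrates that loop under stated assumptions; the names (SkillMemory, vlm_replan, execute, observe) are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a ViReSkill-style replanning loop with skill memory.
# All names here are illustrative assumptions, not the paper's actual API.

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class SkillMemory:
    """Caches successfully executed plans, keyed by task description."""
    skills: Dict[str, Any] = field(default_factory=dict)

    def lookup(self, task: str) -> Optional[Any]:
        return self.skills.get(task)

    def store(self, task: str, plan: Any) -> None:
        self.skills[task] = plan


def vlm_replan(task: str, observation: Any) -> Any:
    """Placeholder: query an LLM/VLM for a plan grounded in the current scene."""
    raise NotImplementedError("plug in your planner model here")


def execute(plan: Any) -> bool:
    """Placeholder: run the plan on the robot and report success or failure."""
    raise NotImplementedError("plug in your robot execution stack here")


def attempt_task(task: str,
                 observe: Callable[[], Any],
                 memory: SkillMemory,
                 max_retries: int = 3) -> bool:
    # Reuse a stored skill if available, avoiding an extra LLM/VLM call.
    plan = memory.lookup(task) or vlm_replan(task, observe())
    for _ in range(max_retries):
        if execute(plan):
            # Success: store the executed plan as a reusable skill.
            memory.store(task, plan)
            return True
        # Failure: replan conditioned on the newly observed scene state.
        plan = vlm_replan(task, observe())
    return False
```

Each successful attempt expands the skill set, so later encounters with the same task replay the cached plan deterministically instead of re-querying the model, which is the mechanism the abstract credits for stabilizing execution.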
Sep-30-2025
- Country:
  - Asia > Japan (0.04)
  - Asia > Singapore > Central Region > Singapore (0.04)
- Genre:
  - Research Report (0.50)
  - Workflow (0.48)
- Industry:
  - Education (1.00)
- Technology:
  - Information Technology > Artificial Intelligence > Machine Learning (1.00)
  - Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
  - Information Technology > Artificial Intelligence > Robots (1.00)