Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback

Meng, Yuan, Yao, Xiangtong, Ye, Haihui, Zhou, Yirui, Zhang, Shengqiang, Bing, Zhenshan, Knoll, Alois

Mar-27-2025–arXiv.org Artificial Intelligence

Our framework demonstrates state-of-the-art performance across diverse long-horizon tasks, achieving strong generalization in both simulated and real-world scenarios. Videos and code are available at https://ghiara.github.io/DAHLIA/.

I. INTRODUCTION

Language-conditioned robotic manipulation is an emerging field at the intersection of robotics, natural language processing, and computer vision, which aims to enable robots to interpret human commands and perform complex tasks using multi-modal sensing [1]. Imitation learning (IL) and reinforcement learning (RL) have traditionally been the dominant approaches for training robotic manipulation policies. However, recent IL and RL methods are often constrained to narrow task distributions, leading to sampling inefficiency and high sensitivity to distributional shifts, which limits their ability to generalize to diverse and complex scenarios. Additionally, both IL and RL are data-driven, requiring large-scale expert demonstrations, yet Internet-scale data collection for embodied AI remains a substantial challenge. In contrast, the natural language processing domain has seen state-of-the-art (SOTA) LLMs like GPT [2] and Llama [3] achieve humanlike semantic understanding and common sense reasoning by training on massive datasets. Within embodied AI, LLMs offer a promising solution to bridge the gap between high-level language instructions and low-level robotic control,

1 Yuan Meng, Xiangtong Yao, Haihui Ye, Yirui Zhou, and Alois Knoll are with the School of Computation, Information and Technology, Technical University of Munich, Germany.
2 Shengqiang Zhang is with the Center for Information and Language Processing, Ludwig Maximilian University of Munich, Germany.
3 Zhenshan Bing is with the State Key Laboratory for Novel Software Technology, Nanjing University, China.



Country:
- Europe > Germany
  - Bavaria > Upper Bavaria > Munich (0.44)
- Asia > China
  - Jiangsu Province > Nanjing (0.24)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.71)
