AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making

Li, Wenbo, Wang, Shiyi, Chen, Yiteng, Zhuang, Huiping, Wu, Qingyao

Jun-25-2025–arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) encode knowledge and reasoning capabilities for robotic manipulation within high-dimensional representation spaces. However, current approaches often project them into compressed intermediate representations, discarding important task-specific information such as fine-grained spatial or semantic details. To address this, we propose AntiGrounding, a new framework that reverses the instruction grounding process. It lifts candidate actions directly into the VLM representation space, renders trajectories from multiple views, and uses structured visual question answering for instruction-based decision making. This enables zero-shot synthesis of optimal closed-loop robot trajectories for new tasks. We also propose an offline policy refinement module that leverages past experience to enhance long-term performance. Experiments in both simulation and real-world environments show that our method outperforms baselines across diverse robotic manipulation tasks.
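The decision loop described in the abstract (propose candidate actions, render each trajectory from multiple views, score the renderings against the instruction, keep the best) can be sketched as follows. This is only an illustrative sketch under strong assumptions, not the authors' implementation: every name here (`sample_candidates`, `project`, `render_views`, `make_toy_scorer`, `select_best`) is hypothetical, the "rendering" is a bare pinhole projection of 3D waypoints rather than an overlay on camera images, and the scorer is a geometric stand-in for the structured visual question answering step that a real system would delegate to a VLM.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]
Trajectory = List[Point3]
Views = List[List[Point2]]


def sample_candidates(start: Point3, num_candidates: int, steps: int,
                      step_size: float, rng: random.Random) -> List[Trajectory]:
    """Sample random-walk candidate trajectories (hypothetical proposal step)."""
    candidates = []
    for _ in range(num_candidates):
        x, y, z = start
        traj = [start]
        for _ in range(steps):
            x += rng.uniform(-step_size, step_size)
            y += rng.uniform(-step_size, step_size)
            z += rng.uniform(-step_size, step_size)
            traj.append((x, y, z))
        candidates.append(traj)
    return candidates


def project(point: Point3, yaw: float, focal: float = 1.0,
            distance: float = 3.0) -> Point2:
    """Rotate the scene about the z-axis by `yaw`, then apply a pinhole projection.

    Assumes the point stays in front of the camera (depth > 0).
    """
    x, y, z = point
    xr = x * math.cos(yaw) - y * math.sin(yaw)
    yr = x * math.sin(yaw) + y * math.cos(yaw)
    depth = yr + distance
    return (focal * xr / depth, focal * z / depth)


def render_views(traj: Trajectory, yaws: Sequence[float]) -> Views:
    """Project one trajectory into each camera view (one 2D polyline per view)."""
    return [[project(p, yaw) for p in traj] for yaw in yaws]


def make_toy_scorer(goal: Point3, yaws: Sequence[float]) -> Callable[[Views], float]:
    """Stand-in for the VLM: higher score when the trajectory ends near the goal
    in every view. A real system would instead query a VLM about the renderings."""
    goal_views = [project(goal, yaw) for yaw in yaws]

    def score(views: Views) -> float:
        total = 0.0
        for view, (gu, gv) in zip(views, goal_views):
            u, v = view[-1]
            total += math.hypot(u - gu, v - gv)
        return -total

    return score


def select_best(candidates: List[Trajectory], yaws: Sequence[float],
                scorer: Callable[[Views], float]) -> Tuple[int, List[float]]:
    """Score every candidate on its multi-view rendering and return the argmax."""
    scores = [scorer(render_views(t, yaws)) for t in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(0)
    yaws = [0.0, math.pi / 2]  # two assumed camera viewpoints
    cands = sample_candidates((0.0, 0.0, 0.0), num_candidates=8, steps=5,
                              step_size=0.1, rng=rng)
    best, scores = select_best(cands, yaws, make_toy_scorer((0.3, 0.0, 0.3), yaws))
    print("selected candidate:", best)
```

In a closed-loop setting, a selection step of this kind would presumably be repeated after each executed segment using fresh observations; the sketch covers a single selection only.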

Genre:
- Research Report (1.00)

Industry:
- Energy (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Robots > Robot Planning & Action (0.92)
  - Natural Language > Large Language Model (0.89)
  - Representation & Reasoning
    - Spatial Reasoning (0.68)
    - Agents (0.67)
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)
