DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation

Taeyeop Lee, Gyuree Kang, Bowen Wen, Youngho Kim, Seunghyeok Back, In So Kweon, David Hyunchul Shim, Kuk-Jin Yoon

arXiv.org Artificial Intelligence 

Abstract-- Despite the prevalence of transparent object interactions in everyday human life, research on robotic manipulation of transparent objects remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most generalize poorly to novel objects and are insufficient for precise long-horizon robot manipulation. To address these limitations, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural language task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot in long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation.

I. INTRODUCTION

Transparent objects are prevalent across real-world environments, including laboratories, kitchens, and manufacturing facilities. However, conventional depth sensors often fail to perceive these objects accurately.