VLM-driven Skill Selection for Robotic Assembly Tasks
Jeong-Jung Kim, Doo-Yeol Koh, Chang-Hyun Kim
arXiv.org Artificial Intelligence
Robotic assembly tasks represent one of the most challenging problems in robotics, requiring precise manipulation capabilities combined with sophisticated reasoning about complex multi-step processes. Unlike simple pick-and-place tasks, assembly tasks demand long-term planning that spans multiple sequential actions, where each step must be carefully coordinated with previous and subsequent operations. Furthermore, these tasks require physical understanding of component interactions and spatial relationships between parts [1], [2], [3]. Vision-Language Models (VLMs) have emerged as powerful tools that bridge visual perception and high-level reasoning, offering significant advantages for robotic applications. These models excel at processing visual information while understanding natural language instructions, making them well-suited for complex manipulation tasks.
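The abstract describes a VLM choosing among manipulation skills from visual and language input. The paper's actual interface is not given here, so the following is only a minimal illustrative sketch: the skill names, the `SKILL_LIBRARY` dictionary, and the `query_vlm` placeholder are all assumptions, and the VLM call is stubbed with a canned answer so the sketch runs offline.

```python
# Hypothetical sketch of VLM-driven skill selection. SKILL_LIBRARY and
# query_vlm are illustrative assumptions, not the paper's actual API.
# The VLM is prompted with the task instruction and a scene description
# and must answer with one skill name from a fixed library; the parsed
# answer then indexes into that library.

SKILL_LIBRARY = {
    "pick": lambda part: f"picking {part}",
    "insert": lambda part: f"inserting {part}",
    "screw": lambda part: f"screwing {part}",
}

def build_prompt(instruction: str, scene: str) -> str:
    """Constrain the VLM to answer with exactly one known skill name."""
    skills = ", ".join(SKILL_LIBRARY)
    return (
        f"Task: {instruction}\nScene: {scene}\n"
        f"Choose exactly one skill from [{skills}] and reply with its name."
    )

def query_vlm(prompt: str) -> str:
    # Placeholder for a real VLM call (an API request that would also
    # attach a camera image). Canned answer so the sketch runs offline.
    return "insert"

def select_skill(instruction: str, scene: str):
    """Ask the (stubbed) VLM for a skill and validate its answer."""
    answer = query_vlm(build_prompt(instruction, scene)).strip().lower()
    if answer not in SKILL_LIBRARY:
        raise ValueError(f"VLM returned unknown skill: {answer!r}")
    return SKILL_LIBRARY[answer]

print(select_skill("assemble the peg", "peg held above the hole")("peg"))
# → inserting peg
```

For a multi-step assembly, this selection would run in a loop, re-querying the VLM after each executed skill so the next choice is conditioned on the updated scene, which is the long-horizon coordination the abstract emphasizes.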
Nov-11-2025