VLM-driven Skill Selection for Robotic Assembly Tasks
Jeong-Jung Kim, Doo-Yeol Koh, Chang-Hyun Kim
arXiv.org Artificial Intelligence
Robotic assembly tasks represent one of the most challenging problems in robotics, requiring precise manipulation capabilities combined with sophisticated reasoning about complex multi-step processes. Unlike simple pick-and-place tasks, assembly tasks demand long-term planning that spans multiple sequential actions, where each step must be carefully coordinated with previous and subsequent operations. Furthermore, these tasks require physical understanding of component interactions and spatial relationships between parts [1], [2], [3]. Vision-Language Models (VLMs) have emerged as powerful tools that bridge visual perception and high-level reasoning, offering significant advantages for robotic applications. These models excel at processing visual information while understanding natural language instructions, making them well-suited for complex manipulation tasks.
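The abstract describes a VLM choosing among manipulation skills from visual and language input. The paper's actual interface is not given here, so the following is only a minimal illustrative sketch: the skill names, the `SKILL_LIBRARY` dictionary, and the `query_vlm` placeholder are all assumptions, and the VLM call is stubbed with a canned answer so the sketch runs offline.

```python
# Hypothetical sketch of VLM-driven skill selection. SKILL_LIBRARY and
# query_vlm are illustrative assumptions, not the paper's actual API.
# The VLM is prompted with the task instruction and a scene description
# and must answer with one skill name from a fixed library; the parsed
# answer then indexes into that library.

SKILL_LIBRARY = {
    "pick": lambda part: f"picking {part}",
    "insert": lambda part: f"inserting {part}",
    "screw": lambda part: f"screwing {part}",
}

def build_prompt(instruction: str, scene: str) -> str:
    """Constrain the VLM to answer with exactly one known skill name."""
    skills = ", ".join(SKILL_LIBRARY)
    return (
        f"Task: {instruction}\nScene: {scene}\n"
        f"Choose exactly one skill from [{skills}] and reply with its name."
    )

def query_vlm(prompt: str) -> str:
    # Placeholder for a real VLM call (an API request that would also
    # attach a camera image). Canned answer so the sketch runs offline.
    return "insert"

def select_skill(instruction: str, scene: str):
    """Ask the (stubbed) VLM for a skill and validate its answer."""
    answer = query_vlm(build_prompt(instruction, scene)).strip().lower()
    if answer not in SKILL_LIBRARY:
        raise ValueError(f"VLM returned unknown skill: {answer!r}")
    return SKILL_LIBRARY[answer]

print(select_skill("assemble the peg", "peg held above the hole")("peg"))
# → inserting peg
```

For a multi-step assembly, this selection would run in a loop, re-querying the VLM after each executed skill so the next choice is conditioned on the updated scene, which is the long-horizon coordination the abstract emphasizes.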
Nov-11-2025