Visual Prompting for Robotic Manipulation with Annotation-Guided Pick-and-Place Using ACT

Muttaqien, Muhammad A., Motoda, Tomohiro, Hanai, Ryo, Domae, Yukiyasu

Aug-13-2025–arXiv.org Artificial Intelligence

Embodied AI Research Team, National Institute of AIST, Tokyo, Japan
muha.muttaqien@aist.go.jp, tomohiro.motoda@aist.go.jp, ryo.hanai@aist.go.jp

Abstract: Robotic pick-and-place tasks in convenience stores are challenging because of dense object arrangements, occlusions, and variation in object properties such as color, shape, size, and texture. These factors complicate trajectory planning and grasping. This paper introduces a perception-action pipeline built on annotation-guided visual prompting, in which bounding-box annotations mark both the objects to pick and their placement locations, providing structured spatial guidance. Instead of traditional step-by-step planning, we use Action Chunking with Transformers (ACT), an imitation learning algorithm that lets the robotic arm predict chunked action sequences learned from human demonstrations. We evaluate the system by success rate and by visual analysis of grasping behavior, and the results show improved grasp accuracy and adaptability in retail environments.

Robotic pick-and-place tasks are essential in many industrial and retail applications, particularly in convenience stores, where robots must handle a diverse range of products with different shapes, sizes, textures, and colors, as shown in Figure 1. However, real-world pick-and-place scenarios pose significant challenges due to dense object arrangements, frequent occlusions, and the need for precise grasping and placement.
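The abstract says ACT predicts chunked action sequences rather than a single action per control step. The sketch below shows one way overlapping chunks can be turned into executed actions. It assumes the temporal-ensembling scheme from the original ACT paper (Zhao et al., 2023), where each timestep's action is a weighted average of every chunk that predicted it, with weights w_i = exp(-m * i) and i = 0 the oldest prediction. The abstract does not say whether this paper enables temporal ensembling or which chunk size and `m` it uses, so `TemporalEnsembler`, `chunk_size`, `m`, and the stand-in policy are hypothetical illustrations, not the authors' code.

```python
import numpy as np


class TemporalEnsembler:
    """Execute overlapping ACT-style action chunks via temporal ensembling.

    Assumption: follows the original ACT weighting, where predictions for the
    same timestep are averaged with w_i = exp(-m * i) and i = 0 is the oldest
    prediction. Class and parameter names are hypothetical.
    """

    def __init__(self, chunk_size: int, action_dim: int, m: float = 0.01) -> None:
        self.chunk_size = chunk_size
        self.action_dim = action_dim
        self.m = m
        # Maps a timestep to every action predicted for it, oldest query first.
        self.predictions: dict[int, list[np.ndarray]] = {}

    def add_chunk(self, t: int, chunk: np.ndarray) -> None:
        """Store a chunk predicted at step t; row j is the action for step t + j."""
        chunk = np.asarray(chunk, dtype=float)
        if chunk.shape != (self.chunk_size, self.action_dim):
            raise ValueError(
                f"expected shape {(self.chunk_size, self.action_dim)}, got {chunk.shape}"
            )
        for offset, action in enumerate(chunk):
            self.predictions.setdefault(t + offset, []).append(action)

    def action(self, t: int) -> np.ndarray:
        """Return the ensembled action for step t and discard its stored predictions."""
        preds = np.stack(self.predictions.pop(t))  # shape (n, action_dim), oldest first
        weights = np.exp(-self.m * np.arange(len(preds)))
        weights /= weights.sum()  # normalize so the weights sum to 1
        return weights @ preds  # weighted average over the n predictions


if __name__ == "__main__":
    # Toy rollout with a hypothetical stand-in for a trained ACT policy:
    # the chunk queried at step t predicts 10*t + j for step t + j.
    ens = TemporalEnsembler(chunk_size=3, action_dim=1, m=0.0)  # m=0: plain mean
    for t in range(4):
        fake_chunk = 10.0 * t + np.arange(3, dtype=float).reshape(3, 1)
        ens.add_chunk(t, fake_chunk)
        print(t, ens.action(t))
```

With `m = 0` every overlapping prediction counts equally; a larger `m` puts more weight on older predictions, while a smaller `m` lets newer observations influence the executed action sooner. The ACT paper reports that ensembling in this way gives smoother motion than executing each chunk open-loop, at the cost of one policy forward pass per control step.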

artificial intelligence, machine learning, robot

Country:
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (1.00)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Robots > Robots in the Workplace (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
