A Multi-Modal Interaction Framework for Efficient Human-Robot Collaborative Shelf Picking

Pathak, Abhinav, Venkatesan, Kalaichelvi, Taha, Tarek, Muthusamy, Rajkumar

Apr-10-2025–arXiv.org Artificial Intelligence

In this paper, we propose a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division for enhanced human-robot teamwork. The framework enables the robot to recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback. It is powered by a Large Language Model (LLM) that uses Chain of Thought (CoT) reasoning together with a physics-based simulation engine to safely retrieve boxes from cluttered stacks on shelves, and a relationship graph for sub-task generation, extraction sequence planning, and decision making.
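
The abstract does not say how the relationship graph is encoded or how the extraction sequence is derived from it. As a minimal sketch only, assuming the graph records which boxes rest directly on which (the `on_top_of` mapping and the `extraction_sequence` function are hypothetical names, not taken from the paper), one plausible way to order removals is a topological sort in which every box stacked above the target is removed before the target itself. The LLM and physics-simulation components are not modeled here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extraction_sequence(on_top_of, target):
    """Return a removal order in which `target` comes out last.

    `on_top_of` maps each box to the set of boxes resting directly on it.
    This encoding is an assumption made for illustration.
    """
    # Collect the target and every box stacked above it, directly or indirectly.
    needed = set()
    stack = [target]
    while stack:
        box = stack.pop()
        if box in needed:
            continue
        needed.add(box)
        stack.extend(on_top_of.get(box, ()))

    # TopologicalSorter expects node -> predecessors. A box's predecessors are
    # the boxes that must be removed first, i.e. those resting on it.
    graph = {box: set(on_top_of.get(box, ())) for box in needed}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical shelf: B and C rest on A, D rests on B, F rests on E.
shelf = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": set(), "E": {"F"}}
sequence = extraction_sequence(shelf, "A")
print(sequence)
```

Boxes unrelated to the target (here `E` and `F`) are left untouched, and a cycle in the mapping would raise `graphlib.CycleError`, which signals an inconsistent graph rather than a valid stack.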

collaboration, large language model, natural language, (17 more...)

Country:
- Asia > Middle East > UAE (0.15)

Genre:
- Research Report (0.65)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Model-Based Reasoning (0.69)
  - Robots > Humanoid Robots (0.65)
  - Natural Language > Large Language Model (0.58)