GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning
Nguyen, Huy Hoang; Vuong, An; Nguyen, Anh; Reid, Ian; Vu, Minh Nhat
arXiv.org Artificial Intelligence
Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba is the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Extensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic experiments, highlighting its fast inference speed.
Sep-22-2024
- Country:
  - Europe (0.14)
- Genre:
  - Research Report (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning > Neural Networks
        - Deep Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (1.00)
      - Robots (1.00)
      - Vision (1.00)
    - Sensing and Signal Processing > Image Processing (1.00)