SegVec3D: A Method for Vector Embedding of 3D Objects Oriented Towards Robot Manipulation

Kang, Zhihan; Wang, Boyu

arXiv.org Artificial Intelligence 

Because point clouds are inherently sparse, unordered, and unstructured, instance-level semantic understanding of them remains challenging, particularly under limited supervision and cross-modal semantic ambiguity. To address these issues, we propose SegVec3D, a novel framework that integrates attention mechanisms, embedding learning, and cross-modal alignment for 3D point cloud instance segmentation. The approach first builds a hierarchical instance feature extractor based on spatial adjacency and attention computation, which improves the model's ability to capture fine-grained geometric structures. It then introduces a high-dimensional embedding space in which a contrastive-learning-based clustering mechanism performs unsupervised instance segmentation. Furthermore, a shared cross-modal semantic space aligns 3D data with natural language descriptions, allowing zero-shot understanding and retrieval of 3D objects from text queries. The model is deployed and validated in realistic scenarios, demonstrating strong generalizability and engineering feasibility. While recent methods such as Mask3D [40] and ULIP [10][11] have advanced 3D segmentation and vision-language pre-training respectively, our approach integrates the two domains by enabling instance segmentation with minimal labeling and by aligning point clouds directly with language. Experimental evaluations confirm that the proposed method achieves high semantic discriminability, robust multi-modal alignment, and practical deployability. It supports weakly supervised or unsupervised 3D instance understanding, providing a promising foundation for future multi-modal cognitive robotic systems.
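
The abstract does not specify the loss or the retrieval procedure, so the following is only a minimal sketch of how a shared cross-modal embedding space is commonly trained and queried. It assumes a symmetric InfoNCE contrastive objective and cosine-similarity retrieval; these are stand-ins, not the authors' confirmed method. The function names (`l2_normalize`, `info_nce`, `retrieve`), the temperature value, and the toy embeddings are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products equal cosine similarity.
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def info_nce(point_emb, text_emb, temperature=0.07):
    # Assumed objective: symmetric InfoNCE, where row i of point_emb is the
    # positive match for row i of text_emb and all other rows are negatives.
    p = l2_normalize(point_emb)
    t = l2_normalize(text_emb)
    logits = p @ t.T / temperature  # (N, N) scaled cosine similarities

    def cross_entropy_diag(z):
        # Cross-entropy with the diagonal entries as the target classes.
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the 3D-to-text and text-to-3D directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

def retrieve(text_query_emb, object_embs, k=1):
    # Zero-shot retrieval: rank object embeddings by cosine similarity
    # to a single text-query embedding and return the top-k indices.
    sims = l2_normalize(object_embs) @ l2_normalize(text_query_emb)
    return np.argsort(-sims)[:k]

# Toy data (hypothetical): three objects with orthogonal embeddings.
object_embs = np.eye(3)
aligned_text = np.eye(3)                        # text i matches object i
shuffled_text = np.roll(np.eye(3), 1, axis=0)   # text i matches a wrong object

loss_aligned = info_nce(object_embs, aligned_text)
loss_shuffled = info_nce(object_embs, shuffled_text)
top_hit = retrieve(np.array([0.1, 0.9, 0.0]), object_embs, k=1)
```

In this toy setup the correctly paired embeddings yield a lower loss than the shuffled pairing, and the query vector closest to the second axis retrieves the second object. Real point-cloud and text encoders would replace the identity matrices used here.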
