SegVec3D: A Method for Vector Embedding of 3D Objects Oriented Towards Robot Manipulation

Jul-15-2025–arXiv.org Artificial Intelligence

However, because point clouds are inherently sparse, unordered, and unstructured, instance-level semantic understanding of them remains challenging, particularly under limited supervision and cross-modal semantic ambiguity. To address these issues, we propose SegVec3D, a novel framework that integrates attention mechanisms, embedding learning, and cross-modal alignment for 3D point cloud instance segmentation. The approach first builds a hierarchical instance feature extractor based on spatial adjacency and attention computation, which improves the model's ability to capture fine-grained geometric structures. It then introduces a high-dimensional embedding space in which instances are segmented without supervision through a contrastive-learning-based clustering mechanism. Furthermore, a shared cross-modal semantic space is constructed to align 3D data with natural language descriptions, allowing zero-shot understanding and retrieval of 3D objects from text queries. The model is ultimately deployed and validated in realistic scenarios, demonstrating strong generalizability and engineering feasibility. While recent methods such as Mask3D [40] and ULIP [10][11] have advanced 3D segmentation and vision-language pre-training respectively, our approach integrates the two domains by enabling instance segmentation with minimal labeling and by aligning point clouds directly with language. Experimental evaluations confirm that the proposed method achieves high semantic discriminability, robust multi-modal alignment, and practical deployability. It supports weakly supervised or unsupervised 3D instance understanding, providing a promising foundation for future multi-modal cognitive robotic systems.
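The shared cross-modal space described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss or the retrieval rule, so the code below assumes an InfoNCE-style contrastive objective and cosine-similarity ranking, both common choices rather than the authors' confirmed method. All names (`info_nce`, `retrieve`, the toy embeddings and labels) are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def info_nce(shape_embs, text_embs, temperature=0.07):
    """Mean InfoNCE loss (shape -> text), assuming pair i is the match.

    Assumption: the paper's actual objective may differ.
    """
    total = 0.0
    for i, shape in enumerate(shape_embs):
        logits = [cosine(shape, text) / temperature for text in text_embs]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(logit - peak) for logit in logits)
        )
        total += log_denominator - logits[i]
    return total / len(shape_embs)


def retrieve(query_emb, instance_embs):
    """Rank instance names by descending cosine similarity to the query."""
    scored = [
        (cosine(query_emb, emb), name) for name, emb in instance_embs.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy 3-dimensional embeddings (hypothetical; real embeddings would come
# from trained point cloud and text encoders).
instances = {
    "mug": [0.9, 0.1, 0.0],
    "chair": [0.0, 1.0, 0.2],
    "lamp": [0.1, 0.0, 1.0],
}
query = [1.0, 0.2, 0.1]  # stand-in for the text embedding of "a cup"

ranking = retrieve(query, instances)
print(ranking)
```

In this toy setup the text query lies closest to the "mug" embedding, so that instance is ranked first. The same `info_nce` function returns a lower loss when matched shape and text embeddings are paired than when the pairing is shuffled, which is the property a contrastive alignment objective is meant to reward.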

machine learning, natural language, object-oriented architecture


Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Representation & Reasoning > Object-Oriented Architecture (1.00)
  - Natural Language > Text Processing (0.93)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (0.46)
