XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
Ziyi Wang, Yanbo Wang
Neural Information Processing Systems
Subsequently, the generated 2D masks are employed to align mask-level 3D representations with the vision-language feature space, thereby augmenting the open vocabulary capability of 3D geometry embeddings.
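The mask-level alignment step can be sketched as mask-average pooling followed by a cosine-distance loss. This is a minimal illustration under stated assumptions, not the paper's actual objective. It assumes three inputs. The first is per-point 3D geometry features, already projected to the same dimension as the vision-language space. The second is per-point vision-language features, for example 2D embeddings lifted onto the points. The third is binary masks over the points, standing in for the generated 2D masks mapped into 3D. All function names (`mask_pool`, `cosine`, `mask_alignment_loss`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mask_pool(features: Sequence[Vector], mask: Sequence[bool]) -> Vector:
    """Average the feature vectors of the points selected by a binary mask."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no points")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against division by a zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)


def mask_alignment_loss(
    geo_feats: Sequence[Vector],
    vl_feats: Sequence[Vector],
    masks: Sequence[Sequence[bool]],
) -> float:
    """Mean (1 - cosine) between mask-pooled 3D geometry features and
    mask-pooled vision-language features (hypothetical alignment objective)."""
    losses = []
    for mask in masks:
        g = mask_pool(geo_feats, mask)  # mask-level 3D representation
        v = mask_pool(vl_feats, mask)   # mask-level vision-language target
        losses.append(1.0 - cosine(g, v))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    geo = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    vl = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    masks = [[True, True, False], [False, False, True]]
    print(mask_alignment_loss(geo, vl, masks))
```

Pooling at the mask level, rather than per point, lets each region be supervised by a single vision-language embedding. This matches the sentence's framing of aligning "mask-level 3D representations". A real implementation would typically operate on batched tensors and might use a different distance or a contrastive loss.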
Feb-16-2026, 10:31:48 GMT