Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen

Feb-18-2026, 02:01:26 GMT–Neural Information Processing Systems

Foundation models have achieved remarkable results in 2D and language tasks such as image segmentation, object detection, and visual-language understanding. However, their potential to enrich 3D scene representation learning remains largely untapped because of the domain gap between these modalities and 3D data. In this work, we propose Bridge3D, a methodology that addresses this gap by pre-training 3D models using features, semantic masks, and captions sourced from foundation models. Specifically, our method uses semantic masks from foundation models to guide the masking and reconstruction process of the masked autoencoder, enabling more focused attention on foreground representations.
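The abstract does not spell out how the semantic masks steer the masking step, so the following is only an illustrative sketch of one way foreground-guided masking could work, not Bridge3D's actual implementation. It assumes each 3D patch already carries a foreground/background flag (for example, derived from 2D semantic masks), and it masks foreground patches more often than background ones. The function name and the parameters `mask_ratio` and `fg_weight` are hypothetical.

```python
import random


def foreground_guided_mask(is_foreground, mask_ratio=0.6, fg_weight=4.0, seed=0):
    """Return a boolean mask over patches, favouring foreground patches.

    is_foreground: list of bools, one per patch (assumed to come from
    semantic masks; how they are obtained is outside this sketch).
    mask_ratio, fg_weight: hypothetical knobs, not values from the paper.
    """
    rng = random.Random(seed)
    num_patches = len(is_foreground)
    num_masked = round(mask_ratio * num_patches)

    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw key = u ** (1 / weight) per patch and keep the largest keys.
    keys = []
    for idx, fg in enumerate(is_foreground):
        weight = fg_weight if fg else 1.0
        keys.append((rng.random() ** (1.0 / weight), idx))
    keys.sort(reverse=True)

    masked = {idx for _, idx in keys[:num_masked]}
    return [idx in masked for idx in range(num_patches)]


if __name__ == "__main__":
    flags = [True, True, False, False, True, False, False, False]
    print(foreground_guided_mask(flags, mask_ratio=0.5))
```

With `fg_weight` above 1, foreground patches are more likely to be hidden, so the reconstruction objective spends more of its capacity on them; setting `fg_weight=1.0` falls back to uniform random masking.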

artificial intelligence, machine learning, natural language

Country:
- Asia > Middle East
  - Israel (0.04)
  - Jordan (0.04)

Genre:
- Research Report > Promising Solution (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (0.94)
