Supplementary Material for Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Anonymous Author(s)

Apr-30-2026, 09:17:13 GMT–Neural Information Processing Systems

1 Baseline: Point-MAE

The masking strategy is set to random, and the mask ratio m is 60%.

Embedding: To embed each masked point patch, Point-MAE replaces it with a learnable mask token whose weights are shared across all masked patches. For unmasked (i.e., visible) point patches, Point-MAE employs a lightweight PointNet [8] to extract features. The visible point patches P_v are thus embedded into visible tokens T_v:

T_v = PointNet(P_v)    (1)

Backbone: The backbone of Point-MAE is built entirely on standard Transformers, arranged as an asymmetric encoder-decoder. The encoder takes the visible tokens T_v as input and generates the encoded tokens T_e. In addition, Point-MAE incorporates positional embeddings into each Transformer block, thereby providing location information.
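The random masking and embedding steps described above can be illustrated with a minimal, dependency-free sketch. It is not the Point-MAE implementation: `random_mask`, `toy_pointnet`, and `embed_patches` are hypothetical names, the "PointNet" here is a toy stand-in (one shared per-point linear map followed by max-pooling), the mask token is a fixed zero vector rather than a learned parameter, and the token dimension of 4 is chosen only for readability.

```python
import random


def random_mask(num_patches, mask_ratio=0.6, seed=0):
    """Randomly split patch indices into (visible, masked) at the given ratio."""
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    perm = list(range(num_patches))
    rng.shuffle(perm)
    masked = sorted(perm[:num_masked])
    visible = sorted(perm[num_masked:])
    return visible, masked


def toy_pointnet(patch, weights):
    """Toy stand-in for PointNet (assumption): a shared per-point linear map,
    then a max-pool over the points of the patch."""
    feats = [
        [sum(w * c for w, c in zip(row, point)) for row in weights]
        for point in patch
    ]
    return [max(column) for column in zip(*feats)]


def embed_patches(patches, mask_ratio=0.6, dim=4, seed=0):
    """Embed visible patches into tokens T_v; masked patches get one shared token."""
    rng = random.Random(seed)
    # Hypothetical random weights; in a real model these would be learned.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(dim)]
    # A single vector reused for every masked patch (learnable in practice).
    mask_token = [0.0] * dim
    visible, masked = random_mask(len(patches), mask_ratio, seed)
    visible_tokens = {i: toy_pointnet(patches[i], weights) for i in visible}
    masked_tokens = {i: mask_token for i in masked}
    return visible_tokens, masked_tokens


# Usage: 10 patches of 5 random 3D points each, mask ratio m = 0.6.
data_rng = random.Random(1)
patches = [
    [[data_rng.random() for _ in range(3)] for _ in range(5)]
    for _ in range(10)
]
visible_tokens, masked_tokens = embed_patches(patches)
print(len(visible_tokens), len(masked_tokens))  # prints "4 6"
```

With m = 60%, only the 4 visible tokens would be passed to the encoder, which is what makes the encoder-decoder design asymmetric in cost.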



