CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation
Ming, Yuhang, Fang, Chenxin, Yu, Xingyuan, Zhang, Fan, Dai, Weichen, Kong, Wanzeng, Zhang, Guofeng
–arXiv.org Artificial Intelligence
Recent advances in Gaussian Splatting-based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geometry modeling, and structure-oriented approaches that capture spatial structures yet provide limited semantic abstraction. To bridge this gap, we present CUS-GS, a compact unified structured Gaussian Splatting representation that connects multimodal semantic features with structured 3D geometry. Specifically, we design a voxelized anchor structure that constructs a spatial scaffold, and we extract multimodal semantic features from a set of foundation models (e.g., CLIP, DINOv2, SEEM). Moreover, we introduce a multimodal latent feature allocation mechanism that unifies appearance, geometry, and semantics across heterogeneous feature spaces, ensuring a consistent representation across multiple foundation models. Finally, we propose a feature-aware significance evaluation strategy that dynamically guides anchor growing and pruning, removing redundant or invalid anchors while maintaining semantic integrity. Extensive experiments show that CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters, roughly six times fewer than the 35M of the closest rival, highlighting the favorable trade-off between performance and model efficiency of the proposed framework.
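The abstract does not spell out how the voxelized anchor scaffold is built. As a minimal sketch of one common approach (an assumption, not necessarily the authors' method), the snippet below snaps input points to a regular grid and keeps one anchor at the center of each occupied voxel. The function name `voxelize_anchors` and the parameter `voxel_size` are hypothetical and chosen for illustration only.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]


def voxelize_anchors(points: Iterable[Point], voxel_size: float) -> List[Point]:
    """Return one anchor per occupied voxel, placed at the voxel center.

    Illustrative only: the grid layout and center placement are assumptions.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    # Integer grid index of the voxel containing each point; the set
    # collapses points that fall into the same voxel.
    occupied = {
        tuple(math.floor(c / voxel_size) for c in p)
        for p in points
    }
    # Convert each occupied grid index back to the voxel's center coordinate.
    return sorted(
        tuple((i + 0.5) * voxel_size for i in idx)
        for idx in occupied
    )


# Hypothetical input: four points, two of which share a voxel.
points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.1, 0.1), (-0.3, 0.0, 0.9)]
anchors = voxelize_anchors(points, voxel_size=1.0)
print(anchors)
```

Here the four points collapse to three anchors because the first two share a voxel. In a scaffold of this kind, a larger `voxel_size` would give fewer anchors and a smaller model, at the cost of coarser spatial resolution.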
Nov-25-2025
- Country:
  - Asia
    - China > Zhejiang Province
      - Hangzhou (0.04)
    - Japan > Honshū
      - Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
  - Europe > Italy
  - North America > United States
    - California > San Diego County > San Diego (0.04)
- Genre:
  - Research Report (1.00)
- Technology: