OmniMap: A General Mapping Framework Integrating Optics, Geometry, and Semantics
Deng, Yinan, Yue, Yufeng, Dou, Jianyu, Zhao, Jingyu, Wang, Jiahui, Tang, Yujie, Yang, Yi, Fu, Mengyin
arXiv.org Artificial Intelligence
Figure 1: We introduce OmniMap, a general online mapping framework integrating optics, geometry, and semantics. OmniMap incrementally maintains an open-vocabulary instance-level voxel representation and a 3DGS (3D Gaussian Splatting) representation, from which color and geometric meshes are derived. OmniMap supports multi-modal rendering (RGB / depth / normal / instance), and achieves state-of-the-art performance in rendering fidelity, mesh quality, and semantic understanding. This holistic framework enables versatile support for a wide range of downstream applications.

Abstract: Robotic systems demand accurate and comprehensive 3D environment perception, requiring simultaneous capture of photo-realistic appearance (optical), precise layout shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically fulfill these requirements only partially, and exhibit optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap. Overall, OmniMap represents the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the implementation level, OmniMap identifies key challenges across different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap's superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared to state-of-the-art methods across diverse scenes. The framework's versatility is further evidenced through a variety of downstream applications, including multi-domain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.

The quality of a robot's 3D environmental representation, measured by its accuracy and dimensionality, fundamentally affects the robot's operational performance and task execution capabilities.

This work is supported by the National Natural Science Foundation of China under Grants 92370203, 62473050, and 62233002, and by the Beijing Natural Science Foundation Undergraduate Research Program QY24180. Mengyin Fu is with the School of Automation, Beijing Institute of Technology, Beijing 100081, China, and the School of Automation, Nanjing University of Science and Technology, Nanjing 210018, China (e-mail: fumy@bit.edu.cn). The project page of OmniMap is available at https://omni-map.github.io/.
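The abstract names "probabilistic fusion for robust instance-level understanding" without describing it. As a rough illustration only, the idea of fusing noisy per-frame instance labels into a voxel map could be sketched as confidence-weighted voting normalised into a categorical distribution per voxel. Everything here (the class `VoxelInstanceFusion`, its methods, the voxel keys, and the confidence values) is hypothetical and is not taken from OmniMap; the paper's actual fusion rule may differ substantially.

```python
from collections import defaultdict


class VoxelInstanceFusion:
    """Hypothetical sketch: per-voxel belief over instance IDs.

    Not OmniMap's implementation. Each observation adds its detection
    confidence as a vote; the votes are normalised into a distribution.
    """

    def __init__(self):
        # voxel key -> {instance_id: accumulated confidence}
        self.weights = defaultdict(lambda: defaultdict(float))

    def observe(self, voxel, instance_id, confidence):
        """Fuse one per-frame observation of `instance_id` at `voxel`."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self.weights[voxel][instance_id] += confidence

    def posterior(self, voxel):
        """Return the normalised belief over instance IDs for `voxel`."""
        votes = self.weights.get(voxel)
        if not votes:
            return {}
        total = sum(votes.values())
        if total == 0.0:
            return {}
        return {inst: w / total for inst, w in votes.items()}

    def best_instance(self, voxel):
        """Return the most probable instance ID, or None if unobserved."""
        post = self.posterior(voxel)
        return max(post, key=post.get) if post else None


# Usage: two confident "chair" observations outweigh one noisy "table".
fusion = VoxelInstanceFusion()
fusion.observe((0, 0, 0), "chair", 0.9)
fusion.observe((0, 0, 0), "chair", 0.8)
fusion.observe((0, 0, 0), "table", 0.6)
print(fusion.best_instance((0, 0, 0)))
```

The point of accumulating evidence rather than overwriting labels is that a single mis-segmented frame cannot flip a voxel's instance assignment once several consistent observations have been fused.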
Sep-10-2025
- Genre:
  - Research Report > Promising Solution (0.34)
- Industry:
  - Education (0.54)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Robots (1.00)
    - Natural Language > Text Processing (0.92)
    - Representation & Reasoning > Uncertainty
      - Bayesian Inference (0.46)
    - Machine Learning
      - Learning Graphical Models (0.93)
      - Neural Networks > Deep Learning (0.46)