slam
Collaborative Dynamic 3D Scene Graphs for Open-Vocabulary Urban Scene Understanding
Steinke, Tim, Büchner, Martin, Vödisch, Niclas, Valada, Abhinav
Mapping and scene representation are fundamental to reliable planning and navigation in mobile robots. While purely geometric maps using voxel grids allow for general navigation, obtaining up-to-date spatial and semantically rich representations that scale to dynamic large-scale environments remains challenging. In this work, we present CURB-OSG, an open-vocabulary dynamic 3D scene graph engine that generates hierarchical decompositions of urban driving scenes via multi-agent collaboration. By fusing the camera and LiDAR observations from multiple perceiving agents with unknown initial poses, our approach generates more accurate maps compared to a single agent while constructing a unified open-vocabulary semantic hierarchy of the scene. Unlike previous methods that rely on ground truth agent poses or are evaluated purely in simulation, CURB-OSG alleviates these constraints. We evaluate the capabilities of CURB-OSG on real-world multi-agent sensor data obtained from multiple sessions of the Oxford Radar RobotCar dataset. We demonstrate improved mapping and object prediction accuracy through multi-agent collaboration as well as evaluate the environment partitioning capabilities of the proposed approach. To foster further research, we release our code and supplementary material at https://ov-curb.cs.uni-freiburg.de.
GigaSLAM: Large-Scale Monocular SLAM with Hierachical Gaussian Splats
Deng, Kai, Yang, Jian, Wang, Shenlong, Xie, Jin
Tracking and mapping in large-scale, unbounded outdoor environments using only monocular RGB input presents substantial challenges for existing SLAM systems. Traditional Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) SLAM methods are typically limited to small, bounded indoor settings. To overcome these challenges, we introduce GigaSLAM, the first NeRF/3DGS-based SLAM framework for kilometer-scale outdoor environments, as demonstrated on the KITTI and KITTI 360 datasets. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes. For front-end tracking, GigaSLAM utilizes a metric depth model combined with epipolar geometry and PnP algorithms to accurately estimate poses, while incorporating a Bag-of-Words-based loop closure mechanism to maintain robust alignment over long trajectories. Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments.
- Asia > China (0.28)
- North America > United States > Illinois (0.14)
- Europe > Switzerland (0.14)
Semantic Mapping in Indoor Embodied AI -- A Comprehensive Survey and Future Directions
Raychaudhuri, Sonia, Chang, Angel X.
Among many skills that the agents need to possess, building and maintaining a semantic map of the environment is most crucial in long-horizon tasks. A semantic map captures information about the environment in a structured way, allowing the agent to reference it for advanced reasoning throughout the task. While existing surveys in embodied AI focus on general advancements or specific tasks like navigation and manipulation, this paper provides a comprehensive review of semantic map-building approaches in embodied AI, specifically for indoor navigation. We categorize these approaches based on their structural representation (spatial grids, topological graphs, dense point-clouds or hybrid maps) and the type of information they encode (implicit features or explicit environmental data). We also explore the strengths and limitations of the map building techniques, highlight current challenges, and propose future research directions. We identify that the field is moving towards developing open-vocabulary, queryable, task-agnostic map representations, while high memory demands and computational inefficiency still remaining to be open challenges. This survey aims to guide current and future researchers in advancing semantic mapping techniques for embodied AI systems.
Set-Type Belief Propagation with Applications to Poisson Multi-Bernoulli SLAM
Kim, Hyowon, García-Fernández, Angel F., Ge, Yu, Xia, Yuxuan, Svensson, Lennart, Wymeersch, Henk
Belief propagation (BP) is a useful probabilistic inference algorithm for efficiently computing approximate marginal probability densities of random variables. However, in its standard form, BP is only applicable to the vector-type random variables with a fixed and known number of vector elements, while certain applications rely on RFSs with an unknown number of vector elements. In this paper, we develop BP rules for factor graphs defined on sequences of RFSs where each RFS has an unknown number of elements, with the intention of deriving novel inference methods for RFSs. Furthermore, we show that vector-type BP is a special case of set-type BP, where each RFS follows the Bernoulli process. To demonstrate the validity of developed set-type BP, we apply it to the PMB filter for SLAM, which naturally leads to new set-type BP-mapping, SLAM, multi-target tracking, and simultaneous localization and tracking filters. Finally, we explore the relationships between the vector-type BP and the proposed set-type BP PMB-SLAM implementations and show a performance gain of the proposed set-type BP PMB-SLAM filter in comparison with the vector-type BP-SLAM filter.
- North America > United States (0.28)
- Europe > United Kingdom (0.28)
- Europe > Germany (0.14)
- (7 more...)
New AI creates bird's-eye view map faster, brings safer autonomous vehicles a step closer
Award-winning research from the University of Surrey that uses artificial intelligence (AI) to instantly and accurately translate two-dimensional images into a bird's-eye view map faster, brings the prospect of safer autonomous vehicles closer to reality. Surrey's new AI model produces results that are 15% more accurate than other technologies on the market. Avishkar Saha, co-author of the study at the University of Surrey, said, "Safety is one of the key hurdles preventing autonomous vehicles from becoming a reality. It is crucial for such vehicles to build maps of the world instantly and accurately, so they know where it is safe to drive. "Our model exploits the one-to-one correspondence between a vertical line in an image and rays passing through the camera location in an overhead map.