AITopics | droid-slam

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)

Neural Information Processing Systems, Oct-8-2025, 23:01:34 GMT

7ac484b0f1a1719ad5be9aa8c8455fbb-Paper-Conference.pdf

artificial intelligence, droid-slam, machine learning

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Yadav, Yajat, Bharadwaj, Varun, Korrapati, Jathin, Baranwal, Tanish

VROOM - Visual Reconstruction over Onboard Multiview

arXiv.org Artificial Intelligence, Aug-26-2025

We introduce VROOM, a system for reconstructing 3D models of Formula 1 circuits using only onboard camera footage from racecars. Leveraging video data from the 2023 Monaco Grand Prix, we address video challenges such as high-speed motion and sharp cuts in camera frames. Our pipeline evaluates several reconstruction methods, namely DROID-SLAM, AnyCam, and MonST3R, and combines them with preprocessing techniques such as different masking strategies, temporal chunking, and resolution scaling to account for dynamic motion and computational constraints. We show that VROOM is able to partially recover track and vehicle trajectories in complex environments. These findings indicate the feasibility of using onboard video for scalable 4D reconstruction in real-world settings.
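The abstract names temporal chunking and resolution scaling but gives no parameters. The sketch below shows one plausible way to split a long onboard clip into overlapping windows that fit in GPU memory and to downscale frames before handing them to a reconstruction method. The helpers `chunk_indices` and `scaled_size`, the overlap between windows, and rounding dimensions down to a multiple of 8 are all assumptions for illustration, not VROOM's code.

```python
from typing import List, Tuple


def chunk_indices(n_frames: int, chunk_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split frame indices [0, n_frames) into half-open
    windows of at most chunk_len frames. Consecutive windows share `overlap`
    frames so their trajectories can later be aligned on the shared frames
    (an assumption, not a documented VROOM step)."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("require 0 <= overlap < chunk_len")
    stride = chunk_len - overlap
    chunks: List[Tuple[int, int]] = []
    start = 0
    while start < n_frames:
        end = min(start + chunk_len, n_frames)
        chunks.append((start, end))
        if end == n_frames:
            break
        start += stride
    return chunks


def scaled_size(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Hypothetical helper: shrink (never enlarge) so the longer side is at most
    max_side, keeping the aspect ratio with integer arithmetic, then round each
    side down to a multiple of 8 (assumed to suit the network's feature stride)."""
    longest = max(width, height)
    if longest > max_side:
        width = width * max_side // longest
        height = height * max_side // longest
    return max(8, width - width % 8), max(8, height - height % 8)


if __name__ == "__main__":
    # Illustrative values: a 10-frame clip, windows of 4 frames sharing 1 frame.
    print(chunk_indices(10, 4, 1))       # [(0, 4), (3, 7), (6, 10)]
    print(scaled_size(1920, 1080, 512))  # (512, 288)
```

Overlapping windows trade a little redundant computation for shared frames that can anchor the alignment of neighbouring chunks; integer arithmetic in `scaled_size` avoids floating-point rounding surprises in the output size.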

artificial intelligence, reconstruction, video

2508.17172

Country: Europe > Monaco (0.26)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Motorsports > Formula One (0.69)

Technology: Information Technology > Artificial Intelligence (1.00)

Neural Information Processing Systems, Aug-15-2025, 18:11:00 GMT

89fcd07f20b6785b92134bd6c1d0fa42-Paper.pdf

artificial intelligence, deep learning, machine learning

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jun-25-2025

Multimodal Fusion SLAM with Fourier Attention

Zhou, Youjie, Mei, Guofeng, Wang, Yiming, Wan, Yi, Poiesi, Fabio

Visual SLAM is particularly challenging in environments affected by noise, varying lighting conditions, and darkness. Learning-based optical flow algorithms can leverage multiple modalities to address these challenges, but traditional optical-flow-based visual SLAM approaches often require significant computational resources. To overcome this limitation, we propose FMF-SLAM, an efficient multimodal fusion SLAM method that uses the fast Fourier transform (FFT) to improve algorithmic efficiency. Specifically, we introduce novel Fourier-based self-attention and cross-attention mechanisms to extract features from RGB and depth signals. We further enhance the interaction of multimodal features by incorporating multi-scale knowledge distillation across modalities. We also demonstrate the practical feasibility of FMF-SLAM in real-world scenarios with real-time performance by integrating it into a security robot and fusing it with a GNSS-RTK global positioning module and global bundle adjustment. Our approach is validated on video sequences from TUM, TartanAir, and our own real-world datasets, showing state-of-the-art performance under noisy, varying-lighting, and dark conditions. Our code and datasets are available at https://github.com/youjie-zhou/FMF-SLAM.git.
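The abstract does not spell out FMF-SLAM's attention design. As a hedged illustration of why the FFT helps efficiency, the NumPy sketch below uses two generic Fourier mixing operations rather than the paper's mechanism: an FNet-style self-mixing (the real part of a 2D FFT over tokens and channels) and a cross-modal mixing that computes, per channel, the circular cross-correlation between an RGB token sequence and a depth token sequence. Computed directly, that cross-correlation costs O(N²) per channel; through the FFT it costs O(N log N). All function names are hypothetical.

```python
import numpy as np


def fourier_self_mix(x: np.ndarray) -> np.ndarray:
    """FNet-style global token mixing (a stand-in, not FMF-SLAM's layer):
    2D FFT over (tokens, channels), keep the real part, so every output
    entry depends on every input entry."""
    return np.real(np.fft.fft2(x))


def fourier_cross_mix(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Per-channel circular cross-correlation along the token axis:
        out[t, c] = sum_s rgb[(s + t) % N, c] * depth[s, c]
    computed in O(N log N) per channel via the correlation theorem
    (multiply one spectrum by the conjugate of the other)."""
    n = rgb.shape[0]
    spec = np.fft.rfft(rgb, axis=0) * np.conj(np.fft.rfft(depth, axis=0))
    return np.fft.irfft(spec, n=n, axis=0)


def naive_cross_mix(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Direct O(N^2) reference for fourier_cross_mix, for checking only."""
    n = rgb.shape[0]
    out = np.zeros(rgb.shape, dtype=float)
    for t in range(n):
        for s in range(n):
            out[t] += rgb[(s + t) % n] * depth[s]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_tokens = rng.normal(size=(64, 8))    # 64 tokens, 8 channels (illustrative)
    depth_tokens = rng.normal(size=(64, 8))
    fast = fourier_cross_mix(rgb_tokens, depth_tokens)
    print(np.allclose(fast, naive_cross_mix(rgb_tokens, depth_tokens)))
```

Both operations mix information globally across all tokens without forming an N×N attention matrix, which is the general efficiency argument for Fourier-based attention.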

artificial intelligence, fmf-slam, machine learning

2506.18204

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Kaveti, Pushyami, Waldum, Ambjorn Grimsrud, Singh, Hanumant, Ludvigsen, Martin

Enhancing Situational Awareness in Underwater Robotics with Multi-modal Spatial Perception

arXiv.org Artificial Intelligence, Jun-10-2025

Autonomous Underwater Vehicles (AUVs) and Remotely Operated Vehicles (ROVs) demand robust spatial perception capabilities, including Simultaneous Localization and Mapping (SLAM), to support both remote and autonomous tasks. Vision-based systems have been integral to these advancements, capturing rich color and texture at low cost while enabling semantic scene understanding. However, underwater conditions (light attenuation, backscatter, and low contrast) often degrade image quality to the point where traditional vision-based SLAM pipelines fail. Moreover, these pipelines typically rely on monocular or stereo inputs, limiting their scalability to the multi-camera configurations common on many vehicles. To address these issues, we propose to leverage multi-modal sensing that fuses data from multiple sensors, including cameras, inertial measurement units (IMUs), and acoustic devices, to enhance situational awareness and enable robust, real-time SLAM. We explore both geometric and learning-based techniques along with semantic analysis, and conduct experiments on data collected from a work-class ROV during several field deployments in the Trondheim Fjord. Our experimental results demonstrate the feasibility of reliable real-time state estimation and high-quality 3D reconstruction in visually challenging underwater conditions. We also discuss system constraints and identify open research questions, such as sensor calibration and the limitations of learning-based methods, that merit further exploration to advance large-scale underwater operations.
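The abstract does not specify the fusion back-end. As a textbook illustration of the general idea of combining a high-rate inertial sensor with a lower-rate absolute acoustic fix, the NumPy sketch below runs a linear Kalman filter along a single axis: IMU acceleration drives the prediction step and an acoustic position measurement corrects it. The class name, the noise values, and the 1D state are assumptions chosen for illustration; a real ROV pipeline would estimate a full 6-DoF pose.

```python
import numpy as np


class AxisKalmanFilter:
    """Hypothetical single-axis Kalman filter. State x = [position, velocity];
    IMU acceleration is the control input, an acoustic position fix is the
    measurement."""

    def __init__(self, dt: float, accel_var: float, meas_var: float):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition
        self.B = np.array([0.5 * dt**2, dt])           # how acceleration enters the state
        self.Q = accel_var * np.outer(self.B, self.B)  # process noise from IMU noise
        self.H = np.array([[1.0, 0.0]])                # only position is observed
        self.R = np.array([[meas_var]])                # acoustic fix variance
        self.x = np.zeros(2)
        self.P = np.eye(2)

    def predict(self, accel: float) -> None:
        """Propagate the state with a (noisy) IMU acceleration sample."""
        self.x = self.F @ self.x + self.B * accel
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z: float) -> None:
        """Correct the state with an acoustic position measurement z."""
        y = z - self.H @ self.x                        # innovation, shape (1,)
        S = self.H @ self.P @ self.H.T + self.R        # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain, shape (2, 1)
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P


def simulate(steps: int = 200, dt: float = 0.1, seed: int = 0):
    """Constant true acceleration observed through a noisy IMU and noisy
    acoustic position fixes (synthetic data)."""
    rng = np.random.default_rng(seed)
    kf = AxisKalmanFilter(dt, accel_var=0.05**2, meas_var=0.5**2)
    true_a, pos, vel = 0.1, 0.0, 0.0
    for _ in range(steps):
        pos += vel * dt + 0.5 * true_a * dt**2         # true kinematics
        vel += true_a * dt
        kf.predict(true_a + rng.normal(0.0, 0.05))     # noisy IMU sample
        kf.update(pos + rng.normal(0.0, 0.5))          # noisy acoustic fix
    return pos, kf


if __name__ == "__main__":
    true_pos, kf = simulate()
    print(f"true {true_pos:.2f} m, estimate {kf.x[0]:.2f} m")
```

The IMU keeps the estimate smooth between fixes, while the acoustic measurements stop the drift that integrating acceleration alone would accumulate.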

artificial intelligence, dataset, machine learning

2506.06476

Country:

Europe > Norway > Central Norway > Trøndelag > Trondheim (0.25)
North America > United States (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Electrical Industrial Apparatus (0.79)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Jan-14-2025, 15:58:30 GMT

DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras

We introduce DROID-SLAM, a new deep-learning-based SLAM system. DROID-SLAM consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense Bundle Adjustment layer. DROID-SLAM is accurate, achieving large improvements over prior work, and robust, suffering from substantially fewer catastrophic failures. Despite being trained on monocular video, it can leverage stereo or RGB-D video to achieve improved performance at test time.
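Dense bundle adjustment is commonly solved with a Gauss-Newton step in which each per-pixel depth is coupled to a single pose block, so the depth-depth block C of the normal equations is diagonal and can be eliminated cheaply with a Schur complement. The NumPy sketch below shows only that linear algebra for the system [[B, E], [Eᵀ, C]] [Δξ; Δd] = [v; w], with random Jacobians standing in for real reprojection residuals. The function names are hypothetical, and this is not DROID-SLAM's own implementation.

```python
import numpy as np


def schur_solve(B, E, c_diag, v, w):
    """Solve [[B, E], [E^T, diag(c_diag)]] [dxi; dd] = [v; w] by eliminating
    the depth block:
        dxi = (B - E C^-1 E^T)^-1 (v - E C^-1 w)
        dd  = C^-1 (w - E^T dxi)
    Inverting the diagonal C is elementwise, so the only dense solve is the
    small pose-sized system."""
    c_inv = 1.0 / c_diag
    E_cinv = E * c_inv                 # E @ diag(c_inv) via column scaling
    S = B - E_cinv @ E.T               # reduced (Schur complement) pose system
    dxi = np.linalg.solve(S, v - E_cinv @ w)
    dd = c_inv * (w - E.T @ dxi)       # back-substitute for the depths
    return dxi, dd


def random_system(n_pose=6, n_depth=20, n_res=80, damping=1e-3, seed=0):
    """Hypothetical test system: normal equations from random Jacobians in
    which each residual touches exactly one depth, so C = J_d^T J_d is diagonal."""
    rng = np.random.default_rng(seed)
    J_p = rng.normal(size=(n_res, n_pose))
    J_d = np.zeros((n_res, n_depth))
    rows = np.arange(n_res)
    J_d[rows, rows % n_depth] = rng.normal(size=n_res)
    r = rng.normal(size=n_res)
    B = J_p.T @ J_p + damping * np.eye(n_pose)
    E = J_p.T @ J_d
    c_diag = (J_d ** 2).sum(axis=0) + damping
    v, w = -J_p.T @ r, -J_d.T @ r
    return B, E, c_diag, v, w


if __name__ == "__main__":
    B, E, c, v, w = random_system()
    dxi, dd = schur_solve(B, E, c, v, w)
    H = np.block([[B, E], [E.T, np.diag(c)]])
    full = np.linalg.solve(H, np.concatenate([v, w]))
    print(np.allclose(full, np.concatenate([dxi, dd])))
```

Because the number of depths (one per pixel) vastly exceeds the number of pose parameters, eliminating the diagonal depth block first keeps the dense solve small.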

deep visual slam, droid-slam, rgb-d camera

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Murai, Riku, Dexheimer, Eric, Davison, Andrew J.

MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors

arXiv.org Artificial Intelligence, Dec-16-2024

We present a real-time monocular dense SLAM system designed bottom-up from MASt3R, a two-view 3D reconstruction and matching prior. Equipped with this strong prior, our system is robust on in-the-wild video sequences despite making no assumption on a fixed or parametric camera model beyond a unique camera centre. We introduce efficient methods for pointmap matching, camera tracking and local fusion, graph construction and loop closure, and second-order global optimisation. With known calibration, a simple modification to the system achieves state-of-the-art performance across various benchmarks. Altogether, we propose a plug-and-play monocular SLAM system capable of producing globally-consistent poses and dense geometry while operating at 15 FPS.
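Aligning two pointmaps whose points are already matched is a basic building block behind tracking and loop closure in pointmap-based SLAM. The abstract describes pointmap matching, camera tracking, and second-order optimisation, not a closed-form alignment; as a simpler stand-in, the NumPy sketch below uses the Umeyama method to recover the similarity transform (scale s, rotation R, translation t) with dst ≈ s·R·src + t from matched 3D points. The function names are hypothetical and the data are synthetic.

```python
import numpy as np


def umeyama_sim3(src: np.ndarray, dst: np.ndarray):
    """Closed-form least-squares similarity transform with
    dst ~= s * R @ src + t (Umeyama, 1991). src, dst: (N, 3) matched points."""
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / n                   # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                    # force a proper rotation, not a reflection
    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / n           # variance of the centred source points
    s = np.trace(np.diag(D) @ S) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Random proper rotation from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(500, 3))       # synthetic pointmap from view A
    R_true, s_true = random_rotation(rng), 1.7
    t_true = np.array([0.3, -1.2, 2.0])
    dst = s_true * src @ R_true.T + t_true  # same points expressed in view B
    s, R, t = umeyama_sim3(src, dst)
    print(s, np.allclose(R, R_true), np.allclose(t, t_true))
```

Estimating scale alongside rotation and translation matters for monocular systems, where each two-view reconstruction is only defined up to scale.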

artificial intelligence, machine learning, real time system