AITopics | Oswald, Martin R.

Oswald, Martin R.

Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping

Giacomini, Emanuele, Di Giammarino, Luca, De Rebotti, Lorenzo, Grisetti, Giorgio, Oswald, Martin R.

arXiv.org Artificial Intelligence, Mar-21-2025

LiDARs provide accurate geometric measurements, making them valuable for ego-motion estimation and reconstruction tasks. Despite their success, maintaining an accurate and lightweight representation of the environment remains challenging. Both classic and NeRF-based solutions have to trade off accuracy against memory and processing time. In this work, we build on recent advancements in Gaussian Splatting methods to develop a novel LiDAR odometry and mapping pipeline that relies exclusively on Gaussian primitives for its scene representation. Leveraging spherical projection, we drive the refinement of the primitives solely from LiDAR measurements. Experiments show that our approach matches the registration performance of current methods while achieving state-of-the-art results for mapping tasks with minimal GPU requirements. This efficiency makes it a strong candidate for further exploration and potential adoption in real-time robotics estimation tasks.
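The spherical projection mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `spherical_project`, the image size, the vertical field of view, and the column convention are all assumptions chosen for illustration. It maps a single 3D LiDAR point to a pixel of a range image using its azimuth and elevation.

```python
import math


def spherical_project(point, width=1024, height=64,
                      fov_up_deg=15.0, fov_down_deg=-15.0):
    """Map a 3D point (x, y, z) to (row, col, range) in a range image.

    All default parameters are hypothetical sensor settings.
    Returns None for the origin or for points outside the vertical FoV.
    """
    x, y, z = point
    rng = math.sqrt(x * x + y * y + z * z)
    if rng == 0.0:
        return None

    azimuth = math.atan2(y, x)        # horizontal angle in [-pi, pi]
    elevation = math.asin(z / rng)    # vertical angle in [-pi/2, pi/2]

    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    if not (fov_down <= elevation <= fov_up):
        return None

    # Normalize both angles to [0, 1]. Assumed convention: azimuth +pi maps
    # to column 0, and the top row corresponds to the upper FoV limit.
    u = 0.5 * (1.0 - azimuth / math.pi)
    v = (fov_up - elevation) / (fov_up - fov_down)

    # Clamp so that the boundary values u == 1 or v == 1 stay inside the image.
    col = min(width - 1, int(u * width))
    row = min(height - 1, int(v * height))
    return row, col, rng
```

Under these assumed settings, a point straight ahead of the sensor lands in the middle of the image, while a point directly overhead falls outside the vertical field of view and is discarded.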

artificial intelligence, gaussian splatting lidar odometry, splatting lidar odometry and mapping, (1 more...)

arXiv:2503.17491

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Vision (0.89)

Deblur Gaussian Splatting SLAM

Girlanda, Francesco, Rozumnyi, Denys, Pollefeys, Marc, Oswald, Martin R.

arXiv.org Artificial Intelligence, Mar-16-2025

We present Deblur-SLAM, a robust RGB SLAM pipeline designed to recover sharp reconstructions from motion-blurred inputs. The proposed method bridges the strengths of both frame-to-frame and frame-to-model approaches to model sub-frame camera trajectories that lead to high-fidelity reconstructions in motion-blurred settings. Moreover, our pipeline incorporates techniques such as online loop closure and global bundle adjustment to achieve a dense and precise global trajectory. We model the physical image formation process of motion-blurred images and minimize the error between the observed blurry images and rendered blurry images obtained by averaging sharp virtual sub-frame images. Additionally, by utilizing a monocular depth estimator alongside the online deformation of Gaussians, we ensure precise mapping and enhanced image deblurring. The proposed SLAM pipeline integrates all these components to improve the results. We achieve state-of-the-art results for sharp map estimation and sub-frame trajectory recovery on both synthetic and real-world blurry input data.
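The image formation model described in this abstract (a blurry image as the average of sharp virtual sub-frame images) can be sketched in a few lines. This is an illustration only, not the authors' code: the helper names `render_blurry` and `mean_abs_error` are hypothetical, images are plain nested lists of grayscale values, and the mean absolute difference is an assumed choice of error, since the abstract does not name the loss.

```python
def render_blurry(sharp_subframes):
    """Average N equally sized sharp sub-frames into one blurry image.

    Each sub-frame is a list of rows, each row a list of float intensities.
    """
    if not sharp_subframes:
        raise ValueError("need at least one sub-frame")
    n = len(sharp_subframes)
    height = len(sharp_subframes[0])
    width = len(sharp_subframes[0][0])
    return [
        [sum(frame[r][c] for frame in sharp_subframes) / n for c in range(width)]
        for r in range(height)
    ]


def mean_abs_error(observed, rendered):
    """Mean absolute per-pixel difference (an assumed error measure)."""
    diffs = [
        abs(o - p)
        for o_row, p_row in zip(observed, rendered)
        for o, p in zip(o_row, p_row)
    ]
    return sum(diffs) / len(diffs)


# A bright pixel that moves one position during the exposure.
subframes = [[[1.0, 0.0]], [[0.0, 1.0]]]
blurry = render_blurry(subframes)
```

In this toy case the moving pixel is smeared evenly over both positions. An optimizer would adjust the sub-frame poses and the scene so that the rendered average matches the observed blurry frame.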

artificial intelligence, conference, reconstruction, (14 more...)

arXiv:2503.12572

Country:

Europe > Netherlands (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM

Yugay, Vladimir, Gevers, Theo, Oswald, Martin R.

arXiv.org Artificial Intelligence, Nov-25-2024

Simultaneous localization and mapping (SLAM) systems with novel view synthesis capabilities are widely used in computer vision, with applications in augmented reality, robotics, and autonomous driving. However, existing approaches are limited to single-agent operation. Recent work has addressed this problem using a distributed neural scene representation. Unfortunately, existing methods are slow, cannot accurately render real-world data, are restricted to two agents, and have limited tracking accuracy. In contrast, we propose a rigidly deformable 3D Gaussian-based scene representation that dramatically speeds up the system. However, improving tracking accuracy and reconstructing a globally consistent map from multiple agents remains challenging due to trajectory drift and discrepancies across agents' observations. Therefore, we propose new tracking and map-merging mechanisms and integrate loop closure in the Gaussian-based SLAM pipeline. We evaluate MAGiC-SLAM on synthetic and real-world datasets and find it more accurate and faster than the state of the art.

agent, artificial intelligence, magic-slam, (15 more...)

arXiv:2411.16785

Country: Europe > Netherlands (0.28)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.48)
Information Technology > Robotics & Automation (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

OVO-SLAM: Open-Vocabulary Online Simultaneous Localization and Mapping

Martins, Tomas Berriel, Oswald, Martin R., Civera, Javier

arXiv.org Artificial Intelligence, Nov-22-2024

This paper presents the first Open-Vocabulary Online 3D semantic SLAM pipeline, which we call OVO-SLAM. Our primary contribution is the pipeline itself, particularly its mapping thread. Given a set of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors computed through a novel aggregation over the viewpoints from which these 3D segments are observed. Notably, our OVO-SLAM pipeline is not only faster but also achieves better segmentation metrics than offline approaches in the literature. Along with superior segmentation performance, we show experimental results of our contributions integrated with Gaussian-SLAM, which are the first to demonstrate end-to-end open-vocabulary online 3D reconstruction without relying on ground-truth camera poses or scene geometry.

data mining, machine learning, natural language, (18 more...)

arXiv:2411.15043

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
(3 more...)

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Nguyen, Duy-Kien, Assran, Mahmoud, Jain, Unnat, Oswald, Martin R., Snoek, Cees G. M., Chen, Xinlei

arXiv.org Artificial Intelligence, Jun-13-2024

This work does not introduce a new method. Instead, we present an interesting finding that questions the necessity of the inductive bias of locality in modern computer vision architectures. Concretely, we find that vanilla Transformers can operate by directly treating each individual pixel as a token and achieve highly performant results. This is substantially different from the popular design in Vision Transformer, which maintains the inductive bias from ConvNets towards local neighborhoods (e.g., by treating each 16x16 patch as a token). We mainly showcase the effectiveness of pixels-as-tokens across three well-studied tasks in computer vision: supervised learning for object classification, self-supervised learning via masked autoencoding, and image generation with diffusion models. Although directly operating on individual pixels is less computationally practical, we believe the community must be aware of this surprising piece of knowledge when devising the next generation of neural architectures for computer vision.
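To make the computational cost noted in this abstract concrete, the sketch below compares sequence lengths for patch tokens and pixel tokens. It is only an illustration: the helper names `pixels_as_tokens` and `patch_token_count` and the 224x224 input size are assumptions, not details taken from the paper.

```python
def pixels_as_tokens(image):
    """Flatten an H x W x C nested list into H*W tokens of C values each."""
    return [list(pixel) for row in image for pixel in row]


def patch_token_count(height, width, patch_size):
    """Number of tokens when the image is cut into square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


# A tiny 2 x 2 RGB image becomes four 3-dimensional tokens.
tiny = [[[0, 0, 0], [255, 0, 0]],
        [[0, 255, 0], [0, 0, 255]]]
tokens = pixels_as_tokens(tiny)

# Assumed 224 x 224 input: 16 x 16 patches versus individual pixels.
patch_tokens = patch_token_count(224, 224, 16)   # 14 * 14
pixel_tokens = patch_token_count(224, 224, 1)    # 224 * 224

# Self-attention compares every token with every other token, so its cost
# grows with the square of the sequence length.
attention_cost_ratio = (pixel_tokens ** 2) // (patch_tokens ** 2)
```

For this input size the pixel sequence is 256 times longer than the patch sequence, so the pairwise attention cost grows by a factor of 256 squared.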

artificial intelligence, machine learning, transformer, (17 more...)

arXiv:2406.09415

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey

Tosi, Fabio, Zhang, Youmin, Gong, Ziren, Sandström, Erik, Mattoccia, Stefano, Oswald, Martin R., Poggi, Matteo

arXiv.org Artificial Intelligence, Apr-11-2024

Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) representations. Recognizing the growing body of research and the absence of a comprehensive survey on the topic, this paper aims to provide the first comprehensive overview of SLAM progress through the lens of the latest advancements in radiance fields. It sheds light on the background, evolutionary path, and inherent strengths and limitations of these methods, and serves as a fundamental reference that highlights the dynamic progress and specific challenges.

artificial intelligence, machine learning, representation, (20 more...)

arXiv:2402.13255

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting

Yugay, Vladimir, Li, Yue, Gevers, Theo, Oswald, Martin R.

arXiv.org Artificial Intelligence, Dec-6-2023

… a scene representation. The new representation enables interactive-time reconstruction and photo-realistic rendering of real-world and synthetic scenes. We propose novel strategies for seeding and optimizing Gaussian splats to extend their use from multiview offline scenarios to sequential monocular RGBD input data setups. In addition, we extend Gaussian splats to encode geometry and experiment with tracking against this scene representation. Our …

Specifically, earlier works focus on tracking using various scene representations like feature point clouds [15, 26, 40], surfels [53, 71], depth maps [43, 58], or implicit representations [14, 42, 44]. Later works focused more on the map quality and density. With the advent of powerful neural scene representations like neural radiance fields [38] that allow for high fidelity view synthesis, a rapidly growing body of dense neural SLAM methods [19, 34, 51, 60, 62, 64, 81, 84] has been developed.

artificial intelligence, optimization problem, survey article, (17 more...)

arXiv:2312.1007

Country:

Europe > Netherlands (0.28)
Asia > Middle East > Israel (0.14)

Genre:

Overview (0.46)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Revisiting Proposal-based Object Detection

Bhowmik, Aritra, Oswald, Martin R., Mettes, Pascal, Snoek, Cees G. M.

arXiv.org Artificial Intelligence, Nov-30-2023

This paper revisits the pipeline for detecting objects in images with proposals. For any object detector, the obtained box proposals or queries need to be classified and regressed towards ground truth boxes. The common solution for the final predictions is to directly maximize the overlap between each proposal and the ground truth box, followed by a winner-takes-all ranking or non-maximum suppression. In this work, we propose a simple yet effective alternative. For proposal regression, we solve a simpler problem where we regress to the area of intersection between proposal and ground truth. In this way, each proposal only specifies which part contains the object, avoiding a blind inpainting problem where proposals need to be regressed beyond their visual scope. In turn, we replace the winner-takes-all strategy and obtain the final prediction by taking the union over the regressed intersections of a proposal group surrounding an object. Our revisited approach comes with minimal changes to the detection pipeline and can be plugged into any existing method. We show that our approach directly improves canonical object detection and instance segmentation architectures, highlighting the utility of intersection-based regression and grouping.
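The two ideas in this abstract, regressing each proposal to its intersection with the ground truth and then taking the union over a proposal group, can be illustrated on axis-aligned boxes. This is a minimal sketch and not the authors' code: the function names are hypothetical, boxes are `(x1, y1, x2, y2)` tuples, and the union is assumed to be the tightest box enclosing all intersections.

```python
def intersect(a, b):
    """Intersection of two boxes (x1, y1, x2, y2), or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2, y2)


def union_box(boxes):
    """Tightest box enclosing all given boxes (assumed meaning of 'union')."""
    boxes = [b for b in boxes if b is not None]
    if not boxes:
        return None
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


ground_truth = (2, 2, 8, 8)
proposals = [(0, 0, 5, 5), (4, 4, 10, 10), (0, 4, 5, 10)]

# Regression target per proposal: only the part of the object it actually sees.
targets = [intersect(p, ground_truth) for p in proposals]

# Final prediction for the group: union over the regressed intersections.
prediction = union_box(targets)
```

In this toy example no single proposal covers the object, yet the union of the three intersections recovers the ground-truth box exactly. Real detectors would predict the intersections with a network rather than compute them from the ground truth.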

artificial intelligence, machine learning, proposal, (16 more...)

arXiv:2311.18512

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Better Situational Graphs by Inferring High-level Semantic-Relational Concepts

Millan-Romera, Jose Andres, Bavle, Hriday, Shaheer, Muhammad, Oswald, Martin R., Voos, Holger, Sanchez-Lopez, Jose Luis

arXiv.org Artificial Intelligence, Sep-30-2023

Recent works on SLAM extend their pose graphs with higher-level semantic concepts and exploit the relationships between them, not only to provide a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs), a pioneer in jointly leveraging semantic relationships in the factor optimization process, relies on semantic entities such as wall surfaces and rooms, whose relationship is mathematically defined. Nevertheless, extracting these high-level concepts exclusively from the lower-level factor graph remains a challenge and is currently done with ad-hoc algorithms, which limits the ability to include new semantic-relational concepts. To overcome this limitation, we propose a Graph Neural Network (GNN) for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph. We demonstrate that we can infer room entities and their relationship to the mapped wall surfaces more accurately and more computationally efficiently than the baseline algorithm. Additionally, to demonstrate the versatility of our method, we provide a new semantic concept, i.e., wall, and its relationship with its wall surfaces. Our proposed method has been integrated into S-Graphs+ and validated on both simulated and real datasets. A docker container with our software will be made available to the scientific community.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv:2310.00401

Country:

Europe (0.14)
Oceania > Australia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes

Recasens, David, Oswald, Martin R., Pollefeys, Marc, Civera, Javier

arXiv.org Artificial Intelligence, Jun-29-2023

Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure-from-motion techniques assume that static scene parts are observed alongside deforming ones in order to establish an anchoring reference. However, this assumption does not hold in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and proper quantitative evaluation methodologies. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. In order to validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric which does not require ground truth data. Dataset and code: https://davidrecasens.github.io/TheDrunkard'sOdometry/
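The decomposition of optical flow into a rigid camera-motion part and a non-rigid deformation part can be illustrated with the classical pinhole relation. This sketch is not the Drunkard's Odometry method itself, which is learned; it only shows the geometric idea under stated assumptions: a pinhole camera with hypothetical intrinsics `(fx, fy, cx, cy)`, known depth, and a known camera motion `(R, t)`. The function names are also hypothetical.

```python
def rigid_flow(u, v, depth, intrinsics, rotation, translation):
    """Flow at pixel (u, v) induced by camera motion alone on a static point.

    intrinsics = (fx, fy, cx, cy); rotation is a 3x3 nested list and
    translation a 3-tuple, mapping points from frame 1 to frame 2.
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the first camera frame.
    p = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Apply the rigid motion.
    q = [sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
         for i in range(3)]
    if q[2] <= 0.0:
        raise ValueError("point is behind the second camera")
    # Re-project into the second image and take the pixel displacement.
    u2 = fx * q[0] / q[2] + cx
    v2 = fy * q[1] / q[2] + cy
    return (u2 - u, v2 - v)


def nonrigid_residual(total_flow, rigid):
    """Part of the observed flow not explained by camera motion."""
    return (total_flow[0] - rigid[0], total_flow[1] - rigid[1])


K = (100.0, 100.0, 320.0, 240.0)            # hypothetical intrinsics
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.5, 0.0, 0.0)                          # point shifts 0.5 along x in frame 2

camera_part = rigid_flow(320.0, 240.0, 2.0, K, R, t)
deformation_part = nonrigid_residual((27.0, 1.0), camera_part)
```

Here the camera motion explains 25 pixels of horizontal flow at the principal point, so an observed flow of (27, 1) leaves a residual of (2, 1) that would be attributed to scene deformation.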

artificial intelligence, drunkard, machine learning, (13 more...)

arXiv:2306.16917

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.50)

Industry:

Media > Television (0.91)
Media > Photography (0.91)
Media > Film (0.91)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
