Collaborating Authors

Stachniss


Self-Supervised Moving Object Segmentation of Sparse and Noisy Radar Point Clouds

Schwarzer, Leon, Zeller, Matthias, Herraez, Daniel Casado, Dierl, Simon, Heidingsfeld, Michael, Stachniss, Cyrill

arXiv.org Artificial Intelligence

Moving object segmentation is a crucial task for safe and reliable autonomous mobile systems like self-driving cars, improving the reliability and robustness of subsequent tasks like SLAM or path planning. While the segmentation of camera or LiDAR data is widely researched and achieves great results, it often introduces increased latency by requiring the accumulation of temporal sequences to gain the necessary temporal context. Radar sensors overcome this problem with their ability to provide a direct measurement of a point's Doppler velocity, which can be exploited for single-scan moving object segmentation. However, radar point clouds are often sparse and noisy, making data annotation for use in supervised learning very tedious, time-consuming, and cost-intensive. To overcome this problem, we address the task of self-supervised moving object segmentation of sparse and noisy radar point clouds. We follow a two-step approach of contrastive self-supervised representation learning with subsequent supervised fine-tuning using limited amounts of annotated data. We propose a novel clustering-based contrastive loss function with cluster refinement based on dynamic point removal to pretrain the network to produce motion-aware representations of the radar data. Our method improves label efficiency after fine-tuning, effectively boosting state-of-the-art performance through self-supervised pretraining.
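
The clustering-based contrastive pretraining can be pictured with a small sketch. The following is a minimal, illustrative cluster-level InfoNCE-style loss over per-point embeddings and is not the authors' implementation; the tensor shapes, the noise label -1, and the temperature value are assumptions.

    import torch
    import torch.nn.functional as F

    def cluster_contrastive_loss(feats, cluster_ids, temperature=0.1):
        # feats: [N, D] per-point embeddings; cluster_ids: [N] with -1 marking noise
        # or points discarded by the cluster refinement (assumed convention).
        valid = cluster_ids >= 0
        feats = F.normalize(feats[valid], dim=1)                    # [M, D]
        uniq, inv = torch.unique(cluster_ids[valid], return_inverse=True)
        # One prototype per cluster: the normalized mean of its point embeddings.
        protos = torch.zeros(len(uniq), feats.size(1), device=feats.device)
        protos = protos.index_add(0, inv, feats)
        counts = torch.bincount(inv, minlength=len(uniq)).clamp(min=1)
        protos = F.normalize(protos / counts.unsqueeze(1), dim=1)   # [C, D]
        # Pull each point toward its own cluster prototype, push it away from others.
        logits = feats @ protos.t() / temperature                   # [M, C]
        return F.cross_entropy(logits, inv)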


3D Hierarchical Panoptic Segmentation in Real Orchard Environments Across Different Sensors

Sodano, Matteo, Magistri, Federico, Marks, Elias, Hosn, Fares, Zurbayev, Aibek, Marcuzzi, Rodrigo, Malladi, Meher V. R., Behley, Jens, Stachniss, Cyrill

arXiv.org Artificial Intelligence

Crop yield estimation is a relevant problem in agriculture, because an accurate crop yield estimate can support farmers' decisions on harvesting or precision intervention. Robots can help to automate this process. To do so, they need to be able to perceive the surrounding environment to identify target objects. In this paper, we introduce a novel approach to address the problem of hierarchical panoptic segmentation of apple orchards on 3D data from different sensors. Our approach is able to simultaneously provide semantic segmentation, instance segmentation of trunks and fruits, and instance segmentation of plants (a single trunk with its fruits). This allows us to identify relevant information such as individual plants, fruits, and trunks, and to capture the relationships among them, such as precisely estimating the number of fruits associated with each tree in an orchard. Additionally, to efficiently evaluate our approach for hierarchical panoptic segmentation, we provide a dataset designed specifically for this task. Our dataset is recorded in Bonn in a real apple orchard with a variety of sensors, ranging from a terrestrial laser scanner to an RGB-D camera mounted on different robotic platforms. The experiments show that our approach surpasses state-of-the-art approaches in 3D panoptic segmentation in the agricultural domain, while also providing full hierarchical panoptic segmentation. Our dataset has been made publicly available at https://www.ipb.uni-bonn.de/data/hops/. We will provide the open-source implementation of our approach and a public competition for hierarchical panoptic segmentation on the hidden test sets upon paper acceptance.
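
As a small illustration of what the hierarchical output makes possible, the sketch below counts fruits per plant from per-point predictions; the field names ('semantic', 'instance_id', 'plant_id') are illustrative assumptions, not the dataset's actual format.

    from collections import defaultdict

    def fruits_per_plant(points):
        # points: iterable of dicts with a semantic class, a fruit/trunk instance id,
        # and the id of the plant the point belongs to (assumed field names).
        fruit_ids = defaultdict(set)
        for p in points:
            if p["semantic"] == "fruit" and p["plant_id"] is not None:
                fruit_ids[p["plant_id"]].add(p["instance_id"])
        return {plant: len(ids) for plant, ids in fruit_ids.items()}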


KISS-SLAM: A Simple, Robust, and Accurate 3D LiDAR SLAM System With Enhanced Generalization Capabilities

Guadagnino, Tiziano, Mersch, Benedikt, Gupta, Saurabh, Vizzo, Ignacio, Grisetti, Giorgio, Stachniss, Cyrill

arXiv.org Artificial Intelligence

Robust and accurate localization and mapping of an environment using laser scanners, so-called LiDAR SLAM, is essential to many robotic applications. Early 3D LiDAR SLAM methods often exploited additional information from IMU or GNSS sensors to enhance localization accuracy and mitigate drift. Later, advanced systems further improved the estimation at the cost of a higher runtime and complexity. This paper explores the limits of what can be achieved with a LiDAR-only SLAM approach while following the "Keep It Small and Simple" (KISS) principle. By leveraging this minimalistic design principle, our system, KISS-SLAM, achieves state-of-the-art pose accuracy while requiring little to no parameter tuning for deployment across diverse environments, sensors, and motion profiles. We follow best practices in graph-based SLAM and build upon LiDAR odometry to compute the relative motion between scans and construct local maps of the environment. To correct drift, we match local maps and optimize the trajectory in a pose graph optimization step. The experimental results demonstrate that this design achieves competitive performance while reducing complexity and reliance on additional sensor modalities. By prioritizing simplicity, this work provides a new strong baseline for LiDAR-only SLAM and a high-performing starting point for future research. Further, our pipeline builds consistent maps that can be used directly for downstream tasks like navigation. Our open-source system operates faster than the sensor frame rate in all presented datasets and is designed for real-world scenarios.
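
To make the drift-correction step concrete, here is a minimal, runnable SE(2) pose graph example in the spirit of the described pipeline: odometry edges chain the poses, and a single loop-closure edge pulls the drifted trajectory back into consistency. It only illustrates the principle and is not KISS-SLAM's implementation; all values are made up.

    import numpy as np
    from scipy.optimize import least_squares

    def wrap(a):
        return (a + np.pi) % (2.0 * np.pi) - np.pi

    def compose(x, u):
        # Apply relative motion u = [dx, dy, dtheta] to pose x = [x, y, theta].
        c, s = np.cos(x[2]), np.sin(x[2])
        return np.array([x[0] + c * u[0] - s * u[1],
                         x[1] + s * u[0] + c * u[1],
                         wrap(x[2] + u[2])])

    def relative(xi, xj):
        # Relative pose of xj expressed in the frame of xi.
        c, s = np.cos(xi[2]), np.sin(xi[2])
        dx, dy = xj[0] - xi[0], xj[1] - xi[1]
        return np.array([c * dx + s * dy, -s * dx + c * dy, wrap(xj[2] - xi[2])])

    def residuals(flat, edges, n):
        poses = flat.reshape(n, 3)
        res = [poses[0]]                       # anchor the first pose at the origin
        for i, j, meas in edges:
            err = relative(poses[i], poses[j]) - meas
            err[2] = wrap(err[2])
            res.append(err)
        return np.concatenate(res)

    # Four odometry edges around a square, one with simulated drift, plus a loop
    # closure stating that the last pose should coincide with the first one.
    odom = [np.array([1.0, 0.0, np.pi / 2]) for _ in range(4)]
    odom[2] = np.array([1.2, 0.0, np.pi / 2])
    edges = [(i, i + 1, u) for i, u in enumerate(odom)] + [(0, 4, np.zeros(3))]
    guess = [np.zeros(3)]
    for u in odom:
        guess.append(compose(guess[-1], u))    # initialize from chained odometry
    sol = least_squares(residuals, np.concatenate(guess), args=(edges, 5))
    print(sol.x.reshape(5, 3).round(3))        # drift is distributed along the loop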


Transformer-Based Spatio-Temporal Association of Apple Fruitlets

Freeman, Harry, Kantor, George

arXiv.org Artificial Intelligence

In this paper, we present a transformer-based method to spatio-temporally associate apple fruitlets in stereo images collected on different days and from different camera poses. State-of-the-art association methods in agriculture are dedicated to matching larger crops using either high-resolution point clouds or temporally stable features, which are both difficult to obtain for smaller fruit in the field. To address these challenges, we propose a transformer-based architecture that encodes the shape and position of each fruitlet, and propagates and refines these features through a series of transformer encoder layers with alternating self- and cross-attention. We demonstrate that our method is able to achieve an F1-score of 92.4% on data collected in a commercial apple orchard and outperforms all baselines and ablations. The global food supply is under increasing pressure as a result of climate change, population growth, and growing labor shortages. To keep up with demand, agriculturalists are turning to computer vision-based systems that can automate a variety of laborious and time-intensive tasks such as harvesting [1], pruning [2], counting [3], and crop modeling [4]. These automated solutions not only improve efficiency, but also help mitigate the challenges posed by labor shortages and increasing food demand, ensuring that critical agricultural tasks can be performed reliably at scale. One particularly important but challenging task to automate is monitoring the growth and development of individual plants and fruits. Monitoring plant and fruit growth is important because it enables agricultural specialists to make more informed real-time crop management decisions and helps with downstream tasks such as phenotyping [5], disease management [6], and yield prediction [7].
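
A minimal sketch of the alternating self- and cross-attention idea, with one set of fruitlet features per recording day; the layer sizes and the use of PyTorch's nn.MultiheadAttention are assumptions for illustration, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class AlternatingAttentionLayer(nn.Module):
        def __init__(self, dim=128, heads=4):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, a, b):
            # Self-attention refines features within each day's fruitlets ...
            a = a + self.self_attn(a, a, a)[0]
            b = b + self.self_attn(b, b, b)[0]
            # ... cross-attention lets each day's fruitlets attend to the other day.
            a = a + self.cross_attn(a, b, b)[0]
            b = b + self.cross_attn(b, a, a)[0]
            return a, b

    # a, b: [batch, num_fruitlets, dim] descriptors encoding shape and position.
    layer = AlternatingAttentionLayer()
    a, b = torch.randn(1, 10, 128), torch.randn(1, 12, 128)
    a, b = layer(a, b)   # stack several such layers, then score pairwise matches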


PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map

Pan, Yue, Zhong, Xingguang, Jin, Liren, Wiesmann, Louis, Popović, Marija, Behley, Jens, Stachniss, Cyrill

arXiv.org Artificial Intelligence

Robots require high-fidelity reconstructions of their environment for effective operation. Such scene representations should be both geometrically accurate and photorealistic to support downstream tasks. While this can be achieved by building distance fields from range sensors and radiance fields from cameras, incrementally mapping both fields at the same time in a scalable, consistent, and high-quality manner remains challenging. In this paper, we propose a novel map representation that unifies a continuous signed distance field and a Gaussian splatting radiance field within an elastic and compact point-based implicit neural map. By enforcing geometric consistency between these fields, we achieve mutual improvements by exploiting both modalities. We devise a LiDAR-visual SLAM system called PINGS using the proposed map representation and evaluate it on several challenging large-scale datasets. Experimental results demonstrate that PINGS can incrementally build globally consistent distance and radiance fields encoded with a compact set of neural points. Compared to state-of-the-art methods, PINGS achieves superior photometric and geometric rendering at novel views by leveraging the constraints from the distance field. Furthermore, by utilizing dense photometric cues and multi-view consistency from the radiance field, PINGS produces more accurate distance fields, leading to improved odometry estimation and mesh reconstruction.
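
As an illustration of how geometric consistency between the two fields could be enforced, the sketch below penalizes Gaussian centers that drift away from the zero level set of the distance field; this specific loss is an assumption for illustration and not necessarily the formulation used in PINGS.

    import torch

    def surface_consistency_loss(gaussian_centers, sdf_fn):
        # gaussian_centers: [N, 3]; sdf_fn maps [N, 3] points to [N] signed distances
        # (e.g., a decoder over the neural point features).
        d = sdf_fn(gaussian_centers)
        return (d ** 2).mean()   # centers should lie near the surface, i.e., SDF ~ 0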


Horticultural Temporal Fruit Monitoring via 3D Instance Segmentation and Re-Identification using Point Clouds

Fusaro, Daniel, Magistri, Federico, Behley, Jens, Pretto, Alberto, Stachniss, Cyrill

arXiv.org Artificial Intelligence

Robotic fruit monitoring is a key step toward automated agricultural production systems. Robots can significantly enhance plant and temporal fruit monitoring by providing precise, high-throughput assessments that overcome the limitations of traditional manual methods. Fruit monitoring is a challenging task due to the significant variation in size, shape, orientation, and occlusion of fruits. Also, fruits may be harvested or newly grown between recording sessions. Most methods are 2D image-based and lack the 3D structure, depth, and spatial information that are key aspects of fruit monitoring. 3D colored point clouds, in contrast, can offer this information, but they introduce challenges such as sparsity and irregularity. In this paper, we present a novel approach for temporal fruit monitoring that operates on point clouds collected in a greenhouse over time. Our method segments fruits using a learning-based instance segmentation approach directly on the point cloud. Each segmented fruit is processed by a 3D sparse convolutional neural network to extract descriptors, which are used in an attention-based matching network to associate fruits with their instances from previous data collections. Experimental results on a real dataset of strawberries demonstrate that our approach outperforms other methods for fruit re-identification over time, allowing for precise temporal fruit monitoring in real and complex scenarios.
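
To illustrate the final association step, the sketch below matches per-fruit descriptors from two sessions with cosine similarity and a one-to-one assignment; the paper's attention-based matching network is more involved, so treat this as a simplified stand-in with assumed names and threshold.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate_fruits(desc_prev, desc_curr, min_similarity=0.5):
        # desc_prev: [M, D], desc_curr: [N, D] per-fruit descriptors.
        a = desc_prev / np.linalg.norm(desc_prev, axis=1, keepdims=True)
        b = desc_curr / np.linalg.norm(desc_curr, axis=1, keepdims=True)
        sim = a @ b.T                                # [M, N] cosine similarities
        rows, cols = linear_sum_assignment(-sim)     # maximize total similarity
        # Low-similarity pairs are left unmatched: harvested or newly grown fruits.
        return [(i, j) for i, j in zip(rows, cols) if sim[i, j] >= min_similarity]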


Moving Object Segmentation in Point Cloud Data using Hidden Markov Models

Bhandari, Vedant, James, Jasmin, Phillips, Tyson, McAree, P. Ross

arXiv.org Artificial Intelligence

Autonomous agents require the capability to identify dynamic objects in their environment for safe planning and navigation. Incomplete and erroneous dynamic detections jeopardize the agent's ability to accomplish its task. Dynamic detection is a challenging problem due to the numerous sources of uncertainty inherent in the problem's inputs and the wide variety of applications, which often lead to use-case-tailored solutions. We propose a robust learning-free approach to segment moving objects in point cloud data. The foundation of the approach lies in modelling each voxel using a hidden Markov model (HMM) and probabilistically integrating beliefs into a map using an HMM filter. The proposed approach is tested on benchmark datasets and consistently performs better than or as well as state-of-the-art methods, generalizing well across sensor characteristics and environments. The approach is open-sourced at https://github.com/vb44/HMM-MOS.
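
The per-voxel HMM filter can be pictured with a small example; the two states, the transition matrix, and the observation model below are illustrative assumptions, not the parameters of the released implementation.

    import numpy as np

    TRANSITION = np.array([[0.99, 0.01],    # static  -> {static, dynamic}
                           [0.10, 0.90]])   # dynamic -> {static, dynamic}
    # P(observation | state) for the observations "occupancy unchanged" (column 0)
    # and "occupancy changed" (column 1) in a voxel.
    LIKELIHOOD = np.array([[0.9, 0.1],      # static voxel
                           [0.3, 0.7]])     # dynamic voxel

    def hmm_update(belief, changed):
        # belief: probabilities over (static, dynamic); changed: bool observation.
        predicted = TRANSITION.T @ belief                     # prediction step
        posterior = LIKELIHOOD[:, int(changed)] * predicted   # correction step
        return posterior / posterior.sum()

    belief = np.array([0.5, 0.5])
    for obs in [False, True, True, True]:   # the voxel's occupancy starts changing
        belief = hmm_update(belief, obs)
    print(belief)                           # mass shifts toward the dynamic state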


HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors

Lim, Hyungtae, Jang, Seoyeon, Mersch, Benedikt, Behley, Jens, Myung, Hyun, Stachniss, Cyrill

arXiv.org Artificial Intelligence

Moving object segmentation (MOS) using a 3D light detection and ranging (LiDAR) sensor is crucial for scene understanding and identification of moving objects. Despite the availability of various types of 3D LiDAR sensors on the market, MOS research still predominantly focuses on 3D point clouds from mechanically spinning omnidirectional LiDAR sensors. Thus, we are, for example, still lacking a dataset with MOS labels for point clouds from solid-state LiDAR sensors, which have irregular scanning patterns. In this paper, we present a labeled dataset, called HeLiMOS, that enables testing MOS approaches on four heterogeneous LiDAR sensors, including two solid-state LiDAR sensors. Furthermore, we introduce a novel automatic labeling method to substantially reduce the labeling effort required from human annotators. To this end, our framework exploits an instance-aware static map building approach and tracking-based false label filtering. Finally, we provide experimental results on the performance of commonly used state-of-the-art MOS approaches on HeLiMOS that suggest a new direction towards sensor-agnostic MOS, which works regardless of the type of LiDAR sensor used to capture 3D point clouds. Our dataset is available at https://sites.google.com/view/helimos.


TOSS: Real-time Tracking and Moving Object Segmentation for Static Scene Mapping

Jang, Seoyeon, Oh, Minho, Yu, Byeongho, Nahrendra, I Made Aswin, Lee, Seungjae, Lim, Hyungtae, Myung, Hyun

arXiv.org Artificial Intelligence

Safe navigation with simultaneous localization and mapping (SLAM) is crucial for autonomous robots in challenging environments. To achieve this goal, detecting moving objects in the surroundings and building a static map are essential. However, existing moving object segmentation methods have been developed separately for each field, making it challenging to perform real-time navigation and precise static map building simultaneously. In this paper, we propose an integrated real-time framework that combines online tracking-based moving object segmentation with static map building. For safe navigation, we introduce a computationally efficient hierarchical association cost matrix to enable real-time moving object segmentation. For precise static mapping, we present a voting-based method, DS-Voting, designed to achieve accurate dynamic object removal and static object recovery by emphasizing their spatio-temporal differences. We evaluate our proposed method quantitatively and qualitatively on the SemanticKITTI dataset and in real-world challenging environments. The results demonstrate that dynamic objects can be clearly distinguished and incorporated into static map construction, even in environments with stairs, steep hills, and dense vegetation.
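
The voting idea behind dynamic-object removal can be sketched as follows: voxels that are hit in some scans but observed as free in many others are treated as dynamic and dropped from the static map. This only illustrates the general occupancy-voting principle; DS-Voting itself weighs spatio-temporal differences in a more refined way, and all names and thresholds here are assumptions.

    from collections import defaultdict

    def build_static_map(scans, dynamic_ratio=0.5):
        # scans: list of (hit_voxels, free_voxels) per scan, where hit_voxels contain
        # measured points and free_voxels are traversed by sensor rays.
        hits, frees = defaultdict(int), defaultdict(int)
        for hit_voxels, free_voxels in scans:
            for v in hit_voxels:
                hits[v] += 1
            for v in free_voxels:
                frees[v] += 1
        static = set()
        for v, h in hits.items():
            f = frees.get(v, 0)
            if f / (h + f) < dynamic_ratio:   # mostly seen occupied -> keep as static
                static.add(v)
        return static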


A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics

Magistri, Federico, Läbe, Thomas, Marks, Elias, Nagulavancha, Sumanth, Pan, Yue, Smitt, Claus, Klingbeil, Lasse, Halstead, Michael, Kuhlmann, Heiner, McCool, Chris, Behley, Jens, Stachniss, Cyrill

arXiv.org Artificial Intelligence

As the population is expected to reach 10 billion by 2050, our agricultural production system needs to double its productivity despite a declining workforce in the agricultural sector. Autonomous robotic systems are one promising pathway to increase productivity by taking over labor-intensive manual tasks like fruit picking. To be effective, such systems need to monitor and interact with plants and fruits precisely, which is challenging due to the cluttered nature of agricultural environments causing, for example, strong occlusions. Thus, being able to estimate the complete 3D shapes of objects in the presence of occlusions is crucial for automating operations such as fruit harvesting. In this paper, we propose the first publicly available 3D shape completion dataset for agricultural vision systems. We provide an RGB-D dataset for estimating the 3D shape of fruits. Specifically, our dataset contains RGB-D frames of individual sweet peppers recorded both under lab conditions and in a commercial greenhouse. For each fruit, we additionally collected high-precision point clouds that we use as ground truth. For acquiring the ground truth shape, we developed a measuring process that allows us to record data of real sweet pepper plants, both in the lab and in the greenhouse, with high precision, and to determine the shape of the sensed fruits. We release our dataset, consisting of almost 7000 RGB-D frames belonging to more than 100 different fruits. We provide segmented RGB-D frames, with camera intrinsics to easily obtain colored point clouds, together with the corresponding high-precision, occlusion-free point clouds obtained with a high-precision laser scanner. We additionally enable evaluation of shape completion approaches on a hidden test set through a public challenge on a benchmark server.
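
Since the frames ship with camera intrinsics, turning a segmented RGB-D frame into a colored point cloud is a single back-projection step; the sketch below shows the standard pinhole formulation with assumed variable names, not the dataset's official tooling.

    import numpy as np

    def rgbd_to_point_cloud(depth, rgb, fx, fy, cx, cy, mask=None):
        # depth: [H, W] in meters, rgb: [H, W, 3], mask: optional [H, W] bool
        # segmentation of the fruit; fx, fy, cx, cy are the pinhole intrinsics.
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        valid = depth > 0
        if mask is not None:
            valid &= mask
        z = depth[valid]
        x = (u[valid] - cx) * z / fx
        y = (v[valid] - cy) * z / fy
        points = np.stack([x, y, z], axis=1)   # [N, 3] in the camera frame
        colors = rgb[valid]                    # [N, 3] corresponding colors
        return points, colors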