Frémont, Vincent
Active Collaborative Visual SLAM exploiting ORB Features
Ahmed, Muhammad Farhan, Frémont, Vincent, Fantoni, Isabelle
In autonomous robotics, a significant challenge is devising robust solutions for Active Collaborative SLAM (AC-SLAM), in which multiple robots cooperatively explore and map an unknown environment by intelligently coordinating their movements and sensor data acquisition. In this article, we present an efficient visual AC-SLAM method using aerial and ground robots for environment exploration and mapping. We propose an efficient frontier filtering method that accounts for the frontiers shared between robots, measured through the IoU of their maps, and reduces the number of frontiers each robot must consider. Additionally, we present an approach that guides robots back to previously visited goal positions to promote loop closures and reduce SLAM uncertainty. The proposed method is implemented in ROS and evaluated in simulation on publicly available datasets and against similar methods, achieving a cumulative average increase of 59% in area coverage.
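As a rough illustration of the frontier-filtering idea described above (the abstract does not give the exact criterion, so the grid convention, threshold, and function names below are assumptions, not the authors' implementation), a minimal sketch could look like:

```python
import numpy as np

UNKNOWN = -1  # occupancy-grid convention assumed here (-1 unknown, 0 free, 100 occupied)

def map_iou(grid_a, grid_b):
    """IoU of the two robots' explored (known) areas, assuming the grids
    are already aligned in a common global frame."""
    known_a, known_b = grid_a != UNKNOWN, grid_b != UNKNOWN
    inter = np.logical_and(known_a, known_b).sum()
    union = np.logical_or(known_a, known_b).sum()
    return inter / union if union else 0.0

def filter_frontiers(frontiers_b, grid_a, grid_b, iou_thr=0.3):
    """Once the maps overlap enough, drop robot B's frontier cells that robot A
    has already observed, so the two robots do not chase the same frontiers."""
    if map_iou(grid_a, grid_b) < iou_thr:
        return frontiers_b                      # little overlap: keep everything
    return [(r, c) for (r, c) in frontiers_b if grid_a[r, c] == UNKNOWN]
```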
Entropy Based Multi-robot Active SLAM
Ahmed, Muhammad Farhan, Maragliano, Matteo, Frémont, Vincent, Recchiuto, Carmine Tommaso
In SLAM, the objective is to find the optimal state vector that minimizes the measurement error between the estimated poses and the environmental landmarks. Most SLAM algorithms are passive: the robot is controlled manually, and the navigation or path planning algorithm does not actively influence the robot's motion or trajectory. Active SLAM (A-SLAM), in contrast, addresses the optimal exploration of an unknown environment by proposing a navigation strategy that generates future goal/target positions and actions that decrease map and pose uncertainties, thus enabling a fully autonomous navigation and mapping system without the need for an external controller or human effort. In Active Collaborative SLAM (AC-SLAM), multiple robots exchange information to improve their localization estimates and map accuracy and to achieve high-level tasks such as exploration. The exchanged information can be localization information [1], entropy [2], visual features [3], or frontier points [4]. In this article, we present a multi-agent AC-SLAM system for efficient environment exploration using frontiers detected over an Occupancy Grid (OG) map. In particular, in this work, we aim at: 1. extending the A-SLAM approach of [5], which uses a computationally inexpensive D-optimality criterion for utility computation, to a multi-agent AC-SLAM framework.
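The D-optimality criterion mentioned in point 1 reduces a pose covariance matrix to a single scalar. A minimal sketch, assuming the standard modified D-optimality (exponential of the mean log-eigenvalue) and a simple, hypothetical distance/uncertainty weighting that is not the paper's exact reward, is:

```python
import numpy as np

def d_optimality(cov):
    """Modified D-optimality of a pose covariance matrix: the exponential of
    the mean log-eigenvalue, i.e. det(cov)**(1/n)."""
    eigvals = np.linalg.eigvalsh(cov)
    eigvals = np.clip(eigvals, 1e-12, None)     # guard against numerical zeros
    return float(np.exp(np.mean(np.log(eigvals))))

def frontier_utility(frontier, robot_pose, cov, w_dist=1.0, w_unc=1.0):
    """Toy utility mixing travel distance and pose uncertainty (the weighting
    is an assumption for illustration only)."""
    dist = np.linalg.norm(np.asarray(frontier) - np.asarray(robot_pose)[:2])
    return -w_dist * dist - w_unc * d_optimality(cov)
```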
Practical Collaborative Perception: A Framework for Asynchronous and Multi-Agent 3D Object Detection
Dao, Minh-Quan, Berrio, Julie Stephany, Frémont, Vincent, Shan, Mao, Héry, Elwan, Worrall, Stewart
Occlusion is a major challenge for LiDAR-based object detection methods. This challenge becomes safety-critical in urban traffic, where the ego vehicle must detect objects reliably to avoid collisions while its field of view is severely reduced by the obstruction posed by a large number of road users. Collaborative perception via Vehicle-to-Everything (V2X) communication, which leverages the diverse perspectives of connected agents present at multiple locations to form a complete scene representation, is an appealing solution. State-of-the-art V2X methods resolve the performance-bandwidth tradeoff with a mid-collaboration approach in which Bird-Eye View images of point clouds are exchanged, so that bandwidth consumption is lower than communicating raw point clouds as in early collaboration, while detection performance is higher than in late collaboration, which only fuses agents' outputs, thanks to a deeper interaction among connected agents. While achieving strong performance, the real-world deployment of most mid-collaboration approaches is hindered by their overly complicated architectures, involving learnable collaboration graphs and autoencoder-based compressors/decompressors, and by unrealistic assumptions about inter-agent synchronization. In this work, we devise a simple yet effective collaboration method that achieves a better bandwidth-performance tradeoff than prior state-of-the-art methods while minimizing changes to the single-vehicle detection models and relaxing unrealistic assumptions on inter-agent synchronization. Experiments on the V2X-Sim dataset show that our collaboration method achieves 98% of the performance of an early-collaboration method while consuming only the equivalent bandwidth of a late-collaboration method.
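For intuition on mid-collaboration, the sketch below shows one simple way to warp connected agents' BEV feature maps into the ego frame and fuse them; the grid resolution, frame conventions, and element-wise max fusion are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def warp_bev_to_ego(bev, T_ego_from_agent, cell_size=0.5):
    """Nearest-neighbour re-sampling of an agent's (C, H, W) BEV feature map
    into the ego frame, given a 3x3 homogeneous 2D transform (assumption:
    both maps share the same resolution and are centred on each vehicle)."""
    C, H, W = bev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # metric ego-frame coordinates of every ego cell (origin at map centre)
    pts = np.stack([(xs - W / 2) * cell_size,
                    (ys - H / 2) * cell_size,
                    np.ones_like(xs, dtype=float)], axis=-1)
    src = pts @ np.linalg.inv(T_ego_from_agent).T      # ego -> agent frame
    cols = np.round(src[..., 0] / cell_size + W / 2).astype(int)
    rows = np.round(src[..., 1] / cell_size + H / 2).astype(int)
    valid = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    out = np.zeros_like(bev)
    out[:, ys[valid], xs[valid]] = bev[:, rows[valid], cols[valid]]
    return out

def fuse_bev(ego_bev, agent_bevs, transforms):
    """Element-wise max fusion of the warped BEV maps (one simple choice of
    fusion operator among several possibilities)."""
    fused = ego_bev.copy()
    for bev, T in zip(agent_bevs, transforms):
        fused = np.maximum(fused, warp_bev_to_ego(bev, T))
    return fused
```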
Aligning Bird-Eye View Representation of Point Cloud Sequences using Scene Flow
Dao, Minh-Quan, Frémont, Vincent, Héry, Elwan
Low-resolution point clouds are challenging for object detection methods due to their sparsity. Densifying the current point cloud by concatenating it with its predecessors is a popular solution to this challenge. Such concatenation is possible thanks to the removal of ego vehicle motion using its odometry. This method is called Ego Motion Compensation (EMC). Thanks to the added points, EMC significantly improves the performance of single-frame detectors. However, it suffers from the shadow effect, which manifests as dynamic objects' points scattering along their trajectories. This effect results in a misalignment between feature maps and objects' locations, thus limiting the performance improvement to stationary and slow-moving objects only. Scene flow allows aligning point clouds in 3D space, thus naturally resolving the misalignment in feature space. By observing that scene flow computation shares several components with 3D object detection pipelines, we develop a plug-in module that enables single-frame detectors to compute scene flow to rectify their Bird-Eye View representation. Experiments on the NuScenes dataset show that our module leads to a significant increase (up to 16%) in the Average Precision of large vehicles, which interestingly exhibit the most severe shadow effect. The code is published at https://github.com/quan-dao/pc-corrector.
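A minimal sketch of plain Ego Motion Compensation, assuming 4x4 world-from-ego odometry poses, illustrates where the shadow effect comes from: only the ego motion is removed, so dynamic objects' past points stay scattered along their trajectories (the paper's module additionally rectifies them with a predicted scene flow).

```python
import numpy as np

def ego_motion_compensate(sweeps, poses, current_pose):
    """Merge past LiDAR sweeps into the current ego frame (plain EMC).
    `sweeps` are (N_i, 3) point arrays; `poses` and `current_pose` are
    4x4 world-from-ego transforms obtained from odometry."""
    T_cur_inv = np.linalg.inv(current_pose)
    merged = []
    for pts, pose in zip(sweeps, poses):
        hom = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords
        # past ego frame -> world -> current ego frame
        merged.append((hom @ (T_cur_inv @ pose).T)[:, :3])
    return np.vstack(merged)
```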
A two-stage data association approach for 3D Multi-object Tracking
Dao, Minh-Quan, Frémont, Vincent
Multi-object tracking (MOT) is an integral part of any autonomous driving pipeline because it produces the trajectories taken by other moving objects in the scene and helps predict their future motion. Thanks to recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system is essentially made of an object detector and a data association algorithm which establishes track-to-detection correspondences. While 3D object detection has been actively researched, association algorithms for 3D MOT seem to have settled on bipartite matching formulated as a linear assignment problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt a two-stage data association method, which was successful in image-based tracking, to the 3D setting, thus providing an alternative for data association in 3D MOT. Our method outperforms the baseline using one-stage bipartite matching for data association, achieving 0.587 AMOTA on the NuScenes validation set.
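For reference, the one-stage baseline discussed above amounts to solving a linear assignment problem on a track-to-detection cost matrix. A minimal sketch using centroid distances and a hypothetical gating threshold (the paper's two-stage variant re-matches the leftovers in a second, looser stage, which is not shown here) is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks_xyz, dets_xyz, gate=2.0):
    """One-stage bipartite matching: Hungarian algorithm on a Euclidean
    centroid-distance cost matrix, followed by distance gating."""
    cost = np.linalg.norm(tracks_xyz[:, None, :] - dets_xyz[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
    unmatched_tracks = set(range(len(tracks_xyz))) - {r for r, _ in matches}
    unmatched_dets = set(range(len(dets_xyz))) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets
```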
R-AGNO-RPN: A LIDAR-Camera Region Deep Network for Resolution-Agnostic Detection
Théodose, Ruddy, Denis, Dieumet, Chateau, Thierry, Frémont, Vincent, Checchin, Paul
Current neural network-based object detection approaches processing LiDAR point clouds are generally trained on one kind of LiDAR sensor. However, their performance decreases when they are tested with data coming from a different LiDAR sensor than the one used for training, i.e., with a different point cloud resolution. In this paper, R-AGNO-RPN, a region proposal network built on the fusion of 3D point clouds and RGB images, is proposed for 3D object detection regardless of point cloud resolution. As our approach is designed to also be applied to low point cloud resolutions, the proposed method focuses on object localization instead of estimating refined boxes on reduced data. Resilience to low-resolution point clouds is obtained through image features accurately mapped to the Bird's Eye View and a specific data augmentation procedure that improves the contribution of the RGB images. To show the proposed network's ability to deal with different point cloud resolutions, experiments are conducted on data from both the KITTI 3D Object Detection and the nuScenes datasets. In addition, to assess its performance, our method is compared to PointPillars, a well-known 3D detection network. Experimental results show that even on point clouds reduced by 80% of their original points, our method is still able to deliver relevant proposal localizations.
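As a hedged illustration of mapping image features to the Bird's Eye View, the sketch below paints LiDAR points with image features through a 3x4 camera projection matrix and scatters them onto a BEV grid; this is a common lifting scheme, not necessarily the paper's exact mapping or augmentation procedure.

```python
import numpy as np

def lift_image_features_to_bev(points, feats_hw_c, P, bev_shape=(200, 200),
                               x_range=(0.0, 70.0), y_range=(-35.0, 35.0)):
    """Sample per-point image features via the projection P and scatter them
    onto a BEV grid (grid size and ranges are illustrative assumptions)."""
    H, W, C = feats_hw_c.shape
    hom = np.hstack([points[:, :3], np.ones((len(points), 1))])
    uvw = hom @ P.T
    depth = uvw[:, 2]
    front = depth > 1e-6
    u = np.full(len(points), -1.0)
    v = np.full(len(points), -1.0)
    u[front] = uvw[front, 0] / depth[front]
    v[front] = uvw[front, 1] / depth[front]
    in_img = front & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    bev = np.zeros((*bev_shape, C), dtype=feats_hw_c.dtype)
    xs, ys = points[in_img, 0], points[in_img, 1]
    rows = ((xs - x_range[0]) / (x_range[1] - x_range[0]) * bev_shape[0]).astype(int)
    cols = ((ys - y_range[0]) / (y_range[1] - y_range[0]) * bev_shape[1]).astype(int)
    keep = (rows >= 0) & (rows < bev_shape[0]) & (cols >= 0) & (cols < bev_shape[1])
    bev[rows[keep], cols[keep]] = feats_hw_c[v[in_img][keep].astype(int),
                                             u[in_img][keep].astype(int)]
    return bev
```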