LiDAR

Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes

Neural Information Processing Systems

Self-supervised depth estimators have recently shown results comparable to the supervised methods on the challenging single image depth estimation (SIDE) task, by exploiting the geometrical relations between target and reference views in the training data. However, previous methods usually learn forward or backward image synthesis, but not depth estimation, as they cannot effectively neglect occlusions between the target and the reference images. Previous works rely on rigid photometric assumptions or on the SIDE network to infer depth and occlusions, resulting in limited performance. In contrast, we propose a method to Forget About the LiDAR (FAL), with Mirrored Exponential Disparity (MED) probability volumes for training monocular depth estimators from stereo images. Our MED representation allows us to obtain geometrically inspired occlusion maps with our novel Mirrored Occlusion Module (MOM), which does not impose a learning burden on our FAL-net.
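The abstract does not give implementation details, but the core idea of an exponential-disparity probability volume can be illustrated with a short sketch: disparity hypotheses are spaced exponentially between a minimum and maximum value, a softmax over the network's per-pixel scores turns them into a probability volume, and the disparity estimate is the probability-weighted average of the hypotheses. The level spacing, hypothesis count, and disparity range below are illustrative assumptions, not the paper's exact MED parameterization.

```python
import torch
import torch.nn.functional as F

def exponential_disparity_levels(d_min, d_max, n_levels):
    # Exponentially spaced disparity hypotheses between d_min and d_max.
    # (Illustrative spacing; the paper's exact MED parameterization may differ.)
    i = torch.arange(n_levels, dtype=torch.float32)
    return d_min * (d_max / d_min) ** (i / (n_levels - 1))

def expected_disparity(logits, d_min=1.0, d_max=192.0):
    # logits: (B, N, H, W) raw per-pixel scores from a disparity network.
    # Softmax over the N hypotheses gives a probability volume; the estimate
    # is the probability-weighted average of the disparity levels.
    probs = F.softmax(logits, dim=1)                     # (B, N, H, W)
    levels = exponential_disparity_levels(d_min, d_max, logits.shape[1])
    levels = levels.view(1, -1, 1, 1).to(logits.device)  # broadcast over B, H, W
    return (probs * levels).sum(dim=1)                   # (B, H, W)

# Tiny usage example with random logits standing in for network output.
logits = torch.randn(2, 49, 64, 128)
disp = expected_disparity(logits)
print(disp.shape)  # torch.Size([2, 64, 128])
```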


RLCNet: An end-to-end deep learning framework for simultaneous online calibration of LiDAR, RADAR, and Camera

Cholakkal, Hafeez Husain, Arrigoni, Stefano, Braghin, Francesco

arXiv.org Artificial Intelligence

Autonomous vehicles are poised to revolutionize transportation by improving road safety, reducing traffic congestion, and increasing mobility convenience [1]. To perceive and interact with their environment accurately, these vehicles rely on a combination of complementary sensors, including LiDAR, RADAR, and cameras. Each sensor offers unique advantages: cameras capture rich visual detail, LiDAR provides precise 3D spatial measurements, and RADAR performs robustly under adverse weather conditions [2]. Sensor fusion leverages the strengths of these modalities to ensure redundancy and resilience, allowing the vehicle to maintain accurate perception in diverse and dynamic environments [3]. A critical component of sensor fusion is extrinsic calibration, which involves the determination of the relative positions and orientations of sensors in a common coordinate frame. However, maintaining precise calibration over time is a persistent challenge. Factors such as mechanical vibrations, temperature changes, and minor collisions can lead to sensor drift, where even small misalignments in sensor orientation or position can result in substantial perception errors, potentially compromising vehicle safety.
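Extrinsic calibration here means estimating the rigid transform (rotation R, translation t) relating the sensors' coordinate frames. The sketch below shows the underlying geometry such a calibration must recover: LiDAR points are mapped into the camera frame with (R, t) and projected onto the image plane with the camera intrinsics K. All numeric values are made-up placeholders, and the snippet is generic multi-sensor geometry, not RLCNet's actual pipeline.

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K):
    """Map 3-D LiDAR points into a camera image using an extrinsic (R, t)
    and an intrinsic matrix K. Illustrative geometry only; values are made up."""
    # Rigid transform: point in camera frame = R @ point in LiDAR frame + t
    pts_cam = R @ points_lidar.T + t.reshape(3, 1)   # (3, N)
    # Keep only points in front of the camera.
    in_front = pts_cam[2] > 0
    pts_cam = pts_cam[:, in_front]
    # Pinhole projection with intrinsics K.
    uvw = K @ pts_cam
    uv = uvw[:2] / uvw[2]                            # pixel coordinates
    return uv.T, in_front

# Hypothetical calibration: identity rotation, small translation, generic intrinsics.
R = np.eye(3)
t = np.array([0.1, -0.05, 0.2])
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
points = np.random.uniform(-10, 10, size=(1000, 3)) + np.array([0.0, 0.0, 15.0])
pixels, mask = project_lidar_to_image(points, R, t, K)
print(pixels.shape)
```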


TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards

Martini, Mauro, Ambrosio, Marco, Vilella-Cantos, Judith, Navone, Alessandro, Chiaberge, Marcello

arXiv.org Artificial Intelligence

In recent years, precision agriculture has been introducing groundbreaking innovations in the field, with a strong focus on automation. However, research studies in robotics and autonomous navigation often rely on controlled simulations or isolated field trials. The absence of a realistic common benchmark represents a significant limitation for the adoption of robust autonomous systems in real, complex agricultural conditions. Vineyards pose significant challenges due to their dynamic nature, and they are increasingly drawing attention from both academic and industrial stakeholders interested in automation. In this context, we introduce the TEMPO-VINE dataset, a large-scale multi-temporal dataset specifically designed for evaluating sensor fusion, simultaneous localization and mapping (SLAM), and place recognition techniques within operational vineyard environments. TEMPO-VINE is the first multi-modal public dataset that brings together data from heterogeneous LiDARs of different price levels, AHRS, RTK-GPS, and cameras in real trellis and pergola vineyards, with multiple rows exceeding 100 m in length. In this work, we address a critical gap in the landscape of agricultural datasets by providing researchers with a comprehensive data collection and ground truth trajectories across different seasons, vegetation growth stages, terrain and weather conditions. The sequences, with multiple runs and revisits, will foster the development of sensor fusion, localization, mapping and place recognition solutions for agricultural fields. The dataset, the processing tools and the benchmarking results will be available at the dedicated webpage upon acceptance.
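The abstract mentions ground-truth trajectories and benchmarking, but not a specific metric. A common way to score localization and SLAM against RTK-GPS ground truth on datasets of this kind is the absolute trajectory error (ATE) after rigid alignment; the sketch below computes it with a standard Horn/Kabsch alignment and is an assumption about how such a benchmark could be evaluated, not the authors' stated protocol.

```python
import numpy as np

def ate_rmse(gt, est):
    """Absolute trajectory error (RMSE) after rigidly aligning the estimated
    trajectory to the ground truth (Horn/Kabsch alignment, no scale).
    gt, est: (N, 3) arrays of time-associated positions.
    Generic SLAM metric; not necessarily the paper's exact protocol."""
    gt_c = gt - gt.mean(axis=0)
    est_c = est - est.mean(axis=0)
    # Best-fit rotation from the estimated frame to the ground-truth frame via SVD.
    U, _, Vt = np.linalg.svd(est_c.T @ gt_c)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = (U @ S @ Vt).T
    t = gt.mean(axis=0) - R @ est.mean(axis=0)
    aligned = (R @ est.T).T + t
    err = np.linalg.norm(aligned - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Toy example: the estimated track is the ground truth rotated, shifted and noised.
gt = np.cumsum(np.random.randn(500, 3) * 0.1, axis=0)
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
est = (Rz @ gt.T).T + np.array([5.0, -2.0, 0.5]) + np.random.randn(500, 3) * 0.02
print(round(ate_rmse(gt, est), 3))
```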


nuScenes Revisited: Progress and Challenges in Autonomous Driving

Fong, Whye Kit, Liong, Venice Erin, Tan, Kok Seang, Caesar, Holger

arXiv.org Artificial Intelligence

Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS) have been revolutionized by Deep Learning. As a data-driven approach, Deep Learning relies on vast amounts of driving data, typically labeled in great detail. As a result, datasets, alongside hardware and algorithms, are foundational building blocks for the development of AVs. In this work we revisit one of the most widely used autonomous driving datasets: the nuScenes dataset. nuScenes exemplifies key trends in AV development, being the first dataset to include radar data, to feature diverse urban driving scenes from two continents, and to be collected using a fully autonomous vehicle operating on public roads, while also promoting multi-modal sensor fusion, standardized benchmarks, and a broad range of tasks including perception, localization & mapping, prediction and planning. We provide an unprecedented look into the creation of nuScenes, as well as its extensions nuImages and Panoptic nuScenes, summarizing many technical details that have hitherto not been revealed in academic publications. Furthermore, we trace how the influence of nuScenes impacted a large number of other datasets that were released later and how it defined numerous standards that are used by the community to this day. Finally, we present an overview of both official and unofficial tasks using the nuScenes dataset and review major methodological developments, thereby offering a comprehensive survey of the autonomous driving literature, with a particular focus on nuScenes.
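For readers who want to inspect the dataset directly, the official nuscenes-devkit exposes scenes, keyframe samples, and per-sensor sample_data records. The snippet below mirrors the devkit's public tutorial usage with a placeholder data path; it is only a minimal sketch of loading synchronized camera/LiDAR/radar records, not an excerpt from the paper.

```python
from nuscenes.nuscenes import NuScenes

# Load the mini split (requires the nuscenes-devkit package and downloaded data;
# the dataroot path below is a placeholder).
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

# Each scene is a ~20 s driving log; samples are keyframes annotated at 2 Hz.
scene = nusc.scene[0]
sample = nusc.get('sample', scene['first_sample_token'])

# A sample bundles synchronized sample_data records from all sensors.
lidar = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
radar = nusc.get('sample_data', sample['data']['RADAR_FRONT'])
print(lidar['filename'], radar['filename'])

# Annotations are 3-D boxes referenced by token from the sample.
ann = nusc.get('sample_annotation', sample['anns'][0])
print(ann['category_name'])
```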


Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework

Park, Yu Min, Tun, Yan Kyaw, Huh, Eui-Nam, Saad, Walid, Hong, Choong Seon

arXiv.org Artificial Intelligence

Beamforming is a key technology in millimeter-wave (mmWave) communications that improves signal transmission by optimizing directionality and intensity. However, conventional channel estimation methods, such as pilot signals or beam sweeping, often fail to adapt to rapidly changing communication environments. To address this limitation, multimodal sensing-aided beam prediction has gained significant attention, using various sensing data from devices such as LiDAR, radar, GPS, and RGB images to predict user locations or network conditions. Despite its promising potential, the adoption of multimodal sensing-aided beam prediction is hindered by high computational complexity, high costs, and limited datasets. Thus, in this paper, a novel resource-efficient learning framework for beam prediction is introduced. It leverages a custom-designed cross-modal relational knowledge distillation (CRKD) algorithm, tailored to beam prediction tasks, to transfer knowledge from a multimodal network to a radar-only student model, achieving high accuracy at reduced computational cost. To enable multimodal learning with realistic data, a novel multimodal simulation framework is developed that integrates sensor data generated from the autonomous driving simulator CARLA with MATLAB-based mmWave channel modeling, reflecting real-world conditions. The proposed CRKD achieves its objective by distilling relational information across different feature spaces, which enhances beam prediction performance without relying on expensive sensor data. Simulation results demonstrate that CRKD efficiently distills multimodal knowledge, allowing a radar-only model to achieve 94.62% of the teacher performance. In particular, this is achieved with just 10% of the teacher network's parameters, thereby significantly reducing computational complexity and dependence on multimodal sensor data.
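The CRKD objective is not spelled out in the abstract, but the general idea of relational distillation across feature spaces of different dimensionality can be sketched as matching the normalized pairwise-distance structure of teacher and student embeddings (in the spirit of relational knowledge distillation). The feature sizes and the smooth-L1 matching loss below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pairwise_distances(feats):
    # feats: (B, D) feature vectors for a batch of samples -> (B, B) distances.
    return torch.cdist(feats, feats, p=2)

def relational_kd_loss(student_feats, teacher_feats, eps=1e-8):
    """Distance-wise relational distillation: match the normalized
    pairwise-distance structure of the teacher's multimodal features
    with that of the radar-only student's features.
    A generic sketch; the paper's CRKD objective is more elaborate."""
    d_s = pairwise_distances(student_feats)
    d_t = pairwise_distances(teacher_feats)
    # Normalize by the mean off-diagonal distance so the two feature
    # spaces are comparable even if their scales differ.
    d_s = d_s / (d_s[d_s > 0].mean() + eps)
    d_t = d_t / (d_t[d_t > 0].mean() + eps)
    return F.smooth_l1_loss(d_s, d_t)

# Toy usage: teacher and student embeddings of different dimensionality.
teacher = torch.randn(32, 512)   # e.g. fused LiDAR+camera+radar features
student = torch.randn(32, 128)   # radar-only features
loss = relational_kd_loss(student, teacher)
print(loss.item())
```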