off-road environment
Advancing Off-Road Autonomous Driving: The Large-Scale ORAD-3D Dataset and Comprehensive Benchmarks
Min, Chen, Mei, Jilin, Zhai, Heng, Wang, Shuai, Sun, Tong, Kong, Fanjie, Li, Haoyang, Mao, Fangyuan, Liu, Fuyang, Wang, Shuo, Nie, Yiming, Zhu, Qi, Xiao, Liang, Zhao, Dawei, Hu, Yu
Abstract-- A major bottleneck in off-road autonomous driving research lies in the scarcity of large-scale, high-quality datasets and benchmarks. T o bridge this gap, we present ORAD-3D, which, to the best of our knowledge, is the largest dataset specifically curated for off-road autonomous driving. ORAD-3D covers a wide spectrum of terrains--including woodlands, farmlands, grasslands, riversides, gravel roads, cement roads, and rural areas--while capturing diverse environmental variations across weather conditions (sunny, rainy, foggy, and snowy) and illumination levels (bright daylight, daytime, twilight, and nighttime). Building upon this dataset, we establish a comprehensive suite of benchmark evaluations spanning five fundamental tasks: 2D free-space detection, 3D occupancy prediction, rough GPS-guided path planning, vision-language model-driven autonomous driving, and world model for off-road environments. T ogether, the dataset and benchmarks provide a unified and robust resource for advancing perception and planning in challenging off-road scenarios. The dataset and code will be made publicly available at https://github.
Self-Supervised Traversability Learning with Online Prototype Adaptation for Off-Road Autonomous Driving
Bu, Yafeng, Sun, Zhenping, Li, Xiaohui, Zeng, Jun, Zhang, Xin, Shen, Hui
Achieving reliable and safe autonomous driving in off-road environments requires accurate and efficient terrain traversability analysis. However, this task faces several challenges, including the scarcity of large-scale datasets tailored for off-road scenarios, the high cost and potential errors of manual annotation, the stringent real-time requirements of motion planning, and the limited computational power of onboard units. To address these challenges, this paper proposes a novel traversability learning method that leverages self-supervised learning, eliminating the need for manual annotation. For the first time, a Birds-Eye View (BEV) representation is used as input, reducing computational burden and improving adaptability to downstream motion planning. During vehicle operation, the proposed method conducts online analysis of traversed regions and dynamically updates prototypes to adaptively assess the traversability of the current environment, effectively handling dynamic scene changes. We evaluate our approach against state-of-the-art benchmarks on both public datasets and our own dataset, covering diverse seasons and geographical locations. Experimental results demonstrate that our method significantly outperforms recent approaches. Additionally, real-world vehicle experiments show that our method operates at 10 Hz, meeting real-time requirements, while a 5.5 km autonomous driving experiment further validates the generated traversability cost maps compatibility with downstream motion planning.
Learning Autonomy: Off-Road Navigation Enhanced by Human Input
Nagariya, Akhil, Filev, Dimitar, Saripalli, Srikanth, Pandey, Gaurav
Successfully navigating these environments requires leveraging both visual and geometric features effectively. Modeling tire-terrain interactions and vehicle dynamics across diverse off-road conditions is a complex task. Even with accurate models, tuning the planning algorithm to navigate safely across different terrains demands extensive time and expertise. In our research, we introduce a demonstration-based local planning algorithm that bypasses the need for directly modeling these intricate dynamic interactions. Instead, it learns navigation preferences from human driving data, demonstrating the ability to adapt these learned behaviors from simulations to real vehicles with minimal manual adjustments. Our approach uses utility functions to directly extract key features from segmented images and learns human driving behaviour using demonstration data. This approach diverges from traditional methods, which typically require either extensive labeled data for end-to-end learning or precise sensor calibration and global mapping in classical robotics approaches.
GO: The Great Outdoors Multimodal Dataset
Jiang, Peng, Viswanath, Kasi, Nagariya, Akhil, Chustz, George, Wigness, Maggie, Osteen, Philip, Overbye, Timothy, Ellis, Christian, Quang, Long, Saripalli, Srikanth
The Great Outdoors (GO) dataset is a multi-modal annotated data resource aimed at advancing ground robotics research in unstructured environments. This dataset provides the most comprehensive set of data modalities and annotations compared to existing off-road datasets. In total, the GO dataset includes six unique sensor types with high-quality semantic annotations and GPS traces to support tasks such as semantic segmentation, object detection, and SLAM. The diverse environmental conditions represented in the dataset present significant real-world challenges that provide opportunities to develop more robust solutions to support the continued advancement of field robotics, autonomous exploration, and perception systems in natural environments. The dataset can be downloaded at: https://www.unmannedlab.org/the-great-outdoors-dataset/
MCRL4OR: Multimodal Contrastive Representation Learning for Off-Road Environmental Perception
Yang, Yi, Zhang, Zhang, Wang, Liang
Most studies on environmental perception for autonomous vehicles (AVs) focus on urban traffic environments, where the objects/stuff to be perceived are mainly from man-made scenes and scalable datasets with dense annotations can be used to train supervised learning models. By contrast, it is hard to densely annotate a large-scale off-road driving dataset manually due to the inherently unstructured nature of off-road environments. In this paper, we propose a Multimodal Contrastive Representation Learning approach for Off-Road environmental perception, namely MCRL4OR. This approach aims to jointly learn three encoders for processing visual images, locomotion states, and control actions by aligning the locomotion states with the fused features of visual images and control actions within a contrastive learning framework. The causation behind this alignment strategy is that the inertial locomotion state is the result of taking a certain control action under the current landform/terrain condition perceived by visual sensors. In experiments, we pre-train the MCRL4OR with a large-scale off-road driving dataset and adopt the learned multimodal representations for various downstream perception tasks in off-road driving scenarios. The superior performance in downstream tasks demonstrates the advantages of the pre-trained multimodal representations. The codes can be found in \url{https://github.com/1uciusy/MCRL4OR}.
IRisPath: Enhancing Off-Road Navigation with Robust IR-RGB Fusion for Improved Day and Night Traversability
Sharma, Saksham, Raizada, Akshit, Sundaram, Suresh
Autonomous off-road navigation is required for applications in agriculture, construction, search and rescue and defence. Traditional on-road autonomous methods struggle with dynamic terrains, leading to poor vehicle control on off-road. Recent deep-learning models have used perception sensors along with kinesthetic feedback for navigation on such terrains. However, this approach has out-of-domain uncertainty. Factors like change in weather and time of day impacts the performance of the model. We propose a multi modal fusion network FuseIsPath capable of using LWIR and RGB images to provide robustness against dynamic weather and light conditions. To aid further works in this domain, we also open-source a day-night dataset with LWIR and RGB images along with pseudo-labels for traversability. In order to co-register the two images we developed a novel method for targetless extrinsic calibration of LWIR, LiDAR and RGB cameras with translation accuracy of 1.7cm and rotation accuracy of 0.827degree.
WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction
Zhai, Heng, Mei, Jilin, Min, Chen, Chen, Liang, Zhao, Fangzhou, Hu, Yu
3D semantic occupancy prediction is an essential part of autonomous driving, focusing on capturing the geometric details of scenes. Off-road environments are rich in geometric information, therefore it is suitable for 3D semantic occupancy prediction tasks to reconstruct such scenes. However, most of researches concentrate on on-road environments, and few methods are designed for off-road 3D semantic occupancy prediction due to the lack of relevant datasets and benchmarks. In response to this gap, we introduce WildOcc, to our knowledge, the first benchmark to provide dense occupancy annotations for off-road 3D semantic occupancy prediction tasks. A ground truth generation pipeline is proposed in this paper, which employs a coarse-to-fine reconstruction to achieve a more realistic result. Moreover, we introduce a multi-modal 3D semantic occupancy prediction framework, which fuses spatio-temporal information from multi-frame images and point clouds at voxel level. In addition, a cross-modality distillation function is introduced, which transfers geometric knowledge from point clouds to image features.
BenchNav: Simulation Platform for Benchmarking Off-road Navigation Algorithms with Probabilistic Traversability
Endo, Masafumi, Honda, Kohei, Ishigami, Genya
As robotic navigation techniques in perception and planning advance, mobile robots increasingly venture into off-road environments involving complex traversability. However, selecting suitable planning methods remains a challenge due to their algorithmic diversity, as each offers unique benefits. To aid in algorithm design, we introduce BenchNav, an open-source PyTorch-based simulation platform for benchmarking off-road navigation with uncertain traversability. Built upon Gymnasium, BenchNav provides three key features: 1) a data generation pipeline for preparing synthetic natural environments, 2) built-in machine learning models for traversability prediction, and 3) consistent execution of path and motion planning across different algorithms. We show BenchNav's versatility through simulation examples in off-road environments, employing three representative planning algorithms from different domains. https://github.com/masafumiendo/benchnav
Motion planning for off-road autonomous driving based on human-like cognition and weight adaptation
Wang, Yuchun, Gong, Cheng, Gong, Jianwei, Jia, Peng
Driving in an off-road environment is challenging for autonomous vehicles due to the complex and varied terrain. To ensure stable and efficient travel, the vehicle requires consideration and balancing of environmental factors, such as undulations, roughness, and obstacles, to generate optimal trajectories that can adapt to changing scenarios. However, traditional motion planners often utilize a fixed cost function for trajectory optimization, making it difficult to adapt to different driving strategies in challenging irregular terrains and uncommon scenarios. To address these issues, we propose an adaptive motion planner based on human-like cognition and cost evaluation for off-road driving. First, we construct a multi-layer map describing different features of off-road terrains, including terrain elevation, roughness, obstacle, and artificial potential field map. Subsequently, we employ a CNN-LSTM network to learn the trajectories planned by human drivers in various off-road scenarios. Then, based on human-like generated trajectories in different environments, we design a primitive-based trajectory planner that aims to mimic human trajectories and cost weight selection, generating trajectories that are consistent with the dynamics of off-road vehicles. Finally, we compute optimal cost weights and select and extend behavioral primitives to generate highly adaptive, stable, and efficient trajectories. We validate the effectiveness of the proposed method through experiments in a desert off-road environment with complex terrain and varying road conditions. The experimental results show that the proposed human-like motion planner has excellent adaptability to different off-road conditions. It shows real-time operation, greater stability, and more human-like planning ability in diverse and challenging scenarios.
UFO: Uncertainty-aware LiDAR-image Fusion for Off-road Semantic Terrain Map Estimation
Kim, Ohn, Seo, Junwon, Ahn, Seongyong, Kim, Chong Hui
Autonomous off-road navigation requires an accurate semantic understanding of the environment, often converted into a bird's-eye view (BEV) representation for various downstream tasks. While learning-based methods have shown success in generating local semantic terrain maps directly from sensor data, their efficacy in off-road environments is hindered by challenges in accurately representing uncertain terrain features. This paper presents a learning-based fusion method for generating dense terrain classification maps in BEV. By performing LiDAR-image fusion at multiple scales, our approach enhances the accuracy of semantic maps generated from an RGB image and a single-sweep LiDAR scan. Utilizing uncertainty-aware pseudo-labels further enhances the network's ability to learn reliably in off-road environments without requiring precise 3D annotations. By conducting thorough experiments using off-road driving datasets, we demonstrate that our method can improve accuracy in off-road terrains, validating its efficacy in facilitating reliable and safe autonomous navigation in challenging off-road settings.