
Collaborating Authors: Saripalli, Srikanth


Learning Autonomy: Off-Road Navigation Enhanced by Human Input

arXiv.org Artificial Intelligence

Successfully navigating off-road environments requires leveraging both visual and geometric features effectively. Modeling tire-terrain interactions and vehicle dynamics across diverse off-road conditions is a complex task, and even with accurate models, tuning a planning algorithm to navigate safely across different terrains demands extensive time and expertise. In our research, we introduce a demonstration-based local planning algorithm that bypasses the need to directly model these intricate dynamic interactions. Instead, it learns navigation preferences from human driving data and adapts the learned behaviors from simulation to real vehicles with minimal manual adjustment. Our approach uses utility functions to extract key features directly from segmented images and learns human driving behavior from demonstration data. This diverges from traditional methods, which typically require either extensive labeled data for end-to-end learning or the precise sensor calibration and global mapping of classical robotics pipelines.
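
As a rough illustration of the idea, the sketch below scores candidate paths with a linear utility over terrain-class fractions extracted from a segmented image and fits the weights from demonstrations with a simple margin-based update. The class list, feature choice, and update rule are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical terrain classes from a semantic segmentation model.
CLASSES = ["grass", "trail", "bush", "rock", "water"]

def path_features(seg_mask: np.ndarray, path_pixels: np.ndarray) -> np.ndarray:
    """Fraction of each terrain class covered by a candidate path.

    seg_mask:    (H, W) array of class indices from a segmented image.
    path_pixels: (N, 2) array of (row, col) pixels the path passes through.
    """
    labels = seg_mask[path_pixels[:, 0], path_pixels[:, 1]]
    return np.array([(labels == c).mean() for c in range(len(CLASSES))])

def utility(weights: np.ndarray, features: np.ndarray) -> float:
    """Linear utility of a candidate path under learned preferences."""
    return float(weights @ features)

def fit_weights_from_demos(demo_feats, alt_feats, lr=0.1, epochs=200):
    """Toy preference learning: push demonstrated paths above non-chosen
    alternatives by a margin (a simplified stand-in for learning from
    human driving data)."""
    w = np.zeros(len(CLASSES))
    for _ in range(epochs):
        for f_demo, f_alt in zip(demo_feats, alt_feats):
            if utility(w, f_demo) < utility(w, f_alt) + 1.0:  # margin violated
                w += lr * (f_demo - f_alt)                    # perceptron-style update
    return w
```

At planning time, each candidate local path would be scored with `utility` and the highest-scoring path executed.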


GO: The Great Outdoors Multimodal Dataset

arXiv.org Artificial Intelligence

The Great Outdoors (GO) dataset is a multimodal annotated data resource aimed at advancing ground robotics research in unstructured environments. It provides a more comprehensive set of data modalities and annotations than existing off-road datasets. In total, the GO dataset includes six unique sensor types with high-quality semantic annotations and GPS traces to support tasks such as semantic segmentation, object detection, and SLAM. The diverse environmental conditions represented in the dataset pose significant real-world challenges and create opportunities to develop more robust solutions for field robotics, autonomous exploration, and perception systems in natural environments. The dataset can be downloaded at: https://www.unmannedlab.org/the-great-outdoors-dataset/


Exploring Unstructured Environments using Minimal Sensing on Cooperative Nano-Drones

arXiv.org Artificial Intelligence

Recent advances have improved autonomous navigation and mapping under payload constraints, but current multi-robot inspection algorithms are unsuitable for nano-drones because they require heavy sensors and high computational resources. To address these challenges, we introduce ExploreBug, a novel hybrid frontier range bug algorithm designed to handle the limited sensing capabilities of a swarm of nano-drones. The system comprises three primary components: a mapping subsystem, an exploration subsystem, and a navigation subsystem. Additionally, an intra-swarm collision avoidance system is integrated to prevent collisions between drones. We validate the efficacy of our approach through extensive exploration experiments involving up to seven drones in simulation and three in real-world settings, across various obstacle configurations and with a maximum navigation speed of 0.75 m/s. Our tests demonstrate that the algorithm efficiently completes exploration tasks, even with minimal sensing, across different swarm sizes and obstacle densities. Furthermore, our frontier allocation heuristic ensures an equal distribution of explored area and path length across the drones in the swarm. We publicly release the source code of the proposed system to foster further developments in mapping and exploration using autonomous nano-drones.
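
A balanced frontier allocation heuristic of the kind described could look like the following sketch, where each drone's accumulated travel penalizes its bids for new frontiers. The cost weighting and data layout are illustrative assumptions, not the ExploreBug implementation.

```python
import math

def allocate_frontiers(frontiers, drones):
    """Greedy balanced assignment of frontier points to drones.

    frontiers: list of (x, y) frontier centroids.
    drones:    list of dicts with keys "pos" (x, y) and "traveled" (meters so far).
    Returns a dict mapping drone index -> assigned frontier.

    The penalty on accumulated travel nudges the heuristic toward an equal
    distribution of explored area and path length across the swarm.
    """
    assignment = {}
    remaining = list(frontiers)
    for _ in range(min(len(remaining), len(drones))):
        best = None
        for i, d in enumerate(drones):
            if i in assignment:
                continue  # this drone already has a frontier this round
            for f in remaining:
                dist = math.dist(d["pos"], f)
                cost = dist + 0.5 * d["traveled"]  # 0.5: illustrative balance weight
                if best is None or cost < best[0]:
                    best = (cost, i, f)
        _, i, f = best
        assignment[i] = f
        remaining.remove(f)
    return assignment
```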


Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation

arXiv.org Artificial Intelligence

LiDAR semantic segmentation frameworks predominantly leverage geometry-based features to differentiate objects within a scan. While these methods excel in scenarios with clear boundaries and distinct shapes, their performance declines in environments where boundaries are blurred, particularly in off-road contexts. To address this, recent strides in 3D segmentation algorithms have focused on harnessing raw LiDAR intensity measurements to improve prediction accuracy. Despite these efforts, current learning-based models struggle to capture the intricate relationships between raw intensity and factors such as distance, incidence angle, material reflectivity, and atmospheric conditions. Building upon our prior work, this paper examines the advantages of employing calibrated intensity (also referred to as reflectivity) within learning-based LiDAR semantic segmentation frameworks. We first establish that incorporating reflectivity as an input enhances an existing LiDAR semantic segmentation model. Furthermore, we show that enabling the model to learn to calibrate intensity can boost its performance. Through extensive experimentation on the off-road dataset Rellis-3D, we demonstrate notable improvements: converting intensity to reflectivity yields a 4% increase in mean Intersection over Union (mIoU) compared to using raw intensity in off-road scenarios. Additionally, we investigate the possible benefits of using calibrated intensity for semantic segmentation in urban environments (SemanticKITTI) and for cross-sensor domain adaptation.
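
A common first-order way to convert raw intensity to reflectivity is to compensate the inverse-square range falloff and the cosine of the incidence angle. The sketch below follows that Lambertian assumption; it is not necessarily the exact calibration model used in the paper.

```python
import numpy as np

def intensity_to_reflectivity(intensity, ranges, normals, ray_dirs, eps=1e-6):
    """First-order intensity calibration (illustrative Lambertian model).

    Compensates the 1/r^2 range falloff and the cos(incidence angle) factor,
    so the result tracks material reflectivity rather than scan geometry.

    intensity: (N,) raw LiDAR return intensities.
    ranges:    (N,) distances to each return in meters.
    normals:   (N, 3) estimated unit surface normals.
    ray_dirs:  (N, 3) unit beam directions.
    """
    cos_incidence = np.abs(np.sum(normals * ray_dirs, axis=1))
    reflectivity = intensity * ranges**2 / np.maximum(cos_incidence, eps)
    # Normalize per scan so values are comparable across sensors.
    return reflectivity / (reflectivity.max() + eps)
```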


3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization

arXiv.org Artificial Intelligence

This paper presents a novel system for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initialize the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This design makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the image rendered from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset.
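
The NCC matching step could be as simple as the following sketch, which scores a query image against an image rendered from the splatting map; this is the generic NCC formulation, not the paper's exact implementation.

```python
import numpy as np

def normalized_cross_correlation(query: np.ndarray, rendered: np.ndarray) -> float:
    """NCC between a query image and an image rendered from the 3DGS map.

    Both inputs are grayscale arrays of the same shape; returns a score in
    [-1, 1], where values near 1 indicate a likely relocalization match.
    """
    q = query.astype(np.float64).ravel()
    r = rendered.astype(np.float64).ravel()
    q -= q.mean()
    r -= r.mean()
    denom = np.linalg.norm(q) * np.linalg.norm(r)
    return float(q @ r / denom) if denom > 0 else 0.0
```

A pose refinement stage would then match local features between the two images and pass the resulting 2D-3D correspondences to a PnP solver.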


GPT-4V Takes the Wheel: Promises and Challenges for Pedestrian Behavior Prediction

arXiv.org Artificial Intelligence

Predicting pedestrian behavior is key to ensuring the safety and reliability of autonomous vehicles. While deep learning methods show promise when learning from annotated video frame sequences, they often fail to fully grasp the dynamic interactions between pedestrians and traffic that are crucial for accurate predictions. These models also lack nuanced common-sense reasoning. Moreover, manually annotating datasets for these models is expensive, and the annotations are hard to adapt to new situations. The advent of Vision Language Models (VLMs) offers a promising alternative, thanks to their advanced visual and causal reasoning skills. To our knowledge, this research is the first to conduct both quantitative and qualitative evaluations of VLMs for pedestrian behavior prediction in autonomous driving. We evaluate GPT-4V(ision) on the publicly available pedestrian datasets JAAD and WiDEVIEW. Our quantitative analysis focuses on GPT-4V's ability to predict pedestrian behavior in current and future frames. The model achieves 57% accuracy in a zero-shot setting, which, while impressive, still trails state-of-the-art domain-specific models (70%) at predicting pedestrian crossing actions. Qualitatively, GPT-4V shows an impressive ability to process and interpret complex traffic scenarios, differentiate between various pedestrian behaviors, and detect and analyze groups. However, it faces challenges, such as difficulty detecting smaller pedestrians and assessing the relative motion between pedestrians and the ego vehicle.
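
A zero-shot query of this kind might be issued as in the sketch below, using the OpenAI Python SDK. The model name, prompt wording, and single-frame setup are assumptions for illustration, not the paper's exact evaluation protocol.

```python
import base64
from openai import OpenAI  # OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def predict_crossing(image_path: str) -> str:
    """Zero-shot pedestrian crossing query on a single traffic frame."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for GPT-4V(ision)
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("Will the pedestrian nearest the ego vehicle cross "
                          "the road within the next two seconds? Answer "
                          "'crossing' or 'not-crossing', then justify briefly.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        max_tokens=100,
    )
    return response.choices[0].message.content
```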


Online Multi-IMU Calibration Using Visual-Inertial Odometry

arXiv.org Artificial Intelligence

This work presents a centralized multi-IMU filter framework with online intrinsic and extrinsic calibration for unsynchronized inertial measurement units that is robust against changes in calibration parameters. The novel EKF-based method estimates the positional and rotational offsets of the system of sensors, as well as their intrinsic biases, without the use of rigid-body geometric constraints. Additionally, the filter is flexible in the total number of sensors used while leveraging the commonly used MSCKF framework for camera measurements. The framework has been validated both in Monte Carlo simulation and experimentally. In both settings, using multiple IMU measurement streams within the proposed filter outperforms using a single IMU in the filter prediction step, while also producing consistent and accurate estimates of the initial calibration errors. Compared to current state-of-the-art optimizers, the filter produces similar intrinsic and extrinsic calibration parameters for each sensor. Finally, an open-source repository containing both the online estimator and the simulation used for testing and evaluation is available at https://github.com/unmannedlab/ekf-cal
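
To make the state layout concrete, here is a deliberately simplified 1D toy filter with a per-IMU accelerometer bias state, propagating with whichever unsynchronized sample arrives. The real filter estimates full 6-DoF extrinsics and operates within an MSCKF, so treat this only as a sketch of the idea.

```python
import numpy as np

class MultiImuEkf:
    """Toy centralized filter: shared base state [p, v] in 1D plus one
    accelerometer bias b_i per IMU, all estimated online."""

    def __init__(self, num_imus: int):
        n = 2 + num_imus                       # [p, v, b_0, ..., b_{k-1}]
        self.x = np.zeros(n)
        self.P = np.eye(n)
        self.num_imus = num_imus

    def predict(self, imu_id: int, accel: float, dt: float, q: float = 1e-3):
        """Propagate with whichever IMU sample arrives (unsynchronized inputs)."""
        bias = self.x[2 + imu_id]
        a = accel - bias                       # bias-corrected acceleration
        # Transition: p += v*dt + 0.5*a*dt^2, v += a*dt, biases constant.
        self.x[0] += self.x[1] * dt + 0.5 * a * dt**2
        self.x[1] += a * dt
        F = np.eye(len(self.x))
        F[0, 1] = dt
        F[0, 2 + imu_id] = -0.5 * dt**2        # d(position)/d(bias)
        F[1, 2 + imu_id] = -dt                 # d(velocity)/d(bias)
        self.P = F @ self.P @ F.T + q * dt * np.eye(len(self.x))
```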


Off-Road LiDAR Intensity Based Semantic Segmentation

arXiv.org Artificial Intelligence

LiDAR provides 3D spatial information for autonomous driving and enables accurate perception in off-road environments, aiding obstacle detection, mapping, and path planning. Learning-based LiDAR semantic segmentation uses machine learning techniques to automatically classify objects and regions in LiDAR point clouds. However, learning-based models struggle in off-road environments, where diverse objects with varying colors, textures, and undefined boundaries make it difficult to classify and segment objects using traditional geometry-based features. In this paper, we address this problem by harnessing the LiDAR intensity parameter to enhance object segmentation in off-road environments. Our approach was evaluated on the RELLIS-3D dataset and, as a preliminary analysis, yielded promising results, with improved mIoU for the classes "puddle" and "grass" compared to more complex deep learning-based benchmarks. The methodology was evaluated for compatibility across both Velodyne and Ouster LiDAR systems, ensuring its cross-platform applicability. This analysis advocates incorporating calibrated intensity as a supplementary input to enhance the prediction accuracy of learning-based semantic segmentation frameworks. Code: https://github.com/MOONLABIISERB/lidar-intensity-predictor/tree/main
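
The reported gains are expressed in mean Intersection over Union; for reference, a minimal point-wise mIoU computation looks like the following (a generic sketch, not the paper's evaluation code).

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Per-class IoU and mIoU for point-wise semantic labels.

    pred, target: (N,) integer class labels for each LiDAR point.
    Classes absent from both arrays are skipped rather than counted as zero.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return ious, float(np.mean(ious))
```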


Vehicular Teamwork: Collaborative Localization of Autonomous Vehicles

arXiv.org Artificial Intelligence

This paper develops a distributed collaborative localization (DCL) algorithm based on an extended Kalman filter. The algorithm incorporates Ultra-Wideband (UWB) measurements for vehicle-to-vehicle ranging and shows improvements in localization accuracy where GPS typically falls short. The algorithm was first tested in a newly created open-source simulation environment that emulates various numbers of vehicles and sensors while simultaneously testing multiple localization algorithms. Predicted error distributions for the various algorithms can be produced quickly using Monte Carlo methods and optimization techniques within MATLAB. The simulation results were validated experimentally in an outdoor, urban environment. Improvements in localization accuracy over a typical extended Kalman filter ranged from 2.9% to 9.3% over 180-meter test runs. When GPS was denied, these improvements increased to up to 83.3% over a standard Kalman filter. In both simulation and experiment, the DCL algorithm was shown to be a good approximation of a full-state filter while reducing the required communication between vehicles. These results demonstrate the efficacy of adding UWB ranging sensors to cars for collaborative and landmark localization, especially in GPS-denied environments. In the future, additional moving vehicles with additional tags will be tested in other challenging GPS-denied environments.
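
The core UWB contribution can be pictured as an EKF update with a range measurement to a neighboring vehicle, as in this sketch. The noise value is illustrative, and the peer position is assumed known from its broadcast state rather than jointly estimated as in the full DCL filter.

```python
import numpy as np

def uwb_range_update(x, P, peer_pos, measured_range, sigma=0.1):
    """EKF update with a UWB vehicle-to-vehicle range measurement.

    x: state vector whose first two entries are the ego position (x, y).
    P: state covariance.
    peer_pos: (x, y) of the ranging partner, assumed known from its broadcast.
    sigma: UWB range noise standard deviation in meters (illustrative value).
    """
    dx = x[0] - peer_pos[0]
    dy = x[1] - peer_pos[1]
    predicted = max(np.hypot(dx, dy), 1e-9)   # guard against zero range
    H = np.zeros((1, len(x)))
    H[0, 0] = dx / predicted                  # Jacobian of range w.r.t. ego position
    H[0, 1] = dy / predicted
    S = H @ P @ H.T + sigma**2                # innovation covariance
    K = P @ H.T / S                           # Kalman gain (scalar innovation)
    x = x + (K * (measured_range - predicted)).ravel()
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```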


AutoCone: An OmniDirectional Robot for Lane-Level Cone Placement

arXiv.org Artificial Intelligence

This paper summarizes progress in developing a rugged, low-cost, automated ground cone robot network capable of traffic delineation at lane-level precision. A holonomic omnidirectional base with a traffic delineator was developed to allow flexibility in initialization. RTK GPS was utilized to reduce the minimum position error to 2 centimeters. Due to recent developments, the cost of the platform is now less than $1,600. To minimize the effects of GPS-denied environments, wheel encoders and an Extended Kalman Filter were implemented to maintain lane-level accuracy during operation, achieving a maximum error of 1.97 meters over a 50-meter run with little to no GPS signal. Future work includes increasing the operational speed of the platforms, incorporating lanelet information for path planning, and cross-platform estimation.
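
Between GPS fixes, wheel-encoder dead reckoning for a holonomic base reduces to integrating body-frame velocities through the heading, roughly as below. This is a minimal sketch under that assumption; the encoder-to-velocity conversion and the EKF fusion with RTK GPS are omitted.

```python
import math

def integrate_odometry(pose, vx, vy, omega, dt):
    """Dead-reckon a holonomic omnidirectional base between GPS fixes.

    pose:   (x, y, theta) in the world frame.
    vx, vy: body-frame velocities derived from wheel encoders (m/s).
    omega:  yaw rate (rad/s).
    """
    x, y, theta = pose
    # Rotate the body-frame velocity into the world frame, then integrate.
    x += (vx * math.cos(theta) - vy * math.sin(theta)) * dt
    y += (vx * math.sin(theta) + vy * math.cos(theta)) * dt
    theta += omega * dt
    return (x, y, theta)
```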