Goto

Collaborating Authors

 Shen, Shaojie


Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving

arXiv.org Artificial Intelligence

In this paper, we introduce Semi-SD, a novel metric depth estimation framework tailored for surrounding cameras equipment in autonomous driving. In this work, the input data consists of adjacent surrounding frames and camera parameters. We propose a unified spatial-temporal-semantic fusion module to construct the visual fused features. Cross-attention components for surrounding cameras and adjacent frames are utilized to focus on metric scale information refinement and temporal feature matching. Building on this, we propose a pose estimation framework using surrounding cameras, their corresponding estimated depths, and extrinsic parameters, which effectively address the scale ambiguity in multi-camera setups. Moreover, semantic world model and monocular depth estimation world model are integrated to supervised the depth estimation, which improve the quality of depth estimation. We evaluate our algorithm on DDAD and nuScenes datasets, and the results demonstrate that our method achieves state-of-the-art performance in terms of surrounding camera based depth estimation quality. The source code will be available on https://github.com/xieyuser/Semi-SD.


SLABIM: A SLAM-BIM Coupled Dataset in HKUST Main Building

arXiv.org Artificial Intelligence

Existing indoor SLAM datasets primarily focus on robot sensing, often lacking building architectures. To address this gap, we design and construct the first dataset to couple the SLAM and BIM, named SLABIM. This dataset provides BIM and SLAM-oriented sensor data, both modeling a university building at HKUST. The as-designed BIM is decomposed and converted for ease of use. We employ a multi-sensor suite for multi-session data collection and mapping to obtain the as-built model. All the related data are timestamped and organized, enabling users to deploy and test effectively. Furthermore, we deploy advanced methods and report the experimental results on three tasks: registration, localization and semantic mapping, demonstrating the effectiveness and practicality of SLABIM. We make our dataset open-source at https://github.com/HKUST-Aerial-Robotics/SLABIM.


Contact-Aware Motion Planning Among Movable Objects

arXiv.org Artificial Intelligence

Most existing methods for motion planning of mobile robots involve generating collision-free trajectories. However, these methods focusing solely on contact avoidance may limit the robots' locomotion and can not be applied to tasks where contact is inevitable or intentional. To address these issues, we propose a novel contact-aware motion planning (CAMP) paradigm for robotic systems. Our approach incorporates contact between robots and movable objects as complementarity constraints in optimization-based trajectory planning. By leveraging augmented Lagrangian methods (ALMs), we efficiently solve the optimization problem with complementarity constraints, producing spatial-temporal optimal trajectories of the robots. Simulations demonstrate that, compared to the state-of-the-art method, our proposed CAMP method expands the reachable space of mobile robots, resulting in a significant improvement in the success rate of two types of fundamental tasks: navigation among movable objects (NAMO) and rearrangement of movable objects (RAMO). Real-world experiments show that the trajectories generated by our proposed method are feasible and quickly deployed in different tasks.


GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping

arXiv.org Artificial Intelligence

In recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel scene representation approach. However, existing vision-only 3D-GS methods often rely on hand-crafted heuristics for point-cloud densification and face challenges in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual (LIV) sensor configuration has demonstrated superior performance in localization and dense mapping by leveraging complementary sensing characteristics: rich texture information from cameras, precise geometric measurements from LiDAR, and high-frequency motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based simultaneous localization and mapping (SLAM) system. Our map system comprises a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based odometry. The global Gaussian map consists of hash-indexed voxels organized in a recursive octree, effectively covering sparse spatial volumes while adapting to different levels of detail and scales. The Gaussian map is initialized through multi-sensor fusion and optimized with photometric gradients. Our system incrementally maintains a sliding window of Gaussians, significantly reducing GPU computation and memory consumption by only optimizing the map within the sliding window. Moreover, we implement a tightly coupled multi-sensor fusion odometry with an iterative error state Kalman filter (IESKF), leveraging real-time updating and rendering of the Gaussian map. Our system represents the first real-time Gaussian-based SLAM framework deployable on resource-constrained embedded systems, demonstrated on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance while maintaining robust multi-sensor fusion capabilities. All implementation algorithms, hardware designs, and CAD models will be publicly available.


CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Motion Planning for Future Autonomous Mobility on Demand Systems

arXiv.org Artificial Intelligence

The increasing demand for flexible and efficient urban transportation solutions has spotlighted the limitations of traditional Demand Responsive Transport (DRT) systems, particularly in accommodating diverse passenger needs and dynamic urban environments. Autonomous Mobility-on-Demand (AMoD) systems have emerged as a promising alternative, leveraging connected and autonomous vehicles (CAVs) to provide responsive and adaptable services. However, existing methods primarily focus on either vehicle scheduling or path planning, which often simplify complex urban layouts and neglect the necessity for simultaneous coordination and mutual avoidance among CAVs. This oversimplification poses significant challenges to the deployment of AMoD systems in real-world scenarios. To address these gaps, we propose CoDriveVLM, a novel framework that integrates high-fidelity simultaneous dispatching and cooperative motion planning for future AMoD systems. Our method harnesses Vision-Language Models (VLMs) to enhance multi-modality information processing, and this enables comprehensive dispatching and collision risk evaluation. The VLM-enhanced CAV dispatching coordinator is introduced to effectively manage complex and unforeseen AMoD conditions, thus supporting efficient scheduling decision-making. Furthermore, we propose a scalable decentralized cooperative motion planning method via consensus alternating direction method of multipliers (ADMM) focusing on collision risk evaluation and decentralized trajectory optimization. Simulation results demonstrate the feasibility and robustness of CoDriveVLM in various traffic conditions, showcasing its potential to significantly improve the fidelity and effectiveness of AMoD systems in future urban transportation networks. The code is available at https://github.com/henryhcliu/CoDriveVLM.git.


ESVO2: Direct Visual-Inertial Odometry with Stereo Event Cameras

arXiv.org Artificial Intelligence

Event-based visual odometry is a specific branch of visual Simultaneous Localization and Mapping (SLAM) techniques, which aims at solving tracking and mapping sub-problems in parallel by exploiting the special working principles of neuromorphic (ie, event-based) cameras. Due to the motion-dependent nature of event data, explicit data association ie, feature matching under large-baseline view-point changes is hardly established, making direct methods a more rational choice. However, state-of-the-art direct methods are limited by the high computational complexity of the mapping sub-problem and the degeneracy of camera pose tracking in certain degrees of freedom (DoF) in rotation. In this paper, we resolve these issues by building an event-based stereo visual-inertial odometry system on top of our previous direct pipeline Event-based Stereo Visual Odometry. Specifically, to speed up the mapping operation, we propose an efficient strategy for sampling contour points according to the local dynamics of events. The mapping performance is also improved in terms of structure completeness and local smoothness by merging the temporal stereo and static stereo results. To circumvent the degeneracy of camera pose tracking in recovering the pitch and yaw components of general six-DoF motion, we introduce IMU measurements as motion priors via pre-integration. To this end, a compact back-end is proposed for continuously updating the IMU bias and predicting the linear velocity, enabling an accurate motion prediction for camera pose tracking. The resulting system scales well with modern high-resolution event cameras and leads to better global positioning accuracy in large-scale outdoor environments. Extensive evaluations on five publicly available datasets featuring different resolutions and scenarios justify the superior performance of the proposed system against five state-of-the-art methods.


Integrated Decision Making and Trajectory Planning for Autonomous Driving Under Multimodal Uncertainties: A Bayesian Game Approach

arXiv.org Artificial Intelligence

Modeling the interaction between traffic agents is a key issue in designing safe and non-conservative maneuvers in autonomous driving. This problem can be challenging when multi-modality and behavioral uncertainties are engaged. Existing methods either fail to plan interactively or consider unimodal behaviors that could lead to catastrophic results. In this paper, we introduce an integrated decision-making and trajectory planning framework based on Bayesian game (i.e., game of incomplete information). Human decisions inherently exhibit discrete characteristics and therefore are modeled as types of players in the game. A general solver based on no-regret learning is introduced to obtain a corresponding Bayesian Coarse Correlated Equilibrium, which captures the interaction between traffic agents in the multimodal context. With the attained equilibrium, decision-making and trajectory planning are performed simultaneously, and the resulting interactive strategy is shown to be optimal over the expectation of rivals' driving intentions. Closed-loop simulations on different traffic scenarios are performed to illustrate the generalizability and the effectiveness of the proposed framework.


WING: Wheel-Inertial Neural Odometry with Ground Manifold Constraints

arXiv.org Artificial Intelligence

In this paper, we propose an interoceptive-only odometry system for ground robots with neural network processing and soft constraints based on the assumption of a globally continuous ground manifold. Exteroceptive sensors such as cameras, GPS and LiDAR may encounter difficulties in scenarios with poor illumination, indoor environments, dusty areas and straight tunnels. Therefore, improving the pose estimation accuracy only using interoceptive sensors is important to enhance the reliability of navigation system even in degrading scenarios mentioned above. However, interoceptive sensors like IMU and wheel encoders suffer from large drift due to noisy measurements. To overcome these challenges, the proposed system trains deep neural networks to correct the measurements from IMU and wheel encoders, while considering their uncertainty. Moreover, because ground robots can only travel on the ground, we model the ground surface as a globally continuous manifold using a dual cubic B-spline manifold to further improve the estimation accuracy by this soft constraint. A novel space-based sliding-window filtering framework is proposed to fully exploit the $C^2$ continuity of ground manifold soft constraints and fuse all the information from raw measurements and neural networks in a yaw-independent attitude convention. Extensive experiments demonstrate that our proposed approach can outperform state-of-the-art learning-based interoceptive-only odometry methods.


UniQuad: A Unified and Versatile Quadrotor Platform Series for UAV Research and Application

arXiv.org Artificial Intelligence

As quadrotors take on an increasingly diverse range of roles, researchers often need to develop new hardware platforms tailored for specific tasks, introducing significant engineering overhead. In this article, we introduce the UniQuad series, a unified and versatile quadrotor platform series that offers high flexibility to adapt to a wide range of common tasks, excellent customizability for advanced demands, and easy maintenance in case of crashes. This project is fully open-source at https://hkust-aerial-robotics.github.io/UniQuad.


MetaFollower: Adaptable Personalized Autonomous Car Following

arXiv.org Artificial Intelligence

Car-following (CF) modeling, a fundamental component in microscopic traffic simulation, has attracted increasing interest of researchers in the past decades. In this study, we propose an adaptable personalized car-following framework -MetaFollower, by leveraging the power of meta-learning. Specifically, we first utilize Model-Agnostic Meta-Learning (MAML) to extract common driving knowledge from various CF events. Afterward, the pre-trained model can be fine-tuned on new drivers with only a few CF trajectories to achieve personalized CF adaptation. We additionally combine Long Short-Term Memory (LSTM) and Intelligent Driver Model (IDM) to reflect temporal heterogeneity with high interpretability. Unlike conventional adaptive cruise control (ACC) systems that rely on predefined settings and constant parameters without considering heterogeneous driving characteristics, MetaFollower can accurately capture and simulate the intricate dynamics of car-following behavior while considering the unique driving styles of individual drivers. We demonstrate the versatility and adaptability of MetaFollower by showcasing its ability to adapt to new drivers with limited training data quickly. To evaluate the performance of MetaFollower, we conduct rigorous experiments comparing it with both data-driven and physics-based models. The results reveal that our proposed framework outperforms baseline models in predicting car-following behavior with higher accuracy and safety. To the best of our knowledge, this is the first car-following model aiming to achieve fast adaptation by considering both driver and temporal heterogeneity based on meta-learning.