Goto

Collaborating Authors

 Scherer, Sebastian


2D-3D Pose Tracking with Multi-View Constraints

arXiv.org Artificial Intelligence

Camera localization in 3D LiDAR maps has gained increasing attention due to its promising ability to handle complex scenarios, surpassing the limitations of visual-only localization methods. However, existing methods mostly focus on addressing the cross-modal gaps, estimating camera poses frame by frame without considering the relationship between adjacent frames, which makes the pose tracking unstable. To alleviate this, we propose to couple the 2D-3D correspondences between adjacent frames using the 2D-2D feature matching, establishing the multi-view geometrical constraints for simultaneously estimating multiple camera poses. Specifically, we propose a new 2D-3D pose tracking framework, which consists: a front-end hybrid flow estimation network for consecutive frames and a back-end pose optimization module. We further design a cross-modal consistency-based loss to incorporate the multi-view constraints during the training and inference process. We evaluate our proposed framework on the KITTI and Argoverse datasets. Experimental results demonstrate its superior performance compared to existing frame-by-frame 2D-3D pose tracking methods and state-of-the-art vision-only pose tracking algorithms. More online pose tracking videos are available at \url{https://youtu.be/yfBRdg7gw5M}


Pegasus Simulator: An Isaac Sim Framework for Multiple Aerial Vehicles Simulation

arXiv.org Artificial Intelligence

Developing and testing novel control and motion planning algorithms for aerial vehicles can be a challenging task, with the robotics community relying more than ever on 3D simulation technologies to evaluate the performance of new algorithms in a variety of conditions and environments. In this work, we introduce the Pegasus Simulator, a modular framework implemented as an NVIDIA Isaac Sim extension that enables real-time simulation of multiple multirotor vehicles in photo-realistic environments, while providing out-of-the-box integration with the widely adopted PX4-Autopilot and ROS2 through its modular implementation and intuitive graphical user interface. To demonstrate some of its capabilities, a nonlinear controller was implemented and simulation results for two drones performing aggressive flight maneuvers are presented. Code and documentation for this framework are also provided as supplementary material.


Image-based Visual Servo Control for Aerial Manipulation Using a Fully-Actuated UAV

arXiv.org Artificial Intelligence

Using Unmanned Aerial Vehicles (UAVs) to perform high-altitude manipulation tasks beyond just passive visual application can reduce the time, cost, and risk of human workers. Prior research on aerial manipulation has relied on either ground truth state estimate or GPS/total station with some Simultaneous Localization and Mapping (SLAM) algorithms, which may not be practical for many applications close to infrastructure with degraded GPS signal or featureless environments. Visual servo can avoid the need to estimate robot pose. Existing works on visual servo for aerial manipulation either address solely end-effector position control or rely on precise velocity measurement and pre-defined visual visual marker with known pattern. Furthermore, most of previous work used under-actuated UAVs, resulting in complicated mechanical and hence control design for the end-effector. This paper develops an image-based visual servo control strategy for bridge maintenance using a fully-actuated UAV. The main components are (1) a visual line detection and tracking system, (2) a hybrid impedance force and motion control system. Our approach does not rely on either robot pose/velocity estimation from an external localization system or pre-defined visual markers. The complexity of the mechanical system and controller architecture is also minimized due to the fully-actuated nature. Experiments show that the system can effectively execute motion tracking and force holding using only the visual guidance for the bridge painting. To the best of our knowledge, this is one of the first studies on aerial manipulation using visual servo that is capable of achieving both motion and force control without the need of external pose/velocity information or pre-defined visual guidance.


Learning-on-the-Drive: Self-supervised Adaptation of Visual Offroad Traversability Models

arXiv.org Artificial Intelligence

Autonomous off-road driving requires understanding traversability, which refers to the suitability of a given terrain to drive over. When offroad vehicles travel at high speed ($>10m/s$), they need to reason at long-range ($50m$-$100m$) for safe and deliberate navigation. Moreover, vehicles often operate in new environments and under different weather conditions. LiDAR provides accurate estimates robust to visual appearances, however, it is often too noisy beyond 30m for fine-grained estimates due to sparse measurements. Conversely, visual-based models give dense predictions at further distances but perform poorly at all ranges when out of training distribution. To address these challenges, we present ALTER, an offroad perception module that adapts-on-the-drive to combine the best of both sensors. Our visual model continuously learns from new near-range LiDAR measurements. This self-supervised approach enables accurate long-range traversability prediction in novel environments without hand-labeling. Results on two distinct real-world offroad environments show up to 52.5% improvement in traversability estimation over LiDAR-only estimates and 38.1% improvement over non-adaptive visual baseline.


AutoMerge: A Framework for Map Assembling and Smoothing in City-scale Environments

arXiv.org Artificial Intelligence

Adaptive Loop Closure Detection: Spurious loop closures are frequent in environments with repetitive appearances, such as long streets. On the one hand, false positive place I. SYSTEM OVERVIEW retrievals may easily break the global optimization system, As shown in Figure 1, AutoMerge provides an automatic and ideally 100% accuracy can avoid these optimization map merging system for the large-scale single-/multi-agent failures for large-scale mapping. On the other hand, low recalls mapping tasks. Each agent is equipped with a LiDAR mapping can provide partial data association, which will affect global module to enable the self-maintained sub-map generation and optimization performance. Hybrid loop closure detection takes odometry estimation. The AutoMerge system consists of three advantage of sequence matching to provide continuous true modules: 1) fusion-enhanced place descriptor extraction, 2) an positive retrievals over long overlaps, and RANSAC-based adaptive data-association mechanism to provide high accuracy single frame detection for local overlaps. By analyzing the and recall for segment-wise place retrievals, and 3) a partially feature correlation between segments, we can balance the place decentralized system to provide centralized map merging and retrievals from sequence-/single-frame matching to provide single agent self-localization in the world frame.


Off-Policy Evaluation with Online Adaptation for Robot Exploration in Challenging Environments

arXiv.org Artificial Intelligence

Autonomous exploration has many important applications. However, classic information gain-based or frontier-based exploration only relies on the robot current state to determine the immediate exploration goal, which lacks the capability of predicting the value of future states and thus leads to inefficient exploration decisions. This paper presents a method to learn how "good" states are, measured by the state value function, to provide a guidance for robot exploration in real-world challenging environments. We formulate our work as an off-policy evaluation (OPE) problem for robot exploration (OPERE). It consists of offline Monte-Carlo training on real-world data and performs Temporal Difference (TD) online adaptation to optimize the trained value estimator. We also design an intrinsic reward function based on sensor information coverage to enable the robot to gain more information with sparse extrinsic rewards. Results show that our method enables the robot to predict the value of future states so as to better guide robot exploration. The proposed algorithm achieves better prediction and exploration performance compared with the state-of-the-arts. To the best of our knowledge, this work for the first time demonstrates value function prediction on real-world dataset for robot exploration in challenging subterranean and urban environments. More details and demo videos can be found at https://jeffreyyh.github.io/opere/.


A Simulator for Fully-Actuated UAVs

arXiv.org Artificial Intelligence

This workshop paper presents the challenges we encountered when simulating fully-actuated Unmanned Aerial Vehicles (UAVs) for our research and the solutions we developed to overcome the challenges. We describe the ARCAD simulator that has helped us rapidly implement and test different controllers ranging from Hybrid Force-Position Controllers to advanced Model Predictive Path Integrals and has allowed us to analyze the design and behavior of different fully-actuated UAVs. We used the simulator to enable real-world deployments of our fully-actuated UAV fleet for different applications. The simulator is further extended to support the physical interaction of UAVs with their environment and allow more UAV designs, such as hybrid VTOLs. The code for the simulator can be accessed from https://github.com/keipour/aircraft-simulator-matlab.


DytanVO: Joint Refinement of Visual Odometry and Motion Segmentation in Dynamic Environments

arXiv.org Artificial Intelligence

Learning-based visual odometry (VO) algorithms achieve remarkable performance on common static scenes, benefiting from high-capacity models and massive annotated data, but tend to fail in dynamic, populated environments. Semantic segmentation is largely used to discard dynamic associations before estimating camera motions but at the cost of discarding static features and is hard to scale up to unseen categories. In this paper, we leverage the mutual dependence between camera ego-motion and motion segmentation and show that both can be jointly refined in a single learning-based framework. In particular, we present DytanVO, the first supervised learning-based VO method that deals with dynamic environments. It takes two consecutive monocular frames in real-time and predicts camera ego-motion in an iterative fashion. Our method achieves an average improvement of 27.7% in ATE over state-of-the-art VO solutions in real-world dynamic environments, and even performs competitively among dynamic visual SLAM systems which optimize the trajectory on the backend. Experiments on plentiful unseen environments also demonstrate our method's generalizability.


UAS Simulator for Modeling, Analysis and Control in Free Flight and Physical Interaction

arXiv.org Artificial Intelligence

This paper presents the ARCAD simulator for the rapid development of Unmanned Aerial Systems (UAS), including underactuated and fully-actuated multirotors, fixed-wing aircraft, and Vertical Take-Off and Landing (VTOL) hybrid vehicles. The simulator is designed to accelerate these aircraft's modeling and control design. It provides various analyses of the design and operation, such as wrench-set computation, controller response, and flight optimization. In addition to simulating free flight, it can simulate the physical interaction of the aircraft with its environment. The simulator is written in MATLAB to allow rapid prototyping and is capable of generating graphical visualization of the aircraft and the environment in addition to generating the desired plots. It has been used to develop several real-world multirotor and VTOL applications. The source code is available at https://github.com/keipour/aircraft-simulator-matlab.


Detection and Physical Interaction with Deformable Linear Objects

arXiv.org Artificial Intelligence

Deformable linear objects (e.g., cables, ropes, and threads) commonly appear in our everyday lives. However, perception of these objects and the study of physical interaction with them is still a growing area. There have already been successful methods to model and track deformable linear objects. However, the number of methods that can automatically extract the initial conditions in non-trivial situations for these methods has been limited, and they have been introduced to the community only recently. On the other hand, while physical interaction with these objects has been done with ground manipulators, there have not been any studies on physical interaction and manipulation of the deformable linear object with aerial robots. This workshop describes our recent work on detecting deformable linear objects, which uses the segmentation output of the existing methods to provide the initialization required by the tracking methods automatically. It works with crossings and can fill the gaps and occlusions in the segmentation and output the model desirable for physical interaction and simulation. Then we present our work on using the method for tasks such as routing and manipulation with the ground and aerial robots. We discuss our feasibility analysis on extending the physical interaction with these objects to aerial manipulation applications.