gradslam
Investigation of the Challenges of Underwater-Visual-Monocular-SLAM
Grimaldi, Michele, Nakath, David, She, Mengkun, Köser, Kevin
In this paper, we present a comprehensive investigation of the challenges of Monocular Visual Simultaneous Localization and Mapping (vSLAM) methods for underwater robots. While significant progress has been made in state estimation methods that utilize visual data in the past decade, most evaluations have been limited to controlled indoor and urban environments, where impressive performance was demonstrated. However, these techniques have not been extensively tested in extremely challenging conditions, such as underwater scenarios where factors such as water and light conditions, robot path, and depth can greatly impact algorithm performance. Hence, our evaluation is conducted in real-world AUV scenarios as well as laboratory settings which provide precise external reference. A focus is laid on understanding the impact of environmental conditions, such as optical properties of the water and illumination scenarios, on the performance of monocular vSLAM methods. To this end, we first show that all methods perform very well in in-air settings and subsequently show the degradation of their performance in challenging underwater environments. The final goal of this study is to identify techniques that can improve accuracy and robustness of SLAM methods in such conditions. To achieve this goal, we investigate the potential of image enhancement techniques to improve the quality of input images used by the SLAM methods, specifically in low visibility and extreme lighting scenarios in scattering media. We present a first evaluation on calibration maneuvers and simple image restoration techniques to determine their ability to enable or enhance the performance of monocular SLAM methods in underwater environments.
Computer Vision and Visual SLAM vs. AI Agents
To take a look at what the end goal in terms of end-to-end deep learning for visual SLAM might look like, take a look at gradSLAM from Krishna Murthy, a Ph.D. student in MILA, and collaborators at CMU. Their paper offers a new way of thinking of SLAM as made up of differentiable blocks. From the article, "This amalgamation of dense SLAM with computational graphs enables us to backprop from 3D maps to 2D pixels, opening up new possibilities in gradient-based learning for SLAM." We are seeing more and more practical successes of self-supervised learning for multi-view problems where geometry enables us to get away from strong supervision. Even the ConvNet-based point detector SuperPoint [7], which my team and I developed at Magic Leap, uses self-supervision to train more robust interest point detectors.