Scherer, Sebastian
Deep Bayesian Future Fusion for Self-Supervised, High-Resolution, Off-Road Mapping
Aich, Shubhra, Wang, Wenshan, Maheshwari, Parv, Sivaprakasam, Matthew, Triest, Samuel, Ho, Cherie, Gregory, Jason M., Rogers, John G. III, Scherer, Sebastian
The limited sensing resolution of resource-constrained off-road vehicles poses significant challenges to reliable off-road autonomy. To overcome this limitation, we propose a general framework based on fusing future information (i.e., future fusion) for self-supervision. Recent approaches exploit this future information alongside hand-crafted heuristics to directly supervise the targeted downstream tasks (e.g., traversability estimation). However, in this paper, we opt for a more general line of development: time-efficient completion of the highest-resolution (i.e., 2 cm per pixel) BEV map in a self-supervised manner via future fusion, which can be used for any downstream task and enables better longer-range prediction. To this end, we first create a high-resolution future-fusion dataset containing pairs of raw, sparse, and noisy (RGB / height) inputs and map-based dense labels. Next, to accommodate the noise and sparsity of the sensory information, especially in distal regions, we design an efficient realization of the Bayes filter on top of a vanilla convolutional network via a recurrent mechanism. Equipped with ideas from state-of-the-art generative models, our Bayesian structure effectively predicts high-quality BEV maps in distal regions. Extensive evaluation of both completion quality and a downstream task on our future-fusion dataset demonstrates the potential of our approach.
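For reference, the recursive Bayes filter that the recurrent design realizes can be written in its generic textbook form (this is not the paper's exact formulation; reading x_t as the per-cell BEV map state and z_t as the sparse RGB/height observation at time t is an interpretation of the abstract):

\[
\mathrm{bel}(x_t) = \eta \, p(z_t \mid x_t) \int p(x_t \mid x_{t-1}) \, \mathrm{bel}(x_{t-1}) \, dx_{t-1}
\]

Under this reading, the recurrent convolutional network approximates both the prediction and the measurement-update step, carrying a running belief over the BEV grid across frames.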
Learning Generalizable Feature Fields for Mobile Manipulation
Qiu, Ri-Zhao, Hu, Yafei, Yang, Ge, Song, Yuchen, Fu, Yang, Ye, Jianglong, Mu, Jiteng, Yang, Ruihan, Atanasov, Nikolay, Scherer, Sebastian, Wang, Xiaolong
An open problem in mobile manipulation is how to represent objects and scenes in a unified manner so that robots can use the representation both for navigating in the environment and for manipulating objects. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent to an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We evaluate GeFF's ability to generalize to open-set objects, as well as its running time, when performing open-vocabulary mobile manipulation in dynamic scenes.
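As a rough illustration of what CLIP feature distillation looks like in code, the sketch below regresses features predicted by a toy field toward per-pixel "teacher" features. The network, point sampling, and feature dimension are placeholders and do not reflect GeFF's actual architecture; in practice the teacher features would come from a CLIP-based image encoder back-projected along camera rays.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFeatureField(nn.Module):
    # Hypothetical toy field mapping 3D points to CLIP-dimensional features;
    # illustrative only, not GeFF's scene-level generalizable network.
    def __init__(self, feat_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, xyz):          # xyz: (N, 3) points sampled along camera rays
        return self.mlp(xyz)         # (N, feat_dim) predicted features

field = ToyFeatureField()
points = torch.rand(1024, 3)         # placeholder 3D samples
teacher = torch.randn(1024, 512)     # placeholder per-pixel CLIP features

student = field(points)
# Distillation loss: pull field features toward the CLIP teacher (cosine form).
loss = 1.0 - F.cosine_similarity(student, teacher, dim=-1).mean()
loss.backward()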
TartanAviation: Image, Speech, and ADS-B Trajectory Datasets for Terminal Airspace Operations
Patrikar, Jay, Dantas, Joao, Moon, Brady, Hamidi, Milad, Ghosh, Sourish, Keetha, Nikhil, Higgins, Ian, Chandak, Atharva, Yoneyama, Takashi, Scherer, Sebastian
We introduce TartanAviation, an open-source multi-modal dataset focused on terminal-area airspace operations. TartanAviation provides a holistic view of the airport environment by concurrently collecting image, speech, and ADS-B trajectory data using setups installed inside airport boundaries. The datasets were collected at both towered and non-towered airfields across multiple months to capture diversity in aircraft operations, seasons, aircraft types, and weather conditions. In total, TartanAviation provides 3.1M images, 3374 hours of Air Traffic Control speech data, and 661 days of ADS-B trajectory data. The data was filtered, processed, and validated to create a curated dataset. In addition to the dataset, we also open-source the code-base used to collect and pre-process the dataset, further enhancing accessibility and usability. We believe this dataset has many potential use cases and would be particularly vital in allowing AI and machine learning technologies to be integrated into air traffic control systems and in advancing the adoption of autonomous aircraft in the airspace.
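A typical first step with ADS-B trajectory data of this kind is restricting tracks to the terminal area around the airfield. The sketch below does this with a plain haversine distance; the record layout, reference point, and radius are assumptions for illustration, not the dataset's actual schema.

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical trajectory records: (timestamp, lat, lon, altitude_ft).
airfield = (40.0, -79.9)   # placeholder reference point
track = [(0, 40.01, -79.91, 1200), (1, 40.40, -79.50, 8000)]
terminal = [p for p in track if haversine_km(p[1], p[2], *airfield) <= 10.0]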
A Unified MPC Strategy for a Tilt-rotor VTOL UAV Towards Seamless Mode Transitioning
Chen, Qizhao, Hu, Ziqi, Geng, Junyi, Bai, Dongwei, Mousaei, Mohammad, Scherer, Sebastian
Capabilities of long-range flight and vertical take-off and landing (VTOL) are essential for Urban Air Mobility (UAM). Tiltrotor VTOLs have the advantage of balancing control simplicity and system complexity due to their redundant control authority. Prior work on controlling these aircraft either requires separate controllers and mode switching for different vehicle configurations or performs the control allocation on separate actuator sets, which cannot fully exploit the redundancy of the tiltrotor. This paper introduces a unified MPC-based control strategy for a customized tiltrotor VTOL Unmanned Aerial Vehicle (UAV), which does not require mode switching and performs the control allocation in a consistent way. The incorporation of four independently controllable rotors in the VTOL design offers an extra level of redundancy, allowing the VTOL to accommodate actuator failures. The results show that our approach outperforms PID controllers while maintaining unified control. It allows the VTOL to perform smooth acceleration and deceleration and precise coordinated turns. In addition, the independently controlled tilts enable the vehicle to handle actuator failures, ensuring that the aircraft remains operational even in the event of a servo or motor malfunction.
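For context, a generic finite-horizon MPC problem of the kind referenced here can be written as follows; the cost weights, dynamics f, and constraint set are placeholders, not the paper's specific formulation. The key point of a unified strategy is that u_k stacks all actuators (rotor thrusts and tilt angles), so allocation happens inside one optimization rather than in per-mode controllers:

\[
\min_{u_0,\dots,u_{N-1}} \sum_{k=0}^{N-1} \left( \|x_k - x_k^{\mathrm{ref}}\|_Q^2 + \|u_k\|_R^2 \right) + \|x_N - x_N^{\mathrm{ref}}\|_P^2
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \quad u_k \in \mathcal{U}
\]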
TartanDrive 2.0: More Modalities and Better Infrastructure to Further Self-Supervised Learning Research in Off-Road Driving Tasks
Sivaprakasam, Matthew, Maheshwari, Parv, Castro, Mateo Guaman, Triest, Samuel, Nye, Micah, Willits, Steve, Saba, Andrew, Wang, Wenshan, Scherer, Sebastian
We present TartanDrive 2.0, a large-scale off-road driving dataset for self-supervised learning tasks. In 2021 we released TartanDrive 1.0, which is one of the largest datasets for off-road terrain. As a follow-up to our original dataset, we collected seven hours of data at speeds of up to 15 m/s with the addition of three new LiDAR sensors alongside the original camera, inertial, GPS, and proprioceptive sensors. We also release the tools we use for collecting, processing, and querying the data, including our metadata system designed to further the utility of our data. Custom infrastructure allows end users to reconfigure the data to cater to their own platforms. These tools and infrastructure, alongside the dataset, are useful for a variety of tasks in the field of off-road autonomy and, by releasing them, we encourage collaborative data aggregation. These resources lower the barrier to entry to utilizing large-scale datasets, thereby helping facilitate the advancement of robotics in areas such as self-supervised learning, multi-modal perception, inverse reinforcement learning, and representation learning. The dataset is available at https://github.com/castacks/tartan_drive_2.0.
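Since the raw logs are distributed as ROS bags, a minimal reading loop looks like the sketch below. The bag filename and topic names are assumptions for illustration, not the dataset's actual layout; consult the released tooling for the real topic list.

import rosbag  # ROS1 Python API; requires a ROS environment

# Iterate messages from an example bag (path and topics are hypothetical).
bag = rosbag.Bag("tartandrive2_example.bag")
for topic, msg, t in bag.read_messages(topics=["/imu/data", "/gps/fix"]):
    print(topic, t.to_sec())   # t is the recorded ROS timestamp
bag.close()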
Aerial Field Robotics
Kulkarni, Mihir, Moon, Brady, Alexis, Kostas, Scherer, Sebastian
Aerial field robotics research represents the domain of study that aims to equip unmanned aerial vehicles--and, as it pertains to this chapter, specifically Micro Aerial Vehicles (MAVs)--with the ability to operate in real-life environments that present challenges to safe navigation. We present the key elements of autonomy for MAVs that are resilient to collisions and sensing degradation while operating under constrained computational resources. We review aspects of the state of the art, outline bottlenecks to resilient navigation autonomy, and assess the field-readiness of MAVs. We conclude with notable contributions and discuss considerations for future research that are essential for resilience in aerial robotics. The state of the art in aerial robotics can accomplish impressive tasks, yet wider use and adoption of MAVs for effective field deployment remains limited by the resilience of the components of autonomy. In this chapter, we view each element of the autonomy system through the lens of resilience and examine the latest developments as well as open questions. Towards a principled understanding of progress in resilient and field-hardened aerial robotic autonomy, we define resilience motivated by analogous studies in the domain of risk analysis (Howell 2013).
SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
Zhao, Shibo, Gao, Yuanjun, Wu, Tianhao, Singh, Damanpreet, Jiang, Rushan, Sun, Haoxiang, Sarawata, Mansi, Qiu, Yuheng, Whittaker, Warren, Higgins, Ian, Du, Yi, Su, Shaoshu, Xu, Can, Keller, John, Karhade, Jay, Nogueira, Lucas, Saha, Sourojit, Zhang, Ji, Wang, Wenshan, Wang, Chen, Scherer, Sebastian
Simultaneous localization and mapping (SLAM) is a fundamental task for numerous applications such as autonomous navigation and exploration. Although many SLAM datasets have been released, current SLAM solutions still struggle to achieve sustained and resilient performance. One major issue is the absence of high-quality datasets that include diverse all-weather conditions, together with a reliable metric for assessing robustness. This limitation significantly restricts the scalability and generalizability of SLAM technologies, impacting their development, validation, and deployment. To address this problem, we present SubT-MRS, an extremely challenging real-world dataset designed to push SLAM towards all-weather environments in pursuit of the most robust SLAM performance. It contains multi-degraded environments spanning over 30 diverse scenes, such as structureless corridors, varying lighting conditions, and perceptual obscurants like smoke and dust; multimodal sensors such as LiDAR, fisheye camera, IMU, and thermal camera; and multiple locomotion modes, including aerial, legged, and wheeled robots. We develop accuracy and robustness evaluation tracks for SLAM and introduce novel robustness metrics. Comprehensive studies are performed, revealing new observations, challenges, and opportunities for future research.
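For orientation, the accuracy track of a SLAM benchmark typically reports trajectory error of the following standard kind; the sketch is illustrative only and is not the SubT-MRS evaluation code, which additionally defines robustness metrics.

import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    # Root-mean-square Absolute Trajectory Error after centroid alignment.
    # A full evaluation would also align rotation (e.g. the Umeyama method).
    # Inputs: time-synchronized (N, 3) arrays of estimated and ground-truth positions.
    est = est_xyz - est_xyz.mean(axis=0)
    gt = gt_xyz - gt_xyz.mean(axis=0)
    err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))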
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Hu, Yafei, Xie, Quanting, Jain, Vidhi, Francis, Jonathan, Patrikar, Jay, Keetha, Nikhil, Kim, Seungchan, Xie, Yaqi, Zhang, Tianyi, Zhao, Shibo, Chong, Yu Quan, Wang, Chen, Sycara, Katia, Johnson-Roberson, Matthew, Batra, Dhruv, Wang, Xiaolong, Scherer, Sebastian, Kira, Zsolt, Xia, Fei, Bisk, Yonatan
Building general-purpose robots that can operate seamlessly, in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. Unfortunately, most existing robotic systems remain constrained: they are designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively labeled data, rely on task-specific models, suffer numerous generalization issues when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and (ii) what a robotics-specific foundation model would look like. We begin by providing an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work exploring ways to leverage existing foundation models for robotics and develop ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models for enabling general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.
WIT-UAS: A Wildland-fire Infrared Thermal Dataset to Detect Crew Assets From Aerial Views
Jong, Andrew, Yu, Mukai, Dhrafani, Devansh, Kailas, Siva, Moon, Brady, Sycara, Katia, Scherer, Sebastian
We present the Wildland-fire Infrared Thermal (WIT-UAS) dataset for long-wave infrared sensing of crew and vehicle assets amidst prescribed wildland fire environments. While such a dataset is crucial for safety monitoring in wildland fire applications, to the authors' knowledge no such dataset focusing on assets near fire is publicly available, presumably due to the barrier to entry of collaborating with fire management personnel. We present two related data subsets: WIT-UAS-ROS consists of full ROS bag files containing sensor and robot data from UAS flights over the fire, and WIT-UAS-Image contains hand-labeled long-wave infrared (LWIR) images extracted from WIT-UAS-ROS. Our dataset is the first to focus on asset detection in a wildland fire environment. We show that thermal detection models trained without fire data frequently produce false positives by classifying fire as people. By adding our dataset to training, we show that the false positive rate is reduced significantly. Yet asset detection in wildland fire environments remains significantly more challenging than detection in urban environments, due to dense obscuring trees, greater heat variation, and the overbearing thermal signal of the fire. We release this dataset to encourage the community to study more advanced models for tackling this challenging environment. The dataset, code, and pretrained models are available at https://github.com/castacks/WIT-UAS-Dataset.
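The false-positive comparison above can be reproduced with a simple IoU-based matching rule of the following kind; this is a generic sketch under assumed box formats, not the paper's evaluation script.

def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def false_positives(detections, ground_truth, thr=0.5):
    # A detection counts as a false positive if it overlaps no ground-truth
    # asset box above the IoU threshold (e.g. a "person" fired on a flame front).
    return sum(1 for d in detections if all(iou(d, g) < thr for g in ground_truth))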
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Keetha, Nikhil, Karhade, Jay, Jatavallabhula, Krishna Murthy, Yang, Gengshan, Scherer, Sebastian, Ramanan, Deva, Luiten, Jonathon
Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations by enabling fast rendering and optimization, the ability to determine whether areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline tailored to the underlying Gaussian representation, with silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2x state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches while allowing real-time rendering of a high-resolution dense 3D map.
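In spirit, silhouette-guided tracking of this kind optimizes the current camera pose by minimizing a silhouette-masked photometric and depth residual rendered differentiably from the Gaussian map; a schematic form (not the paper's exact loss or weighting) is:

\[
\min_{T_t} \sum_{p} S(p) \left( \left| \hat{C}(p; T_t) - C_t(p) \right| + \lambda \left| \hat{D}(p; T_t) - D_t(p) \right| \right)
\]

Here T_t is the camera pose at time t, \hat{C} and \hat{D} are the color and depth rendered from the 3D Gaussians, C_t and D_t are the observed RGB-D frame, S is the rendered silhouette that masks out not-yet-mapped regions, and \lambda is a placeholder weight.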