InSpaceType: Reconsider Space Type in Indoor Monocular Depth Estimation
Wu, Cho-Ying, Gao, Quankai, Hsu, Chin-Cheng, Wu, Te-Lin, Chen, Jing-Wen, Neumann, Ulrich
Indoor monocular depth estimation has attracted increasing research interest. Most previous works have focused on methodology, primarily experimenting with the NYU-Depth-V2 (NYUv2) dataset, and concentrated only on overall performance over the test set. However, little is known about robustness and generalization when monocular depth estimation methods are applied to real-world scenarios where highly varying and diverse functional \textit{space types} are present, such as libraries or kitchens. A performance breakdown by space type is essential for understanding a pretrained model's performance variance. To facilitate our investigation of robustness and address limitations of previous works, we collect InSpaceType, a high-quality and high-resolution RGBD dataset for general indoor environments. We benchmark 12 recent methods on InSpaceType and find that they suffer severely from performance imbalance across space types, which reveals their underlying bias. We extend our analysis to 4 other datasets, 3 mitigation approaches, and generalization to unseen space types. Our work marks the first in-depth investigation of performance imbalance across space types for indoor monocular depth estimation, drawing attention to potential safety concerns when deploying models without considering space types, and shedding light on potential ways to improve robustness. See \url{https://depthcomputation.github.io/DepthPublic} for data and the supplementary document. The benchmark list on the GitHub project page is kept updated with the latest monocular depth estimation methods.
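The per-space-type breakdown described above can be illustrated with a minimal sketch. The metrics (AbsRel, RMSE, and the delta < 1.25 accuracy) are the standard ones used in monocular depth evaluation; the function and variable names here are illustrative assumptions, not the paper's actual evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics, computed over valid (gt > 0) pixels."""
    mask = gt > 0
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)                  # mean absolute relative error
    rmse = np.sqrt(np.mean((p - g) ** 2))                 # root mean squared error
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)     # threshold accuracy
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta1": delta1}

def breakdown_by_space_type(preds, gts, space_types):
    """Aggregate per-image metrics separately for each space-type label,
    exposing the performance imbalance a single test-set average hides."""
    per_type = {}
    for pred, gt, stype in zip(preds, gts, space_types):
        per_type.setdefault(stype, []).append(depth_metrics(pred, gt))
    return {
        stype: {k: float(np.mean([m[k] for m in ms])) for k in ms[0]}
        for stype, ms in per_type.items()
    }
```

Comparing the resulting per-type dictionaries (e.g., "library" vs. "kitchen") directly surfaces the variance that an overall test-set score averages away.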
Scene Completeness-Aware Lidar Depth Completion for Driving Scenario
Wu, Cho-Ying, Neumann, Ulrich
This paper introduces Scene Completeness-Aware Depth Completion (SCADC) to complete raw lidar scans into dense depth maps with fine and complete scene structures. Recent sparse depth completion methods for lidars focus only on the lower parts of scenes and produce irregular estimates in the upper parts, because existing datasets, such as KITTI, do not provide ground truth for upper areas. These areas are considered less important since they usually contain sky or trees, which are of less interest for scene understanding. However, we argue that in several driving scenarios, such as large trucks or cars with loads, objects can extend into the upper parts of scenes. Thus, depth maps with structured upper-scene estimation are important for RGBD algorithms. SCADC adopts stereo images, which produce disparities with better scene completeness but are generally less precise than lidars, to aid sparse lidar depth completion. To our knowledge, we are the first to focus on the scene completeness of sparse depth completion. We validate SCADC on both depth estimation precision and scene completeness on KITTI. Moreover, we experiment on less-explored outdoor RGBD semantic segmentation with scene-completeness-aware D-input to validate our method.
Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting
Tang, Bohan, Zhong, Yiqi, Xu, Chenxin, Wu, Wei-Tao, Neumann, Ulrich, Wang, Yanfeng, Zhang, Ya, Chen, Siheng
In multi-modal multi-agent trajectory forecasting, two major challenges have not been fully tackled: 1) how to measure the uncertainty introduced by the interaction module, which causes correlations among the predicted trajectories of multiple agents; and 2) how to rank the multiple predictions and select the optimal predicted trajectory. To handle these challenges, this work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules. We then build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator that performs both regression and uncertainty estimation. Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plugin module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task and 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty. We conduct extensive experiments on a synthetic dataset and two public large-scale multi-agent trajectory forecasting benchmarks. Experiments show that: 1) on the synthetic dataset, the CU-aware regression framework allows the model to appropriately approximate the ground-truth Laplace distribution; 2) on the multi-agent trajectory forecasting benchmarks, the CU-aware regression framework steadily helps SOTA systems improve their performance. Specifically, the proposed framework helps VectorNet improve the Final Displacement Error of the chosen optimal prediction by 262 cm on the nuScenes dataset; 3) for multi-agent multi-modal trajectory forecasting systems, prediction uncertainty is positively correlated with future stochasticity; and 4) the estimated CU values are highly related to the interactive information among agents.
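The two ingredients named above, fitting a Laplace distribution and ranking modes by estimated uncertainty, can be sketched minimally. This shows only the per-point (independent) Laplace negative log-likelihood and a naive lowest-mean-scale selection rule; the actual CU framework additionally models correlations across agents, which this sketch omits. All names here are illustrative assumptions.

```python
import numpy as np

def laplace_nll(pred, scale, target):
    """Negative log-likelihood of targets under a Laplace distribution with
    predicted location `pred` and predicted scale `scale` (b > 0):
    -log p(x) = log(2b) + |x - mu| / b, averaged over all points."""
    return float(np.mean(np.log(2.0 * scale) + np.abs(target - pred) / scale))

def select_by_uncertainty(modes, scales):
    """Among K predicted trajectory modes, return the one whose estimated
    uncertainty (mean predicted scale) is lowest -- a crude stand-in for
    uncertainty-based ranking of multi-modal predictions."""
    mean_unc = [float(np.mean(s)) for s in scales]
    best = int(np.argmin(mean_unc))
    return modes[best], mean_unc[best]
```

Training a regressor to minimize `laplace_nll` makes the predicted scale a calibrated uncertainty estimate, which is what enables ranking the modes afterward.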
Behind the Curtain: Learning Occluded Shapes for 3D Object Detection
Xu, Qiangeng, Zhong, Yiqi, Neumann, Ulrich
Advances in LiDAR sensors provide rich 3D data that supports 3D scene understanding. However, due to occlusion and signal miss, LiDAR point clouds are in practice 2.5D, as they cover only partial underlying shapes, which poses a fundamental challenge to 3D perception. To tackle this challenge, we present a novel LiDAR-based 3D object detection model, dubbed Behind the Curtain Detector (BtcDet), which learns object shape priors and estimates the complete shapes of objects that are partially occluded (curtained) in point clouds. BtcDet first identifies the regions that are affected by occlusion and signal miss. In these regions, our model predicts the probability of occupancy, which indicates whether a region contains object shapes. Integrated with this probability map, BtcDet can generate high-quality 3D proposals. Finally, the probability of occupancy is also integrated into a proposal refinement module to generate the final bounding boxes. Extensive experiments on the KITTI Dataset and the Waymo Open Dataset demonstrate the effectiveness of BtcDet. Particularly, for the 3D detection of both cars and cyclists on the KITTI benchmark, BtcDet surpasses all published state-of-the-art methods by remarkable margins. Code is released at https://github.com/Xharlie/BtcDet.
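The first step described above, identifying regions affected by occlusion, amounts to ray casting from the sensor: any cell behind the first return along a viewing ray is potentially occluded. The sketch below is a crude 2D (bird's-eye, polar-grid) stand-in for that idea, assuming the sensor sits at the origin; BtcDet's actual formulation operates on 3D voxels and also handles signal miss, which this omits. All names and grid parameters are illustrative assumptions.

```python
import numpy as np

def occluded_cells(points, n_bins=36, n_rings=10, max_range=50.0):
    """Mark polar-grid cells (angle bin, range ring) that lie behind the
    nearest LiDAR return along each viewing ray from a sensor at the origin."""
    occluded = np.zeros((n_bins, n_rings), dtype=bool)
    ang = np.arctan2(points[:, 1], points[:, 0])          # azimuth of each point
    rng = np.linalg.norm(points[:, :2], axis=1)           # planar range
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    rings = np.minimum((rng / max_range * n_rings).astype(int), n_rings - 1)
    for b in range(n_bins):
        hits = rings[bins == b]
        if hits.size:
            # everything beyond the closest return in this direction is occluded
            occluded[b, hits.min() + 1:] = True
    return occluded
```

In the full method, a network then predicts occupancy probabilities inside such identified regions, and those probabilities feed both proposal generation and refinement.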
Stochastic Video Long-term Interpolation
Xu, Qiangeng, Zhang, Hanwang, Wang, Weiyue, Belhumeur, Peter N., Neumann, Ulrich
In this paper, we introduce a stochastic learning framework for long-term video interpolation. While most existing interpolation models require two reference frames separated by a short interval, our framework predicts a plausible intermediate sequence over a long interval. Our model consists of two parts: (1) a deterministic estimation that guarantees the spatial and temporal coherency among frames, and (2) a stochastic sampling process that generates dynamics from inferred distributions. Experimental results show that our model is able to generate sharp and clear sequences with variations. Moreover, motions in the generated sequences are realistic and transition smoothly from the reference start frame to the end frame.
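The two-part design, a deterministic coherent estimate plus stochastic dynamics sampled from an inferred distribution, can be sketched as follows. This toy uses linear interpolation as the "deterministic estimation" and the VAE-style reparameterization trick (z = mu + sigma * eps) for sampling, merely to illustrate how the two parts compose; the paper's model is a learned network, and every name below is an illustrative assumption.

```python
import numpy as np

def sample_residual(mu, log_var, rng=None):
    """Sample a stochastic residual from the inferred Gaussian N(mu, exp(log_var))
    via the reparameterization trick: z = mu + sigma * eps."""
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def interpolate(start, end, n_frames, mu, log_var, seed=0):
    """Toy long-interval interpolation: a deterministic linear path between the
    two reference frames, perturbed at each step by sampled dynamics."""
    frames = []
    for t in range(1, n_frames + 1):
        alpha = t / (n_frames + 1)
        det = (1 - alpha) * start + alpha * end       # deterministic, coherent part
        frames.append(det + sample_residual(mu, log_var, seed + t))
    return frames
```

Sampling the residual repeatedly yields multiple plausible intermediate sequences around the same coherent estimate, which is the source of the "variations" the abstract refers to.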
A Unified Framework for Augmented Reality and Knowledge-Based Systems in Maintaining Aircraft
Jo, Geun-Sik (Inha University) | Oh, Kyeong-Jin (Inha University) | Ha, Inay (Inha University) | Lee, Kee-Sung (Inha University) | Hong, Myung-Duk (Inha University) | Neumann, Ulrich (University of Southern California) | You, Suya (University of Southern California)
Aircraft maintenance and training play one of the most important roles in ensuring flight safety. The maintenance process usually involves massive numbers of components and substantial procedural knowledge. Maintenance tasks require technicians to follow rigorous procedures to prevent operational errors, and maintenance time is a cost-sensitive issue for airlines. This paper proposes an intelligent augmented reality (IAR) system to minimize operational errors and time-related costs and to help aircraft technicians cope with complex maintenance tasks through an intuitive UI/UX interface. The IAR system is composed of three major modules: 1) the AR module; 2) the knowledge-based system (KBS) module; and 3) a unified platform with an integrated UI/UX module between the AR and KBS modules. The AR module addresses vision-based tracking, annotation, and recognition. The KBS module deals with ontology-based resources and context management. Overall testing of the IAR system is conducted at Korean Air Lines (KAL) hangars. Tasks involving the removal and installation of pitch trimmers in landing gear are selected for benchmarking, and the results show that the proposed IAR system can help technicians perform their maintenance tasks more effectively and accurately.