kitti 2012
- Oceania > Australia (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > California > Merced County > Merced (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Hierarchical Neural Architecture Search for Deep Stereo Matching - Supplementary Materials
KITTI 2012 contains 194 training image pairs and 195 test image pairs. We use a maximum disparity level of 192 in this dataset. Most of the stereo pairs are indoor scenes with handcrafted layouts. This dataset contains many thin objects and large disparity ranges. We provide more qualitative results on the SceneFlow, KITTI 2012, KITTI 2015 and Middlebury datasets in Figure 1 2 3 4, respectively.
- Oceania > Australia (0.05)
- North America > Canada (0.05)
Matching neural paths: transfer from recognition to correspondence search
Nikolay Savinov, Lubor Ladicky, Marc Pollefeys
Many machine learning tasks require finding per-part correspondences between objects. In this work we focus on low-level correspondences -- a highly ambiguous matching problem. We propose to use a hierarchical semantic representation of the objects, coming from a convolutional neural network, to solve this ambiguity. Training it for low-level correspondence prediction directly might not be an option in some domains where the ground-truth correspondences are hard to obtain. We show how transfer from recognition can be used to avoid such training. Our idea is to mark parts as "matching" if their features are close to each other at all the levels of convolutional feature hierarchy (neural paths). Although the overall number of such paths is exponential in the number of layers, we propose a polynomial algorithm for aggregating all of them in a single backward pass. The empirical validation is done on the task of stereo correspondence and demonstrates that we achieve competitive results among the methods which do not use labeled target domain data.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > California > Merced County > Merced (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
Stereo Risk: A Continuous Modeling Approach to Stereo Matching
Liu, Ce, Kumar, Suryansh, Gu, Shuhang, Timofte, Radu, Yao, Yao, Van Gool, Luc
We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization often fails to capture the nuanced, continuous nature of scene depth. Stereo Risk departs from the conventional discretization approach by formulating the scene disparity as an optimal solution to a continuous risk minimization problem, hence the name "stereo risk". We demonstrate that $L^1$ minimization of the proposed continuous risk function enhances stereo-matching performance for deep networks, particularly for disparities with multi-modal probability distributions. Furthermore, to enable the end-to-end network training of the non-differentiable $L^1$ risk optimization, we exploited the implicit function theorem, ensuring a fully differentiable network. A comprehensive analysis demonstrates our method's theoretical soundness and superior performance over the state-of-the-art methods across various benchmark datasets, including KITTI 2012, KITTI 2015, ETH3D, SceneFlow, and Middlebury 2014.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Texas (0.04)
- (6 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Stereoscopic Universal Perturbations across Different Architectures and Datasets
Berger, Zachary, Agrawal, Parth, Liu, Tian Yu, Soatto, Stefano, Wong, Alex
We study the effect of adversarial perturbations of images on deep stereo matching networks for the disparity estimation task. We present a method to craft a single set of perturbations that, when added to any stereo image pair in a dataset, can fool a stereo network to significantly alter the perceived scene geometry. Our perturbation images are "universal" in that they not only corrupt estimates of the network on the dataset they are optimized for, but also generalize to stereo networks with different architectures across different datasets. We evaluate our approach on multiple public benchmark datasets and show that our perturbations can increase D1-error (akin to fooling rate) of state-of-the-art stereo networks from 1% to as much as 87%. We investigate the effect of perturbations on the estimated scene geometry and identify object classes that are most vulnerable. Our analysis on the activations of registered points between left and right images led us to find that certain architectural components, i.e. deformable convolution and explicit matching, can increase robustness against adversaries. We demonstrate that by simply designing networks with such components, one can reduce the effect of adversaries by up to 60.5%, which rivals the robustness of networks fine-tuned with costly adversarial data augmentation.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)
Learning by Distillation: A Self-Supervised Learning Framework for Optical Flow Estimation
Liu, Pengpeng, Lyu, Michael R., King, Irwin, Xu, Jia
We present DistillFlow, a knowledge distillation approach to learning optical flow. DistillFlow trains multiple teacher models and a student model, where challenging transformations are applied to the input of the student model to generate hallucinated occlusions as well as less confident predictions. Then, a self-supervised learning framework is constructed: confident predictions from teacher models are served as annotations to guide the student model to learn optical flow for those less confident predictions. The self-supervised learning framework enables us to effectively learn optical flow from unlabeled data, not only for non-occluded pixels, but also for occluded pixels. DistillFlow achieves state-of-the-art unsupervised learning performance on both KITTI and Sintel datasets. Our self-supervised pre-trained model also provides an excellent initialization for supervised fine-tuning, suggesting an alternate training paradigm in contrast to current supervised learning methods that highly rely on pre-training on synthetic data. At the time of writing, our fine-tuned models ranked 1st among all monocular methods on the KITTI 2015 benchmark, and outperform all published methods on the Sintel Final benchmark. More importantly, we demonstrate the generalization capability of DistillFlow in three aspects: framework generalization, correspondence generalization and cross-dataset generalization.