
Collaborating Authors

Marc Pollefeys


7f2cba89a7116c7c6b0a769572d5fad9-Paper.pdf

Neural Information Processing Systems

In the context of localization, however, there is no natural definition of classes. Therefore, images are artificially separated into positive/negative classes with respect to the chosen anchor images, based on some geometric proximity measure.
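As a rough illustration of this separation, the sketch below partitions a set of database images into positives and negatives for a single anchor, using Euclidean distance between camera centres as the geometric proximity measure. The thresholds and variable names are assumptions made for illustration, not taken from the paper.

```python
# Minimal sketch (assumed setup): split database images into positive/negative
# sets for one anchor, based on distance between camera centres.
import numpy as np

def split_by_proximity(anchor_idx, camera_positions, pos_radius=10.0, neg_radius=25.0):
    """Return indices of positive and negative images for a given anchor.

    camera_positions: (N, 3) array of camera centres in a common world frame.
    Images closer than pos_radius are treated as positives, those farther than
    neg_radius as negatives; the band in between is left unlabeled.
    """
    dists = np.linalg.norm(camera_positions - camera_positions[anchor_idx], axis=1)
    dists[anchor_idx] = np.inf  # the anchor itself is neither positive nor negative
    positives = np.where(dists < pos_radius)[0]
    negatives = np.where(dists > neg_radius)[0]
    return positives, negatives
```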



HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild

Bieri, Valentin, Rakotosaona, Marie-Julie, Tateno, Keisuke, Engelmann, Francis, Guibas, Leonidas

arXiv.org Artificial Intelligence

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single-room or single-floor environments. As a consequence, they cannot natively handle large multi-floor buildings and require scenes to be split into individual floors before processing, which removes global spatial context that is essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real-world benchmark designed to support progress toward full building-scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training-free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction. Data and code are available at: https://houselayout3d.github.io.



Spot-On: A Mixed Reality Interface for Multi-Robot Cooperation

Engelbracht, Tim, Lukovic, Petar, Behrens, Tjark, Lascheit, Kai, Zurbrügg, René, Pollefeys, Marc, Blum, Hermann, Bauer, Zuria

arXiv.org Artificial Intelligence

Recent progress in mixed reality (MR) and robotics is enabling increasingly sophisticated forms of human-robot collaboration. Building on these developments, we introduce a novel MR framework that allows multiple quadruped robots to operate in semantically diverse environments via an MR interface. Our system supports collaborative tasks involving drawers, swing doors, and higher-level infrastructure such as light switches. A comprehensive user study verifies both the design and usability of our app, with participants giving a "good" or "very good" rating in almost all cases. Overall, our approach provides an effective and intuitive framework for MR-based multi-robot collaboration in complex, real-world scenarios.


MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion

Pataki, Zador, Sarlin, Paul-Edouard, Schönberger, Johannes L., Pollefeys, Marc

arXiv.org Artificial Intelligence

While Structure-from-Motion (SfM) has seen much progress over the years, state-of-the-art systems are prone to failure when facing extreme viewpoint changes in low-overlap, low-parallax or high-symmetry scenarios. Because capturing images that avoid these pitfalls is challenging, this severely limits the wider use of SfM, especially by non-expert users. We overcome these limitations by augmenting the classical SfM paradigm with monocular depth and normal priors inferred by deep neural networks. Thanks to a tight integration of monocular and multi-view constraints, our approach significantly outperforms existing ones under extreme viewpoint changes, while maintaining strong performance in standard conditions. We also show that monocular priors can help reject faulty associations due to symmetries, which is a long-standing problem for SfM. This makes our approach the first capable of reliably reconstructing challenging indoor environments from few images. Through principled uncertainty propagation, it is robust to errors in the priors, can handle priors inferred by different models with little tuning, and will thus easily benefit from future progress in monocular depth and normal estimation. Our code is publicly available at https://github.com/cvg/mpsfm.
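The abstract does not spell out how the monocular and multi-view constraints are combined, but a minimal way to merge a monocular depth prior with a triangulated depth under a Gaussian uncertainty model is inverse-variance fusion. The sketch below is an assumption-laden simplification for illustration, not the paper's actual formulation; the consistency check only hints at how a prior could flag symmetry-induced mismatches.

```python
# Minimal sketch (assumed Gaussian model): fuse a monocular depth prior with a
# triangulated multi-view depth estimate by inverse-variance weighting.
def fuse_depths(d_mono, var_mono, d_mvs, var_mvs):
    """Fuse two depth estimates with known variances (1D Gaussian fusion)."""
    w_mono = 1.0 / var_mono
    w_mvs = 1.0 / var_mvs
    d_fused = (w_mono * d_mono + w_mvs * d_mvs) / (w_mono + w_mvs)
    var_fused = 1.0 / (w_mono + w_mvs)
    return d_fused, var_fused

def is_consistent(d_mono, var_mono, d_mvs, var_mvs, k=3.0):
    """Flag a triangulated depth that disagrees with the prior by more than k sigmas."""
    return abs(d_mono - d_mvs) <= k * (var_mono + var_mvs) ** 0.5
```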


SCENES: Subpixel Correspondence Estimation With Epipolar Supervision

Kloepfer, Dominik A., Henriques, João F., Campbell, Dylan

arXiv.org Artificial Intelligence

Extracting point correspondences from two or more views of a scene is a fundamental computer vision problem with particular importance for relative camera pose estimation and structure-from-motion. Existing local feature matching approaches, trained with correspondence supervision on large-scale datasets, obtain highly-accurate matches on the test sets. However, they do not generalise well to new datasets with different characteristics to those they were trained on, unlike classic feature extractors. Instead, they require finetuning, which assumes that ground-truth correspondences or ground-truth camera poses and 3D structure are available. We relax this assumption by removing the requirement of 3D structure, e.g., depth maps or point clouds, and only require camera pose information, which can be obtained from odometry. We do so by replacing correspondence losses with epipolar losses, which encourage putative matches to lie on the associated epipolar line. While weaker than correspondence supervision, we observe that this cue is sufficient for finetuning existing models on new data. We then further relax the assumption of known camera poses by using pose estimates in a novel bootstrapping approach. We evaluate on highly challenging datasets, including an indoor drone dataset and an outdoor smartphone camera dataset, and obtain state-of-the-art results without strong supervision.
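As an illustration of the kind of supervision described above, the sketch below computes a symmetric point-to-epipolar-line distance from a fundamental matrix and averages it over putative matches; the paper's exact loss and weighting may differ.

```python
# Minimal sketch: symmetric epipolar distance as a training loss.
# Convention assumed here: x2^T F x1 = 0, i.e. F maps points in image 1 to
# epipolar lines in image 2.
import torch

def epipolar_loss(x1, x2, F):
    """x1, x2: (N, 2) putative matches in pixels; F: (3, 3) fundamental matrix."""
    ones = torch.ones(x1.shape[0], 1, dtype=x1.dtype, device=x1.device)
    x1_h = torch.cat([x1, ones], dim=1)   # homogeneous points in image 1
    x2_h = torch.cat([x2, ones], dim=1)   # homogeneous points in image 2
    l2 = x1_h @ F.T                       # epipolar lines in image 2 (F x1)
    l1 = x2_h @ F                         # epipolar lines in image 1 (F^T x2)
    num = (x2_h * l2).sum(dim=1).abs()    # |x2^T F x1| per match
    d2 = num / l2[:, :2].norm(dim=1).clamp(min=1e-8)  # distance of x2 to its line
    d1 = num / l1[:, :2].norm(dim=1).clamp(min=1e-8)  # distance of x1 to its line
    return (d1 + d2).mean()
```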


Multi-sensor large-scale dataset for multi-view 3D reconstruction

Voynov, Oleg, Bobrovskikh, Gleb, Karpyshev, Pavel, Galochkin, Saveliy, Ardelean, Andrei-Timotei, Bozhenko, Arseniy, Karmanova, Ekaterina, Kopanev, Pavel, Labutin-Rymsho, Yaroslav, Rakhimov, Ruslan, Safin, Aleksandr, Serpiva, Valerii, Artemov, Alexey, Burnaev, Evgeny, Tsetserukou, Dzmitry, Zorin, Denis

arXiv.org Artificial Intelligence

We present a new multi-sensor dataset for multi-view 3D surface reconstruction. It includes registered RGB and depth data from sensors of different resolutions and modalities: smartphones, Intel RealSense, Microsoft Kinect, industrial cameras, and a structured-light scanner. The scenes are selected to emphasize a diverse set of material properties that are challenging for existing algorithms. We provide around 1.4 million images of 107 different scenes acquired from 100 viewing directions under 14 lighting conditions. We expect our dataset will be useful for evaluation and training of 3D reconstruction algorithms and for related tasks. The dataset is available at skoltech3d.appliedai.tech.


PhD Position in Online 3D Scene Representation Learning - UvA, Netherlands 2022

#artificialintelligence

Are you excited about creating a digital twin of the 3D world around you? Do you recognize yourself in the job profile? Then we look forward to receiving your application by 15 February 2022. You can apply online by using the link below. Please mention the months (not just years) in your CV when referring to your education and work experience.