AITopics | Tombari, Federico

Tombari, Federico

3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection

Lehner, Alexander, Gasperini, Stefano, Marcos-Ramiro, Alvaro, Schmidt, Michael, Mahani, Mohammad-Ali Nikouei, Navab, Nassir, Busam, Benjamin, Tombari, Federico

arXiv.org Artificial Intelligence, May-3-2022

As 3D object detection on point clouds relies on the geometrical relationships between the points, non-standard object shapes can hinder a method's detection capability. However, in safety-critical settings, robustness to out-of-domain and long-tail samples is fundamental to circumvent dangerous issues, such as the misdetection of damaged or rare cars. In this work, we substantially improve the generalization of 3D object detectors to out-of-domain data by deforming point clouds during training. We achieve this with 3D-VField: a novel data augmentation method that plausibly deforms objects via vector fields learned in an adversarial fashion. Our approach constrains 3D points to slide along their sensor view rays while neither adding nor removing any of them. The obtained vectors are transferable, sample-independent and preserve shape and occlusions. Despite training only on a standard dataset, such as KITTI, augmenting with our vector fields significantly improves the generalization to differently shaped objects and scenes. Towards this end, we propose and share CrashD: a synthetic dataset of realistic damaged and rare cars, with a variety of crash scenarios. Extensive experiments on KITTI, Waymo, our CrashD and SUN RGB-D show the generalizability of our techniques to out-of-domain data, different models and sensors, namely LiDAR and ToF cameras, for both indoor and outdoor scenes. Our CrashD dataset is available at https://crashd-cars.github.io.
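The constraint that points slide along their sensor view rays, with none added or removed, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name slide_along_rays, the sensor origin at (0, 0, 0), the hand-picked offsets, and the choice to enforce the constraint by projecting each offset onto the ray are all assumptions made for illustration. In the paper the vector field is learned adversarially.

```python
import math

def slide_along_rays(points, offsets, sensor_origin=(0.0, 0.0, 0.0)):
    """Move each 3D point by the component of its offset that lies on its view ray.

    Hypothetical helper: the ray runs from sensor_origin through the point, so the
    moved point stays on the same ray and the number of points is unchanged.
    """
    moved = []
    for point, offset in zip(points, offsets):
        ray = [pc - oc for pc, oc in zip(point, sensor_origin)]
        norm = math.sqrt(sum(c * c for c in ray))
        if norm == 0.0:
            # A point at the sensor origin has no defined ray; leave it in place.
            moved.append(tuple(point))
            continue
        unit = [c / norm for c in ray]
        # Scalar step along the ray: projection of the offset onto the ray direction.
        step = sum(vc * uc for vc, uc in zip(offset, unit))
        moved.append(tuple(pc + step * uc for pc, uc in zip(point, unit)))
    return moved

# Two points and two illustrative (not learned) deformation vectors.
points = [(10.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
offsets = [(0.5, 0.3, 0.0), (0.2, -1.0, 0.0)]
moved = slide_along_rays(points, offsets)
```

Only the along-ray part of each offset is applied, so the first point moves farther from the sensor and the second moves closer, while both keep their original viewing direction.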

artificial intelligence, deformation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CVPR52688.2022.01678

2112.04764

Country:

North America > United States (0.28)
Europe > Germany (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.93)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)


Time-to-Label: Temporal Consistency for Self-Supervised Monocular 3D Object Detection

Mouawad, Issa, Brasch, Nikolas, Manhardt, Fabian, Tombari, Federico, Odone, Francesca

arXiv.org Artificial Intelligence, Mar-4-2022

Monocular 3D object detection continues to attract attention due to the cost benefits and wider availability of RGB cameras. Despite recent advances and the ability to acquire data at scale, annotation cost and complexity still limit the size of 3D object detection datasets in supervised settings. Self-supervised methods, on the other hand, aim at training deep networks by relying on pretext tasks or various consistency constraints. Moreover, other 3D perception tasks (such as depth estimation) have shown the benefits of temporal priors as a self-supervision signal. In this work, we argue that temporal consistency on the level of object poses provides an important supervision signal, given the strong prior on physical motion. Specifically, we propose a self-supervised loss which uses this consistency, in addition to render-and-compare losses, to refine noisy pose predictions and derive high-quality pseudo-labels. To assess the effectiveness of the proposed method, we finetune a synthetically trained monocular 3D object detection model using the pseudo-labels that we generated on real data. Evaluation on the standard KITTI3D benchmark demonstrates that our method reaches competitive performance compared to other monocular self-supervised and supervised methods.

artificial intelligence, detection, estimation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2022.3188882

2203.02193

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Vision > Image Understanding (0.35)


R4Dyn: Exploring Radar for Self-Supervised Monocular Depth Estimation of Dynamic Scenes

Gasperini, Stefano, Koch, Patrick, Dallabetta, Vinzenz, Navab, Nassir, Busam, Benjamin, Tombari, Federico

arXiv.org Artificial Intelligence, Nov-29-2021

While self-supervised monocular depth estimation in driving scenarios has achieved performance comparable to supervised approaches, violations of the static world assumption can still lead to erroneous depth predictions of traffic participants, posing a potential safety issue. In this paper, we present R4Dyn, a novel set of techniques to use cost-efficient radar data on top of a self-supervised depth estimation framework. In particular, we show how radar can be used during training as a weak supervision signal, as well as an extra input to enhance the estimation robustness at inference time. Since automotive radars are readily available, this allows training data to be collected from a variety of existing vehicles. Moreover, by filtering and expanding the signal to make it compatible with learning-based approaches, we address issues inherent to radar, such as noise and sparsity. With R4Dyn we are able to overcome a major limitation of self-supervised depth estimation, i.e., the prediction of traffic participants. We substantially improve the estimation on dynamic objects, such as cars, by 37% on the challenging nuScenes dataset, hence demonstrating that radar is a valuable additional sensor for monocular depth estimation in autonomous vehicles.

artificial intelligence, machine learning, radar, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/3DV53792.2021.00084

2108.04814

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Panoster: End-to-end Panoptic Segmentation of LiDAR Point Clouds

Gasperini, Stefano, Mahani, Mohammad-Ali Nikouei, Marcos-Ramiro, Alvaro, Navab, Nassir, Tombari, Federico

arXiv.org Artificial Intelligence, Feb-12-2021

Panoptic segmentation has recently unified semantic and instance segmentation, previously addressed separately, thus taking a step further towards creating more comprehensive and efficient perception systems. In this paper, we present Panoster, a novel proposal-free panoptic segmentation method for LiDAR point clouds. Unlike previous approaches relying on several steps to group pixels or points into objects, Panoster proposes a simplified framework incorporating a learning-based clustering solution to identify instances. At inference time, this acts as a class-agnostic segmentation, allowing Panoster to be fast, while outperforming prior methods in terms of accuracy. Without any post-processing, Panoster reached state-of-the-art results among published approaches on the challenging SemanticKITTI benchmark, and further increased its lead by exploiting heuristic techniques. Additionally, we showcase how our method can be flexibly and effectively applied on diverse existing semantic architectures to deliver panoptic predictions.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2021.3060405

2010.15157

Country:

Europe > Germany (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)


Quantifying Aleatoric and Epistemic Uncertainty Using Density Estimation in Latent Space

Postels, Janis, Blum, Hermann, Cadena, Cesar, Siegwart, Roland, Van Gool, Luc, Tombari, Federico

arXiv.org Machine Learning, Dec-5-2020

The distribution of a neural network's latent representations has been successfully used to detect Out-of-Distribution (OOD) data. Since OOD detection is a popular benchmark for epistemic uncertainty estimates, this raises the question of a deeper correlation. This work investigates whether the distribution of latent representations indeed contains information about the uncertainty associated with the predictions of a neural network. Prior work identifies epistemic uncertainty with the surprise, i.e., the negative log-likelihood, of observing a particular latent representation, which we verify empirically. Moreover, we demonstrate that the output-conditional distribution of hidden representations allows quantifying aleatoric uncertainty via the entropy of the predictive distribution. We analyze epistemic and aleatoric uncertainty inferred from the representations of different layers and conclude with the exciting finding that the hidden representations of a deterministic neural network indeed contain information about its uncertainty. We verify our findings on both classification and regression models.
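The two quantities named in this abstract can be made concrete with a small sketch. It assumes a one-dimensional latent space and a single Gaussian as the density model, and it takes the predictive distribution as given rather than deriving it from output-conditional densities; the paper's actual density estimator is not specified here. All function names are hypothetical.

```python
import math

def fit_gaussian(latents):
    """Fit a 1D Gaussian (mean, variance) to latent values seen during training."""
    n = len(latents)
    mean = sum(latents) / n
    var = sum((z - mean) ** 2 for z in latents) / n
    return mean, max(var, 1e-12)  # floor avoids division by zero for constant data

def negative_log_likelihood(z, mean, var):
    """Surprise of observing latent z under the fitted Gaussian (epistemic proxy)."""
    return 0.5 * (math.log(2 * math.pi * var) + (z - mean) ** 2 / var)

def entropy(probs):
    """Shannon entropy in nats of a predictive distribution (aleatoric proxy)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Illustrative values only, not taken from the paper.
mean, var = fit_gaussian([-1.0, 0.0, 1.0])
nll_typical = negative_log_likelihood(0.1, mean, var)
nll_unusual = negative_log_likelihood(4.0, mean, var)
entropy_confident = entropy([0.99, 0.01])
entropy_ambiguous = entropy([0.5, 0.5])
```

A latent far from the training data yields a larger negative log-likelihood, and a flatter predictive distribution yields a larger entropy, which matches the roles the abstract assigns to the two kinds of uncertainty.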

deep learning, epistemic uncertainty, neural network, (14 more...)

arXiv.org Machine Learning

2012.03082

Country:

Europe (0.28)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


Graphite: GRAPH-Induced feaTure Extraction for Point Cloud Registration

Saleh, Mahdi, Dehghani, Shervin, Busam, Benjamin, Navab, Nassir, Tombari, Federico

arXiv.org Artificial Intelligence, Oct-18-2020

3D point clouds are a rich source of information that enjoys growing popularity in the vision community. However, due to the sparsity of their representation, learning models based on large point clouds is still a challenge. In this work, we introduce Graphite, a GRAPH-Induced feaTure Extraction pipeline, a simple yet powerful feature transform and keypoint detector. Graphite enables intensive down-sampling of point clouds with keypoint detection accompanied by a descriptor. We construct a generic graph-based learning scheme to describe point cloud regions and extract salient points. To this end, we take advantage of 6D pose information and metric learning to learn robust descriptions and keypoints across different scans. We reformulate the 3D keypoint pipeline with graph neural networks, which allow efficient processing of the point set while boosting its descriptive power, ultimately resulting in more accurate 3D registrations. We demonstrate our lightweight descriptor on common 3D descriptor matching and point cloud registration benchmarks and achieve results comparable with the state of the art. Describing 100 patches of a point cloud and detecting their keypoints takes only ~0.018 seconds with our proposed network.

artificial intelligence, descriptor, neural network, (16 more...)

arXiv.org Artificial Intelligence

2010.09079

Country: North America > Canada > British Columbia (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Data Science > Data Mining > Feature Extraction (0.70)


Sampling-free Epistemic Uncertainty Estimation Using Approximated Variance Propagation

Postels, Janis, Ferroni, Francesco, Coskun, Huseyin, Navab, Nassir, Tombari, Federico

arXiv.org Machine Learning, Aug-21-2019

We present a sampling-free approach for computing the epistemic uncertainty of a neural network. Epistemic uncertainty is an important quantity for the deployment of deep neural networks in safety-critical applications, since it represents how much one can trust predictions on new data. Recently, promising approaches were proposed that use noise injection combined with Monte-Carlo sampling at inference time to estimate this quantity (e.g., Monte-Carlo dropout). Our main contribution is an approximation of the epistemic uncertainty estimated by these methods that does not require sampling, thus notably reducing the computational overhead. We apply our approach to large-scale visual tasks (i.e., semantic segmentation and depth regression) to demonstrate the advantages of our method compared to sampling-based approaches in terms of quality of the uncertainty estimates as well as of computational overhead.
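The idea of replacing Monte-Carlo sampling with analytic variance propagation can be sketched for the simplest case: one dropout layer followed by one linear layer. The sketch assumes inverted dropout (activations scaled by 1/keep_prob), independent dropout masks, and tracks only per-unit variances rather than a full covariance matrix; it is an illustration of the general principle, not the paper's algorithm, and the function names are hypothetical.

```python
def dropout_variance(x, keep_prob):
    """Variance injected by inverted dropout into each activation.

    With y_i = x_i * z_i / keep_prob and z_i ~ Bernoulli(keep_prob),
    E[y_i] = x_i and Var[y_i] = x_i^2 * (1 - keep_prob) / keep_prob.
    """
    return [xi * xi * (1.0 - keep_prob) / keep_prob for xi in x]

def linear_propagate(weights, mean_in, var_in):
    """Push mean and per-unit variance through y = W x, assuming independent inputs.

    Output variance of unit j is sum_i W[j][i]^2 * var_in[i]; correlations between
    output units are ignored in this simplified version.
    """
    mean_out = [sum(w * m for w, m in zip(row, mean_in)) for row in weights]
    var_out = [sum(w * w * v for w, v in zip(row, var_in)) for row in weights]
    return mean_out, var_out

# Illustrative activations and weights, chosen by hand.
x = [1.0, 2.0]
weights = [[1.0, 1.0], [1.0, -1.0]]
var_after_dropout = dropout_variance(x, keep_prob=0.5)
mean_out, var_out = linear_propagate(weights, x, var_after_dropout)
```

A single deterministic pass yields both the output mean and an estimate of its variance, whereas Monte-Carlo dropout would need many stochastic forward passes to estimate the same variance empirically.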

approximated variance propagation, uncertainty estimation

arXiv.org Machine Learning

1908.00598

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)


BOP: Benchmark for 6D Object Pose Estimation

Hodan, Tomas, Michel, Frank, Brachmann, Eric, Kehl, Wadim, Buch, Anders Glent, Kraft, Dirk, Drost, Bertram, Vidal, Joel, Ihrke, Stephan, Zabulis, Xenophon, Sahin, Caner, Manhardt, Fabian, Tombari, Federico, Kim, Tae-Kyun, Matas, Jiri, Rother, Carsten

arXiv.org Artificial Intelligence, Aug-24-2018

We propose a benchmark for 6D pose estimation of a rigid object from a single RGB-D input image. The training data consists of a texture-mapped 3D object model or images of the object in known 6D poses. The benchmark comprises: i) eight datasets in a unified format that cover different practical scenarios, including two new datasets focusing on varying lighting conditions, ii) an evaluation methodology with a pose-error function that deals with pose ambiguities, iii) a comprehensive evaluation of 15 diverse recent methods that captures the status quo of the field, and iv) an online evaluation system that is open for continuous submission of new results. The evaluation shows that methods based on point-pair features currently perform best, outperforming template matching methods, learning-based methods and methods based on 3D local features. The project website is available at bop.felk.cvut.cz.

dataset, neural network, video understanding, (21 more...)

arXiv.org Artificial Intelligence

1808.08319

Country:

Europe > Denmark (0.14)
Europe > Czechia (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.63)
