Goto

Collaborating Authors

 Johnander, Joakim


CATPlan: Loss-based Collision Prediction in End-to-End Autonomous Driving

arXiv.org Artificial Intelligence

In recent years, there has been increased interest in the design, training, and evaluation of end-to-end autonomous driving (AD) systems. One often overlooked aspect is the uncertainty of planned trajectories predicted by these systems, despite awareness of their own uncertainty being key to achieve safety and robustness. We propose to estimate this uncertainty by adapting loss prediction from the uncertainty quantification literature. To this end, we introduce a novel light-weight module, dubbed CATPlan, that is trained to decode motion and planning embeddings into estimates of the collision loss used to partially supervise end-to-end AD systems. During inference, these estimates are interpreted as collision risk. We evaluate CATPlan on the safety-critical, nerf-based, closed-loop benchmark NeuroNCAP and find that it manages to detect collisions with a $54.8\%$ relative improvement to average precision over a GMM-based baseline in which the predicted trajectory is compared to the forecasted trajectories of other road users. Our findings indicate that the addition of CATPlan can lead to safer end-to-end AD systems and hope that our work will spark increased interest in uncertainty quantification for such systems.


Hinge-Wasserstein: Mitigating Overconfidence in Regression by Classification

arXiv.org Machine Learning

Computer vision systems that are deployed in safety-critical applications need to quantify their output uncertainty. We study regression from images to parameter values and here it is common to detect uncertainty by predicting probability distributions. In this context, we investigate the regression-by-classification paradigm which can represent multimodal distributions, without a prior assumption on the number of modes. Through experiments on a specifically designed synthetic dataset, we demonstrate that traditional loss functions lead to poor probability distribution estimates and severe overconfidence, in the absence of full ground truth distributions. In order to alleviate these issues, we propose hinge-Wasserstein -- a simple improvement of the Wasserstein loss that reduces the penalty for weak secondary modes during training. This enables prediction of complex distributions with multiple modes, and allows training on datasets where full ground truth distributions are not available. In extensive experiments, we show that the proposed loss leads to substantially better uncertainty estimation on two challenging computer vision tasks: horizon line detection and stereo disparity estimation.


Towards trustworthy multi-modal motion prediction: Holistic evaluation and interpretability of outputs

arXiv.org Artificial Intelligence

Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. In this work, we aim to advance towards the design of trustworthy motion prediction systems, based on some of the requirements for the design of Trustworthy Artificial Intelligence. We focus on evaluation criteria, robustness, and interpretability of outputs. First, we comprehensively analyse the evaluation metrics, identify the main gaps of current benchmarks, and propose a new holistic evaluation framework. We then introduce a method for the assessment of spatial and temporal robustness by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, we propose an intent prediction layer that can be attached to multi-modal motion prediction models. The effectiveness of this approach is assessed through a survey that explores different elements in the visualization of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.


Raw or Cooked? Object Detection on RAW Images

arXiv.org Artificial Intelligence

Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.


Towards Explainable Motion Prediction using Heterogeneous Graph Representations

arXiv.org Artificial Intelligence

Motion prediction systems aim to capture the future behavior of traffic scenarios enabling autonomous vehicles to perform safe and efficient planning. The evolution of these scenarios is highly uncertain and depends on the interactions of agents with static and dynamic objects in the scene. GNN-based approaches have recently gained attention as they are well suited to naturally model these interactions. However, one of the main challenges that remains unexplored is how to address the complexity and opacity of these models in order to deal with the transparency requirements for autonomous driving systems, which includes aspects such as interpretability and explainability. In this work, we aim to improve the explainability of motion prediction systems by using different approaches. First, we propose a new Explainable Heterogeneous Graph-based Policy (XHGP) model based on an heterograph representation of the traffic scene and lane-graph traversals, which learns interaction behaviors using object-level and type-level attention. This learned attention provides information about the most important agents and interactions in the scene. Second, we explore this same idea with the explanations provided by GNNExplainer. Third, we apply counterfactual reasoning to provide explanations of selected individual scenarios by exploring the sensitivity of the trained model to changes made to the input data, i.e., masking some elements of the scene, modifying trajectories, and adding or removing dynamic agents. The explainability analysis provided in this paper is a first step towards more transparent and reliable motion prediction systems, important from the perspective of the user, developers and regulatory agencies. UTONOMOUS vehicles (AVs) have to perform trajectory planning based on the global route and the local context. Trajectory planning can be applied in a safer and more efficient way if the system is able to anticipate future motions of surrounding agents [1], as humans inherently do. Motion prediction has recently gained significant attention within the research community since it is one of the key unsolved challenges in reaching full self-driving autonomy [2]. The main goal of motion prediction is to determine a set of coordinates at a future point in time for an agent in the scene. Among the different approaches, graphs are gaining attention since traffic scenarios can be naturally represented as a graph.