
Collaborating Authors

Petersson, Christoffer


GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving

arXiv.org Artificial Intelligence

Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text, and has led to unprecedented performance on a large array of tasks when applied at scale. Similarly, autonomous driving generates vast amounts of spatiotemporal data, alluding to the possibility of harnessing scale to learn the underlying geometric and semantic structure of the environment and its evolution over time. In this direction, we propose a geometric and semantic self-supervised pre-training method, GASP, that learns a unified representation by predicting, at any queried future point in spacetime, (1) general occupancy, capturing the evolving structure of the 3D scene; (2) ego occupancy, modeling the ego vehicle path through the environment; and (3) distilled high-level features from a vision foundation model. By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, generalizable representation of the environment and its evolution through time. We validate GASP on multiple autonomous driving benchmarks, demonstrating significant improvements in semantic occupancy forecasting, online mapping, and ego trajectory prediction. Our results demonstrate that continuous 4D geometric and semantic occupancy prediction provides a scalable and effective pre-training paradigm for autonomous driving. For code and additional visualizations, see https://research.zenseact.com/publications/gasp/.
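The core mechanism above, predicting quantities at arbitrary queried points in spacetime, can be illustrated with a minimal sketch. The PyTorch module below is a hedged stand-in rather than the authors' architecture: the pooled scene latent, the hidden sizes, and the 384-dimensional distilled feature are all illustrative assumptions.

    import torch
    import torch.nn as nn

    class GASPStyleHead(nn.Module):
        """Decodes a scene latent at queried (x, y, z, t) points into general
        occupancy, ego occupancy, and a distilled foundation-model feature."""

        def __init__(self, latent_dim=256, feat_dim=384, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(latent_dim + 4, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.occ = nn.Linear(hidden, 1)          # general occupancy logit
            self.ego = nn.Linear(hidden, 1)          # ego-path occupancy logit
            self.feat = nn.Linear(hidden, feat_dim)  # distilled semantic feature

        def forward(self, scene_latent, queries):
            # scene_latent: (B, latent_dim) pooled representation of past sensor data
            # queries:      (B, N, 4) future spacetime points (x, y, z, t)
            B, N, _ = queries.shape
            z = scene_latent.unsqueeze(1).expand(B, N, -1)
            h = self.mlp(torch.cat([z, queries], dim=-1))
            return self.occ(h), self.ego(h), self.feat(h)

    # Toy usage: 8 scenes, 1024 random future query points each.
    occ, ego, feat = GASPStyleHead()(torch.randn(8, 256), torch.rand(8, 1024, 4))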


Are NeRFs ready for autonomous driving? Towards closing the real-to-simulation gap

arXiv.org Artificial Intelligence

Neural Radiance Fields (NeRFs) have emerged as promising tools for advancing autonomous driving (AD) research, offering scalable closed-loop simulation and data augmentation capabilities. However, to trust the results achieved in simulation, one needs to ensure that AD systems perceive real and rendered data in the same way. Although the performance of rendering methods is increasing, many scenarios will remain inherently challenging to reconstruct faithfully. To this end, we propose a novel perspective for addressing the real-to-simulated data gap. Rather than solely focusing on improving rendering fidelity, we explore simple yet effective methods to enhance perception model robustness to NeRF artifacts without compromising performance on real data. Moreover, we conduct the first large-scale investigation into the real-to-simulated data gap in an AD setting using a state-of-the-art neural rendering technique. Specifically, we evaluate object detectors and an online mapping model on real and simulated data, and study the effects of different fine-tuning strategies. Our results show notable improvements in model robustness to simulated data, even improving real-world performance in some cases. Lastly, we delve into the correlation between the real-to-simulated gap and image reconstruction metrics, identifying FID and LPIPS as strong indicators. See https://research.zenseact.com/publications/closing-real2sim-gap for our project page.
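One of the fine-tuning strategies referenced above can be sketched very simply: mix real and NeRF-rendered images in each training batch so the perception model becomes robust to rendering artifacts. The sketch below uses random tensors and a toy classifier purely as stand-ins; the 50/50 mixing ratio and all shapes are assumptions, not the paper's setup.

    import torch
    from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

    # Stand-ins for a real-image dataset and its NeRF-rendered counterpart.
    real = TensorDataset(torch.randn(64, 3, 64, 64), torch.randint(0, 10, (64,)))
    rendered = TensorDataset(torch.randn(64, 3, 64, 64), torch.randint(0, 10, (64,)))
    mixed_loader = DataLoader(ConcatDataset([real, rendered]), batch_size=16, shuffle=True)

    # Toy perception model fine-tuned on the mixed real/rendered data.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    for images, labels in mixed_loader:
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        opt.zero_grad(); loss.backward(); opt.step()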


Zenseact Open Dataset: A large-scale and diverse multimodal dataset for autonomous driving

arXiv.org Artificial Intelligence

To address this gap, we introduce Zenseact Open Dataset (ZOD), a large-scale and diverse multimodal dataset collected over two years in various European countries, covering an area 9× that of existing datasets. ZOD boasts the highest range and resolution sensors among comparable datasets, coupled with detailed keyframe annotations for 2D and 3D objects (up to 245 m), road instance/semantic segmentation, traffic sign recognition, and road classification. We believe that this unique combination will facilitate breakthroughs in long-range perception and multi-task learning. The dataset is composed of Frames, Sequences, and Drives, designed to encompass both data diversity and support for spatiotemporal learning, sensor fusion, localization, and mapping. Frames consist of 100k curated camera images with two seconds of other supporting sensor data, while the 1473 Sequences and 29 Drives include the entire sensor suite for 20 seconds and a few minutes, respectively. ZOD is the only large-scale AD dataset released under a permissive license, allowing for both research and commercial use. More information, and an extensive devkit, can be found at zod.zenseact.com.
Figure 1: Geographical coverage comparison with other AD datasets using the diversity area metric defined in [27] (top left), and geographical distribution of ZOD Frames overlaid on the map.
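To make the Frames/Sequences/Drives composition above concrete, the sketch below models it as plain dataclasses. This is explicitly not the zod devkit API; every field name here is an assumption based only on the description above.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Frame:
        camera_image: str                # one curated keyframe image
        support_sensor_logs: List[str]   # ~2 s of surrounding sensor data

    @dataclass
    class Sequence:
        sensor_logs: List[str]           # full sensor suite for ~20 s

    @dataclass
    class Drive:
        sensor_logs: List[str]           # full sensor suite for a few minutes

    @dataclass
    class ZodIndex:
        frames: List[Frame]              # 100k curated frames
        sequences: List[Sequence]        # 1473 sequences
        drives: List[Drive]              # 29 drives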


Towards trustworthy multi-modal motion prediction: Holistic evaluation and interpretability of outputs

arXiv.org Artificial Intelligence

Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches proposed to address multi-modal motion prediction are based on complex machine learning systems that have limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. In this work, we aim to advance towards the design of trustworthy motion prediction systems, based on some of the requirements for the design of Trustworthy Artificial Intelligence. We focus on evaluation criteria, robustness, and interpretability of outputs. First, we comprehensively analyse the evaluation metrics, identify the main gaps of current benchmarks, and propose a new holistic evaluation framework. We then introduce a method for the assessment of spatial and temporal robustness by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, we propose an intent prediction layer that can be attached to multi-modal motion prediction models. The effectiveness of this approach is assessed through a survey that explores different elements in the visualization of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.
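The spatial-robustness assessment described above, simulating noise in the perception system, can be sketched as follows: perturb the observed trajectories, re-run the predictor, and compare a standard metric such as minADE. The constant-velocity predictor and the 0.5 m noise scale below are illustrative assumptions, not the paper's models or settings.

    import numpy as np

    def min_ade(pred_modes, gt):
        """Minimum average displacement error over K predicted modes."""
        # pred_modes: (K, T, 2), gt: (T, 2)
        return np.min(np.linalg.norm(pred_modes - gt, axis=-1).mean(axis=-1))

    def constant_velocity_predictor(history, horizon=30, num_modes=6):
        """Stand-in multi-modal predictor: constant velocity with a small heading spread."""
        v = history[-1] - history[-2]
        modes = []
        for k in range(num_modes):
            ang = (k - num_modes // 2) * 0.05
            rot = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
            steps = np.arange(1, horizon + 1)[:, None] * (v @ rot.T)
            modes.append(history[-1] + steps)
        return np.stack(modes)

    rng = np.random.default_rng(0)
    history = np.cumsum(np.tile([[1.0, 0.1]], (10, 1)), axis=0)   # observed past positions
    gt_future = history[-1] + np.arange(1, 31)[:, None] * np.array([1.0, 0.1])

    clean = min_ade(constant_velocity_predictor(history), gt_future)
    noisy_hist = history + rng.normal(scale=0.5, size=history.shape)  # simulated perception noise
    noisy = min_ade(constant_velocity_predictor(noisy_hist), gt_future)
    print(f"minADE clean: {clean:.2f} m, with perception noise: {noisy:.2f} m")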


HEAL-SWIN: A Vision Transformer On The Sphere

arXiv.org Artificial Intelligence

High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets.
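The key structural trick above can be shown in a few lines: in HEALPix nested ordering, every run of 4**k consecutive pixel indices covers one contiguous cell of the sphere, so SWIN-style patching reduces to reshaping the one-dimensional pixel array. The nside of 8 and patch size of 16 below are arbitrary assumptions; healpy is used only to compute the pixel count.

    import numpy as np
    import healpy as hp

    nside = 8                              # HEALPix resolution parameter
    npix = hp.nside2npix(nside)            # 12 * nside**2 = 768 pixels
    signal = np.random.rand(npix)          # scalar field on the sphere, in nested pixel order

    patch_size = 16                        # 4**2 pixels per patch -> each patch is one nside=2 cell
    patches = signal.reshape(npix // patch_size, patch_size)
    print(patches.shape)                   # (48, 16): 48 patches of 16 spatially contiguous pixels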


LidarCLIP or: How I Learned to Talk to Point Clouds

arXiv.org Artificial Intelligence

Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, hindered by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also explore zero-shot classification and show that LidarCLIP outperforms existing attempts to use CLIP for point clouds by a large margin. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. Code and pre-trained models are available at https://github.com/atonderski/lidarclip.
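The supervision described above, regressing a point cloud encoder onto frozen CLIP image embeddings so that text queries transfer to lidar, can be sketched as follows. The tiny encoder, the random stand-ins for CLIP image and text embeddings, and the 512-dimensional embedding size are assumptions for illustration, not the released LidarCLIP code.

    import torch
    import torch.nn.functional as F

    embed_dim = 512

    class PointCloudEncoder(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.point_mlp = torch.nn.Linear(3, 128)
            self.head = torch.nn.Linear(128, embed_dim)

        def forward(self, points):                            # points: (B, N, 3)
            feat = self.point_mlp(points).max(dim=1).values   # simple max-pooled point feature
            return F.normalize(self.head(feat), dim=-1)

    lidar_encoder = PointCloudEncoder()
    frozen_clip_image_embed = F.normalize(torch.randn(4, embed_dim), dim=-1)  # stand-in for CLIP(image)

    points = torch.randn(4, 2048, 3)
    lidar_embed = lidar_encoder(points)
    loss = 1.0 - F.cosine_similarity(lidar_embed, frozen_clip_image_embed).mean()  # distillation loss
    loss.backward()

    # Retrieval: rank point clouds by similarity to a CLIP text embedding (stand-in).
    text_embed = F.normalize(torch.randn(embed_dim), dim=-1)
    ranking = (lidar_embed.detach() @ text_embed).argsort(descending=True)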


Masked Autoencoder for Self-Supervised Pre-training on Lidar Point Clouds

arXiv.org Artificial Intelligence

Masked autoencoding has become a successful pretraining paradigm for Transformer models for text, images, and, recently, point clouds. Raw automotive datasets are suitable candidates for self-supervised pre-training as they generally are cheap to collect compared to annotations for tasks like 3D object detection (OD). However, the development of masked autoencoders for point clouds has focused solely on synthetic and indoor data. Consequently, existing methods have tailored their representations and models toward small and dense point clouds with homogeneous point densities. In this work, we study masked autoencoding for point clouds in an automotive setting, which are sparse and for which the point density can vary drastically among objects in the same scene. To this end, we propose Voxel-MAE, a simple masked autoencoding pre-training scheme designed for voxel representations. We pre-train the backbone of a Transformer-based 3D object detector to reconstruct masked voxels and to distinguish between empty and non-empty voxels. Our method improves the 3D OD performance by 1.75 mAP points and 1.05 NDS on the challenging nuScenes dataset. Further, we show that by pre-training with Voxel-MAE, we require only 40% of the annotated data to outperform a randomly initialized equivalent. Code available at https://github.com/georghess/voxel-mae
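The sketch below strips the pre-training objectives described above down to their essentials: voxel tokens are masked, masked voxel contents are reconstructed, and every voxel is classified as empty or non-empty. The tiny transformer, the 70% mask ratio, and the random stand-in labels are assumptions, not the released voxel-mae code.

    import torch
    import torch.nn as nn

    num_voxels, token_dim, mask_ratio = 256, 64, 0.7

    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(token_dim, nhead=4, batch_first=True), num_layers=2)
    recon_head = nn.Linear(token_dim, 30)   # e.g. reconstruct 10 points (x, y, z) per voxel
    occ_head = nn.Linear(token_dim, 1)      # empty / non-empty logit

    tokens = torch.randn(2, num_voxels, token_dim)             # embedded voxel features
    mask = torch.rand(2, num_voxels) < mask_ratio              # True = masked voxel
    mask_token = nn.Parameter(torch.zeros(token_dim))

    inp = torch.where(mask.unsqueeze(-1), mask_token, tokens)
    latent = encoder(inp)

    recon = recon_head(latent)                                 # supervised on masked, non-empty voxels
    occ_logits = occ_head(latent).squeeze(-1)                  # supervised on all voxels
    occ_target = torch.randint(0, 2, (2, num_voxels)).float()  # stand-in occupancy labels
    loss = nn.functional.binary_cross_entropy_with_logits(occ_logits, occ_target)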


Raw or Cooked? Object Detection on RAW Images

arXiv.org Artificial Intelligence

Images fed to a deep neural network have in general undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP instead should be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous works on this topic and propose a new learnable operation that enables an object detector to achieve superior performance when compared to both previous works and traditional RGB images. In experiments on the open PASCALRAW dataset, we empirically confirm our hypothesis.
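The "optimize the ISP jointly with the end task" idea can be sketched with a few differentiable ISP-like operations whose parameters receive gradients from the detector loss. The specific operations below (global gain, gamma, 1x1 colour correction) and the 3-channel input are illustrative assumptions, not the paper's proposed learnable operation.

    import torch
    import torch.nn as nn

    class LearnableISP(nn.Module):
        def __init__(self):
            super().__init__()
            self.log_gain = nn.Parameter(torch.zeros(1))         # exposure-like gain
            self.log_gamma = nn.Parameter(torch.zeros(1))        # tone-curve exponent
            self.ccm = nn.Conv2d(3, 3, kernel_size=1, bias=True) # learnable colour correction

        def forward(self, raw):                                   # raw: (B, 3, H, W) in [0, 1]
            x = raw * torch.exp(self.log_gain)
            x = x.clamp(min=1e-6) ** torch.exp(self.log_gamma)
            return self.ccm(x)

    isp = LearnableISP()
    detector_backbone = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # stand-in detector stem
    features = detector_backbone(isp(torch.rand(2, 3, 128, 128)))   # ISP and detector trained end-to-end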


Towards Explainable Motion Prediction using Heterogeneous Graph Representations

arXiv.org Artificial Intelligence

Motion prediction systems aim to capture the future behavior of traffic scenarios enabling autonomous vehicles to perform safe and efficient planning. The evolution of these scenarios is highly uncertain and depends on the interactions of agents with static and dynamic objects in the scene. GNN-based approaches have recently gained attention as they are well suited to naturally model these interactions. However, one of the main challenges that remains unexplored is how to address the complexity and opacity of these models in order to deal with the transparency requirements for autonomous driving systems, which include aspects such as interpretability and explainability. In this work, we aim to improve the explainability of motion prediction systems by using different approaches. First, we propose a new Explainable Heterogeneous Graph-based Policy (XHGP) model based on a heterograph representation of the traffic scene and lane-graph traversals, which learns interaction behaviors using object-level and type-level attention. This learned attention provides information about the most important agents and interactions in the scene. Second, we explore this same idea with the explanations provided by GNNExplainer. Third, we apply counterfactual reasoning to provide explanations of selected individual scenarios by exploring the sensitivity of the trained model to changes made to the input data, i.e., masking some elements of the scene, modifying trajectories, and adding or removing dynamic agents. The explainability analysis provided in this paper is a first step towards more transparent and reliable motion prediction systems, important from the perspective of users, developers, and regulatory agencies.

Autonomous vehicles (AVs) have to perform trajectory planning based on the global route and the local context. Trajectory planning can be applied in a safer and more efficient way if the system is able to anticipate future motions of surrounding agents [1], as humans inherently do. Motion prediction has recently gained significant attention within the research community since it is one of the key unsolved challenges in reaching full self-driving autonomy [2]. The main goal of motion prediction is to determine a set of coordinates at a future point in time for an agent in the scene. Among the different approaches, graphs are gaining attention since traffic scenarios can be naturally represented as a graph.
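The counterfactual-reasoning probe described above can be sketched generically: remove one agent from the scene, re-run the predictor, and measure how much the prediction for the target agent shifts. The toy repulsion-based predictor below is a stand-in used only for illustration, not XHGP or any trained model.

    import numpy as np

    def toy_predictor(target_history, neighbour_positions):
        """Stand-in predictor: constant velocity, nudged away from nearby agents."""
        v = target_history[-1] - target_history[-2]
        pred = target_history[-1] + np.arange(1, 31)[:, None] * v
        for n in neighbour_positions:
            offset = pred[0] - n
            pred = pred + 0.5 * offset / (np.linalg.norm(offset) + 1e-6)  # simple repulsion term
        return pred                                                        # (30, 2) predicted future

    target = np.cumsum(np.tile([[1.0, 0.0]], (10, 1)), axis=0)             # observed target history
    neighbours = [np.array([12.0, 1.0]), np.array([5.0, -8.0])]

    full = toy_predictor(target, neighbours)
    for i, _ in enumerate(neighbours):
        counterfactual = toy_predictor(target, neighbours[:i] + neighbours[i + 1:])
        influence = np.linalg.norm(full - counterfactual, axis=-1).mean()
        print(f"removing agent {i} shifts the prediction by {influence:.2f} m on average")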


Object Detection as Probabilistic Set Prediction

arXiv.org Artificial Intelligence

Accurate uncertainty estimates are essential for deploying deep object detectors in safety-critical systems. The development and evaluation of probabilistic object detectors have been hindered by shortcomings in existing performance measures, which tend to involve arbitrary thresholds or limit the detector's choice of distributions. In this work, we propose to view object detection as a set prediction task where detectors predict the distribution over the set of objects. Using the negative log-likelihood for random finite sets, we present a proper scoring rule for evaluating and training probabilistic object detectors. The proposed method can be applied to existing probabilistic detectors, is free from thresholds, and enables fair comparison between architectures. Three different types of detectors are evaluated on the COCO dataset. Our results indicate that the training of existing detectors is optimized toward non-probabilistic metrics. We hope to encourage the development of new object detectors that can accurately estimate their own uncertainty.
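A heavily simplified sketch of a set-level negative log-likelihood for detection follows: each detection carries an existence probability and an isotropic Gaussian over its box centre, and the likelihood is approximated with the single best detection-to-object assignment. The full random-finite-set NLL in the paper marginalises over associations and models missed and spurious objects properly; the fixed miss penalty and all numbers below are assumptions for illustration only.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.stats import multivariate_normal

    def set_nll(pred_means, pred_sigmas, pred_exist, gt, miss_penalty=20.0):
        # pred_means: (M, 2), pred_sigmas: (M,), pred_exist: (M,), gt: (N, 2)
        cost = np.zeros((len(pred_means), len(gt)))
        for i in range(len(pred_means)):
            for j in range(len(gt)):
                cost[i, j] = -np.log(pred_exist[i] + 1e-12) - multivariate_normal.logpdf(
                    gt[j], mean=pred_means[i], cov=pred_sigmas[i] ** 2 * np.eye(2))
        rows, cols = linear_sum_assignment(cost)              # best single association
        nll = cost[rows, cols].sum()                          # matched detections
        unmatched = np.setdiff1d(np.arange(len(pred_means)), rows)
        nll += -np.log(1.0 - pred_exist[unmatched] + 1e-12).sum()  # unmatched detections
        nll += miss_penalty * (len(gt) - len(cols))                # unmatched ground truths (assumed penalty)
        return nll

    preds = np.array([[10.0, 5.0], [30.0, 2.0], [50.0, 9.0]])
    print(set_nll(preds, np.array([1.0, 2.0, 1.5]), np.array([0.9, 0.6, 0.2]),
                  gt=np.array([[10.5, 5.2], [31.0, 1.5]])))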