AITopics | Klingner, Marvin

Collaborating Authors

Klingner, Marvin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation

Govindarajan, Hariprasath, Wozniak, Maciej K., Klingner, Marvin, Maurice, Camille, Kiran, B Ravi, Yogamani, Senthil

arXiv.org Artificial IntelligenceMar-12-2025

Vision foundation models (VFMs) such as DINO have led to a paradigm shift in 2D camera-based perception towards extracting generalized features to support many downstream tasks. Recent works introduce self-supervised cross-modal knowledge distillation (KD) as a way to transfer these powerful generalization capabilities into 3D LiDAR-based models. However, they either rely on highly complex distillation losses, pseudo-semantic maps, or limit KD to features useful for semantic segmentation only. In this work, we propose CleverDistiller, a self-supervised, cross-modal 2D-to-3D KD framework introducing a set of simple yet effective design choices: Unlike contrastive approaches relying on complex loss design choices, our method employs a direct feature similarity loss in combination with a multi layer perceptron (MLP) projection head to allow the 3D network to learn complex semantic dependencies throughout the projection. Crucially, our approach does not depend on pseudo-semantic maps, allowing for direct knowledge transfer from a VFM without explicit semantic supervision. Additionally, we introduce the auxiliary self-supervised spatial task of occupancy prediction to enhance the semantic knowledge, obtained from a VFM through KD, with 3D spatial reasoning capabilities. Experiments on standard autonomous driving benchmarks for 2D-to-3D KD demonstrate that CleverDistiller achieves state-of-the-art performance in both semantic segmentation and 3D object detection (3DOD) by up to 10% mIoU, especially when fine tuning on really low data amounts, showing the effectiveness of our simple yet powerful KD strategy

artificial intelligence, machine learning, proceedings, (17 more...)

arXiv.org Artificial Intelligence

2503.09878

Country:

Europe > Sweden (0.14)
North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving

Wozniak, Maciej K., Govindarajan, Hariprasath, Klingner, Marvin, Maurice, Camille, Kiran, B Ravi, Yogamani, Senthil

arXiv.org Artificial IntelligenceNov-4-2024

Recent self-supervised clustering-based pre-training techniques like DINO and Cribo have shown impressive results for downstream detection and segmentation tasks. However, real-world applications such as autonomous driving face challenges with imbalanced object class and size distributions and complex scene geometries. In this paper, we propose S3PT a novel scene semantics and structure guided clustering to provide more scene-consistent objectives for self-supervised training. Specifically, our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering, to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level. Our learned representations significantly improve performance in downstream semantic segmentation and 3D object detection tasks on the nuScenes, nuImages, and Cityscapes datasets and show promising domain translation properties.

artificial intelligence, machine learning, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2410.23085

Country: Europe > Sweden (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

X-Align++: cross-modal cross-view alignment for Bird's-eye-view segmentation

Borse, Shubhankar, Yogamani, Senthil, Klingner, Marvin, Ravi, Varun, Cai, Hong, Almuzairee, Abdulaziz, Porikli, Fatih

arXiv.org Artificial IntelligenceJun-6-2023

Bird's-eye-view (BEV) grid is a typical representation of the perception of road components, e.g., drivable area, in autonomous driving. Most existing approaches rely on cameras only to perform segmentation in BEV space, which is fundamentally constrained by the absence of reliable depth information. The latest works leverage both camera and LiDAR modalities but suboptimally fuse their features using simple, concatenation-based mechanisms. In this paper, we address these problems by enhancing the alignment of the unimodal features in order to aid feature fusion, as well as enhancing the alignment between the cameras' perspective view (PV) and BEV representations. We propose X-Align, a novel end-to-end cross-modal and cross-view learning framework for BEV segmentation consisting of the following components: (i) a novel Cross-Modal Feature Alignment (X-FA) loss, (ii) an attention-based Cross-Modal Feature Fusion (X-FF) module to align multi-modal BEV features implicitly, and (iii) an auxiliary PV segmentation branch with Cross-View Segmentation Alignment (X-SA) losses to improve the PV-to-BEV transformation. We evaluate our proposed method across two commonly used benchmark datasets, i.e., nuScenes and KITTI-360. Notably, X-Align significantly outperforms the state-of-the-art by 3 absolute mIoU points on nuScenes. We also provide extensive ablation studies to demonstrate the effectiveness of the individual components.

machine learning, natural language, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2306.0381

Country:

Europe (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Ground > Road (0.49)
Automobiles & Trucks (0.49)
Information Technology > Robotics & Automation (0.35)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(3 more...)

Add feedback

X$^3$KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

Klingner, Marvin, Borse, Shubhankar, Kumar, Varun Ravi, Rezaei, Behnaz, Narayanan, Venkatraman, Yogamani, Senthil, Porikli, Fatih

arXiv.org Artificial IntelligenceMar-3-2023

Recent advances in 3D object detection (3DOD) have obtained remarkably strong results for LiDAR-based models. In contrast, surround-view 3DOD models based on multiple camera images underperform due to the necessary view transformation of features from perspective view (PV) to a 3D world representation which is ambiguous due to missing depth information. This paper introduces X$^3$KD, a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3DOD. Specifically, we propose cross-task distillation from an instance segmentation teacher (X-IS) in the PV feature extraction stage providing supervision without ambiguous error backpropagation through the view transformation. After the transformation, we apply cross-modal feature distillation (X-FD) and adversarial training (X-AT) to improve the 3D world representation of multi-camera features through the information contained in a LiDAR-based 3DOD teacher. Finally, we also employ this teacher for cross-modal output distillation (X-OD), providing dense supervision at the prediction stage. We perform extensive ablations of knowledge distillation at different stages of multi-camera 3DOD. Our final X$^3$KD model outperforms previous state-of-the-art approaches on the nuScenes and Waymo datasets and generalizes to RADAR-based 3DOD. Qualitative results video at https://youtu.be/1do9DPFmr38.

artificial intelligence, machine learning, survey article, (15 more...)

arXiv.org Artificial Intelligence

2303.02203

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety

Houben, Sebastian, Abrecht, Stephanie, Akila, Maram, Bär, Andreas, Brockherde, Felix, Feifel, Patrick, Fingscheidt, Tim, Gannamaneni, Sujan Sai, Ghobadi, Seyed Eghbal, Hammam, Ahmed, Haselhoff, Anselm, Hauser, Felix, Heinzemann, Christian, Hoffmann, Marco, Kapoor, Nikhil, Kappel, Falk, Klingner, Marvin, Kronenberger, Jan, Küppers, Fabian, Löhdefink, Jonas, Mlynarski, Michael, Mock, Michael, Mualla, Firas, Pavlitskaya, Svetlana, Poretschkin, Maximilian, Pohl, Alexander, Ravi-Kumar, Varun, Rosenzweig, Julia, Rottmann, Matthias, Rüping, Stefan, Sämann, Timo, Schneider, Jan David, Schulz, Elena, Schwalbe, Gesina, Sicking, Joachim, Srivastava, Toshika, Varghese, Serin, Weber, Michael, Wirkert, Sebastian, Wirtz, Tim, Woehrle, Matthias

arXiv.org Artificial IntelligenceApr-29-2021

The use of deep neural networks (DNNs) in safety-critical applications like mobile health and autonomous driving is challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization over insufficient interpretability to problems with malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from safety concerns. In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged. This work provides a structured and broad overview of them. We first identify categories of insufficiencies to then describe research activities aiming at their detection, quantification, or mitigation. Our paper addresses both machine learning experts and safety engineers: The former ones might profit from the broad range of machine learning topics covered and discussions on limitations of recent methods. The latter ones might gain insights into the specifics of modern ML methods. We moreover hope that our contribution fuels discussions on desiderata for ML systems and strategies on how to propel existing approaches accordingly.

logic & formal reasoning, machine learning, neural information processing system, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-01233-4_1

2104.14235

Country:

North America > United States (0.45)
Europe > Germany (0.27)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > New Finding (0.45)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(5 more...)

Add feedback