Schwalbe, Gesina
Unveiling Ontological Commitment in Multi-Modal Foundation Models
Keser, Mert, Schwalbe, Gesina, Amini-Naieni, Niki, Rottmann, Matthias, Knoll, Alois
Ontological commitment, i.e., the concepts, relations, and assumptions in use, is a cornerstone of qualitative reasoning (QR) models. The state of the art for processing raw inputs, however, are deep neural networks (DNNs), nowadays often based on multimodal foundation models. These automatically learn rich representations of concepts and the respective reasoning. Unfortunately, the learned qualitative knowledge is opaque, preventing easy inspection, validation, or adaptation against available QR models. So far, it is possible to associate pre-defined concepts with latent representations of DNNs, but the extractable relations are mostly limited to semantic similarity. As a next step towards validating and verifying DNNs against QR models, we propose a method that extracts the learned superclass hierarchy from a multimodal DNN for a given set of leaf concepts. Under the hood, we (1) obtain leaf concept embeddings using the DNN's textual input modality; (2) apply hierarchical clustering to them, exploiting that DNNs encode semantic similarities via vector distances; and (3) label the resulting parent concepts by searching available ontologies from QR. An initial evaluation study shows that meaningful ontological class hierarchies can be extracted from state-of-the-art foundation models. Furthermore, we demonstrate how to validate and verify a DNN's learned representations against given ontologies. Lastly, we discuss potential future applications in the context of QR.
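To make the extraction pipeline concrete, the following is a minimal sketch of the three steps outlined above, assuming a CLIP-style foundation model from Hugging Face transformers as the multimodal DNN and WordNet (via nltk) as the reference ontology; the leaf concepts, model checkpoint, and clustering settings are illustrative choices, not those of the paper.

```python
import numpy as np
import torch
from transformers import CLIPModel, CLIPProcessor
from scipy.cluster.hierarchy import linkage
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

leaf_concepts = ["car", "truck", "bicycle", "dog", "cat"]  # hypothetical leaf set

# (1) Leaf concept embeddings via the DNN's textual input modality
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
inputs = processor(text=[f"a photo of a {c}" for c in leaf_concepts],
                   return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs).numpy()
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize

# (2) Hierarchical clustering on cosine distances yields the superclass hierarchy
Z = linkage(emb, method="average", metric="cosine")

# (3) Label a merged pair of leaves by its lowest common hypernym in the ontology
def label_parent(concept_a: str, concept_b: str) -> str:
    syn_a = wn.synsets(concept_a, pos=wn.NOUN)[0]
    syn_b = wn.synsets(concept_b, pos=wn.NOUN)[0]
    common = syn_a.lowest_common_hypernyms(syn_b)
    return common[0].name() if common else "unknown"

print(label_parent("car", "truck"))  # e.g. 'motor_vehicle.n.01'
```

The linkage matrix Z can then be traversed bottom-up, labeling each merge node with a common hypernym of the leaf concepts it covers.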
Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study
Mitra, Pallavi, Schwalbe, Gesina, Klein, Nadja
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks. However, high computational and storage demands hinder their deployment into resource-constrained environments, such as embedded devices. Model pruning helps to meet these restrictions by reducing the model size while maintaining superior performance. Meanwhile, safety-critical applications pose more than just resource and performance constraints. In particular, predictions must not be overly confident, i.e., provide properly calibrated uncertainty estimations (proper uncertainty calibration), and CNNs must be robust against corruptions like naturally occurring input perturbations (natural corruption robustness). This work investigates the important trade-off between uncertainty calibration, natural corruption robustness, and performance for current state-of-research post-hoc CNN pruning techniques in the context of image classification tasks. Our study reveals that post-hoc pruning substantially improves the model's uncertainty calibration, performance, and natural corruption robustness, sparking hope for safe and robust embedded CNNs. Furthermore, uncertainty calibration and natural corruption robustness are not mutually exclusive targets under pruning, as evidenced by the improved safety aspects obtained by post-hoc unstructured pruning with increasing compression.
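As an illustration of such an evaluation setup (a sketch under assumptions, not the paper's exact pipeline), post-hoc unstructured L1 pruning and a simple expected calibration error (ECE) estimate could be combined as follows in PyTorch; the trained model, data loader, and compression ratios are placeholders.

```python
import numpy as np
import torch
import torch.nn.utils.prune as prune

def prune_model(model: torch.nn.Module, amount: float) -> torch.nn.Module:
    """Post-hoc global L1 unstructured pruning of all conv and linear weights."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    return model

def expected_calibration_error(confidences, correct, n_bins: int = 15) -> float:
    """Standard binned ECE estimate from per-sample confidences and correctness flags."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return float(ece)

# Usage idea: track ECE (and accuracy on corrupted test sets) under growing compression:
# for amount in (0.3, 0.5, 0.7, 0.9):
#     pruned = prune_model(copy.deepcopy(trained_model), amount)
#     ...collect confidences / correctness on clean and corrupted test data...
```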
The Anatomy of Adversarial Attacks: Concept-based XAI Dissection
Mikriukov, Georgii, Schwalbe, Gesina, Motzkus, Franz, Bade, Korinna
Adversarial attacks (AAs) pose a significant threat to the reliability and robustness of deep neural networks. While the impact of these attacks on model predictions has been extensively studied, their effect on the learned representations and concepts within these models remains largely unexplored. In this work, we perform an in-depth analysis of the influence of AAs on the concepts learned by convolutional neural networks (CNNs) using eXplainable artificial intelligence (XAI) techniques. Through an extensive set of experiments across various network architectures and targeted AA techniques, we unveil several key findings. First, AAs induce substantial alterations in the concept composition within the feature space, introducing new concepts or modifying existing ones. Second, the adversarial perturbation itself can be linearly decomposed into a set of latent vector components, with a subset of these being responsible for the attack's success. Notably, we discover that these components are target-specific, i.e., similar for a given target class across different AA techniques and starting classes. Our findings provide valuable insights into the nature of AAs and their impact on learned representations, paving the way for the development of more robust and interpretable deep learning models, as well as effective defenses against adversarial threats.
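The linear decomposition finding can be illustrated with a small sketch: the shift an attack induces in a layer's activations is expressed as a least-squares combination of concept vectors. The feature extractor, concept vectors, and layer choice are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def decompose_perturbation(feat_clean: np.ndarray,
                           feat_adv: np.ndarray,
                           concept_vectors: np.ndarray) -> np.ndarray:
    """Least-squares coefficients expressing (feat_adv - feat_clean) over concept vectors.

    feat_clean, feat_adv : (d,) latent activations of the clean / adversarial input
    concept_vectors      : (k, d) concept directions in the same layer (e.g. CAVs)
    returns              : (k,) coefficients; large |coeff| marks concepts altered by the attack
    """
    delta = feat_adv - feat_clean
    coeffs, *_ = np.linalg.lstsq(concept_vectors.T, delta, rcond=None)
    return coeffs
```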
GCPV: Guided Concept Projection Vectors for the Explainable Inspection of CNN Feature Spaces
Mikriukov, Georgii, Schwalbe, Gesina, Hellert, Christian, Bade, Korinna
For debugging and verification of computer vision convolutional deep neural networks (CNNs), human inspection of the learned latent representations is imperative. Therefore, state-of-the-art eXplainable Artificial Intelligence (XAI) methods globally associate given natural language semantic concepts with representative vectors or regions in the CNN latent space, supporting manual inspection. Yet, this approach comes with two major disadvantages: it is locally inaccurate when reconstructing a concept label, and it discards information about the distribution of concept instance representations. The latter, though, is of particular interest for debugging, such as finding and understanding outliers, learned notions of sub-concepts, and concept confusion. Furthermore, current single-layer approaches neglect that information about a concept may be spread over the CNN depth. To overcome these shortcomings, we introduce the local-to-global Guided Concept Projection Vectors (GCPV) approach: It (1) generates local concept vectors that each precisely reconstruct a concept segmentation label, and then (2) generalizes these to global concept and even sub-concept vectors by means of hierarchical clustering. Our experiments on object detectors demonstrate improved performance compared to the state of the art, the benefit of multi-layer concept vectors, and robustness against low-quality concept segmentation labels. Finally, we demonstrate that GCPVs can be applied to find root causes for the confusion of concepts like bus and truck, and to reveal interesting concept-level outliers. Thus, GCPVs pose a promising step towards interpretable model debugging and informed data improvement.
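A rough sketch of the local-to-global step (not the published GCPV implementation): local concept vectors are grouped by hierarchical clustering, and cluster centroids serve as global or sub-concept vectors. Input shapes and the cut threshold below are assumed for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def global_concept_vectors(local_vectors: np.ndarray,
                           distance_threshold: float = 0.5) -> dict:
    """Generalize local concept vectors to global / sub-concept vectors.

    local_vectors : (n, d) one vector per concept-labeled sample
    returns       : {cluster_id: (d,) centroid}, one centroid per (sub-)concept cluster
    """
    Z = linkage(local_vectors, method="ward")
    labels = fcluster(Z, t=distance_threshold, criterion="distance")
    return {c: local_vectors[labels == c].mean(axis=0) for c in np.unique(labels)}
```

Cutting the dendrogram at a coarser threshold yields one global vector per concept; finer cuts expose sub-concept structure and outlying samples.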
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
Wörmann, Julian, Bogdoll, Daniel, Brunner, Christian, Bührle, Etienne, Chen, Han, Chuo, Evaristus Fuh, Cvejoski, Kostadin, van Elst, Ludger, Gottschall, Philip, Griesche, Stefan, Hellert, Christian, Hesels, Christian, Houben, Sebastian, Joseph, Tim, Keil, Niklas, Kelsch, Johann, Keser, Mert, Königshof, Hendrik, Kraft, Erwin, Kreuser, Leonie, Krone, Kevin, Latka, Tobias, Mattern, Denny, Matthes, Stefan, Motzkus, Franz, Munir, Mohsin, Nekolla, Moritz, Paschke, Adrian, von Pilchau, Stefan Pilar, Pintz, Maximilian Alexander, Qiu, Tianming, Qureishi, Faraz, Rizvi, Syed Tahseen Raza, Reichardt, Jörg, von Rueden, Laura, Sagel, Alexander, Sasdelli, Diogo, Scholl, Tobias, Schunk, Gerhard, Schwalbe, Gesina, Shen, Hao, Shoeb, Youssef, Stapelbroek, Hendrik, Stehr, Vera, Srinivas, Gurucharan, Tran, Anh Tuan, Vivekanandan, Abhishek, Wang, Ya, Wasserrab, Florian, Werner, Tino, Wirth, Christian, Zwicklbauer, Stefan
The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real-life applications, these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcoming the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories of knowledge integration, extraction, and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.
Revealing Similar Semantics Inside CNNs: An Interpretable Concept-based Comparison of Feature Spaces
Mikriukov, Georgii, Schwalbe, Gesina, Hellert, Christian, Bade, Korinna
Safety-critical applications require transparency in artificial intelligence (AI) components, but the convolutional neural networks (CNNs) widely used for perception tasks lack inherent interpretability. Hence, insights into what CNNs have learned are primarily based on performance metrics, because these allow, e.g., for cross-architecture CNN comparison. However, such metrics neglect how knowledge is stored inside the networks. To tackle this yet unsolved problem, our work proposes two methods for estimating the layer-wise similarity between semantic information inside CNN latent spaces. These allow insights into both the flow and likeness of semantic information within CNN layers, and into the degree of their similarity between different network architectures. As a basis, we use two renowned explainable artificial intelligence (XAI) techniques to obtain concept activation vectors, i.e., global vector representations of semantic concepts in the latent space. These are compared with respect to their activation on test inputs. When applied to three diverse object detectors and two datasets, our methods reveal that (1) similar semantic concepts are learned regardless of the CNN architecture, and (2) similar concepts emerge at similar relative layer depth, independent of the total number of layers. Finally, our approach poses a promising step towards semantic model comparability and comprehension of how different CNNs process semantic information.
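The comparison idea can be sketched as follows, under the assumption that concept activation vectors (CAVs) and layer activations for a shared test set are already available: each CAV is reduced to the concept scores it yields on the test inputs, and two CAVs, possibly from different layers or architectures, are compared via the correlation of these scores. Names and shapes are illustrative placeholders.

```python
import numpy as np

def concept_scores(features: np.ndarray, cav: np.ndarray) -> np.ndarray:
    """Project layer activations onto a concept direction.

    features : (n, d) layer activations for n shared test inputs
    cav      : (d,) concept activation vector in that layer
    """
    return features @ cav

def cav_similarity(scores_a: np.ndarray, scores_b: np.ndarray) -> float:
    """Pearson correlation of concept scores over the same test inputs;
    dimensionality-agnostic, so layers of different width remain comparable."""
    return float(np.corrcoef(scores_a, scores_b)[0, 1])
```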
Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability
Mikriukov, Georgii, Schwalbe, Gesina, Hellert, Christian, Bade, Korinna
Analysis of how semantic concepts are represented within Convolutional Neural Networks (CNNs) is a widely used approach in Explainable Artificial Intelligence (XAI) for interpreting CNNs. A motivation is the need for transparency in safety-critical AI-based systems, as mandated in various domains like automated driving. However, to use the concept representations for safety-relevant purposes, like inspection or error retrieval, these must be of high quality and, in particular, stable. This paper focuses on two stability goals when working with concept representations in computer vision CNNs: stability of concept retrieval and of concept attribution. The guiding use case is a post-hoc explainability framework for object detection (OD) CNNs, towards which existing concept analysis (CA) methods are successfully adapted. To address concept retrieval stability, we propose a novel metric that considers both concept separation and consistency, and is agnostic to layer and concept representation dimensionality. We then investigate the impact of concept abstraction level, number of concept training samples, CNN size, and concept representation dimensionality on stability. For concept attribution stability, we explore the effect of gradient instability on gradient-based explainability methods. The results on various CNNs for classification and object detection yield the main findings that (1) the stability of concept retrieval can be enhanced through dimensionality reduction via data aggregation, and (2) in shallow layers, where gradient instability is more pronounced, gradient smoothing techniques are advised. Finally, our approach provides valuable insights into selecting the appropriate layer and concept representation dimensionality, paving the way towards CA in safety-critical XAI applications.
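For illustration only (this is not the metric proposed in the paper), one simple way to combine consistency and separation of concept vectors obtained from repeated runs is via mean cosine similarities, which is independent of layer and representation dimensionality:

```python
import numpy as np

def mean_cosine(vectors_a: np.ndarray, vectors_b: np.ndarray) -> float:
    """Mean pairwise cosine similarity between two sets of (n, d) unit-normalized vectors."""
    return float((vectors_a @ vectors_b.T).mean())

def retrieval_stability(same_concept: np.ndarray, other_concept: np.ndarray) -> float:
    """Illustrative score: high when repeated runs of one concept agree (consistency)
    and stay distinct from another concept's vectors (separation)."""
    consistency = mean_cosine(same_concept, same_concept)
    separation = 1.0 - mean_cosine(same_concept, other_concept)
    return 0.5 * (consistency + separation)
```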
XAI Method Properties: A (Meta-)study
Schwalbe, Gesina, Finzel, Bettina
By now, a wide variety of terminologies, motivations, approaches, and evaluation criteria has been developed within the scope of research on explainable artificial intelligence (XAI). Many taxonomies can be found in the literature, each with a different focus, but also showing many points of overlap. In this paper, we summarize the most cited and most current taxonomies in a meta-analysis in order to highlight the essential aspects of the state of the art in XAI. We also present and incorporate terminologies as well as concepts from a large number of survey articles on the topic. Last but not least, we illustrate concepts from the higher-level taxonomy with more than 50 example methods, which we categorize accordingly, thus providing a wide-ranging overview of aspects of XAI and paving the way for use-case-appropriate as well as context-specific subsequent research.
Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety
Houben, Sebastian, Abrecht, Stephanie, Akila, Maram, Bär, Andreas, Brockherde, Felix, Feifel, Patrick, Fingscheidt, Tim, Gannamaneni, Sujan Sai, Ghobadi, Seyed Eghbal, Hammam, Ahmed, Haselhoff, Anselm, Hauser, Felix, Heinzemann, Christian, Hoffmann, Marco, Kapoor, Nikhil, Kappel, Falk, Klingner, Marvin, Kronenberger, Jan, Küppers, Fabian, Löhdefink, Jonas, Mlynarski, Michael, Mock, Michael, Mualla, Firas, Pavlitskaya, Svetlana, Poretschkin, Maximilian, Pohl, Alexander, Ravi-Kumar, Varun, Rosenzweig, Julia, Rottmann, Matthias, Rüping, Stefan, Sämann, Timo, Schneider, Jan David, Schulz, Elena, Schwalbe, Gesina, Sicking, Joachim, Srivastava, Toshika, Varghese, Serin, Weber, Michael, Wirkert, Sebastian, Wirtz, Tim, Woehrle, Matthias
The use of deep neural networks (DNNs) in safety-critical applications like mobile health and autonomous driving is challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization and insufficient interpretability to problems with malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from safety concerns. In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged. This work provides a structured and broad overview of them. We first identify categories of insufficiencies and then describe research activities aiming at their detection, quantification, or mitigation. Our paper addresses both machine learning experts and safety engineers: The former might profit from the broad range of machine learning topics covered and the discussions on limitations of recent methods. The latter might gain insights into the specifics of modern ML methods. We moreover hope that our contribution fuels discussions on desiderata for ML systems and strategies on how to propel existing approaches accordingly.