Goto

Collaborating Authors

 Pernkopf, Franz


Adaptive Variational Inference in Probabilistic Graphical Models: Beyond Bethe, Tree-Reweighted, and Convex Free Energies

arXiv.org Machine Learning

Variational inference in probabilistic graphical models aims to approximate fundamental quantities such as marginal distributions and the partition function. Popular approaches are the Bethe approximation, tree-reweighted, and other types of convex free energies. These approximations are efficient but can fail if the model is complex and highly interactive. In this work, we analyze two classes of approximations that include the above methods as special cases: first, if the model parameters are changed; and second, if the entropy approximation is changed. We discuss benefits and drawbacks of either approach, and deduce from this analysis how a free energy approximation should ideally be constructed. Based on our observations, we propose approximations that automatically adapt to a given model and demonstrate their effectiveness for a range of difficult problems.


Function Space Diversity for Uncertainty Prediction via Repulsive Last-Layer Ensembles

arXiv.org Artificial Intelligence

Bayesian inference in function space has gained attention due to its robustness against overparameterization in neural networks. However, approximating the infinite-dimensional function space introduces several challenges. In this work, we discuss function space inference via particle optimization and present practical modifications that improve uncertainty estimation and, most importantly, make it applicable for large and pretrained networks. First, we demonstrate that the input samples, where particle predictions are enforced to be diverse, are detrimental to the model performance. While diversity on training data itself can lead to underfitting, the use of label-destroying data augmentation, or unlabeled out-of-distribution data can improve prediction diversity and uncertainty estimates. Furthermore, we take advantage of the function space formulation, which imposes no restrictions on network parameterization other than sufficient flexibility. Instead of using full deep ensembles to represent particles, we propose a single multi-headed network that introduces a minimal increase in parameters and computation. This allows seamless integration to pretrained networks, where this repulsive last-layer ensemble can be used for uncertainty aware fine-tuning at minimal additional cost. We achieve competitive results in disentangling aleatoric and epistemic uncertainty for active learning, detecting out-of-domain data, and providing calibrated uncertainty estimates under distribution shifts with minimal compute and memory.


Robustness of Explainable Artificial Intelligence in Industrial Process Modelling

arXiv.org Artificial Intelligence

In the last years, there has been an effort to provide eXplainable Artificial Intelligence (XAI) aims at explanations to the ML model predictions using XAI providing understandable explanations of black (Lundberg & Lee, 2017; Ribeiro et al., 2018; Alvarez-Melis box models. In this paper, we evaluate current & Jaakkola, 2018; Shrikumar et al., 2017). XAI methods by scoring them based on ground truth simulations and sensitivity analysis. To Most of these works, even if they focus on the robustness this end, we used an Electric Arc Furnace (EAF) and trustworthiness of the XAI method, have the shortcoming model to better understand the limits and robustness that they can only be evaluated through surrogate characteristics of XAI methods such as SHapley measures (Crabbé & van der Schaar, 2023), and the ground Additive exPlanations (SHAP), Local Interpretable truth sensitivity of the evaluated datasets cannot be properly Model-agnostic Explanations (LIME), as calculated (Alvarez-Melis & Jaakkola, 2018). Some well as Averaged Local Effects (ALE) or Smooth existing approaches rather use data augmentation (Sun et al., Gradients (SG) in a highly topical setting. These 2020) or create measures estimating the importance of the XAI methods were applied to various types of features (Yeh et al., 2019); further related work is provided black-box models and then scored based on their in Section A.3. None of these systems, to the best of our correctness compared to the ground-truth sensitivity knowledge, consider the ground truth sensitivity, or gradient, of the data-generating processes using a novel of the data-generating process that created the dataset.


On the Convexity and Reliability of the Bethe Free Energy Approximation

arXiv.org Machine Learning

The Bethe free energy approximation provides an effective way for relaxing NP-hard problems of probabilistic inference. However, its accuracy depends on the model parameters and particularly degrades if a phase transition in the model occurs. In this work, we analyze when the Bethe approximation is reliable and how this can be verified. We argue and show by experiment that it is mostly accurate if it is convex on a submanifold of its domain, the 'Bethe box'. For verifying its convexity, we derive two sufficient conditions that are based on the definiteness properties of the Bethe Hessian matrix: the first uses the concept of diagonal dominance, and the second decomposes the Bethe Hessian matrix into a sum of sparse matrices and characterizes the definiteness properties of the individual matrices in that sum. These theoretical results provide a simple way to estimate the critical phase transition temperature of a model. As a practical contribution we propose $\texttt{BETHE-MIN}$, a projected quasi-Newton method to efficiently find a minimum of the Bethe free energy.


Rao-Blackwellising Bayesian Causal Inference

arXiv.org Machine Learning

Bayesian causal inference, i.e., inferring a posterior over causal models for the use in downstream causal reasoning tasks, poses a hard computational inference problem that is little explored in literature. In this work, we combine techniques from order-based MCMC structure learning with recent advances in gradient-based graph learning into an effective Bayesian causal inference framework. Specifically, we decompose the problem of inferring the causal structure into (i) inferring a topological order over variables and (ii) inferring the parent sets for each variable. When limiting the number of parents per variable, we can exactly marginalise over the parent sets in polynomial time. We further use Gaussian processes to model the unknown causal mechanisms, which also allows their exact marginalisation. This introduces a Rao-Blackwellization scheme, where all components are eliminated from the model, except for the causal order, for which we learn a distribution via gradient-based optimisation. The combination of Rao-Blackwellization with our sequential inference procedure for causal orders yields state-of-the-art on linear and non-linear additive noise benchmarks with scale-free and Erdos-Renyi graph structures.


Angle-Equivariant Convolutional Neural Networks for Interference Mitigation in Automotive Radar

arXiv.org Artificial Intelligence

In automotive applications, frequency modulated continuous wave (FMCW) radar is an established technology to determine the distance, velocity and angle of objects in the vicinity of the vehicle. The quality of predictions might be seriously impaired if mutual interference between radar sensors occurs. Previous work processes data from the entire receiver array in parallel to increase interference mitigation quality using neural networks (NNs). However, these architectures do not generalize well across different angles of arrival (AoAs) of interferences and objects. In this paper we introduce fully convolutional neural network (CNN) with rank-three convolutions which is able to transfer learned patterns between different AoAs. Our proposed architecture outperforms previous work while having higher robustness and a lower number of trainable parameters. We evaluate our network on a diverse data set and demonstrate its angle equivariance.


End-to-End Training of Neural Networks for Automotive Radar Interference Mitigation

arXiv.org Artificial Intelligence

In this paper we propose a new method for training neural networks (NNs) for frequency modulated continuous wave (FMCW) radar mutual interference mitigation. Instead of training NNs to regress from interfered to clean radar signals as in previous work, we train NNs directly on object detection maps. We do so by performing a continuous relaxation of the cell-averaging constant false alarm rate (CA-CFAR) peak detector, which is a well-established algorithm for object detection using radar. With this new training objective we are able to increase object detection performance by a large margin. Furthermore, we introduce separable convolution kernels to strongly reduce the number of parameters and computational complexity of convolutional NN architectures for radar applications. We validate our contributions with experiments on real-world measurement data and compare them against signal processing interference mitigation methods.


Explainable Machine Learning for Breakdown Prediction in High Gradient RF Cavities

arXiv.org Artificial Intelligence

The occurrence of vacuum arcs or radio frequency (rf) breakdowns is one of the most prevalent factors limiting the high-gradient performance of normal conducting rf cavities in particle accelerators. In this paper, we search for the existence of previously unrecognized features related to the incidence of rf breakdowns by applying a machine learning strategy to high-gradient cavity data from CERN's test stand for the Compact Linear Collider (CLIC). By interpreting the parameters of the learned models with explainable artificial intelligence (AI), we reverse-engineer physical properties for deriving fast, reliable, and simple rule-based models. Based on 6 months of historical data and dedicated experiments, our models show fractions of data with a high influence on the occurrence of breakdowns. Specifically, it is shown that the field emitted current following an initial breakdown is closely related to the probability of another breakdown occurring shortly thereafter. Results also indicate that the cavity pressure should be monitored with increased temporal resolution in future experiments, to further explore the vacuum activity associated with breakdowns.


End-to-end Keyword Spotting using Neural Architecture Search and Quantization

arXiv.org Machine Learning

This paper introduces neural architecture search (NAS) for the automatic discovery of end-to-end keyword spotting (KWS) models in limited resource environments. We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) operating on raw audio waveforms. After a suitable KWS model is found with NAS, we conduct quantization of weights and activations to reduce the memory footprint. We conduct extensive experiments on the Google speech commands dataset. In particular, we compare our end-to-end approach to mel-frequency cepstral coefficient (MFCC) based systems. For quantization, we compare fixed bit-width quantization and trained bit-width quantization. Using NAS only, we were able to obtain a highly efficient model with an accuracy of 95.55% using 75.7k parameters and 13.6M operations. Using trained bit-width quantization, the same model achieves a test accuracy of 93.76% while using on average only 2.91 bits per activation and 2.51 bits per weight.


On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks

arXiv.org Artificial Intelligence

We present two methods to reduce the complexity of Bayesian network (BN) classifiers. First, we introduce quantization-aware training using the straight-through gradient estimator to quantize the parameters of BNs to few bits. Second, we extend a recently proposed differentiable tree-augmented naive Bayes (TAN) structure learning approach by also considering the model size. Both methods are motivated by recent developments in the deep learning community, and they provide effective means to trade off between model size and prediction accuracy, which is demonstrated in extensive experiments. Furthermore, we contrast quantized BN classifiers with quantized deep neural networks (DNNs) for small-scale scenarios which have hardly been investigated in the literature. We show Pareto optimal models with respect to model size, number of operations, and test error and find that both model classes are viable options.