
Efficient computation


FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks

Schmalwasser, Laines, Penzel, Niklas, Denzler, Joachim, Niebling, Julia

arXiv.org Artificial Intelligence

Concepts such as objects, patterns, and shapes are how humans understand the world. Building on this intuition, concept-based explainability methods aim to study representations learned by deep neural networks in relation to human-understandable concepts. Here, Concept Activation Vectors (CAVs) are an important tool and can identify whether a model learned a concept or not. However, the computational cost and time requirements of existing CAV computation pose a significant challenge, particularly in large-scale, high-dimensional architectures. To address this limitation, we introduce FastCAV, a novel approach that accelerates the extraction of CAVs by up to 63.6x (on average 46.4x). We provide a theoretical foundation for our approach and give concrete assumptions under which it is equivalent to established SVM-based methods. Our empirical results demonstrate that CAVs calculated with FastCAV maintain similar performance while being more efficient and stable. In downstream applications, i.e., concept-based explanation methods, we show that FastCAV can act as a replacement leading to equivalent insights. Hence, our approach enables previously infeasible investigations of deep models, which we demonstrate by tracking the evolution of concepts during model training.
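
The speed-up comes from avoiding a per-concept SVM fit. As a minimal, hedged sketch (the paper's exact estimator and equivalence conditions are not reproduced here), a fast CAV can be taken as the normalized difference between the mean activations of concept examples and of random examples:

```python
import numpy as np

def fast_cav(concept_acts, random_acts):
    """Mean-difference CAV sketch: the unit vector pointing from the mean
    activation of random examples to the mean activation of concept
    examples. Illustrative only; see the paper for the exact estimator
    and the assumptions under which it matches the SVM-based CAV."""
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
# Toy activations: concept examples are shifted along the first axis.
concept = rng.normal(0.0, 1.0, (100, 8)) + np.array([3.0] + [0.0] * 7)
random_ = rng.normal(0.0, 1.0, (100, 8))
cav = fast_cav(concept, random_)
print(cav[0])  # close to 1: the CAV recovers the injected concept direction
```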



Neural Information Processing Systems

It is assumed that all the training outputs are observed at all inputs, which leads to a Kronecker product covariance. Noisy observations are modeled via a structured process, and this is the main contribution of the paper. While previous work on multi-task GP approaches with Kronecker covariances has considered i.i.d. noise in order to carry out efficient computations, this paper shows that it is possible to consider a noise process with Kronecker structure while maintaining efficient computations. In other words, as in the i.i.d. noise case, one never has to compute a Kronecker product, and hence computations are O(N^3 + T^3) instead of O(N^3 T^3). This is achieved by whitening the noise process and projecting the (noiseless) covariance of the system into the eigen-basis of the noise covariance (scaled by the eigenvalues). Their experiments show that the proposed structured-noise multi-task GP approach outperforms the baseline i.i.d.-noise multi-task GP method and independent GPs on synthetic data and real applications.
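
The efficiency gain rests on a standard linear-algebra fact: the spectrum of a Kronecker product can be assembled from the spectra of its factors, so one pays O(N^3 + T^3) rather than O((NT)^3). A small numerical check (illustrative only, not the paper's full inference pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 3
A = rng.normal(size=(N, N)); A = A @ A.T   # input covariance (toy PSD matrix)
B = rng.normal(size=(T, T)); B = B @ B.T   # task/noise covariance (toy PSD matrix)

# Eigenvalues of kron(A, B) are all pairwise products of the factor
# eigenvalues, so only the small O(N^3) and O(T^3) decompositions are needed.
w_kron = np.sort(np.outer(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)).ravel())

# Direct O((NT)^3) decomposition, for verification only.
w_direct = np.sort(np.linalg.eigvalsh(np.kron(A, B)))
print(np.allclose(w_kron, w_direct))  # True
```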


Efficient computation of predictive probabilities in probit models via expectation propagation

Fasano, Augusto, Anceschi, Niccolò, Franzolini, Beatrice, Rebaudo, Giovanni

arXiv.org Machine Learning

Binary regression models represent a popular model-based approach to binary classification. In the Bayesian framework, the intractability of the posterior distribution poses computational challenges that motivate still-ongoing research. Here, we focus on the computation of predictive probabilities in Bayesian probit models via expectation propagation (EP). Leveraging more general results in recent literature, we show that such predictive probabilities admit a closed-form expression. Improvements over state-of-the-art approaches are shown in a simulation study.
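
For intuition, once the posterior over the probit coefficients is approximated by a Gaussian N(m, V), as EP does, the predictive probability takes the well-known closed form Phi(x·m / sqrt(1 + x·V·x)). A sketch with toy values (the EP moment computations themselves are not shown):

```python
import numpy as np
from math import erf, sqrt

def probit_predictive(x, m, V):
    """P(y = 1 | x) for a probit model whose coefficient posterior is
    (approximated as) Gaussian N(m, V): Phi(x.m / sqrt(1 + x.V.x))."""
    z = (x @ m) / sqrt(1.0 + x @ V @ x)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z

m = np.array([1.0, -0.5])   # posterior mean (toy values)
V = 0.1 * np.eye(2)         # posterior covariance (toy values)
x = np.array([2.0, 1.0])    # new input
p = probit_predictive(x, m, V)
print(p)  # about 0.89
```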


On efficient computation in active inference

Paul, Aswin, Sajid, Noor, Da Costa, Lancelot, Razi, Adeel

arXiv.org Artificial Intelligence

Despite being recognized as neurobiologically plausible, active inference faces difficulties when employed to simulate intelligent behaviour in complex environments due to its computational cost and the difficulty of specifying an appropriate target distribution for the agent. This paper introduces two solutions that work in concert to address these limitations. First, we present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity. Second, inspired by Z-learning from control theory literature, we simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes. Our first approach leverages the dynamic programming algorithm, known for its computational efficiency, to minimize the cost function used in planning through the Bellman-optimality principle. Accordingly, our algorithm recursively assesses the expected free energy of actions in the reverse temporal order. This improves computational efficiency by orders of magnitude and allows precise model learning and planning, even under uncertain conditions. Our method simplifies the planning process and shows meaningful behaviour even when specifying only the agent's final goal state. The proposed solutions make defining a target distribution from a goal state straightforward compared to the more complicated task of defining a temporally informed target distribution. The effectiveness of these methods is tested and demonstrated through simulations in standard grid-world tasks. These advances create new opportunities for various applications.
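
The backward recursion is ordinary finite-horizon dynamic programming. A hedged sketch on a toy chain world, with the expected free energy simplified to a plain expected cost (the paper's quantity includes epistemic terms not modeled here):

```python
import numpy as np

# Toy deterministic chain world: states 0..4, actions move left (-1) or
# right (+1), goal is state 4. Action values are computed in reverse
# temporal order, the backward recursion the paper exploits.
n_states, horizon, goal = 5, 6, 4

def step(s, a):
    return min(max(s + a, 0), n_states - 1)

cost = np.ones(n_states)   # per-step cost 1 everywhere...
cost[goal] = 0.0           # ...except at the goal (stand-in for EFE)

V = cost.copy()                       # value at the final time step
for t in range(horizon - 1):          # sweep backward over time
    Q = np.array([[cost[s] + V[step(s, a)] for a in (-1, 1)]
                  for s in range(n_states)])
    V = Q.min(axis=1)

policy = [(-1, 1)[int(np.argmin([V[step(s, -1)], V[step(s, 1)]]))]
          for s in range(n_states)]
print(policy)  # [1, 1, 1, 1, 1]: every state moves right, toward the goal
```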


Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping

Neural Information Processing Systems

We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space.
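
A minimal sketch of the priority-queue idea, simplified to a known deterministic model (the paper handles learned stochastic models): when a state's value changes, its predecessors are queued with priority proportional to the change, so the most consequential backups are performed first.

```python
import heapq

def prioritized_sweeping(n_states, goal, step, theta=1e-6):
    V = [0.0] * n_states
    preds = [set() for _ in range(n_states)]   # preds[s]: states reaching s
    for s in range(n_states):
        for a in (-1, 1):
            preds[step(s, a)].add(s)

    def backup(s):  # one-step cost of 1 everywhere except the goal
        return 0.0 if s == goal else min(1.0 + V[step(s, a)] for a in (-1, 1))

    pq = [(-1.0, p) for p in preds[goal]]      # seed the sweep at the goal
    heapq.heapify(pq)
    while pq:
        _, s = heapq.heappop(pq)
        new = backup(s)
        change = abs(new - V[s])
        V[s] = new
        if change > theta:                     # big change: sweep predecessors
            for p in preds[s]:
                heapq.heappush(pq, (-change, p))
    return V

step = lambda s, a: min(max(s + a, 0), 5)      # a 6-state corridor
V = prioritized_sweeping(6, goal=5, step=step)
print(V)  # [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]: steps-to-goal from each state
```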


Efficient Computation of Complex Distance Metrics Using Hierarchical Filtering

Neural Information Processing Systems

By their very nature, memory-based algorithms such as KNN or Parzen windows require a computationally expensive search of a large database of prototypes. In this paper we optimize the searching process for tangent distance (Simard, LeCun and Denker, 1993) to improve speed performance. The closest prototypes are found by recursively searching included subsets of the database using distances of increasing complexity. This is done by using a hierarchy of tangent distances (increasing the number of tangents). At each stage, a confidence level of the classification is computed.
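
The filtering idea in miniature (hedged: tangent distance is replaced by cheap/expensive stand-in metrics): rank the whole database with a cheap distance, keep a short list of survivors, and re-rank only those with the expensive distance.

```python
import numpy as np

def hierarchical_knn(query, prototypes, cheap, expensive, keep=10, k=1):
    """Two-level filtering sketch: the cheap distance prunes the database
    to `keep` candidates; the expensive distance is evaluated only on
    that subset to pick the final k neighbours."""
    d_cheap = np.array([cheap(query, p) for p in prototypes])
    survivors = np.argsort(d_cheap)[:keep]
    d_exp = np.array([expensive(query, prototypes[i]) for i in survivors])
    return survivors[np.argsort(d_exp)[:k]]

rng = np.random.default_rng(2)
protos = rng.normal(size=(200, 16))
q = protos[17] + 0.01 * rng.normal(size=16)        # query near prototype 17

cheap = lambda a, b: abs(a[:4] - b[:4]).sum()      # looks at few features
expensive = lambda a, b: np.linalg.norm(a - b)     # full metric
best = hierarchical_knn(q, protos, cheap, expensive)
print(best)  # [17]
```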


Efficient computation of contrastive explanations

Artelt, André, Hammer, Barbara

arXiv.org Artificial Intelligence

With the increasing deployment of machine learning systems in practice, transparency and explainability have become serious issues. Contrastive explanations are considered useful and intuitive, in particular when explaining decisions to lay people, since they mimic the way humans explain. Yet, so far, comparatively little research has addressed computationally feasible technologies that guarantee uniqueness and optimality of the explanation and allow easy incorporation of additional constraints. Here, we focus on specific types of models rather than black-box technologies. We study the relation between contrastive and counterfactual explanations and propose mathematical formalizations as well as a two-phase algorithm for efficiently computing pertinent positives of many standard machine learning models.


Efficient computation of counterfactual explanations of LVQ models

Artelt, André, Hammer, Barbara

arXiv.org Artificial Intelligence

With the increasing use of machine learning in practice, and because of legal regulations like the EU's GDPR, it becomes indispensable to be able to explain the predictions and behavior of machine learning models. An example of easy-to-understand explanations of AI models are counterfactual explanations. However, for many models it is still an open research problem how to compute counterfactual explanations efficiently. We investigate how to efficiently compute counterfactual explanations of learning vector quantization models. In particular, we propose different types of convex and non-convex programs depending on the learning vector quantization model used.
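
For a nearest-prototype classifier, the geometry gives a simple picture of what such a program computes. The following is a hedged stand-in for the paper's actual convex formulations: project the input just past the bisecting hyperplane between its current nearest prototype and a prototype of the target class, and keep the cheapest such move.

```python
import numpy as np

def lvq_counterfactual(x, protos, labels, target, eps=1e-3):
    """Simplified counterfactual for an LVQ-style classifier: for each
    target-class prototype p, step just past the hyperplane bisecting p
    and the currently nearest prototype; return the closest result.
    Illustrative geometry only, not the paper's optimization programs."""
    nearest = protos[np.argmin([np.linalg.norm(x - p) for p in protos])]
    best = None
    for p, y in zip(protos, labels):
        if y != target:
            continue
        w = p - nearest                          # normal of the bisector
        b = (p @ p - nearest @ nearest) / 2.0    # bisector offset: x.w = b
        shift = (b - x @ w) / (w @ w)
        x_cf = x + (shift + eps) * w             # step just past the boundary
        if best is None or np.linalg.norm(x_cf - x) < np.linalg.norm(best - x):
            best = x_cf
    return best

protos = np.array([[0.0, 0.0], [4.0, 0.0]])
labels = [0, 1]
x = np.array([1.0, 0.0])                         # currently classified as 0
x_cf = lvq_counterfactual(x, protos, labels, target=1)
print(x_cf)  # just past the midpoint x1 = 2, now nearest to class 1
```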


Variational Bayes Inference in Digital Receivers

Tran, Viet Hung

arXiv.org Machine Learning

The digital telecommunications receiver is an important context for inference methodology, the key objective being to minimize the expected loss function in recovering the transmitted information. For that criterion, the optimal decision is the Bayesian minimum-risk estimator. However, the computational load of the Bayesian estimator is often prohibitive and, hence, efficient computational schemes are required. The design of novel schemes, striking new balances between accuracy and computational load, is the primary concern of this thesis. Two popular techniques, one exact and one approximate, will be studied. The exact scheme is a recursive one, namely the generalized distributive law (GDL), whose purpose is to distribute all operators across the conditionally independent (CI) factors of the joint model, so as to reduce the total number of operators required. In a novel theorem derived in this thesis, GDL, if applicable, will be shown to guarantee such a reduction in all cases. An associated lemma also quantifies this reduction. For practical use, two novel algorithms, namely the no-longer-needed (NLN) algorithm and the generalized form of the Markovian Forward-Backward (FB) algorithm, recursively factorize and compute the CI factors of an arbitrary model, respectively. The approximate scheme is an iterative one, namely the Variational Bayes (VB) approximation, whose purpose is to find the independent (i.e. zero-order Markov) model closest to the true joint model in the minimum Kullback-Leibler divergence (KLD) sense. Despite being computationally efficient, this naive mean-field approximation confers only modest performance for highly correlated models. A novel approximation, namely Transformed Variational Bayes (TVB), will be designed in this thesis in order to relax the zero-order constraint in the VB approximation, further reducing the KLD of the optimal approximation.
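
The GDL saving can be shown in miniature: for a joint model that factorizes into independent factors, distributing the sums over the factors replaces |X|·|Y| multiplications with a single one.

```python
import itertools

# Toy independent factors f(x) and g(y) of a joint p(x, y) = f(x) g(y).
f = {0: 0.2, 1: 0.8}
g = {0: 0.5, 1: 0.3, 2: 0.2}

# Naive marginalization: one product per joint configuration (6 products).
naive = sum(f[x] * g[y] for x, y in itertools.product(f, g))

# Distributive law: sum each factor first, then one product.
gdl = sum(f.values()) * sum(g.values())

print(abs(naive - gdl) < 1e-12)  # True: same result, far fewer operations
```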


Approximately Optimal Risk-Averse Routing Policies via Adaptive Discretization

Hoy, Darrell (Northwestern University) | Nikolova, Evdokia (University of Texas at Austin)

AAAI Conferences

Mitigating risk in decision-making has been a long-standing problem. Due to the mathematical challenge of its nonlinear nature, especially in adaptive decision-making problems, finding optimal policies is typically intractable. With a focus on efficient algorithms, we ask how well we can approximate the optimal policies for the difficult case of general utility models of risk. Little is known about efficient algorithms beyond the very special cases of linear (risk-neutral) and exponential utilities, since general utilities are not separable and preclude the use of traditional dynamic programming techniques. In this paper, we consider general utility functions and investigate efficient computation of approximately optimal routing policies, where the goal is to maximize the expected utility of arriving at a destination around a given deadline. We present an adaptive-discretization variant of successive approximation which gives an ε-optimal policy in polynomial time. The main insight is to perform the discretization in utility-level space, which results in a nonuniform discretization of the domain and applies to any monotone utility function.
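
The discretization idea, hedged on a toy monotone utility: placing grid points at evenly spaced utility levels and mapping them back through the (invertible) utility yields a nonuniform grid over the domain, finer where the utility changes fastest.

```python
import numpy as np

# Toy monotone utility of arrival lateness t (illustrative choice only).
u = lambda t: np.exp(-t)
u_inv = lambda v: -np.log(v)

levels = np.linspace(u(5.0), u(0.0), 6)  # uniform grid in utility space
times = u_inv(levels)                    # nonuniform grid over the domain
spacings = np.abs(np.diff(times))
print(spacings)  # widest where u is flat, narrowest where u changes fastest
```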