 Bayesian Inference


Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference

arXiv.org Artificial Intelligence

Understanding the gradient variance of black-box variational inference (BBVI) is a crucial step for establishing its convergence and developing algorithmic improvements. However, existing studies have yet to show that the gradient variance of BBVI satisfies the conditions used to study the convergence of stochastic gradient descent (SGD), the workhorse of BBVI. In this work, we show that BBVI satisfies a matching bound corresponding to the condition used in the SGD literature when applied to smooth and quadratically-growing log-likelihoods. Our results generalize to nonlinear covariance parameterizations widely used in the practice of BBVI. Despite the advances of BBVI, little is known about its theoretical properties. Even when restricted to the location-scale family (Definition 2), it is unknown whether BBVI is guaranteed to converge without having to modify the algorithms used in practice, for example, by enforcing bounded domains, bounded support, bounded gradients, and such. This theoretical insight is necessary since BBVI methods are known to be less robust (Yao et al., 2018; Dhaka et al., 2020; Welandawe et al., 2022; Dhaka et al., 2021; Domke, 2020) compared to other inference methods such as Markov chain Monte Carlo. Although progress has been made to formalize the theory of BBVI with some generality, the gap between our understanding of BBVI and the ...
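As a toy illustration of what "gradient variance" means for a location-scale family, the following is a minimal sketch and not the paper's setting. It assumes a one-dimensional Gaussian variational family q = Normal(m, s^2) and a standard normal target with log p(z) = -z^2/2 up to a constant. The function name `reparam_grad_samples` is hypothetical. For this toy target the per-sample gradient with respect to m is -(m + s*eps), so its variance is s^2 and grows with the scale parameter.

```python
import numpy as np

def reparam_grad_samples(m, s, n_samples, rng):
    """Per-sample reparameterization gradients of E_q[log p(z)] w.r.t. the location m.

    Assumptions (illustrative only): q = Normal(m, s^2), log p(z) = -z^2 / 2 + const.
    The entropy of q does not depend on m, so it contributes nothing to this gradient.
    """
    eps = rng.standard_normal(n_samples)
    z = m + s * eps   # location-scale reparameterization
    return -z         # d/dm log p(m + s * eps) = -(m + s * eps)

rng = np.random.default_rng(0)
for s in (0.5, 1.0, 2.0):
    g = reparam_grad_samples(m=1.0, s=s, n_samples=200_000, rng=rng)
    # The empirical variance should be close to s^2 for this toy target.
    print(f"scale={s}: mean={g.mean():.3f}, variance={g.var():.3f}")
```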


An information field theory approach to Bayesian state and parameter estimation in dynamical systems

arXiv.org Artificial Intelligence

Dynamical system state estimation and parameter calibration problems are ubiquitous across science and engineering. Bayesian approaches to the problem are the gold standard as they allow for the quantification of uncertainties and enable the seamless fusion of different experimental modalities. When the dynamics are discrete and stochastic, one may employ powerful techniques such as Kalman, particle, or variational filters. Practitioners commonly apply these methods to continuous-time, deterministic dynamical systems after discretizing the dynamics and introducing fictitious transition probabilities. However, approaches based on time-discretization suffer from the curse of dimensionality since the number of random variables grows linearly with the number of time-steps. Furthermore, the introduction of fictitious transition probabilities is an unsatisfactory solution because it increases the number of model parameters and may lead to inference bias. To address these drawbacks, the objective of this paper is to develop a scalable Bayesian approach to state and parameter estimation suitable for continuous-time, deterministic dynamical systems. Our methodology builds upon information field theory. Specifically, we construct a physics-informed prior probability measure on the function space of system responses so that functions that satisfy the physics are more likely. This prior allows us to quantify model form errors. We connect the system's response to observations through a probabilistic model of the measurement process. The joint posterior over the system responses and all parameters is given by Bayes' rule. To approximate the intractable posterior, we develop a stochastic variational inference algorithm. In summary, the developed methodology offers a powerful framework for Bayesian estimation in dynamical systems.


Local Message Passing on Frustrated Systems

arXiv.org Artificial Intelligence

Message passing on factor graphs is a powerful framework for probabilistic inference, which finds important applications in various scientific domains. The most widespread message passing scheme is the sum-product algorithm (SPA), which gives exact results on trees but often fails on graphs with many small cycles. We search for an alternative message passing algorithm that works particularly well on such cyclic graphs. Therefore, we challenge the extrinsic principle of the SPA, which loses its objective on graphs with cycles. We further replace the local SPA message update rule at the factor nodes of the underlying graph with a generic mapping, which is optimized in a data-driven fashion. These modifications lead to a considerable improvement in performance while preserving the simplicity of the SPA. We evaluate our method for two classes of cyclic graphs: the 2x2 fully connected Ising grid and factor graphs for symbol detection on linear communication channels with inter-symbol interference. To enable the method for large graphs as they occur in practical applications, we develop a novel loss function that is inspired by the Bethe approximation from statistical physics and allows for training in an unsupervised fashion.
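The claim that the SPA is exact on trees can be checked on a small chain. The sketch below covers only the baseline SPA, not the learned message update the abstract proposes. It assumes a pairwise model with positive unary and pairwise potentials, and the function names `chain_marginals_spa` and `chain_marginals_brute` are hypothetical.

```python
import itertools
import numpy as np

def chain_marginals_spa(unary, pairwise):
    """Sum-product marginals on a chain x_0 - x_1 - ... - x_{n-1}.

    unary: list of n arrays of shape (k,).
    pairwise: list of n-1 arrays of shape (k, k); pairwise[i][a, b] couples
    x_i = a with x_{i+1} = b.
    """
    n = len(unary)
    fwd = [np.ones(len(unary[0])) for _ in range(n)]  # messages arriving from the left
    bwd = [np.ones(len(unary[0])) for _ in range(n)]  # messages arriving from the right
    for i in range(1, n):
        fwd[i] = pairwise[i - 1].T @ (unary[i - 1] * fwd[i - 1])
    for i in range(n - 2, -1, -1):
        bwd[i] = pairwise[i] @ (unary[i + 1] * bwd[i + 1])
    beliefs = [unary[i] * fwd[i] * bwd[i] for i in range(n)]
    return np.array([b / b.sum() for b in beliefs])

def chain_marginals_brute(unary, pairwise):
    """Reference marginals by enumerating all k^n joint configurations."""
    n, k = len(unary), len(unary[0])
    marg = np.zeros((n, k))
    for x in itertools.product(range(k), repeat=n):
        w = np.prod([unary[i][x[i]] for i in range(n)])
        w *= np.prod([pairwise[i][x[i], x[i + 1]] for i in range(n - 1)])
        for i in range(n):
            marg[i, x[i]] += w
    return marg / marg.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
unary = [rng.uniform(0.5, 2.0, size=2) for _ in range(4)]
pairwise = [rng.uniform(0.5, 2.0, size=(2, 2)) for _ in range(3)]
spa = chain_marginals_spa(unary, pairwise)
brute = chain_marginals_brute(unary, pairwise)
print(np.max(np.abs(spa - brute)))  # agreement up to floating-point error on a chain
```

On a graph with cycles the same local update no longer has this guarantee, which is the regime the abstract targets.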


Robust Bayesian Inference for Measurement Error Models

arXiv.org Machine Learning

Measurement error occurs when a set of covariates influencing a response variable are corrupted by noise. This can lead to misleading inference outcomes, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions such as knowledge of the error distribution or its variance and availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning framework which is robust to mismeasured covariates, does not require the preceding assumptions, and is able to incorporate prior beliefs about the true error distribution. Our approach gives rise to two methods that are robust to measurement error via different loss functions: one based on the Total Least Squares objective and the other based on Maximum Mean Discrepancy (MMD). The latter allows for generalisation to non-Gaussian distributed errors and non-linear covariate-response relationships. We provide bounds on the generalisation error using the MMD-loss and showcase the effectiveness of the proposed framework versus prior art in real-world mental health and dietary datasets that contain significant measurement errors.
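For readers unfamiliar with MMD, the following is a minimal sketch of the standard biased (V-statistic) estimate of squared MMD with a Gaussian kernel. It shows only the discrepancy itself, not the Bayesian nonparametric learning procedure described in the abstract. The kernel choice, the bandwidth, and the function names `gaussian_kernel` and `mmd2_biased` are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of MMD^2 between samples x and y (diagonal terms included)."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean()
            - 2.0 * gaussian_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 2))
y_same = rng.standard_normal((500, 2))   # same distribution as x
y_shifted = x + 3.0                      # mean shifted by 3 in each coordinate
print(mmd2_biased(x, y_same), mmd2_biased(x, y_shifted))
```

The estimate is near zero when both samples come from the same distribution and clearly positive under a mean shift, which is what makes it usable as a loss.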


Provable benefits of score matching

arXiv.org Artificial Intelligence

Score matching is an alternative to maximum likelihood (ML) for estimating a probability distribution parametrized up to a constant of proportionality. By fitting the "score" of the distribution, it sidesteps the need to compute this constant of proportionality (which is often intractable). While score matching and variants thereof are popular in practice, the benefits and tradeoffs relative to maximum likelihood -- both computational and statistical -- are not well understood. In this work, we give the first example of a natural exponential family of distributions such that the score matching loss is computationally efficient to optimize, and has a comparable statistical efficiency to ML, while the ML loss is intractable to optimize using a gradient-based method. The family consists of exponentials of polynomials of fixed degree, and our result can be viewed as a continuous analogue of recent developments in the discrete setting. Precisely, we show: (1) Designing a zeroth-order or first-order oracle for optimizing the maximum likelihood loss is NP-hard. (2) Maximum likelihood has a statistical efficiency polynomial in the ambient dimension and the radius of the parameters of the family. (3) Minimizing the score matching loss is both computationally and statistically efficient, with complexity polynomial in the ambient dimension.
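As background, and not quoted from the paper, the standard score matching objective in Hyvärinen's integrated-by-parts form shows why the normalizing constant drops out. The notation is an assumption made for this sketch: the model is $p_\theta(x) = \exp(f_\theta(x)) / Z(\theta)$ and $p$ is the data distribution.

```latex
\nabla_x \log p_\theta(x) = \nabla_x f_\theta(x)
\quad \text{(since } Z(\theta) \text{ does not depend on } x\text{)},
\qquad
J(\theta) = \mathbb{E}_{x \sim p}\left[
  \operatorname{tr}\left(\nabla_x^2 f_\theta(x)\right)
  + \tfrac{1}{2}\left\lVert \nabla_x f_\theta(x) \right\rVert_2^2
\right].
```

For an exponential family with $f_\theta(x) = \langle \theta, T(x) \rangle$, the first term is linear in $\theta$ and the second is quadratic, so $J$ is a convex quadratic in $\theta$.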


Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Executing actions in a correlated manner is a common strategy for human coordination that often leads to better cooperation, which is also potentially beneficial for cooperative multi-agent reinforcement learning (MARL). However, the recent success of MARL relies heavily on the convenient paradigm of purely decentralized execution, where there is no action correlation among agents for scalability considerations. In this work, we introduce a Bayesian network to inaugurate correlations between agents' action selections in their joint policy. Theoretically, we establish a theoretical justification for why action dependencies are beneficial by deriving the multi-agent policy gradient formula under such a Bayesian network joint policy and proving its global convergence to Nash equilibria under tabular softmax policy parameterization in cooperative Markov games. Cooperative multi-agent reinforcement learning (MARL) methods equip a group of autonomous agents with the capability of planning and learning to maximize their joint utility, or reward signals in the reinforcement learning (RL) literature, which provides a promising paradigm for a range of real-world applications, such as traffic control (Chu et al., 2019), coordination of multi-robot systems (Corke et al., 2005), and power grid management (Callaway & Hiskens, 2010). As a key distinction from the single-agent setting, multi-agent joint action spaces grow exponentially with the number of agents, which imposes significant scalability issues. As a convenient and commonly adopted solution, most existing cooperative MARL methods only consider product policies, i.e., each agent selects its local action independently given the state or its observations. Restricting to product policies, however, does come at a cost for cooperative tasks: consider an example where cars wait at a crossroads, it would be hard for the cars to coordinate their movements without knowing others' intentions, potentially ...


Poisoning Network Flow Classifiers

arXiv.org Artificial Intelligence

As machine learning (ML) classifiers increasingly oversee the automated monitoring of network traffic, studying their resilience against adversarial attacks becomes critical. This paper focuses on poisoning attacks, specifically backdoor attacks, against network traffic flow classifiers. We investigate the challenging scenario of clean-label poisoning where the adversary's capabilities are constrained to tampering only with the training data - without the ability to arbitrarily modify the training labels or any other component of the training process. We describe a trigger crafting strategy that leverages model interpretability techniques to generate trigger patterns that are effective even at very low poisoning rates. Finally, we design novel strategies to generate stealthy triggers, including an approach based on generative Bayesian network models, with the goal of minimizing the conspicuousness of the trigger, and thus making detection of an ongoing poisoning campaign more challenging. Our findings provide significant insights into the feasibility of poisoning attacks on network traffic classifiers used in multiple scenarios, including detecting malicious communication and application classification.


Accelerating science with human-aware artificial intelligence

arXiv.org Artificial Intelligence

Artificial intelligence (AI) models trained on published scientific findings have been used to invent valuable materials and targeted therapies, but they typically ignore the human scientists who continually alter the landscape of discovery. Here we show that incorporating the distribution of human expertise by training unsupervised models on simulated inferences cognitively accessible to experts dramatically improves (up to 400%) AI prediction of future discoveries beyond those focused on research content alone, especially when relevant literature is sparse. These models succeed by predicting human predictions and the scientists who will make them. By tuning human-aware AI to avoid the crowd, we can generate scientifically promising "alien" hypotheses unlikely to be imagined or pursued without intervention until the distant future, which hold promise to punctuate scientific advance beyond questions currently pursued. Accelerating human discovery or probing its blind spots, human-aware AI enables us to move toward and beyond the contemporary scientific frontier.


Theoretical Behavior of XAI Methods in the Presence of Suppressor Variables

arXiv.org Artificial Intelligence

In recent years, the community of 'explainable artificial intelligence' (XAI) has created a vast body of methods to bridge a perceived gap between model 'complexity' and 'interpretability'. However, a concrete problem to be solved by XAI methods has not yet been formally stated. As a result, XAI methods are lacking theoretical and empirical evidence for the 'correctness' of their explanations, limiting their potential use for quality-control and transparency purposes. At the same time, Haufe et al. (2014) showed, using simple toy examples, that even standard interpretations of linear models can be highly misleading. Specifically, high importance may be attributed to so-called suppressor variables lacking any statistical relation to the prediction target. This behavior has been confirmed empirically for a large array of XAI methods in Wilming et al. (2022). Here, we go one step further by deriving analytical expressions for the behavior of a variety of popular XAI methods on a simple two-dimensional binary classification problem involving Gaussian class-conditional distributions. We show that the majority of the studied approaches will attribute non-zero importance to a non-class-related suppressor feature in the presence of correlated noise. This poses important limitations on the interpretations and conclusions that the outputs of these XAI methods can afford.
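The suppressor effect can be reproduced numerically with a toy linear example. This is a minimal sketch in the spirit of the toy examples the abstract mentions, not the paper's two-dimensional classification setup. It assumes a regression target z, a distractor d independent of z, and features x1 = z + d and x2 = d. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
z = rng.standard_normal(n)   # signal: the prediction target
d = rng.standard_normal(n)   # distractor noise, independent of z
x1 = z + d                   # informative feature, corrupted by the distractor
x2 = d                       # suppressor: unrelated to z when considered alone
X = np.column_stack([x1, x2])

# The least-squares fit recovers z = x1 - x2, so x2 receives a large weight.
w, *_ = np.linalg.lstsq(X, z, rcond=None)
corr_x2_target = np.corrcoef(x2, z)[0, 1]
print("weights:", w)
print("corr(x2, target):", corr_x2_target)
```

The model needs x2 only to cancel the noise in x1. Reading the weight on x2 as "importance" would therefore attribute relevance to a feature with no statistical relation to the target.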


Priors for symbolic regression

arXiv.org Artificial Intelligence

When choosing between competing symbolic models for a data set, a human will naturally prefer the "simpler" expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on an $n$-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared through the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection. We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology.