Goto

Collaborating Authors

 Bayesian Inference


Nesting Particle Filters for Experimental Design in Dynamical Systems

arXiv.org Artificial Intelligence

In this paper, we propose a novel approach to Bayesian Experimental Design (BED) for non-exchangeable data that formulates it as risk-sensitive policy optimization. We develop the Inside-Out SMC^2 algorithm that uses a nested sequential Monte Carlo (SMC) estimator of the expected information gain and embeds it into a particle Markov chain Monte Carlo (pMCMC) framework to perform gradient-based policy optimization. This is in contrast to recent approaches that rely on biased estimators of the expected information gain (EIG) to amortize the cost of experiments by learning a design policy in advance. Numerical validation on a set of dynamical systems showcases the efficacy of our method in comparison to other state-of-the-art strategies.


Predictive Churn with the Set of Good Models

arXiv.org Artificial Intelligence

Machine learning models in modern mass-market applications are often updated over time. One of the foremost challenges faced is that, despite increasing overall performance, these updates may flip specific model predictions in unpredictable ways. In practice, researchers quantify the number of unstable predictions between models pre and post update -- i.e., predictive churn. In this paper, we study this effect through the lens of predictive multiplicity -- i.e., the prevalence of conflicting predictions over the set of near-optimal models (the Rashomon set). We show how traditional measures of predictive multiplicity can be used to examine expected churn over this set of prospective models -- i.e., the set of models that may be used to replace a baseline model in deployment. We present theoretical results on the expected churn between models within the Rashomon set from different perspectives. And we characterize expected churn over model updates via the Rashomon set, pairing our analysis with empirical results on real-world datasets -- showing how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications. Further, we show that our approach is useful even for models enhanced with uncertainty awareness.


Bootstrapping Developmental AIs: From Simple Competences to Intelligent Human-Compatible AIs

arXiv.org Artificial Intelligence

Developmental AI is a bootstrapping approach where embodied AIs start with innate competences and learn by interacting with the world. They develop abilities in small steps along a bio-inspired trajectory. However, developmental AIs have not yet reached the abilities of young children. In contrast, mainstream approaches for creating AIs have led to valuable AI systems and impressive feats. These approaches include deep learning and generative approaches (e.g., large language models) and manually constructed symbolic approaches. Manually constructed AIs are brittle even in circumscribed domains. Generative AIs are helpful on average, but they can make strange mistakes and not notice them. They sometimes lack common sense and social alignment. This position paper lays out prospects, gaps, and challenges for augmenting AI mainstream approaches with developmental AI. The ambition is to create data-rich experientially based foundation models and human-compatible, resilient, and trustworthy AIs. This research aims to produce AIs that learn to communicate, establish common ground, read critically, consider the provenance of information, test hypotheses, and collaborate. A virtuous multidisciplinary research cycle has led to developmental AIs with capabilities for multimodal perception, object recognition, and manipulation. Computational models for hierarchical planning, abstraction discovery, curiosity, and language acquisition exist but need to be adapted to an embodied learning approach. They need to bridge competence gaps involving nonverbal communication, speech, reading, and writing. Aspirationally, developmental AIs would learn, share what they learn, and collaborate to achieve high standards. The approach would make the creation of AIs more democratic, enabling more people to train, test, build on, and replicate AIs.


Bayesian Federated Learning Via Expectation Maximization and Turbo Deep Approximate Message Passing

arXiv.org Artificial Intelligence

Federated learning (FL) is a machine learning paradigm where the clients possess decentralized training data and the central server handles aggregation and scheduling. Typically, FL algorithms involve clients training their local models using stochastic gradient descent (SGD), which carries drawbacks such as slow convergence and being prone to getting stuck in suboptimal solutions. In this work, we propose a message passing based Bayesian federated learning (BFL) framework to avoid these drawbacks.Specifically, we formulate the problem of deep neural network (DNN) learning and compression and as a sparse Bayesian inference problem, in which group sparse prior is employed to achieve structured model compression. Then, we propose an efficient BFL algorithm called EMTDAMP, where expectation maximization (EM) and turbo deep approximate message passing (TDAMP) are combined to achieve distributed learning and compression. The central server aggregates local posterior distributions to update global posterior distributions and update hyperparameters based on EM to accelerate convergence. The clients perform TDAMP to achieve efficient approximate message passing over DNN with joint prior distribution. We detail the application of EMTDAMP to Boston housing price prediction and handwriting recognition, and present extensive numerical results to demonstrate the advantages of EMTDAMP.


Logistic-beta processes for modeling dependent random probabilities with beta marginals

arXiv.org Machine Learning

The beta distribution serves as a canonical tool for modeling probabilities and is extensively used in statistics and machine learning, especially in the field of Bayesian nonparametrics. Despite its widespread use, there is limited work on flexible and computationally convenient stochastic process extensions for modeling dependent random probabilities. We propose a novel stochastic process called the logistic-beta process, whose logistic transformation yields a stochastic process with common beta marginals. Similar to the Gaussian process, the logistic-beta process can model dependence on both discrete and continuous domains, such as space or time, and has a highly flexible dependence structure through correlation kernels. Moreover, its normal variance-mean mixture representation leads to highly effective posterior inference algorithms. The flexibility and computational benefits of logistic-beta processes are demonstrated through nonparametric binary regression simulation studies. Furthermore, we apply the logistic-beta process in modeling dependent Dirichlet processes, and illustrate its application and benefits through Bayesian density regression problems in a toxicology study.


The Relevance Feature and Vector Machine for health applications

arXiv.org Artificial Intelligence

This paper presents the Relevance Feature and Vector Machine (RFVM), a novel model that addresses the challenges of the fat-data problem when dealing with clinical prospective studies. The fat-data problem refers to the limitations of Machine Learning (ML) algorithms when working with databases in which the number of features is much larger than the number of samples (a common scenario in certain medical fields). To overcome such limitations, the RFVM incorporates different characteristics: (1) A Bayesian formulation which enables the model to infer its parameters without overfitting thanks to the Bayesian model averaging. (2) A joint optimisation that overcomes the limitations arising from the fat-data characteristic by simultaneously including the variables that define the primal space (features) and those that define the dual space (observations). (3) An integrated prunning that removes the irrelevant features and samples during the training iterative optimization. Also, this last point turns out crucial when performing medical prospective studies, enabling researchers to exclude unnecessary medical tests, reducing costs and inconvenience for patients, and identifying the critical patients/subjects that characterize the disorder and, subsequently, optimize the patient recruitment process that leads to a balanced cohort. The model capabilities are tested against state-of-the-art models in several medical datasets with fat-data problems. These experimental works show that RFVM is capable of achieving competitive classification accuracies while providing the most compact subset of data (in both terms of features and samples). Moreover, the selected features (medical tests) seem to be aligned with the existing medical literature.


A Factor Graph Model of Trust for a Collaborative Multi-Agent System

arXiv.org Artificial Intelligence

In the field of Multi-Agent Systems (MAS), known for their openness, dynamism, and cooperative nature, the ability to trust the resources and services of other agents is crucial. Trust, in this setting, is the reliance and confidence an agent has in the information, behaviors, intentions, truthfulness, and capabilities of others within the system. Our paper introduces a new graphical approach that utilizes factor graphs to represent the interdependent behaviors and trustworthiness among agents. This includes modeling the behavior of robots as a trajectory of actions using a Gaussian process factor graph, which accounts for smoothness, obstacle avoidance, and trust-related factors. Our method for evaluating trust is decentralized and considers key interdependent sub-factors such as proximity safety, consistency, and cooperation. The overall system comprises a network of factor graphs that interact through trust-related factors and employs a Bayesian inference method to dynamically assess trust-based decisions with informed consent. The effectiveness of this method is validated via simulations and empirical tests with autonomous robots navigating unsignalized intersections.


Distilling Symbolic Priors for Concept Learning into Neural Networks

arXiv.org Artificial Intelligence

Humans can learn new concepts from a small number of examples by drawing on their inductive biases. These inductive biases have previously been captured by using Bayesian models defined over symbolic hypothesis spaces. Is it possible to create a neural network that displays the same inductive biases? We show that inductive biases that enable rapid concept learning can be instantiated in artificial neural networks by distilling a prior distribution from a symbolic Bayesian model via meta-learning, an approach for extracting the common structure from a set of tasks. By generating the set of tasks used in meta-learning from the prior distribution of a Bayesian model, we are able to transfer that prior into a neural network. We use this approach to create a neural network with an inductive bias towards concepts expressed as short logical formulas. Analyzing results from previous behavioral experiments in which people learned logical concepts from a few examples, we find that our meta-trained models are highly aligned with human performance.


A Rational Analysis of the Speech-to-Song Illusion

arXiv.org Artificial Intelligence

The speech-to-song illusion is a robust psychological phenomenon whereby a spoken sentence sounds increasingly more musical as it is repeated. Despite decades of research, a complete formal account of this transformation is still lacking, and some of its nuanced characteristics, namely, that certain phrases appear to transform while others do not, is not well understood. Here we provide a formal account of this phenomenon, by recasting it as a statistical inference whereby a rational agent attempts to decide whether a sequence of utterances is more likely to have been produced in a song or speech. Using this approach and analyzing song and speech corpora, we further introduce a novel prose-to-lyrics illusion that is purely text-based. In this illusion, simply duplicating written sentences makes them appear more like song lyrics. We provide robust evidence for this new illusion in both human participants and large language models.


Efficient Incremental Belief Updates Using Weighted Virtual Observations

arXiv.org Artificial Intelligence

We present an algorithmic solution to the problem of incremental belief updating in the context of Monte Carlo inference in Bayesian statistical models represented by probabilistic programs. Given a model and a sample-approximated posterior, our solution constructs a set of weighted observations to condition the model such that inference would result in the same posterior. This problem arises e.g. in multi-level modelling, incremental inference, inference in presence of privacy constraints. First, a set of virtual observations is selected, then, observation weights are found through a computationally efficient optimization procedure such that the reconstructed posterior coincides with or closely approximates the original posterior. We implement and apply the solution to a number of didactic examples and case studies, showing efficiency and robustness of our approach. The provided reference implementation is agnostic to the probabilistic programming language or the inference algorithm, and can be applied to most mainstream probabilistic programming environments.