Goto

Collaborating Authors

 Bayesian Inference


Introduction to Machine Learning

arXiv.org Machine Learning

This book introduces the mathematical foundations and techniques that lead to the development and analysis of many of the algorithms that are used in machine learning. It starts with an introductory chapter that describes notation used throughout the book and serve at a reminder of basic concepts in calculus, linear algebra and probability and also introduces some measure theoretic terminology, which can be used as a reading guide for the sections that use these tools. The introductory chapters also provide background material on matrix analysis and optimization. The latter chapter provides theoretical support to many algorithms that are used in the book, including stochastic gradient descent, proximal methods, etc. After discussing basic concepts for statistical prediction, the book includes an introduction to reproducing kernel theory and Hilbert space techniques, which are used in many places, before addressing the description of various algorithms for supervised statistical learning, including linear methods, support vector machines, decision trees, boosting, or neural networks. The subject then switches to generative methods, starting with a chapter that presents sampling methods and an introduction to the theory of Markov chains. The following chapter describe the theory of graphical models, an introduction to variational methods for models with latent variables, and to deep-learning based generative models. The next chapters focus on unsupervised learning methods, for clustering, factor analysis and manifold learning. The final chapter of the book is theory-oriented and discusses concentration inequalities and generalization bounds.


Non-stationary and Sparsely-correlated Multi-output Gaussian Process with Spike-and-Slab Prior

arXiv.org Artificial Intelligence

Multi-output Gaussian process (MGP) is commonly used as a transfer learning method to leverage information among multiple outputs. A key advantage of MGP is providing uncertainty quantification for prediction, which is highly important for subsequent decision-making tasks. However, traditional MGP may not be sufficiently flexible to handle multivariate data with dynamic characteristics, particularly when dealing with complex temporal correlations. Additionally, since some outputs may lack correlation, transferring information among them may lead to negative transfer. To address these issues, this study proposes a non-stationary MGP model that can capture both the dynamic and sparse correlation among outputs. Specifically, the covariance functions of MGP are constructed using convolutions of time-varying kernel functions. Then a dynamic spike-and-slab prior is placed on correlation parameters to automatically decide which sources are informative to the target output in the training process. An expectation-maximization (EM) algorithm is proposed for efficient model fitting. Both numerical studies and a real case demonstrate its efficacy in capturing dynamic and sparse correlation structure and mitigating negative transfer for high-dimensional time-series data. Finally, a mountain-car reinforcement learning case highlights its potential application in decision making problems.


Inverse decision-making using neural amortized Bayesian actors

arXiv.org Artificial Intelligence

Bayesian observer and actor models have provided normative explanations for many behavioral phenomena in perception, sensorimotor control, and other areas of cognitive science and neuroscience. They attribute behavioral variability and biases to different interpretable entities such as perceptual and motor uncertainty, prior beliefs, and behavioral costs. However, when extending these models to more complex tasks with continuous actions, solving the Bayesian decision-making problem is often analytically intractable. Moreover, inverting such models to perform inference over their parameters given behavioral data is computationally even more difficult. Therefore, researchers typically constrain their models to easily tractable components, such as Gaussian distributions or quadratic cost functions, or resort to numerical methods. To overcome these limitations, we amortize the Bayesian actor using a neural network trained on a wide range of different parameter settings in an unsupervised fashion. Using the pre-trained neural network enables performing gradient-based Bayesian inference of the Bayesian actor model's parameters. We show on synthetic data that the inferred posterior distributions are in close alignment with those obtained using analytical solutions where they exist. Where no analytical solution is available, we recover posterior distributions close to the ground truth. We then show that identifiability problems between priors and costs can arise in more complex cost functions. Finally, we apply our method to empirical data and show that it explains systematic individual differences of behavioral patterns.


A sparse PAC-Bayesian approach for high-dimensional quantile prediction

arXiv.org Machine Learning

Quantile regression, a robust method for estimating conditional quantiles, has advanced significantly in fields such as econometrics, statistics, and machine learning. In high-dimensional settings, where the number of covariates exceeds sample size, penalized methods like lasso have been developed to address sparsity challenges. Bayesian methods, initially connected to quantile regression via the asymmetric Laplace likelihood, have also evolved, though issues with posterior variance have led to new approaches, including pseudo/score likelihoods. This paper presents a novel probabilistic machine learning approach for high-dimensional quantile prediction. It uses a pseudo-Bayesian framework with a scaled Student-t prior and Langevin Monte Carlo for efficient computation. The method demonstrates strong theoretical guarantees, through PAC-Bayes bounds, that establish non-asymptotic oracle inequalities, showing minimax-optimal prediction error and adaptability to unknown sparsity. Its effectiveness is validated through simulations and real-world data, where it performs competitively against established frequentist and Bayesian techniques.


Bayesian Learning in a Nonlinear Multiscale State-Space Model

arXiv.org Machine Learning

In many biological systems, the developmental processes of individuals play a crucial role in shaping the traits, characteristics, and growth patterns of subsequent generations. Throughout various stages of growth and maturation, organisms undergo significant changes that impact their overall fitness and reproductive success. These developmental stages, ranging from early cellular differentiation to reproductive maturity, each contribute uniquely to the organism's ability to survive and transmit biological information to offspring. Conversely, hereditary processes also influence the developmental stages of subsequent generations, creating a feedback loop where the heritable traits and adaptations of individuals as well as their health statuses such as disease resistance, metabolic efficiency, or physiological robustness can impact the developmental trajectories of future generations. This feedback loop between developmental processes and heredity continually shapes evolutionary trajectories, driving adaptation and resilience in populations over time.


Learning in Hybrid Active Inference Models

arXiv.org Artificial Intelligence

An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS) which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and allow us to (1) specify temporally-abstracted sub-goals in a method reminiscent of the options framework, (2) lift the exploration into discrete space allowing us to exploit information-theoretic exploration bonuses and (3) `cache' the approximate solutions to low-level problems in the discrete planner. We apply our model to the sparse Continuous Mountain Car task, demonstrating fast system identification via enhanced exploration and successful planning through the delineation of abstract sub-goals.


$\mathtt{emuflow}$: Normalising Flows for Joint Cosmological Analysis

arXiv.org Artificial Intelligence

Given the growth in the variety and precision of astronomical datasets of interest for cosmology, the best cosmological constraints are invariably obtained by combining data from different experiments. At the likelihood level, one complication in doing so is the need to marginalise over large-dimensional parameter models describing the data of each experiment. These include both the relatively small number of cosmological parameters of interest and a large number of "nuisance" parameters. Sampling over the joint parameter space for multiple experiments can thus become a very computationally expensive operation. This can be significantly simplified if one could sample directly from the marginal cosmological posterior distribution of preceding experiments, depending only on the common set of cosmological parameters. In this paper, we show that this can be achieved by emulating marginal posterior distributions via normalising flows. The resulting trained normalising flow models can be used to efficiently combine cosmological constraints from independent datasets without increasing the dimensionality of the parameter space under study. We show that the method is able to accurately describe the posterior distribution of real cosmological datasets, as well as the joint distribution of different datasets, even when significant tension exists between experiments. The resulting joint constraints can be obtained in a fraction of the time it would take to combine the same datasets at the level of their likelihoods. We construct normalising flow models for a set of public cosmological datasets of general interests and make them available, together with the software used to train them, and to exploit them in cosmological parameter inference.


Stein transport for Bayesian inference

arXiv.org Machine Learning

We introduce $\textit{Stein transport}$, a novel methodology for Bayesian inference designed to efficiently push an ensemble of particles along a predefined curve of tempered probability distributions. The driving vector field is chosen from a reproducing kernel Hilbert space and can be derived either through a suitable kernel ridge regression formulation or as an infinitesimal optimal transport map in the Stein geometry. The update equations of Stein transport resemble those of Stein variational gradient descent (SVGD), but introduce a time-varying score function as well as specific weights attached to the particles. While SVGD relies on convergence in the long-time limit, Stein transport reaches its posterior approximation at finite time $t=1$. Studying the mean-field limit, we discuss the errors incurred by regularisation and finite-particle effects, and we connect Stein transport to birth-death dynamics and Fisher-Rao gradient flows. In a series of experiments, we show that in comparison to SVGD, Stein transport not only often reaches more accurate posterior approximations with a significantly reduced computational budget, but that it also effectively mitigates the variance collapse phenomenon commonly observed in SVGD.


Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

arXiv.org Artificial Intelligence

Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather than oriented towards any particular learning task. In this work, we present a formal model of DD, arguing that a precise characterization of the underlying optimization problem must specify the inference task associated with the application of interest. Without this task-specific focus, the DD problem is under-specified, and the selection of a DD algorithm for a particular task is merely heuristic. Our formalization reveals novel applications of DD across different modeling environments. We analyze existing DD methods through this broader lens, highlighting their strengths and limitations in terms of accuracy and faithfulness to optimal DD operation. Finally, we present numerical results for two case studies important in contemporary settings. Firstly, we address a critical challenge in medical data analysis: merging the knowledge from different datasets composed of intersecting, but not identical, sets of features, in order to construct a larger dataset in what is usually a small sample setting. Secondly, we consider out-of-distribution error across boundary conditions for physics-informed neural networks (PINNs), showing the potential for DD to provide more physically faithful data. By establishing this general formulation of DD, we aim to establish a new research paradigm by which DD can be understood and from which new DD techniques can arise.


DAMe: Personalized Federated Social Event Detection with Dual Aggregation Mechanism

arXiv.org Artificial Intelligence

Training social event detection models through federated learning (FedSED) aims to improve participants' performance on the task. However, existing federated learning paradigms are inadequate for achieving FedSED's objective and exhibit limitations in handling the inherent heterogeneity in social data. This paper proposes a personalized federated learning framework with a dual aggregation mechanism for social event detection, namely DAMe. We present a novel local aggregation strategy utilizing Bayesian optimization to incorporate global knowledge while retaining local characteristics. Moreover, we introduce a global aggregation strategy to provide clients with maximum external knowledge of their preferences. In addition, we incorporate a global-local event-centric constraint to prevent local overfitting and ``client-drift''. Experiments within a realistic simulation of a natural federated setting, utilizing six social event datasets spanning six languages and two social media platforms, along with an ablation study, have demonstrated the effectiveness of the proposed framework. Further robustness analyses have shown that DAMe is resistant to injection attacks.