Bayesian Inference


Data Assimilation Networks

arXiv.org Artificial Intelligence

Data assimilation (DA) aims at forecasting the state of a dynamical system by combining a mathematical representation of the system with noisy observations, taking their uncertainties into account. State-of-the-art methods rely on Gaussian error statistics and on linearization of the nonlinear dynamics, which may make them sub-optimal. Open questions therefore remain about how to improve these methods. In this paper, we propose a fully data-driven deep learning architecture that generalizes recurrent Elman networks and data assimilation algorithms and approximates a sequence of prior and posterior densities conditioned on noisy observations. By construction, our approach can be used for general nonlinear dynamics and non-Gaussian densities. In numerical experiments based on the well-known Lorenz-95 system with Gaussian error statistics, our architecture achieves performance comparable to the EnKF on both the analysis and the propagation of probability density functions of the system state at a given time, without using any explicit regularization technique.
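
The abstract benchmarks against the ensemble Kalman filter (EnKF) on Lorenz-95. As a point of reference, here is a minimal sketch of that baseline, not of the proposed network: the Lorenz-96 equations (the system often called Lorenz-95), an RK4 integrator, and a stochastic EnKF analysis step with perturbed observations. The forcing F = 8, step dt = 0.05, ensemble size, full identity observation operator, and noise levels are illustrative assumptions, and the sketch omits the localization and inflation that practical EnKF setups usually need.

```python
import numpy as np

def lorenz96(x, forcing=8.0):
    """Lorenz-96 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, cyclic in i."""
    return (np.roll(x, -1, axis=-1) - np.roll(x, 2, axis=-1)) * np.roll(x, 1, axis=-1) - x + forcing

def rk4_step(x, dt=0.05, forcing=8.0):
    """One fourth-order Runge-Kutta step; works on one state or on an ensemble (one member per row)."""
    k1 = lorenz96(x, forcing)
    k2 = lorenz96(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def enkf_analysis(ensemble, y, H, R, rng):
    """Stochastic EnKF update with perturbed observations.
    ensemble: (N, n) forecast members; y: (m,) observation; H: (m, n); R: (m, m)."""
    N = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (N - 1)      # sample forecast covariance (n, n)
    S = H @ P @ H.T + R                        # innovation covariance (m, m)
    K = np.linalg.solve(S, H @ P).T            # Kalman gain P H^T S^{-1}, shape (n, m)
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=N)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T

rng = np.random.default_rng(0)
n, N = 40, 100
truth = np.full(n, 8.0)
truth[0] += 0.01
for _ in range(1000):                          # spin up onto the attractor
    truth = rk4_step(truth)

# Biased first guess: a shared offset plus member-specific spread, both with std 2.
ensemble = truth + rng.normal(0.0, 2.0, n) + rng.normal(0.0, 2.0, (N, n))
for _ in range(2):                             # short forecast of truth and ensemble
    truth = rk4_step(truth)
    ensemble = rk4_step(ensemble)

H, R = np.eye(n), np.eye(n)                    # observe every variable with unit noise
y = truth + rng.normal(0.0, 1.0, n)
analysis = enkf_analysis(ensemble, y, H, R, rng)

rmse_f = np.sqrt(np.mean((ensemble.mean(axis=0) - truth) ** 2))
rmse_a = np.sqrt(np.mean((analysis.mean(axis=0) - truth) ** 2))
print(f"forecast RMSE {rmse_f:.2f}, analysis RMSE {rmse_a:.2f}")
```

In this toy setup the analysis mean should land closer to the truth than the forecast mean, because the ensemble spread and the observation error are both consistent with how the data were generated.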


Federated Variational Inference: Towards Improved Personalization and Generalization

arXiv.org Artificial Intelligence

Conventional federated learning algorithms train a single global model by leveraging all participating clients' data. However, due to heterogeneity in client generative distributions and predictive models, these approaches may not appropriately approximate the predictive process, converge to an optimal state, or generalize to new clients. We study personalization and generalization in stateless cross-device federated learning setups, assuming heterogeneity in client data distributions and predictive models. We first propose a hierarchical generative model and formalize it using Bayesian inference. We then approximate this process using variational inference to train our model efficiently. We call this algorithm Federated Variational Inference (FedVI). We use PAC-Bayes analysis to provide generalization bounds for FedVI. We evaluate our model on FEMNIST and CIFAR-100 image classification and show that FedVI outperforms the state of the art on both tasks.


Utility-Probability Duality of Neural Networks

arXiv.org Artificial Intelligence

It is typically understood that training a modern neural network is a process of fitting the probability distribution of the desired output. However, recent paradoxical observations in a number of language generation tasks lead one to wonder whether this canonical probability-based explanation can really account for the empirical success of deep learning. To resolve this issue, we propose an alternative, utility-based explanation of the standard supervised learning procedure in deep learning. The basic idea is to interpret the learned neural network not as a probability model but as an ordinal utility function that encodes the preferences revealed in the training data. From this perspective, training the neural network corresponds to a utility learning process. Specifically, we show that for all neural networks with softmax outputs, the SGD learning dynamics of maximum likelihood estimation (MLE) can be seen as an iterative process that optimizes the neural network toward an optimal utility function. This utility-based interpretation can explain several otherwise paradoxical observations about neural networks trained this way. Moreover, our utility-based theory also entails an equation that transforms the learned utility values back into a new kind of probability estimate, with which probability-compatible decision rules enjoy dramatic (double-digit) performance improvements. Together, this evidence reveals a utility-probability duality in what modern neural networks are (truly) modeling: we thought they were one thing (probabilities) until the unexplainable showed up; changing our mindset and treating them as another thing (utility values) largely reconciles the theory, despite remaining subtleties regarding their original (probabilistic) identity.
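
The claim concerns SGD on the MLE objective for softmax outputs. A standard fact underlying such dynamics is that the gradient of the negative log-likelihood with respect to the logits z is softmax(z) minus the one-hot label, so each SGD step raises the logit of the observed class and lowers every other logit in proportion to its predicted probability. The sketch below checks this against central finite differences; it does not implement the paper's utility-to-probability transformation, which the abstract does not spell out. The logits and label are arbitrary illustrative values.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def nll(z, label):
    """Negative log-likelihood -log softmax(z)[label], via a stable log-sum-exp."""
    m = z.max()
    return m + np.log(np.exp(z - m).sum()) - z[label]

def nll_grad(z, label):
    """Analytic gradient with respect to the logits: softmax(z) - one_hot(label)."""
    g = softmax(z)
    g[label] -= 1.0
    return g

z = np.array([2.0, -1.0, 0.5, 0.0])   # illustrative logits
label = 2                             # illustrative observed class
eps = 1e-6
numeric = np.array([
    (nll(z + eps * basis, label) - nll(z - eps * basis, label)) / (2 * eps)
    for basis in np.eye(len(z))
])
print(np.round(nll_grad(z, label), 4))
print(np.max(np.abs(numeric - nll_grad(z, label))))
```

Because the gradient sums to zero, an SGD step on the logits only shifts them relative to one another, pushing the observed class up the ranking.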


Non-Log-Concave and Nonsmooth Sampling via Langevin Monte Carlo Algorithms

arXiv.org Artificial Intelligence

Efficiently drawing samples from complex, high-dimensional probability distributions makes it possible to perform inference with complex statistical models on large amounts of data, where uncertainty quantification is essential for understanding the risk inherent in every decision made with models learned from data. The ability to quantify uncertainty when comparing a theoretical or computational model to observations is critical to conducting a sound scientific investigation, particularly for machine-learned models and in the physical sciences [92]. More specifically, Bayesian inference [96, 184] is a prominent method for linking models to observations and estimating uncertainties. Sampling techniques are widely adopted within it, with applications in areas such as image processing and inverse problems (see e.g., [87]) and Bayesian neural networks and deep learning [134]. While Markov chain Monte Carlo (MCMC) methods [164] have been the main workhorse for such sampling tasks, most traditional MCMC algorithms have been regarded as not scaling well to high dimensions. In particular, in modern large-scale applications such as Bayesian deep learning in the overparameterized regime, where we want to perform posterior inference on the neural network weights, traditional MCMC algorithms become computationally prohibitive, and alternative approaches such as variational inference (VI; see e.g., [21]) have been widely adopted.
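
The simplest Langevin Monte Carlo method is the unadjusted Langevin algorithm (ULA), which discretizes the Langevin diffusion for a target density proportional to exp(-U(x)) as x_{k+1} = x_k - h ∇U(x_k) + sqrt(2h) ξ_k with standard Gaussian ξ_k. The minimal sketch below runs it on a one-dimensional Gaussian target, chosen only because its moments are easy to check; the non-log-concave and nonsmooth settings in the title require modifications (for example smoothed or proximal gradients) that are not shown. Step size, chain count, and target parameters are illustrative assumptions.

```python
import numpy as np

def ula(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: x <- x - step * grad_U(x) + sqrt(2 * step) * xi.
    Each entry of x0 is treated as an independent chain."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

MU, SIGMA2 = 1.5, 1.0   # illustrative target N(MU, SIGMA2)

def grad_U(x):
    """Gradient of the potential U(x) = (x - MU)^2 / (2 * SIGMA2)."""
    return (x - MU) / SIGMA2

rng = np.random.default_rng(0)
samples = ula(grad_U, np.zeros(5000), step=0.01, n_steps=1000, rng=rng)
print(f"mean {samples.mean():.3f}, variance {samples.var():.3f}")
```

Because ULA has no Metropolis correction, it is biased for any fixed step h: for this Gaussian target the stationary variance is 2σ⁴/(2σ² - h), about 1.005 here, rather than σ² = 1.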


Bayesian Analysis for Over-parameterized Linear Model without Sparsity

arXiv.org Machine Learning

In high-dimensional Bayesian statistics, several methods have been developed, including many prior distributions that induce sparsity in the estimated parameters. However, such priors are limited in handling the spectral (eigenvector) structure of the data and, as a result, are ill-suited to analyzing over-parameterized models, i.e., high-dimensional linear models that do not assume sparsity, which have been developed in recent years. This paper introduces a Bayesian approach that uses a prior depending on the eigenvectors of the data covariance matrix but does not induce sparsity in the parameters. We also provide contraction rates for the derived posterior distributions and develop a truncated Gaussian approximation of the posterior distribution. The former demonstrates the efficiency of posterior estimation, while the latter enables quantification of parameter uncertainty via a Bernstein-von Mises-type approach. These results indicate that Bayesian methods which handle the spectrum of the data and estimate non-sparse, high-dimensional parameters are feasible.
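
To make the idea of a spectrum-dependent, non-sparse prior concrete, here is a minimal sketch of conjugate Bayesian linear regression in the over-parameterized regime (p > n) under a hypothetical prior whose covariance shares the eigenvectors of the sample covariance X^T X / n, with variance scale * (eigenvalue + jitter) along each eigenvector. This prior, its hyperparameters, and the known noise variance are assumptions for illustration; the paper's actual prior and its truncated Gaussian approximation are not reproduced here. Because likelihood and prior are both diagonal in the eigenvector basis, the posterior is computed eigenvector by eigenvector.

```python
import numpy as np

def spectral_prior_posterior(X, y, noise_var=1.0, scale=1.0, jitter=1e-3):
    """Conjugate Gaussian posterior for y = X beta + eps, eps ~ N(0, noise_var * I).
    Hypothetical prior: beta ~ N(0, scale * V diag(lam + jitter) V^T), where V and lam
    are the eigenvectors and eigenvalues of the sample covariance X^T X / n."""
    n, _ = X.shape
    lam, V = np.linalg.eigh(X.T @ X / n)
    lam = np.clip(lam, 0.0, None)                # remove tiny negative round-off
    prior_var = scale * (lam + jitter)           # prior variance along each eigenvector
    # X^T X = n V diag(lam) V^T, so posterior precision is diagonal in the V basis too.
    post_var = 1.0 / (n * lam / noise_var + 1.0 / prior_var)
    post_mean = V @ (post_var * (V.T @ (X.T @ y)) / noise_var)
    post_cov = (V * post_var) @ V.T
    return post_mean, post_cov

rng = np.random.default_rng(0)
n, p = 50, 200                                   # over-parameterized: p > n
X = rng.normal(size=(n, p))
beta = rng.normal(scale=1.0 / np.sqrt(p), size=p)   # dense, non-sparse coefficients
y = X @ beta + rng.normal(scale=0.1, size=n)
post_mean, post_cov = spectral_prior_posterior(X, y, noise_var=0.01)
sd = np.sqrt(np.diag(post_cov))                  # marginal posterior standard deviations
```

In directions where the sample covariance has a zero eigenvalue the data carry no information, so the posterior there simply equals the prior.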


Bayesian inference with finitely wide neural networks

arXiv.org Artificial Intelligence

Neal, in his seminal work [1], pointed out that a shallow but infinitely wide random neural network is a Gaussian process (GP) [2] in a statistical sense. Subsequent work [3, 4] interpreting neural networks with specific nonlinear activation units as kernel machines was also inspired by this idea. More recent reports [5, 6] further claimed the equivalence between GPs and deep neural networks when each hidden layer of the latter has infinite width. Consequently, machine learning practitioners can perform Bayesian inference by treating a deep and wide neural network as a GP and exploit the analytic marginal and conditional properties of the multivariate Gaussian distribution. Otherwise, one needs to employ gradient-based learning and bootstrap sampling to obtain a predictive distribution [7]. In reality, all neural networks have finite width. The deviation from Gaussianity therefore requires a quantitative account, as practitioners may wonder about the corrections to the predictive mean and variance in, for example, a regression task. Yaida [8] and colleagues [9] proposed a perturbative approach for computing the multivariate cumulants by direct application of Wick's contraction theorem.
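
The infinite-width limit described above can be made concrete. The minimal sketch below computes the GP (NNGP) kernel of a fully connected ReLU network using the standard arc-cosine recursion for E[relu(u) relu(v)] under a bivariate Gaussian, then performs exact GP regression with it. The weight and bias variances, depth, noise level, and toy data are illustrative assumptions; the finite-width perturbative corrections that are the subject of the paper are not implemented.

```python
import numpy as np

def relu_nngp_kernel(X1, X2, depth=2, sigma_w2=2.0, sigma_b2=0.1):
    """Infinite-width (NNGP) kernel of a fully connected ReLU network with `depth`
    hidden layers, via the arc-cosine recursion for E[relu(u) relu(v)]."""
    d = X1.shape[1]
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d               # layer-0 covariances
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 ** 2, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 ** 2, axis=1) / d
    for _ in range(depth):
        norms = np.sqrt(np.outer(k11, k22))
        cos_t = np.clip(k12 / norms, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # E[relu(u) relu(v)] = sqrt(k11 k22) / (2 pi) * (sin t + (pi - t) cos t)
        k12 = sigma_b2 + sigma_w2 * norms * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)
        # On the diagonal t = 0, so the expectation reduces to k / 2.
        k11 = sigma_b2 + sigma_w2 * k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2
    return k12

def gp_predict(X_train, y_train, X_test, noise_var=1e-2, **kernel_kwargs):
    """Exact GP posterior mean and variance under the NNGP prior."""
    K = relu_nngp_kernel(X_train, X_train, **kernel_kwargs) + noise_var * np.eye(len(X_train))
    K_star = relu_nngp_kernel(X_test, X_train, **kernel_kwargs)
    k_ss = np.diag(relu_nngp_kernel(X_test, X_test, **kernel_kwargs))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_star.T)
    return K_star @ alpha, k_ss - np.sum(v ** 2, axis=0)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = np.sin(X_train[:, 0])
X_test = rng.normal(size=(5, 3))
mean, var = gp_predict(X_train, y_train, X_test)
```

The predictive mean and variance come directly from Gaussian conditioning, which is exactly the analytic convenience that finite width breaks.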


Data-driven Science and Machine Learning Methods in Laser-Plasma Physics

arXiv.org Artificial Intelligence

Laser-plasma physics has developed rapidly over the past few decades as high-power lasers have become both increasingly powerful and more widely available. Early experimental and numerical research in this field was restricted to single-shot experiments with limited parameter exploration. However, recent technological improvements make it possible to gather an increasing amount of data, both in experiments and simulations. This has sparked interest in using advanced techniques from mathematics, statistics and computer science to deal with, and benefit from, big data. At the same time, sophisticated modeling techniques also provide new ways for researchers to effectively deal with situations in which still only sparse amounts of data are available. This paper aims to present an overview of relevant machine learning methods with focus on applicability to laser-plasma physics, including its important sub-fields of laser-plasma acceleration and inertial confinement fusion.


Causal Discovery with Unobserved Variables: A Proxy Variable Approach

arXiv.org Artificial Intelligence

Discovering causal relations from observational data is important. Unobserved variables, such as latent confounders or mediators, can mislead causal identification. To address this issue, proximal causal discovery methods have been proposed that adjust for the bias using a proxy of the unobserved variable. However, these methods assume that the data are discrete, which limits their real-world application. In this paper, we propose a proximal causal discovery method that handles continuous variables well. Our observation is that discretizing continuous variables can lead to serious errors and compromise the power of the proxy. Therefore, to use proxy variables in the continuous case, the critical point is to control the discretization error. To this end, we identify mild regularity conditions on the conditional distributions that enable us to control the discretization error to an infinitesimal level, as long as the proxy is discretized with sufficiently fine, finite bins. Based on this, we design a proxy-based hypothesis test for identifying causal relationships when unobserved variables are present. Our test is consistent, meaning that its power approaches one as the sample size grows. We demonstrate the effectiveness of our method using synthetic and real-world data.
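
The abstract's key step is that discretizing a continuous proxy into sufficiently fine bins keeps the discretization error small. The minimal sketch below illustrates only that step for a single Gaussian proxy, using equal-frequency (quantile) bins and measuring the error as the root-mean-square gap between each value and its bin mean; the binning scheme, error measure, and data are assumptions, and the proxy-based hypothesis test itself is not shown.

```python
import numpy as np

def quantile_bins(x, n_bins):
    """Equal-frequency discretization: map each value to a bin index in 0..n_bins-1."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(x, edges)

def discretization_rmse(x, n_bins):
    """Root-mean-square gap between each value and the mean of its bin."""
    codes = quantile_bins(x, n_bins)
    bin_means = np.array([x[codes == b].mean() for b in range(n_bins)])
    return np.sqrt(np.mean((x - bin_means[codes]) ** 2))

rng = np.random.default_rng(0)
proxy = rng.normal(size=10_000)                 # a continuous proxy variable
bin_counts = (2, 4, 8, 16, 32)
errors = [discretization_rmse(proxy, b) for b in bin_counts]
for b, err in zip(bin_counts, errors):
    print(b, round(err, 4))
```

Doubling the number of bins refines the previous quantile partition, so this within-bin error cannot increase as the bins get finer.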


Building Transportation Foundation Model via Generative Graph Transformer

arXiv.org Artificial Intelligence

Efficient traffic management is crucial for maintaining urban mobility, especially in densely populated areas where congestion, accidents, and delays can lead to frustrating and expensive commutes. However, existing prediction methods face challenges in terms of optimizing a single objective and understanding the complex composition of the transportation system. Moreover, they lack the ability to understand the macroscopic system and cannot efficiently utilize big data. In this paper, we propose a novel approach, Transportation Foundation Model (TFM), which integrates the principles of traffic simulation into traffic prediction. TFM uses graph structures and dynamic graph generation algorithms to capture the participatory behavior and interaction of transportation system actors. This data-driven and model-free simulation method addresses the challenges faced by traditional systems in terms of structural complexity and model accuracy and provides a foundation for solving complex transportation problems with real data. The proposed approach shows promising results in accurately predicting traffic outcomes in an urban transportation setting.


Dimensionality Reduction as Probabilistic Inference

arXiv.org Artificial Intelligence

Dimensionality reduction (DR) algorithms compress high-dimensional data into a lower-dimensional representation while preserving important features of the data. DR is a critical step in many analysis pipelines as it enables visualisation, noise reduction and efficient downstream processing of the data. In this work, we introduce the ProbDR variational framework, which interprets a wide range of classical DR algorithms as probabilistic inference algorithms. ProbDR encompasses PCA, CMDS, LLE, LE, MVU, diffusion maps, kPCA, Isomap, (t-)SNE, and UMAP. In our framework, a low-dimensional latent variable is used to construct a covariance, precision, or graph Laplacian matrix, which can be used as part of a generative model for the data. Inference is done by optimizing an evidence lower bound. We demonstrate the internal consistency of our framework and show that it enables the use of probabilistic programming languages (PPLs) for DR. Additionally, we illustrate that the framework facilitates reasoning about unseen data and argue that our generative models approximate Gaussian processes (GPs) on manifolds. By providing a unified view of DR, our framework facilitates communication, reasoning about uncertainties, model composition, and extensions, particularly when domain knowledge is present.
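
PCA is the textbook case of reading a DR algorithm as probabilistic inference: in probabilistic PCA (PPCA), data are modeled as y = W z + μ + ε with a low-dimensional Gaussian latent z, and the maximum-likelihood W spans the leading principal subspace. The sketch below uses the well-known closed-form PPCA solution rather than the ProbDR framework or its ELBO-based inference; the toy data and latent dimension q = 2 are illustrative assumptions.

```python
import numpy as np

def ppca_mle(Y, q):
    """Closed-form maximum-likelihood probabilistic PCA (Tipping & Bishop, 1999) for
    y = W z + mu + eps, with z ~ N(0, I_q) and eps ~ N(0, sigma2 * I_d)."""
    mu = Y.mean(axis=0)
    S = np.cov(Y, rowvar=False, bias=True)            # ML covariance estimate (divides by n)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]                 # sort eigenpairs in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                       # noise = average discarded variance
    W = eigvecs[:, :q] * np.sqrt(eigvals[:q] - sigma2)   # U_q (Lambda_q - sigma2 I)^{1/2}
    return mu, W, sigma2

def latent_posterior_mean(Y, mu, W, sigma2):
    """E[z | y] = M^{-1} W^T (y - mu), with M = W^T W + sigma2 * I_q."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (Y - mu).T).T

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))                         # true 2-D latent coordinates
A = rng.normal(size=(2, 10))                          # linear map into 10 dimensions
Y = Z @ A + 0.1 * rng.normal(size=(500, 10))
mu, W, sigma2 = ppca_mle(Y, q=2)
latents = latent_posterior_mean(Y, mu, W, sigma2)     # low-dimensional embedding
```

The fitted covariance W Wᵀ + σ² I reproduces the top-q eigenvalues of the sample covariance exactly and replaces the remaining ones with their average σ².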