Goto

Collaborating Authors

 Bayesian Inference


Applied Causal Inference Powered by ML and AI

arXiv.org Machine Learning

This book aims to provide a working introduction to the emerging fusion of modern statistical inference - aka machine learning (ML) or artificial intelligence (AI) - and causal inference methods. The book is aimed at upper level undergraduates and master's-level students as well as doctoral students focusing on applied empirical research. A sufficient background for the core material is one semester of introductory econometrics and one semester of machine learning. We hope the book is also useful to empirical researchers looking to apply modern methods in their work. The book provides an overview of key ideas in both predictive inference and causal inference and shows how predictive tools are key ingredients to answering many causal questions.


A prediction rigidity formalism for low-cost uncertainties in trained neural networks

arXiv.org Machine Learning

Regression methods are fundamental for scientific and technological applications. However, fitted models can be highly unreliable outside of their training domain, and hence the quantification of their uncertainty is crucial in many of their applications. Based on the solution of a constrained optimization problem, we propose "prediction rigidities" as a method to obtain uncertainties of arbitrary pre-trained regressors. We establish a strong connection between our framework and Bayesian inference, and we develop a last-layer approximation that allows the new method to be applied to neural networks. This extension affords cheap uncertainties without any modification to the neural network itself or its training procedure. We show the effectiveness of our method on a wide range of regression tasks, ranging from simple toy models to applications in chemistry and meteorology.


Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities

arXiv.org Machine Learning

We prove non-asymptotic error bounds for particle gradient descent (PGD)~(Kuntz et al., 2023), a recently introduced algorithm for maximum likelihood estimation of large latent variable models obtained by discretizing a gradient flow of the free energy. We begin by showing that, for models satisfying a condition generalizing both the log-Sobolev and the Polyak--{\L}ojasiewicz inequalities (LSI and P{\L}I, respectively), the flow converges exponentially fast to the set of minimizers of the free energy. We achieve this by extending a result well-known in the optimal transport literature (that the LSI implies the Talagrand inequality) and its counterpart in the optimization literature (that the P{\L}I implies the so-called quadratic growth condition), and applying it to our new setting. We also generalize the Bakry--\'Emery Theorem and show that the LSI/P{\L}I generalization holds for models with strongly concave log-likelihoods. For such models, we further control PGD's discretization error, obtaining non-asymptotic error bounds. While we are motivated by the study of PGD, we believe that the inequalities and results we extend may be of independent interest.


Deep Horseshoe Gaussian Processes

arXiv.org Machine Learning

Deep Gaussian processes have recently been proposed as natural objects to fit, similarly to deep neural networks, possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process Deep-HGP, a new simple prior based on deep Gaussian processes with a squared-exponential kernel, that in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated tempered posterior distribution recovers the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates are simultaneously adaptive to both the smoothness of the regression function and to its structure in terms of compositions. The dependence of the rates in terms of dimension are explicit, allowing in particular for input spaces of dimension increasing with the number of observations.



Supplementary Materials of "BAST: Bayesian Additive Regression Spanning Trees for Complex Constrained Domain "

Neural Information Processing Systems

These appendices provide supplementary details and results of BAST. Appendix A contains additional details on Bayesian estimation and prediction. Supplementary simulation details and results including hyperparameter tuning and computation time can be found in Appendix B. Finally, Appendix C provides the proof of Proposition 1. Appendix A.1 Estimation This appendix provides details on the Markov chain Monte Carlo (MCMC) algorithm discussed in Section 3.1. This probability specification works well in our experiments, but one can modify it if desired. Appendix A.2 Prediction in Two-dimensional Constrained Domains In this subsection we provide details on specifying the neighbor set N To sample the cluster membership of u, we need to determine the cluster memberships for vertices on the domain boundary, which can be done by, for instance, assigning a boundary vertex to the same cluster as its nearest vertex in S with respect to the graph distance in the CDT mesh (when the number of vertices in the CDT graph is large, we expect this to well approximate the geodesic distance).


Statistical Mechanics of Dynamical System Identification

arXiv.org Artificial Intelligence

Recovering dynamical equations from observed noisy data is the central challenge of system identification. We develop a statistical mechanical approach to analyze sparse equation discovery algorithms, which typically balance data fit and parsimony through a trial-and-error selection of hyperparameters. In this framework, statistical mechanics offers tools to analyze the interplay between complexity and fitness, in analogy to that done between entropy and energy. To establish this analogy, we define the optimization procedure as a two-level Bayesian inference problem that separates variable selection from coefficient values and enables the computation of the posterior parameter distribution in closed form. A key advantage of employing statistical mechanical concepts, such as free energy and the partition function, is in the quantification of uncertainty, especially in in the low-data limit; frequently encountered in real-world applications. As the data volume increases, our approach mirrors the thermodynamic limit, leading to distinct sparsity- and noise-induced phase transitions that delineate correct from incorrect identification. This perspective of sparse equation discovery, is versatile and can be adapted to various other equation discovery algorithms.


Offensive Lineup Analysis in Basketball with Clustering Players Based on Shooting Style and Offensive Role

arXiv.org Artificial Intelligence

In a basketball game, scoring efficiency holds significant importance due to the numerous offensive possessions per game. Enhancing scoring efficiency necessitates effective collaboration among players with diverse playing styles. In previous studies, basketball lineups have been analyzed, but their playing style compatibility has not been quantitatively examined. The purpose of this study is to analyze more specifically the impact of playing style compatibility on scoring efficiency, focusing only on offense. This study employs two methods to capture the playing styles of players on offense: shooting style clustering using tracking data, and offensive role clustering based on annotated playtypes and advanced statistics. For the former, interpretable hand-crafted shot features and Wasserstein distances between shooting style distributions were utilized. For the latter, soft clustering was applied to playtype data for the first time. Subsequently, based on the lineup information derived from these two clusterings, machine learning models Bayesian models that predict statistics representing scoring efficiency were trained and interpreted. These approaches provide insights into which combinations of five players tend to be effective and which combinations of two players tend to produce good effects.


On the stochastics of human and artificial creativity

arXiv.org Artificial Intelligence

What constitutes human creativity, and is it possible for computers to exhibit genuine creativity? We argue that achieving human-level intelligence in computers, or so-called Artificial General Intelligence, necessitates attaining also human-level creativity. We contribute to this discussion by developing a statistical representation of human creativity, incorporating prior insights from stochastic theory, psychology, philosophy, neuroscience, and chaos theory. This highlights the stochastic nature of the human creative process, which includes both a bias guided, random proposal step, and an evaluation step depending on a flexible or transformable bias structure. The acquired representation of human creativity is subsequently used to assess the creativity levels of various contemporary AI systems. Our analysis includes modern AI algorithms such as reinforcement learning, diffusion models, and large language models, addressing to what extent they measure up to human level creativity. We conclude that these technologies currently lack the capability for autonomous creative action at a human level.


Normalising Flow-based Differentiable Particle Filters

arXiv.org Artificial Intelligence

Recently, there has been a surge of interest in incorporating neural networks into particle filters, e.g. differentiable particle filters, to perform joint sequential state estimation and model learning for non-linear non-Gaussian state-space models in complex environments. Existing differentiable particle filters are mostly constructed with vanilla neural networks that do not allow density estimation. As a result, they are either restricted to a bootstrap particle filtering framework or employ predefined distribution families (e.g. Gaussian distributions), limiting their performance in more complex real-world scenarios. In this paper we present a differentiable particle filtering framework that uses (conditional) normalising flows to build its dynamic model, proposal distribution, and measurement model. This not only enables valid probability densities but also allows the proposed method to adaptively learn these modules in a flexible way, without being restricted to predefined distribution families. We derive the theoretical properties of the proposed filters and evaluate the proposed normalising flow-based differentiable particle filters' performance through a series of numerical experiments.