 Bayesian Learning


Variational predictive resampling

arXiv.org Machine Learning

Bayesian inference provides principled uncertainty quantification, but accurate posterior sampling with MCMC can be computationally prohibitive for modern applications. Variational inference (VI) offers a scalable alternative and often yields accurate predictive distributions, but cheap variational families such as mean-field (MF) can produce over-concentrated approximations that miss posterior dependence. We propose variational predictive resampling (VPR), a scalable posterior sampling method that exploits VI's predictive strength within a predictive-resampling framework to better approximate the Bayesian posterior. Given a prior-likelihood pair, VPR repeatedly imputes future observations from the current variational predictive, updates the variational approximation after each imputation, and records the parameter value implied by the completed sample. We establish conditions under which the law of the parameter returned by VPR is well defined and show that its finite-horizon approximation converges to this limit. In a tractable Gaussian location model, we show that VPR with MF variational predictives converges to the exact Bayesian posterior, whereas the optimal MF-VI approximation retains a non-vanishing asymptotic gap. Experiments on linear regression, logistic regression, and hierarchical linear mixed-effects models demonstrate that VPR substantially improves posterior uncertainty quantification and recovers posterior dependence missed by MF-VI, while remaining computationally competitive with, and often more efficient than, MCMC.
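To make the VPR loop concrete, here is a minimal sketch of it in the Gaussian location model the abstract mentions. It assumes y_i ~ N(theta, Sigma) with known Sigma, a Gaussian prior on theta, and a mean-field (MF) Gaussian variational family. It also assumes that "updating the variational approximation" means refitting the optimal MF approximation to the exact posterior of the completed sample, which is available in closed form here (same mean, coordinate variances 1 / diag(precision)). The function name, horizon, and number of runs are hypothetical choices, not the authors' implementation.

```python
import numpy as np

def vpr_gaussian_location(y, Sigma, mu0, Sigma0, horizon=1000, n_draws=4000, seed=0):
    """Hypothetical VPR sketch for y_i ~ N(theta, Sigma), theta ~ N(mu0, Sigma0),
    known Sigma, mean-field (diagonal) Gaussian variational family."""
    rng = np.random.default_rng(seed)
    Sigma_inv = np.linalg.inv(Sigma)
    # Exact posterior precision and mean given the observed data.
    prec = np.linalg.inv(Sigma0) + len(y) * Sigma_inv
    mean = np.linalg.solve(prec, np.linalg.solve(Sigma0, mu0) + Sigma_inv @ y.sum(axis=0))
    # One row per independent VPR run; every run starts from the observed-data fit.
    m = np.tile(mean, (n_draws, 1))
    for _ in range(horizon):
        # Optimal MF approximation of N(m, prec^{-1}) keeps the mean and uses
        # coordinate variances 1 / diag(prec), discarding posterior dependence.
        var_mf = 1.0 / np.diag(prec)
        # Impute one future observation per run from the MF variational predictive
        # N(m, diag(var_mf) + Sigma).
        L = np.linalg.cholesky(np.diag(var_mf) + Sigma)
        y_new = m + rng.standard_normal(m.shape) @ L.T
        # Refit on the completed sample: conjugate update followed by MF projection.
        # The projection keeps the mean, so only m and prec need to be tracked.
        prec = prec + Sigma_inv
        gain = np.linalg.solve(prec, Sigma_inv)
        m = m + (y_new - m) @ gain.T
    # Parameter value implied by each completed sample (its variational mean).
    return m

if __name__ == "__main__":
    Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
    y = np.random.default_rng(1).multivariate_normal([0.5, -0.5], Sigma, size=20)
    draws = vpr_gaussian_location(y, Sigma, np.zeros(2), 10.0 * np.eye(2))
    exact_cov = np.linalg.inv(np.eye(2) / 10.0 + 20 * np.linalg.inv(Sigma))
    print("VPR covariance:\n", np.cov(draws.T))
    print("exact posterior covariance:\n", exact_cov)
    print("MF-VI variances:", 1.0 / np.diag(np.linalg.inv(exact_cov)))
```

In this toy run the spread of the returned draws should track the exact posterior covariance, including its strong correlation, while the MF-VI variances are several times too small. The finite horizon truncates the imputation, which slightly understates the variance.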


Enhancing a Risk Model by Adding Transient Statistical Factors

arXiv.org Machine Learning

Estimating the covariance of asset returns, i.e., the risk model, is a key component of financial portfolio construction and evaluation. Most risk modeling approaches produce a factor model that decomposes asset variability into two components: one attributed to a small number of factors common to the assets, and one attributed to the idiosyncratic behavior of each asset. Investors typically obtain risk models from third-party providers; while these models are generally of high quality, they may fail to capture important information such as changing market regimes and transient factors. To overcome these limitations, we propose a systematic method based on maximum likelihood estimation to enhance an existing factor model by both refining the given model and adding new statistical factors. Our approach relies only on the observed sequence of realized returns and on the choice of two hyperparameters: the number of additional factors and the half-life parameter that determines the weights assigned to returns in the log-likelihood objective. Importantly, our methodology applies when some asset returns are missing, making it suitable for typical equity datasets. We demonstrate our approach on the Barra short-term US risk model, a high-quality risk model used in practice, for a universe of US high-capitalization equities. We show that the proposed extension captures structure in the returns that is missed by the original model.
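The objective described above can be sketched as an exponentially weighted Gaussian log-likelihood of a factor model Sigma = B F B^T + diag(d) that skips missing returns. This is a minimal sketch assuming zero-mean Gaussian returns and weights 2^(-lag / half_life) with weight 1 on the most recent period. It only evaluates the objective; the estimation of refined and additional factors is not shown, and the function names are hypothetical.

```python
import numpy as np

def half_life_weights(T, half_life):
    """Weight 1 on the most recent period, halving every `half_life` periods back."""
    lags = np.arange(T)[::-1]          # lag 0 corresponds to the last row of R
    return 0.5 ** (lags / half_life)

def weighted_factor_loglik(R, B, F, d, half_life):
    """Exponentially weighted Gaussian log-likelihood of a factor risk model.

    R: (T, n) realized returns, np.nan where an asset's return is missing.
    B: (n, k) factor loadings; F: (k, k) factor covariance;
    d: (n,) idiosyncratic variances. Returns are assumed zero-mean Gaussian.
    """
    Sigma = B @ F @ B.T + np.diag(d)   # common-factor part + idiosyncratic part
    w = half_life_weights(R.shape[0], half_life)
    total = 0.0
    for t, r_t in enumerate(R):
        obs = ~np.isnan(r_t)
        if not obs.any():
            continue
        # The marginal of a Gaussian over the observed assets uses the matching sub-block.
        S = Sigma[np.ix_(obs, obs)]
        r = r_t[obs]
        _, logdet = np.linalg.slogdet(S)
        quad = r @ np.linalg.solve(S, r)
        total += w[t] * -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + quad)
    return total
```

Handling missing returns by marginalizing to the observed sub-block keeps every period in the objective without imputing values.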


A Stable Distance Persistence Homology for Dynamic Bayesian Network Clustering

arXiv.org Machine Learning

Dynamic Bayesian networks (DBNs) are a widely used framework for modeling systems whose probabilistic structure evolves over time. Standard inference methods focus on local conditional distributions and can miss larger-scale patterns in how dependencies between variables organize and change over time. We introduce a topological approach to this problem. To each DBN we associate a time-varying graph, called a Dynamic Bayesian Graph (DBG), by assigning to each edge a strength that measures variation in its conditional dependence across parent configurations, and retaining edges whose strength exceeds a chosen threshold. We show that this construction fits within the dynamic graph framework of Kim and Mémoli, enabling the use of tools from topological data analysis. Applying persistent homology to a DBG produces a barcode, which records the merging and disappearance of connected groups of strongly dependent variables over time. We prove that this barcode is stable: small perturbations in the conditional probability tables of the DBN lead to small changes in the resulting barcode. This yields a principled and noise-resistant summary of how dependency structure evolves in a dynamic Bayesian network.
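The DBG construction (edge strength, threshold, connected groups per time slice) can be sketched as follows. The abstract does not give the strength measure. This sketch assumes a single parent per edge and uses, as a hypothetical choice, the largest total-variation distance between the child's conditional distributions across parent values. It reports only the connected groups in each slice; computing the persistence barcode in the Kim and Mémoli framework is not shown.

```python
import itertools
import numpy as np

def edge_strength(cpt):
    """Hypothetical strength of edge parent -> child.

    cpt: (n_parent_values, n_child_values) array with at least two rows; row j is
    P(child | parent = j). Strength = largest total-variation distance between two
    rows, i.e. how much the child's conditional distribution varies across parent values.
    """
    cpt = np.asarray(cpt, dtype=float)
    return max(0.5 * np.abs(a - b).sum() for a, b in itertools.combinations(cpt, 2))

def thresholded_components(n_vars, edge_cpts, threshold):
    """Keep edges with strength > threshold; return connected groups via union-find.

    edge_cpts: dict mapping (parent, child) index pairs to that edge's CPT.
    """
    root = list(range(n_vars))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]    # path halving
            i = root[i]
        return i

    for (u, v), cpt in edge_cpts.items():
        if edge_strength(cpt) > threshold:
            root[find(u)] = find(v)
    groups = {}
    for i in range(n_vars):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

if __name__ == "__main__":
    # Two hypothetical time slices over three binary variables.
    slices = [
        {(0, 1): [[0.9, 0.1], [0.1, 0.9]], (1, 2): [[0.5, 0.5], [0.5, 0.5]]},
        {(0, 1): [[0.9, 0.1], [0.1, 0.9]], (1, 2): [[0.8, 0.2], [0.2, 0.8]]},
    ]
    print([thresholded_components(3, s, 0.3) for s in slices])
```

Tracking how these groups merge or split from one slice to the next is the information the DBG barcode summarizes.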


Posterior Contraction Rates for Sparse Kolmogorov-Arnold Networks in Anisotropic Besov Spaces

arXiv.org Machine Learning

We study posterior contraction rates for sparse Bayesian Kolmogorov-Arnold networks (KANs) over anisotropic Besov spaces, providing a statistical foundation for KANs from a Bayesian point of view. We show that sparse Bayesian KANs equipped with spike-and-slab-type sparsity priors attain near-minimax posterior contraction rates. In particular, the contraction rate depends on the intrinsic anisotropic smoothness of the underlying function. Moreover, by placing a hyperprior on a single model-size parameter, the resulting posterior adapts to unknown anisotropic smoothness and still achieves the corresponding near-minimax rate. A distinctive feature of our results, compared with those for standard sparse MLP-based models, is that the KAN depth can be kept fixed: owing to the flexibility of learnable spline edge functions, the required approximation complexity is controlled through the network width, the spline-grid range and size, and parameter sparsity. Our analysis develops theoretical tools tailored to sparse spline-edge architectures, including approximation and complexity bounds for Bayesian KANs. We then extend the analysis to compositional Besov spaces and show that the contraction rates depend on the layerwise smoothness and effective dimension of the underlying compositional structure, thereby effectively avoiding the curse of dimensionality. Together, these tools and findings advance the theoretical understanding of Bayesian neural networks and provide rigorous statistical foundations for KANs.
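For readers unfamiliar with spline-edge architectures, the sketch below shows one KAN layer whose edge functions are linear combinations of spline basis functions on a fixed grid, plus a draw from a spike-and-slab prior on the coefficients. Several choices here are assumptions made for brevity: degree-1 "hat" B-splines (KANs commonly use higher-order B-splines), a uniform grid, and a Bernoulli-inclusion spike-and-slab form. The function names are hypothetical, and no posterior computation is shown.

```python
import numpy as np

def hat_basis(x, grid):
    """Degree-1 B-spline ("hat") basis on a uniform grid: (batch,) -> (batch, K).
    Inputs outside [grid[0] - h, grid[-1] + h] map to all-zero features."""
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[:, None] - grid[None, :]) / h, 0.0, None)

def kan_layer(X, coef, grid):
    """One KAN layer: y_j = sum_i phi_ij(x_i), phi_ij(x) = sum_k coef[i, j, k] * B_k(x).

    X: (batch, d_in); coef: (d_in, d_out, K). An all-zero slice coef[i, j]
    removes the edge i -> j, which is how coefficient sparsity prunes the network.
    """
    out = np.zeros((X.shape[0], coef.shape[1]))
    for i in range(X.shape[1]):
        out += hat_basis(X[:, i], grid) @ coef[i].T   # (batch, K) @ (K, d_out)
    return out

def sample_spike_and_slab(shape, incl_prob, slab_sd, rng):
    """Each coefficient is exactly 0 (spike) with prob. 1 - incl_prob,
    otherwise drawn from N(0, slab_sd^2) (slab)."""
    keep = rng.random(shape) < incl_prob
    return np.where(keep, rng.normal(0.0, slab_sd, size=shape), 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(-1.0, 1.0, 9)                       # spline-grid range and size
    coef = sample_spike_and_slab((3, 4, grid.size), 0.2, 1.0, rng)
    print(kan_layer(rng.uniform(-1, 1, size=(5, 3)), coef, grid).shape)
```

The width (d_out), grid, and number of nonzero coefficients are the knobs the abstract says control approximation complexity at fixed depth.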


Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

arXiv.org Machine Learning

Approximate Bayesian inference typically revolves around computing the posterior distribution over parameters. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose to bypass the parameter posterior and directly approximate the posterior predictive distribution. We draw inspiration from self-training in self-supervised and semi-supervised learning: we quantify a Bayesian model's predictive uncertainty by refitting it on self-predicted data. The idea is strikingly simple: if a model assigns high likelihood to self-predicted data, these predictions have low uncertainty, and vice versa. This yields a deterministic, sampling-free approximation of the posterior predictive. The modular structure of our Self-Supervised Laplace Approximation (SSLA) further allows us to plug in different prior specifications, enabling classical Bayesian sensitivity analysis with respect to the choice of prior. To avoid expensive refitting, we further introduce an approximate version of SSLA, called ASSLA. We study (A)SSLA both theoretically and empirically in regression models ranging from Bayesian linear models to Bayesian neural networks. Across a wide array of regression tasks with simulated and real-world datasets, our methods outperform classical Laplace approximations in predictive calibration while remaining computationally efficient.
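The abstract does not give SSLA's refitting rule, so the sketch below only illustrates the stated intuition in the conjugate case, conjugate Bayesian linear regression, where everything is closed form. It is not SSLA itself. Refitting with the self-predicted pair (x*, y_hat) leaves the posterior mean unchanged, while the log-determinant of the posterior precision grows by log(1 + v / sigma^2), where v = x*^T A^{-1} x* is the epistemic predictive variance. The predictive density evaluated at its own prediction is (2 pi (sigma^2 + v))^(-1/2), so high self-likelihood corresponds to low predictive uncertainty. The function names are hypothetical.

```python
import numpy as np

def posterior(X, y, noise_var, prior_var):
    """Conjugate Bayesian linear regression with prior N(0, prior_var * I):
    returns the posterior mean and posterior precision A."""
    d = X.shape[1]
    A = X.T @ X / noise_var + np.eye(d) / prior_var
    mean = np.linalg.solve(A, X.T @ y / noise_var)
    return mean, A

def self_refit_diagnostics(X, y, x_star, noise_var, prior_var):
    """Refit on the model's own prediction at x_star and report what changes."""
    mean, A = posterior(X, y, noise_var, prior_var)
    y_hat = x_star @ mean                         # self-predicted label
    v = x_star @ np.linalg.solve(A, x_star)       # epistemic predictive variance
    mean2, A2 = posterior(np.vstack([X, x_star]), np.append(y, y_hat), noise_var, prior_var)
    shift = mean2 - mean                          # zero: the self-prediction is already fit
    logdet_gain = np.linalg.slogdet(A2)[1] - np.linalg.slogdet(A)[1]
    # Log predictive density at the model's own prediction.
    self_loglik = -0.5 * np.log(2 * np.pi * (noise_var + v))
    return y_hat, v, shift, logdet_gain, self_loglik
```

By the matrix determinant lemma, logdet_gain equals log1p(v / noise_var) exactly, so in this linear-Gaussian case the self-refit carries the same information as the predictive variance.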


Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs

arXiv.org Machine Learning

Unlike a standard PINN, which produces an approximate solution by minimizing a PDE-residual loss and thus yields only a point estimate that fails to quantify the uncertainty induced by noisy or limited data, a Bayesian PINN returns a full posterior distribution over solutions by combining the uncertain information from the likelihood (data) and the prior.

Deep neural networks (DNNs) or multi-layer perceptrons (MLPs) offer various inherent advantages over traditional approaches of scientific computing and data analysis, such as finite element methods, wavelets and kernel methods, which are often hampered by irregular and nonlinear data structures and high input dimensions. In contrast, DNNs are capable of approximating a rich class of functions with the aforementioned complexities and can also easily encode additional complex physical structures, such as symmetry and other invariant structures. Bayesian neural networks, originating in the seminal works of MacKay (MacKay, 1995) and Neal (Neal, 1995), have been extensively studied over the past three decades (Lampinen & Vehtari, 2001; Titterington, 2004; Graves, …
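The way a Bayesian PINN combines a residual likelihood, a data likelihood, and a prior can be shown in closed form if the neural network is swapped for a fixed basis. This sketch does exactly that, so it is a simplification and not a neural network. It uses a sine basis u(x) = sum_k theta_k sin(k pi x) for the 1-D Poisson problem -u'' = f on (0, 1) with u(0) = u(1) = 0. It assumes Gaussian noise on the PDE residual at collocation points, Gaussian noise on the observations, and a Gaussian prior on theta. With these assumptions the posterior over theta, and hence over solutions, is Gaussian. The function name and noise levels are hypothetical.

```python
import numpy as np

def bayes_pinn_linear(n_basis, x_col, f_col, x_obs, u_obs, sig_res, sig_obs, sig_prior):
    """Gaussian posterior over theta for u(x) = sum_k theta_k sin(k pi x).

    Residual likelihood: -u''(x_c) ~ N(f(x_c), sig_res^2) at collocation points.
    Data likelihood:     u_obs     ~ N(u(x_obs), sig_obs^2).
    Prior:               theta     ~ N(0, sig_prior^2 I).
    Returns the posterior mean and covariance of theta.
    """
    k = np.arange(1, n_basis + 1)
    Phi_obs = np.sin(np.pi * np.outer(x_obs, k))                      # u at data points
    Phi_res = (np.pi * k) ** 2 * np.sin(np.pi * np.outer(x_col, k))   # -u'' at collocation points
    prec = (Phi_res.T @ Phi_res / sig_res**2
            + Phi_obs.T @ Phi_obs / sig_obs**2
            + np.eye(n_basis) / sig_prior**2)
    rhs = Phi_res.T @ f_col / sig_res**2 + Phi_obs.T @ u_obs / sig_obs**2
    cov = np.linalg.inv(prec)
    return cov @ rhs, cov

if __name__ == "__main__":
    x_col = np.linspace(0.05, 0.95, 19)
    f_col = np.pi**2 * np.sin(np.pi * x_col)      # exact solution: u(x) = sin(pi x)
    x_obs = np.array([0.25, 0.5, 0.75])
    mean, cov = bayes_pinn_linear(5, x_col, f_col, x_obs, np.sin(np.pi * x_obs), 0.1, 0.01, 10.0)
    print(mean, np.sqrt(np.diag(cov)))
```

Unlike a point-estimate PINN, the returned covariance quantifies how much the residual and data constraints pin down each basis coefficient.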


Empirical Bayes 1-bit matrix completion

arXiv.org Machine Learning

Matrix completion is a fundamental problem in machine learning, where the objective is to recover missing entries of a partially observed matrix. A prominent example is the Netflix Prize (Bennett and Lanning, 2007), which involved predicting a matrix of movie ratings by users for recommendation purposes. Beyond recommendation, matrix completion has recently found applications in causal inference for panel data (Athey et al., 2021). A standard assumption in matrix completion is that the underlying matrix is approximately low-rank, reflecting a few latent factors that govern interactions between rows and columns. A substantial body of work has established theoretical guarantees and developed efficient algorithms for matrix completion (Cai, Candès and Shen, 2010; Candès and Recht, 2008; Keshavan, Montanari, and Oh, 2010; Mazumder, Hastie and Tibshirani, 2010; Recht, 2011), predominantly focusing on cases where the observed entries are continuous-valued. In many applications, however, observations are not continuous-valued but binary.
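A common way to formalize binary ("1-bit") observations of a low-rank matrix is a logistic link, P(Y_ij = 1) = sigmoid(M_ij) with M = U V^T. The sketch below assumes that observation model and fits M by maximum a posteriori gradient ascent with Gaussian priors on the factors. It is a simple baseline for illustration only. The empirical Bayes step of estimating prior hyperparameters from the data is not shown, and the function names and step sizes are hypothetical.

```python
import numpy as np

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -np.logaddexp(0.0, -z)

def loglik_1bit(M, Y, mask):
    """Logistic 1-bit log-likelihood over observed entries (mask == True), Y in {0, 1}."""
    ll = Y * log_sigmoid(M) + (1 - Y) * log_sigmoid(-M)   # log(1 - sigmoid(M)) = log_sigmoid(-M)
    return ll[mask].sum()

def fit_1bit_map(Y, mask, rank, lam=0.1, lr=0.01, n_iter=1000, seed=0):
    """MAP estimate of M = U V^T under N(0, 1/lam) priors on the entries of U and V."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(n_iter):
        M = U @ V.T
        # Gradient of the observed-entry log-likelihood w.r.t. M.
        G = np.where(mask, Y - np.exp(log_sigmoid(M)), 0.0)
        # Simultaneous gradient-ascent step on the log-posterior for U and V.
        U, V = U + lr * (G @ V - lam * U), V + lr * (G.T @ U - lam * V)
    return U @ V.T
```

Only observed cells contribute to the gradient, so the low-rank factorization is what lets the fit generalize to the missing entries.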


Extended Wasserstein-GAN Approach to Causal Distribution Learning: Density-Free Estimation and Minimax Optimality

arXiv.org Machine Learning

Distributional causal inference requires estimating not only average treatment effects but also interventional outcome distributions, including quantiles, tail risks, and policy-dependent uncertainty. Generative adversarial network (GAN)-based counterfactual methods are flexible tools for this task, but they have several limitations. First, the objectives of some of these techniques do not coincide with the statistical risk of the identifiable causal target, so they offer limited theoretical guarantees about the estimable counterfactual distributions or about optimality. Second, they tend to rely on unstable density-based methods, such as density ratio estimation. In this paper, we propose GANICE (GAN for Interventional Conditional Estimation), which (i) makes the conditional interventional distribution for each treatment-covariate state the explicit causal estimation target; (ii) estimates this conditional distribution by minimizing its averaged Wasserstein risk; and (iii) attains minimax optimality. GANICE achieves these properties through the extended Wasserstein distance, a cellwise critic in its dual formulation, and an optimality proof based on Besov space theory. Our experiments demonstrate that GANICE consistently outperforms existing methods.
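The "averaged Wasserstein risk" target can be made concrete for discrete treatment-covariate cells and 1-D outcomes. In that setting, the Wasserstein-1 distance between two equal-size empirical samples is the mean absolute gap between their sorted values. This sketch averages that distance over cells with given weights. It assumes each cell's observed outcomes are draws from that cell's interventional distribution (e.g., no unmeasured confounding). It does not reproduce GANICE's extended Wasserstein distance, cellwise critic, or adversarial training, and the function names are hypothetical.

```python
import numpy as np

def w1_empirical(a, b):
    """1-D Wasserstein-1 distance between two equal-size empirical samples:
    mean absolute difference of sorted values (i.e. of matched quantiles)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return np.abs(a - b).mean()

def averaged_w1_risk(observed, generated, cell_probs):
    """Cell-weighted average of per-cell W1 distances.

    observed / generated: dict mapping a (treatment, covariate) cell to a 1-D sample.
    cell_probs: dict mapping the same cells to weights that sum to 1.
    """
    return sum(p * w1_empirical(observed[c], generated[c]) for c, p in cell_probs.items())
```

A generator that minimizes this quantity matches each cell's whole outcome distribution rather than only its mean, which is what makes quantiles and tail risks estimable.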


Uncertainty in Physics and AI: Taxonomy, Quantification, and Validation

arXiv.org Machine Learning

Reliable uncertainty quantification is essential for the use of machine learning in physics, where scientific discoveries depend on validated probabilistic statements. We provide a structured overview of uncertainty quantification in ML for physics, introducing a unified taxonomy of uncertainty and clarifying the interpretation of predictive and inference uncertainties across frequentist and Bayesian frameworks. We discuss principled validation tools, including coverage, calibration, bias tests, and proper scoring rules, and illustrate them with simple regression and classification examples.
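Two of the validation tools named above, coverage and proper scoring rules, can be illustrated for Gaussian predictive distributions. This sketch computes the empirical coverage of central prediction intervals and the closed-form continuous ranked probability score (CRPS) of a Gaussian forecast. It assumes Gaussian predictive distributions, and the function names are hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_coverage(y, mu, sigma, z=1.959963984540054):
    """Fraction of targets inside the central interval mu +/- z * sigma (default ~95%)."""
    y, mu, sigma = map(np.asarray, (y, mu, sigma))
    return float(np.mean(np.abs(y - mu) <= z * sigma))

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at observation y (a proper scoring rule; lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

if __name__ == "__main__":
    y = np.random.default_rng(0).standard_normal(5000)
    for s in (1.0, 0.5):   # well-specified vs. overconfident predictive width
        print(s, interval_coverage(y, 0.0, s), np.mean([crps_gaussian(v, 0.0, s) for v in y]))
```

An overconfident model shows up in both checks: its nominal 95% intervals under-cover, and its average CRPS is worse than that of the well-specified model.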


A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables

arXiv.org Machine Learning

Constraint-based causal discovery is widely used for learning causal structures, but heavy reliance on conditional independence (CI) testing makes it computationally expensive in high-dimensional settings. To mitigate this limitation, many divide-and-conquer frameworks have been proposed, but most assume causal sufficiency, i.e., no latent variables. In this paper, we show that divide-and-conquer strategies can be theoretically generalized beyond causal sufficiency to settings with latent variables. Specifically, we propose a recursive decomposition framework, termed DiCoLa, that enables divide-and-conquer causal discovery in the presence of latent variables. It recursively decomposes the global learning task into smaller subproblems and integrates their solutions through a principled reconstruction step to recover the global structure. We theoretically establish the soundness and completeness of the proposed framework. Extensive experiments on synthetic data demonstrate that our approach significantly improves computational efficiency across a range of causal discovery algorithms, while experiments on a real-world dataset further illustrate its practical effectiveness.
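The cost that divide-and-conquer frameworks try to reduce is the number of conditional independence (CI) tests. One common CI test for continuous data is the Fisher-z test on partial correlations, sketched below under a joint-Gaussian assumption. It is a generic building block shown for illustration, not part of DiCoLa, and the function name is hypothetical.

```python
import math
import numpy as np

def fisher_z_ci_test(data, i, j, cond=()):
    """Fisher-z test of X_i independent of X_j given X_cond, assuming jointly Gaussian data.

    data: (n, p) array; i, j: column indices; cond: tuple of conditioning columns.
    Returns the two-sided p-value (small values suggest dependence).
    """
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation of i, j given cond
    r = min(max(r, -0.999999), 0.999999)                    # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(data.shape[0] - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Each test inverts a correlation matrix whose size grows with the conditioning set, and constraint-based methods may call it for many variable pairs and conditioning sets. Splitting the variables into smaller subproblems shrinks both of these.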