Goto

Collaborating Authors

 Bayesian Inference


JADAI: Jointly Amortizing Adaptive Design and Bayesian Inference

arXiv.org Machine Learning

We consider problems of parameter estimation where design variables can be actively optimized to maximize information gain. To this end, we introduce JADAI, a framework that jointly amortizes Bayesian adaptive design and inference by training a policy, a history network, and an inference network end-to-end. The networks minimize a generic loss that aggregates incremental reductions in posterior error along experimental sequences. Inference networks are instantiated with diffusion-based posterior estimators that can approximate high-dimensional and multimodal posteriors at every experimental step. Across standard adaptive design benchmarks, JADAI achieves superior or competitive performance.


Likelihood-Preserving Embeddings for Statistical Inference

arXiv.org Machine Learning

Modern machine learning embeddings provide powerful compression of high-dimensional data, yet they typically destroy the geometric structure required for classical likelihood-based statistical inference. This paper develops a rigorous theory of likelihood-preserving embeddings: learned representations that can replace raw data in likelihood-based workflows -- hypothesis testing, confidence interval construction, model selection -- without altering inferential conclusions. We introduce the Likelihood-Ratio Distortion metric $Δ_n$, which measures the maximum error in log-likelihood ratios induced by an embedding. Our main theoretical contribution is the Hinge Theorem, which establishes that controlling $Δ_n$ is necessary and sufficient for preserving inference. Specifically, if the distortion satisfies $Δ_n = o_p(1)$, then (i) all likelihood-ratio based tests and Bayes factors are asymptotically preserved, and (ii) surrogate maximum likelihood estimators are asymptotically equivalent to full-data MLEs. We prove an impossibility result showing that universal likelihood preservation requires essentially invertible embeddings, motivating the need for model-class-specific guarantees. We then provide a constructive framework using neural networks as approximate sufficient statistics, deriving explicit bounds connecting training loss to inferential guarantees. Experiments on Gaussian and Cauchy distributions validate the sharp phase transition predicted by exponential family theory, and applications to distributed clinical inference demonstrate practical utility.


A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue

arXiv.org Machine Learning

Across fields such as machine learning, social science, geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegative or row-sum-to-1 constraints. Although these models are to a large extend similar or even equivalent, they are presented under different names, and their similarity is not well known. This paper highlights similarities among five popular models, latent budget analysis (LBA), latent class analysis (LCA), end-member analysis (EMA), probabilistic latent semantic analysis (PLSA), and nonnegative matrix factorization (NMF). We focus on an essential issue-identifiability-of these models and prove that the solution of LBA, EMA, LCA, PLSA is unique if and only if the solution of NMF is unique. We also provide a brief review for algorithms of these models. We illustrate the models with a time budget dataset from social science, and end the paper with a discussion of closely related models such as archetypal analysis.


Thermodynamic Characterizations of Singular Bayesian Models: Specific Heat, Susceptibility, and Entropy Flow in Posterior Geometry

arXiv.org Machine Learning

Singular learning theory (SLT) \citep{watanabe2009algebraic,watanabe2018mathematical} provides a rigorous asymptotic framework for Bayesian models with non-identifiable parameterizations, yet the statistical meaning of its second-order invariant, the \emph{singular fluctuation}, has remained unclear. In this work, we show that singular fluctuation admits a precise and natural interpretation as a \emph{specific heat}: the second derivative of the Bayesian free energy with respect to temperature. Equivalently, it measures the posterior variance of the log-likelihood observable under the tempered Gibbs posterior. We further introduce a collection of related thermodynamic quantities, including entropy flow, prior susceptibility, and cross-susceptibility, that together provide a detailed geometric diagnosis of singular posterior structure. Through extensive numerical experiments spanning discrete symmetries, boundary singularities, continuous gauge freedoms, and piecewise (ReLU) models, we demonstrate that these thermodynamic signatures cleanly distinguish singularity types, exhibit stable finite-sample behavior, and reveal phase-transition--like phenomena as temperature varies. We also show empirically that the widely used WAIC estimator \citep{watanabe2010asymptotic, watanabe2013widely} is exactly twice the thermodynamic specific heat at unit temperature, clarifying its robustness in singular models.Our results establish a concrete bridge between singular learning theory and statistical mechanics, providing both theoretical insight and practical diagnostics for modern Bayesian models.




Gaussian Process Volatility Model

Neural Information Processing Systems

The prediction of time-changing variances is an important task in the modeling of financial data. Standard econometric models are often limited as they assume rigid functional relationships for the evolution of the variance. Moreover, functional parameters are usually learned by maximum likelihood, which can lead to overfitting. To address these problems we introduce GP-Vol, a novel non-parametric model for time-changing variances based on Gaussian Processes. This new model can capture highly flexible functional relationships for the variances. Furthermore, we introduce a new online algorithm for fast inference in GP-Vol. This method is much faster than current offline inference procedures and it avoids overfitting problems by following a fully Bayesian approach. Experiments with financial data show that GP-Vol performs significantly better than current standard alternatives.


Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Neural Information Processing Systems

Image captioning aims to describe visual content in natural language. As'a picture is worth a thousand words', there could be various correct descriptions for an image. However, with maximum likelihood estimation as the training objective, the captioning model is penalized whenever its prediction mismatches with the label. For instance, when the model predicts a word expressing richer semantics than the label, it will be penalized and optimized to prefer more concise expressions, referred to as . In contrast, predictions that are more concise than labels lead to . Such conflicting optimization directions could eventually result in the model generating general descriptions. In this work, we introduce Semipermeable MaxImum Likelihood Estimation (SMILE), which allows richness optimization while blocking conciseness optimization, thus encouraging the model to generate longer captions with more details. Extensive experiments on two mainstream image captioning datasets MSCOCO and Flickr30K demonstrate that SMILE significantly enhances the descriptiveness of generated captions. We further provide in-depth investigations to facilitate a better understanding of how SMILE works.


How to Turn Your Knowledge Graph Embeddings into Generative Models

Neural Information Processing Systems

Some of the most successful knowledge graph embedding (KGE) models for link prediction - CP, RESCAL, TuckER, ComplEx - can be interpreted as energy-based models. Under this perspective they are not amenable for exact maximum-likelihood estimation (MLE), sampling and struggle to integrate logical constraints. This work re-interprets the score functions of these KGEs as circuits - constrained computational graphs allowing efficient marginalisation. Then, we design two recipes to obtain efficient generative circuit models by either restricting their activations to be non-negative or squaring their outputs. Our interpretation comes with little or no loss of performance for link prediction, while the circuits framework unlocks exact learning by MLE, efficient sampling of new triples, and guarantee that logical constraints are satisfied by design.


Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels

Neural Information Processing Systems

Learning from noisy labels is an important and long-standing problem in machine learning for real applications. One of the main research lines focuses on learning a label corrector to purify potential noisy labels. However, these methods typically rely on strict assumptions and are limited to certain types of label noise. In this paper, we reformulate the label-noise problem from a generative-model perspective,, labels are generated by gradually refining an initial random guess. This new perspective immediately enables existing powerful diffusion models to seamlessly learn the stochastic generative process. Once the generative uncertainty is modeled, we can perform classification inference using maximum likelihood estimation of labels.