Bayesian Inference


A simple mean field model of feature learning

arXiv.org Artificial Intelligence

Feature learning (FL), where neural networks adapt their internal representations during training, remains poorly understood. Using methods from statistical physics, we derive a tractable, self-consistent mean-field (MF) theory for the Bayesian posterior of two-layer non-linear networks trained with stochastic gradient Langevin dynamics (SGLD). At infinite width, this theory reduces to kernel ridge regression, but at finite width it predicts a symmetry breaking phase transition where networks abruptly align with target functions. While the basic MF theory provides theoretical insight into the emergence of FL in the finite-width regime, semi-quantitatively predicting the onset of FL with noise or sample size, it substantially underestimates the improvements in generalisation after the transition. We trace this discrepancy to a key mechanism absent from the plain MF description: self-reinforcing input feature selection. Incorporating this mechanism into the MF theory allows us to quantitatively match the learning curves of SGLD-trained networks and provides mechanistic insight into FL.
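For readers unfamiliar with the infinite-width limit mentioned in this abstract, the following is a minimal sketch of kernel ridge regression. The RBF kernel, the `lengthscale` and `ridge` values, and the function names are illustrative assumptions; the kernel that arises in the paper is determined by the network architecture and is not specified in the abstract.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between rows of A and rows of B (assumed kernel)."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def krr_fit_predict(X_train, y_train, X_test, ridge=1e-3, lengthscale=1.0):
    """Kernel ridge regression: solve (K + ridge * I) alpha = y, then predict."""
    K = rbf_kernel(X_train, X_train, lengthscale)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train, lengthscale) @ alpha

# Toy usage: fit a smooth 1-D target and predict on the training inputs.
X = np.linspace(-1.0, 1.0, 20)[:, None]
y = np.sin(3.0 * X[:, 0])
pred = krr_fit_predict(X, y, X, ridge=1e-3, lengthscale=0.3)
```

The ridge term plays the role of a noise level: a larger `ridge` shrinks predictions toward zero, while a smaller one interpolates the training targets more closely.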


Towards Error Centric Intelligence I, Beyond Observational Learning

arXiv.org Artificial Intelligence

We argue that progress toward AGI is theory limited rather than data or scale limited. Building on the critical rationalism of Popper and Deutsch, we challenge the Platonic Representation Hypothesis. Observationally equivalent worlds can diverge under interventions, so observational adequacy alone cannot guarantee interventional competence. We begin by laying foundations, definitions of knowledge, learning, intelligence, counterfactual competence and AGI, and then analyze the limits of observational learning that motivate an error centric shift. We recast the problem as three questions about how explicit and implicit errors evolve under an agent's actions, which errors are unreachable within a fixed hypothesis space, and how conjecture and criticism expand that space. From these questions we propose Causal Mechanics, a mechanisms first program in which hypothesis space change is a first class operation and probabilistic structure is used when useful rather than presumed. We advance structural principles that make error discovery and correction tractable, including a differential Locality and Autonomy Principle for modular interventions, a gauge invariant form of Independent Causal Mechanisms for separability, and the Compositional Autonomy Principle for analogy preservation, together with actionable diagnostics. The aim is a scaffold for systems that can convert unreachable errors into reachable ones and correct them.


Geometric Convergence Analysis of Variational Inference via Bregman Divergences

arXiv.org Machine Learning

Variational Inference (VI) provides a scalable framework for Bayesian inference by optimizing the Evidence Lower Bound (ELBO), but convergence analysis remains challenging due to the objective's non-convexity and non-smoothness in Euclidean space. We establish a novel theoretical framework for analyzing VI convergence by exploiting the exponential family structure of distributions. We express negative ELBO as a Bregman divergence with respect to the log-partition function, enabling a geometric analysis of the optimization landscape. We show that this Bregman representation admits a weak monotonicity property that, while weaker than convexity, provides sufficient structure for rigorous convergence analysis. By deriving bounds on the objective function along rays in parameter space, we establish properties governed by the spectral characteristics of the Fisher information matrix. Under this geometric framework, we prove non-asymptotic convergence rates for gradient descent algorithms with both constant and diminishing step sizes.
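As background for the Bregman representation mentioned in this abstract, the sketch below checks a standard identity numerically: for two members of the same exponential family, the KL divergence equals the Bregman divergence generated by the log-partition function, with the arguments swapped. The Bernoulli family and all function names are assumptions chosen for illustration; this is not the paper's ELBO derivation.

```python
import math

def log_partition(theta):
    """Bernoulli log-partition function A(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def mean_param(theta):
    """Gradient of A: the mean parameter p = sigmoid(theta)."""
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta_a, theta_b):
    """B_A(a, b) = A(a) - A(b) - A'(b) * (a - b)."""
    return (log_partition(theta_a) - log_partition(theta_b)
            - mean_param(theta_b) * (theta_a - theta_b))

def kl_bernoulli(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) for p, q strictly inside (0, 1)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def logit(p):
    """Natural parameter of Bernoulli(p)."""
    return math.log(p / (1.0 - p))

# KL(p || q) should match B_A(theta_q, theta_p).
p, q = 0.3, 0.7
gap = abs(kl_bernoulli(p, q) - bregman(logit(q), logit(p)))
```

Here `gap` should be zero up to floating-point rounding, which illustrates why the geometry of the log-partition function is a natural tool for analysing objectives built from KL terms.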


Information Theory in Open-world Machine Learning: Foundations, Frameworks, and Future Direction

arXiv.org Machine Learning

Open world Machine Learning (OWML) aims to develop intelligent systems capable of recognizing known categories, rejecting unknown samples, and continually learning from novel information. Despite significant progress in open set recognition, novelty detection, and continual learning, the field still lacks a unified theoretical foundation that can quantify uncertainty, characterize information transfer, and explain learning adaptability in dynamic, nonstationary environments. This paper presents a comprehensive review of information theoretic approaches in open world machine learning, emphasizing how core concepts such as entropy, mutual information, and Kullback-Leibler divergence provide a mathematical language for describing knowledge acquisition, uncertainty suppression, and risk control under open world conditions. We synthesize recent studies into three major research axes: information theoretic open set recognition enabling safe rejection of unknowns, information driven novelty discovery guiding new concept formation, and information retentive continual learning ensuring stable long term adaptation. Furthermore, we discuss theoretical connections between information theory and provable learning frameworks, including PAC-Bayes bounds, open-space risk theory, and causal information flow, to establish a pathway toward provable and trustworthy open world intelligence. Finally, the review identifies key open problems and future research directions, such as the quantification of information risk, development of dynamic mutual information bounds, multimodal information fusion, and integration of information theory with causal reasoning and world model learning.


Bridging Diffusion Posterior Sampling and Monte Carlo methods: a survey

arXiv.org Artificial Intelligence

Diffusion models enable the synthesis of highly accurate samples from complex distributions and have become foundational in generative modeling. Recently, they have demonstrated significant potential for solving Bayesian inverse problems by serving as priors. This review offers a comprehensive overview of current methods that leverage pre-trained diffusion models alongside Monte Carlo methods to address Bayesian inverse problems without requiring additional training. We show that these methods primarily employ a twisting mechanism for the intermediate distributions within the diffusion process, guiding the simulations toward the posterior distribution. We describe how various Monte Carlo methods are then used to aid in sampling from these twisted distributions.


ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning

arXiv.org Artificial Intelligence

Long-horizon embodied planning is challenging because the world does not only change through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechanisms. Each causal process models the time course of a stochastic cause-effect relation. We learn these world models from limited data via variational Bayesian inference combined with LLM proposals. Across five simulated tabletop robotics environments, the learned models enable fast planning that generalizes to held-out tasks with more objects and more complex goals, outperforming a range of baselines.


EM Approaches to Nonparametric Estimation for Mixture of Linear Regressions

arXiv.org Machine Learning

In a mixture of linear regressions model, the regression coefficients are treated as random vectors that may follow either a continuous or discrete distribution. We propose two Expectation-Maximization (EM) algorithms to estimate this prior distribution. The first algorithm solves a kernelized version of the nonparametric maximum likelihood estimation (NPMLE). This method not only recovers continuous prior distributions but also accurately estimates the number of clusters when the prior is discrete. The second algorithm, designed to approximate the NPMLE, targets prior distributions with a density. It also performs well for discrete priors when combined with a post-processing step. We study the convergence properties of both algorithms and demonstrate their effectiveness through simulations and applications to real datasets.
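To make the setting of this abstract concrete, here is a minimal sketch of the classic EM algorithm for a finite mixture of linear regressions, which corresponds to a discrete prior with a fixed number of atoms. It is not either of the paper's nonparametric algorithms. The known noise level `sigma`, the fixed `n_components`, the optional `init_betas` argument, and all names are assumptions made for illustration.

```python
import numpy as np

def em_mixture_linreg(X, y, n_components=2, sigma=0.5, n_iter=100,
                      init_betas=None, seed=0):
    """EM for y_i = x_i . beta_{z_i} + N(0, sigma^2) noise with a known sigma."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if init_betas is None:
        betas = rng.normal(size=(n_components, d))
    else:
        betas = np.array(init_betas, dtype=float)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to w_k * N(y_i | x_i . beta_k, sigma^2)
        resid = y[:, None] - X @ betas.T
        log_r = np.log(np.maximum(weights, 1e-12))[None, :] - 0.5 * (resid / sigma) ** 2
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, then solve a weighted least-squares
        # problem per component (tiny ridge keeps the solve well-posed)
        weights = r.mean(axis=0)
        for k in range(n_components):
            w_k = r[:, k]
            A = X.T @ (w_k[:, None] * X) + 1e-8 * np.eye(d)
            betas[k] = np.linalg.solve(A, X.T @ (w_k * y))
    return weights, betas

# Toy usage: two regression lines with slopes +3 and -3.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 1))
true_beta = np.where(rng.random(n) < 0.5, 3.0, -3.0)
y = true_beta * X[:, 0] + 0.3 * rng.normal(size=n)
weights, betas = em_mixture_linreg(X, y, n_components=2, sigma=0.3,
                                   init_betas=[[1.0], [-1.0]])
```

Finite-mixture EM of this kind requires the number of components in advance and can be sensitive to initialisation, which is part of what motivates nonparametric estimation of the prior.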


Simplicial Gaussian Models: Representation and Inference

arXiv.org Machine Learning

Probabilistic graphical models (PGMs) are widely used in several applications, including computer vision, computational biology, and spatial statistics [2, 3, 4]. In a PGM, random variables are associated with the vertices of a graph, while edges encode statistical dependencies. The meaning of the edges depends on the graph type: Bayesian Networks capture directional dependencies through directed acyclic graphs (DAGs) [5], whereas Markov Random Fields (MRFs) model symmetric conditional dependencies with undirected graphs, thanks to the Markov property [6]. A well-studied family is Gaussian Markov Random Fields (GMRFs), i.e., MRFs that model Gaussian random variables [7]. In the Gaussian distribution, conditional dependencies are encoded by the precision matrix, which allows GMRFs to be learned from data with efficient algorithms [8]. However, PGMs are inherently limited to graphs. First, PGMs typically associate random variables with individual nodes (sets of cardinality one), while in many settings random quantities naturally relate to larger sets. Examples include data traffic in communication networks or water flows in distribution networks, where measurements are collected on the links of the networks [9, 10, 11]. Second, PGMs are restricted to modeling pairwise dependencies via edges.
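The role of the precision matrix in a GMRF can be illustrated with a small sketch. The three-node chain below and the helper name `partial_correlations` are assumptions chosen for illustration, not taken from the paper. A zero in the precision matrix marks a pair of variables that is conditionally independent given the rest, even though the same pair is marginally correlated.

```python
import numpy as np

def partial_correlations(Q):
    """Partial correlations from a precision matrix: -Q_ij / sqrt(Q_ii * Q_jj)."""
    d = np.sqrt(np.diag(Q))
    P = -Q / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

# Precision matrix of a 3-node chain graph 0 - 1 - 2 (no edge between 0 and 2).
Q = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

Sigma = np.linalg.inv(Q)       # covariance: dense, nodes 0 and 2 are correlated
P = partial_correlations(Q)    # partial correlation between 0 and 2 is zero
```

In this example `Sigma[0, 2]` is nonzero while `Q[0, 2]` is zero, so the graph structure is read from the precision matrix rather than from the covariance.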


Robust Statistics vs. Machine Learning vs. Bayesian Inference: Insights into Handling Faulty GNSS Measurements in Field Robotics

arXiv.org Artificial Intelligence

This paper presents research findings on handling faulty measurements (i.e., outliers) of global navigation satellite systems (GNSS) for vehicle localization under adverse signal conditions in field applications, where raw GNSS data are frequently corrupted due to environmental interference such as multipath, signal blockage, or non-line-of-sight conditions. In this context, we investigate three strategies applied specifically to GNSS pseudorange observations: robust statistics for error mitigation, machine learning for faulty measurement prediction, and Bayesian inference for noise distribution approximation. Since previous studies have provided limited insight into the theoretical foundations and practical evaluations of these three methodologies within a unified problem statement (i.e., state estimation using ranging sensors), we conduct extensive experiments using real-world sensor data collected in diverse urban environments. Our goal is to examine both established techniques and newly proposed methods, thereby advancing the understanding of how to handle faulty range measurements, such as GNSS, for robust, long-term vehicle localization. In addition to presenting successful results, this work highlights critical observations and open questions to motivate future research in robust state estimation.


PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference

arXiv.org Machine Learning

Amortized simulator-based inference offers a powerful framework for tackling Bayesian inference in computational fields such as engineering or neuroscience, increasingly leveraging modern generative methods like diffusion models to map observed data to model parameters or future predictions. These approaches yield posterior or posterior-predictive samples for new datasets without requiring further simulator calls after training on simulated parameter-data pairs. However, their applicability is often limited by the prior distribution(s) used to generate model parameters during this training phase. To overcome this constraint, we introduce PriorGuide, a technique specifically designed for diffusion-based amortized inference methods. PriorGuide leverages a novel guidance approximation that enables flexible adaptation of the trained diffusion model to new priors at test time, crucially without costly retraining. This allows users to readily incorporate updated information or expert knowledge post-training, enhancing the versatility of pre-trained inference models.