AITopics | singular model

singular model

PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory

Wang, Chenyang, Yang, Yun

arXiv.org Machine Learning, Apr-21-2026

We derive explicit non-asymptotic PAC-Bayes generalization bounds for Gibbs posteriors, that is, data-dependent distributions over model parameters obtained by exponentially tilting a prior with the empirical risk. Unlike classical worst-case complexity bounds based on uniform laws of large numbers, which require explicit control of the model space in terms of metric entropy (integrals), our analysis yields posterior-averaged risk bounds that can be applied to overparameterized models and adapt to the data structure and the intrinsic model complexity. The bound involves a marginal-type integral over the parameter space, which we analyze using tools from singular learning theory to obtain explicit and practically meaningful characterizations of the posterior risk. Applications to low-rank matrix completion and ReLU neural network regression and classification show that the resulting bounds are analytically tractable and substantially tighter than classical complexity-based bounds. Our results highlight the potential of PAC-Bayes analysis for precise finite-sample generalization guarantees in modern overparameterized and singular models.
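The Gibbs posterior described in the abstract can be illustrated on a finite parameter grid. The following is a minimal sketch, not the paper's construction: it assumes the tilt takes the form prior × exp(−β·n·empirical risk), and the grid, the squared-error risk, the toy data, and the names `gibbs_posterior` and `posterior_averaged_risk` are all hypothetical choices made for illustration.

```python
import math

def gibbs_posterior(prior, empirical_risks, n, beta=1.0):
    """Weights proportional to prior[i] * exp(-beta * n * risk[i]).

    The beta * n scaling is an assumption of this sketch.
    """
    log_w = [math.log(p) - beta * n * r for p, r in zip(prior, empirical_risks)]
    m = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return [x / z for x in w]

def posterior_averaged_risk(weights, risks):
    """Average of the risk under a distribution over the grid."""
    return sum(q * r for q, r in zip(weights, risks))

# Hypothetical toy problem: estimate a location parameter under squared error.
thetas = [i / 10 for i in range(-10, 11)]
data = [0.3, 0.1, 0.5, 0.2, 0.4]
risks = [sum((x - t) ** 2 for x in data) / len(data) for t in thetas]
prior = [1.0 / len(thetas)] * len(thetas)  # uniform prior on the grid

post = gibbs_posterior(prior, risks, n=len(data), beta=2.0)
print(posterior_averaged_risk(prior, risks), posterior_averaged_risk(post, risks))
```

Because the tilt down-weights high-risk parameters, the posterior-averaged empirical risk is never larger than the prior-averaged one; the PAC-Bayes bounds in the paper concern the gap between this quantity and the population risk, which this sketch does not compute.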

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2604.17219

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.85)

Observable Geometry of Singular Statistical Models

Plummer, Sean

arXiv.org Machine Learning, Apr-3-2026

Singular statistical models arise whenever different parameter values induce the same distribution, leading to non-identifiability and a breakdown of classical asymptotic theory. While existing approaches analyze these phenomena in parameter space, the resulting descriptions depend heavily on parameterization and obscure the intrinsic statistical structure of the model. In this paper, we introduce an invariant framework based on observable charts: collections of functionals of the data distribution that distinguish probability measures. These charts define local coordinate systems directly on the model space, independent of parameterization. We formalize observable completeness as the ability of such charts to detect identifiable directions, and introduce observable order to quantify higher-order distinguishability along analytic perturbations. Our main result establishes that, under mild regularity conditions, observable order provides a lower bound on the rate at which Kullback-Leibler divergence vanishes along analytic paths. This connects intrinsic geometric structure in model space to statistical distinguishability and recovers classical behavior in regular models while extending naturally to singular settings. We illustrate the framework in reduced-rank regression and Gaussian mixture models, where observable coordinates reveal both identifiable structure and singular degeneracies. These results suggest that observable charts provide a unified and parameterization-invariant language for studying singular models and offer a pathway toward intrinsic formulations of invariants such as learning coefficients.

artificial intelligence, machine learning, observable chart, (17 more...)

arXiv.org Machine Learning

2604.01267

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Thermodynamic Characterizations of Singular Bayesian Models: Specific Heat, Susceptibility, and Entropy Flow in Posterior Geometry

Plummer, Sean

arXiv.org Machine Learning, Dec-29-2025

Singular learning theory (SLT) (Watanabe, 2009, 2018) provides a rigorous asymptotic framework for Bayesian models with non-identifiable parameterizations, yet the statistical meaning of its second-order invariant, the singular fluctuation, has remained unclear. In this work, we show that singular fluctuation admits a precise and natural interpretation as a specific heat: the second derivative of the Bayesian free energy with respect to temperature. Equivalently, it measures the posterior variance of the log-likelihood observable under the tempered Gibbs posterior. We further introduce a collection of related thermodynamic quantities, including entropy flow, prior susceptibility, and cross-susceptibility, that together provide a detailed geometric diagnosis of singular posterior structure. Through extensive numerical experiments spanning discrete symmetries, boundary singularities, continuous gauge freedoms, and piecewise (ReLU) models, we demonstrate that these thermodynamic signatures cleanly distinguish singularity types, exhibit stable finite-sample behavior, and reveal phase-transition-like phenomena as temperature varies. We also show empirically that the widely used WAIC estimator (Watanabe, 2010, 2013) is exactly twice the thermodynamic specific heat at unit temperature, clarifying its robustness in singular models. Our results establish a concrete bridge between singular learning theory and statistical mechanics, providing both theoretical insight and practical diagnostics for modern Bayesian models.
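The link between a second derivative in the tempering parameter and a posterior variance can be checked numerically. The sketch below assumes a tempered posterior proportional to prior × exp(β × log-likelihood) on a finite grid and verifies the standard identity that the second β-derivative of the log partition function equals the variance of the log-likelihood under that posterior. The paper's sign and normalization conventions for free energy and specific heat are not stated in the abstract, so the toy Gaussian location model and the names `log_partition` and `tempered_variance` are illustrative assumptions only.

```python
import math

def log_partition(beta, log_prior, loglik):
    """log sum_i prior_i * exp(beta * loglik_i), via log-sum-exp."""
    terms = [lp + beta * ll for lp, ll in zip(log_prior, loglik)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def tempered_variance(beta, log_prior, loglik):
    """Variance of the log-likelihood under the tempered posterior."""
    terms = [lp + beta * ll for lp, ll in zip(log_prior, loglik)]
    m = max(terms)
    w = [math.exp(t - m) for t in terms]
    z = sum(w)
    mean = sum(wi * ll for wi, ll in zip(w, loglik)) / z
    return sum(wi * (ll - mean) ** 2 for wi, ll in zip(w, loglik)) / z

# Hypothetical toy model: unit-variance Gaussian with unknown mean on a grid.
data = [0.2, -0.1, 0.4, 0.0]
grid = [i / 20 for i in range(-40, 41)]
loglik = [sum(-0.5 * (x - t) ** 2 for x in data) for t in grid]
log_prior = [-math.log(len(grid))] * len(grid)  # uniform prior

beta, h = 1.0, 1e-3
# Central finite difference for the second derivative in beta.
second_diff = (
    log_partition(beta + h, log_prior, loglik)
    - 2.0 * log_partition(beta, log_prior, loglik)
    + log_partition(beta - h, log_prior, loglik)
) / h ** 2
var = tempered_variance(beta, log_prior, loglik)
print(second_diff, var)
```

The two printed values agree up to finite-difference error, which is the sense in which a fluctuation of the log-likelihood can be read as a curvature of the free energy.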

artificial intelligence, fluctuation, machine learning, (19 more...)

arXiv.org Machine Learning

2512.21411

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Using physics-inspired Singular Learning Theory to understand grokking & other phase transitions in modern neural networks

Lakkapragada, Anish

arXiv.org Machine Learning, Dec-5-2025

Classical statistical inference and learning theory often fail to explain the success of modern neural networks. A key reason is that these models are non-identifiable (singular), violating core assumptions behind PAC bounds and asymptotic normality. Singular learning theory (SLT), a physics-inspired framework grounded in algebraic geometry, has gained popularity for its ability to close this theory-practice gap. In this paper, we empirically study SLT in toy settings relevant to interpretability and phase transitions. First, we study the SLT free energy $\mathcal{F}_n$ by testing an Arrhenius-style rate hypothesis using both a grokking modulo-arithmetic model and Anthropic's Toy Models of Superposition. Second, we study the local learning coefficient $λ_α$ by measuring how it scales with problem difficulty across several controlled network families (polynomial regressors, low-rank linear networks, and low-rank autoencoders). Some of our experiments recover known scaling laws, while others yield meaningful deviations from theoretical expectations. Overall, our paper illustrates the many merits of SLT for understanding neural network phase transitions, and poses open research questions for the field.

neural network, phase transition, watanabe, (12 more...)

arXiv.org Machine Learning

2512.00686

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Estimation of the Learning Coefficient Using Empirical Loss

Takio, Tatsuyoshi, Suzuki, Joe

arXiv.org Machine Learning, Feb-14-2025

The learning coefficient plays a crucial role in analyzing the performance of information criteria, such as the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC), which Sumio Watanabe developed to assess model generalization ability. In regular statistical models, the learning coefficient is given by d/2, where d is the dimension of the parameter space. More generally, it is defined as the absolute value of the pole order of a zeta function derived from the Kullback-Leibler divergence and the prior distribution. However, except for specific cases such as reduced-rank regression, the learning coefficient cannot be derived in a closed form. Watanabe proposed a numerical method to estimate the learning coefficient, which Imai further refined to enhance its convergence properties. These methods utilize the asymptotic behavior of WBIC and have been shown to be statistically consistent as the sample size grows. In this paper, we propose a novel numerical estimation method that fundamentally differs from previous approaches and leverages a new quantity, "Empirical Loss," which was introduced by Watanabe. Through numerical experiments, we demonstrate that our proposed method exhibits both lower bias and lower variance compared to those of Watanabe and Imai. Additionally, we provide a theoretical analysis that elucidates why our method outperforms existing techniques and present empirical evidence that supports our findings.
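The statement that a regular model has learning coefficient d/2 can be illustrated with a two-temperature estimate in the spirit of the WBIC-based approach the abstract attributes to Watanabe. The sketch below is not the paper's Empirical Loss method. It assumes the relation E_β[nL_n] ≈ nL_n(w_0) + λ/β for the tempered posterior, so that λ can be read off from expectations at two inverse temperatures. The one-parameter Gaussian model, the deterministic toy data, the grid integration, and the function names are hypothetical.

```python
import math

def tempered_expected_nll(nll, beta):
    """Expectation of the negative log-likelihood under a tempered posterior.

    Assumes a flat prior on a uniform grid, so the grid spacing cancels.
    """
    m = min(nll)
    w = [math.exp(-beta * (v - m)) for v in nll]
    return sum(wi * v for wi, v in zip(w, nll)) / sum(w)

def learning_coefficient_estimate(nll, beta1, beta2):
    """Two-temperature estimate: difference of expectations over 1/beta1 - 1/beta2."""
    e1 = tempered_expected_nll(nll, beta1)
    e2 = tempered_expected_nll(nll, beta2)
    return (e1 - e2) / (1.0 / beta1 - 1.0 / beta2)

# Hypothetical regular model: unit-variance Gaussian with one unknown mean (d = 1).
data = [math.sin(i) for i in range(50)]
n = len(data)
grid = [i / 1000 for i in range(-3000, 3001)]
# Negative log-likelihood up to an additive constant, which cancels in the estimate.
nll = [sum(0.5 * (x - t) ** 2 for x in data) for t in grid]

lam = learning_coefficient_estimate(nll, 1.0 / math.log(n), 1.5 / math.log(n))
print(lam)  # for this regular one-parameter model the estimate should be close to d/2 = 0.5
```

In a singular model the same recipe would generally return a value below d/2, but the posterior expectations would then need a sampler rather than a one-dimensional grid.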

artificial intelligence, machine learning, variance, (15 more...)

arXiv.org Machine Learning

2502.09998

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Online Aggregation of Trajectory Predictors

Tong, Alex, Sharma, Apoorva, Veer, Sushant, Pavone, Marco, Yang, Heng

arXiv.org Artificial Intelligence, Feb-10-2025

Trajectory prediction, the task of forecasting future agent behavior from past data, is central to safe and efficient autonomous driving. A diverse set of methods (e.g., rule-based or learned with different architectures and datasets) has been proposed, yet the performance of these methods is often sensitive to the deployment environment (e.g., how well the design rules model the environment, or how accurately the test data match the training data). Building upon the principled theory of online convex optimization but also going beyond convexity and stationarity, we present a lightweight and model-agnostic method to aggregate different trajectory predictors online. We propose treating each individual trajectory predictor as an "expert" and maintaining a probability vector to mix the outputs of different experts. The key technical approach then lies in leveraging online data (the true agent behavior revealed at the next timestep) to form a convex-or-nonconvex, stationary-or-dynamic loss function whose gradient steers the probability vector towards choosing the best mixture of experts. We instantiate this method to aggregate trajectory predictors trained on different cities in the NUSCENES dataset and show that it performs just as well as, if not better than, any singular model, even when deployed on the out-of-distribution LYFT dataset.
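The idea of a probability vector over experts that is steered by the gradient of a loss on the mixed prediction can be sketched with an exponentiated-gradient update, one standard rule from online convex optimization. The abstract does not say which update the authors use, so this is a stand-in rather than their method. Predictions are reduced to scalars, and the squared-error loss, the learning rate, the expert biases, and the names `mix` and `eg_update` are hypothetical.

```python
import math

def mix(weights, predictions):
    """Mixture prediction: probability-weighted average of expert outputs."""
    return sum(w * p for w, p in zip(weights, predictions))

def eg_update(weights, predictions, truth, lr=0.5):
    """One exponentiated-gradient step on the loss (mix - truth)^2."""
    err = mix(weights, predictions) - truth
    # Gradient of the squared error with respect to each weight.
    grads = [2.0 * err * p for p in predictions]
    new = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    z = sum(new)  # renormalize so the weights stay on the probability simplex
    return [w / z for w in new]

# Hypothetical experts: each predicts the true value plus a fixed bias.
offsets = [0.5, 0.1, -0.8]
weights = [1.0 / 3.0] * 3
for t in range(200):
    truth = math.sin(0.1 * t)  # revealed only after the prediction is made
    preds = [truth + o for o in offsets]
    weights = eg_update(weights, preds, truth)

mixture_bias = abs(sum(w * o for w, o in zip(weights, offsets)))
best_single_bias = min(abs(o) for o in offsets)
print(weights, mixture_bias, best_single_bias)
```

In this toy setting the learned mixture ends up with a smaller bias than the best individual expert, because experts with opposite-signed errors can cancel; that is one way a mixture can match or beat any single predictor.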

artificial intelligence, machine learning, predictor, (16 more...)

arXiv.org Artificial Intelligence

2502.07178

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.08)
Asia > Singapore > Central Region > Singapore (0.05)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry: Transportation > Ground > Road (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)

Statistical inference for quantum singular models

Yano, Hiroshi, Maeda, Yota, Yamamoto, Naoki

arXiv.org Machine Learning, Nov-25-2024

Deep learning has seen substantial achievements, with numerical and theoretical evidence suggesting that singularities of statistical models contribute to its performance. Given this success of classical statistical models, it is natural to expect that quantum singular models will play a vital role in many quantum statistical tasks. However, while the theory of quantum statistical models in regular cases has been established, theoretical understanding of quantum singular models is still limited. To investigate the statistical properties of quantum singular models, we focus on two prominent tasks in quantum statistical inference: quantum state estimation and model selection. In particular, we base our study on classical singular learning theory and seek to extend it within the framework of Bayesian quantum state estimation. To this end, we define quantum generalization and training loss functions and give their asymptotic expansions through algebraic geometrical methods. The key idea of the proof is the introduction of a quantum analog of the likelihood function using classical shadows. Consequently, we construct an asymptotically unbiased estimator of the quantum generalization loss, the quantum widely applicable information criterion (QWAIC), as a computable model selection metric from given measurement outcomes.

singular, singular model, singularity, (14 more...)

arXiv.org Machine Learning

2411.16396

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
(4 more...)

Genre: Research Report (0.40)

Estimating the Number of Components in Finite Mixture Models via Variational Approximation

Wang, Chenyang, Yang, Yun

arXiv.org Machine Learning, Apr-25-2024

This work introduces a new method for selecting the number of components in finite mixture models (FMMs) using variational Bayes, inspired by the large-sample properties of the Evidence Lower Bound (ELBO) derived from mean-field (MF) variational approximation. Specifically, we establish matching upper and lower bounds for the ELBO without assuming conjugate priors, suggesting the consistency of model selection for FMMs based on maximizing the ELBO. As a by-product of our proof, we demonstrate that the MF approximation inherits the stable behavior (which benefits from model singularity) of the posterior distribution, which tends to eliminate the extra components under model misspecification where the number of mixture components is over-specified. This stable behavior also leads to the $n^{-1/2}$ convergence rate for parameter estimation, up to a logarithmic factor, under this model overspecification. Empirical experiments are conducted to validate our theoretical findings and to compare with other state-of-the-art methods for selecting the number of components in FMMs.

approximation, elbo, mixture model, (16 more...)

arXiv.org Machine Learning

2404.16746

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Modeling & Simulation (0.93)

Quantifying degeneracy in singular models via the learning coefficient

Lau, Edmund, Murfet, Daniel, Wei, Susan

arXiv.org Artificial Intelligence, Aug-23-2023

Deep neural networks (DNNs) are singular statistical models which exhibit complex degeneracies. In this work, we illustrate how a quantity known as the learning coefficient, introduced in singular learning theory, quantifies precisely the degree of degeneracy in deep neural networks. Importantly, we will demonstrate that degeneracy in DNNs cannot be accounted for by simply counting the number of "flat" directions. We propose a computationally scalable approximation of a localized version of the learning coefficient using stochastic gradient Langevin dynamics. To validate our approach, we demonstrate its accuracy in low-dimensional models with known theoretical values. Importantly, the local learning coefficient can correctly recover the ordering of degeneracy between various parameter regions of interest. An experiment on MNIST shows the local learning coefficient can reveal the inductive bias of stochastic optimizers for more or less degenerate critical points.
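A Langevin-based estimate of a localized learning coefficient can be sketched on one-parameter toy losses. The code below makes several assumptions that the abstract does not confirm: that the estimator has the form nβ(E[L(w)] − L(w*)) with β = 1/log n, that the sampler targets exp(−nβL(w) − (γ/2)(w − w*)²), and that full-gradient Langevin dynamics can stand in for the stochastic-gradient version. The step size, γ, the chain length, and the name `estimate_llc` are hypothetical choices.

```python
import math
import random

def estimate_llc(grad_loss, loss, w_star, n, gamma=1.0, step=2e-4,
                 num_steps=100_000, burn_in=5_000, seed=0):
    """Localized learning coefficient estimate from a Langevin chain around w_star."""
    rng = random.Random(seed)
    beta = 1.0 / math.log(n)  # assumed inverse temperature
    w = w_star
    total, count = 0.0, 0
    for t in range(num_steps):
        # Gradient of n*beta*L(w) + (gamma/2)*(w - w_star)^2.
        drift = n * beta * grad_loss(w) + gamma * (w - w_star)
        # Unadjusted Langevin step: half-step along the negative gradient plus noise.
        w = w - 0.5 * step * drift + math.sqrt(step) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            total += loss(w)
            count += 1
    return n * beta * (total / count - loss(w_star))

# Two hypothetical one-parameter losses with the same minimizer w* = 0.
llc_quadratic = estimate_llc(lambda w: 2.0 * w, lambda w: w * w, 0.0, n=1000)
llc_quartic = estimate_llc(lambda w: 4.0 * w ** 3, lambda w: w ** 4, 0.0, n=1000)
print(llc_quadratic, llc_quartic)
```

The known learning coefficients are 1/2 for w² and 1/4 for w⁴, so the quartic loss should come out as the more degenerate of the two even though both have a single parameter; the estimates are noisy and slightly biased by the step size and the localization term.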

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2308.12108

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Recent Advances in Algebraic Geometry and Bayesian Statistics

Watanabe, Sumio

arXiv.org Artificial Intelligence, Nov-18-2022

This article is a review of theoretical advances in the research field of algebraic geometry and Bayesian statistics in the last two decades. Many statistical models and learning machines which contain hierarchical structures or latent variables are called nonidentifiable, because the map from a parameter to a statistical model is not one-to-one. In nonidentifiable models, both the likelihood function and the posterior distribution have singularities in general, so it was difficult to analyze their statistical properties. However, from the end of the 20th century, new theory and methodology based on algebraic geometry have been established which enable us to investigate such models and machines in the real world. In this article, the following results in recent advances are reported. First, we explain the framework of Bayesian statistics and introduce a new perspective from birational geometry. Second, two mathematical solutions are derived based on algebraic geometry. An appropriate parameter space can be found by a resolution map, which puts the posterior distribution in normal crossing form and makes the log likelihood ratio function well-defined. Third, three applications to statistics are introduced. The posterior distribution is represented by the renormalized form, the asymptotic free energy is derived, and the universal formula among the generalization loss, the cross validation, and the information criterion is established. The two mathematical solutions and three applications to statistics based on algebraic geometry reported in this article are now being used in many practical fields in data science and artificial intelligence.

artificial intelligence, machine learning, posterior distribution, (18 more...)

arXiv.org Artificial Intelligence

2211.10049

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
