AITopics | statistical manifold

statistical manifold

Continuous Diffusion Model for Language Modeling

Neural Information Processing Systems, Jun-19-2026, 14:22:45 GMT

Diffusion models have emerged as a promising alternative to autoregressive models in modeling discrete categorical data. However, diffusion models that directly work on discrete data space fail to fully exploit the power of iterative refinement, as the signals are lost during transitions between discrete states. Existing continuous diffusion models for discrete data underperform compared to discrete methods, and the lack of a clear connection between the two approaches hinders the development of effective diffusion models for discrete data. In this work, we propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution. We establish a connection between the discrete diffusion and continuous flow on the statistical manifold, and building on this analogy, introduce a simple diffusion process that generalizes existing discrete diffusion models. We further propose a simulation-free training framework based on radial symmetry, along with a simple technique to address the high dimensionality of the manifold. Comprehensive experiments on language modeling benchmarks and other modalities show that our method outperforms existing discrete diffusion models and approaches the performance of autoregressive models. The code is available at https://github.com/harryjo97/RDLM.

diffusion model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (0.92)

Genre: Research Report > Experimental Study (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

Ky, Vu Khac

arXiv.org Machine Learning, Jun-18-2026

Gaussian width is a central geometric complexity measure in high-dimensional probability, compressed sensing, convex optimization, and learning theory. It quantifies the average extent of a set along random directions, thereby capturing the effective dimension of constraint sets, hypothesis classes, and descent cones. However, this notion is intrinsically Euclidean. Statistical models instead carry a natural Riemannian geometry induced by the Fisher information metric, where directions are scaled according to statistical distinguishability rather than ambient Euclidean length. We introduce Fisher width, a Fisher-geometric analogue of Gaussian width for statistical manifolds. At a parameter point $θ$, Fisher width replaces the Euclidean identity by the local metric tensor $G(θ)^{1/2}$, measuring the Gaussian width of the Fisher-rescaled set. This makes the resulting quantity sensitive to local statistical curvature and invariant under smooth reparameterizations. We develop the basic theory of Fisher width, showing that it retains key structural features of Gaussian width, including concentration, metric perturbation stability, and spectral comparison bounds with the Euclidean baseline, while also capturing anisotropic geometric effects invisible to Euclidean measures. As an application, we prove a generalization bound for Fisher-Lipschitz hypothesis classes and propose computable estimators, which we evaluate empirically on MNIST across three model classes. Fisher width is to statistical manifolds what Gaussian width is to Euclidean convex bodies. This work lays the foundation for studying complexity and learning on curved statistical manifolds.
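As a rough illustration of the construction described in this abstract, the sketch below estimates the Gaussian width $E\,\sup_{t \in T} \langle g, t \rangle$ of a finite set by Monte Carlo, then applies the same estimator to the set rescaled by $G(θ)^{1/2}$. It assumes a diagonal Fisher matrix so the square root is elementwise; the function names `gaussian_width` and `fisher_width_diag` are hypothetical and are not the estimators proposed in the paper.

```python
import math
import random


def gaussian_width(points, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max_t <g, t>] over a finite set of points."""
    rng = random.Random(seed)
    dim = len(points[0])
    total = 0.0
    for _ in range(n_samples):
        # Draw one standard Gaussian direction g.
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Largest projection of the set onto g.
        total += max(sum(gi * ti for gi, ti in zip(g, t)) for t in points)
    return total / n_samples


def fisher_width_diag(points, fisher_diag, n_samples=20000, seed=0):
    """Gaussian width of the set after rescaling by G^{1/2}.

    Assumption: G is diagonal with entries `fisher_diag`, so G^{1/2}
    is the elementwise square root.
    """
    scale = [math.sqrt(d) for d in fisher_diag]
    rescaled = [[s * ti for s, ti in zip(scale, t)] for t in points]
    return gaussian_width(rescaled, n_samples, seed)


if __name__ == "__main__":
    T = [[1.0, 0.0], [-1.0, 0.0]]
    print("Euclidean width:", gaussian_width(T))
    print("Fisher-rescaled width:", fisher_width_diag(T, [4.0, 1.0]))
```

For the two-point set above the Euclidean width is $E|g_1| = \sqrt{2/\pi}$, and a Fisher entry of 4 along the first axis doubles it, which shows how the rescaled quantity picks up anisotropy that the Euclidean width ignores.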

artificial intelligence, geometry, machine learning, (16 more...)

arXiv.org Machine Learning

2606.18306

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Information-geometric adaptive sampling for graph diffusion

Lu, Yuhui, Liu, Wenjing, Zhan, Kun

arXiv.org Machine Learning, May-4-2026

Standard diffusion models for graph generation typically rely on uniform time-stepping, an approach that overlooks the non-homogeneous dynamics of distributional evolution on complex manifolds. In this paper, we present an information-geometric framework that reinterprets the diffusion sampling trajectory as a parametric curve on a Riemannian manifold. Our key observation is that the Fisher-Rao metric provides a principled measure of the intrinsic distance. By analyzing this metric, we derive the Drift Variation Score (DVS), a geometry-aware indicator that quantifies the instantaneous rate of distributional change. Unlike prior heuristic-based adaptive samplers, our DVS solver enforces a constant informational speed on the statistical manifold, automatically maintaining a uniform rate of distributional change along the sampling trajectory. This equal arc-length strategy ensures that each discretization step contributes equally to the information speed. Theoretical analysis verifies that DVS characterizes the local stiffness of the sampling dynamics in the Fisher-Rao sense. Experimental results on molecule and social network generation show that DVS significantly improves structural fidelity and sampling efficiency. Code is at https://github.com/kunzhan/DVS
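The equal arc-length idea in this abstract can be sketched generically: given some nonnegative speed function along the sampling trajectory, choose time steps so that each step covers the same share of the total arc length. The code below is only an illustration of that principle. The `speed` callable stands in for an informational speed and is an assumption; it is not the Drift Variation Score, whose formula the abstract does not give.

```python
import bisect


def equal_arc_length_steps(speed, t0, t1, n_steps, n_grid=10000):
    """Return n_steps + 1 time points with equal arc length between neighbors.

    `speed` is a hypothetical nonnegative function of time; arc length is
    its integral, approximated here with the trapezoid rule on a fine grid.
    """
    h = (t1 - t0) / n_grid
    ts = [t0 + i * h for i in range(n_grid + 1)]
    cum = [0.0]  # cumulative arc length at each grid point
    for i in range(n_grid):
        cum.append(cum[-1] + 0.5 * h * (speed(ts[i]) + speed(ts[i + 1])))
    total = cum[-1]

    steps = [t0]
    for k in range(1, n_steps):
        target = total * k / n_steps
        # First grid index whose cumulative length reaches the target.
        j = bisect.bisect_left(cum, target)
        lo, hi = cum[j - 1], cum[j]
        w = 0.0 if hi == lo else (target - lo) / (hi - lo)
        # Linear interpolation inside the grid cell.
        steps.append(ts[j - 1] + w * (ts[j] - ts[j - 1]))
    steps.append(t1)
    return steps


if __name__ == "__main__":
    # Speed grows linearly in time, so later steps should be shorter.
    print(equal_arc_length_steps(lambda t: 2.0 * t, 0.0, 1.0, 4))
```

With speed $2t$ on $[0, 1]$ the arc length is $t^2$, so the equal-length time points are $\sqrt{k/4}$: the schedule takes large steps where the distribution changes slowly and small steps where it changes quickly.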

artificial intelligence, information-geometric adaptive sampling, machine learning, (15 more...)

arXiv.org Machine Learning

2605.0025

Country: Asia (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

01ecd39ca49ddecc5729ca996304781b-Paper-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 05:29:08 GMT

artificial intelligence, machine learning, manifold, (16 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.28)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

Gating Enables Curvature: A Geometric Expressivity Gap in Attention

Bathula, Satwik, Joshi, Anand A.

arXiv.org Machine Learning, Apr-17-2026

Multiplicative gating is widely used in neural architectures and has recently been applied to attention layers to improve performance and training stability in large language models. Despite the success of gated attention, the mathematical implications of gated attention mechanisms remain poorly understood. We study attention through the geometry of its representations by modeling outputs as mean parameters of Gaussian distributions and analyzing the induced Fisher-Rao geometry. We show that the ungated attention operator is restricted to intrinsically flat statistical manifolds due to its affine structure, while multiplicative gating enables non-flat geometries, including positively curved manifolds that are unattainable in the ungated setting. These results establish a geometric expressivity gap between ungated and gated attention. Empirically, we show that gated models exhibit higher representation curvature and improved performance on tasks requiring nonlinear decision boundaries, whereas they provide no consistent advantage on tasks with linear decision boundaries. Furthermore, we identify a structured regime in which curvature accumulates under composition, yielding a systematic depth amplification effect.

curvature, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2604.14702

Country:

North America > United States > California (0.14)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Robust Sequential Tracking via Bounded Information Geometry and Non-Parametric Field Actions

Rodriguez, Carlos C.

arXiv.org Machine Learning, Mar-17-2026

Standard sequential inference architectures are compromised by a normalizability crisis when confronted with extreme, structured outliers. By operating on unbounded parameter spaces, state-of-the-art estimators lack the intrinsic geometry required to appropriately sever anomalies, resulting in unbounded covariance inflation and mean divergence. This paper resolves this structural failure by analyzing the abstraction sequence of inference at the meta-prior level (S_2). We demonstrate that extremizing the action over an infinite-dimensional space requires a non-parametric field anchored by a pre-prior, as a uniform volume element mathematically does not exist. By utilizing strictly invariant Delta (or ν) Information Separations on the statistical manifold, we physically truncate the infinite tails of the spatial distribution. When evaluated as a Radon-Nikodym derivative against the base measure, the active parameter space compresses into a strictly finite, normalizable probability droplet. Empirical benchmarks across three domains (LiDAR maneuvering target tracking, high-frequency cryptocurrency order flow, and quantum state tomography) demonstrate that this bounded information geometry analytically truncates outliers, ensuring robust estimation without relying on infinite-tailed distributional assumptions.

artificial intelligence, machine learning, manifold, (14 more...)

arXiv.org Machine Learning

2603.13613

Country: North America > United States > New York (0.04)

Genre: Research Report (0.40)

Industry: Banking & Finance > Trading (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Categorical Flow Matching on Statistical Manifolds

Neural Information Processing Systems, Feb-15-2026, 10:40:18 GMT

We introduce Statistical Flow Matching (SFM), a novel and mathematically rigorous flow-matching framework on the manifold of parameterized probability measures inspired by the results from information geometry. We demonstrate the effectiveness of our method on the discrete generation problem by instantiating SFM on the manifold of categorical distributions whose geometric properties remain unexplored in previous discrete generative models. Utilizing the Fisher information metric, we equip the manifold with a Riemannian structure whose intrinsic geometries are effectively leveraged by following the shortest paths of geodesics. We develop an efficient training and sampling algorithm that overcomes numerical stability issues with a diffeomorphism between manifolds. Our distinctive geometric perspective of statistical manifolds allows us to apply optimal transport during training and interpret SFM as following the steepest direction of the natural gradient. Unlike previous models that rely on variational bounds for likelihood estimation, SFM enjoys the exact likelihood calculation for arbitrary probability measures. We show that SFM can learn more complex patterns on the statistical manifold where existing models often fail due to strong prior assumptions. Comprehensive experiments on real-world generative tasks ranging from image and text to biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models. Our code is available at https://github.com/ccr-cheng/
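For readers unfamiliar with the geometry mentioned here, the following is a minimal sketch of geodesics on the manifold of categorical distributions under the Fisher information metric. It uses the standard square-root map $p \mapsto \sqrt{p}$, which sends the simplex to the positive orthant of the unit sphere, so geodesics become great-circle arcs and the Fisher-Rao distance is $2 \arccos \sum_i \sqrt{p_i q_i}$. Whether this is the exact diffeomorphism used in SFM is an assumption; the function names are hypothetical and are not taken from the authors' code.

```python
import math


def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two categorical distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return 2.0 * math.acos(min(1.0, max(-1.0, bc)))


def geodesic_point(p, q, t):
    """Point at fraction t in [0, 1] along the Fisher-Rao geodesic from p to q.

    Maps both distributions to the sphere with the square-root map,
    interpolates along the great circle, then squares to map back.
    """
    u = [math.sqrt(x) for x in p]
    v = [math.sqrt(x) for x in q]
    cos_omega = min(1.0, max(-1.0, sum(a * b for a, b in zip(u, v))))
    omega = math.acos(cos_omega)
    if omega < 1e-12:
        return list(p)  # endpoints coincide
    a = math.sin((1.0 - t) * omega) / math.sin(omega)
    b = math.sin(t * omega) / math.sin(omega)
    return [(a * ui + b * vi) ** 2 for ui, vi in zip(u, v)]


if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]
    q = [0.1, 0.3, 0.6]
    mid = geodesic_point(p, q, 0.5)
    print("midpoint:", mid, "sum:", sum(mid))
    print("d(p, q):", fisher_rao_distance(p, q))
    print("d(p, mid):", fisher_rao_distance(p, mid))
```

The midpoint stays on the simplex and sits at half the total distance from each endpoint, which is the property a flow that follows geodesics relies on.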

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > New York (0.04)
North America > Canada > Quebec (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory

Tamamori, Akira

arXiv.org Machine Learning, Dec-23-2025

Recent advances using Kernel Logistic Regression (KLR) have demonstrated that learning can sculpt these landscapes to achieve capacities far exceeding classical limits [1-3]. Our previous phenomenological analysis identified a Ridge of Optimization where stability is maximized via a mechanism we termed Spectral Concentration, defined as a state where the weight spectrum exhibits a sharp hierarchy [4]. However, a deeper question remains: Why does the learning dynamics self-organize into this specific spectral state? Why does the system operate at the brink of instability? To answer these questions, we must look beyond the Euclidean geometry of the weight parameters and consider the intrinsic geometry of the probability distributions they represent. This is the domain of Information Geometry [5]. In this work, we reinterpret the KLR Hopfield network as a statistical manifold equipped with a Fisher-Rao metric.

curvature, spectral concentration, stability, (10 more...)

arXiv.org Machine Learning

2511.23083

Country: Asia > Japan (0.04)

Genre: Research Report > New Finding (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource

Zaichyk, Sofiya

arXiv.org Machine Learning, Dec-16-2025

Statistical learning under distributional drift remains insufficiently characterized: when each observation alters the data-generating law, classical generalization bounds can collapse. We introduce a new statistical primitive, the reproducibility budget $C_T$, which quantifies a system's finite capacity for statistical reproducibility - the extent to which its sampling process can remain governed by a consistent underlying distribution in the presence of both exogenous change and endogenous feedback. Formally, $C_T$ is defined as the cumulative Fisher-Rao path length of the coupled learner-environment evolution, measuring the total distributional motion accumulated during learning. From this construct we derive a drift-feedback generalization bound of order $O(T^{-1/2} + C_T/T)$, and we prove a matching minimax lower bound showing that this rate is minimax-optimal. Consequently, the results establish a reproducibility speed limit: no algorithm can achieve smaller worst-case generalization error than that imposed by the average Fisher-Rao drift rate $C_T/T$ of the data-generating process. The framework situates exogenous drift, adaptive data analysis, and performative prediction within a common geometric structure, with $C_T$ emerging as the intrinsic quantity measuring distributional motion across these settings.
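To make the quantities in this abstract concrete, the sketch below computes a discrete approximation of a cumulative Fisher-Rao path length for a drifting Bernoulli parameter and plugs it into the stated rate $T^{-1/2} + C_T/T$ with constants dropped. The Bernoulli family, the discretization of the path as a sum of pairwise distances, and the names `bernoulli_fisher_rao`, `path_length`, and `rate` are all assumptions made for illustration; the paper defines $C_T$ for the coupled learner-environment evolution, which this toy example does not model.

```python
import math


def bernoulli_fisher_rao(p, q):
    """Fisher-Rao distance between Bernoulli(p) and Bernoulli(q)."""
    bc = math.sqrt(p * q) + math.sqrt((1.0 - p) * (1.0 - q))
    # Clamp to guard against rounding slightly above 1.
    return 2.0 * math.acos(min(1.0, bc))


def path_length(ps):
    """Sum of Fisher-Rao distances between consecutive parameters."""
    return sum(bernoulli_fisher_rao(a, b) for a, b in zip(ps, ps[1:]))


def rate(T, c_T):
    """Order of the bound from the abstract, constants dropped."""
    return T ** -0.5 + c_T / T


if __name__ == "__main__":
    T = 100
    # Hypothetical drift: the success probability moves from 0.2 to 0.8.
    ps = [0.2 + 0.6 * t / T for t in range(T + 1)]
    c_T = path_length(ps)
    print("C_T:", c_T)
    print("rate with drift:", rate(T, c_T))
    print("rate without drift:", rate(T, 0.0))
```

Because the Bernoulli family is one-dimensional, a monotone drift has path length equal to the endpoint distance $2(\arcsin\sqrt{0.8} - \arcsin\sqrt{0.2})$ regardless of how finely it is discretized, while a parameter that oscillates would accumulate a larger $C_T$ for the same endpoints.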

learner, learning, trajectory, (14 more...)

arXiv.org Machine Learning

2512.13506

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Self-Improving AI Agents through Self-Play

Chojecki, Przemyslaw

arXiv.org Artificial Intelligence, Dec-3-2025

We extend the moduli-theoretic framework of psychometric batteries to the domain of dynamical systems. While previous work established the AAI capability score as a static functional on the space of agent representations, this paper formalizes the agent as a flow $ν_r$ parameterized by computational resource $r$, governed by a recursive Generator-Verifier-Updater (GVU) operator. We prove that this operator generates a vector field on the parameter manifold $Θ$, and we identify the coefficient of self-improvement $κ$ as the Lie derivative of the capability functional along this flow. The central contribution of this work is the derivation of the Variance Inequality, a spectral condition that is sufficient (under mild regularity) for the stability of self-improvement. We show that a sufficient condition for $κ > 0$ is that, up to curvature and step-size effects, the combined noise of generation and verification must be small enough. We then apply this formalism to unify the recent literature on Language Self-Play (LSP), Self-Correction, and Synthetic Data bootstrapping. We demonstrate that architectures such as STaR, SPIN, Reflexion, GANs and AlphaZero are specific topological realizations of the GVU operator that satisfy the Variance Inequality through filtration, adversarial discrimination, or grounding in formal systems.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.02731

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
