AITopics

2606.3031

Country: North America > Canada > Ontario (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning, Jun-30-2026

I-BBS: Coordinate-Free Inference of Latent Sub-Manifolds Using Random Distance Matrix Theory

Halperin, Igor

Bogomolny, Bohigas and Schmit (BBS) found that the spectrum of the pairwise distance matrix of $N$ points sampled from a smooth $d$-dimensional manifold encodes a signature of the underlying geometry. We develop I-BBS (Inference-BBS), a coordinate-free method that identifies a low-dimensional latent sub-manifold embedded in a high-dimensional ambient space from the ambient distance matrix alone, without access to the ambient vector space itself. It therefore applies even when that space is only partly observable or undefined. We model the ambient embedding with two classes of generative noise, model-based and model-free. The noise mixes the latent signal with off-manifold components, so the eigenvalues reorganise collectively and the latent geometry cannot be read off eigenvalue by eigenvalue. We instead recover it from two integer-stable signatures that survive the noise: the multiplicity of the top non-Perron multiplet, which fixes $d$, and a parameter-free law for how the multiplet positions shrink as the noise grows. On synthetic spheres $S^1$, $S^2$ and $S^3$, these integer signatures are far more stable under noise than the continuous spectral slope, and a blind test recovers both the manifold and the noise model from a single distance matrix. Applications to neural-network representations and to the dynamic training regime are developed in two companion papers.
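To make the BBS spectral signature concrete, here is a minimal sketch for the simplest manifold the abstract mentions, the circle $S^1$. It assumes $N$ equally spaced points, which makes the chord-distance matrix circulant. Its spectrum is then a cosine transform of one row, and no eigensolver is needed. The sketch locates the Perron eigenvalue and counts the multiplicity of the next multiplet. It illustrates the spectral structure only and is not the I-BBS inference procedure. The function names are hypothetical.

```python
import math


def circle_distance_spectrum(n):
    """Eigenvalues of the chord-distance matrix of n equally spaced points on S^1.

    The matrix is symmetric and circulant, so its eigenvalues are the cosine
    transform of its first row; no general eigensolver is needed.
    """
    # Chord distance from point 0 to point j is 2*sin(pi*j/n).
    row = [2.0 * math.sin(math.pi * j / n) for j in range(n)]
    return [sum(row[j] * math.cos(2.0 * math.pi * m * j / n) for j in range(n))
            for m in range(n)]


def top_multiplet(eigs, rel_tol=1e-9):
    """Return (Perron eigenvalue, leading non-Perron eigenvalue, its multiplicity).

    Eigenvalues are ranked by absolute value; the largest is the Perron root, and
    the multiplicity counts eigenvalues numerically equal to the next one.
    """
    ordered = sorted(eigs, key=abs, reverse=True)
    perron, rest = ordered[0], ordered[1:]
    lead = rest[0]
    mult = sum(1 for v in rest if math.isclose(v, lead, rel_tol=rel_tol))
    return perron, lead, mult


if __name__ == "__main__":
    perron, lead, mult = top_multiplet(circle_distance_spectrum(64))
    print(perron, lead, mult)  # multiplicity 2 on the circle
```

On $S^1$ the leading non-Perron eigenvalue comes as a pair, and it is the only positive-free structure besides the single positive Perron root. On a noiseless sphere $S^d$, the eigenfunctions of a distance kernel are spherical harmonics, so the leading non-Perron multiplet is expected to be the degree-1 family of multiplicity $d+1$. That integer is the kind of signature the abstract says survives noise.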

artificial intelligence, machine learning, matrix

2606.29675

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Martinez-Sermeno, Flor, Jaramillo, Arturo, Van Horebeek, Johan

Adjusted Wasserstein distances for bridging empirical and true distributions with applications to MDS

arXiv.org Machine Learning, Jun-30-2026

This paper examines how metric adjustments to Multidimensional Scaling (MDS) can enhance its effectiveness as a visual tool for pattern recognition. The distance under consideration, referred to as Max-D-SW, is an adjustment of the Max-Sliced Wasserstein distance. In contrast to the original formulation, which optimizes over single unit directions, Max-D-SW aggregates contributions over orthonormal bases. This modification provides a clear numerical advantage in MDS outcomes, particularly when applied to heavy-tailed distributions. We also establish sample-complexity bounds showing that Max-D-SW remains statistically tractable, with rates comparable to those of its max-sliced counterpart. Moreover, we show that a better sample complexity for a metric does not necessarily translate into better performance when the metric is used as an input for MDS.
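The contrast between optimizing over single unit directions and aggregating over orthonormal bases can be sketched in $\mathbb{R}^2$. The sketch uses the fact that the 1-D Wasserstein-2 distance between equal-size samples matches sorted values. Max-sliced W2 is approximated by a grid search over directions. The basis-aggregated score combines the two projected distances of each orthonormal basis as a root-sum-of-squares and maximizes over bases. That aggregation rule, the grid search, and all function names are assumptions for illustration, not the paper's definition of Max-D-SW.

```python
import math
import random


def w2_1d(xs, ys):
    """Wasserstein-2 distance between two 1-D empirical measures of equal size."""
    # In one dimension the optimal coupling simply matches sorted samples.
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def project(points, direction):
    """Project 2-D points onto a unit direction."""
    return [p[0] * direction[0] + p[1] * direction[1] for p in points]


def max_sliced_w2(X, Y, n_angles=180):
    """Max-sliced W2 in R^2: worst single unit direction, by grid search."""
    best = 0.0
    for k in range(n_angles):
        t = math.pi * k / n_angles
        u = (math.cos(t), math.sin(t))
        best = max(best, w2_1d(project(X, u), project(Y, u)))
    return best


def max_basis_w2(X, Y, n_angles=180):
    """Hypothetical basis-aggregated variant in R^2.

    For each orthonormal basis {u, u_perp}, combine the two projected W2 values
    as a root-sum-of-squares, then maximize over bases. This aggregation rule is
    an assumption for illustration, not the paper's definition of Max-D-SW.
    """
    best = 0.0
    for k in range(n_angles):
        t = math.pi * k / n_angles
        u = (math.cos(t), math.sin(t))
        v = (-math.sin(t), math.cos(t))
        agg = math.sqrt(w2_1d(project(X, u), project(Y, u)) ** 2
                        + w2_1d(project(X, v), project(Y, v)) ** 2)
        best = max(best, agg)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    Y = [(x + 3.0, y + 4.0) for x, y in X]  # translate every point by (3, 4)
    print(max_sliced_w2(X, Y), max_basis_w2(X, Y))
```

For a pure translation by $(3, 4)$, the single-direction score is at most 5 and reaches it only near the translation direction on the grid. The basis aggregate equals 5 for every basis, because the squared projections of the shift onto an orthonormal basis sum to its squared length.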

artificial intelligence, machine learning, pattern recognition

2606.29665

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)

arXiv.org Machine Learning, Jun-24-2026

Asymptotic Signal Subspace Recovery in Softmax Attention Models

Truong, Lan V.

Attention mechanisms have demonstrated remarkable empirical success in identifying relevant information from large collections of tokens, yet the theoretical principles underlying this behavior remain poorly understood. We study a stylized softmax-attention model in which a query vector is learned by stochastic gradient ascent from a collection of informative and nuisance tokens. Exploiting the symmetry of the model, we derive a population objective and characterize the limiting ordinary differential equation governing the learning dynamics. Using tools from stochastic approximation and dynamical systems theory, we establish a rigorous connection between the stochastic learning algorithm and its deterministic limit. Our main result shows that, under suitable high-dimensional scaling assumptions and standard step-size conditions, the learned query converges almost surely to the one-dimensional signal subspace spanned by the latent informative direction. Equivalently, the query asymptotically recovers the latent signal up to the intrinsic sign ambiguity. These results provide a rigorous theoretical foundation for understanding attention mechanisms as signal extraction procedures in high-dimensional noisy environments and offer a dynamical-systems perspective on how attention discovers relevant information in the presence of substantial noise.
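Two objects from the abstract can be made concrete. The first is the softmax-attention read-out for a given query. The second is a sign-invariant alignment score $|\cos(q, \mu)|$, which equals 1 exactly when $q$ lies in the one-dimensional subspace spanned by the latent direction $\mu$. That is the sense of "recovery up to the intrinsic sign ambiguity." This minimal sketch does not reproduce the learning dynamics (stochastic gradient ascent on the population objective). The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)  # subtract the max before exponentiating
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Softmax attention read-out: weights softmax(<q, x_i>), output sum_i w_i x_i."""
    weights = softmax([sum(q * x for q, x in zip(query, tok)) for tok in tokens])
    dim = len(tokens[0])
    out = [sum(w * tok[k] for w, tok in zip(weights, tokens)) for k in range(dim)]
    return weights, out


def sign_invariant_alignment(q, mu):
    """|cos(q, mu)|: equals 1 iff q lies in span{mu}, whichever sign q has."""
    dot = sum(a * b for a, b in zip(q, mu))
    nq = math.sqrt(sum(a * a for a in q))
    nm = math.sqrt(sum(b * b for b in mu))
    return abs(dot) / (nq * nm)
```

A query aligned with one token concentrates the softmax weight on that token, and flipping the sign of a query leaves its alignment score unchanged.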

artificial intelligence, machine learning, vector

2606.22406

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Neural Information Processing Systems, Jun-22-2026, 23:52:32 GMT

A CLT for Polynomial GNNs on Community-Based Graphs

We consider the empirical distribution of the embeddings of a k-layer polynomial GNN on a semi-supervised node classification task and prove a central limit theorem for them.
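To illustrate the object whose empirical distribution is studied, the sketch samples a two-community stochastic block model and computes scalar node embeddings $h = \sum_{j=0}^{k} \theta_j (A/n)^j x$. The polynomial form with $A/n$ normalization, the community-indicator features $x$, and the function names are hypothetical choices for illustration. The paper's exact architecture and scaling may differ.

```python
import random


def sbm_adjacency(sizes, p_in, p_out, rng):
    """Symmetric 0/1 adjacency of a stochastic block model (no self-loops)."""
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1.0
    return adj, labels


def polynomial_gnn(adj, x, thetas):
    """Scalar embedding h = sum_j thetas[j] * (A/n)^j x (hypothetical form, k = len(thetas) - 1)."""
    n = len(x)
    power = list(x)  # (A/n)^0 x
    h = [thetas[0] * v for v in power]
    for theta in thetas[1:]:
        # One more propagation step: power <- (A/n) power.
        power = [sum(adj[i][j] * power[j] for j in range(n)) / n for i in range(n)]
        h = [hv + theta * pv for hv, pv in zip(h, power)]
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    adj, labels = sbm_adjacency([100, 100], 0.5, 0.1, rng)
    x = [1.0 if c == 0 else -1.0 for c in labels]
    h = polynomial_gnn(adj, x, [0.0, 1.0, 1.0])
    for c in (0, 1):
        vals = [v for v, lab in zip(h, labels) if lab == c]
        print(c, sum(vals) / len(vals))  # per-community mean embedding
```

Within each community the embeddings scatter around a community-specific mean, and that within-community spread is the fluctuation a central limit theorem describes.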

artificial intelligence, convergence, machine learning

Country: North America > United States > California (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Neural Information Processing Systems, Jun-22-2026, 22:13:41 GMT

Robust Regression of General ReLUs with Queries

We study the task of agnostically learning general (as opposed to homogeneous) ReLUs under the Gaussian distribution with respect to the squared loss. In the passive learning setting, recent work gave a computationally efficient algorithm that uses $\mathrm{poly}(d, 1/\epsilon)$ labeled examples and outputs a hypothesis with error $O(\mathrm{opt}) + \epsilon$, where $\mathrm{opt}$ is the squared loss of the best-fit ReLU. Here we focus on the interactive setting, where the learner has some form of query access to the labels of unlabeled examples. Our main result is the first computationally efficient learner that uses $d\,\mathrm{polylog}(1/\epsilon) + O(\min\{1/p, 1/\epsilon\})$ black-box label queries, where $p$ is the bias of the target function, and achieves error $O(\mathrm{opt}) + \epsilon$. We complement our algorithmic result by showing that its query complexity bound is qualitatively near-optimal, even ignoring computational constraints. Finally, we establish that query access is essentially necessary to improve on the label complexity of passive learning. Specifically, for pool-based active learning, any active learner requires $\Omega(d/\epsilon)$ labels, unless it draws a super-polynomial number of unlabeled examples.
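A general ReLU $x \mapsto \max(0, \langle w, x\rangle + b)$ differs from a homogeneous one by its offset $b$. This sketch reads "bias of the target function" as the probability $p = \Pr[\langle w, x\rangle + b > 0]$ that the target is active. That reading is an assumption, not taken from the paper. Under $x \sim N(0, I)$, $\langle w, x\rangle \sim N(0, \|w\|^2)$, so $p = \Phi(b/\|w\|)$. The function names are hypothetical. The learner itself is not implemented.

```python
import math
import random


def general_relu(w, b, x):
    """General (non-homogeneous) ReLU: x -> max(0, <w, x> + b)."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)


def activation_probability(w, b):
    """P[<w, x> + b > 0] for x ~ N(0, I).

    <w, x> is N(0, |w|^2), so the probability is Phi(b / |w|), written via erf.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return 0.5 * (1.0 + math.erf(b / (norm * math.sqrt(2.0))))


def empirical_squared_loss(w, b, samples):
    """Mean squared loss of the ReLU (w, b) on a list of (x, y) pairs."""
    return sum((general_relu(w, b, x) - y) ** 2 for x, y in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    w, b = [3.0, 4.0], -5.0
    xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20000)]
    frac = sum(1 for x in xs if general_relu(w, b, x) > 0) / len(xs)
    print(frac, activation_probability(w, b))  # Monte Carlo vs closed form
```

With $w = (3, 4)$ and $b = -5$, $p = \Phi(-1) \approx 0.159$, so only about one Gaussian sample in six falls where the target ReLU is nonzero.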

artificial intelligence, machine learning, polylog

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)

Neural Information Processing Systems, Jun-19-2026, 14:22:45 GMT

Continuous Diffusion Model for Language Modeling

Diffusion models have emerged as a promising alternative to autoregressive models in modeling discrete categorical data. However, diffusion models that directly work on discrete data space fail to fully exploit the power of iterative refinement, as the signals are lost during transitions between discrete states. Existing continuous diffusion models for discrete data underperform discrete methods, and the lack of a clear connection between the two approaches hinders the development of effective diffusion models for discrete data. In this work, we propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution. We establish a connection between the discrete diffusion and continuous flow on the statistical manifold, and building on this analogy, introduce a simple diffusion process that generalizes existing discrete diffusion models. We further propose a simulation-free training framework based on radial symmetry, along with a simple technique to address the high dimensionality of the manifold. Comprehensive experiments on language modeling benchmarks and other modalities show that our method outperforms existing discrete diffusion models and approaches the performance of autoregressive models. The code is available at https://github.com/harryjo97/RDLM.
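A standard way to give categorical distributions a continuous geometry is the square-root map $p \mapsto \sqrt{p}$. It sends the probability simplex onto the positive orthant of the unit sphere, where the Fisher–Rao distance is twice the great-circle distance. The sketch below shows that map, the Fisher–Rao distance, and geodesic interpolation between a one-hot token and the uniform distribution. It illustrates the kind of statistical-manifold geometry the abstract refers to. Whether the paper uses exactly this parameterization is an assumption, and its radial-symmetry training scheme is not reproduced. The function names are hypothetical.

```python
import math


def to_sphere(p):
    """Square-root map from the probability simplex to the positive orthant of the unit sphere."""
    return [math.sqrt(pi) for pi in p]


def fisher_rao_distance(p, q):
    """Fisher-Rao distance 2 * arccos(sum_i sqrt(p_i q_i)) between categorical distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against rounding just above 1


def slerp(u, v, t):
    """Point at fraction t along the great circle from unit vector u to unit vector v."""
    omega = math.acos(min(1.0, sum(a * b for a, b in zip(u, v))))
    if omega < 1e-12:  # (nearly) identical points: nothing to interpolate
        return list(u)
    s = math.sin(omega)
    return [(math.sin((1.0 - t) * omega) * a + math.sin(t * omega) * b) / s
            for a, b in zip(u, v)]


if __name__ == "__main__":
    token = [1.0, 0.0, 0.0]          # one-hot "clean" token
    uniform = [1 / 3, 1 / 3, 1 / 3]  # maximally uncertain distribution
    mid = slerp(to_sphere(token), to_sphere(uniform), 0.5)
    print([c * c for c in mid])      # squaring maps the point back to the simplex
```

Because interpolation stays on the unit sphere, squaring any intermediate point yields a valid probability vector. Two distinct one-hot tokens sit at the maximal Fisher–Rao distance $\pi$.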

diffusion model, machine learning, natural language

Country:

North America > United States (1.00)
Europe (0.92)

Genre: Research Report > Experimental Study (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Azaïs, Jean-Marc, Dalmao, Federico, De Castro, Yohann

Non-asymptotic Tail Bounds for the Kostlan–Shub–Smale Field: Tensor PCA and Spherical $k$-Spin Complexity

arXiv.org Machine Learning, Jun-17-2026

This paper builds a hierarchy of explicit, non-asymptotic tail bounds for the supremum of the Kostlan–Shub–Smale (KSS) random field on the sphere and applies it to two problems: Spiked Tensor PCA and the landscape of the spherical $k$-spin model. For Tensor PCA, we study the non-asymptotic statistical limits of estimating a rank-$R$ symmetric signal tensor of order $k\ge 3$ and dimension $d\ge 3$ from a single Gaussian observation at signal-to-noise ratio $\lambda$, using the profile maximum likelihood estimator, that is, the MLE restricted to normalized rank-$R$ tensors of coherence at least $\kappa$. Our analysis rests on a single reduction: a deterministic geometric inequality (the Tube Method) and a rank-reduction step bound the estimation error by the supremum of the canonical KSS field. The Kac–Rice formula turns this supremum into a Gaussian integral against the expected absolute characteristic polynomial of a shifted Gaussian Orthogonal Ensemble, which is in turn controlled by the four explicit tail bounds of our hierarchy (three from a Mehta–Fyodorov representation, one from a Ben Arous–Dembo–Guionnet large deviation principle). The same reduction yields two results, each with explicit constants. For estimation, a finite-$(k,d)$ error bound recovers the asymptotically optimal rate $\sqrt{d\log k}$ of Perry, Wein and Bandeira, with explicit dependence on the rank $R$ and the coherence $\kappa$. For the landscape, a two-sided non-asymptotic bracketing of the annealed complexity of the spherical $k$-spin Hamiltonian recovers the Auffinger–Ben Arous–Černý complexity function in the high-dimensional limit.
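For intuition about the estimation problem, consider the rank-one, order-3 case ($R = 1$, $k = 3$). With a unit-norm spike, the maximum likelihood estimate over unit vectors maximizes $\langle Y, x^{\otimes 3}\rangle$. The sketch simulates $Y = \lambda\, v^{\otimes 3} + W$ with i.i.d. standard Gaussian noise $W$. It approximates the maximizer by tensor power iteration from a trace-based starting point. The noise normalization, the value $\lambda = 50$, and the function names are assumptions. Power iteration is a swapped-in local heuristic, not the estimator analysed in the paper.

```python
import math
import random


def spiked_tensor(v, lam, rng):
    """Order-3 spiked tensor Y = lam * v(x)v(x)v + W with i.i.d. N(0, 1) noise W (assumed model)."""
    d = len(v)
    return [[[lam * v[i] * v[j] * v[k] + rng.gauss(0.0, 1.0)
              for k in range(d)] for j in range(d)] for i in range(d)]


def contract(Y, x):
    """Vector with entries sum_{j,k} Y[i][j][k] * x[j] * x[k]."""
    d = len(x)
    return [sum(Y[i][j][k] * x[j] * x[k] for j in range(d) for k in range(d))
            for i in range(d)]


def normalize(u):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(a * a for a in u))
    return [a / n for a in u]


def tensor_power_method(Y, iters=20):
    """Heuristic maximizer of <Y, x^{(x)3}> over unit x.

    Starts from the partial trace sum_j Y[i][j][j], which equals lam * v_i plus noise,
    then repeats x <- normalize(Y(., x, x)).
    """
    d = len(Y)
    x = normalize([sum(Y[i][j][j] for j in range(d)) for i in range(d)])
    for _ in range(iters):
        x = normalize(contract(Y, x))
    return x


if __name__ == "__main__":
    rng = random.Random(1)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(8)])
    x = tensor_power_method(spiked_tensor(v, 50.0, rng))
    print(abs(sum(a * b for a, b in zip(x, v))))  # overlap |<x, v>|
```

At this signal-to-noise ratio the overlap $|\langle \hat{x}, v\rangle|$ is close to 1. Shrinking $\lambda$ toward the noise scale degrades it, which is the regime the paper's non-asymptotic bounds quantify.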

artificial intelligence, bayesian inference, machine learning

2606.17665

Country:

North America > Canada (0.45)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.45)
Europe > France (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Neural Information Processing Systems, Jun-16-2026, 19:26:25 GMT

Tight Generalization Bounds for Large-Margin Halfspaces

We prove the first generalization bound for large-margin halfspaces that is asymptotically tight in the tradeoff between the margin, the fraction of training points with the given margin, the failure probability and the number of training points.
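Two of the quantities in this tradeoff can be computed directly. For a halfspace $w$ through the origin and labelled points $(x_i, y_i)$, the sketch defines the normalized margin of point $i$ as $y_i\langle w, x_i\rangle/(\|w\|\,\|x_i\|)$. It then computes the fraction of training points whose margin falls below a target $\gamma$. Normalizing by $\|x_i\|$, rather than assuming unit-norm data, is an assumption of this sketch. The bound itself is not implemented, and the function names are hypothetical.

```python
import math


def normalized_margins(w, X, y):
    """Margins y_i <w, x_i> / (|w| |x_i|) of a halfspace through the origin."""
    nw = math.sqrt(sum(a * a for a in w))
    out = []
    for x, label in zip(X, y):
        nx = math.sqrt(sum(a * a for a in x))
        out.append(label * sum(a * b for a, b in zip(w, x)) / (nw * nx))
    return out


def fraction_below_margin(w, X, y, gamma):
    """Fraction of training points whose normalized margin is less than gamma."""
    m = normalized_margins(w, X, y)
    return sum(1 for v in m if v < gamma) / len(m)
```

For example, with $w = (1, 0)$, points $(1,0), (1,1), (-1,0), (0,1)$ and labels $+1, +1, -1, +1$, the margins are $1, 1/\sqrt{2}, 1, 0$. One point in four is below $\gamma = 0.5$, and two are below $\gamma = 0.8$.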

artificial intelligence, machine learning, probability

Country:

North America > United States (0.67)
Europe (0.46)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Neural Information Processing Systems, Jun-15-2026, 15:53:40 GMT

Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon

We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs, a problem well motivated by the minima stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. In two natural settings, (1) the generalization gap of flat solutions and (2) the mean-squared error (MSE) of nonparametric function estimation by stable minima, we prove upper and lower bounds establishing that, while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between flat solutions and low-norm solutions (i.e., weight decay), which are known not to suffer from the curse of dimensionality. In particular, our minimax lower bound construction, based on a novel packing argument with boundary-localized ReLU neurons, reveals how flat solutions can exploit a kind of "neural shattering" in which neurons rarely activate but have high weight magnitudes. This leads to poor performance in high dimensions. We corroborate these theoretical findings with extensive numerical simulations. To the best of our knowledge, our analysis provides the first systematic explanation for why flat minima may fail to generalize in high dimensions.
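The link between flatness and gradient descent that the abstract invokes rests on a standard linear-stability fact. Near a minimum with Hessian $H$, gradient descent with step size $\eta$ can only remain at that minimum if $\lambda_{\max}(H) \le 2/\eta$, so minima that are stable under GD are flat in this sense. The one-dimensional quadratic below shows the threshold. Each step multiplies the iterate by $1 - \eta a$. The two-layer ReLU setting and the neural-shattering construction are not reproduced, and the function names are hypothetical.

```python
def gd_on_quadratic(curvature, lr, theta0=1.0, steps=100):
    """Run gradient descent on f(theta) = 0.5 * curvature * theta**2.

    The gradient is curvature * theta, so each step multiplies theta by
    (1 - lr * curvature); the iterates shrink to 0 iff |1 - lr * curvature| < 1.
    """
    theta = theta0
    for _ in range(steps):
        theta -= lr * curvature * theta
    return theta


def is_linearly_stable(curvature, lr):
    """For curvature > 0: GD converges to the minimum iff curvature < 2 / lr.

    At curvature == 2 / lr the iterates oscillate with constant amplitude.
    """
    return curvature < 2.0 / lr
```

With curvature 1, a step size of 1.5 contracts by a factor of $-0.5$ per step and converges. A step size of 2.5 multiplies by $-1.5$ and diverges, so only minima with curvature below $2/\eta$ can be reached and kept.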

artificial intelligence, machine learning, theorem 3

Country: North America > United States (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)