AITopics

Country: North America > United States (0.92)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

Neural Information Processing Systems (Jun-18-2026, 05:48:30 GMT)

Fast exact recovery of noisy matrix from few entries: the infinity norm approach

The matrix recovery (completion) problem, a central problem in data science, involves recovering a matrix A from a relatively small random set of entries. While such a task is generally impossible, it has been shown that one can recover A exactly in polynomial time, with high probability, under three basic and necessary assumptions: (1) the rank of A is very small compared to its dimensions (low rank), (2) A has delocalized singular vectors (incoherence), and (3) the sample size is sufficiently large. Various algorithms address this task, including convex optimization by Candes, Recht, and Tao (2009), alternating projection by Hardt and Wootters (2014), and low-rank approximation with gradient descent by Keshavan, Montanari, and Oh (2009, 2010). In applications, Candes and Plan (2009) noted that it is more realistic to assume noisy observations. In such cases, the above approaches provide approximate recovery with small root mean square error, which is difficult to convert into exact recovery.
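To make the recovery task above concrete, here is a minimal sketch of completing a rank-1 matrix from a subset of its entries by alternating least squares. This is a generic textbook-style heuristic chosen for illustration, not the infinity norm approach of the paper and not any of the cited algorithms; the function name `complete_rank_one`, the noiseless toy data, and the choice to hide the diagonal are all assumptions made for the example.

```python
def complete_rank_one(observed, n_rows, n_cols, iters=200):
    """Fit A ~ u v^T to the observed entries by alternating least squares.

    `observed` maps (row, col) -> value. Illustrative sketch only:
    assumes the true matrix has rank 1 and the entries are noiseless.
    """
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # With v fixed, each u[i] has a closed-form least-squares update
        # that uses only the observed entries in row i.
        for i in range(n_rows):
            num = sum(val * v[j] for (r, j), val in observed.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in observed if r == i)
            if den > 0:
                u[i] = num / den
        # Symmetric update for v with u fixed, column by column.
        for j in range(n_cols):
            num = sum(val * u[i] for (i, c), val in observed.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in observed if c == j)
            if den > 0:
                v[j] = num / den
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


# Toy rank-1 matrix A = u_true v_true^T with the diagonal entries hidden.
u_true = [1.0, 2.0, 3.0, 4.0]
v_true = [1.0, 0.5, 2.0, 1.5]
observed = {
    (i, j): u_true[i] * v_true[j]
    for i in range(4)
    for j in range(4)
    if i != j
}
estimate = complete_rank_one(observed, 4, 4)
hidden_errors = [abs(estimate[i][i] - u_true[i] * v_true[i]) for i in range(4)]
```

In this noiseless toy case the hidden entries are determined by the observed ones, so `hidden_errors` shrinks toward zero as the iterations proceed. With noisy observations a fit of this kind would in general only be approximate, which is the gap the abstract points to.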

artificial intelligence, assumption, machine learning, (19 more...)

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine Learning (May-29-2026)

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

Mehrotra, Anay, Tran, Phuc, Vu, Van H., Zampetakis, Manolis

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit-time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimate of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.
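As a reading aid for the rate quoted above (an illustrative calculation, not taken from the paper), the bound has two terms, and which one dominates depends on how $m$ compares with $n$. Ignoring the logarithmic factors hidden by $\tilde{O}$:

```latex
m \ge n
\;\implies\;
\frac{n}{m^2} \le \frac{n}{n^2} = \frac{1}{n}
\;\implies\;
\sqrt{\frac{1}{n} + \frac{n}{m^2}} \;\le\; \sqrt{\frac{2}{n}} .
```

More generally, the expression tends to zero exactly when both terms do, that is, when $n \to \infty$ and $m$ grows faster than $\sqrt{n}$.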

artificial intelligence, machine learning, probability, (19 more...)

2605.30319

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Lauditi, Clarissa, Pehlevan, Cengiz, Bordelon, Blake

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

arXiv.org Machine Learning (May-22-2026)

We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/$μ$P scaling and (2) deep linear networks in the proportional high-dimensional limit, where width, input dimension, and sample size diverge with fixed ratios. Our theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, $μ$P yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS). In contrast, NTK parameterization exhibits strongly width-dependent outlier dynamics, despite converging to a stable large-width limit. We show that this bulk+outlier picture is descriptive of simple tasks with small output channels, but that tasks involving large numbers of outputs (ImageNet classification or GPT language modeling) are better described by a restructuring of the spectral bulk. We develop a toy model with extensive output channels that recapitulates this phenomenon and show that the edge of the spectrum still converges for sufficiently wide networks.

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

2605.0787

Country: North America > United States (0.46)

Genre: Research Report (0.49)

Industry:

Telecommunications > Networks (0.40)
Information Technology > Networks (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Tang, Dier, Han, Guangyue

Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions

arXiv.org Machine Learning (May-12-2026)

This paper relaxes the restrictive symmetry conditions adopted in [4], [5] and extends their universal feature selection framework to accommodate noisy observations as well as attribute structures that may exhibit directional preferences. We introduce the notion of weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance. Under this relaxed condition, we develop a universal feature selection framework based on the singular value decomposition of the canonical dependence matrix computed from noisy data. Our main result shows that the selected features achieve asymptotically optimal error exponents up to a residual term that depends on the symmetry deviation $δ$ and the noise levels $η_1, η_2$. When $δ, η_1, η_2$ are relatively small, our result recovers that of [5], thereby demonstrating that exact spherical symmetry is unnecessary. Overall, our findings highlight the robustness of the selection framework against second-moment deviations and observation noise, thereby broadening its applicability across diverse inference tasks and providing a theoretically grounded tool for universal feature selection in practical scenarios.

artificial intelligence, machine learning, symmetry, (13 more...)

2605.09396

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Morisset, Lucas, Durmus, Alain, Hardy, Adrien

Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

arXiv.org Machine Learning (May-12-2026)

Data augmentation (DA) is now a standard ingredient in modern machine learning pipelines, with extensive empirical evidence reporting improvements in generalization across modalities and tasks (Mumuni and Mumuni, 2022; Wang et al., 2025). It is often used to encode task-relevant symmetries directly into the training procedure, for instance by encouraging invariance to image rotations or other transformations of the input (Shorten and Khoshgoftaar, 2019; Chen et al., 2020). It has also been identified as one of the most effective regularization techniques across both supervised learning settings (Bishop, 1995; Cubuk et al., 2019; Mumuni and Mumuni, 2022; Wang et al., 2025) and self-supervised/unsupervised learning (Feng et al., 2021; Van Assel et al., 2025). Domain-specific augmentation pipelines have been central to progress in computer vision (Shorten and Khoshgoftaar, 2019; Kumar et al., 2024), natural language processing (Feng et al., 2021; Shorten et al., 2021; Bayer et al., 2022), and time-series or audio applications (Wen et al., 2020; Iwana and Uchida, 2021; Iglesias et al., 2023). Despite these empirical successes, the benefits of DA remain highly task- and data-dependent, and augmentation schemes are often engineered in an ad hoc manner (Fawzi et al., 2016; Cubuk et al., 2019; Lim et al., 2019; Hataya et al., 2020). In contrast with this rich empirical literature, comprehensive theoretical analyses of DA remain relatively scarce. Two classical starting points are, first, the interpretation of additive Gaussian noise as a form of explicit (ridge-like) regularization (Bishop, 1995; Lin et al., 2024), and second, the idea that leveraging distributional invariances and group structure in the learning objective helps decrease the variance of the model without increasing its bias (Chen et al., 2020). Yet, when applied to modern and complex augmentation schemes, these works either provide only upper bounds on the generalization error (Lin et al., 2024), or require very strong assumptions on the data distribution (e.g.

machine learning, natural language, random feature regression, (18 more...)

2605.1029

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)

Qilong Gu, Arindam Banerjee

High Dimensional Structured Superposition Models

Neural Information Processing Systems (Apr-30-2026, 22:53:55 GMT)

High dimensional superposition models characterize observations using parameters which can be written as a sum of multiple component parameters, each with its own structure, e.g., a sum of low-rank and sparse matrices, a sum of sparse and rotated sparse vectors, etc. In this paper, we consider general superposition models which allow a sum of any number of component parameters, where each component structure can be characterized by any norm. We present a simple estimator for such models, give a geometric condition under which the components can be accurately estimated, characterize the sample complexity of the estimator, and give high-probability nonasymptotic bounds on the componentwise estimation error. We use tools from empirical processes and generic chaining for the statistical analysis, and our results, which substantially generalize prior work on superposition models, are in terms of Gaussian widths of suitable sets.

artificial intelligence, machine learning, sc condition, (16 more...)

Country: Europe (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Neural Information Processing Systems (Apr-26-2026, 00:23:55 GMT)

Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding

Spectral embedding is a powerful graph embedding technique that has received a lot of attention recently due to its effectiveness on Graph Transformers. However, from a theoretical perspective, the universal expressive power of spectral embedding comes at the price of losing two important invariance properties of graphs, sign and basis invariance, which also limits its effectiveness on graph data. To remedy this issue, many previous methods developed costly approaches that learn new invariants and suffer from high computational complexity. In this work, we explore a minimal approach that resolves the ambiguity issues by directly finding canonical directions for the eigenvectors, named Laplacian Canonization (LC). As a pure pre-processing method, LC is lightweight and can be applied to any existing GNNs. We provide a thorough investigation, from theory to algorithm, on this approach, and discover an efficient algorithm named Maximal Axis Projection (MAP) that works for both sign and basis invariance and successfully canonizes more than 90% of all eigenvectors. Experiments on real-world benchmark datasets like ZINC, MOLTOX21, and MOLPCBA show that MAP consistently outperforms existing methods while bringing minimal computational overhead.
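The sign ambiguity mentioned above arises because if $v$ is an eigenvector, so is $-v$, so two runs of an eigensolver may return embeddings that differ only by sign. The sketch below shows one naive way to pick a canonical sign as a pre-processing step. It is an assumed toy convention for illustration only, not the MAP algorithm from the paper; the function name `canonize_sign` and the tolerance are hypothetical.

```python
def canonize_sign(vec, tol=1e-12):
    """Return vec or -vec so that the first largest-magnitude entry is positive.

    Toy convention for illustration; this is NOT the paper's MAP algorithm.
    """
    largest = max(abs(x) for x in vec)
    if largest <= tol:
        # The zero vector has no sign to fix.
        return list(vec)
    for x in vec:
        if abs(abs(x) - largest) <= tol:
            # x is the first entry of largest magnitude; flip if it is negative.
            sign = 1.0 if x > 0 else -1.0
            return [sign * y for y in vec]
    return list(vec)


v = [0.1, -0.7, 0.3]
flipped = [-x for x in v]
canon_a = canonize_sign(v)
canon_b = canonize_sign(flipped)
```

Both `canon_a` and `canon_b` come out as the same vector, so downstream features no longer depend on which sign the solver happened to return. A limitation of this naive rule is that when several entries tie for the largest magnitude, the result depends on node ordering, so it is not a substitute for a canonization that respects graph symmetries.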

artificial intelligence, eigenvector, machine learning, (18 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems (Apr-25-2026, 05:47:45 GMT)

29d74915e1b323676bfc28f91b3c4802-Paper.pdf

artificial intelligence, machine learning, matrix, (18 more...)

Country: Europe (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)