
Collaborating Authors: krzakala


Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

Yaakoubi, Chiheb, Louart, Cosme, Tiomoko, Malik, Liao, Zhenyu

arXiv.org Machine Learning

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat\theta}$ and covariance $C_{\hat\theta}$ of the ERM estimator $\hat\theta$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat\theta^\top x$ approximately follows the convolution of the (generally non-Gaussian) distribution of $\mu_{\hat\theta}^\top x$ with an independent centered Gaussian variable of variance $\text{Tr}(C_{\hat\theta}\mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat\theta}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.
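
The convolution claim above implies, in particular, a variance decomposition for the test projection. The sketch below is a minimal numerical check of that decomposition on a toy ridge-regularized least-squares ERM with a Rademacher (non-Gaussian) design; the loss, teacher, dimensions, and regularization strength are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (illustrative assumptions, not the paper's experiments):
# ridge ERM on Rademacher covariates, checking that the variance of the test
# projection hat_theta^T x splits into Var(mu_theta^T x) + Tr(C_theta E[xx^T]).
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, reps = 400, 200, 0.1, 200
theta_star = rng.standard_normal(d) / np.sqrt(d)       # assumed linear teacher

thetas = []
for _ in range(reps):
    X = rng.choice([-1.0, 1.0], size=(n, d))           # non-Gaussian design
    y = X @ theta_star + 0.5 * rng.standard_normal(n)
    # closed-form ridge ERM: argmin (1/2n)||y - X theta||^2 + (lam/2)||theta||^2
    thetas.append(np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y))
thetas = np.array(thetas)

mu_theta = thetas.mean(axis=0)                         # empirical mean of hat_theta
C_theta = np.cov(thetas, rowvar=False)                 # empirical covariance of hat_theta
var_pred = np.trace(C_theta)                           # E[xx^T] = I for Rademacher x

x_test = rng.choice([-1.0, 1.0], size=(5000, d))       # fresh test covariates
proj = x_test @ thetas.T                               # hat_theta^T x over (x, training set)
var_emp = proj.var() - (x_test @ mu_theta).var()
print(f"predicted Gaussian variance {var_pred:.4f} vs empirical excess {var_emp:.4f}")
```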


A Noise Sensitivity Exponent Controls Large Statistical-to-Computational Gaps in Single- and Multi-Index Models

Defilippis, Leonardo, Krzakala, Florent, Loureiro, Bruno, Maillard, Antoine

arXiv.org Machine Learning

Understanding when learning is statistically possible yet computationally hard is a central challenge in high-dimensional statistics. In this work, we investigate this question in the context of single- and multi-index models, classes of functions widely studied as benchmarks to probe the ability of machine learning methods to discover features in high-dimensional data. Our main contribution is to show that a Noise Sensitivity Exponent (NSE) - a simple quantity determined by the activation function - governs the existence and magnitude of statistical-to-computational gaps within a broad regime of these models. We first establish that, in single-index models with large additive noise, the onset of a computational bottleneck is fully characterized by the NSE. We then demonstrate that the same exponent controls a statistical-computational gap in the specialization transition of large separable multi-index models, where individual components become learnable. Finally, in hierarchical multi-index models, we show that the NSE governs the optimal computational rate in which different directions are sequentially learned. Taken together, our results identify the NSE as a unifying property linking noise robustness, computational hardness, and feature specialization in high-dimensional learning.
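
The NSE itself is defined in the paper. A closely related, activation-determined quantity that is easy to compute is the index of the first nonvanishing Hermite coefficient of the activation (the classical information exponent); the sketch below computes it by Gauss-Hermite quadrature purely as an illustration of "a simple quantity determined by the activation function", and is not claimed to coincide with the paper's NSE.

```python
# Illustrative sketch: first nonzero Hermite coefficient index of an activation,
# computed by Gauss-Hermite quadrature. This is a standard activation-determined
# exponent; the paper's NSE may be defined differently.
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coeffs(sigma, kmax=8, npts=60):
    """Probabilists' Hermite coefficients c_k = E[sigma(g) He_k(g)] / k!, g ~ N(0,1)."""
    x, w = hermegauss(npts)
    w = w / w.sum()                               # normalize weights to a N(0,1) average
    vals = sigma(x)
    return np.array([np.sum(w * vals * hermeval(x, [0] * k + [1])) / factorial(k)
                     for k in range(kmax + 1)])

for name, sigma in [("relu", lambda x: np.maximum(x, 0.0)),
                    ("He_3 = x^3 - 3x", lambda x: x**3 - 3 * x)]:
    c = hermite_coeffs(sigma)
    k_star = next(k for k in range(1, len(c)) if abs(c[k]) > 1e-8)
    print(f"{name}: first nonzero Hermite index k* = {k_star}")
```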


All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation

Neural Information Processing Systems

Similarly, the ISOMAP face database consists of images (256 levels of gray) of size $64 \times 64$, i.e., vectors in $\mathbb{R}^{4096}$, whereas the correct intrinsic dimension is only 3 (for the vertical and horizontal pose and the lighting direction). The second approach is an average-case approach (in the spirit of the statistical mechanics treatment of high-dimensional systems) that models feature vectors by a random ensemble, taken as a set of random vectors with independent and identically distributed (i.i.d.) components and a small but fixed fraction of non-zero components.



Optimal scaling laws in learning hierarchical multi-index models

Defilippis, Leonardo, Krzakala, Florent, Loureiro, Bruno, Maillard, Antoine

arXiv.org Machine Learning

In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are sequentially learned through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned statistically optimally, using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.
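
As a toy illustration of the "target-agnostic spectral estimator" idea, the sketch below recovers the hidden direction of a single-index target with a purely even link from the top eigenvector of a label-weighted covariance matrix; the paper's estimator and its hierarchical multi-index setting are more general, so this is only a sketch of the spectral principle under assumed dimensions and link function.

```python
# Hedged sketch of a generic spectral direction estimator (not the paper's exact
# construction): top eigenvector of the label-weighted covariance
# (1/n) sum_i y_i x_i x_i^T for a single-index target with an even link,
# where plain PCA on X carries no signal.
import numpy as np

rng = np.random.default_rng(0)
d, n = 400, 10_000                              # assumed dimensions
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

X = rng.standard_normal((n, d))
y = (X @ w_star) ** 2 - 1.0                     # assumed even link (He_2)

M = (X * y[:, None]).T @ X / n                  # label-weighted covariance
_, eigvecs = np.linalg.eigh(M)
w_hat = eigvecs[:, -1]                          # top eigenvector as direction estimate

print(f"overlap |<w_hat, w*>| = {abs(w_hat @ w_star):.3f}")
```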



On the phase diagram of extensive-rank symmetric matrix denoising beyond rotational invariance

Barbier, Jean, Camilli, Francesco, Ko, Justin, Okajima, Koki

arXiv.org Artificial Intelligence

Matrix denoising is central to signal processing and machine learning. Its analysis when the matrix to infer has a factorised structure with a rank growing proportionally to its dimension remains a challenge, except when it is rotationally invariant. In this case the information theoretic limits and a Bayes-optimal denoising algorithm, called rotational invariant estimator [1,2], are known. Beyond this setting few results can be found. The reason is that the model is not a usual spin system because of the growing rank dimension, nor a matrix model due to the lack of rotation symmetry, but rather a hybrid between the two. In this paper we make progress towards the understanding of Bayesian matrix denoising when the hidden signal is a factored matrix $XX^\intercal$ that is not rotationally invariant. Monte Carlo simulations suggest the existence of a denoising-factorisation transition separating a phase where denoising using the rotational invariant estimator remains Bayes-optimal due to universality properties of the same nature as in random matrix theory, from one where universality breaks down and better denoising is possible by exploiting the signal's prior and factorised structure, though algorithmically hard. We also argue that it is only beyond the transition that factorisation, i.e., estimating $X$ itself, becomes possible up to sign and permutation ambiguities. On the theoretical side, we combine mean-field techniques in an interpretable multiscale fashion in order to access the minimum mean-square error and mutual information. Interestingly, our alternative method yields equations which can be reproduced using the replica approach of [3]. Using numerical insights, we then delimit the portion of the phase diagram where this mean-field theory is reliable, and correct it using universality when it is not. Our ansatz matches well the numerics when accounting for finite size effects.
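
For reference, the rotational invariant estimator mentioned above keeps the eigenvectors of the observation and only shrinks its eigenvalues. The sketch below implements the oracle version of that idea (shrunken eigenvalues $v_i^\top S v_i$ computed with the hidden signal), which the actual RIE of [1,2] is known to match asymptotically using the observation alone via Stieltjes-transform formulas; the dimensions and noise level here are illustrative assumptions.

```python
# Hedged sketch of the eigenvector-preserving (oracle) shrinkage behind the RIE:
# keep the eigenbasis of the noisy Y and replace each eigenvalue by v_i^T S v_i.
# The true RIE computes these shrunken values from Y alone; here the signal S is
# used directly, purely for illustration. Dimensions/noise level are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, r = 400, 200                                  # extensive rank r proportional to d
F = rng.standard_normal((d, r)) / np.sqrt(d)
S = F @ F.T                                      # hidden factored signal XX^T
A = rng.standard_normal((d, d))
Y = S + (A + A.T) / np.sqrt(2 * d)               # symmetric Gaussian (Wigner) noise

evals, V = np.linalg.eigh(Y)
xi = np.sum(V * (S @ V), axis=0)                 # oracle shrunken eigenvalues v_i^T S v_i
S_hat = (V * xi) @ V.T

print(f"per-entry MSE: raw observation {np.mean((Y - S)**2):.5f}, "
      f"oracle shrinkage {np.mean((S_hat - S)**2):.5f}")
```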


Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise

Barbier, Jean, Camilli, Francesco, Mondelli, Marco, Xu, Yizhou

arXiv.org Artificial Intelligence

We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is a Gaussian Wigner matrix, the more realistic case of structured noise still proves to be challenging. To capture the structure while maintaining mathematical tractability, a line of work has focused on rotationally invariant noise. However, existing studies either provide sub-optimal algorithms or are limited to special cases of noise ensembles. In this paper, using tools from statistical physics (replica method) and random matrix theory (generalized spherical integrals) we establish the first characterization of the information-theoretic limits for a noise matrix drawn from a general trace ensemble. Remarkably, our analysis unveils the asymptotic equivalence between the rotationally invariant model and a surrogate Gaussian one. Finally, we show how to saturate the predicted statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations.
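
For context, the sketch below runs the standard AMP iteration for the baseline case the abstract contrasts with: a rank-one spike in Gaussian Wigner noise with a Rademacher prior. The structured (rotationally invariant) noise setting of the paper requires the adaptive TAP corrections instead, which are not implemented here; size and signal-to-noise ratio are illustrative assumptions.

```python
# Hedged sketch: textbook AMP for the rank-one spiked Gaussian Wigner model with a
# Rademacher prior (the well-understood baseline). Not the paper's adaptive-TAP
# algorithm for structured noise; n and snr are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, snr, iters = 2000, 2.0, 30
v = rng.choice([-1.0, 1.0], size=n)                        # hidden Rademacher spike
A = rng.standard_normal((n, n))
Y = (snr / n) * np.outer(v, v) + (A + A.T) / np.sqrt(2 * n)

x = 1e-2 * rng.standard_normal(n)                          # small random initialization
m_old = np.zeros(n)
for _ in range(iters):
    m = np.tanh(snr * x)                                   # Bayes denoiser for +/-1 prior
    b = snr * np.mean(1.0 - m**2)                          # Onsager correction term
    x = Y @ m - b * m_old
    m_old = m

print(f"overlap |<m, v>| / n = {abs(np.tanh(snr * x) @ v) / n:.3f}")
```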


Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions

Keup, Christian, Zdeborová, Lenka

arXiv.org Machine Learning

This work explores multi-modal inference in a high-dimensional simplified model, analytically quantifying the performance gain of multi-modal inference over that of analyzing modalities in isolation. We present the Bayes-optimal performance and weak recovery thresholds in a model where the objective is to recover the latent structures from two noisy data matrices with correlated spikes. The paper derives the approximate message passing (AMP) algorithm for this model and characterizes its performance in the high-dimensional limit via the associated state evolution. The analysis holds for a broad range of priors and noise channels, which can differ across modalities. The linearization of AMP is compared numerically to the widely used partial least squares (PLS) and canonical correlation analysis (CCA) methods, which are both observed to suffer from a sub-optimal recovery threshold.
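
The sketch below reproduces, in miniature, the kind of baseline comparison mentioned at the end of the abstract: two noisy views sharing a correlated per-sample latent, analyzed with scikit-learn's CCA. The data model, dimensions, and signal strength are illustrative assumptions, and the AMP algorithm of the paper is not implemented here.

```python
# Hedged sketch (not the paper's AMP): CCA on two modalities sharing a latent
# per-sample factor. Model, dimensions and SNR are assumptions for illustration.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d1, d2, snr = 2000, 300, 400, 1.5

z = rng.standard_normal(n)                      # shared latent, one value per sample
w1 = rng.standard_normal(d1) / np.sqrt(d1)      # loading vector of modality 1
w2 = rng.standard_normal(d2) / np.sqrt(d2)      # loading vector of modality 2
X1 = snr * np.outer(z, w1) + rng.standard_normal((n, d1))
X2 = snr * np.outer(z, w2) + rng.standard_normal((n, d2))

cca = CCA(n_components=1, max_iter=1000)
s1, s2 = cca.fit_transform(X1, X2)              # canonical scores of each view

corr = lambda a, b: abs(np.corrcoef(a.ravel(), b)[0, 1])
print(f"corr(score_1, z) = {corr(s1, z):.3f}, corr(score_2, z) = {corr(s2, z):.3f}")
```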


Rigorous dynamical mean field theory for stochastic gradient descent methods

Gerbelot, Cedric, Troiani, Emanuele, Mignacco, Francesca, Krzakala, Florent, Zdeborová, Lenka

arXiv.org Machine Learning

We prove closed-form equations for the exact high-dimensional asymptotics of a family of first order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch-size and with constant learning rates.
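
The sketch below simulates the kind of dynamics the DMFT equations are proved to describe: mini-batch SGD with an extensive batch size and constant learning rate on a Gaussian-data ridge problem, tracking the teacher overlap and the estimator norm along the trajectory. It does not solve the DMFT equations themselves; the dimensions, learning rate, and batch fraction are assumptions.

```python
# Hedged sketch of the simulated side only: extensive-batch SGD on Gaussian data,
# recording the summary statistics whose limit DMFT characterizes. The DMFT
# equations are not solved here; hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 1000
lr, lam, batch_frac, steps = 0.05, 0.01, 0.2, 300

theta_star = rng.standard_normal(d) / np.sqrt(d)           # teacher with unit norm
X = rng.standard_normal((n, d))                             # Gaussian design
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
b = int(batch_frac * n)                                     # extensive batch size
for t in range(steps):
    idx = rng.choice(n, size=b, replace=False)
    grad = X[idx].T @ (X[idx] @ theta - y[idx]) / b + lam * theta
    theta -= lr * grad
    if t % 50 == 0:
        print(f"step {t:3d}: teacher overlap m = {theta @ theta_star:.4f}, "
              f"norm^2 q = {theta @ theta:.4f}")
```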