Goto

Collaborating Authors

 weak recovery



Algorithmic Contiguity from Low-Degree Heuristic II: Predicting Detection-Recovery Gaps

arXiv.org Machine Learning

The low-degree polynomial framework has emerged as a powerful tool for providing evidence of statistical-computational gaps in high-dimensional inference. For detection problems, the standard approach bounds the low-degree advantage through an explicit orthonormal basis. However, this method does not extend naturally to estimation tasks, and thus fails to capture the \emph{detection-recovery gap phenomenon} that arises in many high-dimensional problems. Although several important advances have been made to overcome this limitation \cite{SW22, SW25, CGGV25+}, the existing approaches often rely on delicate, model-specific combinatorial arguments. In this work, we develop a general approach for obtaining \emph{conditional computational lower bounds} for recovery problems from mild bounds on low-degree testing advantage. Our method combines the notion of algorithmic contiguity in \cite{Li25} with a cross-validation reduction in \cite{DHSS25} that converts successful recovery into a hypothesis test with lopsided success probabilities. In contrast to prior unconditional lower bounds, our argument is conceptually simple, flexible, and largely model-independent. We apply this framework to several canonical inference problems, including planted submatrix, planted dense subgraph, stochastic block model, multi-frequency angular synchronization, orthogonal group synchronization, and multi-layer stochastic block model. In the first three settings, our method recovers existing low-degree lower bounds for recovery in \cite{SW22, SW25} via a substantially simpler argument. In the latter three, it gives new evidence for conjectured computational thresholds including the persistence of detection-recovery gaps. Together, these results suggest that mild control of low-degree advantage is often sufficient to explain computational barriers for recovery in high-dimensional statistical models.


A Noise Sensitivity Exponent Controls Large Statistical-to-Computational Gaps in Single- and Multi-Index Models

arXiv.org Machine Learning

Understanding when learning is statistically possible yet computationally hard is a central challenge in high-dimensional statistics. In this work, we investigate this question in the context of single- and multi-index models, classes of functions widely studied as benchmarks to probe the ability of machine learning methods to discover features in high-dimensional data. Our main contribution is to show that a Noise Sensitivity Exponent (NSE) - a simple quantity determined by the activation function - governs the existence and magnitude of statistical-to-computational gaps within a broad regime of these models. We first establish that, in single-index models with large additive noise, the onset of a computational bottleneck is fully characterized by the NSE. We then demonstrate that the same exponent controls a statistical-computational gap in the specialization transition of large separable multi-index models, where individual components become learnable. Finally, in hierarchical multi-index models, we show that the NSE governs the optimal computational rate in which different directions are sequentially learned. Taken together, our results identify the NSE as a unifying property linking noise robustness, computational hardness, and feature specialization in high-dimensional learning.



A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization

arXiv.org Machine Learning

Many real-world datasets contain hidden structure that cannot be detected by simple linear correlations between input features. For example, latent factors may influence the data in a coordinated way, even though their effect is invisible to covariance-based methods such as PCA. In practice, nonlinear neural networks often succeed in extracting such hidden structure in unsupervised and self-supervised learning. However, constructing a minimal high-dimensional model where this advantage can be rigorously analyzed has remained an open theoretical challenge. We introduce a tractable high-dimensional spiked model with two latent factors: one visible to covariance, and one statistically dependent yet uncorrelated, appearing only in higher-order moments. PCA and linear autoencoders fail to recover the latter, while a minimal nonlinear autoencoder provably extracts both. We analyze both the population risk, and empirical risk minimization. Our model also provides a tractable example where self-supervised test loss is poorly aligned with representation quality: nonlinear autoencoders recover latent structure that linear methods miss, even though their reconstruction loss is higher.


From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD

arXiv.org Machine Learning

To understand feature learning dynamics in neural networks, recent theoretical works have focused on gradient-based learning of Gaussian single-index models, where the label is a nonlinear function of a latent one-dimensional projection of the input. While the sample complexity of online SGD is determined by the information exponent of the link function, recent works improved this by performing multiple gradient steps on the same sample with different learning rates -- yielding a non-correlational update rule -- and instead are limited by the (potentially much smaller) generative exponent. However, this picture is only valid when these learning rates are sufficiently large. In this paper, we characterize the relationship between learning rate(s) and sample complexity for a broad class of gradient-based algorithms that encapsulates both correlational and non-correlational updates. We demonstrate that, in certain cases, there is a phase transition from an "information exponent regime" with small learning rate to a "generative exponent regime" with large learning rate. Our framework covers prior analyses of one-pass SGD and SGD with batch reuse, while also introducing a new layer-wise training algorithm that leverages a two-timescales approach (via different learning rates for each layer) to go beyond correlational queries without reusing samples or modifying the loss from squared error. Our theoretical study demonstrates that the choice of learning rate is as important as the design of the algorithm in achieving statistical and computational efficiency.




Low degree conjecture implies sharp computational thresholds in stochastic block model

arXiv.org Artificial Intelligence

We investigate implications of the (extended) low-degree conjecture (recently formalized in [MW23]) in the context of the symmetric stochastic block model. Assuming the conjecture holds, we establish that no polynomial-time algorithm can weakly recover community labels below the Kesten-Stigum (KS) threshold. In particular, we rule out polynomial-time estimators that, with constant probability, achieve correlation with the true communities that is significantly better than random. Whereas, above the KS threshold, polynomial-time algorithms are known to achieve constant correlation with the true communities with high probability[Mas14,AS15]. To our knowledge, we provide the first rigorous evidence for the sharp transition in recovery rate for polynomial-time algorithms at the KS threshold. Notably, under a stronger version of the low-degree conjecture, our lower bound remains valid even when the number of blocks diverges. Furthermore, our results provide evidence of a computational-to-statistical gap in learning the parameters of stochastic block models. In contrast to prior work, which either (i) rules out polynomial-time algorithms for hypothesis testing with 1-o(1) success probability [Hopkins18, BBK+21a] under the low-degree conjecture, or (ii) rules out low-degree polynomials for learning the edge connection probability matrix [LG23], our approach provides stronger lower bounds on the recovery and learning problem. Our proof combines low-degree lower bounds from [Hopkins18, BBK+21a] with graph splitting and cross-validation techniques. In order to rule out general recovery algorithms, we employ the correlation preserving projection method developed in [HS17].


Multi-View Stochastic Block Models

arXiv.org Machine Learning

Graph clustering is a central topic in unsupervised learning with a multitude of practical applications. In recent years, multi-view graph clustering has gained a lot of attention for its applicability to real-world instances where one has access to multiple data sources. In this paper we formalize a new family of models, called \textit{multi-view stochastic block models} that captures this setting. For this model, we first study efficient algorithms that naively work on the union of multiple graphs. Then, we introduce a new efficient algorithm that provably outperforms previous approaches by analyzing the structure of each graph separately. Furthermore, we complement our results with an information-theoretic lower bound studying the limits of what can be done in this model. Finally, we corroborate our results with experimental evaluations.