Goto

Collaborating Authors

 spectral initialization


Sample efficient inductive matrix completion with noise and inexact side information

arXiv.org Machine Learning

Low-rank matrix completion is a widely studied problem with many variants. Inductive matrix completion (IMC) incorporates row and column side information to significantly narrow the search space. Prior work falls into two regimes: methods that exploit this structure to achieve reduced sample complexity but only in noiseless settings, and methods that handle noise but require sample complexity matching the ambient matrix dimension, forfeiting the sample efficiency that side information should provide. In this paper, we close this gap by studying noisy IMC with a nonconvex projected gradient descent algorithm with spectral initialization. Our main technical contribution is establishing a regularity condition for the IMC loss function that holds at the reduced sample complexity determined by the effective problem size, scaling with the side information dimension a rather than the ambient dimension n. This directly yields linear convergence and an estimation error that both depend only on the effective problem size rather than the ambient matrix dimension. We further extend our analysis to the inexact side information setting, demonstrating that the reduced sample complexity is maintained and the estimation error is order-optimal with respect to the inexactness of the side information. Extensive simulations and real-world experiments on the MovieLens dataset validate our theoretical findings.





Learning single index model with gradient descent: spectral initialization and precise asymptotics

arXiv.org Machine Learning

Non-convex optimization plays a central role in many statistics and machine learning problems. Despite the landscape irregularities for general non-convex functions, some recent work showed that for many learning problems with random data and large enough sample size, there exists a region around the true signal with benign landscape. Motivated by this observation, a widely used strategy is a two-stage algorithm, where we first apply a spectral initialization to plunge into the region, and then run gradient descent for further refinement. While this two-stage algorithm has been extensively analyzed for many non-convex problems, the precise distributional property of both its transient and long-time behavior remains to be understood. In this work, we study this two-stage algorithm in the context of single index models under the proportional asymptotics regime. We derive a set of dynamical mean field equations, which describe the precise behavior of the trajectory of spectral initialized gradient descent in the large system limit. We further show that when the spectral initialization successfully lands in a region of benign landscape, the above equation system is asymptotically time translation invariant and exponential converging, and thus admits a set of long-time fixed points that represents the mean field characterization of the limiting point of the gradient descent dynamic. As a proof of concept, we demonstrate our general theory in the example of regularized Wirtinger flow for phase retrieval.


Model-free algorithms for fast node clustering in SBM type graphs and application to social role inference in animals

arXiv.org Machine Learning

Graphs have become extremely useful for representing a wide variety of systems in different contexts: biological, social, information... A basic attempt to study them may consist in partitioning the vertices of a graph into clusters that are more densely connected; that is commonly called community detection or graph clustering; see for instance [20, 1]. Community detection and clustering are central problems in machine learning and data science. In particular, the stochastic block model (SBM) [34, 25] has been widely used as a canonical model for community detection and as a building block for clustering with more structural assumptions. In its most general form, the SBM corresponds to a randomly weighted graph model where each node has an unobserved label and the probability of observing a given edge between two nodes depends only on the labels of the nodes under consideration.


Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparamerterized Matrix Factorization

arXiv.org Artificial Intelligence

Many theoretical studies on neural networks attribute their excellent empirical performance to the implicit bias or regularization induced by first-order optimization algorithms when training networks under certain initialization assumptions. One example is the incremental learning phenomenon in gradient flow (GF) on an overparamerterized matrix factorization problem with small initialization: GF learns a target matrix by sequentially learning its singular values in decreasing order of magnitude over time. In this paper, we develop a quantitative understanding of this incremental learning behavior for GF on the symmetric matrix factorization problem, using its closed-form solution obtained by solving a Riccati-like matrix differential equation. We show that incremental learning emerges from some time-scale separation among dynamics corresponding to learning different components in the target matrix. By decreasing the initialization scale, these time-scale separations become more prominent, allowing one to find low-rank approximations of the target matrix. Lastly, we discuss the possible avenues for extending this analysis to asymmetric matrix factorization problems.


To Reviewer 1 (R1)

Neural Information Processing Systems

We thank all three reviewers for their constructive comments. We address them below one by one. Q1: what makes it nontrivial to extend the regularity condition and proof technique in [11] to Riemmanian optimization. The Grassmannian manifold is nonconvex, making the analysis more complex. We will incorporate these into a revised version of the manuscript.



Frequency-Constrained Learning for Long-Term Forecasting

arXiv.org Artificial Intelligence

However, modern deep forecasting models often fail to capture these recurring patterns due to spectral bias and a lack of frequency-aware inductive priors. Motivated by this gap, we propose a simple yet effective method that enhances long-term forecasting by explicitly modeling periodicity through spectral initialization and frequency-constrained optimization. Specifically, we extract dominant low-frequency components via Fast Fourier Transform (FFT)-guided coordinate descent, initialize sinusoidal embeddings with these components, and employ a two-speed learning schedule to preserve meaningful frequency structure during training. Our approach is model-agnostic and integrates seamlessly into existing Transformer-based architectures. Extensive experiments across diverse real-world benchmarks demonstrate consistent performance gains--particularly at long horizons--highlighting the benefits of injecting spectral priors into deep temporal models for robust and interpretable long-range forecasting. Moreover, on synthetic data, our method accurately recovers ground-truth frequencies, further validating its interpretability and effectiveness in capturing latent periodic patterns.