AITopics | svd

SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training

Neural Information Processing Systems, Jun-22-2026, 22:17:59 GMT

Low-rank gradient-based optimization methods have significantly improved memory efficiency during the training of large language models (LLMs), enabling operations within constrained hardware without sacrificing performance. However, these methods primarily emphasize memory savings, often overlooking potential acceleration in convergence due to their reliance on standard isotropic steepest descent techniques, which can perform suboptimally in the highly anisotropic landscapes typical of deep networks, particularly LLMs. In this paper, we propose SUMO (Subspace-Aware Moment-Orthogonalization), an optimizer that employs exact singular value decomposition (SVD) for moment orthogonalization within a dynamically adapted low-dimensional subspace, enabling norm-inducing steepest descent optimization steps. By explicitly aligning optimization steps with the spectral characteristics of the loss landscape, SUMO effectively mitigates approximation errors associated with commonly used methods, such as the Newton-Schulz orthogonalization approximation. We theoretically establish an upper bound on these approximation errors, proving their dependence on the condition numbers of moments, conditions we analytically demonstrate are encountered during LLM training. Furthermore, we both theoretically and empirically illustrate that exact orthogonalization via SVD substantially improves convergence rates while reducing overall complexity. Empirical evaluations confirm that SUMO accelerates convergence, enhances stability, improves performance, and reduces memory requirements by up to 20% compared to state-of-the-art methods.
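
The contrast the abstract draws between exact SVD orthogonalization and the Newton-Schulz approximation can be illustrated with a small NumPy sketch. This is not the SUMO optimizer: the cubic Newton-Schulz variant, the fixed step count, and the helper names (`orthogonalize_svd`, `orthogonalize_newton_schulz`, `make_moment`, `ns_error`) are assumptions chosen for illustration. It only shows that, for a fixed number of iterations, the approximation error grows with the condition number of the matrix being orthogonalized.

```python
import numpy as np

def orthogonalize_svd(M):
    """Exact orthogonalization: M = U S V^T is mapped to U V^T."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def orthogonalize_newton_schulz(M, steps=5):
    """Cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X (illustrative variant)."""
    # Dividing by the Frobenius norm puts all singular values in (0, 1],
    # which lies inside the convergence region of the iteration.
    X = M / np.linalg.norm(M)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def make_moment(cond, shape=(8, 4), seed=0):
    """Synthetic stand-in for a moment matrix with a prescribed condition number."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal(shape))
    V, _ = np.linalg.qr(rng.standard_normal((shape[1], shape[1])))
    s = np.geomspace(1.0, 1.0 / cond, shape[1])  # singular values from 1 down to 1/cond
    return U @ np.diag(s) @ V.T

def ns_error(cond, steps=5):
    """Frobenius distance between the Newton-Schulz result and the exact SVD result."""
    M = make_moment(cond)
    return np.linalg.norm(orthogonalize_newton_schulz(M, steps) - orthogonalize_svd(M))

for cond in (10.0, 1e2, 1e4):
    print(f"condition number {cond:g}: Newton-Schulz error {ns_error(cond):.3f}")
```

Small singular values grow by only about a factor of 1.5 per Newton-Schulz step, so an ill-conditioned matrix needs many more iterations to reach the orthogonal factor that the SVD returns directly.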

large language model, machine learning, orthogonalization, (18 more...)

Country: Asia > Middle East (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Rethinking PCA Through Duality

Neural Information Processing Systems, Jun-17-2026, 02:25:40 GMT

Motivated by the recently shown connection between self-attention and (kernel) principal component analysis (PCA), we revisit the fundamentals of PCA. Using the difference-of-convex (DC) framework, we present several novel formulations and provide new theoretical insights. In particular, we show the kernelizability and out-of-sample applicability for a PCA-like family of problems. Moreover, we uncover that simultaneous iteration, which is connected to the classical QR algorithm, is an instance of the difference-of-convex algorithm (DCA), offering an optimization perspective on this longstanding method. Further, we describe new algorithms for PCA and empirically compare them with state-of-the-art methods. Lastly, we introduce a kernelizable dual formulation for a robust variant of PCA that minimizes the ℓ1-deviation of the reconstruction errors.
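
For readers unfamiliar with the simultaneous iteration mentioned in the abstract, a minimal NumPy sketch of the classical method follows. It does not reproduce the paper's DC formulation or its new algorithms; the function name `simultaneous_iteration`, the iteration count, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def simultaneous_iteration(C, k, iters=200, seed=0):
    """Classical simultaneous (orthogonal) iteration for the top-k eigenvectors of a
    symmetric positive semidefinite matrix C: multiply by C, then re-orthonormalize."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((C.shape[0], k)))  # random orthonormal start
    for _ in range(iters):
        Q, _ = np.linalg.qr(C @ Q)  # power step followed by a QR factorization
    return Q

# Synthetic data whose columns have clearly separated variances.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
X -= X.mean(axis=0)
C = X.T @ X / (len(X) - 1)  # sample covariance matrix

Q = simultaneous_iteration(C, k=2)  # orthonormal basis of the top-2 principal subspace
scores = X @ Q                      # data projected onto that subspace
```

The columns of `Q` span the same subspace as the two leading eigenvectors of `C`, which can be checked against `np.linalg.eigh` by comparing the projectors `Q @ Q.T`.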

artificial intelligence, machine learning, natural language, (16 more...)

Country: Europe > Belgium (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)

Parameter Efficient Fine-tuning via Explained Variance Adaptation

Neural Information Processing Systems, Jun-16-2026, 19:39:25 GMT

Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned for a specific downstream task. The most common fine-tuning method is to update pre-trained weights via low-rank adaptation (LoRA). Existing initialization strategies for LoRA often rely on singular value decompositions (SVD) of gradients or weight matrices. However, they do not provably maximize the expected gradient signal, which is critical for fast adaptation. To this end, we introduce Explained Variance Adaptation (EVA), an initialization scheme that uses the directions capturing the most activation variance, provably maximizing the expected gradient signal and accelerating fine-tuning.
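
The idea of initializing a LoRA adapter from the directions of largest activation variance can be sketched as follows. This is only an illustration of that idea, not the EVA procedure itself: the function name `variance_directions_init`, the centering of activations, the use of a single full SVD, and the synthetic activations are all assumptions. The zero initialization of `B` follows the usual LoRA convention, so the adapted layer starts out identical to the pre-trained one.

```python
import numpy as np

def variance_directions_init(activations, d_out, rank):
    """Build LoRA factors (A, B) where the rows of A are the top-`rank` right singular
    vectors of the centered activation matrix, i.e. its leading principal directions."""
    Xc = activations - activations.mean(axis=0, keepdims=True)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = Vt[:rank]                       # shape (rank, d_in), orthonormal rows
    B = np.zeros((d_out, rank))         # zero init: the update B @ A starts at zero
    explained = (s[:rank] ** 2).sum() / (s ** 2).sum()  # fraction of variance captured
    return A, B, explained

# Hypothetical layer inputs: 256 samples, 16 features with decaying scale.
rng = np.random.default_rng(0)
acts = rng.standard_normal((256, 16)) * np.linspace(3.0, 0.1, 16)

A, B, explained = variance_directions_init(acts, d_out=8, rank=4)
print(f"rank-4 adapter captures {explained:.1%} of the activation variance")
```

The `explained` ratio gives a rough sense of how much of the input variance a given rank covers, which is one way such a scheme could inform the choice of rank per layer.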

large language model, machine learning, natural language, (22 more...)

Country:

North America > United States (1.00)
Europe > Austria (0.67)
North America > Canada > Quebec (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

46fc943ecd56441056a560ba37d0b9e8-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 16:38:19 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Country: North America > United States (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Single Pass PCA of Matrix Products

Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis

Neural Information Processing Systems, Mar-23-2026, 01:24:29 GMT

Neural Information Processing Systems http://nips.cc/

approximation, artificial intelligence, machine learning, (17 more...)

Country: North America > United States > Texas (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

6e69ebbfad976d4637bb4b39de261bf7-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-19-2026, 16:00:17 GMT

complexity, epoch, matrix, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

df0b8fb21c53254b7afa62e020447c81-Paper.pdf

Neural Information Processing Systems, Feb-19-2026, 07:59:15 GMT

algorithm, matrix, spectral norm, (15 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada (0.04)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.69)

834f4c0b8d241b4943a9dcb77fd85675-Paper-Conference.pdf

Neural Information Processing Systems, Feb-19-2026, 06:54:07 GMT

algorithm, rotation, transformation, (15 more...)

Country: North America > United States (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

d2d3ca53fd8fbd564bb948f8c09c0d85-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 06:23:03 GMT

artificial intelligence, bitr, machine learning, (19 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > Latvia > Riga Municipality > Riga (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Data Science (0.67)
(2 more...)

MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

Yuedong Chen

Neural Information Processing Systems, Feb-17-2026, 23:01:57 GMT

Diffusion (SVD) model, where these features then act as pose and visual cues to guide the denoising process and produce photorealistic 3D-consistent views. Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views. To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where

arxiv preprint arxiv, machine learning, natural language, (19 more...)

Country: