Randomized Gradient Subspaces for Efficient Large Language Model Training

Sahar Rajabi, Nayeema Nonta, Samanvay Vajpayee, Sirisha Rambhatla

Oct-3-2025–arXiv.org Artificial Intelligence

LoRA (Hu et al., 2021) reduces fine-tuning memory via low-rank adapters, with extensions such as QLoRA (Dettmers et al., 2024) and Deep LoRA (Yaras et al., 2024) improving efficiency and robustness. Further variants enhance adaptation (Lialin et al., 2023; Renduchintala et al., 2024; Xia et al., 2024; Pan et al., 2024), while other approaches improve memory efficiency by compressing activations (Miles et al., 2024) or by reformulating optimization as block coordinate descent (Luo et al., 2024). FLora (Hao et al., 2024) offers a complementary perspective, showing that LoRA can be interpreted as a random-projection-based gradient compressor. It resamples the projection matrices during training to obtain effectively high-rank updates while keeping the memory cost of optimizer states sub-linear.

Another line of work exploits the structure of high-dimensional data by projecting it onto evolving low-dimensional subspaces. Incremental and Grassmannian methods have been proposed for subspace tracking under partial observations (Balzano et al., 2011), noise (Zhang & Balzano, 2016; Kasai, 2017), and geodesic evolution (Blocker et al., 2023), offering a principled foundation for gradient projection techniques in LLM training.
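The random-projection view of gradient compression described above can be illustrated with a minimal NumPy sketch. It is not the FLora authors' implementation: the Gaussian projection with entries N(0, 1/r), the plain sum of compressed-then-decompressed gradients standing in for an optimizer, and the helper names `sample_projection`, `compress`, `decompress`, and `accumulated_update` are all hypothetical choices made for illustration. The point it shows is why resampling matters: with one fixed projection, every update shares the same r-dimensional row space, so their sum has rank at most r; redrawing the projection lets the accumulated update grow beyond that.

```python
import numpy as np


def sample_projection(n, r, rng):
    """Draw a Gaussian projection A of shape (n, r) with entries N(0, 1/r).

    With this (assumed) scaling E[A @ A.T] = I_n, so
    decompress(compress(G, A), A) is an unbiased estimate of G.
    """
    return rng.normal(0.0, 1.0 / np.sqrt(r), size=(n, r))


def compress(grad, A):
    """Down-project an (m, n) gradient to (m, r): the form an optimizer would store."""
    return grad @ A


def decompress(compressed, A):
    """Map an (m, r) compressed gradient back to an (m, n) update."""
    return compressed @ A.T


def accumulated_update(grads, r, resample_every, seed=0):
    """Sum compress/decompress updates, redrawing A every `resample_every` steps.

    With one fixed A, every term lies in the same r-dimensional row space, so
    the sum has rank <= r; redrawing A lets the accumulated update escape it.
    """
    rng = np.random.default_rng(seed)
    n = grads[0].shape[1]
    A = sample_projection(n, r, rng)
    total = np.zeros_like(grads[0])
    for t, g in enumerate(grads):
        if t > 0 and t % resample_every == 0:
            A = sample_projection(n, r, rng)  # fresh random subspace
        total += decompress(compress(g, A), A)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    grads = [rng.normal(size=(64, 64)) for _ in range(16)]
    fixed = accumulated_update(grads, r=4, resample_every=len(grads))  # never redraws A
    resampled = accumulated_update(grads, r=4, resample_every=1)  # new A every step
    print("rank with fixed A:    ", np.linalg.matrix_rank(fixed))
    print("rank with resampled A:", np.linalg.matrix_rank(resampled))
```

The memory saving in this picture comes from keeping state in the (m, r) compressed form rather than (m, n); the full (m, n) `total` is materialized here only so its rank can be inspected.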

artificial intelligence, large language model, natural language


