Randomized Gradient Subspaces for Efficient Large Language Model Training
Rajabi, Sahar, Nonta, Nayeema, Vajpayee, Samanvay, Rambhatla, Sirisha
–arXiv.org Artificial Intelligence
LoRA (Hu et al., 2021) reduces fine-tuning memory via low-rank adapters, with extensions such as QLoRA (Dettmers et al., 2024) and Deep LoRA (Yaras et al., 2024) improving efficiency and robustness. Further variants enhance adaptation (Lialin et al., 2023; Renduchintala et al., 2024; Xia et al., 2024; Pan et al., 2024), while other approaches boost memory efficiency by compressing activations (Miles et al., 2024) or reformulating optimization via block coordinate descent (Luo et al., 2024). FLora (Hao et al., 2024) provides a complementary perspective by showing that LoRA can be interpreted as a random projection-based gradient compressor. By resampling the projection matrices, it achieves effectively high-rank updates while maintaining sub-linear optimizer state complexity. Another line of work exploits the structure of high-dimensional data by projecting it into evolving low-dimensional subspaces. Incremental and Grassmannian-based methods have been proposed for subspace tracking under partial observations (Balzano et al., 2011), noise (Zhang & Balzano, 2016; Kasai, 2017), and geodesic evolution (Blocker et al., 2023), offering a principled foundation for gradient projection techniques in LLM training.
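The effect of resampling projection matrices can be illustrated with a minimal NumPy sketch. This is not the implementation of Hao et al. (2024); it assumes Gaussian projections applied on the right of the gradient and plain summation of decompressed updates, and the function names (`random_projection`, `compress`, `decompress`, `accumulated_update`) are hypothetical. Each gradient of shape (m, n) is stored as an (m, r) matrix, so the compressed state is smaller than the full gradient. With a fixed projection the accumulated update has rank at most r, whereas fresh projections at every step let the rank grow with the number of steps.

```python
import numpy as np

def random_projection(rank, n, rng):
    # Assumption: Gaussian entries scaled so that E[P^T P] = I_n.
    return rng.standard_normal((rank, n)) / np.sqrt(rank)

def compress(grad, proj):
    # (m, n) gradient -> (m, rank) compressed representation.
    return grad @ proj.T

def decompress(comp, proj):
    # (m, rank) compressed representation -> (m, n) approximate gradient.
    return comp @ proj

def accumulated_update(grads, rank, resample, seed=0):
    # Sum the decompressed gradients, optionally drawing a new
    # projection matrix at every step.
    rng = np.random.default_rng(seed)
    m, n = grads[0].shape
    proj = random_projection(rank, n, rng)
    total = np.zeros((m, n))
    for g in grads:
        if resample:
            proj = random_projection(rank, n, rng)
        total += decompress(compress(g, proj), proj)
    return total

# Toy data: 8 random "gradients" of size 32 x 32, compressed to rank 2.
data_rng = np.random.default_rng(1)
grads = [data_rng.standard_normal((32, 32)) for _ in range(8)]

fixed = accumulated_update(grads, rank=2, resample=False)
fresh = accumulated_update(grads, rank=2, resample=True)

# An explicit tolerance keeps rounding noise from inflating the rank.
rank_fixed = np.linalg.matrix_rank(fixed, tol=1e-8)
rank_fresh = np.linalg.matrix_rank(fresh, tol=1e-8)
compressed_shape = compress(grads[0], random_projection(2, 32, data_rng)).shape

print("rank with fixed projection:", rank_fixed)
print("rank with resampled projections:", rank_fresh)
print("compressed state shape:", compressed_shape)
```

In this toy setting the accumulated rank under resampling is bounded by r times the number of steps, which is the sense in which the updates become effectively high-rank while each stored state remains of size m × r.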
Oct-3-2025