CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation

Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Zi Yang, Paul Hovland, Bogdan Nicolae, Franck Cappello, Zheng Zhang

arXiv.org Artificial Intelligence 

Large language models (LLMs) are revolutionizing many science and engineering fields. However, their huge model sizes demand extreme computational resources during pre-training. Although low-rank factorizations can reduce model parameters, their direct application in LLM pre-training often leads to non-negligible performance loss. To address this fundamental challenge, we introduce CoLA and its memory-efficient implementation, CoLA-M. We leverage the low-rank structure widely observed in model activations, enforcing non-linear transformations between factorized weight matrices to reduce model size and boost model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces computing cost by $2\times$ and improves training throughput by $1.86\times$ while maintaining full-rank-level performance. CoLA-M further reduces memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency. The resulting LLMs are also $2\times$ smaller, enabling faster inference with lower memory cost on resource-constrained platforms.
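To make the core idea concrete, below is a minimal PyTorch sketch of a low-rank linear layer with a non-linearity enforced between the two factors, as the abstract describes. The class name `CoLALinear` and the choice of SiLU activation are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class CoLALinear(nn.Module):
    """Sketch of a factorized linear layer: y = B(sigma(A x)).

    A full-rank weight W (d_in x d_out parameters) is replaced by two
    low-rank factors A and B with a non-linearity between them, costing
    r * (d_in + d_out) parameters; compute shrinks similarly when r << d.
    """

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.A = nn.Linear(d_in, rank, bias=False)    # down-projection to rank r
        self.B = nn.Linear(rank, d_out, bias=False)   # up-projection back to d_out
        self.act = nn.SiLU()                          # non-linearity between factors (assumed choice)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.B(self.act(self.A(x)))


if __name__ == "__main__":
    # rank = d/4 halves the parameter count relative to a 4096 x 4096 dense layer
    layer = CoLALinear(d_in=4096, d_out=4096, rank=1024)
    x = torch.randn(2, 16, 4096)          # (batch, sequence, hidden)
    print(layer(x).shape)                 # torch.Size([2, 16, 4096])
```

In this sketch, setting the rank to a quarter of the hidden size roughly halves the parameters and FLOPs of the layer, which is consistent with the $2\times$ compute and model-size reductions reported in the abstract; the exact rank schedule and where such layers are placed in the Transformer are details of the paper, not shown here.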