Exploiting Block Coordinate Descent for Cost-Effective LLM Model Training

Zeyu Liu, Yan Li, Yunquan Zhang, Boyang Zhang, Guoyong Jiang, Xin Zhang, Limin Xiao, Weifeng Zhang, Daning Cheng

arXiv.org Artificial Intelligence 

Training large language models typically demands extensive GPU memory and substantial financial investment, which poses a barrier for many small- and medium-sized teams. In this paper, we propose a full-parameter pre-training and fine-tuning framework based on block coordinate descent (BCD), enhanced with engineering optimizations, to enable efficient training of large-scale models on cost-effective RTX 4090, A100, and A800 GPU clusters. Under identical hardware configurations, we reduce the training cost of a 7B model to 33% on A100/A800 and to only 2.6% on RTX 4090, compared with standard full-parameter training. It also enables large models previously restricted to A100 clusters to be trained on RTX 4090 clusters without degrading performance. BCD achieves accuracy comparable to or better than full-parameter training and fine-tuning methods in most cases, with lower GPU memory consumption and improved hardware utilization.
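The core idea of block coordinate descent can be sketched with a toy example: rather than updating all parameters in every step, the optimizer cycles through parameter "blocks" and updates only the active block, so gradients and optimizer state are needed for just one block at a time. The snippet below is a minimal illustration on a hypothetical quadratic loss; the function name, block partitioning, and loss are assumptions for demonstration, not the paper's actual training procedure.

```python
# Minimal block coordinate descent (BCD) sketch on a toy quadratic
# loss sum_i (p_i - t_i)^2. Hypothetical example, not the paper's
# LLM training objective or implementation.

def bcd_minimize(params, target, n_blocks=4, epochs=50, lr=0.4):
    """Minimize sum((p - t)^2) by updating one parameter block per step."""
    n = len(params)
    block = max(1, n // n_blocks)
    for _ in range(epochs):
        # Cycle through blocks; only the active block is updated,
        # so only its gradients need to be materialized.
        for start in range(0, n, block):
            for i in range(start, min(start + block, n)):
                grad = 2.0 * (params[i] - target[i])  # d/dp of (p - t)^2
                params[i] -= lr * grad
    return params

params = [0.0] * 8
target = [float(i) for i in range(8)]
bcd_minimize(params, target)
final_loss = sum((p - t) ** 2 for p, t in zip(params, target))
```

Because each step touches only one block, the peak memory for gradients and optimizer state scales with the block size rather than the full model, which is the source of the cost savings the abstract describes.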