Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent

Lin Wang, Zhichao Wang, Xiaoying Tang

arXiv.org Artificial Intelligence 

The advent of large language models (LLMs) has revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. However, the pre-training or fine-tuning of LLMs within a federated learning (FL) framework poses substantial challenges, including considerable computational and memory resource demands, as well as communication bottlenecks between servers and clients. Existing solutions either make the unrealistic assumption that the entire model is exchanged for training, or apply parameter-effective fine-tuning methods from centralized learning to train LLMs in FL, which tend to underperform during training or fine-tuning stages due to the limited search subspace of parameter updating. In this paper, we introduce a novel method for the efficient training and fine-tuning of LLMs in FL with minimal resource consumption. Our approach, termed FedCyBGD, utilizes Cycle Block Gradient Descent, in which clients cyclically tune individual blocks of the model.

Figure 1: Observation on Federated Learning. Bar graphs represent the estimated memory usage for full parameter tuning of a LLaMA-7B model on a single device, and the line graph represents the loss across different training paradigms. 'Centralized-Cy' denotes centralized training with cyclical block updates, 'Fed-full' refers to federated full parameter tuning with complete model communication to clients, 'FedBAvg' signifies federated training with block updates where the server selects clients for tuning and aggregates updates, and 'FedCyBGD' represents our approach, where clients cyclically participate in block tuning.
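The abstract describes the mechanism only at a high level: rather than exchanging the full model, clients take turns tuning one block of the model per cycle and upload only that block. The following is a minimal sketch of such a cyclic block-update loop on a toy PyTorch model; the block partitioning, client data, hyperparameters, and the helper `local_block_update` are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of cycle block gradient descent in a federated setting.
# Each client tunes only its assigned block while the rest of the model is
# frozen, then uploads just that block back to the server.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an LLM: a stack of blocks plus a head.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
head = nn.Linear(16, 1)
model = nn.Sequential(*blocks, head)

def local_block_update(model, block_idx, data, steps=5, lr=1e-2):
    """Client-side step: tune only the assigned block, freeze everything else."""
    for p in model.parameters():
        p.requires_grad_(False)
    target_block = model[block_idx]
    for p in target_block.parameters():
        p.requires_grad_(True)
    opt = torch.optim.SGD(target_block.parameters(), lr=lr)
    x, y = data
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    # Only the tuned block needs to be uploaded to the server.
    return {k: v.detach().clone() for k, v in target_block.state_dict().items()}

# Simulated clients, each holding private data.
clients = [(torch.randn(32, 16), torch.randn(32, 1)) for _ in range(4)]

# Cyclic schedule: in each cycle, client i is responsible for block i.
for cycle in range(2):
    for client_id, data in enumerate(clients):
        block_idx = client_id % len(blocks)
        update = local_block_update(model, block_idx, data)
        # Server swaps the updated block into the global model before
        # handing off to the next client in the cycle.
        model[block_idx].load_state_dict(update)
```

Because each client trains and transmits only a single block, per-client memory and upload cost scale with the block size rather than the full model, which is the resource saving the abstract and Figure 1 point to.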
