Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent

Lin Wang, Zhichao Wang, Xiaoying Tang

arXiv.org Artificial Intelligence 

The advent of large language models (LLMs) has revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. However, the pre-training or fine-tuning of LLMs within a federated learning (FL) framework poses substantial challenges, including considerable computational and memory resource demands, as well as communication bottlenecks between servers and clients. Existing solutions either make the unrealistic assumption that the entire model is exchanged for training, or apply parameter-effective fine-tuning methods from centralized learning to train LLMs in FL, which tend to underperform during training or fine-tuning stages due to the limited search subspace of parameter updating. In this paper, we introduce a novel method for the efficient training and fine-tuning of LLMs in FL with minimal resource consumption. Our approach, termed FedCyBGD, utilizes Cycle Block Gradient Descent, in which clients cyclically tune individual blocks of the model.

Figure 1: Observation on Federated Learning. Bar graphs represent the estimated memory usage for full parameter tuning of a LLaMA-7B model on a single device, and the line graph represents the loss across different training paradigms. 'Centralized-Cy' denotes centralized training with cyclical block updates, 'Fed-full' refers to federated full parameter tuning with complete model communication to clients, 'FedBAvg' signifies federated training with block updates where the server selects clients for tuning and aggregates updates, and 'FedCyBGD' represents our approach, where clients cyclically participate in block tuning.
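The abstract describes the mechanism only at a high level: rather than exchanging the full model, clients take turns tuning one block of the model per cycle and upload only that block. The following is a minimal sketch of such a cyclic block-update loop on a toy PyTorch model; the block partitioning, client data, hyperparameters, and the helper `local_block_update` are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of cycle block gradient descent in a federated setting.
# Each client tunes only its assigned block while the rest of the model is
# frozen, then uploads just that block back to the server.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for an LLM: a stack of blocks plus a head.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
head = nn.Linear(16, 1)
model = nn.Sequential(*blocks, head)

def local_block_update(model, block_idx, data, steps=5, lr=1e-2):
    """Client-side step: tune only the assigned block, freeze everything else."""
    for p in model.parameters():
        p.requires_grad_(False)
    target_block = model[block_idx]
    for p in target_block.parameters():
        p.requires_grad_(True)
    opt = torch.optim.SGD(target_block.parameters(), lr=lr)
    x, y = data
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    # Only the tuned block needs to be uploaded to the server.
    return {k: v.detach().clone() for k, v in target_block.state_dict().items()}

# Simulated clients, each holding private data.
clients = [(torch.randn(32, 16), torch.randn(32, 1)) for _ in range(4)]

# Cyclic schedule: in each cycle, client i is responsible for block i.
for cycle in range(2):
    for client_id, data in enumerate(clients):
        block_idx = client_id % len(blocks)
        update = local_block_update(model, block_idx, data)
        # Server swaps the updated block into the global model before
        # handing off to the next client in the cycle.
        model[block_idx].load_state_dict(update)
```

Because each client trains and transmits only a single block, per-client memory and upload cost scale with the block size rather than the full model, which is the resource saving the abstract and Figure 1 point to.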
