GPT-FL: Generative Pre-trained Model-Assisted Federated Learning
Tuo Zhang, Tiantian Feng, Samiul Alam, Dimitrios Dimitriadis, Mi Zhang, Shrikanth S. Narayanan, Salman Avestimehr
arXiv.org Artificial Intelligence
In this work, we propose GPT-FL, a generative pre-trained model-assisted federated learning (FL) framework. At its core, GPT-FL leverages generative pre-trained models to generate diversified synthetic data. These generated data are used to train a downstream model on the server, which is then fine-tuned with private client data under the standard FL framework. We show that GPT-FL consistently outperforms state-of-the-art FL methods in terms of model test accuracy, communication efficiency, and client sampling efficiency. Through comprehensive ablation analysis, we find that the downstream model trained on synthetic data plays a crucial role in controlling the direction of gradient diversity during FL training, which enhances convergence speed and contributes to the notable accuracy boost observed with GPT-FL. Moreover, regardless of whether the target data falls within or outside the domain of the pre-trained generative model, GPT-FL consistently achieves significant performance gains, surpassing the results obtained by models trained solely with FL or solely with synthetic data.

Federated learning (FL) is a privacy-preserving machine learning paradigm that allows a collection of clients to collaboratively train a machine learning model without sharing their private data Zhang et al. (2021). Most existing FL studies, such as McMahan et al. (2016); Bonawitz et al. (2019), follow the standard FL architecture, in which each participating client trains a local model on its own private data, and a central server aggregates these locally trained models to update a global model and sends it back to the clients for the next round of training. However, despite many efforts Sahu et al. (2018); Karimireddy et al. (2019); Reddi et al. (2020), the performance of standard FL remains constrained by client drift caused by the heterogeneity of private data distributions across clients.
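The standard FL architecture described above (clients train locally, the server averages their models) can be sketched as a FedAvg-style weighted average. This is a minimal illustration of the aggregation step only; the function names and toy data below are illustrative, not from the paper:

```python
# Minimal sketch of server-side FL aggregation (FedAvg-style):
# average client model weights, weighted by local dataset size.
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of per-client weight lists (one array per layer)."""
    total = sum(client_sizes)
    return [
        sum((n / total) * w[layer] for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Toy example: two clients, each with a one-layer "model".
w_a = [np.array([1.0, 2.0])]
w_b = [np.array([3.0, 4.0])]
global_w = fedavg_aggregate([w_a, w_b], client_sizes=[10, 30])
# Client b holds 3x the data, so its weights dominate the average.
```

Weighting by dataset size is the standard FedAvg choice; it makes the aggregate equivalent to a single model trained on the pooled (but never shared) client data when local updates are small.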
To enhance the performance of FL, recent studies propose incorporating data collected from public spaces such as the internet into the FL process Lin et al. (2020); Li et al. (2021); Itahara et al. (2020); Cho et al. (2022). However, the performance of such public data-based approaches depends heavily on the quality of the collected public data. Unfortunately, obtaining the desired public data can be extremely challenging in practice, and there is little principled guidance on how to obtain it. To address these issues, FL methods based on synthetic data have emerged Zhang et al. (2022); Zhu et al. (2021); Pi et al. (2022).
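The two-stage GPT-FL workflow (server-side pre-training on synthetic data, followed by standard federated fine-tuning on private client data) can be sketched end to end on a toy problem. The example below substitutes a linear least-squares model, gradient descent, and plain FedAvg for the paper's actual components; all names, data, and hyperparameters are illustrative assumptions:

```python
# Toy sketch of the GPT-FL two-stage pipeline (NOT the paper's implementation):
# stage 1 pre-trains on server-side "synthetic" data, stage 2 fine-tunes via FL.
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(w, X, y, lr=0.1):
    """One gradient-descent step for mean-squared-error loss."""
    return w - lr * (2 * X.T @ (X @ w - y) / len(y))

# Stage 1: the server pre-trains a downstream model on synthetic data
# (a stand-in for data produced by a generative pre-trained model).
X_syn = rng.normal(size=(100, 3))
y_syn = X_syn @ np.array([1.0, -2.0, 0.5])  # synthetic-data relationship (toy)
w = np.zeros(3)
for _ in range(200):
    w = sgd_step(w, X_syn, y_syn)

# Stage 2: fine-tune under standard FL. Each client runs a few local steps
# on its private shard; the server averages the results (plain FedAvg,
# equal shard sizes). Starting from the pre-trained w, not from scratch.
true_w = np.array([1.2, -1.8, 0.7])  # private-data relationship (toy)
client_data = []
for _ in range(4):
    Xc = rng.normal(size=(40, 3))
    client_data.append((Xc, Xc @ true_w))

for _ in range(10):  # communication rounds
    local = []
    for Xc, yc in client_data:
        wc = w.copy()
        for _ in range(5):  # local epochs on private data
            wc = sgd_step(wc, Xc, yc)
        local.append(wc)
    w = np.mean(local, axis=0)
```

The point of the sketch is the initialization: stage 2 starts from the synthetically pre-trained weights rather than from scratch, which is the mechanism the paper credits with controlling gradient diversity and speeding convergence.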
Sep-29-2023