AugFL: Augmenting Federated Learning with Pretrained Models

Yue, Sheng, Qin, Zerui, Deng, Yongheng, Ren, Ju, Zhang, Yaoxue, Zhang, Junshan

arXiv.org Artificial Intelligence 

Abstract—Federated Learning (FL) has garnered widespread interest in recent years. However, owing to strict privacy policies or the limited storage capacities of training participants such as IoT devices, its effective deployment is often impeded by the scarcity of training data in practical decentralized learning environments. In this paper, we study enhancing FL with the aid of (large) pre-trained models (PMs), which encapsulate rich general/domain-agnostic knowledge, to alleviate the data requirements of conducting FL from scratch. Specifically, we consider a networked FL system formed by a central server and distributed clients. First, we formulate PM-aided personalized FL as a regularization-based federated meta-learning problem, where clients join forces to learn a meta-model with knowledge transferred from a private PM stored at the server. We then develop an algorithm, AugFL, to optimize the problem with no need to expose the PM or incur additional computational costs to local clients.

Federated Learning (FL) [2]-[4] has gained prominence as a distributed learning paradigm that allows a large number of decentralized users to collaboratively train models without sharing their local data, and it has garnered significant attention from both academia and industry [5]-[11]. Despite its rapid advancements, the effective deployment of FL has been hampered by a significant hurdle: in practice, participants such as IoT devices [12] can provide only scarce training data due to limited storage capacities or strict privacy policies [13], [14]. Therefore, it is often unsatisfactory to carry out FL from scratch in many modern data-hungry applications, such as natural language processing [15] and robotic control [16].

This work was accepted in part by the 22nd International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc) [1]. Sheng Yue is with the School of Cyber Science and Technology, Sun Yat-sen University.
Zerui Qin, Yongheng Deng, Ju Ren (corresponding author), and Yaoxue Zhang are with the Department of Computer Science and Technology, Tsinghua University. Junshan Zhang is with the Department of Electrical and Computer Engineering, University of California, Davis.
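To make the formulation in the abstract concrete, a regularization-based federated meta-learning objective of the kind described (a MAML-style meta-model regularized toward the server's pre-trained model) can be sketched as follows. The notation below is our own assumption for illustration, not the paper's exact objective:

```latex
% Hypothetical notation (assumed, not taken from the paper):
%   F_i        : local loss of client i;   p_i : aggregation weight of client i
%   \alpha     : per-client adaptation step size (one inner gradient step)
%   \theta_{\mathrm{PM}} : the private pre-trained model held at the server
%   R          : a regularizer transferring PM knowledge to the meta-model
%                (e.g., a distillation-style penalty), weighted by \lambda
\min_{\theta} \;
  \sum_{i=1}^{N} p_i \, F_i\!\bigl(\theta - \alpha \nabla F_i(\theta)\bigr)
  \;+\; \lambda \, R\bigl(\theta;\, \theta_{\mathrm{PM}}\bigr)
```

Under such a formulation, the first term drives the meta-model toward parameters that personalize well after a local gradient step on each client, while the second term injects the PM's knowledge; splitting the optimization between server and clients is what allows the PM to stay private and the extra computation to stay off the clients.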