Benchmarking Semi-supervised Federated Learning
Zhengming Zhang, Zhewei Yao, Yaoqing Yang, Yujun Yan, Joseph E. Gonzalez, Michael W. Mahoney
Current state-of-the-art machine learning models could potentially benefit from the large amount of user data held privately on mobile devices, as well as from the computing power available locally on those devices. In response, federated learning (FL), which requires transmitting only the trained (intermediate) models, has been proposed as a privacy-preserving way to exploit the data and computing power on mobile devices [1, 2]. In a typical FL pipeline, a server maintains a model and shares it with users/devices. Each user/device updates the shared global model for multiple steps using only locally held data, and then uploads the updated model back to the server. After collecting all the models from the users, the server takes an averaging step over them (e.g., FedAvg [2]) and sends the averaged model back to the users [1, 3]. This approach respects privacy in the (weak) sense that the server never accesses the private user data at any point in the procedure. However, prior work in FL has made the unrealistic assumptions that the data stored on each local device are fully annotated with ground-truth labels and that the server has no access to any labeled data. In practice, the private data on local devices are more often unlabeled, since annotating data requires both time and domain knowledge [4, 5], and servers are often hosted by organizations that do have labeled data.
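To make the round structure described above concrete, here is a minimal PyTorch sketch of one FedAvg-style round: the server broadcasts its model, each user runs a few local SGD steps on its own data, and the server takes a weighted average of the returned models [2]. The function name `fedavg_round`, the SGD hyperparameters, the cross-entropy loss, and the weighting by local dataset size are illustrative assumptions for a supervised classification setting, not the exact pipeline benchmarked in the paper.

```python
import copy

import torch


def fedavg_round(server_model, user_loaders, local_steps=5, lr=0.01):
    """One FedAvg-style round: broadcast, local SGD, weighted averaging.

    Illustrative sketch; hyperparameters and loss are assumptions.
    """
    local_states, weights = [], []
    for loader in user_loaders:
        # Each user starts from a copy of the current global model.
        local_model = copy.deepcopy(server_model)
        optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        for step, (x, y) in enumerate(loader):
            if step >= local_steps:
                break
            optimizer.zero_grad()
            loss_fn(local_model(x), y).backward()
            optimizer.step()
        # Only the updated model (never the raw data) is "uploaded".
        local_states.append(local_model.state_dict())
        weights.append(len(loader.dataset))  # weight users by local data size

    # Server step: parameter-wise weighted average of the uploaded models.
    total = float(sum(weights))
    avg_state = {
        key: sum((w / total) * state[key].float()
                 for state, w in zip(local_states, weights))
        for key in local_states[0]
    }
    server_model.load_state_dict(avg_state)
    return server_model
```

In a real deployment the local updates run on the devices and only the state dicts travel over the network; this single-process version just makes the broadcast/update/average loop explicit.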
Aug-25-2020