Federated Learning via Synthetic Data
Federated Learning (FL) helps protect user privacy by transmitting model updates instead of private user data. However, these updates can be much larger than the private data they replace, and, depending on the number of users, each user may need to transmit updates multiple times during the training of a single model. This places a substantial communication cost on the user, and reducing that burden is an important research direction in federated learning (Kairouz et al., 2019; Li et al., 2020; Liu et al., 2020). We propose a training procedure that reduces the upload communication cost incurred by the user. The method is motivated by Wang et al. (2018), who showed that training on a large dataset can be well approximated by training on a small, specially constructed synthetic dataset: networks trained on the small synthetic dataset perform nearly as well as networks trained on the large dataset, provided the large dataset is available when the synthetic data is constructed. We build on this method to present a procedure that can reduce upload communication costs by one to two orders of magnitude while still producing good server models. We first combine these ideas with ideas from data poisoning attacks to introduce the procedure at a high level. We then discuss several technical changes that distinguish it from both of those techniques and improve its performance, including an extension that reduces download communication costs as well as upload costs. We conclude with experiments and a discussion of possible next steps in developing the procedure.
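To make the high-level idea concrete, the sketch below (PyTorch) shows one way a client could distill its private data into a handful of synthetic examples by backpropagating through a short, differentiable training step, in the spirit of the dataset distillation approach of Wang et al. (2018), and how a server could then train on the pooled synthetic uploads instead of aggregating model updates. This is a minimal illustration under assumed choices; the model, shapes, function names (distill_client_data, server_update), and hyperparameters are placeholders rather than the exact procedure developed later in this work.

import torch
import torch.nn.functional as F


def init_params():
    # Tiny linear classifier for 28x28 inputs; the real procedure would use
    # the shared federated model instead.
    w = (torch.randn(10, 28 * 28) * 0.01).requires_grad_()
    b = torch.zeros(10, requires_grad=True)
    return [w, b]


def forward(params, x):
    w, b = params
    return x.flatten(1) @ w.t() + b


def distill_client_data(real_x, real_y, n_syn=10, outer_steps=200,
                        inner_lr=0.1, outer_lr=0.01):
    """Learn n_syn synthetic examples whose one-step training effect mimics
    training on the client's real (private) data."""
    syn_x = torch.randn(n_syn, 1, 28, 28, requires_grad=True)
    syn_y = torch.arange(n_syn) % 10          # fixed labels, one per class
    opt = torch.optim.Adam([syn_x], lr=outer_lr)

    for _ in range(outer_steps):
        params = init_params()                # fresh initialization each step

        # Inner step: one differentiable SGD update on the synthetic data.
        inner_loss = F.cross_entropy(forward(params, syn_x), syn_y)
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        updated = [p - inner_lr * g for p, g in zip(params, grads)]

        # Outer loss: how well the updated model fits the real data; gradients
        # flow back through the inner update into the synthetic pixels.
        outer_loss = F.cross_entropy(forward(updated, real_x), real_y)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()

    # Only these few tensors ever leave the device.
    return syn_x.detach(), syn_y


def server_update(params, uploads, lr=0.1, epochs=5):
    """Server trains the shared model on the pooled synthetic datasets instead
    of aggregating per-client model updates."""
    xs = torch.cat([x for x, _ in uploads])
    ys = torch.cat([y for _, y in uploads])
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(forward(params, xs), ys).backward()
        opt.step()
    return params

In this sketch each client's upload is just n_syn small images and their labels, which is the source of the communication savings; how to make such uploads both accurate and safe is what the rest of the procedure addresses.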
Sep-26-2020