
 Huba, Dzmitry


Federated Analytics in Practice: Engineering for Privacy, Scalability and Practicality

arXiv.org Artificial Intelligence

Cross-device Federated Analytics (FA) is a distributed computation paradigm designed to answer analytics queries about, and derive insights from, data held locally on users' devices. On-device computations combined with other privacy and security measures ensure that only minimal data is transmitted off-device, achieving a high standard of data protection. Despite FA's broad relevance, the applicability of existing FA systems is limited by compromised accuracy, a lack of flexibility for data analytics, and an inability to scale effectively. In this paper, we describe our approach to combining privacy, scalability, and practicality to build and deploy a system that overcomes these limitations. Our FA system leverages trusted execution environments (TEEs) and optimizes the use of on-device computing resources to facilitate federated data processing across large fleets of devices, while ensuring robust, defensible, and verifiable privacy safeguards. We focus on federated analytics (statistics and monitoring) as distinct from federated learning (ML workloads), and flag the key differences.
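
To make the pattern concrete, below is a minimal sketch (not the paper's implementation) of the cross-device FA flow described above: each device reduces its raw data to a small local report, and only the sum of reports is ever visible to the analyst. The function names and data are illustrative; the aggregation step, a plain in-memory sum here, stands in for what the paper's system performs inside a TEE with additional privacy measures.

```python
# Illustrative sketch of cross-device federated analytics (names are hypothetical).
from collections import Counter
from typing import Iterable

def local_report(device_events: Iterable[str]) -> Counter:
    """Runs on-device: reduce raw events to a minimal histogram."""
    return Counter(device_events)

def aggregate(reports: Iterable[Counter]) -> Counter:
    """Runs server-side (conceptually inside a TEE): sum per-device reports."""
    total = Counter()
    for report in reports:
        total.update(report)
    return total

# Hypothetical device data; only the aggregated histogram leaves the "devices".
devices = [["crash", "open"], ["open", "open"], ["crash"]]
print(aggregate(local_report(d) for d in devices))
# Counter({'open': 3, 'crash': 2})
```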


Confidential Federated Computations

arXiv.org Artificial Intelligence

Since its introduction in 2017 [48, 42], federated learning (FL) has seen adoption by technology platforms working with private on-device data (cross-device federated learning) or proprietary server-side data (cross-silo federated learning). FL's appeal has been driven by its straightforward privacy advantages: raw data stays in the control of participating entities, with only focused updates sent for immediate aggregation and visible to the service provider. Systems that realize federated learning [18, 35, 51] run at scale today, reducing privacy risks in sensitive applications like mobile keyboards [33, 63, 21, 53] and voice assistants [12, 34]. However, basic federated learning offers an incomplete privacy story [19]: updates sent to the service provider can reveal private data unless they are aggregated obliviously, and aggregated updates can encode individual data unless the model is trained with a differentially private (DP) learning algorithm [30]. A dishonest service provider might log or inspect unaggregated messages, from which a great deal of information about an individual participant can be learned [15, 57]. This risk has been addressed with oblivious aggregation schemes that guarantee the service provider cannot inspect unaggregated messages, including secure multiparty computation (SMPC) among cohorts of honest devices [17], non-colluding SMPC-based secure aggregators [58], and hardware trusted execution environments (TEEs) [35].
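
As a concrete illustration of the oblivious-aggregation idea cited above, the sketch below uses pairwise additive masks in the spirit of SMPC-based secure aggregation [17]: each pair of devices shares a random mask that one adds and the other subtracts, so the server sees only masked messages, yet the masks cancel in the sum. This is a toy illustration, not the paper's protocol; it omits key agreement, dropout handling, and differential privacy, and all names are hypothetical.

```python
# Toy pairwise-mask secure aggregation (illustrative, not a production protocol).
import random

NUM_DEVICES = 4
MODULUS = 2**32  # updates and masks live in a finite group so masking hides values

# Each device holds a private scalar update (e.g., one coordinate of a model delta).
updates = {i: random.randrange(100) for i in range(NUM_DEVICES)}

# Every pair (i, j) with i < j shares a random mask s_ij; device i adds it,
# device j subtracts it, so all masks cancel when the server sums the messages.
pair_masks = {(i, j): random.randrange(MODULUS)
              for i in range(NUM_DEVICES) for j in range(i + 1, NUM_DEVICES)}

def masked_update(i: int) -> int:
    y = updates[i]
    for (a, b), s in pair_masks.items():
        if a == i:
            y = (y + s) % MODULUS
        elif b == i:
            y = (y - s) % MODULUS
    return y

# The server only ever sees masked messages; each one looks random on its own.
masked = [masked_update(i) for i in range(NUM_DEVICES)]
total = sum(masked) % MODULUS
assert total == sum(updates.values()) % MODULUS
print("aggregate:", total)
```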


Towards Federated Learning at Scale: System Design

arXiv.org Machine Learning

Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. In this paper, we describe the resulting high-level design, sketch some of the challenges and their solutions, and touch upon the open problems and future directions.
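
The abstract above describes the system at a high level; the sketch below shows, under illustrative assumptions, the core federated averaging round such a system orchestrates: run a few steps of training on each participating device's local data, then average the resulting models weighted by example count. Names are hypothetical, and the production concerns the paper addresses (device selection at scale, secure aggregation, pacing, device churn) are omitted.

```python
# Minimal federated averaging round on simulated devices (illustrative only).
import numpy as np

def local_update(weights, x, y, lr=0.1, epochs=5):
    """On-device training: a few gradient steps of linear regression on local data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w, len(y)

def federated_averaging_round(global_weights, device_datasets):
    """Server-side: average device models, weighted by each device's example count."""
    results = [local_update(global_weights, x, y) for x, y in device_datasets]
    total = sum(n for _, n in results)
    return sum(n * w for w, n in results) / total

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):  # three simulated devices, each with its own local data
    x = rng.normal(size=(20, 2))
    devices.append((x, x @ true_w + rng.normal(scale=0.1, size=20)))

w = np.zeros(2)
for _ in range(30):  # thirty communication rounds
    w = federated_averaging_round(w, devices)
print("learned weights:", np.round(w, 2))  # approaches [2.0, -1.0]
```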