Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity Machine Learning

Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the collection of such statistics in the local differential privacy (LDP) model, and describe an algorithm whose privacy cost is polylogarithmic in the number of changes to a user's value. More fundamentally---by building on anonymity of the users' reports---we also demonstrate how the privacy cost of our LDP algorithm can actually be much lower when viewed in the central model of differential privacy. We show, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying $\varepsilon$-local differential privacy will satisfy $(O(\varepsilon \sqrt{\log(1/\delta)/n}), \delta)$-central differential privacy. By this, we explain how the high noise and $\sqrt{n}$ overhead of LDP protocols is a consequence of them being significantly more private in the central model. As a practical corollary, our results imply that several LDP-based industrial deployments may have much lower privacy cost than their advertised $\varepsilon$ would indicate---at least if reports are anonymized.
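To make the amplification statement concrete, the following is a minimal sketch, not the paper's algorithm. It assumes binary randomized response as the $\varepsilon$-LDP local randomizer and a uniform shuffle as the anonymization step, then evaluates the stated bound $\varepsilon\sqrt{\log(1/\delta)/n}$. The function names and the constant `c` are hypothetical: the $O(\cdot)$ in the abstract hides a constant and any conditions on $\varepsilon$, so the printed value is illustrative only.

```python
import math
import random


def randomized_response(bit, eps, rng):
    """eps-LDP binary randomized response: keep the bit w.p. e^eps / (1 + e^eps)."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_keep else 1 - bit


def shuffled_reports(bits, eps, rng):
    """Randomize each user's bit locally, then shuffle to drop the user/report link."""
    reports = [randomized_response(b, eps, rng) for b in bits]
    rng.shuffle(reports)  # anonymization step (assumed: a uniform random permutation)
    return reports


def amplified_epsilon(eps_local, n, delta, c=1.0):
    """Evaluate c * eps * sqrt(log(1/delta) / n); c is a hypothetical stand-in
    for the unspecified constant hidden by the O(.) in the abstract."""
    return c * eps_local * math.sqrt(math.log(1.0 / delta) / n)


if __name__ == "__main__":
    rng = random.Random(0)
    reports = shuffled_reports([0, 1] * 5000, eps=1.0, rng=rng)
    print(len(reports), amplified_epsilon(1.0, len(reports), 1e-6))
```

With `c = 1`, the central-model $\varepsilon$ shrinks as $1/\sqrt{n}$, which is the qualitative behavior the abstract describes.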

The Large Margin Mechanism for Differentially Private Maximization

Neural Information Processing Systems

A basic problem in the design of privacy-preserving algorithms is the private maximization problem: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy. This problem has been used as a subroutine in many privacy-preserving algorithms for statistics and machine learning. Previous algorithms for this problem are either range-dependent--i.e., their utility diminishes with the size of the universe--or only apply to very restricted function classes. This work provides the first general purpose, range-independent algorithm for private maximization that guarantees approximate differential privacy. Its applicability is demonstrated on two fundamental tasks in data mining and machine learning.
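For context, the private maximization problem can be illustrated with the classic exponential mechanism, a well-known range-dependent baseline whose utility guarantee degrades with the logarithm of the universe size. The sketch below is that baseline, not the large margin mechanism from this paper. The function name, the score list, and the `sensitivity` parameter are assumptions made for illustration.

```python
import math
import random


def exponential_mechanism(scores, eps, sensitivity, rng):
    """Sample an index with probability proportional to
    exp(eps * score / (2 * sensitivity)), the standard eps-DP selection rule.

    `scores[i]` is the data-dependent quality of item i; `sensitivity` is the
    most any single score can change when one record changes (assumed known).
    """
    best = max(scores)
    # Subtract the maximum before exponentiating for numerical stability;
    # this rescales all weights equally and leaves the distribution unchanged.
    weights = [math.exp(eps * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    item_scores = [1.0, 5.0, 3.0]  # hypothetical scores for a 3-item universe
    print(exponential_mechanism(item_scores, eps=1.0, sensitivity=1.0, rng=rng))
```

As the universe grows with many low-scoring items, the total weight on poor choices grows too, which is the range dependence the abstract refers to.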

Differentially Private Federated Learning: A Client Level Perspective Machine Learning

Federated learning is a recent advance in privacy protection. In this context, a trusted curator aggregates parameters optimized in a decentralized fashion by multiple clients. The resulting model is then distributed back to all clients, ultimately converging to a joint representative model without explicitly having to share the data. However, the protocol is vulnerable to differential attacks, which could originate from any party contributing during federated optimization. In such an attack, a client's contribution during training and information about their data set are revealed through analyzing the distributed model. We tackle this problem and propose an algorithm for client-side differential-privacy-preserving federated optimization. The aim is to hide clients' contributions during training, balancing the trade-off between privacy loss and model performance. Empirical studies suggest that given a sufficiently large number of participating clients, our proposed procedure can maintain client-level differential privacy at only a minor cost in model performance.
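One common way to hide an individual client's contribution is to clip each client's update to a fixed norm and add Gaussian noise to the aggregate. The abstract does not spell out the procedure, so the sketch below is a generic illustration under that assumption, not the authors' algorithm. The names `clip_update`, `private_average`, `clip_norm`, and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def private_average(client_updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates after adding Gaussian noise to their sum.

    Clipping bounds any one client's influence on the sum by clip_norm, so
    noise with standard deviation noise_multiplier * clip_norm masks it.
    """
    m = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    dim = len(clipped[0])
    sigma = noise_multiplier * clip_norm
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    return [(s + rng.gauss(0.0, sigma)) / m for s in summed]


if __name__ == "__main__":
    rng = random.Random(0)
    updates = [[3.0, 4.0], [0.1, -0.2], [1.0, 1.0]]  # hypothetical client updates
    print(private_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Because the noise is added once to the sum and then divided by the number of clients, its effect on the averaged model shrinks as more clients participate, consistent with the abstract's observation about large client counts.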

Private Smarts: Can Digital Assistants Work without Prying into Our Lives?


Whether it is used to out-bluff world poker champions or schedule hairdresser appointments in a (mostly) convincing human voice, AI and its underlying machine-learning algorithms keep making big strides in their capabilities--and into ever-more intimate spaces of our lives. And, like any technological feat predicated on the collection and analysis of massive data sets, some of these breakthroughs come with significant privacy risks. New data-collecting techniques, however, could enable researchers to better preserve users' privacy yet still glean valuable insights from their personal information. Take digital assistants, where the fruits of AI innovation are increasingly manifested. Today, Amazon's Alexa and Google Assistant distinguish between the voices of different people in your home, and can use these voice signatures to deliver personalized traffic reports and schedule appointments in the relevant speaker's calendar.

Private Center Points and Learning of Halfspaces Artificial Intelligence

We present a private learner for halfspaces over an arbitrary finite domain $X\subset \mathbb{R}^d$ with sample complexity $\mathrm{poly}(d,2^{\log^*|X|})$. The building block for this learner is a differentially private algorithm for locating an approximate center point of $m>\mathrm{poly}(d,2^{\log^*|X|})$ points -- a high dimensional generalization of the median function. Our construction establishes a relationship between these two problems that is reminiscent of the relation between the median and learning one-dimensional thresholds [Bun et al.\ FOCS '15]. This relationship suggests that the problem of privately locating a center point may have further applications in the design of differentially private algorithms. We also provide a lower bound on the sample complexity for privately finding a point in the convex hull. For approximate differential privacy, we show a lower bound of $m=\Omega(d+\log^*|X|)$, whereas for pure differential privacy $m=\Omega(d\log|X|)$.
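To get a feel for how mild the dependence on $|X|$ is in these bounds, the iterated logarithm $\log^*$ can be computed directly. The sketch below assumes the usual base-2 convention (the number of times $\log_2$ must be applied before the value drops to at most 1); the abstract does not fix the base, and the helper name `log_star` is hypothetical.

```python
import math


def log_star(n):
    """Iterated logarithm, base 2 (assumed convention): the number of times
    log2 must be applied to n before the result is at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


if __name__ == "__main__":
    for size in (2, 16, 65536):
        # Print the domain size, log*(size), and the factor 2^{log*(size)}.
        print(size, log_star(size), 2 ** log_star(size))
```

Under this convention the factor $2^{\log^*|X|}$ stays in the tens for any domain size that arises in practice, in contrast to the $\log|X|$ factor in the pure differential privacy lower bound.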