Uniformity Testing in the Shuffle Model: Simpler, Better, Faster

Clément L. Canonne, Hongyi Lyu

arXiv.org Machine Learning 

Learning from, or, more generally, performing statistical inference on sensitive or private data has become an increasingly important topic, where one must balance the desire to achieve good accuracy with the requirement to preserve the privacy of users' data. Among the many tasks of interest, hypothesis testing and, more specifically, goodness-of-fit testing is of particular importance, given its ubiquitous role in data analysis, the natural sciences, and more broadly as a workhorse of statistics and machine learning. In this paper, we consider the specific case of uniformity testing, the prototypical example of goodness-of-fit testing of discrete distributions, where one seeks to decide whether the data is drawn uniformly from a known finite domain. Investigating the trade-off between accuracy (or, equivalently, data requirements) and privacy for this task has received considerable attention in recent years in a variety of privacy models, including the central and local models of differential privacy, the so-called pan-privacy, and the recently proposed model of shuffle privacy. Unfortunately, while this trade-off is now well understood in most of the aforementioned privacy settings, some of the proposed algorithms remain relatively complex and far from practical, and their analyses quite involved. With this in mind, we focus in this paper on private uniformity testing in the shuffle model, both simplifying the analysis of the existing algorithms for this task and obtaining a new, arguably simpler one with the same guarantees.
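To make the uniformity-testing task concrete, the following is a minimal sketch of a classical, non-private collision-based tester over a domain of size k. It is not the shuffle-private algorithm studied in the paper; the function names, the midpoint threshold, and the sample sizes in the usage example are illustrative assumptions only.

```python
# Illustrative sketch: a classical (non-private) collision-based uniformity tester.
# Under the uniform distribution over k elements, two independent samples collide
# with probability 1/k; a distribution that is epsilon-far from uniform in total
# variation distance has collision probability at least (1 + 4*epsilon^2)/k.
import random
from collections import Counter


def collision_rate(samples):
    """Fraction of colliding pairs among the samples."""
    counts = Counter(samples)
    n = len(samples)
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    return collisions / (n * (n - 1) / 2)


def looks_uniform(samples, domain_size, epsilon):
    """Accept 'uniform' if the empirical collision rate is close to 1/k.

    The threshold is the midpoint between 1/k (uniform case) and
    (1 + 4*epsilon^2)/k (epsilon-far case); this is a simplification
    chosen for illustration, not the paper's test statistic.
    """
    threshold = (1.0 + 2.0 * epsilon ** 2) / domain_size
    return collision_rate(samples) <= threshold


if __name__ == "__main__":
    k = 100
    uniform_samples = [random.randrange(k) for _ in range(5000)]
    skewed_samples = [random.randrange(k // 2) for _ in range(5000)]  # supported on half the domain
    print(looks_uniform(uniform_samples, k, epsilon=0.5))  # typically True
    print(looks_uniform(skewed_samples, k, epsilon=0.5))   # typically False
```

In the shuffle model, by contrast, each user applies a local randomizer to their sample and an anonymizing shuffler permutes the resulting messages before the analyzer sees them; the paper's contribution concerns testers of this privatized form, with simpler analyses and matching guarantees.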