Differentially Private Range Queries with Correlated Input Perturbation

Prathamesh Dharangutte, Jie Gao, Ruobin Gong, Guanyang Wang

arXiv.org Artificial Intelligence 

We construct a class of locally differentially private mechanisms for linear queries, including range queries, that can be represented as the product of a pre-specified workload matrix and a confidential database. The proposed design leverages correlated input perturbation to simultaneously satisfy the following crucial properties: Unbiasedness: The sanitized output exhibits no bias with respect to the ground truth; Consistency (internal): The sanitized output may plausibly be viewed as having been queried directly from an input database without modification; Statistical transparency: The probabilistic description of the sanitized output is analytically tractable to enable reliable downstream statistical inferences; Utility control: The mechanism accommodates custom, externally specified utility requirements, expressed in terms of accuracy targets in certain query margins or as implied by the hierarchical database structure; Efficient implementation: The proposed algorithm is exact and simple to implement, with no need for approximate simulation (including Markov chain Monte Carlo) or optimization-based post-processing.

The curation of official statistics vividly illustrates both the need for, and the challenge of, simultaneously satisfying the above desiderata. As an example, the 2020 U.S. Decennial Census provides multi-resolutional tabular data products that follow a hierarchical system termed the "spine" (Abowd et al., 2022), which orders geographic entities from top to bottom (states, counties, tracts, block groups, and blocks), with higher-level geographies partitioned by the lower-level ones. Population tabulations across the geographic resolutions are subject to numerous complex utility requirements. For example, state-level populations must be reported exactly, per their constitutional purpose for reapportionment, an "invariant" requirement akin to external consistency; see Gao et al. (2022) and Dharangutte et al. (2023).
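To make the ideas of input perturbation, internal consistency, and an exact top-level invariant concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' mechanism: the toy database `x`, the workload `W`, the helper names, and the noise scale `sigma` are all hypothetical, and the noise is not calibrated to any privacy budget. The noise is correlated simply by centering independent Gaussian draws so that they sum to zero, which keeps the top-level total exact while every other answer is noisy but still derived from a single perturbed database.

```python
import random


def centered_gaussian_noise(n, sigma, rng):
    """Draw n Gaussian values, then subtract their mean so they sum to zero.

    Centering makes the noise terms correlated. Each term still has mean
    zero, so the perturbed counts remain unbiased.
    """
    z = [rng.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(z) / n
    return [v - mean for v in z]


def apply_workload(W, x):
    """Answer the linear queries W x as plain matrix-vector products."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


# Hypothetical toy database: population counts for 4 blocks in one state.
x = [120.0, 80.0, 45.0, 55.0]

# Hypothetical workload: the state total, two range queries ("tracts"),
# and the four individual blocks.
W = [
    [1, 1, 1, 1],  # state total (should be reported exactly)
    [1, 1, 0, 0],  # tract A = blocks 0..1
    [0, 0, 1, 1],  # tract B = blocks 2..3
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

rng = random.Random(0)  # fixed seed only so the sketch is reproducible
noise = centered_gaussian_noise(len(x), sigma=2.0, rng=rng)  # sigma is arbitrary

# Input perturbation: add noise to the database once, then answer every
# query from the same perturbed database.
x_tilde = [v + e for v, e in zip(x, noise)]
answers = apply_workload(W, x_tilde)
true_answers = apply_workload(W, x)

print("true state total: ", true_answers[0])
print("noisy state total:", answers[0])
print("noisy tract sums: ", answers[1], "+", answers[2])
```

Because all answers come from the same perturbed vector `x_tilde`, the tract answers add up to the state answer and the block answers add up to their tract, up to floating-point rounding. The perturbed counts here are real-valued; handling integrality, non-negativity, and privacy calibration is outside the scope of this sketch.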
Tabulations at intermediate geographies must meet accuracy targets according to the relevant operational standards (U.S. Census Bureau, 2022), as do certain "off-spine" geographies (e.g.