On the Dichotomy Between Privacy and Traceability in $\ell_p$ Stochastic Convex Optimization
Sasha Voitovych, Mahdi Haghifam, Idan Attias, Gintare Karolina Dziugaite, Roi Livni, Daniel M. Roy
In this paper, we investigate the necessity of memorization in stochastic convex optimization (SCO) under $\ell_p$ geometries. Informally, we say a learning algorithm memorizes $m$ samples (or is $m$-traceable) if, by analyzing its output, it is possible to identify at least $m$ of its training samples. Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every $p \in [1,\infty)$, we establish the existence of a risk threshold below which any sample-efficient learner must memorize a \emph{constant fraction} of its samples. For $p \in [1,2]$, this threshold coincides with the best excess risk achievable by differentially private (DP) algorithms; that is, above this threshold there are algorithms that do not memorize even a single sample. This establishes a sharp dichotomy between privacy and traceability for $p \in [1,2]$. For $p \in (2,\infty)$, the threshold instead yields novel lower bounds for DP learning, partially resolving an open problem in this setting. En route to proving these results, we introduce a complexity notion, which we term the \emph{trace value} of a problem, that unifies privacy lower bounds and traceability results, and we prove a sparse variant of the fingerprinting lemma.
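Since the abstract defines $m$-traceability only informally, the display below is a minimal LaTeX sketch of how tracing notions of this kind are commonly formalized in the literature; the tracer $T$, the flag $\texttt{IN}$, and the error parameters $\delta$ and $\alpha$ are illustrative assumptions, not taken from the paper itself.

\[
\underbrace{\Pr_{S \sim \mathcal{D}^n}\Bigl[\,\bigl|\{\, z \in S : T(A(S), z) = \texttt{IN} \,\}\bigr| \ge m \Bigr] \ge 1 - \delta}_{\text{completeness: } T \text{ identifies at least } m \text{ training samples from the output of } A}
\]
\[
\underbrace{\Pr_{S \sim \mathcal{D}^n,\; z' \sim \mathcal{D}}\bigl[\, T(A(S), z') = \texttt{IN} \,\bigr] \le \alpha}_{\text{soundness: fresh samples drawn from } \mathcal{D} \text{ are rarely flagged}}
\]

Under a definition of this shape, the dichotomy for $p \in [1,2]$ reads: below the risk threshold, every sample-efficient learner admits a tracer achieving $m = \Omega(n)$, while above it there are (DP) algorithms for which no tracer can identify even one sample. For context, in the $\ell_2$ case the optimal DP excess risk is known from the prior DP-SCO literature (not from this abstract) to scale as $\tilde{\Theta}\bigl(1/\sqrt{n} + \sqrt{d}/(\varepsilon n)\bigr)$, so the threshold can be read as a concrete function of $n$, $d$, and the privacy budget $\varepsilon$.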
arXiv.org Artificial Intelligence
Feb-24-2025