On the Dichotomy Between Privacy and Traceability in $\ell_p$ Stochastic Convex Optimization
Sasha Voitovych, Mahdi Haghifam, Idan Attias, Gintare Karolina Dziugaite, Roi Livni, Daniel M. Roy
In this paper, we investigate the necessity of memorization in stochastic convex optimization (SCO) under $\ell_p$ geometries. Informally, we say a learning algorithm memorizes $m$ samples (or is $m$-traceable) if, by analyzing its output, it is possible to identify at least $m$ of its training samples. Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every $p \in [1,\infty)$, we establish the existence of a risk threshold below which any sample-efficient learner must memorize a \emph{constant fraction} of its samples. For $p \in [1,2]$, this threshold coincides with the best excess risk achievable by differentially private (DP) algorithms; that is, above this threshold there are algorithms that do not memorize even a single sample. This establishes a sharp dichotomy between privacy and traceability for $p \in [1,2]$. For $p \in (2,\infty)$, the threshold instead yields novel lower bounds for DP learning, partially resolving an open problem in this setting. En route to proving these results, we introduce a complexity notion, which we term the \emph{trace value} of a problem, that unifies privacy lower bounds and traceability results, and we prove a sparse variant of the fingerprinting lemma.
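Since the abstract defines $m$-traceability only informally, the display below is a minimal LaTeX sketch of how tracing notions of this kind are commonly formalized in the literature; the tracer $T$, the flag $\texttt{IN}$, and the error parameters $\delta$ and $\alpha$ are illustrative assumptions, not taken from the paper itself.

\[
\underbrace{\Pr_{S \sim \mathcal{D}^n}\Bigl[\,\bigl|\{\, z \in S : T(A(S), z) = \texttt{IN} \,\}\bigr| \ge m \Bigr] \ge 1 - \delta}_{\text{completeness: } T \text{ identifies at least } m \text{ training samples from the output of } A}
\]
\[
\underbrace{\Pr_{S \sim \mathcal{D}^n,\; z' \sim \mathcal{D}}\bigl[\, T(A(S), z') = \texttt{IN} \,\bigr] \le \alpha}_{\text{soundness: fresh samples drawn from } \mathcal{D} \text{ are rarely flagged}}
\]

Under a definition of this shape, the dichotomy for $p \in [1,2]$ reads: below the risk threshold, every sample-efficient learner admits a tracer achieving $m = \Omega(n)$, while above it there are (DP) algorithms for which no tracer can identify even one sample. For context, in the $\ell_2$ case the optimal DP excess risk is known from the prior DP-SCO literature (not from this abstract) to scale as $\tilde{\Theta}\bigl(1/\sqrt{n} + \sqrt{d}/(\varepsilon n)\bigr)$, so the threshold can be read as a concrete function of $n$, $d$, and the privacy budget $\varepsilon$.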
arXiv.org Artificial Intelligence
Feb-24-2025