Private Adaptive Gradient Methods for Convex Optimization
Asi, Hilal, Duchi, John, Fallah, Alireza, Javidbakht, Omid, Talwar, Kunal
While the success of stochastic gradient methods for solving empirical risk minimization has motivated their adoption across much of machine learning, increasing privacy risks in data-intensive tasks have made applying them more challenging [DMNS06]: gradients can leak users' data, intermediate models can compromise individuals, and even final trained models may be non-private without substantial care. This motivates a growing line of work developing private variants of stochastic gradient descent (SGD), where algorithms guarantee differential privacy by perturbing individual gradients with random noise [DJW13; ST13b; ACGMMTZ16; DJW18; BFTT19; FKT20]. Yet these noise-addition procedures typically fail to reflect the geometry underlying the optimization problem, which in non-private settings is essential: for high-dimensional problems with sparse parameters, mirror descent and its variants [BT03; NJLS09] are the methods of choice, while in the large-scale stochastic settings prevalent in deep learning, AdaGrad and other adaptive variants [DHS11] provide stronger theoretical and practical performance. Moreover, methods that do not adapt (or do not leverage geometry) can be provably sub-optimal, in that there exist problems where their convergence is much slower than that of adaptive variants reflecting appropriate geometry [LD19]. To address these challenges, we introduce Pagan (Private AdaGrad with Adaptive Noise), a new differentially private variant of stochastic gradient descent and AdaGrad.
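To make the gradient-perturbation mechanism concrete, the following is a minimal sketch of a standard noisy-SGD step (per-example gradient clipping followed by Gaussian noise, as in the DP-SGD line of work the abstract cites) — not the paper's Pagan algorithm. The function name `dp_sgd_step` and all parameters (`lr`, `clip`, `sigma`) are illustrative choices, not from the paper.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One noisy gradient step: clip each per-example gradient to L2 norm
    `clip`, sum, add Gaussian noise with std `sigma * clip`, average, and
    take a gradient step. (Illustrative sketch, not the Pagan update.)"""
    rng = np.random.default_rng() if rng is None else rng
    clipped = [g / max(1.0, np.linalg.norm(g) / clip) for g in per_example_grads]
    noise = rng.normal(0.0, sigma * clip, size=params.shape)
    noisy_avg = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_avg
```

The clipping bounds each individual's contribution (the mechanism's sensitivity), so the added Gaussian noise yields a differential privacy guarantee; the noise here is isotropic, which is exactly the geometry-oblivious behavior that adaptive methods such as Pagan aim to improve on.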
Jun-25-2021