Complexity of Highly Parallel Non-Smooth Convex Optimization
Neural Information Processing Systems
A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely, we consider optimization algorithms interacting with a highly parallel gradient oracle, that is, one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades-old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett, and Wainwright.
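To make the oracle model concrete, here is a minimal sketch (not from the paper) of the setup in Python: a non-smooth convex objective, a batched "parallel" oracle that answers many subgradient queries in a single round, and plain subgradient descent, which issues only one query per round. The choice of objective ($f(x) = \max_i x_i$), the unit-ball domain, and the step size are illustrative assumptions.

```python
import numpy as np

def f(x):
    # A simple non-smooth convex objective: f(x) = max_i x_i.
    return np.max(x)

def subgradient(x):
    # A valid subgradient of max_i x_i: the indicator of a maximizing coordinate.
    g = np.zeros_like(x)
    g[np.argmax(x)] = 1.0
    return g

def parallel_oracle(queries):
    # One "round" of the highly parallel oracle: it answers a whole batch
    # of (up to poly(d)) gradient queries simultaneously.
    return [subgradient(x) for x in queries]

def subgradient_descent(x0, num_rounds, step):
    # Plain subgradient descent uses only one query per round, so its
    # round complexity equals its query complexity.
    x = x0.copy()
    for _ in range(num_rounds):
        (g,) = parallel_oracle([x])
        x = x - step * g
        # Project back onto the unit Euclidean ball (illustrative domain).
        norm = np.linalg.norm(x)
        if norm > 1.0:
            x = x / norm
    return x

d = 100
rng = np.random.default_rng(0)
x0 = rng.normal(size=d) / np.sqrt(d)
T = int(np.sqrt(d))  # roughly the round budget at which, per the paper,
                     # gradient descent stops being optimal
x = subgradient_descent(x0, num_rounds=T, step=1.0 / np.sqrt(T))
print(f"f(x) after {T} rounds: {f(x):.4f}")
```

The point of the sketch is the accounting: a parallel algorithm could submit many queries per call to `parallel_oracle` and still be charged only one round, while gradient descent never exploits this. The paper's lower bound says that up to $\tilde{O}(\sqrt{d})$ rounds, that extra parallelism cannot help.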