Never Go Full Batch (in Stochastic Convex Optimization)

Jan-19-2025, 07:16:07 GMT–Neural Information Processing Systems

We study the generalization performance of full-batch optimization algorithms for stochastic convex optimization: these are first-order methods that access only the exact gradient of the empirical risk (rather than gradients with respect to individual data points), and they include a wide range of algorithms such as gradient descent, mirror descent, and their regularized and/or accelerated variants. We provide a new separation result showing that, while algorithms such as stochastic gradient descent can generalize and optimize the population risk to within \epsilon after O(1/\epsilon^2) iterations, full-batch methods either need at least \Omega(1/\epsilon^4) iterations or exhibit a dimension-dependent sample complexity.
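
To make the distinction between the two access models concrete, here is a minimal sketch in Python. It only illustrates what each kind of method is allowed to query; it does not reproduce the paper's lower-bound construction or demonstrate the separation. The squared loss, the synthetic data, the step sizes, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def sample_grad(w, x, y):
    """Gradient of the (assumed) squared loss 0.5 * (w @ x - y)**2 at one point."""
    return (w @ x - y) * x

def empirical_grad(w, X, y):
    """Exact gradient of the empirical risk: the average over all n points."""
    return X.T @ (X @ w - y) / len(y)

def empirical_risk(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def full_batch_gd(X, y, steps, lr):
    """Full-batch method: every step queries only the empirical-risk gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - lr * empirical_grad(w, X, y)
    return w

def one_pass_sgd(X, y, lr):
    """SGD: every step queries the gradient at a single data point, used once."""
    w = np.zeros(X.shape[1])
    iterates = [w]
    for x_i, y_i in zip(X, y):
        w = w - lr * sample_grad(w, x_i, y_i)
        iterates.append(w)
    return np.mean(iterates, axis=0)  # return the averaged iterate

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ np.ones(d) + 0.1 * rng.normal(size=n)

w_gd = full_batch_gd(X, y, steps=200, lr=0.1)
w_sgd = one_pass_sgd(X, y, lr=0.05)
```

In this sketch the full-batch routine never sees an individual data point's gradient, only their average, which is the restriction the abstract refers to. The SGD routine makes one pass over the sample, so its number of iterations equals the number of data points.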

full batch, gradient descent, stochastic convex optimization
