Reviews: Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes

Oct-7-2024, 05:11:12 GMT–Neural Information Processing Systems

This paper identifies (kernel) linear least-squares regression problems in which multiple passes of stochastic gradient descent (SGD) over a training set can yield better statistical error than a single pass. This is relevant to the core of machine learning theory, and relates to a line of work published at NIPS, ICML, COLT, and similar conferences in the past several years on the statistical error of one-pass, many-pass, and ERM-based learning. The authors focus on regression problems captured, by assumption, by two parameters: alpha, which governs the exponent of a power-law eigenvalue decay, and r, which governs a transformation under which the Hilbert norm of the optimal predictor is bounded. They refer to problems where r <= (alpha - 1) / (2 * alpha) as "hard". The main result of the paper is that, for these "hard" problems, multiple SGD passes either achieve (minimax) optimal rates of statistical estimation or at least improve the rate relative to a single pass. The results are interesting and might address an unanswered core question in machine learning, and the mathematical presentation is clear, with assumptions stated upfront.
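To make the setting concrete, the following is a minimal sketch of multi-pass averaged SGD on a finite-dimensional least-squares problem with power-law feature scaling. It is an illustration written for this review, not the authors' code or experimental setup. The function names (`is_hard`, `make_power_law_data`, `multipass_sgd`, `mse`), the synthetic data model, the per-pass reshuffling, and the step-size choice are all assumptions, and the "hard" test assumes the condition reads r <= (alpha - 1) / (2 * alpha).

```python
import numpy as np


def is_hard(alpha: float, r: float) -> bool:
    # Assumed reading of the "hard" regime: r <= (alpha - 1) / (2 * alpha).
    return r <= (alpha - 1.0) / (2.0 * alpha)


def make_power_law_data(n, d, alpha, noise=0.1, seed=0):
    # Hypothetical synthetic model: coordinate i of each feature vector has
    # variance i^(-alpha), so the covariance eigenvalues decay as a power law.
    rng = np.random.default_rng(seed)
    idx = np.arange(1, d + 1, dtype=float)
    X = rng.standard_normal((n, d)) * idx ** (-alpha / 2.0)
    w_star = 1.0 / idx  # arbitrary illustrative target, not tied to r
    y = X @ w_star + noise * rng.standard_normal(n)
    return X, y, w_star


def multipass_sgd(X, y, step, passes, seed=0):
    # Plain SGD on the squared loss with a running average of the iterates.
    # Each pass visits every training point once, in a freshly shuffled order.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    t = 0
    for _ in range(passes):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]
            w = w - step * grad
            t += 1
            w_avg += (w - w_avg) / t  # incremental mean of all iterates
    return w_avg


def mse(X, y, w):
    # Mean squared prediction error of the linear predictor w.
    return float(np.mean((X @ w - y) ** 2))
```

A typical use would generate data with `make_power_law_data`, pick a conservative step such as `1 / (4 * max_i ||x_i||^2)`, and compare `mse` on held-out data for `passes=1` against larger values of `passes`. Whether additional passes help in a given run depends on alpha, r, the sample size, and the step size, which is precisely the dependence the paper characterizes.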


