A termination criterion for stochastic gradient descent for binary classification

Sina Baghal, Courtney Paquette, Stephen A. Vavasis

arXiv.org Machine Learning 

Here the loss function is $\ell: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, the probability distribution $P$ is unknown, and the data sample $(\zeta, y) \in \mathbb{R}^d \times \mathbb{R}$ is a random vector distributed as $P$. The most prevalent algorithm for solving (1) is stochastic gradient descent (SGD). A significant amount of work has been devoted to the convergence analysis of SGD (see, e.g., Robbins and Monro (1951); Bottou et al. (2018); Bubeck (2015); Pflug (1986)), leading, in particular, to learning rate schedules. The question of how to terminate the algorithm once it is near an optimal classifier, however, remains largely unaddressed. Yet inexpensive stopping criteria are of great interest in machine learning. First, a low-cost test for near-optimality would eliminate needless computation without sacrificing the quality of the solution or the efficiency of SGD. Second, early termination tests impose a degree of predictability on accuracy and running time, a useful quality when SGD occurs as a subproblem of a larger computation. Several works show that early termination of SGD can prevent overfitting, speed up learning procedures, and/or improve generalization properties (Prechelt, 2012; Hardt et al., 2016; Yao et al., 2007).
