Consider minimizing an empirical loss

$$\min_{\theta \in \mathbb{R}^d} \; \frac{1}{N} \sum_{i=1}^{N} L(z_i, \theta). \qquad (1)$$

Many learning tasks, such as regression and classification, are usually framed this way [1]. When $N \gg 1$, computing the gradient of the objective in (1) becomes a bottleneck, even if the individual gradients $\nabla_\theta L(z_i, \theta)$ are cheap to evaluate. For a fixed computational budget, it is thus tempting to replace vanilla gradient descent by more iterations that each use an approximate gradient, obtained from only a few data points. Stochastic gradient descent (SGD; [2]) follows this template.
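To make the template concrete, here is a minimal sketch of minibatch SGD for (1), assuming uniform sampling without replacement and a constant step size; the names `per_example_grad`, `step_size`, and `batch_size` are illustrative choices, not notation from the paper.

```python
import numpy as np

def sgd(per_example_grad, data, theta0, step_size=0.1,
        batch_size=32, n_iters=1000, seed=0):
    """Minibatch SGD for min_theta (1/N) * sum_i L(z_i, theta).

    per_example_grad(z, theta) must return grad_theta L(z, theta).
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    N = len(data)
    for _ in range(n_iters):
        # Approximate the full gradient with a few uniformly drawn points;
        # the minibatch average is an unbiased estimate of the true gradient.
        batch = rng.choice(N, size=batch_size, replace=False)
        g = np.mean([per_example_grad(data[i], theta) for i in batch], axis=0)
        theta -= step_size * g
    return theta

# Usage example: linear least squares, L(z_i, theta) = 0.5 * (x_i @ theta - y_i)**2,
# whose per-example gradient is (x_i @ theta - y_i) * x_i.
rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 5))
y = X @ np.arange(5.0)
data = list(zip(X, y))
theta_hat = sgd(lambda z, th: (z[0] @ th - z[1]) * z[0],
                data, theta0=np.zeros(5), step_size=0.05)
```

Each iteration touches only `batch_size` points rather than all $N$, which is the trade-off the paragraph above describes: cheaper, noisier gradient steps in exchange for more of them.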
