Considerminimizinganempiricalloss min
Neural Information Processing Systems
Many learning tasks, such as regression and classification, are usually framed this way [1]. When N ≫ 1, computing the gradient of the objective in (1) becomes a bottleneck, even if the individual gradients ∇_θ L(z_i, θ) are cheap to evaluate. For a fixed computational budget, it is thus tempting to replace vanilla gradient descent by more iterations using an approximate gradient, obtained from only a few data points. Stochastic gradient descent (SGD; [2]) follows this template.
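As a minimal sketch of this trade-off (all names and hyperparameters below are our own illustration, not from the paper), the following compares the O(Nd) full gradient of (1) with a cheap mini-batch estimate on a least-squares loss L(z_i, θ) = ½(x_iᵀθ − y_i)²:

```python
import numpy as np

# Synthetic regression data: z_i = (x_i, y_i), with N >> 1 observations.
rng = np.random.default_rng(0)
N, d = 10_000, 5
X = rng.standard_normal((N, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.1 * rng.standard_normal(N)

def full_gradient(theta):
    # Gradient of the objective in (1): averages grad_theta L(z_i, theta)
    # over ALL N points -- costs O(N d) per step, the bottleneck for large N.
    return X.T @ (X @ theta - y) / N

def minibatch_gradient(theta, batch_size=32):
    # Unbiased estimate of the full gradient from a few sampled points,
    # costing only O(batch_size * d) per step.
    idx = rng.integers(0, N, size=batch_size)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / batch_size

def sgd(steps=2000, lr=0.05):
    # Many cheap, noisy steps instead of few exact ones.
    theta = np.zeros(d)
    for _ in range(steps):
        theta -= lr * minibatch_gradient(theta)
    return theta

theta_hat = sgd()
```

Each SGD step here touches 32 points instead of 10,000, so for the same budget one can afford roughly 300× more iterations than full-batch gradient descent.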