Reviews: Without-Replacement Sampling for Stochastic Gradient Methods
– Neural Information Processing Systems
The paper studies the problem of minimizing the average of a finite sum of convex functions over a convex domain using stochastic algorithms that, as opposed to most popular methods, apply without-replacement sampling to the data. More specifically, the paper considers methods that first randomly permute the functions and then process them one at a time via incremental updates, making at most a single pass over the data (hence the data is shuffled only once). There are three main motivations for considering stochastic methods with without-replacement sampling:
1. It has been observed empirically many times (and, to the best of my knowledge, this is what ML practitioners actually do) that randomly shuffling the data and then applying incremental updates works better than with-replacement SGD.
2. After the data is randomly permuted, these algorithms require only sequential memory access, which is much more efficient than standard with-replacement SGD methods that require random access to perform the updates. Since the main setting under consideration is making at most a single pass, it is plausible to assume that the data is already stored in some random permutation, in which case the algorithm is fully sequential and there is no need to even artificially permute the data.
3. In a distributed setting in which the data is partitioned across several machines (without assuming the data is sampled i.i.d. from a distribution), it is more natural and efficient to analyze without-replacement algorithms.
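To make the scheme concrete, the following is a minimal sketch (my own illustration, not code from the paper) of single-pass SGD with without-replacement sampling: the data is permuted once and then processed sequentially, with each function used exactly once. The names shuffle_once_sgd, grad_fs, step_size, and project are placeholders I chose and are not taken from the paper.

# Sketch: single-shuffle, single-pass projected SGD (illustrative only).
import numpy as np

def shuffle_once_sgd(grad_fs, x0, step_size, project=lambda x: x, seed=0):
    """grad_fs: list of per-function gradient oracles g_i(x) ~ grad f_i(x).
    x0: initial point; step_size: map t -> eta_t; project: projection onto
    the convex domain. Returns the average iterate over the single pass."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(grad_fs))   # random permutation, done once
    x = np.array(x0, dtype=float)
    avg = np.zeros_like(x)
    for t, i in enumerate(order):           # one sequential pass; each f_i used once
        x = project(x - step_size(t) * grad_fs[i](x))
        avg += x
    return avg / len(grad_fs)

With-replacement SGD would instead draw i uniformly at random at every step; the only change here is that the indices come from a single random permutation, which is exactly what makes the memory access pattern sequential after the initial shuffle.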