Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling
