On the Training Instability of Shuffling SGD with Batch Normalization
