Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
Neural Information Processing Systems
Consequently, the iteration of SGD, unlike GD, is not deterministic even when it is started at a fixed initial condition.
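The contrast in this sentence can be checked with a minimal sketch. It is not taken from the paper: the toy least-squares data `A` and `B`, the step size, and the helper names `grad_i`, `gd`, and `sgd` are all assumptions made for illustration. Both methods start from the same fixed `x0`; GD returns the same iterate on every run, while SGD's result depends on which summand is sampled at each step.

```python
import random

# Hypothetical toy objective (not from the paper):
# f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)^2
A = [1.0, 2.0, 3.0]
B = [1.0, 0.0, 2.0]


def grad_i(x, i):
    """Gradient of the i-th summand 0.5 * (a_i * x - b_i)^2."""
    return A[i] * (A[i] * x - B[i])


def gd(x0, lr, steps):
    """Full-batch gradient descent: no randomness enters the update."""
    x = x0
    for _ in range(steps):
        full_grad = sum(grad_i(x, i) for i in range(len(A))) / len(A)
        x -= lr * full_grad
    return x


def sgd(x0, lr, steps, rng):
    """SGD: each step uses the gradient of one uniformly sampled summand."""
    x = x0
    for _ in range(steps):
        i = rng.randrange(len(A))
        x -= lr * grad_i(x, i)
    return x


x0 = 5.0  # fixed initial condition shared by every run
gd_runs = [gd(x0, 0.05, 20) for _ in range(3)]
sgd_runs = [sgd(x0, 0.05, 20, random.Random(seed)) for seed in range(3)]

print("GD :", gd_runs)   # identical values on every run
print("SGD:", sgd_runs)  # values vary with the sampling seed
```

Each SGD run here uses its own seeded `random.Random` instance, so the randomness comes only from the sampled indices and not from the starting point.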
Oct-2-2025, 08:27:06 GMT