Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function

Neural Information Processing Systems 

Consequently, the iteration of SGD, unlike GD, is not deterministic even when it is started at a fixed initial condition.
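The contrast can be illustrated with a minimal sketch, which is not taken from the paper. It assumes a hypothetical one-dimensional least-squares objective f(x) = (1/n) Σ (x - a_i)² / 2, and the names DATA, gd, and sgd are illustrative. GD uses the full gradient, so repeated runs from the same starting point coincide. SGD draws one sample per step, so its trajectory depends on the random draws even though the starting point is fixed.

```python
import random

# Hypothetical data points a_i defining f(x) = (1/n) * sum((x - a_i)^2 / 2).
DATA = [0.0, 1.0, 2.0, 3.0]


def gd(x0, lr, steps):
    """Full-batch gradient descent: the update uses the exact gradient x - mean(a)."""
    x = x0
    mean = sum(DATA) / len(DATA)
    for _ in range(steps):
        x -= lr * (x - mean)
    return x


def sgd(x0, lr, steps, seed):
    """Stochastic gradient descent: each update uses the gradient of one random term."""
    rng = random.Random(seed)  # the seed stands in for the randomness of sampling
    x = x0
    for _ in range(steps):
        a = rng.choice(DATA)
        x -= lr * (x - a)
    return x


if __name__ == "__main__":
    # Same initial condition for every run.
    print("GD :", gd(5.0, 0.1, 50), gd(5.0, 0.1, 50))
    print("SGD:", [sgd(5.0, 0.1, 50, seed) for seed in range(3)])
```

In this sketch the two GD runs print the same value, while the SGD runs generally differ from one seed to the next.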
