.1 2 . Details on the vanishing gradient problem in flat histogram

The original step function in formula

Neural Information Processing Systems 

In particular, it will lead to large bouncy jumps around optima (a large negative learning rate, i.e., log θ(2) − log θ(1) ≪ 0 in formula (8), will be caused there). All algorithms were run for 10^7 iterations.
