Details on the vanishing gradient problem in flat histogram

The original step function in formula

Neural Information Processing Systems

In particular, it will lead to large bouncy jumps around optima (a large negative learning rate, i.e., $(\log\theta(2) - \log\theta(1))/\Delta u \ll 0$ in formula (8), will be caused there). All algorithms were run for $10^7$ iterations.