
Equal Contribution 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montrรฉal, Canada. is often attributed to the rapid decay in the learning rate when gradients are dense, which is often the case in many machine learning applications.