2ac79356a03fe5e9250e5e77ebc76e6e-Paper-Conference.pdf
–Neural Information Processing Systems
Our results shed light on the difference between Adam and (stochastic) gradient descent from a theoretical perspective.
Feb-10-2026, 02:39:46 GMT