Flatter, faster: scaling momentum for optimal speedup of SGD
