The Road Less Scheduled

Neural Information Processing Systems 

So, from this viewpoint, the Schedule-Free updates can be seen as a version of momentum that has the same immediate effect but a greater delay before the remainder of the gradient is added in.
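To make the delay concrete, the sketch below tracks what fraction of a single unit gradient has reached the point where gradients are evaluated, after each step following its arrival. It rests on several assumptions that are not taken from the text above. Parameters are scalar, the learning rate is 1, and gradients are treated as fixed impulses, so the updates are linear and one gradient's contribution can be isolated. The momentum baseline is the EMA form m <- beta*m + (1 - beta)*g, w <- w - m. The Schedule-Free SGD updates are assumed to take the form y_t = (1 - beta) z_t + beta x_t, z_{t+1} = z_t - g_t, x_{t+1} = (1 - c_{t+1}) x_t + c_{t+1} z_{t+1}, with uniform averaging c_{t+1} = 1/(t+1). The helpers momentum_fraction and schedule_free_fraction are hypothetical names used only for illustration.

```python
def momentum_fraction(beta, k):
    """Fraction of a unit gradient applied to the weights after each of the k
    steps following its arrival, for the assumed EMA momentum
    m <- beta*m + (1-beta)*g, w <- w - m (learning rate 1)."""
    m, w = 0.0, 0.0
    fractions = []
    for step in range(k):
        g = 1.0 if step == 0 else 0.0  # a single unit gradient, then nothing
        m = beta * m + (1 - beta) * g
        w -= m
        fractions.append(-w)
    return fractions


def schedule_free_fraction(beta, s, k):
    """Same quantity for the assumed Schedule-Free SGD form, for a unit gradient
    arriving at step s (s >= 1). Iterates start at 0: because the updates are
    linear, this isolates the contribution of that one gradient."""
    z, x = 0.0, 0.0
    fractions = []
    for t in range(s, s + k):
        g = 1.0 if t == s else 0.0
        z -= g                          # z_{t+1} = z_t - g_t (learning rate 1)
        c = 1.0 / (t + 1)               # assumed uniform averaging weight c_{t+1}
        x = (1 - c) * x + c * z         # x_{t+1}: slowly moving average of z
        y = (1 - beta) * z + beta * x   # y_{t+1}: where the next gradient is taken
        fractions.append(-y)
    return fractions


if __name__ == "__main__":
    beta, s = 0.9, 1000
    mom = momentum_fraction(beta, 50)
    sf = schedule_free_fraction(beta, s, 50)
    for k in (1, 5, 10, 50):
        print(f"after {k:2d} steps: momentum {mom[k - 1]:.4f}, "
              f"schedule-free {sf[k - 1]:.4f}")
```

Under these assumptions, the fraction applied after k steps is 1 - beta^k for momentum and (1 - beta) + beta*k/(s + k) for Schedule-Free. Both start at roughly 1 - beta, with Schedule-Free adding only an extra beta/(s + 1). With beta = 0.9 and s = 1000, however, momentum has applied more than 99% of the gradient after 50 steps, while Schedule-Free has applied less than 15%, because the remaining beta fraction enters only through the averaged sequence x.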