Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization
