Closing the gap between the upper bound and lower bound of Adam's iteration complexity

Neural Information Processing Systems 

Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization under an L-smooth condition and a bounded noise variance assumption. However, a thorough review of the existing literature on Adam's convergence reveals a noticeable gap: none of the known upper bounds meet this lower bound. In this paper, we close the gap by deriving a new convergence guarantee for Adam, assuming only an L-smooth condition and bounded noise variance. Our results remain valid across a broad spectrum of hyperparameters. In particular, with properly chosen hyperparameters, we derive an upper bound on Adam's iteration complexity and show that it matches the lower bound for first-order optimizers.
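For reference, the algorithm analyzed is the standard Adam optimizer of Kingma & Ba. Below is a minimal NumPy sketch of the update; the hyperparameter values, toy objective, and noise model in the usage loop are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adam_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. Returns the new iterate and moment estimates."""
    m = beta1 * m + (1 - beta1) * g          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for m
    v_hat = v / (1 - beta2 ** t)             # bias correction for v
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Illustrative usage on a toy quadratic f(x) = ||x||^2 / 2 with Gaussian
# gradient noise, mimicking the bounded-variance stochastic setting.
rng = np.random.default_rng(0)
x = np.array([5.0, -3.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 1001):
    g = x + rng.normal(scale=0.1, size=x.shape)  # noisy gradient of f
    x, m, v = adam_step(x, g, m, v, t)
print(x)  # should be near the minimizer at the origin
```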