A our Work
Neural Information Processing Systems
Reddi et al. (2018) observed that Adam does not retain a long-term memory of past gradients, so its effective learning rate can increase in some cases. Zaheer et al. (2018) proposed a more controlled increase in the effective learning rate by …

Our work is primarily theoretical in nature, but we discuss its broader impacts here.

We follow the same proof strategy as Li et al. (2019). To apply Theorem B.2, the major condition to verify is that … To prove Theorem B.2, we need the following lemma from Li et al. (2019). Under Condition (3) of Theorem B.2, given the initial point … We also need the following lemma, adapted from Lemma C.2 of Li et al. (2021).
Nov-13-2025, 22:26:41 GMT