Appendix to "Adam with Bandit Sampling for Deep Learning"
Neural Information Processing Systems
According to Theorem 4.1 in [1], the convergence rate of Adam is $O(1/\sqrt{T})$.

We prove Lemma 1 using the framework of online learning with bandit feedback. Let us consider a special case where

It follows by plugging Lemma 3 into Theorem 2.

In the main paper, we compared our method with Adam and Adam with importance sampling.

In the main paper, we showed plots of loss value vs. wall-clock time. Here, we include plots of error rate vs. wall-clock time.
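To make the bandit-feedback framework concrete, the following is a minimal sketch of an EXP3-style update, the standard algorithm for adversarial bandits. It is an illustration of the general framework only, not the paper's exact sampling procedure; the reward model (one "best" arm), the exploration rate `gamma`, and the arm count are all assumptions chosen for the example.

```python
import numpy as np

def exp3_update(weights, arm, reward, gamma):
    """One EXP3 step: mix the weight distribution with uniform
    exploration, then exponentially reweight the pulled arm by its
    importance-weighted reward estimate (unbiased under sampling)."""
    k = len(weights)
    probs = (1 - gamma) * weights / weights.sum() + gamma / k
    est = reward / probs[arm]  # importance-weighted reward estimate
    weights = weights.copy()
    weights[arm] *= np.exp(gamma * est / k)
    return weights, probs

# Toy run: 5 arms, arm 2 yields the highest reward; EXP3 should
# concentrate its sampling distribution on arm 2 over time.
rng = np.random.default_rng(0)
gamma, k = 0.1, 5
w = np.ones(k)
for _ in range(300):
    p = (1 - gamma) * w / w.sum() + gamma / k
    arm = rng.choice(k, p=p)
    reward = 1.0 if arm == 2 else 0.2  # assumed reward structure
    w, _ = exp3_update(w, arm, reward, gamma)
```

In expectation, the log-weight of each arm grows at rate $\gamma r_i / k$ regardless of how often it is pulled, which is what makes the importance-weighted estimate unbiased and yields the usual $O(\sqrt{T})$ regret guarantee for EXP3.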