Appendix to "Adam with Bandit Sampling for Deep Learning"

Neural Information Processing Systems 

According to Theorem 4.1 in [1], the convergence rate of Adam is

We prove Lemma 1 using the framework of online learning with bandit feedback. Consider a special case where

It follows directly by plugging Lemma 3 into Theorem 2.

In the main paper, we compared our method with Adam and with Adam using importance sampling. There we showed plots of loss value vs. wall-clock time; here, we include plots of error rate vs. wall-clock time.
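If [1] denotes the original Adam paper (Kingma & Ba), its Theorem 4.1 bounds the regret $R(T)$ of Adam by an $O(\sqrt{T})$ quantity under bounded-gradient assumptions, so the average-regret convergence rate alluded to above would be

$$\frac{R(T)}{T} = O\!\left(\frac{1}{\sqrt{T}}\right),$$

i.e. the average regret vanishes as the number of iterations $T$ grows. This is stated here as background on the cited theorem, not as a reconstruction of the elided equation.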
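The "online learning with bandit feedback" framework invoked for Lemma 1 can be illustrated with a minimal EXP3-style sketch. This is an assumption on our part about the general machinery (exponential weights with importance-weighted loss estimates), not the paper's exact construction; the arm losses, hyperparameters, and function name `exp3_step` below are hypothetical.

```python
import math
import random

def exp3_step(weights, gamma, eta, loss_fn):
    """One EXP3 round: sample an arm, observe only that arm's loss
    (bandit feedback), and update its weight with an importance-weighted
    unbiased loss estimate."""
    k = len(weights)
    total = sum(weights)
    # Mix the weight distribution with uniform exploration.
    probs = [(1 - gamma) * w / total + gamma / k for w in weights]
    i = random.choices(range(k), weights=probs)[0]
    loss = loss_fn(i)                   # only arm i's loss is revealed
    est = loss / probs[i]               # unbiased estimate of arm i's loss
    weights[i] *= math.exp(-eta * est)  # exponential-weights update
    return probs

random.seed(0)
weights = [1.0] * 3
arm_losses = [0.1, 0.9, 0.9]  # hypothetical: arm 0 is consistently cheapest
for _ in range(2000):
    probs = exp3_step(weights, gamma=0.1, eta=0.01,
                      loss_fn=lambda i: arm_losses[i])
print([round(p, 3) for p in probs])
```

After enough rounds, the sampling distribution concentrates on the lowest-loss arm while the `gamma` term keeps a floor of exploration, which is the qualitative behavior a bandit-sampling argument for Lemma 1 would rely on.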
