d33174c464c877fb03e77efdab4ae804-AuthorFeedback.pdf

Neural Information Processing Systems 

Our work "establishes interpretations of SGD and Adam-family optimizers from a Bayesian filtering perspective" (R3). It is "the first to demonstrate how viewing optimization as Bayesian inference requires modeling temporal dynamics" [...] AdamW" (R4), and therefore explains the excellent performance of these SOTA methods.

In the ideal case you shouldn't use a factorised model, and 77-81 aren't trying to motivate a factorised model. Also, see "Conclusions" above for non-factorised future [...] Khan et al. 2018), but we agree that its improvement is an important avenue for future research.

Minor 1. Agreed, but a few people get very confused on this point.
