d33174c464c877fb03e77efdab4ae804-AuthorFeedback.pdf

Neural Information Processing Systems

Our work "establishes interpretations of SGD and Adam-family optimizers from a Bayesian filtering perspective" (R3). It is "the first to demonstrate how viewing optimization as Bayesian inference requires modeling temporal dynamics" AdamW" (R4), and therefore explains the excellent performance of these SOTA methods. In the ideal case you shouldn't use a factorised model, and 77-81 aren't trying to motivate a factorised model. Also, see "Conclusions" above for non-factorised future Khan et al. 2018), but we agree that its improvement is an important avenue for future research. Minor 1. Agreed, but a few people get very confused on this point.