L2M: Practical posterior Laplace approximation with optimization-driven second moment estimation

Perone, Christian S., Silveira, Roberto Pereira, Paula, Thomas

arXiv.org Machine Learning 

However, instead of computing the curvature matrix, we show that, under some regularity conditions, the Laplace approximation can be easily constructed using the gradient second moment. This quantity is already estimated by many exponential moving average variants of Adagrad such as Adam and RMSprop, but is traditionally discarded after training. We show that our method (L2M) does not require changes in models or optimization, can be implemented in a few lines of code to yield reasonable results, and it does not require any extra computational steps besides what is already being computed by optimizers, without introducing any new hyperparameter. We hope our method can open new research directions on using quantities already computed by optimizers for [...]

Our contributions in this work are:

- We show that under some regularity conditions, a diagonal Laplace approximation can be constructed without computing anything besides what is already being computed by widely used optimizers;
- We qualitatively compare this approximation with methods such as deep ensembles (Lakshminarayanan et al., 2017), MC Dropout (Gal & Ghahramani, 2016), Hamiltonian Monte Carlo (HMC) (Cobb & Jalaian, 2020), among others;
- We also show that our approximation is orthogonal to methods such as ensembling (Lakshminarayanan et al., 2017) and does not require changing training procedures, estimating new quantities, or adding new hyperparameters.
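To make the idea of reusing the optimizer's second-moment estimate concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It trains a toy linear model with a hand-written Adam loop, keeps the bias-corrected second-moment estimate that is normally discarded, and treats it as the diagonal of a Gaussian posterior precision. The helper names (`adam_fit`, `diagonal_laplace`, `sample_weights`, `make_toy_data`), the toy data, the scaling of the precision by the dataset size, and the small `jitter` term are all assumptions of this sketch and are not taken from the text above.

```python
import math
import random


def per_example_grad(theta, x, y):
    """Gradient of 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    w, b = theta
    residual = w * x + b - y
    return [residual * x, residual]


def adam_fit(data, steps=3000, lr=0.01, beta1=0.9, beta2=0.999,
             eps=1e-8, seed=0):
    """Standard Adam on single-example gradients.

    Returns the trained weights and the bias-corrected second-moment
    estimate v_hat, which is normally discarded after training.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    m = [0.0, 0.0]  # first-moment (mean) estimate
    v = [0.0, 0.0]  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        x, y = rng.choice(data)
        grad = per_example_grad(theta, x, y)
        for i, g in enumerate(grad):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat_i = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat_i) + eps)
    v_hat = [vi / (1 - beta2 ** steps) for vi in v]
    return theta, v_hat


def diagonal_laplace(theta, v_hat, n_data, jitter=1e-8):
    """Diagonal Gaussian centred on the trained weights.

    ASSUMPTION of this sketch: the posterior precision is approximated by
    n_data * v_hat (second moment as a stand-in for the curvature diagonal).
    `jitter` only guards against division by zero.
    """
    variance = [1.0 / (n_data * vi + jitter) for vi in v_hat]
    return list(theta), variance


def sample_weights(mean, variance, n_samples, seed=0):
    """Draw independent weight samples from the diagonal Gaussian."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, math.sqrt(s2)) for mu, s2 in zip(mean, variance)]
            for _ in range(n_samples)]


def make_toy_data(n=200, seed=1):
    """Hypothetical data: y = 2x + 0.5 plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 0.5 + rng.gauss(0.0, 0.1)))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    theta, v_hat = adam_fit(data)
    mean, variance = diagonal_laplace(theta, v_hat, n_data=len(data))
    for w, b in sample_weights(mean, variance, n_samples=3):
        print(f"w={w:.3f}  b={b:.3f}")
```

The only step beyond ordinary training is `diagonal_laplace`, which reads a quantity the optimizer already maintains. In a framework such as PyTorch, the analogous quantity would presumably be read from the Adam optimizer state (stored there as `exp_avg_sq`) rather than from a hand-written loop.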