A unified theory of adaptive stochastic gradient descent as Bayesian filtering