A unified theory of adaptive stochastic gradient descent as Bayesian filtering
