Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

Open in new window