A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta
