Reviews: Splitting Steepest Descent for Growing Neural Architectures

Neural Information Processing Systems 

The paper proposes an interesting technique for training neural networks while also learning part of the architecture, by splitting neurons in a principled way. More precisely, the algorithm defines a notion of steepest descent in the space of distributions over neuron weights, equipped with the L-infinity Wasserstein distance. The resulting algorithm recovers the usual steepest descent direction as long as such a descent direction exists; but if a local minimum is reached and further progress could be made locally by duplicating a neuron (or making several copies of it) and decoupling the copies, then the algorithm finds a locally optimal split. The paper is well written and novel, with clear and simple theory. The idea is simple and elegant.
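
To make the splitting step concrete, here is a minimal toy sketch, not the authors' implementation. It assumes the split criterion can be expressed through a per-neuron "splitting matrix" S(theta) = E_x[dPhi/dsigma * Hessian_theta sigma(theta, x)]: a neuron is split only when the smallest eigenvalue of S is negative, into two copies at theta +/- eps * v with half the weight each, where v is the corresponding eigenvector. The RBF neuron, the squared loss, the grid data, and all function names (`neuron`, `splitting_matrix`, `split_neuron`) are hypothetical choices made for illustration.

```python
import numpy as np

def neuron(theta, X):
    """Toy RBF neuron (an assumption): sigma(theta, x) = exp(-||x - theta||^2)."""
    return np.exp(-np.sum((X - theta) ** 2, axis=1))

def neuron_hessian(theta, X):
    """Analytic Hessian of the toy neuron w.r.t. theta, shape (n, d, d)."""
    diff = X - theta
    outer = diff[:, :, None] * diff[:, None, :]
    return neuron(theta, X)[:, None, None] * (4.0 * outer - 2.0 * np.eye(X.shape[1]))

def loss(thetas, weights, X, y):
    """Mean squared error of a weighted sum of neurons."""
    pred = sum(w * neuron(t, X) for w, t in zip(weights, thetas))
    return float(np.mean((pred - y) ** 2))

def splitting_matrix(theta, X, y):
    """Assumed criterion: S = E_x[dPhi/dsigma * d^2 sigma / d theta^2], squared loss Phi."""
    residual = 2.0 * (neuron(theta, X) - y)  # dPhi/dsigma for Phi = (sigma - y)^2
    return np.mean(residual[:, None, None] * neuron_hessian(theta, X), axis=0)

def split_neuron(theta, S, eps=0.1):
    """Return two offspring along the most negative eigen-direction, or None."""
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    if eigvals[0] >= 0:
        return None  # no split is expected to decrease the loss locally
    v = eigvecs[:, 0]
    return theta + eps * v, theta - eps * v

# Toy data: the target is the average of two bumps centered at (+0.5, 0) and (-0.5, 0).
grid = np.linspace(-4.0, 4.0, 81)
X = np.array([(a, b) for a in grid for b in grid])
c = np.array([0.5, 0.0])
y = 0.5 * neuron(c, X) + 0.5 * neuron(-c, X)

# A single neuron at the origin is stationary by symmetry and cannot improve by moving.
theta = np.zeros(2)
S = splitting_matrix(theta, X, y)
offspring = split_neuron(theta, S)

loss_before = loss([theta], [1.0], X, y)
loss_after = loss(offspring, [0.5, 0.5], X, y)
print("loss before split:", loss_before)
print("loss after split: ", loss_after)
```

In this toy problem the single neuron sits at a local minimum of the unsplit loss, yet the two half-weight copies, moved apart along the selected direction, fit the target better. That is the situation described above, where ordinary descent stalls and a split still makes progress.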