Modular Networks: Learning to Decompose Neural Computation

Louis Kirsch, Julius Kunze, David Barber

Neural Information Processing Systems 

Scaling model capacity has been vital to the success of deep learning. For a typical network, however, the necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition and the modules are learned end-to-end.
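To make the idea of conditional computation concrete, below is a minimal PyTorch sketch in which a controller scores a pool of modules and routes each input through exactly one of them, so only part of the network's parameters are active per example. This is an illustrative sketch under assumed names (`ModularLayer`, `controller`, `num_modules` are all hypothetical), not the paper's training algorithm; in particular, the hard `argmax` used here provides no gradient signal for the controller, which is why the paper learns the decomposition with a dedicated end-to-end training procedure instead.

```python
import torch
import torch.nn as nn

class ModularLayer(nn.Module):
    """Hypothetical sketch: a pool of small modules plus a controller
    that selects one module per input example (conditional computation)."""

    def __init__(self, in_dim, out_dim, num_modules=4):
        super().__init__()
        self.module_pool = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
            for _ in range(num_modules)
        )
        # Controller scores each module given the input.
        self.controller = nn.Linear(in_dim, num_modules)

    def forward(self, x):
        logits = self.controller(x)      # (batch, num_modules)
        choice = logits.argmax(dim=-1)   # hard per-example selection;
        # NOTE: argmax is non-differentiable, so this alone cannot train
        # the controller -- the paper's learning procedure addresses this.
        # For clarity we evaluate every module and gather the chosen outputs;
        # an efficient implementation would run only the selected modules.
        outs = torch.stack([m(x) for m in self.module_pool], dim=1)
        idx = choice.view(-1, 1, 1).expand(-1, 1, outs.size(-1))
        return outs.gather(1, idx).squeeze(1)

layer = ModularLayer(in_dim=16, out_dim=8, num_modules=4)
y = layer(torch.randn(32, 16))
print(y.shape)  # torch.Size([32, 8])
```

The capacity benefit comes from the fact that the parameter count grows with `num_modules`, while the per-example compute cost is that of a single module (in an implementation that runs only the selected module).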