Modular Networks: Learning to Decompose Neural Computation
Louis Kirsch, Julius Kunze, David Barber
Neural Information Processing Systems
Scaling model capacity has been vital to the success of deep learning. For a typical network, however, the required compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number of parameters with only a relatively small increase in resources. We propose a training algorithm that flexibly chooses neural modules based on the data to be processed. Both the decomposition into modules and the modules themselves are learned end-to-end.
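The conditional-computation idea in the abstract can be illustrated with a minimal sketch: a layer holds several candidate modules, and a controller picks one module per input, so only a fraction of the parameters is exercised on any given example. This is a hypothetical toy in plain Python, not the authors' algorithm; the hand-coded sign-based controller and the scalar "modules" stand in for the learned controller and neural modules that the paper trains end-to-end.

```python
# Toy sketch of conditional computation with a modular layer.
# Assumption: modules are simple scaling functions and the controller
# is a fixed rule; in the paper both are learned jointly.

def make_module(scale):
    # Each module is a toy function; in the paper these are neural
    # modules whose parameters are trained end-to-end.
    return lambda x: [scale * v for v in x]

def controller(x, num_modules):
    # Hypothetical fixed controller: route on the sign of the mean.
    # The paper instead learns this decomposition from data.
    mean = sum(x) / len(x)
    return 0 if mean >= 0 else num_modules - 1

def modular_layer(x, modules):
    # Only the selected module executes for this input, so compute
    # stays roughly constant as the number of modules grows.
    idx = controller(x, len(modules))
    return idx, modules[idx](x)

modules = [make_module(2.0), make_module(-1.0)]
idx, out = modular_layer([1.0, 3.0], modules)  # mean > 0 -> module 0
```

The key property the sketch demonstrates is that adding more modules adds parameters without adding per-example compute, since exactly one module runs per input.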