Powerpropagation: A sparsity inducing weight reparameterisation
–Neural Information Processing Systems
The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient based training on model sparsity. Inthis work, we introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a "rich get richer" dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained in this manner exhibit similar performance, but have a distributionwith markedly higher density at zero, allowing more parameters to be pruned safely.
Neural Information Processing Systems
Jan-19-2025, 13:09:55 GMT
- Technology: