Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Neural Information Processing Systems
Several challenges make it difficult for sparse neural networks to compete with dense models. First, setting a large fraction of weights to zero impairs forward and gradient signal propagation. Second, sparse studies often need to test multiple sparsity levels, while also introducing new hyperparameters (HPs), leading to prohibitive tuning costs. Indeed, the standard practice is to re-use the learning HPs originally crafted for dense models. Unfortunately, we show sparse and dense networks do not share the same optimal HPs. Without stable dynamics and effective training recipes, it is costly to test sparsity at scale, which is key to surpassing dense networks and making the business case for sparsity acceleration in hardware. A holistic approach is needed to tackle these challenges, and we propose SμPar as one such approach.
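The abstract does not spell out the mechanics, but SμPar builds on maximal update parameterization (μP), which keeps optimal HPs stable as model width changes. Below is a minimal sketch of the underlying intuition, assuming sparsity is treated as a reduction in a layer's effective fan-in; the helper name sparse_mup_scales and the exact scaling rules are illustrative assumptions, not the paper's definitions.

    import math

    def sparse_mup_scales(fan_in: int, density: float, base_lr: float = 1e-3):
        """Illustrative density-aware scaling in the spirit of muP.

        Hypothetical helper: treats an unstructured-sparse layer as having an
        effective fan-in of density * fan_in, then applies muP-style rules so
        that activation and update scales stay stable as density changes.
        The paper's actual SuPar rules may differ.
        """
        eff_fan_in = max(1, int(density * fan_in))
        init_std = 1.0 / math.sqrt(eff_fan_in)  # keep forward activations O(1)
        lr = base_lr / eff_fan_in               # muP-style 1/fan_in learning rate
        return init_std, lr

    # The same base_lr yields different per-layer values at different densities,
    # which illustrates why HPs tuned for dense models need not transfer to sparse ones.
    for density in (1.0, 0.5, 0.1):
        std, lr = sparse_mup_scales(fan_in=1024, density=density)
        print(f"density={density:.1f}  init_std={std:.4f}  lr={lr:.2e}")

Keeping the role of the effective fan-in fixed in this way is what would let a single tuned base_lr transfer across sparsity levels, mirroring how μP transfers HPs across widths.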