Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Neural Information Processing Systems
Several challenges make it difficult for sparse neural networks to compete with dense models. First, setting a large fraction of weights to zero impairs forward and gradient signal propagation. Second, sparse studies often need to test multiple sparsity levels, while also introducing new hyperparameters (HPs), leading to prohibitive tuning costs. Indeed, the standard practice is to re-use the learning HPs originally crafted for dense models. Unfortunately, we show sparse and dense networks do not share the same optimal HPs. Without stable dynamics and effective training recipes, it is costly to test sparsity at scale, which is key to surpassing dense networks and making the business case for sparsity acceleration in hardware. A holistic approach is needed to tackle these challenges, and we propose SμPar as one such approach.
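The abstract does not spell out the mechanics, but SμPar builds on maximal update parameterization (μP), which keeps optimal HPs stable as model width changes. Below is a minimal sketch of the underlying intuition, assuming sparsity is treated as a reduction in a layer's effective fan-in; the helper name sparse_mup_scales and the exact scaling rules are illustrative assumptions, not the paper's definitions.

    import math

    def sparse_mup_scales(fan_in: int, density: float, base_lr: float = 1e-3):
        """Illustrative density-aware scaling in the spirit of muP.

        Hypothetical helper: treats an unstructured-sparse layer as having an
        effective fan-in of density * fan_in, then applies muP-style rules so
        that activation and update scales stay stable as density changes.
        The paper's actual SuPar rules may differ.
        """
        eff_fan_in = max(1, int(density * fan_in))
        init_std = 1.0 / math.sqrt(eff_fan_in)  # keep forward activations O(1)
        lr = base_lr / eff_fan_in               # muP-style 1/fan_in learning rate
        return init_std, lr

    # The same base_lr yields different per-layer values at different densities,
    # which illustrates why HPs tuned for dense models need not transfer to sparse ones.
    for density in (1.0, 0.5, 0.1):
        std, lr = sparse_mup_scales(fan_in=1024, density=density)
        print(f"density={density:.1f}  init_std={std:.4f}  lr={lr:.2e}")

Keeping the role of the effective fan-in fixed in this way is what would let a single tuned base_lr transfer across sparsity levels, mirroring how μP transfers HPs across widths.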