Dynamic Sparsity Is Channel-Level Sparsity Learner

Dec-26-2025, 21:53:30 GMT–Neural Information Processing Systems

Sparse training has attracted growing interest in machine learning because of its potential savings in both the entire training process and inference. Dynamic sparse training (DST), a leading approach, can train deep neural networks at high sparsity from scratch to match the performance of their dense counterparts. However, most, if not all, prior DST work demonstrates its effectiveness on unstructured sparsity with highly irregular sparse patterns, which receive limited support on common hardware. This limitation hinders the use of DST in practice. In this paper, we propose Channel-aware dynamic sparse (Chase), which for the first time seamlessly translates the promise of unstructured dynamic sparsity into GPU-friendly channel-level sparsity (not fine-grained N:M or group sparsity) during one end-to-end training process, without any ad-hoc operations. The resulting small sparse networks can be directly accelerated by commodity hardware, without any specialized sparsity-aware hardware accelerators. This appealing outcome is partially motivated by a hidden phenomenon of dynamic sparsity: off-the-shelf unstructured DST implicitly involves biased parameter reallocation across channels, with a large fraction of channels (up to 60%) being sparser than others.
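The channel-level observation in the abstract can be illustrated with a small NumPy sketch. This is not the Chase algorithm, whose details the abstract does not give. It only shows, under stated assumptions, how one might measure per-channel density in an unstructured mask and drop the sparsest output channels to obtain a smaller dense tensor. The function names `channel_density` and `prune_sparsest_channels`, the layer shape, and the per-channel keep probabilities are all hypothetical.

```python
import numpy as np


def channel_density(mask):
    """Fraction of nonzero entries in each output channel.

    mask: binary array of shape (out_channels, in_channels, kH, kW).
    """
    out_channels = mask.shape[0]
    return mask.reshape(out_channels, -1).mean(axis=1)


def prune_sparsest_channels(weight, mask, fraction):
    """Drop the given fraction of output channels with the lowest density.

    Returns the smaller weight tensor (mask already applied) and the
    sorted indices of the channels that were kept.
    """
    density = channel_density(mask)
    n_drop = int(fraction * len(density))
    order = np.argsort(density, kind="stable")  # sparsest channels first
    keep = np.sort(order[n_drop:])
    return (weight * mask)[keep], keep


rng = np.random.default_rng(0)

# Hypothetical conv layer: 8 output channels, 4 input channels, 3x3 kernels.
weight = rng.standard_normal((8, 4, 3, 3))

# Hypothetical unstructured mask with uneven density across channels,
# mimicking the biased parameter reallocation described in the abstract.
keep_prob = np.array([0.05, 0.6, 0.1, 0.7, 0.05, 0.5, 0.1, 0.8])
mask = (rng.random(weight.shape) < keep_prob[:, None, None, None]).astype(weight.dtype)

small_weight, kept = prune_sparsest_channels(weight, mask, fraction=0.5)
print("per-channel density:", channel_density(mask).round(2))
print("kept channels:", kept, "new shape:", small_weight.shape)
```

Because whole output channels are removed, the result is an ordinary smaller dense tensor, which is why channel-level sparsity needs no sparsity-aware kernels. A real network would also have to shrink the matching input channels of the next layer, which this sketch omits.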

channel-level sparsity learner, dynamic sparsity, sparsity

Country:
- Asia > China > Tianjin Province > Tianjin (0.07)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)