Navigating Extremes: Dynamic Sparsity in Large Output Spaces

May-27-2025, 18:11:44 GMT–Neural Information Processing Systems

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a much more memory efficient training process,as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, mostimplementations simulate sparsity by masking weights. In this paper, we leverage recent advances in semi-structured sparse training to apply DST in the domain of classificationwith large output spaces, where memory-efficiency is paramount.

dynamic sparsity, navigating extreme, output space, (3 more...)

Neural Information Processing Systems

May-27-2025, 18:11:44 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.59)