Navigating Extremes: Dynamic Sparsity in Large Output Spaces
Neural Information Processing Systems
In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for producing efficient models. In principle, DST allows for a more memory-efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice: because sparse matrix multiplication is far less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights.
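To illustrate the point about simulated sparsity, here is a minimal sketch (the class and parameter names are illustrative, not from the paper) of how masking-based DST layers are commonly written: a dense weight tensor is kept alongside a binary mask, so the nominally sparse layer still pays dense storage and dense-matmul cost on the GPU.

```python
import torch

class MaskedLinear(torch.nn.Module):
    """Hypothetical masking-based 'sparse' layer, as used to simulate DST."""

    def __init__(self, in_features: int, out_features: int, sparsity: float = 0.9):
        super().__init__()
        # The full dense weight tensor is allocated and trained,
        # regardless of the nominal sparsity level.
        self.weight = torch.nn.Parameter(
            torch.randn(out_features, in_features) * 0.01
        )
        # Binary mask: roughly `sparsity` fraction of entries are inactive.
        mask = (torch.rand(out_features, in_features) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dense matmul over (weight * mask): the zeroed entries are still
        # stored and multiplied, so neither memory nor compute is saved.
        return x @ (self.weight * self.mask).t()
```

Updating `mask` between steps (the prune-and-regrow move of DST) changes which entries are active, but the dense tensor persists for the entire training run, which is why the theoretical memory savings do not materialize.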