Navigating Extremes: Dynamic Sparsity in Large Output Spaces
–Neural Information Processing Systems
In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights.
Neural Information Processing Systems
Oct-10-2025, 17:44:56 GMT
- Country:
- North America
- United States (0.67)
- Canada > Alberta
- Europe
- Asia > China
- Tianjin Province > Tianjin (0.04)
- North America
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Research Report
- Industry:
- Technology: