Navigating Extremes: Dynamic Sparsity in Large Output Spaces

Oct-10-2025, 17:44:56 GMT–Neural Information Processing Systems

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory-efficient training process, as it maintains sparsity throughout the entire training run. However, current DST implementations fail to capitalize on this advantage in practice. Because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights.
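To make the last point concrete, the following is a minimal pure-Python sketch of what "simulating sparsity by masking weights" means, and why it saves no memory. It is not the paper's implementation: the function names (`masked_dense_layer`, `to_coo`, `sparse_layer`), the layer sizes, and the coordinate-list storage are all assumptions chosen for illustration.

```python
import random

def masked_dense_layer(weights, mask, x):
    """Simulated sparsity: every weight is multiplied by its 0/1 mask entry.

    All rows * cols weights (plus the mask) stay in memory and all of them
    take part in the product, even though most contribute zero.
    """
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]

def to_coo(weights, mask):
    """Keep only the unmasked weights as (row, col, value) triples."""
    return [
        (i, j, w)
        for i, (w_row, m_row) in enumerate(zip(weights, mask))
        for j, (w, m) in enumerate(zip(w_row, m_row))
        if m
    ]

def sparse_layer(entries, n_rows, x):
    """Truly sparse product: touches only the stored non-zero weights."""
    out = [0.0] * n_rows
    for i, j, w in entries:
        out[i] += w * x[j]
    return out

# Hypothetical toy sizes: an 8 x 16 layer that keeps 13 of its 128 weights.
rng = random.Random(0)
rows, cols, kept = 8, 16, 13
weights = [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
kept_positions = set(rng.sample(range(rows * cols), kept))
mask = [
    [1 if i * cols + j in kept_positions else 0 for j in range(cols)]
    for i in range(rows)
]
x = [rng.uniform(-1, 1) for _ in range(cols)]

dense_out = masked_dense_layer(weights, mask, x)
entries = to_coo(weights, mask)
sparse_out = sparse_layer(entries, rows, x)

# Values held in memory by each representation.
stored_masked = 2 * rows * cols   # dense weights plus dense mask
stored_sparse = 3 * len(entries)  # row index, column index, value per entry
```

Both layers compute the same output, but the masked version stores and multiplies every entry of the dense matrix, so its memory footprint does not shrink with sparsity. The sparse version stores only the surviving weights, at the cost of an irregular access pattern that is harder to run efficiently on GPUs.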




Country:
- North America
  - United States (0.67)
  - Canada > Alberta
    - Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
- Europe
  - United Kingdom > England
    - Somerset > Bath (0.04)
  - Finland > Uusimaa
    - Helsinki (0.04)
- Asia > China
  - Tianjin Province > Tianjin (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology (0.93)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology
  - Communications (0.94)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
