Navigating Extremes: Dynamic Sparsity in Large Output Spaces

Neural Information Processing Systems 

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for producing efficient models. In principle, DST allows for a more memory-efficient training process, as it maintains sparsity throughout the entire training run. In practice, however, current DST implementations fail to capitalize on this: because sparse matrix multiplication is much less efficient than dense matrix multiplication on GPUs, most implementations simulate sparsity by masking weights.
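
To illustrate the masking approach the abstract refers to, below is a minimal sketch of a masked linear layer in PyTorch. This is not the paper's implementation, and all names (MaskedLinear, sparsity) are illustrative: the point is that the weight tensor stays dense and a binary mask zeroes out the "pruned" entries on the fly, so the GPU still executes a full dense matrix multiplication and neither compute nor memory is actually saved.

import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Dense layer with simulated sparsity via a binary weight mask."""

    def __init__(self, in_features: int, out_features: int, sparsity: float = 0.9):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # Binary mask with roughly (1 - sparsity) of entries active.
        # DST methods would periodically prune and regrow entries of this mask;
        # here it is fixed for simplicity.
        mask = (torch.rand(out_features, in_features) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The mask is applied on the fly: the multiplication itself is dense,
        # so the nominal sparsity yields no speed or memory benefit.
        return x @ (self.weight * self.mask).t()

layer = MaskedLinear(1024, 512, sparsity=0.9)
out = layer(torch.randn(8, 1024))
print(out.shape)  # torch.Size([8, 512])

Note that self.weight is stored fully dense and gradients are computed for every entry, which is exactly why masked implementations do not realize the memory savings that DST promises in principle.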
