Get More at Once: Alternating Sparse Training with Gradient Correction

Jan-18-2025, 21:01:37 GMT–Neural Information Processing Systems

Recently, a new trend of exploring sparsity during training has emerged: parameters are removed during training, which improves both training and inference efficiency. This line of work primarily aims to obtain a single sparse model under a pre-defined, large sparsity ratio. The result is a static sparse inference model that cannot adjust or re-configure its computational complexity (i.e., inference structure, latency) after training to match varying and dynamic hardware resource availability in the real world. To enable such run-time or post-training network morphing, the concept of 'dynamic inference' or 'training-once-for-all' has been proposed: a single network consisting of multiple sub-nets is trained once, and each sub-net can perform the same inference function at a different computing complexity. However, the traditional dynamic inference training method requires a joint training scheme with multi-objective optimization, which incurs very large training overhead. In this work, for the first time, we propose a novel alternating sparse training (AST) scheme that trains multiple sparse sub-nets for dynamic inference without extra training cost compared to training a single sparse model from scratch.
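The idea of alternating between sub-nets that share one set of weights can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract does not describe how sub-nets are selected or how the gradient correction works, so the sketch assumes magnitude-based masks, a round-robin schedule, and plain SGD on a linear model, and it omits gradient correction entirely. The names `magnitude_mask`, `predict`, and `train_alternating` are hypothetical.

```python
import random

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask keeping the largest-magnitude weights.

    Assumption: sub-nets are defined by weight magnitude; the abstract
    does not say how the paper selects them.
    """
    n_keep = max(1, round(len(weights) * (1.0 - sparsity)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [1.0 if i in keep else 0.0 for i in range(len(weights))]

def predict(weights, mask, x):
    """Linear model output using only the weights active in the mask."""
    return sum(w * m * xi for w, m, xi in zip(weights, mask, x))

def train_alternating(data, dim, sparsities, steps=2000, lr=0.05, seed=0):
    """Train one shared weight vector, activating one sub-net per step.

    Each step performs a single forward/backward pass for one sparsity
    level, so the per-step cost matches training a single model.
    Gradient correction is NOT implemented here.
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for step in range(steps):
        # Round-robin over sparsity levels (hypothetical schedule).
        sparsity = sparsities[step % len(sparsities)]
        mask = magnitude_mask(weights, sparsity)
        x, y = data[rng.randrange(len(data))]
        err = predict(weights, mask, x) - y
        # SGD on squared error; only weights active in this sub-net move.
        for i in range(dim):
            weights[i] -= lr * err * x[i] * mask[i]
    return weights
```

After training, any of the sparsity levels can be chosen at inference time by recomputing the corresponding mask over the shared weights, which is the kind of run-time re-configuration the abstract describes.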

alternating sparse training, gradient correction, multiple sparse sub-net

Technology:
- Information Technology > Artificial Intelligence (0.40)