
 Optimization




A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

Neural Information Processing Systems

As we described in Section 3.2.2 of the main paper, we realize mask training via binarization in … In practice, we control the sparsity in a local way, i.e., all the weight matrices … We have introduced the PoE method in Section 3.3. Work was done while Yuanxin Liu was a graduate student at IIE, CAS. We utilize eight datasets from three NLU tasks. Tab. 2 shows the distribution of examples over classes. We use two types of GPUs, i.e., Nvidia V100 and TITAN RTX.
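A minimal sketch of what controlling the sparsity "in a local way" can look like, assuming that the real-valued mask scores of each weight matrix are binarized by keeping that matrix's own highest-scoring entries, so every matrix ends up with the same sparsity. The top-k rule, the function names, and the toy matrix names are illustrative assumptions, not the paper's code; the gradient path through the binarization (e.g., a straight-through estimator) is not shown.

```python
def binarize_local(scores, sparsity):
    """Binarize one matrix of real-valued mask scores.

    Keeps the top (1 - sparsity) fraction of entries of THIS matrix only,
    so the sparsity budget is enforced locally (per matrix), not globally.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    flat = [s for row in scores for s in row]
    n_keep = int(round(len(flat) * (1.0 - sparsity)))
    # Indices of the n_keep largest scores within this matrix.
    ranked = sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [[1 if r * n_cols + c in keep else 0 for c in range(n_cols)]
            for r in range(n_rows)]


def apply_local_masks(weights, scores, sparsity):
    """Mask every weight matrix with its own binarized score matrix."""
    masked = {}
    for name, w in weights.items():
        mask = binarize_local(scores[name], sparsity)
        masked[name] = [[wij * mij for wij, mij in zip(w_row, m_row)]
                        for w_row, m_row in zip(w, mask)]
    return masked


# Hypothetical toy matrices; names such as "W_q" are illustrative only.
weights = {"W_q": [[0.5, -1.0], [2.0, 0.1]],
           "W_v": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
scores = {"W_q": [[0.9, 0.1], [0.3, 0.8]],
          "W_v": [[0.1, 0.2, 0.3], [0.6, 0.5, 0.4]]}

print(binarize_local(scores["W_q"], 0.5))              # [[1, 0], [0, 1]]
print(apply_local_masks(weights, scores, 0.5)["W_v"])  # [[0.0, 0.0, 0.0], [4.0, 5.0, 6.0]]
```

Because the budget is computed per matrix, each matrix keeps exactly half of its entries here; a global variant would instead rank all scores across matrices against a single threshold, which can prune some matrices far more heavily than others.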








Advancing Model Pruning via Bi-level Optimization

Neural Information Processing Systems

As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential to improve the generalization ability of these models. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method for successfully finding 'winning tickets'. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually …
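To make the cost argument concrete, here is a minimal sketch of IMP, assuming the common variant that prunes a fixed fraction p of the surviving weights by magnitude in each round and rewinds to the initial weights before retraining. Each round costs one full training run, and reaching sparsity s takes ⌈log(1 − s) / log(1 − p)⌉ rounds, so with p = 0.2 the count climbs from 4 rounds at 50% sparsity to 11 at 90% and 21 at 99%. The `train` callable and all function names are hypothetical placeholders, and the weights are a flat list rather than per-layer tensors.

```python
import math


def imp_rounds(target_sparsity, rate):
    """Rounds needed if each round prunes `rate` of the surviving weights.

    After k rounds the remaining density is (1 - rate) ** k, so reaching
    `target_sparsity` needs ceil(log(1 - s) / log(1 - p)) rounds.
    """
    return math.ceil(math.log(1.0 - target_sparsity) / math.log(1.0 - rate))


def prune_smallest(weights, mask, rate):
    """Zero out the `rate` fraction of surviving weights with the smallest |w|."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(alive) * rate)
    alive.sort(key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in alive[:n_prune]:
        new_mask[i] = 0
    return new_mask


def iterative_magnitude_pruning(init_weights, train, rate, target_sparsity):
    """Train, prune, rewind to init, repeat. `train` is a caller-supplied placeholder."""
    mask = [1] * len(init_weights)
    for _ in range(imp_rounds(target_sparsity, rate)):
        trained = train(init_weights, mask)  # one full training run per round
        mask = prune_smallest(trained, mask, rate)
        # Rewind: the next round starts again from init_weights under the new mask.
    return mask


def fake_train(weights, mask):
    """Toy stand-in for training: it only applies the mask (no optimization)."""
    return [w * m for w, m in zip(weights, mask)]


print(imp_rounds(0.5, 0.2), imp_rounds(0.9, 0.2), imp_rounds(0.99, 0.2))  # 4 11 21
print(iterative_magnitude_pruning([float(i) for i in range(1, 11)], fake_train, 0.5, 0.8))
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

The loop makes the trade-off visible: every extra round of sparsity is another full call to `train`, which is the overhead that one-shot methods try to avoid by pruning once.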