Optimization
A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
As we described in Section 3.2.2 of the main paper, we realize mask training via binarization in ... In practice, we control the sparsity in a local way, i.e., all the weight matrices ... We have introduced the PoE method in Section 3.3. Work was done when Yuanxin Liu was a graduate student of IIE, CAS. We utilize eight datasets from three NLU tasks. Tab. 2 shows the distribution of examples over classes. We use two types of GPU, i.e., Nvidia V100 and TITAN RTX.
Advancing Model Pruning via Bi-level Optimization
As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential of improving generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method to successfully find 'winning tickets'. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed but these schemes are usually