A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

Aug-16-2025, 04:42:05 GMT–Neural Information Processing Systems

As we described in Section 3.2.2 of the main paper, we realize mask training via binarization in In practice, we control the sparsity in a local way, i.e., all the weight matrices We have introduced the PoE method in Section 3.3. Work was done when Y uanxin Liu was a graduate student of IIE, CAS. We utilize eight datasets from three NLU tasks. Tab. 2 shows the distribution of examples over classes. We use two types of GPU, i.e., Nvidia V100 and TIT AN RTX.

machine learning, mask training, natural language, (19 more...)

Neural Information Processing Systems

Aug-16-2025, 04:42:05 GMT

Conferences PDF

Add feedback

Country:
- Asia > China (0.04)

Genre:
- Research Report (0.47)

Industry:
- Information Technology (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (0.47)
  - Representation & Reasoning > Optimization (0.46)

Duplicate Docs Excel Report

Title
BMoreExperimentalSetups

Similar Docs Excel Report more

Title	Similarity	Source
None found