A Further Results on the Existence of Matching in BERT
–Neural Information Processing Systems
In Table 2 in Section 3, we show the highest sparsities for which IMP subnetwork performance is within one standard deviation of the unpruned BERT model on each task. As broader context for the relationship between sparsity and accuracy, Figure 11 shows the performance of IMP subnetworks across all sparsities on each task. In Table 6, we report both common evaluation metrics for the MNLI, QQP, STS-B, and MRPC datasets. Except for STS-B (50% under Pearson vs. 40% under Spearman), winning ticket sparsities are the same under both metrics on these datasets. In Figure 9, we study IMP on networks trained with a multi-task objective.
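The selection rule behind Table 2 (the highest sparsity whose subnetwork performance stays within one standard deviation of the unpruned model) can be illustrated with a minimal sketch. This is not the authors' code. The function name `highest_matching_sparsity`, the reading of "within one standard deviation" as "mean subnetwork score at least the unpruned mean minus one sample standard deviation", and all scores below are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def highest_matching_sparsity(full_scores, subnetwork_scores):
    """Return the largest sparsity whose mean score is within one
    standard deviation of the unpruned model, or None if none qualifies.

    full_scores: scores of the unpruned model over several runs.
    subnetwork_scores: dict mapping sparsity (percent) -> list of scores.

    Assumption: "within one standard deviation" is read as
    mean(subnetwork) >= mean(full) - stdev(full).
    """
    threshold = mean(full_scores) - stdev(full_scores)
    matching = [
        sparsity
        for sparsity, scores in subnetwork_scores.items()
        if mean(scores) >= threshold
    ]
    return max(matching) if matching else None


# Hypothetical scores, not taken from the paper.
full = [82.0, 83.0, 84.0]
subnetworks = {
    10: [83.5, 83.0],
    50: [82.5, 83.1],
    70: [80.0, 81.0],
}
print(highest_matching_sparsity(full, subnetworks))
```

Under this reading, the same function applied once per metric (for example Pearson and Spearman on STS-B) may return different sparsities, which is the kind of discrepancy Table 6 reports.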