b6af2c9703f203a2794be03d443af2e3-Supplemental.pdf
Neural Information Processing Systems
In Table 2 in Section 3, we show the highest sparsities for which IMP subnetwork performance is within one standard deviation of the unpruned BERT model on each task. Our best numbers in Table 5 are in line with those reported by HuggingFace [49].

Table 8: Transfer performance of MLM subnetworks $f(x; m_{\mathrm{IMP}}^{\mathrm{MLM}} \odot \theta_0)$ obtained from different numbers of training examples.
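The selection rule behind Table 2 (the highest sparsity at which the IMP subnetwork stays within one standard deviation of the unpruned model) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `highest_matching_sparsity`, the input layout, and the reading of "within one standard deviation" as "mean subnetwork score at least the full-model mean minus the full-model sample standard deviation" are all assumptions, and the numbers in the usage example are invented placeholders, not results from the paper.

```python
from statistics import mean, stdev


def highest_matching_sparsity(full_scores, sub_scores_by_sparsity):
    """Return the highest sparsity whose mean score is within one
    standard deviation of the unpruned model, or None if none qualifies.

    full_scores: scores of the unpruned model over several runs
        (hypothetical input; at least two runs are needed for stdev).
    sub_scores_by_sparsity: dict mapping sparsity (0..1) to a list of
        subnetwork scores at that sparsity (hypothetical layout).
    """
    # Assumed criterion: subnetwork mean >= full mean - full sample std.
    threshold = mean(full_scores) - stdev(full_scores)

    matching = [
        sparsity
        for sparsity, scores in sub_scores_by_sparsity.items()
        if mean(scores) >= threshold
    ]
    return max(matching) if matching else None


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    full = [0.80, 0.82, 0.84]
    subs = {
        0.5: [0.83, 0.81],
        0.6: [0.81, 0.80],
        0.7: [0.78, 0.79],
    }
    print(highest_matching_sparsity(full, subs))
```

Taking the maximum over all qualifying sparsities, rather than stopping at the first failure, is a design choice of this sketch; a stricter variant would require every lower sparsity level to qualify as well.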
Feb-9-2026, 23:15:17 GMT