
Neural Information Processing Systems 

In Table 2 in Section 3, we show the highest sparsities for which IMP subnetwork performance is within one standard deviation of the unpruned BERT model on each task. Our best numbers in Table 5 are in line with those reported by HuggingFace [49].

Table 8: Transfer performance of MLM subnetworks $f(x; m_{\mathrm{IMP}}^{\mathrm{MLM}} \odot \theta_0)$ obtained from different numbers of training examples.
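The selection rule described for Table 2 (the highest sparsity at which the IMP subnetwork stays within one standard deviation of the unpruned model) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the one-sided reading of "within one standard deviation" (a subnetwork that scores above the full model also counts) are assumptions, and the numbers are made up.

```python
def highest_matching_sparsity(subnetwork_scores, full_mean, full_std):
    """Return the largest sparsity whose score is at least full_mean - full_std.

    subnetwork_scores maps sparsity (fraction of weights pruned) to the
    subnetwork's task score. Returns None if no sparsity level qualifies.
    Hypothetical helper: the name and signature are not from the paper.
    """
    threshold = full_mean - full_std
    matching = [s for s, score in subnetwork_scores.items() if score >= threshold]
    return max(matching) if matching else None


# Made-up scores for illustration only; these are not results from the paper.
scores = {0.1: 90.0, 0.3: 89.0, 0.5: 88.5, 0.7: 80.0}
print(highest_matching_sparsity(scores, full_mean=90.0, full_std=2.0))
```

Taking the maximum over all qualifying sparsities, rather than stopping at the first failure, is a design choice of this sketch; if performance is not monotone in sparsity the two rules can differ.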
