Appendix of DynaBERT: Dynamic BERT with Adaptive Width and Depth
B.1 Description of Data sets in the GLUE benchmark

The GLUE benchmark [11] is a collection of diverse natural language understanding tasks, including textual entailment (RTE and MNLI), question answering (QNLI), similarity and paraphrase (MRPC, QQP, STS-B), sentiment analysis (SST-2), and linguistic acceptability (CoLA). For MNLI, we use both the matched (MNLI-m) and mismatched (MNLI-mm) sections. We do not experiment on the Winograd Schema task (WNLI) because even a majority-class baseline outperforms many methods on it. The same hyperparameters as in Table 1 are used for DynaRoBERTa. The batch size is 12 throughout the training process.
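As a concrete illustration (not part of the original paper), the following Python sketch shows one way to load this set of GLUE tasks, including both MNLI evaluation sections and excluding WNLI. The use of the HuggingFace datasets library and the task name strings are assumptions here; the paper does not prescribe a data-loading pipeline.

# Minimal sketch (assumed tooling, not the paper's own pipeline):
# load the GLUE tasks used in the experiments via HuggingFace `datasets`.
from datasets import load_dataset

# WNLI is excluded, since even a majority-class baseline
# outperforms many methods on it.
TASKS = ["rte", "mnli", "qnli", "mrpc", "qqp", "stsb", "sst2", "cola"]

for task in TASKS:
    data = load_dataset("glue", task)
    if task == "mnli":
        # MNLI provides both matched (MNLI-m) and mismatched (MNLI-mm)
        # evaluation sections.
        dev_matched = data["validation_matched"]
        dev_mismatched = data["validation_mismatched"]
    else:
        dev = data["validation"]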