Appendix of DynaBERT: Dynamic BERT with Adaptive Width and Depth
B.1 Description of Data sets in the GLUE benchmark

The GLUE benchmark [11] is a collection of diverse natural language understanding tasks, including textual entailment (RTE and MNLI), question answering (QNLI), similarity and paraphrase (MRPC, QQP, STS-B), sentiment analysis (SST-2), and linguistic acceptability (CoLA). For MNLI, we use both the matched (MNLI-m) and mismatched (MNLI-mm) sections. We do not experiment on the Winograd Schema task (WNLI) because even a majority-class baseline outperforms many methods on it. The same hyperparameters as in Table 1 are used for DynaRoBERTa. The batch size is 12 throughout the training process.
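As a concrete illustration (not part of the original paper), the following Python sketch shows one way to load this set of GLUE tasks, including both MNLI evaluation sections and excluding WNLI. The use of the HuggingFace datasets library and the task name strings are assumptions here; the paper does not prescribe a data-loading pipeline.

# Minimal sketch (assumed tooling, not the paper's own pipeline):
# load the GLUE tasks used in the experiments via HuggingFace `datasets`.
from datasets import load_dataset

# WNLI is excluded, since even a majority-class baseline
# outperforms many methods on it.
TASKS = ["rte", "mnli", "qnli", "mrpc", "qqp", "stsb", "sst2", "cola"]

for task in TASKS:
    data = load_dataset("glue", task)
    if task == "mnli":
        # MNLI provides both matched (MNLI-m) and mismatched (MNLI-mm)
        # evaluation sections.
        dev_matched = data["validation_matched"]
        dev_mismatched = data["validation_mismatched"]
    else:
        dev = data["validation"]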