Review for NeurIPS paper: DynaBERT: Dynamic BERT with Adaptive Width and Depth
Venue: Neural Information Processing Systems
Additional Feedback:

Random things:
- Table 1 is a bit overloaded and difficult to parse. It is also unclear which rows and columns correspond to m_w vs. m_d. Could this be presented differently, with lines corresponding to the base models?

Related Work:
There is a little discussion in the first half of paragraph 2 of the introduction, but no comprehensive treatment of how this work sits in the context of existing work. It would be important to include work on the capacity of large language models, what they can and cannot do, and how more layers/parameters help language models in general: Jawahar et al. (2019), "What Does BERT Learn about the Structure of Language?"; Jozefowicz et al. (2016), "Exploring the Limits of Language Modeling"; Melis et al. (2017), "On the State of the Art of Evaluation in Neural Language Models"; Subramani et al. (2019), "Can Unconditional Language Models Recover Arbitrary Sentences?".