DynaBERT: Dynamic BERT with Adaptive Width and Depth
Pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computationally and memory expensive. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on BERT compression usually compress the large BERT model to a fixed smaller size, and cannot fully satisfy the requirements of different edge devices with varying hardware capabilities. In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust its size and latency by selecting adaptive width and depth. The training process of DynaBERT involves first training a width-adaptive BERT and then allowing both adaptive width and depth, by distilling knowledge from the full-sized model to small sub-networks. Network rewiring is also used so that the more important attention heads and neurons are shared by more sub-networks. Comprehensive experiments under various efficiency constraints demonstrate that our proposed dynamic BERT (or RoBERTa) at its largest size has performance comparable to BERT-base (or RoBERTa-base), while at smaller widths and depths it consistently outperforms existing BERT compression methods.
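The width- and depth-adaptive idea can be illustrated with a small sketch. The helper below is a hypothetical illustration, not the paper's exact procedure: the function name, the multiplier grid, and the keep-the-first-fraction selection scheme are our assumptions. It shows how a width multiplier m_w and a depth multiplier m_d pick out a sub-network from the full model.

```python
import math

def sub_network_config(num_layers=12, num_heads=12, m_w=1.0, m_d=1.0):
    """Return (kept_layer_ids, kept_head_ids) for a (m_w, m_d) sub-network.

    Hypothetical selection scheme: assuming network rewiring has already
    sorted heads/neurons by importance, a width multiplier m_w keeps the
    first ceil(m_w * H) heads and a depth multiplier m_d keeps the first
    ceil(m_d * L) layers.
    """
    kept_layers = list(range(math.ceil(m_d * num_layers)))
    kept_heads = list(range(math.ceil(m_w * num_heads)))
    return kept_layers, kept_heads

# Enumerate an example grid of sub-networks trained jointly,
# e.g. m_w in {0.25, 0.5, 0.75, 1.0} and m_d in {0.5, 0.75, 1.0}:
grid = [(m_w, m_d) for m_w in (0.25, 0.5, 0.75, 1.0)
                   for m_d in (0.5, 0.75, 1.0)]

for m_w, m_d in grid:
    layers, heads = sub_network_config(m_w=m_w, m_d=m_d)
    # The largest setting (1.0, 1.0) recovers the full BERT-base model.
    print(f"m_w={m_w}, m_d={m_d}: {len(layers)} layers, {len(heads)} heads/layer")
```

Because every sub-network shares weights with the full model, one trained DynaBERT can serve many latency budgets by switching (m_w, m_d) at inference time instead of storing a separate compressed model per device.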
Lu Hou
Thanks for the suggestion. We will
General response: We thank all reviewers for their constructive comments. Below are our responses to common questions. ... BERT models; and (iii) is more environmentally friendly due to weight sharing.
Q1. "whether this approach can be adapted to work during the pre-training phase"
Q1. "paper quite dense and hard to read...rely on various complicated procedures", "if there is a": We will continue thinking about simplifying the method.
Q2. "Table 4, what exactly is 'fine-tuning'?": This is the 'fine-tuning' mentioned in Lines 138-139 in Section 2.2.
Review for NeurIPS paper: DynaBERT: Dynamic BERT with Adaptive Width and Depth
Additional Feedback: Random things:
- Table 1 is a bit overloaded and difficult to parse. Also, I'm not sure which row and column are m_w vs m_d. Can you present this differently, with lines corresponding to the base models?
Related Work: There is a little discussion in the first half of paragraph 2 of the introduction, but no comprehensive treatment of how your work sits in the context of existing work. It would be important to include work on the capacity of large language models and what they can and cannot do, and on how more layers/parameters help language models in general (Jawahar et al. 2019, "What does BERT learn about the structure of language?"; Jozefowicz et al. 2016, "Exploring the Limits of Language Modeling"; Melis et al. 2017, "On the State of the Art of Evaluation in Neural Language Models"; Subramani et al. 2019, "Can Unconditional Language Models Recover Arbitrary Sentences?").
Distilling Linguistic Context for Language Model Compression
Geondo Park, Gyeongman Kim, Eunho Yang
A computationally expensive and memory intensive neural network lies behind the recent success of language representation learning. Knowledge distillation, a major technique for deploying such a vast language model in resource-scarce environments, transfers the knowledge on individual word representations learned without restrictions. In this paper, inspired by the recent observations that language representations are relatively positioned and have more semantic knowledge as a whole, we present a new knowledge distillation objective for language representation learning that transfers the contextual knowledge via two types of relationships across representations: Word Relation and Layer Transforming Relation. Unlike other recent distillation techniques for the language models, our contextual distillation does not have any restrictions on architectural changes between teacher and student. We validate the effectiveness of our method on challenging benchmarks of language understanding tasks, not only in architectures of various sizes, but also in combination with DynaBERT, the recently proposed adaptive size pruning method.
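The word-relation idea can be sketched concretely: rather than matching individual hidden vectors, match the pairwise similarity structure among them. The snippet below is a minimal illustration under our own simplifications (plain NumPy, cosine similarity, mean-squared error), not the paper's exact objective; the function names are ours.

```python
import numpy as np

def pairwise_cosine(h):
    """Pairwise cosine-similarity matrix for hidden states h of shape (T, d)."""
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    return h @ h.T  # shape (T, T)

def word_relation_loss(teacher_h, student_h):
    """Match the relational structure of teacher and student representations.

    Only the (T, T) similarity matrices are compared, so teacher and
    student may use different hidden sizes -- no architectural restriction.
    """
    diff = pairwise_cosine(teacher_h) - pairwise_cosine(student_h)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 768))   # teacher hidden states, d=768
student = rng.normal(size=(8, 312))   # student hidden states, d=312
print(f"word-relation loss: {word_relation_loss(teacher, student):.4f}")
```

Because the loss is defined on relations between token representations rather than on the representations themselves, it composes naturally with size-adaptive students such as DynaBERT sub-networks, whose hidden dimensions vary with the width multiplier.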