This New BERT Is Way Faster & Smaller Than The Original
Researchers at Amazon recently applied neural architecture search to extract an optimal subset of architectural parameters from the popular BERT architecture. This smaller version of BERT is known as BORT and can be pre-trained in 288 GPU hours, which is 1.2% of the time required to pre-train the highest-performing BERT parametric architectural variant, RoBERTa-large. Since its introduction, BERT has achieved several groundbreaking results in natural language processing (NLP) and natural language understanding (NLU), and it has had a resounding impact on language modelling as well. However, BERT's usability has repeatedly been called into question because of its large size, slow inference time, and complex pre-training process, among other concerns.
Nov-3-2020, 07:15:32 GMT