MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Jacob Portes

Neural Information Processing Systems

Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT.