MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Jacob Portes
Neural Information Processing Systems
Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERT models from scratch due to the high cost of training. In the half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT.