MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Jacob Portes

Neural Information Processing Systems

Although BERT-style encoder models are heavily used in NLP research, many researchers do not pretrain their own BERTs from scratch due to the high cost of training. In the half-decade since BERT first rose to prominence, many advances have been made with other transformer architectures and training configurations that have yet to be systematically incorporated into BERT.