DataComp-LM: In search of the next generation of training sets for language models

Neural Information Processing Systems 

As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set.
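This excerpt does not say which model or threshold DCLM uses. The sketch below only illustrates the general idea of model-based filtering: score every document with a quality model and keep the highest-scoring fraction. The names `model_based_filter` and `toy_quality_score` and the default `keep_fraction` are hypothetical. `toy_quality_score` is a crude heuristic standing in for a trained quality classifier and is not DCLM's scorer.

```python
from typing import Callable, List, Sequence


def model_based_filter(
    docs: Sequence[str],
    score: Callable[[str], float],
    keep_fraction: float = 0.1,  # hypothetical default, not taken from the paper
) -> List[str]:
    """Keep the top `keep_fraction` of docs by model score, in original order."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not docs:
        return []
    n_keep = max(1, int(len(docs) * keep_fraction))
    # Rank indices by score, highest first; ties keep corpus order (stable sort).
    ranked = sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True)
    kept = set(ranked[:n_keep])
    # Emit survivors in their original corpus order.
    return [d for i, d in enumerate(docs) if i in kept]


def toy_quality_score(doc: str) -> float:
    """Hypothetical stand-in for a trained quality classifier:
    the fraction of whitespace-separated tokens that are purely alphabetic."""
    words = doc.split()
    if not words:
        return 0.0
    return sum(w.isalpha() for w in words) / len(words)


# Usage: keep the better-scoring half of a tiny toy corpus.
corpus = [
    "The mitochondria is the powerhouse of the cell",
    "click here $$$ 1234 !!!",
    "Buy now 50% off",
    "Photosynthesis converts light into chemical energy",
]
filtered = model_based_filter(corpus, toy_quality_score, keep_fraction=0.5)
print(filtered)
```

Swapping in a real classifier only requires passing a different `score` callable. Keeping a fixed fraction, rather than a fixed score threshold, makes the output size predictable regardless of how the scorer is calibrated.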
