Primer: Searching for Efficient Transformers for Language Modeling

Neural Information Processing Systems 

We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.
