Searching for Efficient Transformers for Language Modeling

Oct-9-2024, 23:08:06 GMT–Neural Information Processing Systems

Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.

efficient transformer, primer, transformer

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.80)