 Large Language Model





726ab29b61a749b36d2593648716ae3c-Paper-Conference.pdf

Neural Information Processing Systems

Hence, the performance of LLMs in various NLP tasks depends significantly on the crucial role played by the attention mechanism with the softmax unit.





The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Neural Information Processing Systems

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment.