The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Neural Information Processing Systems

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment.