The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Neural Information Processing Systems

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment.


Nature-Inspired Local Propagation

Neural Information Processing Systems

In the remainder of the paper, we will, whenever possible, introduce functions formally by clearly stating their domain and co-domain.