Is Mamba Capable of In-Context Learning?

Riccardo Grazzi, Julien Siems, Simon Schrodi, Thomas Brox, Frank Hutter

arXiv.org Artificial Intelligence 

This work provides empirical evidence that Mamba, a recently proposed selective structured state space model, has in-context learning (ICL) capabilities similar to those of transformers. We evaluate Mamba on tasks involving simple function approximation as well as more complex natural language processing problems. Our results show that, across both categories of tasks, Mamba matches the ICL performance of transformer models. Further analysis reveals that, like transformers, Mamba appears to solve ICL problems by incrementally optimizing its internal representations. Overall, our work suggests that Mamba can be an efficient alternative to transformers for ICL tasks involving longer input sequences.

Recent advancements in large-scale neural language modeling (Brown et al., 2020) have demonstrated that Transformer models (Vaswani et al., 2017) exhibit in-context learning (ICL) capabilities: after (self-supervised) pre-training, they can infer how to perform tasks from input examples alone, without explicit training or fine-tuning.
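To make the function-approximation setting concrete, the sketch below shows one common way such ICL tasks are posed to a sequence model: a prompt of (x, y) pairs drawn from a random linear function, followed by a query point whose label the model must predict from context alone. This is a minimal illustration of the general setup, not the paper's exact pipeline; the task construction and the `model(seq)` interface are assumptions for illustration.

```python
# Minimal sketch of an in-context linear-regression prompt (assumed setup,
# in the spirit of ICL-on-function-classes benchmarks; not the paper's code).
import numpy as np

def make_linear_regression_prompt(n_examples=20, dim=8, rng=None):
    """Build one ICL task: (x_i, y_i) pairs followed by a query x."""
    rng = rng or np.random.default_rng()
    w = rng.normal(size=dim)                     # task-specific weight vector
    xs = rng.normal(size=(n_examples + 1, dim))  # last x is the query point
    ys = xs @ w
    # Interleave inputs and (zero-padded) labels into one token sequence,
    # as a sequence model would see them: x_1, y_1, x_2, y_2, ..., x_query
    tokens = []
    for x, y in zip(xs[:-1], ys[:-1]):
        tokens.append(x)
        tokens.append(np.concatenate([[y], np.zeros(dim - 1)]))
    tokens.append(xs[-1])                        # query, label withheld
    return np.stack(tokens), ys[-1]              # (prompt sequence, target)

seq, target = make_linear_regression_prompt()
# prediction = model(seq)[-1]   # hypothetical call: a trained Mamba or
#                               # transformer reads the prompt and regresses
#                               # the query label purely in context
```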