Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xinyu Wang, Linrui Ma, Jerry Huang, Peng Lu, Prasanna Parthasarathi, Xiao-Wen Chang, Boxing Chen, Yufei Cui
– arXiv.org Artificial Intelligence
Recent shifts in large language model (LLM) research have shown an increasing focus on novel architectures that can compete with the prototypical Transformer-based models that have long dominated the space. Linear recurrent models have proven to be viable competitors due to their computational efficiency. However, such models still demonstrate a sizable gap to Transformers on in-context learning, among other tasks that require recalling information from the context. In this work, we introduce __Resona__, a simple and scalable framework for augmenting linear recurrent models with retrieval. __Resona__ augments models with the ability to integrate retrieved information from the provided input context, enabling behavior tailored to diverse task requirements. Experiments on a variety of linear recurrent models demonstrate that __Resona__-augmented models achieve significant performance gains on a variety of synthetic and real-world natural language tasks, highlighting its ability to act as a general-purpose method for improving the in-context learning and language modeling abilities of linear recurrent LLMs.
Mar-28-2025
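
The abstract only describes the idea at a high level: a linear recurrent backbone augmented with a path that retrieves information from the provided input context. Below is a minimal sketch of what such a retrieval-augmented linear recurrence could look like; it is not the authors' implementation. The module name `RetrievalAugmentedLinearRNN`, the diagonal recurrence, the chunk-pooled dot-product retrieval, and the sigmoid gate are all assumptions made for illustration.

```python
# Sketch only: a toy "retrieval-augmented linear recurrence" block, assuming a
# diagonal linear RNN plus dot-product retrieval over pooled context chunks.
# This is NOT the Resona architecture from the paper; see the paper for details.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalAugmentedLinearRNN(nn.Module):
    def __init__(self, d_model: int, chunk_size: int = 16):
        super().__init__()
        self.chunk_size = chunk_size
        # Diagonal linear recurrence: h_t = a * h_{t-1} + B x_t
        self.log_decay = nn.Parameter(torch.zeros(d_model))
        self.in_proj = nn.Linear(d_model, d_model)
        # Retrieval path: query tokens against pooled context chunks
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Gate deciding how much retrieved content to mix back in
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        decay = torch.sigmoid(self.log_decay)      # per-channel decay in (0, 1)
        u = self.in_proj(x)
        h = torch.zeros(B, D, device=x.device, dtype=x.dtype)
        recurrent_out = []
        for t in range(T):                         # sequential scan for clarity
            h = decay * h + u[:, t]
            recurrent_out.append(h)
        recurrent_out = torch.stack(recurrent_out, dim=1)

        # Pool the input context into chunks and retrieve from them by attention
        pad = (-T) % self.chunk_size
        ctx = F.pad(x, (0, 0, 0, pad)).view(B, -1, self.chunk_size, D).mean(dim=2)
        q, k, v = self.q_proj(recurrent_out), self.k_proj(ctx), self.v_proj(ctx)
        attn = torch.softmax(q @ k.transpose(1, 2) / D ** 0.5, dim=-1)
        retrieved = attn @ v                       # (batch, seq_len, d_model)

        # Gated fusion of the recurrent state with the retrieved context
        g = torch.sigmoid(self.gate(torch.cat([recurrent_out, retrieved], dim=-1)))
        return recurrent_out + g * retrieved


if __name__ == "__main__":
    layer = RetrievalAugmentedLinearRNN(d_model=64)
    out = layer(torch.randn(2, 50, 64))
    print(out.shape)  # torch.Size([2, 50, 64])
```

The sketch omits causal masking of the retrieval step and the parallel scans used to make linear recurrences efficient; it is intended only to make the abstract's description of "integrating retrieved information from the input context" concrete.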