PICASO: Permutation-Invariant Context Composition with State Space Models
Liu, Tian Yu, Achille, Alessandro, Trager, Matthew, Golatkar, Aditya, Zancato, Luca, Soatto, Stefano
arXiv.org Artificial Intelligence
Providing Large Language Models with relevant contextual knowledge at inference time has been shown to greatly improve the quality of their generations. This is often achieved by prepending informative passages of text, or 'contexts', retrieved from external knowledge bases, to their input. State Space Models (SSMs) offer a promising solution by allowing a database of contexts to be mapped onto fixed-dimensional states from which to start generation. A key challenge arises when attempting to leverage information present across multiple contexts, since there is no straightforward way to condition generation on multiple independent states in existing SSMs. To address this, we leverage a simple mathematical relation derived from SSM dynamics to compose multiple states into a single one that efficiently approximates the effect of concatenating the raw context tokens. Since the temporal ordering of contexts is often uninformative, we enforce permutation invariance by efficiently averaging the states obtained via our composition algorithm across all possible context orderings. We evaluate the resulting method on WikiText and MSMARCO in both zero-shot and fine-tuned settings, and show that we can match the strongest-performing baseline while enjoying an average 5.4x speedup.

Incorporating new information into deep learning models has traditionally been a costly process, often requiring re-training or fine-tuning their weights on new data. Fortunately, Large Language Models (LLMs) provide a compelling alternative: these models can 'learn' to leverage new contextual information in real time by simply prepending it to their inputs, without any modification to their weights (Ram et al., 2023). This has motivated a powerful application known as Retrieval-Augmented Generation (RAG), where LLMs are deployed with the ability to retrieve and incorporate relevant sources of information, or 'contexts', from vast external knowledge bases when queried by users at inference time.
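The composition relation described above can be sketched for a plain linear time-invariant SSM with recurrence x_t = A x_{t-1} + B u_t. Under these dynamics, running context b's tokens starting from context a's state gives exactly A^{len(b)} @ state_a + state_b, so concatenation can be reproduced from precomputed per-context states. The sketch below is illustrative only: the function names are invented here, it uses a single dense recurrence rather than the paper's SSM architecture, and it averages over permutations by naive enumeration, whereas the paper derives an efficient averaging algorithm.

```python
import itertools
import numpy as np

def process(tokens, A, B, state=None):
    """Run the linear recurrence x_t = A @ x_{t-1} + B @ u_t over a token sequence."""
    if state is None:
        state = np.zeros(A.shape[0])
    for u in tokens:
        state = A @ state + B @ u
    return state

def compose(state_a, state_b, A, len_b):
    """Compose two per-context states.

    For a linear SSM, processing context a's tokens and then context b's
    tokens (len_b of them) from the zero state yields
        A^len_b @ state_a + state_b,
    where state_a, state_b are each computed from the zero state.
    """
    return np.linalg.matrix_power(A, len_b) @ state_a + state_b

def averaged_state(contexts, A, B):
    """Permutation-invariant state: average composed states over all
    context orderings (naive enumeration, for clarity only)."""
    entries = [(process(c, A, B), len(c)) for c in contexts]
    perms = list(itertools.permutations(entries))
    total = np.zeros(A.shape[0])
    for perm in perms:
        state = np.zeros(A.shape[0])
        for s, length in perm:
            state = np.linalg.matrix_power(A, length) @ state + s
        total += state
    return total / len(perms)
```

Because `compose` reproduces concatenation exactly for these linear dynamics, the composed state matches processing the concatenated token stream directly, and `averaged_state` returns the same vector regardless of the order in which contexts are supplied.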
Mar-16-2025