Overcoming Long-Context Limitations of State-Space Models via Context-Dependent Sparse Attention

Open in new window