Overcoming Long Context Limitations of State Space Models via Context Dependent Sparse Attention
