$π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling