Scaling Linear Attention with Sparse State Expansion
