ZigzagAttention: Efficient Long-Context Inference with Exclusive Retrieval and Streaming Heads
