BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
