BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences