FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
