SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization