Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM
