Spark Transformer: Reactivating Sparsity in FFN and Attention
