Exploiting Transformer Activation Sparsity with Dynamic Inference
