Differentiable Subset Pruning of Transformer Heads
