Differentiable Subset Pruning of Transformer Heads