Improving Transformer with an Admixture of Attention Heads
