Multi-Head Attention with Disagreement Regularization
