Improving Transformers with Dynamically Composable Multi-Head Attention