Improving Transformers with Dynamically Composable Multi-Head Attention