Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?
