Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions