Can the Variation of Model Weights be used as a Criterion for Self-Paced Multilingual NMT?