Can the Variation of Model Weights be used as a Criterion for Self-Paced Multilingual NMT?
Atrio, Àlex R., Allemann, Alexis, Dolamic, Ljiljana, Popescu-Belis, Andrei
arXiv.org Artificial Intelligence
Many-to-one neural machine translation (NMT) systems improve over one-to-one systems when training data is scarce. In this paper, we design and test a novel algorithm for selecting the language of minibatches when training such systems. The algorithm changes the language of the minibatch when the weights of the model no longer evolve significantly, as measured by the smoothed KL divergence between successive states of all layers of the Transformer network. In terms of translation quality (measured with BLEU and COMET) and convergence speed, this algorithm outperforms alternating monolingual batches, but not shuffled batches.
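The language-switching criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It makes five choices that the source does not specify:

- Each layer's flattened weights are turned into a probability distribution with a softmax.
- The per-layer KL divergences between two consecutive weight snapshots are averaged.
- The result is smoothed with an exponential moving average controlled by `alpha`.
- The minibatch language advances round-robin once the smoothed value drops below a `threshold`.
- All names (`weight_change`, `SelfPacedLanguageScheduler`, `threshold`, `alpha`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Turn a flat list of weights into a probability distribution (assumption)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def weight_change(prev: Dict[str, List[float]], curr: Dict[str, List[float]]) -> float:
    """Hypothetical: mean per-layer KL divergence between two weight snapshots."""
    divs = [kl_divergence(softmax(curr[name]), softmax(prev[name])) for name in curr]
    return sum(divs) / len(divs)


class SelfPacedLanguageScheduler:
    """Hypothetical scheduler: switch minibatch language when the weights plateau."""

    def __init__(self, languages: List[str], threshold: float, alpha: float = 0.9):
        self.languages = languages
        self.threshold = threshold  # hypothetical cut-off on the smoothed KL
        self.alpha = alpha          # hypothetical EMA smoothing factor
        self.index = 0
        self.smoothed = None        # smoothed KL for the current language
        self.prev_weights = None    # last recorded weight snapshot

    @property
    def current_language(self) -> str:
        return self.languages[self.index]

    def step(self, weights: Dict[str, List[float]]) -> str:
        """Record weights after a minibatch; return the language for the next batch."""
        if self.prev_weights is not None:
            kl = weight_change(self.prev_weights, weights)
            if self.smoothed is None:
                self.smoothed = kl
            else:
                self.smoothed = self.alpha * self.smoothed + (1 - self.alpha) * kl
            if self.smoothed < self.threshold:
                # Weights barely move on this language: advance to the next one.
                self.index = (self.index + 1) % len(self.languages)
                self.smoothed = None  # restart smoothing for the new language
        self.prev_weights = {k: list(v) for k, v in weights.items()}
        return self.current_language
```

In a training loop, `step` would be called after each optimizer update with a snapshot of the layer weights. Its return value would then pick the source language of the next minibatch. For example, calling `step` twice with identical weights gives a divergence of zero, so the second call switches from the first language to the second. The softmax normalization, layer averaging and round-robin order are illustrative choices. The paper may define the divergence and the switching rule differently.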
Oct-5-2024