

 Machine Translation


Learning to summarize from human feedback

Neural Information Processing Systems

We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.



Cross-lingual Retrieval for Iterative Self-Supervised Training (supplementary materials) 1 Experiment details

Neural Information Processing Systems

Because of the file size limit, we will release the source code and pretrained checkpoints after the anonymity period. To make a fair comparison, we followed the same preprocessing steps as described in [13]. In each iteration, we mine all 90 language pairs in parallel, using 8 GPUs for each pair, with each pair taking about 15 to 30 hours to finish. We lightly tune the margin score threshold using validation BLEU (using a threshold score between 1.04 and 1.07). For all experiments, we use a Transformer with 12 encoder layers and 12 decoder layers, with model dimension 1024 and 16 heads (680M parameters). We trained for a maximum of 20,000 steps using label-smoothed cross-entropy loss with 0.2 label smoothing, 0.3




Domain Sequence Modeling

Neural Information Processing Systems

We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling.