Cross-lingual Retrieval for Iterative Self-Supervised Training (supplementary materials)
Neural Information Processing Systems

1 Experiment details
Because of the file size limit, we will release the source code and pretrained checkpoints after the anonymity period. To make a fair comparison, we followed the same preprocessing steps as described in [13]. In each iteration, we mine all 90 language pairs in parallel, using 8 GPUs for each pair; each pair takes about 15-30 hours to finish. We lightly tune the margin score threshold using validation BLEU (using threshold scores between 1.04 and 1.07). For all experiments, we use a Transformer with 12 encoder layers and 12 decoder layers, with a model dimension of 1024 and 16 attention heads (680M parameters). We trained for a maximum of 20,000 steps using label-smoothed cross-entropy loss with 0.2 label smoothing, 0.3
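The margin score thresholding above can be sketched as follows. This is a minimal numpy illustration of the ratio-style margin criterion commonly used for bitext mining (score = cosine similarity divided by the average similarity to each side's k nearest neighbours); the function name, k value, and toy inputs are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

def margin_scores(src_emb, tgt_emb, k=4):
    """Ratio-variant margin scores for bitext mining:
    cos(x, y) divided by the mean cosine similarity of the
    k nearest neighbours on each side. Illustrative sketch."""
    # L2-normalise so that dot products are cosine similarities
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # full cosine-similarity matrix
    # average similarity to the k nearest neighbours on each side
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1)  # per source sentence
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0)  # per target sentence
    denom = (knn_src[:, None] + knn_tgt[None, :]) / 2.0
    return sim / denom

# Keep candidate pairs whose margin score clears the tuned threshold.
rng = np.random.default_rng(0)
scores = margin_scores(rng.normal(size=(5, 8)), rng.normal(size=(6, 8)))
pairs = np.argwhere(scores > 1.04)  # mined (source, target) index pairs
```

In practice the k-nearest-neighbour search is done with an approximate index (e.g. FAISS) rather than the dense similarity matrix shown here.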
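For reference, one common formulation of the label-smoothed cross-entropy loss mentioned above, sketched in numpy: the target distribution puts (1 - eps) on the gold token and spreads eps uniformly over the vocabulary. The function name and the exact treatment of the smoothing mass are illustrative assumptions (implementations differ in details such as padding handling and whether the gold class is excluded from the uniform term).

```python
import numpy as np

def label_smoothed_ce(logits, target, eps=0.2):
    """Label-smoothed cross-entropy (illustrative sketch):
    (1 - eps) * NLL of the gold token + eps * uniform NLL
    over all classes. logits: (batch, vocab); target: (batch,)."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    lprobs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    n = logits.shape[0]
    nll = -lprobs[np.arange(n), target]   # gold-token negative log-likelihood
    uniform = -lprobs.mean(axis=-1)       # uniform smoothing component
    return ((1.0 - eps) * nll + eps * uniform).mean()
```

With eps = 0 this reduces to plain cross-entropy; with eps = 0.2 (the value used above) the model is penalised for putting all probability mass on the gold token.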