Machine Translation
2433fec2144ccf5fea1c9c5ebdbc3924-Paper-Conference.pdf
Previous works have validated that text generation APIs can be stolen through imitation attacks, causing IP violations. In order to protect the IP of text generation APIs, recent work has introduced a watermarking algorithm and utilized the null-hypothesis test as a post-hoc ownership verification on the imitation models.
Cross-lingual Retrieval for Iterative Self-Supervised Training (supplementary materials)
1 Experiment details
Because of the file size limit, we will release the source code and pretrained checkpoints after the anonymity period. To make a fair comparison, we followed the same preprocessing steps as described in [13]. In each iteration, we mine all 90 language pairs in parallel, using 8 GPUs for each pair, with each pair taking about 15 to 30 hours to finish. We lightly tune the margin score threshold using validation BLEU (using threshold scores between 1.04 and 1.07). For all experiments, we use a Transformer with 12 encoder layers and 12 decoder layers, with a model dimension of 1024 and 16 heads (680M parameters). We trained for a maximum of 20,000 steps using label-smoothed cross-entropy loss with 0.2 label smoothing, 0.3