e45caa3d5273d105b8d045e748636957-Supplemental-Conference.pdf

Neural Information Processing Systems 

InFigure 7 of this Appendix, we show that indeed this is due to a decrease in the robustness slope. Across three different datasets, MNIST, CIFAR10, NewsGroup20, we see that increasing the number of tasks leads to a decrease in the robustness slope. Experiments on other languages For our experiments on multilingual generative models, we decided to use Greek and English because we were looking for a linguistic pair with different morphology,syntaxandphonology. This ensures that any benefits in terms of robustness are not coming from exposure to more data. Asshownin Figure 8,eventhough thetwomodels arestarting from roughly thesame perplexity,thebilingual model exhibits higher structural robustness in the presence of weight deletions.