Conditions for Catastrophic Forgetting in Multilingual Translation
arXiv.org Artificial Intelligence
Fine-tuning multilingual foundation models on specific languages often induces catastrophic forgetting, degrading performance on languages unseen in fine-tuning. While this phenomenon is widely documented, the literature presents fragmented results about when forgetting occurs. To address this ambiguity, we conduct a systematic empirical study using machine translation as a testbed to identify the conditions that trigger catastrophic forgetting in multilingual fine-tuning. Through controlled experiments across different model architectures, data scales, and fine-tuning approaches, we reveal that the relative scale between model and data size is a primary determinant of forgetting. Moreover, we demonstrate that a model's instruction-following ability is more critical for retaining multilingual knowledge than its architecture. Contrary to common assumptions, parameter-efficient fine-tuning offers no clear advantage over full fine-tuning in mitigating forgetting. Lastly, we show that cross-lingual alignment can mitigate forgetting while also facilitating positive transfer to unseen target languages.
Oct-23-2025
- Country:
  - Europe (1.00)
  - Asia (0.94)
  - North America > United States
    - Minnesota (0.28)
- Genre:
  - Research Report
    - Experimental Study (0.55)
    - New Finding (0.46)
- Technology: