Language-Independent Representations Improve Zero-Shot Summarization
Vladimir Solovyev, Danni Liu, Jan Niehues
arXiv.org Artificial Intelligence
Finetuning pretrained models on downstream generation tasks often leads to catastrophic forgetting in zero-shot conditions. In this work, we focus on summarization and tackle the problem through the lens of language-independent representations. After training on monolingual summarization, we perform zero-shot transfer to new languages or language pairs. We first show that naively finetuned models are highly language-specific in both output behavior and internal representations, resulting in poor zero-shot performance. Next, we propose query-key (QK) finetuning to decouple task-specific knowledge from the pretrained language generation abilities. Then, after showing the downsides of the standard adversarial language classifier, we propose a balanced variant that more directly enforces language-agnostic representations. Moreover, our qualitative analyses show that removing source language identity correlates with zero-shot summarization performance. Our code is openly available.
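As a rough illustration of the QK finetuning idea, the sketch below marks only query and key projection parameters as trainable and leaves everything else frozen. This is one plausible reading of the abstract, not the authors' implementation: the helper `qk_trainable_mask`, the substrings `q_proj` and `k_proj`, and the example parameter names are all assumptions chosen for illustration, and real checkpoints may name these weights differently.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical substrings marking query/key projections; adjust them to
# the naming scheme of the checkpoint actually being finetuned.
QK_PATTERNS: Tuple[str, ...] = ("q_proj", "k_proj")


def qk_trainable_mask(param_names: Iterable[str],
                      patterns: Tuple[str, ...] = QK_PATTERNS) -> Dict[str, bool]:
    """Map each parameter name to True if it should be updated."""
    return {name: any(p in name for p in patterns) for name in param_names}


# Illustrative parameter names, not taken from the paper's code.
example_names = [
    "encoder.layers.0.self_attn.q_proj.weight",
    "encoder.layers.0.self_attn.k_proj.weight",
    "encoder.layers.0.self_attn.v_proj.weight",
    "encoder.layers.0.self_attn.out_proj.weight",
    "encoder.layers.0.fc1.weight",
    "decoder.embed_tokens.weight",
]

mask = qk_trainable_mask(example_names)
trainable = [name for name, keep in mask.items() if keep]
```

With a PyTorch model, such a mask could be applied by iterating over `model.named_parameters()` and setting `param.requires_grad = mask[name]`, so that value projections, feed-forward layers, and embeddings keep their pretrained weights.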
Apr-8-2024
- Country:
- Asia (0.93)
- Europe (1.00)
- North America > United States (0.93)
- Genre:
- Research Report (1.00)
- Technology: