Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Choi, Dami, Xin, Derrick, Dadkhahi, Hamid, Gilmer, Justin, Garg, Ankush, Firat, Orhan, Yeh, Chih-Kuan, Dai, Andrew M., Ghorbani, Behrooz

Dec-11-2023–arXiv.org Artificial Intelligence

In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks. We provide a thorough empirical study and analysis of this method's benefits showing that it achieves consistent improvements relative to the performance trade-off profile of standard static weighting. We analyze under what data regimes this method is applicable and show its improvements empirically in neural machine translation (NMT) and multi-lingual language modeling.

cross-entropy loss, train cross-entropy loss, valid cross-entropy loss, (13 more...)

arXiv.org Artificial Intelligence

Dec-11-2023

arXiv.org PDF

Add feedback

Country:
- Asia (0.04)
- North America > Canada
  - Ontario > Toronto (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Machine Translation (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)