BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer

Akari Asai, Sneha Kudugunta, Xinyan Velocity Yu, Terra Blevins, Hila Gonen, Machel Reid, Yulia Tsvetkov, Sebastian Ruder, Hannaneh Hajishirzi

arXiv.org (Artificial Intelligence)

Despite remarkable advancements in few-shot generalization in natural language processing, most models are developed and evaluated primarily in English. To facilitate research on few-shot cross-lingual transfer, we introduce a new benchmark, called BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format and provides a fixed set of few-shot examples and instructions. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer across a broad range of tasks and languages. Using BUFFET, we perform thorough evaluations of state-of-the-art multilingual large language models with two transfer methods: in-context learning and fine-tuning. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer. In particular, ChatGPT with in-context learning often performs worse than much smaller mT5-base models fine-tuned on English task data and few-shot in-language examples. Our analysis suggests several avenues for future research in few-shot cross-lingual transfer, such as improved pretraining, a deeper understanding of cross-lingual in-context learning, and more rigorous evaluation protocols.
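To make the evaluation setup concrete, below is a minimal sketch of how a few-shot in-context prompt might be assembled in a sequence-to-sequence format, pairing a task instruction with a fixed set of in-language demonstrations. The `Example` record, the `build_prompt` helper, the instruction text, and the Swahili NLI-style demonstrations are all hypothetical illustrations for this sketch, not BUFFET's actual data schema or prompts.

```python
# Sketch: few-shot in-context prompt construction in a seq2seq format,
# in the spirit of the setup the abstract describes. All field names,
# instructions, and examples below are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Example:
    input_text: str   # task input (e.g., a premise/hypothesis pair)
    target_text: str  # expected output string

def build_prompt(instruction: str, demonstrations: list[Example],
                 test_input: str) -> str:
    """Concatenate an instruction, k fixed few-shot demonstrations,
    and the test input into a single seq2seq source string."""
    parts = [instruction]
    for demo in demonstrations:
        parts.append(f"Input: {demo.input_text}\nOutput: {demo.target_text}")
    parts.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # Hypothetical NLI-style task with in-language (Swahili) demonstrations.
    instruction = ("Decide whether the hypothesis follows from the premise. "
                   "Answer with 'yes' or 'no'.")
    demos = [
        Example("Premise: Mvua inanyesha. Hypothesis: Kuna mawingu.", "yes"),
        Example("Premise: Jua limewaka. Hypothesis: Ni usiku.", "no"),
    ]
    prompt = build_prompt(
        instruction, demos,
        "Premise: Mtoto analala. Hypothesis: Mtoto yuko macho.")
    print(prompt)  # would be fed to a multilingual seq2seq LM or a chat LLM
```

Because every system receives the same instruction and the same fixed demonstration set, in-context learning and fine-tuning can be compared on equal footing across tasks and languages.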
