Few-Shot Multilingual Open-Domain QA from 5 Examples

Feb-26-2025–arXiv.org Artificial Intelligence

Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods for underrepresented languages. We introduce a few-shot learning approach to synthesise large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines in MLODQA and cross-lingual and monolingual retrieval. We further show our method can be extended for effective zero-shot adaptation to new languages through a cross-lingual prompting strategy with only English-supervised data, making it a general and applicable solution for MLODQA tasks without costly large-scale annotation.
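As a rough illustration of what prompting an LLM with few-shot supervision can look like, the sketch below assembles a text prompt from five annotated examples and a new passage. It is not the paper's prompt or pipeline: the `Exemplar` schema, the `build_few_shot_prompt` helper, the instruction wording, and the placeholder data are all assumptions made for this example, and no LLM is actually called. Passing English exemplars together with a different target language loosely mirrors the cross-lingual prompting idea mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemplar:
    """One annotated (passage, question, answer) triple. Hypothetical schema."""
    passage: str
    question: str
    answer: str
    language: str


def build_few_shot_prompt(exemplars, passage, target_language):
    """Assemble a prompt asking an LLM to write a QA pair for `passage`."""
    # Hypothetical instruction text; the paper's actual wording is not given here.
    blocks = [
        f"Write a question in {target_language} that the passage answers, "
        "then give the answer."
    ]
    for ex in exemplars:
        blocks.append(
            f"Language: {ex.language}\n"
            f"Passage: {ex.passage}\n"
            f"Question: {ex.question}\n"
            f"Answer: {ex.answer}"
        )
    # The final block is left open so the model completes the question and answer.
    blocks.append(f"Language: {target_language}\nPassage: {passage}\nQuestion:")
    return "\n\n".join(blocks)


# Five placeholder English exemplars (invented data, for illustration only).
exemplars = [
    Exemplar(f"Example passage {i}.", f"Example question {i}?", f"answer {i}", "English")
    for i in range(1, 6)
]
prompt = build_few_shot_prompt(exemplars, "Helsinki on Suomen pääkaupunki.", "Finnish")
print(prompt)
```

The resulting string would be sent to whichever LLM is used for generation, and the completed question and answer would then be parsed and filtered before being added to a synthetic training set.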

computational linguistics, large language model, machine learning

Country:
- Asia > Middle East
  - UAE (0.14)
- Europe (1.00)
- North America
  - Mexico > Mexico City (0.14)
  - United States (1.00)
- Oceania > Australia
  - Victoria (0.14)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Government > Regional Government (0.92)
- Health & Medicine (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)
  - Natural Language > Large Language Model (1.00)