Few-Shot Multilingual Open-Domain QA from 5 Examples
Jiang, Fan; Drummond, Tom; Cohn, Trevor
arXiv.org Artificial Intelligence
Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods for underrepresented languages. We introduce a few-shot learning approach to synthesise large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines in MLODQA and cross-lingual and monolingual retrieval. We further show our method can be extended for effective zero-shot adaptation to new languages through a cross-lingual prompting strategy with only English-supervised data, making it a general and applicable solution for MLODQA tasks without costly large-scale annotation.
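To make the few-shot data synthesis step concrete, here is a minimal sketch of how a handful of annotated examples might be assembled into a prompt that asks an LLM for a new question-answer pair about an unlabelled passage. The abstract does not give the prompt format, so the `QAExample` class, the `build_few_shot_prompt` function, and the template wording are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class QAExample:
    """One annotated example: a passage with a question it answers."""
    passage: str
    question: str
    answer: str


def build_few_shot_prompt(examples, passage, language):
    """Assemble a few-shot prompt for generating a QA pair about `passage`.

    The template is an assumption made for illustration; the paper's
    actual prompts may differ.
    """
    header = (
        f"Write a question in {language} that the passage answers, "
        "then give the answer."
    )
    shots = [
        f"Passage: {ex.passage}\nQuestion: {ex.question}\nAnswer: {ex.answer}"
        for ex in examples
    ]
    # The last block is left open so the model completes the question and answer.
    query = f"Passage: {passage}\nQuestion:"
    return "\n\n".join([header, *shots, query])
```

The returned string would be sent to an LLM of choice, and the completion parsed into a synthetic training example. Under the same assumptions, pairing English demonstrations with a passage in a different target language would be one way to approximate the cross-lingual prompting setting the abstract mentions.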
Feb-26-2025
- Country:
- Oceania > Australia
- North America
- Dominican Republic (0.04)
- Canada (0.04)
- United States
- Massachusetts (0.04)
- Washington > King County
- Seattle (0.04)
- New York > Bronx County
- New York City (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- Europe
- Asia
- Philippines (0.14)
- China > Hong Kong (0.04)
- Middle East
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Health & Medicine (0.93)
- Government > Regional Government (0.46)
- Technology: