A Set of Quebec-French Corpus of Regional Expressions and Terms
Beauchemin, David, Tremblay, Yan, Youssef, Mohamed Amine, Khoury, Richard
arXiv.org Artificial Intelligence
The tasks of idiom understanding and dialect understanding are both well-established benchmarks in natural language processing. In this paper, we propose combining them, and using regional idioms as a test of dialect understanding. Towards this end, we propose two new benchmark datasets for the Quebec dialect of French: QFrCoRE, which contains 4,633 instances of idiomatic phrases, and QFrCoRT, which comprises 171 regional instances of idiomatic words. We explain how to construct these corpora, so that our methodology can be replicated for other dialects. Our experiments with 94 LLMs demonstrate that our regional idiom benchmarks are a reliable tool for measuring a model's proficiency in a specific dialect.
Oct-7-2025
- Country:
  - Asia
    - Middle East
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Estonia
      - Harju County > Tallinn (0.04)
      - Tartu County > Tartu (0.04)
    - France (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Slovenia > Gorizia
      - Municipality of Cerkno > Cerkno (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America
    - Canada > Quebec
      - Montreal (0.14)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States > Florida
      - Miami-Dade County > Miami (0.04)
  - Oceania > Australia (0.04)
- Genre:
  - Research Report > New Finding (0.93)
- Technology: