Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG

Ki, Dayeon, Carpuat, Marine, McNamee, Paul, Khashabi, Daniel, Yang, Eugene, Lawrie, Dawn, Duh, Kevin

arXiv.org Artificial Intelligence 

Multilingual Retrieval-Augmented Generation (mRAG) systems enable language models to answer knowledge-intensive queries with citation-supported responses across languages. While such systems have been proposed, an open question is whether the mixture of different document languages impacts generation and citation in unintended ways. To investigate, we introduce a controlled methodology using model internals to measure language preference while holding other factors such as document relevance constant. Across eight languages and six open-weight models, we find that models preferentially cite English sources when queries are in English, with this bias amplified for lower-resource languages and for documents positioned mid-context. Crucially, we find that models sometimes trade off document relevance for language preference, indicating that citation choices are not always driven by informativeness alone. Our findings shed light on how language models leverage multilingual context and how this shapes citation behavior.

Retrieval-Augmented Generation (RAG) systems have become a core component of modern large language model (LLM) pipelines, enabling models to answer knowledge-intensive queries by supplementing their limited parametric knowledge with external information (Lewis et al., 2020; Karpukhin et al., 2020; Gao et al., 2024). Given that over 50% of digital content is produced in languages other than English (Statista, 2025), recent work has extended these systems to multilingual RAG (mRAG) settings, which handle queries and documents in languages beyond English (Chirkova et al., 2024; Wu et al., 2024). Despite recent advances, prior work highlights a key challenge in mRAG systems: language preference, a systematic tendency of models to favor sources written in certain languages during generation (Park & Lee, 2025).
Understanding this behavior is crucial, as citation patterns shape both the information users see and the languages prioritized in multilingual knowledge access. Existing approaches to measuring language preference, however, often fail to capture citation correctness. In short-form mRAG, preference has been estimated via information overlap (Sharma et al., 2025) or embedding similarity (Park & Lee, 2025), which do not directly account for correctness. In long-form mRAG, where outputs contain in-line citations (Zheng et al., 2025; Xu & Peng, 2025), preference has typically been measured by comparing citation frequencies against the language distribution of retrieved documents.
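The frequency-based measure described above can be sketched as follows. This is an illustrative implementation, not any of the cited papers' code: it compares the language distribution of cited documents against the language distribution of all retrieved documents, so that a ratio above 1.0 for a language suggests the model over-cites that language relative to what retrieval supplied.

```python
from collections import Counter

def language_preference(doc_langs, cited_ids):
    """Compare the language share of cited documents against the
    language share of all retrieved documents.

    doc_langs: list of language codes, one per retrieved document
               (list index serves as the document id).
    cited_ids: indices of documents cited in the generated answer.

    Returns {lang: citation_share / retrieval_share}; values above 1.0
    indicate over-citation of that language relative to retrieval.
    """
    if not cited_ids:
        return {}
    retrieved = Counter(doc_langs)            # retrieval-side counts
    cited = Counter(doc_langs[i] for i in cited_ids)  # citation-side counts
    n_ret, n_cit = len(doc_langs), len(cited_ids)
    return {
        lang: (cited[lang] / n_cit) / (retrieved[lang] / n_ret)
        for lang in retrieved
    }

# Hypothetical example: four retrieved documents (two English, two
# Korean), but both in-line citations point at English documents.
docs = ["en", "en", "ko", "ko"]
cites = [0, 1]
print(language_preference(docs, cites))  # {'en': 2.0, 'ko': 0.0}
```

Note that this kind of frequency comparison captures only how often each language is cited, not whether the citations are correct, which is exactly the limitation the passage above points out.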