Towards Geo-Culturally Grounded LLM Generations

Lertvittayakumjorn, Piyawat; Kinney, David; Prabhakaran, Vinodkumar; Martin, Donald Jr.; Dev, Sunipa

Feb-20-2025, arXiv.org Artificial Intelligence

Generative large language models (LLMs) have been shown to have gaps in their knowledge of diverse cultures across the globe. We investigate the effect of retrieval-augmented generation and search-grounding techniques on the ability of LLMs to display familiarity with a diverse range of national cultures. Specifically, we compare the performance of standard LLMs, LLMs augmented with retrievals from a bespoke knowledge base (i.e., KB grounding), and LLMs augmented with retrievals from a web search (i.e., search grounding) on a series of cultural familiarity benchmarks. We find that search grounding significantly improves LLM performance on multiple-choice benchmarks that test propositional knowledge (e.g., the norms, artifacts, and institutions of national cultures), while KB grounding's effectiveness is limited by inadequate knowledge base coverage and a suboptimal retriever. However, search grounding also increases the risk of stereotypical judgments by language models, while failing to improve evaluators' judgments of cultural familiarity in a human evaluation with adequate statistical power. These results highlight the distinction between propositional knowledge about a culture and open-ended cultural fluency when it comes to evaluating the cultural familiarity of generative LLMs.
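
The three conditions compared in the abstract (standard LLM, KB grounding, search grounding) differ mainly in what context is placed in front of the question. The following is a minimal sketch of that idea, not the authors' implementation: `TOY_KB`, `kb_retrieve`, and `build_prompt` are hypothetical names, the knowledge base entries are made-up illustrations, and the word-overlap ranking is a stand-in for whatever retriever the paper actually uses.

```python
import re
from typing import Callable, List, Optional

# Hypothetical toy knowledge base (illustrative entries, not from the paper).
TOY_KB = [
    "In Japan, it is customary to remove shoes before entering a home.",
    "In Türkiye, tea is commonly served in small tulip-shaped glasses.",
    "In Vietnam, Tết marks the lunar new year.",
]


def _tokens(text: str) -> set:
    """Lowercased word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def kb_retrieve(question: str, k: int = 1) -> List[str]:
    """Return the k KB passages with the largest word overlap (assumed stand-in retriever)."""
    q = _tokens(question)
    ranked = sorted(TOY_KB, key=lambda p: len(q & _tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(
    question: str,
    retriever: Optional[Callable[[str], List[str]]] = None,
) -> str:
    """Build the prompt for one condition.

    retriever=None      -> standard (ungrounded) LLM prompt
    retriever=kb_retrieve -> KB grounding
    retriever=<web search wrapper> -> search grounding (wrapper not shown here)
    """
    if retriever is None:
        return f"Question: {question}\nAnswer:"
    context = "\n".join(f"- {passage}" for passage in retriever(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    question = "What do people in Japan do with shoes at home?"
    print(build_prompt(question))               # standard condition
    print(build_prompt(question, kb_retrieve))  # KB-grounded condition
```

Under this framing, search grounding would pass a different `retriever` callable that wraps a web search service; none is included because the abstract does not specify one. The sketch also makes the abstract's caveat concrete: if the knowledge base lacks a relevant passage or the ranking is poor, the grounded prompt carries unhelpful context.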

large language model, machine learning, natural language

Country:
- Asia
  - Japan > Honshū (0.14)
  - Middle East > Republic of Türkiye (0.14)
  - Vietnam > Hồ Chí Minh City
    - Hồ Chí Minh City (0.14)
- Europe (1.00)
- North America > United States (0.46)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Education (0.88)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)