PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian
Erfan Moosavi Monazzah, Vahid Rahimzadeh, Yadollah Yaghoobzadeh, Azadeh Shakery, Mohammad Taher Pilehvar
arXiv.org Artificial Intelligence
Large language models predominantly reflect Western cultures, largely due to the dominance of English-centric training data. This imbalance presents a significant challenge, as LLMs are increasingly used across diverse contexts without adequate evaluation of their cultural competence in non-English languages, including Persian. To address this gap, we introduce PerCul, a carefully constructed dataset designed to assess the sensitivity of LLMs toward Persian culture. PerCul features story-based, multiple-choice questions that capture culturally nuanced scenarios. Unlike existing benchmarks, PerCul is curated with input from native Persian annotators to ensure authenticity and to prevent the use of translation as a shortcut. We evaluate several state-of-the-art multilingual and Persian-specific LLMs, establishing a foundation for future research in cross-cultural NLP evaluation. Our experiments show an 11.3% gap between the best closed-source model and the layperson baseline, which widens to 21.3% for the best open-weight model. The dataset is available at: https://huggingface.co/datasets/teias-ai/percul
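To make the reported numbers concrete, here is a minimal sketch of how a multiple-choice accuracy and the model-to-baseline gap could be computed. The function names and the illustrative values are assumptions for exposition; they are not the paper's actual evaluation code or results.

```python
def accuracy(predictions, gold):
    """Fraction of multiple-choice answers that match the gold option.

    `predictions` and `gold` are equal-length sequences of option labels
    (e.g. "A".."D"); the field names are hypothetical, not PerCul's schema.
    """
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

def gap_to_baseline(model_acc, baseline_acc):
    """Percentage-point gap by which a model trails the layperson baseline,
    the quantity the abstract reports (11.3% / 21.3%)."""
    return (baseline_acc - model_acc) * 100
```

For example, with an invented baseline accuracy of 0.713 and model accuracy of 0.600, `gap_to_baseline(0.600, 0.713)` yields an 11.3 percentage-point gap of the kind reported.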
Feb-11-2025
- Country:
- Asia > Middle East
- Iran (0.29)
- North America > Mexico (0.28)
- Genre:
- Research Report (0.64)
- Industry:
- Education > Educational Setting > K-12 Education (0.68)