MCEval: A Dynamic Framework for Fair Multilingual Cultural Evaluation of LLMs
Shulin Huang, Linyi Yang, Yue Zhang
–arXiv.org Artificial Intelligence
Large language models exhibit cultural biases and limited cross-cultural understanding, particularly when serving diverse global user populations. We propose MCEval, a novel multilingual evaluation framework that employs dynamic cultural question construction and enables causal analysis through Counterfactual Rephrasing and Confounder Rephrasing. Our comprehensive evaluation spans 13 cultures and 13 languages, systematically assessing both cultural awareness and cultural bias across different linguistic scenarios. The framework provides 39,897 cultural awareness instances and 17,940 cultural bias instances. Experimental results reveal performance disparities across linguistic scenarios, demonstrating that optimal cultural performance is linked not only to training data distribution but also to language-culture alignment. The evaluation results also expose a fairness issue: approaches that appear successful in the English scenario create substantial disadvantages elsewhere. MCEval represents the first comprehensive multilingual cultural evaluation framework that provides deeper insights into LLMs' cultural understanding.
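The abstract does not detail how the two rephrasing operations are implemented, but the causal intuition can be illustrated with a minimal, purely hypothetical sketch: a counterfactual rephrase swaps the cultural referent in a question (so the correct answer should change), while a confounder rephrase perturbs only surface wording (so the answer should stay the same). The function names and the template-based substitution below are illustrative assumptions, not the paper's actual construction pipeline.

```python
# Hypothetical sketch of the two rephrasing operations (not MCEval's
# actual implementation): counterfactual edits change the cultural
# content, confounder edits change only surface form.

def counterfactual_rephrase(question: str, culture: str, new_culture: str) -> str:
    """Swap the cultural referent; a consistent model should change its answer."""
    return question.replace(culture, new_culture)

def confounder_rephrase(question: str) -> str:
    """Paraphrase surface wording; a consistent model should keep its answer."""
    return "Could you tell me: " + question[0].lower() + question[1:]

base = "Which festival marks the lunar new year in China?"
print(counterfactual_rephrase(base, "China", "Vietnam"))
# → Which festival marks the lunar new year in Vietnam?
print(confounder_rephrase(base))
# → Could you tell me: which festival marks the lunar new year in China?
```

Comparing a model's answers across such paired rephrasings separates genuine cultural knowledge from sensitivity to superficial wording, which is what enables the causal analysis the abstract describes.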
Jul-15-2025