Cross-Lingual Stability and Bias in Instruction-Tuned Language Models for Humanitarian NLP
Nemkova, Poli; Adhikari, Amrit; Pearson, Matthew; Sadu, Vamsi Krishna; Albert, Mark V.
arXiv.org Artificial Intelligence
Humanitarian organizations face a critical choice: invest in costly commercial APIs or rely on free open-weight models for multilingual human rights monitoring. While commercial systems offer reliability, open-weight alternatives lack empirical validation -- especially for low-resource languages common in conflict zones. This paper presents the first systematic comparison of commercial and open-weight large language models (LLMs) for human-rights-violation detection across seven languages, quantifying the cost-reliability trade-off facing resource-constrained organizations. Across 78,000 multilingual inferences, we evaluate six models -- four instruction-aligned (Claude-Sonnet-4, DeepSeek-V3, Gemini-Flash-2.0, GPT-4.1-mini) and two open-weight (LLaMA-3-8B, Mistral-7B) -- using both standard classification metrics and new measures of cross-lingual reliability: Calibration Deviation (CD), Decision Bias (B), Language Robustness Score (LRS), and Language Stability Score (LSS). Results show that alignment, not scale, determines stability: aligned models maintain near-invariant accuracy and balanced calibration across typologically distant and low-resource languages (e.g., Lingala, Burmese), while open-weight models exhibit significant prompt-language sensitivity and calibration drift. These findings demonstrate that multilingual alignment enables language-agnostic reasoning and provide practical guidance for humanitarian organizations balancing budget constraints with reliability in multilingual deployment.
Oct-28-2025
- Country:
  - Africa
    - Middle East > Egypt (0.04)
    - Nigeria (0.04)
  - Asia
  - Europe
    - Russia (0.05)
    - Ukraine
      - Donetsk Oblast > Mariupol (0.04)
      - Kharkiv Oblast > Kharkiv (0.04)
      - Luhansk Oblast > Luhansk (0.04)
      - Mykolaiv Oblast > Mykolaiv (0.04)
  - North America > United States
    - Texas > Denton County > Denton (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Government > Military (0.68)
  - Law (0.90)
- Technology: