Reasoning Beyond Labels: Measuring LLM Sentiment in Low-Resource, Culturally Nuanced Contexts
Millicent Ochieng, Anja Thieme, Ignatius Ezeani, Risa Ueno, Samuel Maina, Keshet Ronen, Javier Gonzalez, Jacki O'Neill
arXiv.org Artificial Intelligence
Sentiment analysis in low-resource, culturally nuanced contexts challenges conventional NLP approaches that assume fixed labels and universal affective expressions. We present a diagnostic framework that treats sentiment as a context-dependent, culturally embedded construct, and evaluate how large language models (LLMs) reason about sentiment in informal, code-mixed WhatsApp messages from Nairobi youth health groups. Using a combination of human-annotated data, sentiment-flipped counterfactuals, and rubric-based explanation evaluation, we probe LLM interpretability, robustness, and alignment with human reasoning. Framing our evaluation through a social-science measurement lens, we operationalize and interrogate LLM outputs as an instrument for measuring the abstract concept of sentiment. Our findings reveal significant variation in model reasoning quality: top-tier LLMs demonstrate interpretive stability, while open models often falter under ambiguity or sentiment shifts. This work highlights the need for culturally sensitive, reasoning-aware AI evaluation in complex, real-world communication.
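The counterfactual probe described in the abstract can be illustrated with a minimal sketch: flip the sentiment-bearing cues in a message and check whether the model's predicted label flips with them. Everything below is hypothetical; `predict_sentiment` stands in for an LLM call, the keyword heuristic and the code-mixed example pair are invented for illustration, and the metric is a simple flip-consistency rate, not the paper's rubric.

```python
def predict_sentiment(message: str) -> str:
    """Stand-in for an LLM sentiment call; a trivial keyword
    heuristic used only to make the sketch runnable."""
    negative_cues = {"sipendi", "bad", "sad"}  # illustrative cue list
    return "negative" if any(c in message.lower() for c in negative_cues) else "positive"

def flips_label(original: str, counterfactual: str) -> bool:
    """A model is counterfactually consistent on a pair if the
    sentiment-flipped input also flips the predicted label."""
    return predict_sentiment(original) != predict_sentiment(counterfactual)

# (original, sentiment-flipped counterfactual) — invented code-mixed examples
pairs = [
    ("Napenda hii dawa, it works!", "Sipendi hii dawa, it failed."),
]
consistency = sum(flips_label(o, c) for o, c in pairs) / len(pairs)
print(f"counterfactual consistency: {consistency:.2f}")
```

Aggregating this flip rate per model gives one coarse robustness signal of the kind the framework evaluates, alongside human-rubric scoring of the model's explanations.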
Aug-7-2025
- Country:
- Africa > Kenya
- Nairobi City County > Nairobi (0.25)
- Nairobi Province (0.04)
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Health & Medicine > Therapeutic Area (0.70)