CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives
Lee, Ayoung; Kwon, Ryan Sungmo; Railton, Peter; Wang, Lu
arXiv.org Artificial Intelligence
Navigating dilemmas involving conflicting values is challenging even for humans in high-stakes domains, let alone for AI, yet prior work has been limited to everyday scenarios. To close this gap, we introduce CLASH (Character perspective-based LLM Assessments in Situations with High-stakes), a meticulously curated dataset consisting of 345 high-impact dilemmas along with 3,795 individual perspectives of diverse values. CLASH enables the study of critical yet underexplored aspects of value-based decision-making, including models' understanding of decision ambivalence and psychological discomfort, and their ability to capture temporal shifts of values in characters' perspectives. By benchmarking 14 models, both non-thinking and thinking, we uncover several key findings; among them, new failure patterns emerge, including early commitment and overcommitment. This paper aims to address a core question: Can LLMs make proper judgments in high-stakes dilemmas according to different perspectives?
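To put the dataset's scale in concrete terms, the two counts reported above can be combined into an average number of character perspectives per dilemma. This is a minimal arithmetic sketch using only the figures stated in the abstract; the constant names are hypothetical, and the result is a mean, not a claim that every dilemma carries the same number of perspectives.

```python
# Dataset sizes as reported in the CLASH abstract.
NUM_DILEMMAS = 345
NUM_PERSPECTIVES = 3795

# Mean number of character perspectives per dilemma. This is only an
# average; it does not imply that each dilemma has exactly this many.
mean_perspectives = NUM_PERSPECTIVES / NUM_DILEMMAS
print(mean_perspectives)  # 11.0
```

Since 345 × 11 = 3,795, the average works out to exactly 11 perspectives per dilemma.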
Sep-29-2025
- Country:
  - Europe
    - Netherlands > South Holland
      - Delft (0.04)
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
  - North America
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - Michigan > Washtenaw County
        - Ann Arbor (0.14)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Education > Educational Setting
    - K-12 Education (0.45)
  - Health & Medicine > Therapeutic Area (0.68)
  - Law (0.67)
- Technology: