CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives

Ayoung Lee, Ryan Sungmo Kwon, Peter Railton, Lu Wang

Sep-29-2025–arXiv.org Artificial Intelligence

Navigating dilemmas involving conflicting values is challenging even for humans in high-stakes domains, let alone for AI, yet prior work has been limited to everyday scenarios. To close this gap, we introduce CLASH (Character perspective-based LLM Assessments in Situations with High-stakes), a meticulously curated dataset consisting of 345 high-impact dilemmas along with 3,795 individual perspectives reflecting diverse values. CLASH enables the study of critical yet underexplored aspects of value-based decision-making, including understanding decision ambivalence and psychological discomfort and capturing temporal shifts of values in characters' perspectives. By benchmarking 14 non-thinking and thinking models, we uncover several key findings. Among them, new failure patterns emerge, including early commitment and overcommitment. This paper aims to address a core question: Can LLMs make proper judgments in high-stakes dilemmas according to different perspectives?

large language model, machine learning, natural language

Country:
- North America > United States > Michigan (0.27)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine > Therapeutic Area (0.68)
- Law (0.67)
- Education > Educational Setting
  - K-12 Education (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.71)
