Evaluating the Moral Beliefs Encoded in LLMs
Neural Information Processing Systems
This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) a statistical method for eliciting beliefs encoded in LLMs, introducing statistical measures and evaluation metrics that quantify the probability of an LLM making a choice, the associated uncertainty, and the consistency of that choice; (2) an application of this method to study which moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., Should I tell a white lie?) and 687 low-ambiguity moral scenarios (e.g., Should I stop for a pedestrian on the road?). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., do not kill).
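To make the survey structure and the choice statistics concrete, here is a minimal sketch, not the authors' implementation: a hypothetical `Scenario` record mirroring the description/actions/labels layout, and a function that estimates the probability of each action from repeated model samples, with a simple entropy-based uncertainty. All names and the entropy measure are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import log2

# Hypothetical record mirroring the survey structure described above:
# a description, two candidate actions, and rule-violation labels.
@dataclass
class Scenario:
    description: str
    actions: tuple                 # (action_a, action_b)
    violated_rules: list = field(default_factory=list)

def choice_statistics(responses):
    """Estimate per-action choice probabilities from repeated model
    samples, plus an entropy-based uncertainty in bits (illustrative,
    not the paper's exact metric)."""
    counts = Counter(responses)
    total = sum(counts.values())
    probs = {action: c / total for action, c in counts.items()}
    entropy = -sum(p * log2(p) for p in probs.values() if p > 0)
    return probs, entropy

# Example: 10 sampled answers to a high-ambiguity scenario.
scenario = Scenario(
    description="Should I tell a white lie?",
    actions=("tell the lie", "tell the truth"),
    violated_rules=["do not lie"],
)
samples = ["tell the lie"] * 6 + ["tell the truth"] * 4
probs, uncertainty = choice_statistics(samples)
print(probs["tell the lie"])      # 0.6
print(round(uncertainty, 3))      # 0.971
```

A consistency measure in the paper's spirit could then compare these probabilities across rephrasings of the same scenario.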
Dec-26-2025, 11:32:55 GMT