Collaborating Authors

 schwartz


Value-Based Large Language Model Agent Simulation for Mutual Evaluation of Trust and Interpersonal Closeness

Sakamoto, Yuki, Uchida, Takahisa, Ishiguro, Hiroshi

arXiv.org Artificial Intelligence

Large language models (LLMs) have emerged as powerful tools for simulating complex social phenomena using human-like agents with specific traits. In human societies, value similarity is important for building trust and close relationships; however, it remains unexplored whether this principle holds true in artificial societies comprising LLM agents. Therefore, this study investigates the influence of value similarity on relationship-building among LLM agents through two experiments. First, in a preliminary experiment, we evaluated the controllability of values in LLMs to identify the most effective model and prompt design for controlling the values. Subsequently, in the main experiment, we generated pairs of LLM agents imbued with specific values and analyzed their mutual evaluations of trust and interpersonal closeness following a dialogue. The experiments were conducted in English and Japanese to investigate language dependence. The results confirmed that pairs of agents with higher value similarity exhibited greater mutual trust and interpersonal closeness. Our findings demonstrate that the LLM agent simulation serves as a valid testbed for social science theories, contributes to elucidating the mechanisms by which values influence relationship building, and provides a foundation for inspiring new theories and insights into the social sciences.
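
To make the described setup concrete, here is a minimal sketch of a value-conditioned agent-pair simulation, assuming a generic chat-completion backend behind a placeholder `call_llm` helper. The prompt wording, value profiles, and rating scale are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch: pair two value-conditioned agents, run a short dialogue,
# then collect mutual ratings of trust and interpersonal closeness.
import json

def call_llm(system: str, messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat-completion client here")  # placeholder

def value_system_prompt(values: dict[str, int]) -> str:
    profile = ", ".join(f"{name}={level}/10" for name, level in values.items())
    return ("You are a person whose Schwartz basic values are: "
            f"{profile}. Stay in character throughout the conversation.")

def run_dialogue(values_a: dict, values_b: dict, turns: int = 6) -> list[dict]:
    history: list[dict] = []
    agents = [("A", value_system_prompt(values_a)), ("B", value_system_prompt(values_b))]
    for t in range(turns):
        name, system = agents[t % 2]
        reply = call_llm(system, history + [{"role": "user", "content": "Continue the conversation."}])
        history.append({"role": "assistant", "content": f"{name}: {reply}"})
    return history

def rate_partner(system: str, history: list[dict]) -> dict:
    # Ask one agent to rate the other on trust and closeness (e.g. 1-7 Likert).
    answer = call_llm(system, history + [{
        "role": "user",
        "content": "Rate your partner's trustworthiness (1-7) and your closeness to them (1-7) as JSON.",
    }])
    return json.loads(answer)
```

Repeating such runs over many agent pairs with varying value overlap would yield the similarity-versus-trust relationship the abstract analyzes.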


Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

Liu, Yesheng, Li, Hao, Xu, Haiyu, Pei, Baoqi, Wang, Jiahao, Zhao, Mingxuan, Zheng, Jingshu, He, Zheqi, Yao, JG, Qin, Bowen, Yang, Xi, Zhang, Jiajun

arXiv.org Artificial Intelligence

Multiple-choice question answering (MCQA) has been a popular format for the evaluation and reinforcement fine-tuning (RFT) of modern multimodal language models. Its constrained output format allows for simplified, deterministic automatic verification. However, we find that the options may leak exploitable signals, which makes the accuracy metrics unreliable for indicating real capabilities and encourages explicit or implicit answer-guessing behaviors during RFT. We propose ReVeL (Rewrite and Verify by LLM), a framework that rewrites multiple-choice questions into open-form questions while keeping answers verifiable whenever possible. The framework categorizes questions by answer type and applies a different rewriting and verification scheme to each. For RFT, we convert 20k MCQA examples and use GRPO to fine-tune Qwen2.5-VL models. Models trained on ReVeL-OpenQA match MCQA accuracy on multiple-choice benchmarks and improve OpenQA accuracy by about six percentage points, indicating better data efficiency and more robust reward signals than MCQA-based training. When used for evaluation, ReVeL also reveals up to 20 percentage points of score inflation in MCQA benchmarks (relative to OpenQA), improves judging accuracy, and reduces both cost and latency. We will release code and data publicly.
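
As a rough illustration of the rewrite-and-verify idea, the sketch below converts one MCQA example into an open-form question and checks a free-form prediction against the reference, falling back to an LLM judge when exact matching fails. The prompts and the `call_llm` helper are hypothetical placeholders, not the released implementation.

```python
# Illustrative rewrite-and-verify loop in the spirit of ReVeL (not the official code).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # placeholder

def rewrite_to_openqa(question: str, options: list[str], answer: str) -> dict:
    prompt = (
        "Rewrite this multiple-choice question as an open-ended question whose "
        "answer can still be checked objectively. Do not mention the options.\n"
        f"Question: {question}\nOptions: {options}\nCorrect answer: {answer}"
    )
    return {"question": call_llm(prompt), "reference": answer}

def verify(reference: str, prediction: str) -> bool:
    # Numeric or short-string answers can be checked exactly; otherwise ask an
    # LLM judge whether prediction and reference express the same answer.
    if prediction.strip().lower() == reference.strip().lower():
        return True
    verdict = call_llm(
        f"Reference answer: {reference}\nModel answer: {prediction}\n"
        "Do these express the same answer? Reply yes or no."
    )
    return verdict.strip().lower().startswith("yes")
```

During RFT, a binary reward such as `1.0 if verify(reference, prediction) else 0.0` could then replace option matching as the verifiable signal.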


Prompt-Based Value Steering of Large Language Models

Abbo, Giulio Antonio, Belpaeme, Tony

arXiv.org Artificial Intelligence

Large language models are increasingly used in applications where alignment with human values is critical. While model fine-tuning is often employed to ensure safe responses, this technique is static and does not lend itself to everyday situations involving dynamic values and preferences. In this paper, we present a practical, reproducible, and model-agnostic procedure to evaluate whether a prompt candidate can effectively steer generated text toward specific human values, formalising a scoring method to quantify the presence and gain of target values in generated responses. We apply our method to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset. With this setup, we compare a baseline prompt to one explicitly conditioned on values, and show that value steering is possible even without altering the model or dynamically optimising prompts.
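
One simple way to picture the presence-and-gain comparison is sketched below: score each response for the target value with some scorer (a classifier or an LLM rater), then report the gain of the value-conditioned prompt over the baseline prompt. This is an illustrative formulation under assumed definitions, not the paper's exact scoring method.

```python
# Illustrative value-steering score: gain = mean presence (steered) - mean presence (baseline).
from statistics import mean

def value_score(response: str, value: str) -> float:
    """Placeholder: return a presence score in [0, 1] for `value` in `response`,
    e.g. from a value classifier or an LLM rater."""
    raise NotImplementedError

def steering_gain(baseline_responses: list[str], steered_responses: list[str], value: str) -> float:
    presence_baseline = mean(value_score(r, value) for r in baseline_responses)
    presence_steered = mean(value_score(r, value) for r in steered_responses)
    return presence_steered - presence_baseline  # positive => prompt steers toward the value
```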


fba9d88164f3e2d9109ee770223212a0-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their detailed and useful reviews of our paper. Then, we illustrate how texture interpolation will serve further studies of visual perception. Our future work will be dedicated to vision experiments, i.e., directed toward a less theoretical audience. If accepted, this paper will be the core technical reference. Yet, the question of why the Gram-based interpolations are patchy is open.


Inside the Messy, Accidental Kryptos Reveal

WIRED

After 35 years, the secretive CIA sculpture finally gave up its mystery, thanks to a novelist, a playwright, and some misplaced documents. But the chase to decode continues. Jim Sanborn couldn't believe it. He was weeks away from auctioning off the answer to Kryptos, the sculpture he created for the CIA that had defied solution for 35 years. As always, wannabe solvers kept on paying him a $50 fee to offer their guesses to the remaining unsolved portion of the 1,800-character encrypted message, known as K4--wrong without exception.


Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators

Lim, Sungjib, Song, Woojung, Lee, Eun-Ju, Jo, Yohan

arXiv.org Artificial Intelligence

As psychometric surveys are increasingly used to assess the traits of large language models (LLMs), the need for scalable survey item generation suited to LLMs has also grown. A critical challenge here is ensuring the construct validity of generated items, i.e., whether they truly measure the intended trait. Traditionally, this requires costly, large-scale human data collection. To make the process efficient, we present a framework for virtual respondent simulation using LLMs. Our central idea is to account for mediators: factors through which the same trait can give rise to varying responses to a survey item. By simulating respondents with diverse mediators, we identify survey items that robustly measure the intended traits. Experiments on three psychological trait theories (Big5, Schwartz, VIA) show that our mediator generation methods and simulation framework effectively identify high-validity items. LLMs demonstrate the ability to generate plausible mediators from trait definitions and to simulate respondent behavior for item validation. Our problem formulation, metrics, methodology, and dataset open a new direction for cost-effective survey development and a deeper understanding of how LLMs simulate human survey responses. We publicly release our dataset and code to support future work.
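
A minimal sketch of the mediator idea is given below: simulate responses to a candidate item across trait levels and diverse mediators, and keep items whose responses track the trait regardless of mediator. The `simulate_response` helper and the correlation-based validity criterion are assumptions for illustration, not the paper's metrics.

```python
# Hypothetical mediator-based virtual-respondent validation of a survey item.
from scipy.stats import pearsonr

def simulate_response(trait: str, trait_level: int, mediator: str, item: str) -> int:
    """Placeholder: prompt an LLM to answer `item` on a 1-5 Likert scale as a
    persona with the given trait level and mediator (e.g. upbringing, occupation)."""
    raise NotImplementedError

def item_validity(item: str, trait: str, mediators: list[str], levels=range(1, 6)) -> float:
    xs, ys = [], []
    for level in levels:
        for mediator in mediators:
            xs.append(level)
            ys.append(simulate_response(trait, level, mediator, item))
    r, _ = pearsonr(xs, ys)
    return r  # items with high r measure the trait robustly across mediators
```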


SOLAR: Towards Characterizing Subjectivity of Individuals through Modeling Value Conflicts and Trade-offs

Lee, Younghun, Goldwasser, Dan

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have not only solved complex reasoning problems but also exhibit remarkable performance in tasks that require subjective decision making. Existing studies suggest that LLM generations can be subjectively grounded to some extent, yet whether LLMs can account for individual-level subjectivity has not been sufficiently studied. In this paper, we characterize the subjectivity of individuals on social media and infer their moral judgments using LLMs. We propose a framework, SOLAR (Subjective Ground with Value Abstraction), that observes value conflicts and trade-offs in user-generated texts to better represent the subjective ground of individuals. Empirical results show that our framework improves overall inference results as well as performance on controversial situations. Additionally, we qualitatively show that SOLAR provides explanations of individuals' value preferences, which can further account for their judgments.
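
A rough sketch of this two-step idea, under assumed prompts and a placeholder `call_llm` helper (not the SOLAR implementation), is to first abstract a user's posts into value trade-offs and then condition the moral-judgment inference on that summary.

```python
# Illustrative value-abstraction + judgment-inference pipeline (hypothetical prompts).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # placeholder

def abstract_value_tradeoffs(posts: list[str]) -> str:
    prompt = (
        "From the following posts, list the value conflicts and which value the "
        "author tends to prioritise (e.g. 'loyalty over honesty'):\n" + "\n".join(posts)
    )
    return call_llm(prompt)

def infer_judgment(posts: list[str], situation: str) -> str:
    profile = abstract_value_tradeoffs(posts)
    return call_llm(
        f"User value profile:\n{profile}\n\nSituation: {situation}\n"
        "How would this user morally judge the situation? Answer 'acceptable' or 'wrong'."
    )
```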


Internal Value Alignment in Large Language Models through Controlled Value Vector Activation

Jin, Haoran, Li, Meng, Wang, Xiting, Xu, Zhihao, Huang, Minlie, Jia, Yantao, Lian, Defu

arXiv.org Artificial Intelligence

Aligning Large Language Models (LLMs) with human values has attracted increasing attention since it provides clarity, transparency, and the ability to adapt to evolving scenarios. In this paper, we introduce a Controlled Value Vector Activation (ConVA) method that directly aligns the internal values of LLMs by interpreting how a value is encoded in their latent representations and modifying the relevant activations to ensure consistent values. To ensure an accurate and unbiased interpretation, we propose a context-controlled value vector identification method. To control values consistently without sacrificing model performance, we introduce a gated value vector activation method that achieves effective value control with a minimal degree of intervention. Experiments show that our method achieves the highest control success rate across 10 basic values without hurting LLM performance or fluency, and maintains target values even under opposite and potentially malicious input prompts. Source code and data are available at https://github.com/hr-jin/ConVA.
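
The general flavor of value-vector activation steering can be sketched as below: estimate a value direction from activations on value-expressing versus neutral prompts, then add a scaled copy of it to a chosen layer's output via a forward hook. This uses a plain difference-of-means direction and a fixed strength as stand-ins; ConVA's context-controlled identification and gated activation are more involved.

```python
# Minimal activation-steering sketch (difference-of-means direction, fixed gate).
import torch

def value_direction(h_value: torch.Tensor, h_neutral: torch.Tensor) -> torch.Tensor:
    """h_value, h_neutral: (n_samples, hidden_dim) activations collected at the
    chosen layer from value-expressing and neutral prompts, respectively."""
    v = h_value.mean(dim=0) - h_neutral.mean(dim=0)
    return v / v.norm()

def add_steering_hook(layer: torch.nn.Module, v: torch.Tensor, strength: float = 4.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * v.to(device=hidden.device, dtype=hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)
```

The returned handle can later be removed with `handle.remove()` to restore the unmodified model.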


Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items

Han, Jongwook, Choi, Dongmin, Song, Woojung, Lee, Eun-Ju, Jo, Yohan

arXiv.org Artificial Intelligence

The importance of benchmarks for assessing the values of language models has grown with the need for more authentic, human-aligned responses. However, existing benchmarks rely on human or machine annotations that are vulnerable to value-related biases. Furthermore, the tested scenarios often diverge from the real-world contexts in which models are commonly used to generate text and express values. To address these issues, we propose the Value Portrait benchmark, a reliable framework for evaluating LLMs' value orientations with two key characteristics. First, the benchmark consists of items that capture real-life user-LLM interactions, enhancing the relevance of assessment results to real-world LLM usage. Second, each item is rated by human subjects based on its similarity to their own thoughts, and correlations between these ratings and the subjects' actual value scores are derived. This psychometrically validated approach ensures that items strongly correlated with specific values serve as reliable items for assessing those values. Evaluating 44 LLMs with our benchmark, we find that these models prioritize Benevolence, Security, and Self-Direction values while placing less emphasis on Tradition, Power, and Achievement values. Our analysis also reveals biases in how LLMs perceive various demographic groups, deviating from real human data.
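
The item-selection step can be pictured with the small sketch below: for each candidate item, correlate respondents' similarity ratings with their measured score on the target value, and keep items whose correlation exceeds a threshold. The data layout and the 0.3 cutoff are illustrative assumptions, not the benchmark's published criteria.

```python
# Illustrative psychometric filtering of candidate items by rating-score correlation.
import numpy as np
from scipy.stats import pearsonr

def select_valid_items(ratings: np.ndarray, value_scores: np.ndarray, threshold: float = 0.3) -> list[int]:
    """ratings: (n_respondents, n_items) similarity ratings per item;
    value_scores: (n_respondents,) each respondent's score on the target value."""
    keep = []
    for j in range(ratings.shape[1]):
        r, _ = pearsonr(ratings[:, j], value_scores)
        if r >= threshold:
            keep.append(j)  # item j is strongly associated with the target value
    return keep
```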