Collaborating Authors

 schwartz


Value-Based Large Language Model Agent Simulation for Mutual Evaluation of Trust and Interpersonal Closeness

Sakamoto, Yuki, Uchida, Takahisa, Ishiguro, Hiroshi

arXiv.org Artificial Intelligence

Large language models (LLMs) have emerged as powerful tools for simulating complex social phenomena using human-like agents with specific traits. In human societies, value similarity is important for building trust and close relationships; however, it remains unexplored whether this principle holds true in artificial societies comprising LLM agents. Therefore, this study investigates the influence of value similarity on relationship-building among LLM agents through two experiments. First, in a preliminary experiment, we evaluated the controllability of values in LLMs to identify the most effective model and prompt design for controlling the values. Subsequently, in the main experiment, we generated pairs of LLM agents imbued with specific values and analyzed their mutual evaluations of trust and interpersonal closeness following a dialogue. The experiments were conducted in English and Japanese to investigate language dependence. The results confirmed that pairs of agents with higher value similarity exhibited greater mutual trust and interpersonal closeness. Our findings demonstrate that the LLM agent simulation serves as a valid testbed for social science theories, contributes to elucidating the mechanisms by which values influence relationship building, and provides a foundation for inspiring new theories and insights into the social sciences.
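
To make the described setup concrete, here is a minimal sketch of a value-conditioned agent-pair simulation, assuming a generic chat-completion backend behind a placeholder `call_llm` helper. The prompt wording, value profiles, and rating scale are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch: pair two value-conditioned agents, run a short dialogue,
# then collect mutual ratings of trust and interpersonal closeness.
import json

def call_llm(system: str, messages: list[dict]) -> str:
    raise NotImplementedError("plug in your chat-completion client here")  # placeholder

def value_system_prompt(values: dict[str, int]) -> str:
    profile = ", ".join(f"{name}={level}/10" for name, level in values.items())
    return ("You are a person whose Schwartz basic values are: "
            f"{profile}. Stay in character throughout the conversation.")

def run_dialogue(values_a: dict, values_b: dict, turns: int = 6) -> list[dict]:
    history: list[dict] = []
    agents = [("A", value_system_prompt(values_a)), ("B", value_system_prompt(values_b))]
    for t in range(turns):
        name, system = agents[t % 2]
        reply = call_llm(system, history + [{"role": "user", "content": "Continue the conversation."}])
        history.append({"role": "assistant", "content": f"{name}: {reply}"})
    return history

def rate_partner(system: str, history: list[dict]) -> dict:
    # Ask one agent to rate the other on trust and closeness (e.g. 1-7 Likert).
    answer = call_llm(system, history + [{
        "role": "user",
        "content": "Rate your partner's trustworthiness (1-7) and your closeness to them (1-7) as JSON.",
    }])
    return json.loads(answer)
```

Repeating such runs over many agent pairs with varying value overlap would yield the similarity-versus-trust relationship the abstract analyzes.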


Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

Liu, Yesheng, Li, Hao, Xu, Haiyu, Pei, Baoqi, Wang, Jiahao, Zhao, Mingxuan, Zheng, Jingshu, He, Zheqi, Yao, JG, Qin, Bowen, Yang, Xi, Zhang, Jiajun

arXiv.org Artificial Intelligence

Multiple-choice question answering (MCQA) has been a popular format for the evaluation and reinforcement fine-tuning (RFT) of modern multimodal language models. Its constrained output format allows for simplified, deterministic automatic verification. However, we find that the options may leak exploitable signals, which makes the accuracy metrics unreliable for indicating real capabilities and encourages explicit or implicit answer-guessing behaviors during RFT. We propose ReVeL (Rewrite and Verify by LLM), a framework that rewrites multiple-choice questions into open-form questions while keeping answers verifiable whenever possible. The framework categorizes questions by answer type and applies a different rewriting and verification scheme to each. For RFT, we convert 20k MCQA examples and use GRPO to fine-tune Qwen2.5-VL models. Models trained on ReVeL-OpenQA match MCQA accuracy on multiple-choice benchmarks and improve OpenQA accuracy by about six percentage points, indicating better data efficiency and more robust reward signals than MCQA-based training. When used for evaluation, ReVeL also reveals up to 20 percentage points of score inflation in MCQA benchmarks (relative to OpenQA), improves judging accuracy, and reduces both cost and latency. We will release code and data publicly.
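
As a rough illustration of the rewrite-and-verify idea, the sketch below converts one MCQA example into an open-form question and checks a free-form prediction against the reference, falling back to an LLM judge when exact matching fails. The prompts and the `call_llm` helper are hypothetical placeholders, not the released implementation.

```python
# Illustrative rewrite-and-verify loop in the spirit of ReVeL (not the official code).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # placeholder

def rewrite_to_openqa(question: str, options: list[str], answer: str) -> dict:
    prompt = (
        "Rewrite this multiple-choice question as an open-ended question whose "
        "answer can still be checked objectively. Do not mention the options.\n"
        f"Question: {question}\nOptions: {options}\nCorrect answer: {answer}"
    )
    return {"question": call_llm(prompt), "reference": answer}

def verify(reference: str, prediction: str) -> bool:
    # Numeric or short-string answers can be checked exactly; otherwise ask an
    # LLM judge whether prediction and reference express the same answer.
    if prediction.strip().lower() == reference.strip().lower():
        return True
    verdict = call_llm(
        f"Reference answer: {reference}\nModel answer: {prediction}\n"
        "Do these express the same answer? Reply yes or no."
    )
    return verdict.strip().lower().startswith("yes")
```

During RFT, a binary reward such as `1.0 if verify(reference, prediction) else 0.0` could then replace option matching as the verifiable signal.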


Prompt-Based Value Steering of Large Language Models

Abbo, Giulio Antonio, Belpaeme, Tony

arXiv.org Artificial Intelligence

Large language models are increasingly used in applications where alignment with human values is critical. While model fine-tuning is often employed to ensure safe responses, this technique is static and does not lend itself to everyday situations involving dynamic values and preferences. In this paper, we present a practical, reproducible, and model-agnostic procedure to evaluate whether a prompt candidate can effectively steer generated text toward specific human values, formalising a scoring method to quantify the presence and gain of target values in generated responses. We apply our method to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset. With this setup, we compare a baseline prompt to one explicitly conditioned on values, and show that value steering is possible even without altering the model or dynamically optimising prompts.
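
One simple way to picture the presence-and-gain comparison is sketched below: score each response for the target value with some scorer (a classifier or an LLM rater), then report the gain of the value-conditioned prompt over the baseline prompt. This is an illustrative formulation under assumed definitions, not the paper's exact scoring method.

```python
# Illustrative value-steering score: gain = mean presence (steered) - mean presence (baseline).
from statistics import mean

def value_score(response: str, value: str) -> float:
    """Placeholder: return a presence score in [0, 1] for `value` in `response`,
    e.g. from a value classifier or an LLM rater."""
    raise NotImplementedError

def steering_gain(baseline_responses: list[str], steered_responses: list[str], value: str) -> float:
    presence_baseline = mean(value_score(r, value) for r in baseline_responses)
    presence_steered = mean(value_score(r, value) for r in steered_responses)
    return presence_steered - presence_baseline  # positive => prompt steers toward the value
```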


fba9d88164f3e2d9109ee770223212a0-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their detailed and useful reviews of our paper. Then, we illustrate how texture interpolation will serve further studies of visual perception. Our future work will be dedicated to vision experiments, i.e., directed toward a less theoretical audience. If accepted, this paper will be the core technical reference. Yet, the question of why the Gram-based interpolations are patchy is open.


Inside the Messy, Accidental Kryptos Reveal

WIRED

After 35 years, the secretive CIA sculpture finally gave up its mystery, thanks to a novelist, a playwright, and some misplaced documents. But the chase to decode continues. Jim Sanborn couldn't believe it. He was weeks away from auctioning off the answer to Kryptos, the sculpture he created for the CIA that had defied solution for 35 years. As always, wannabe solvers kept on paying him a $50 fee to offer their guesses to the remaining unsolved portion of the 1,800-character encrypted message, known as K4--wrong without exception.


Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators

Lim, Sungjib, Song, Woojung, Lee, Eun-Ju, Jo, Yohan

arXiv.org Artificial Intelligence

As psychometric surveys are increasingly used to assess the traits of large language models (LLMs), the need for scalable survey item generation suited to LLMs has also grown. A critical challenge here is ensuring the construct validity of generated items, i.e., whether they truly measure the intended trait. Traditionally, this requires costly, large-scale human data collection. To make the process efficient, we present a framework for virtual respondent simulation using LLMs. Our central idea is to account for mediators: factors through which the same trait can give rise to varying responses to a survey item. By simulating respondents with diverse mediators, we identify survey items that robustly measure the intended traits. Experiments on three psychological trait theories (Big5, Schwartz, VIA) show that our mediator generation methods and simulation framework effectively identify high-validity items. LLMs demonstrate the ability to generate plausible mediators from trait definitions and to simulate respondent behavior for item validation. Our problem formulation, metrics, methodology, and dataset open a new direction for cost-effective survey development and a deeper understanding of how LLMs simulate human survey responses. We publicly release our dataset and code to support future work.
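
A minimal sketch of the mediator idea is given below: simulate responses to a candidate item across trait levels and diverse mediators, and keep items whose responses track the trait regardless of mediator. The `simulate_response` helper and the correlation-based validity criterion are assumptions for illustration, not the paper's metrics.

```python
# Hypothetical mediator-based virtual-respondent validation of a survey item.
from scipy.stats import pearsonr

def simulate_response(trait: str, trait_level: int, mediator: str, item: str) -> int:
    """Placeholder: prompt an LLM to answer `item` on a 1-5 Likert scale as a
    persona with the given trait level and mediator (e.g. upbringing, occupation)."""
    raise NotImplementedError

def item_validity(item: str, trait: str, mediators: list[str], levels=range(1, 6)) -> float:
    xs, ys = [], []
    for level in levels:
        for mediator in mediators:
            xs.append(level)
            ys.append(simulate_response(trait, level, mediator, item))
    r, _ = pearsonr(xs, ys)
    return r  # items with high r measure the trait robustly across mediators
```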


SOLAR: Towards Characterizing Subjectivity of Individuals through Modeling Value Conflicts and Trade-offs

Lee, Younghun, Goldwasser, Dan

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have not only solved complex reasoning problems but also exhibit remarkable performance in tasks that require subjective decision making. Existing studies suggest that LLM generations can be subjectively grounded to some extent, yet whether LLMs can account for individual-level subjectivity has not been sufficiently studied. In this paper, we characterize the subjectivity of individuals on social media and infer their moral judgments using LLMs. We propose a framework, SOLAR (Subjective Ground with Value Abstraction), that observes value conflicts and trade-offs in user-generated texts to better represent the subjective ground of individuals. Empirical results show that our framework improves overall inference results as well as performance on controversial situations. Additionally, we qualitatively show that SOLAR provides explanations of individuals' value preferences, which can further account for their judgments.
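
A rough sketch of this two-step idea, under assumed prompts and a placeholder `call_llm` helper (not the SOLAR implementation), is to first abstract a user's posts into value trade-offs and then condition the moral-judgment inference on that summary.

```python
# Illustrative value-abstraction + judgment-inference pipeline (hypothetical prompts).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # placeholder

def abstract_value_tradeoffs(posts: list[str]) -> str:
    prompt = (
        "From the following posts, list the value conflicts and which value the "
        "author tends to prioritise (e.g. 'loyalty over honesty'):\n" + "\n".join(posts)
    )
    return call_llm(prompt)

def infer_judgment(posts: list[str], situation: str) -> str:
    profile = abstract_value_tradeoffs(posts)
    return call_llm(
        f"User value profile:\n{profile}\n\nSituation: {situation}\n"
        "How would this user morally judge the situation? Answer 'acceptable' or 'wrong'."
    )
```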


Internal Value Alignment in Large Language Models through Controlled Value Vector Activation

Jin, Haoran, Li, Meng, Wang, Xiting, Xu, Zhihao, Huang, Minlie, Jia, Yantao, Lian, Defu

arXiv.org Artificial Intelligence

Aligning Large Language Models (LLMs) with human values has attracted increasing attention since it provides clarity, transparency, and the ability to adapt to evolving scenarios. In this paper, we introduce a Controlled Value Vector Activation (ConVA) method that directly aligns the internal values of LLMs by interpreting how a value is encoded in their latent representations and modifying the relevant activations to ensure consistent values. To ensure an accurate and unbiased interpretation, we propose a context-controlled value vector identification method. To control values consistently without sacrificing model performance, we introduce a gated value vector activation method that achieves effective value control with a minimal degree of intervention. Experiments show that our method achieves the highest control success rate across 10 basic values without hurting LLM performance or fluency, and maintains target values even under opposite and potentially malicious input prompts. Source code and data are available at https://github.com/hr-jin/ConVA.
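
The general flavor of value-vector activation steering can be sketched as below: estimate a value direction from activations on value-expressing versus neutral prompts, then add a scaled copy of it to a chosen layer's output via a forward hook. This uses a plain difference-of-means direction and a fixed strength as stand-ins; ConVA's context-controlled identification and gated activation are more involved.

```python
# Minimal activation-steering sketch (difference-of-means direction, fixed gate).
import torch

def value_direction(h_value: torch.Tensor, h_neutral: torch.Tensor) -> torch.Tensor:
    """h_value, h_neutral: (n_samples, hidden_dim) activations collected at the
    chosen layer from value-expressing and neutral prompts, respectively."""
    v = h_value.mean(dim=0) - h_neutral.mean(dim=0)
    return v / v.norm()

def add_steering_hook(layer: torch.nn.Module, v: torch.Tensor, strength: float = 4.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * v.to(device=hidden.device, dtype=hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)
```

The returned handle can later be removed with `handle.remove()` to restore the unmodified model.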


Value Portrait: Assessing Language Models' Values through Psychometrically and Ecologically Valid Items

Han, Jongwook, Choi, Dongmin, Song, Woojung, Lee, Eun-Ju, Jo, Yohan

arXiv.org Artificial Intelligence

The importance of benchmarks for assessing the values of language models has grown with the need for more authentic, human-aligned responses. However, existing benchmarks rely on human or machine annotations that are vulnerable to value-related biases. Furthermore, the tested scenarios often diverge from the real-world contexts in which models are commonly used to generate text and express values. To address these issues, we propose the Value Portrait benchmark, a reliable framework for evaluating LLMs' value orientations with two key characteristics. First, the benchmark consists of items that capture real-life user-LLM interactions, enhancing the relevance of assessment results to real-world LLM usage. Second, each item is rated by human subjects based on its similarity to their own thoughts, and correlations between these ratings and the subjects' actual value scores are derived. This psychometrically validated approach ensures that items strongly correlated with specific values serve as reliable items for assessing those values. Evaluating 44 LLMs with our benchmark, we find that these models prioritize Benevolence, Security, and Self-Direction values while placing less emphasis on Tradition, Power, and Achievement values. Our analysis also reveals biases in how LLMs perceive various demographic groups, deviating from real human data.
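
The item-selection step can be pictured with the small sketch below: for each candidate item, correlate respondents' similarity ratings with their measured score on the target value, and keep items whose correlation exceeds a threshold. The data layout and the 0.3 cutoff are illustrative assumptions, not the benchmark's published criteria.

```python
# Illustrative psychometric filtering of candidate items by rating-score correlation.
import numpy as np
from scipy.stats import pearsonr

def select_valid_items(ratings: np.ndarray, value_scores: np.ndarray, threshold: float = 0.3) -> list[int]:
    """ratings: (n_respondents, n_items) similarity ratings per item;
    value_scores: (n_respondents,) each respondent's score on the target value."""
    keep = []
    for j in range(ratings.shape[1]):
        r, _ = pearsonr(ratings[:, j], value_scores)
        if r >= threshold:
            keep.append(j)  # item j is strongly associated with the target value
    return keep
```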