AITopics | susceptibility

Phase transition on a context-sensitive random language model with short range interactions

Toji, Yuma, Takahashi, Jun, Roychowdhury, Vwani, Miyahara, Hideyuki

arXiv.org Machine Learning, Apr-2-2026

Since the random language model was proposed by E. DeGiuli [Phys. Rev. Lett. 122, 128301], language models have been investigated intensively from the viewpoint of statistical mechanics. Recently, the existence of a Berezinskii–Kosterlitz–Thouless transition was numerically demonstrated in models with long-range interactions between symbols. In statistical mechanics, it has long been known that long-range interactions can induce phase transitions. Therefore, it has remained unclear whether phase transitions observed in language models originate from genuinely linguistic properties that are absent in conventional spin models. In this study, we construct a random language model with short-range interactions and numerically investigate its statistical properties. Our model belongs to the class of context-sensitive grammars in the Chomsky hierarchy and allows explicit reference to contexts. We find that a phase transition occurs even when the model refers only to contexts whose length remains constant with respect to the sentence length. This result indicates that finite-temperature phase transitions in language models are genuinely induced by the intrinsic nature of language, rather than by long-range interactions.

artificial intelligence, natural language, transition, (19 more...)

arXiv.org Machine Learning

2604.00947

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.34)

648a5a590ca6f2bb5de53f938e230160-Paper-Conference.pdf

Neural Information Processing Systems, Feb-15-2026, 11:31:30 GMT

artificial intelligence, machine learning, transition, (18 more...)

Neural Information Processing Systems

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Japan (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Artificial Intelligence > Vision (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Thermodynamic Characterizations of Singular Bayesian Models: Specific Heat, Susceptibility, and Entropy Flow in Posterior Geometry

Plummer, Sean

arXiv.org Machine Learning, Dec-29-2025

Singular learning theory (SLT) (Watanabe, 2009, 2018) provides a rigorous asymptotic framework for Bayesian models with non-identifiable parameterizations, yet the statistical meaning of its second-order invariant, the singular fluctuation, has remained unclear. In this work, we show that singular fluctuation admits a precise and natural interpretation as a specific heat: the second derivative of the Bayesian free energy with respect to temperature. Equivalently, it measures the posterior variance of the log-likelihood observable under the tempered Gibbs posterior. We further introduce a collection of related thermodynamic quantities, including entropy flow, prior susceptibility, and cross-susceptibility, that together provide a detailed geometric diagnosis of singular posterior structure. Through extensive numerical experiments spanning discrete symmetries, boundary singularities, continuous gauge freedoms, and piecewise (ReLU) models, we demonstrate that these thermodynamic signatures cleanly distinguish singularity types, exhibit stable finite-sample behavior, and reveal phase-transition-like phenomena as temperature varies. We also show empirically that the widely used WAIC estimator (Watanabe, 2010, 2013) is exactly twice the thermodynamic specific heat at unit temperature, clarifying its robustness in singular models. Our results establish a concrete bridge between singular learning theory and statistical mechanics, providing both theoretical insight and practical diagnostics for modern Bayesian models.
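The link between a second derivative of the free energy and a posterior variance can be checked numerically. The sketch below is only an illustration, not the paper's code: the toy model y ~ N(a*b*x, 1), the parameter grid, and the function name `tempered_posterior_stats` are all assumptions. It works in inverse temperature beta rather than temperature, and the abstract does not state the paper's exact normalization of the specific heat. It relies on the standard identity that the second derivative of log Z(beta) with respect to beta equals the variance of the total negative log-likelihood under the tempered posterior.

```python
import numpy as np

def tempered_posterior_stats(nll, log_prior, beta):
    """Return (log_Z, mean, variance) of the total negative log-likelihood
    under p_beta(w) proportional to prior(w) * exp(-beta * nll(w)),
    where w ranges over a discrete parameter grid."""
    log_w = log_prior - beta * nll
    shift = log_w.max()                      # subtract the max for numerical stability
    weights = np.exp(log_w - shift)
    z = weights.sum()
    probs = weights / z
    mean = float(np.sum(probs * nll))
    var = float(np.sum(probs * (nll - mean) ** 2))
    return float(np.log(z) + shift), mean, var

# Hypothetical toy singular model: y ~ N(a*b*x, 1) with true a*b = 0,
# so the parameters (a, b) are not identifiable.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)

a, b = np.meshgrid(np.linspace(-2, 2, 81), np.linspace(-2, 2, 81))
slope = (a * b).ravel()

# Total negative log-likelihood per grid point; the additive Gaussian constant
# is dropped because it does not affect the variance.
nll = 0.5 * ((y[None, :] - slope[:, None] * x[None, :]) ** 2).sum(axis=1)
log_prior = np.full(nll.shape, -np.log(nll.size))   # uniform prior on the grid

beta = 1.0                                   # beta = 1 is the ordinary posterior
log_z, mean_nll, var_nll = tempered_posterior_stats(nll, log_prior, beta)
print("posterior variance of the negative log-likelihood:", var_nll)
```

Comparing `var_nll` with a finite-difference second derivative of `log_z` in `beta` is a quick sanity check that the two quantities agree on the grid.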

artificial intelligence, fluctuation, machine learning, (19 more...)

arXiv.org Machine Learning

2512.21411

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Evaluating the Simulation of Human Personality-Driven Susceptibility to Misinformation with LLMs

Pratelli, Manuel, Petrocchi, Marinella

arXiv.org Artificial Intelligence, Nov-27-2025

Large language models (LLMs) make it possible to generate synthetic behavioural data at scale, offering an ethical and low-cost alternative to human experiments. Whether such data can faithfully capture psychological differences driven by personality traits, however, remains an open question. We evaluate the capacity of LLM agents, conditioned on Big-Five profiles, to reproduce personality-based variation in susceptibility to misinformation, focusing on news discernment, the ability to judge true headlines as true and false headlines as false. Leveraging published datasets in which human participants with known personality profiles rated headline accuracy, we create matching LLM agents and compare their responses to the original human patterns. Certain trait-misinformation associations, notably those involving Agreeableness and Conscientiousness, are reliably replicated, whereas others diverge, revealing systematic biases in how LLMs internalize and express personality. The results underscore both the promise and the limits of personality-aligned LLMs for behavioral simulation, and offer new insight into modeling cognitive diversity in artificial agents.
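As a rough illustration of the news-discernment measure described above, the sketch below scores a rater by the mean accuracy rating given to true headlines minus the mean rating given to false ones. This difference-of-means form is an assumption for illustration (the abstract defines discernment only verbally), and the function name `news_discernment`, the 1 to 4 rating scale, and the example ratings are all hypothetical.

```python
from statistics import mean

def news_discernment(ratings, is_true):
    """Mean rating of true headlines minus mean rating of false headlines.

    ratings: perceived-accuracy ratings, one per headline (higher = more accurate)
    is_true: booleans marking which headlines are actually true
    """
    true_ratings = [r for r, t in zip(ratings, is_true) if t]
    false_ratings = [r for r, t in zip(ratings, is_true) if not t]
    if not true_ratings or not false_ratings:
        raise ValueError("need at least one true and one false headline")
    return mean(true_ratings) - mean(false_ratings)

# Hypothetical ratings on a 1-4 scale for two true and two false headlines.
labels = [True, True, False, False]
human_ratings = [4, 3, 2, 1]
agent_ratings = [3, 3, 3, 2]

print("human discernment:", news_discernment(human_ratings, labels))
print("agent discernment:", news_discernment(agent_ratings, labels))
```

A higher score means the rater separates true from false headlines more sharply; comparing the human and agent scores for matched personality profiles is one simple way to see whether an agent reproduces the human pattern.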

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA250901

2506.2361

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

Costa, Davi Bastos, Alves, Felippe, Vicente, Renato

arXiv.org Artificial Intelligence, Nov-12-2025

Large language models (LLMs) increasingly operate in social contexts, motivating analysis of how they express and shift moral judgments. In this work, we investigate the moral response of LLMs to persona role-play, prompting an LLM to assume a specific character. Using the Moral Foundations Questionnaire (MFQ), we introduce a benchmark that quantifies two properties: moral susceptibility and moral robustness, defined from the variability of MFQ scores across and within personas, respectively. We find that, for moral robustness, model family accounts for most of the variance, while model size shows no systematic effect. The Claude family is, by a significant margin, the most robust, followed by Gemini and GPT-4 models, with other families exhibiting lower robustness. In contrast, moral susceptibility exhibits a mild family effect but a clear within-family size effect, with larger variants being more susceptible. Moreover, robustness and susceptibility are positively correlated, an association that is more pronounced at the family level. Additionally, we present moral foundation profiles for models without persona role-play and for personas averaged across models. Together, these analyses provide a systematic view of how persona conditioning shapes moral behavior in large language models.
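The across-persona versus within-persona distinction can be made concrete with a small sketch. The abstract does not give the benchmark's exact formulas, so the choices below are assumptions: the function name `persona_variability`, the use of sample standard deviations, and the example scores are hypothetical. Across-persona spread stands in for susceptibility, and a smaller within-persona spread corresponds to higher robustness.

```python
import numpy as np

def persona_variability(scores):
    """Split the variability of questionnaire scores into two parts.

    scores: array-like of shape (n_personas, n_repeats); each row holds repeated
            scores obtained under one persona.
    Returns (across, within):
      across = sample std of the per-persona mean scores
      within = mean over personas of the sample std across repeats
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[0] < 2 or scores.shape[1] < 2:
        raise ValueError("need at least 2 personas and 2 repeats per persona")
    across = float(scores.mean(axis=1).std(ddof=1))
    within = float(scores.std(axis=1, ddof=1).mean())
    return across, within

# Hypothetical scores on one moral foundation: 3 personas, 4 repeated runs each.
example = [
    [2.0, 2.1, 1.9, 2.0],
    [3.5, 3.4, 3.6, 3.5],
    [4.8, 4.9, 4.7, 4.8],
]
across, within = persona_variability(example)
print("across-persona spread:", across)
print("within-persona spread:", within)
```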

large language model, machine learning, susceptibility, (20 more...)

arXiv.org Artificial Intelligence

2511.08565

Country: North America > United States (0.46)

Genre:

Questionnaire & Opinion Survey (0.89)
Research Report (0.66)

Industry:

Banking & Finance (1.00)
Government (0.68)
Law (0.68)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Simulating Misinformation Vulnerabilities With Agent Personas

Farr, David, Ng, Lynnette Hui Xian, Prochaska, Stephen, Cruickshank, Iain J., West, Jevin

arXiv.org Artificial Intelligence, Nov-10-2025

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

ABSTRACT

Disinformation campaigns can distort public perception and destabilize institutions. Understanding how different populations respond to information is crucial for designing effective interventions, yet real-world experimentation is impractical and ethically challenging. To address this, we develop an agent-based simulation using Large Language Models (LLMs) to model responses to misinformation. We construct agent personas spanning five professions and three mental schemas, and evaluate their reactions to news headlines. Our findings show that LLM-generated agents align closely with ground-truth labels and human predictions, supporting their use as proxies for studying information responses. We also find that mental schemas, more than professional background, influence how agents interpret misinformation. This work provides a validation of LLMs to be used as agents in an agent-based model of an information network for analyzing trust, polarization, and susceptibility to deceptive content in complex social systems.

1 INTRODUCTION

Protection against foreign information campaigns and the ability to conduct effective information operations are critical to modern national security. In an era where the information domain can be leveraged as a battlefield, there is a need to maintain information advantage, defined as "the use, protection, and exploitation of information to achieve objectives more effectively than enemies and adversaries do" (U.S. Achieving and sustaining information advantage requires not only the ability to disseminate compelling narratives but also to detect, counter, and mitigate adversarial information operations.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.04697

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military > Army (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Death by a Thousand Prompts: Open Model Vulnerability Analysis

Chang, Amy, Conley, Nicholas, Ganesan, Harish Santhanalakshmi, Swanda, Adam

arXiv.org Artificial Intelligence, Nov-6-2025

Open-weight models provide researchers and developers with accessible foundations for diverse downstream applications. We tested the safety and security postures of eight open-weight large language models (LLMs) to identify vulnerabilities that may impact subsequent fine-tuning and deployment. Using automated adversarial testing, we measured each model's resilience against single-turn and multi-turn prompt injection and jailbreak attacks. Our findings reveal pervasive vulnerabilities across all tested models, with multi-turn attacks achieving success rates between 25.86% and 92.78%, representing a 2× to 10× increase over single-turn baselines. These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions. We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance. The analysis concludes that open-weight models, while crucial for innovation, pose tangible operational and ethical risks when deployed without layered security controls. These findings are intended to inform practitioners and developers of the potential risks and the value of professional AI security solutions to mitigate exposure. Addressing multi-turn vulnerabilities is essential to ensure the safe, reliable, and responsible deployment of open-weight LLMs in enterprise and public domains. We recommend adopting a security-first design philosophy and layered protections to ensure resilient deployments of open-weight models.
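The success-rate comparison in this abstract comes down to simple arithmetic, sketched below. The helper names `attack_success_rate` and `multi_turn_uplift` and the outcome lists are hypothetical; the abstract does not describe how the authors judged an attack successful, so the sketch simply assumes a boolean verdict per attempt.

```python
def attack_success_rate(outcomes):
    """Fraction of attack attempts judged successful (True)."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no attack attempts recorded")
    return sum(bool(o) for o in outcomes) / len(outcomes)

def multi_turn_uplift(single_turn, multi_turn):
    """Ratio of the multi-turn success rate to the single-turn success rate."""
    single_rate = attack_success_rate(single_turn)
    if single_rate == 0:
        raise ValueError("single-turn success rate is zero; the ratio is undefined")
    return attack_success_rate(multi_turn) / single_rate

# Hypothetical verdicts for one model: 1 of 10 single-turn attempts succeeded,
# and 5 of 10 multi-turn attempts succeeded.
single = [True] + [False] * 9
multi = [True] * 5 + [False] * 5

print("single-turn success rate:", attack_success_rate(single))
print("multi-turn success rate:", attack_success_rate(multi))
print("multi-turn uplift:", multi_turn_uplift(single, multi))
```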

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.03247

Genre: Research Report > New Finding (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Detecting Prefix Bias in LLM-based Reward Models

Kumar, Ashwin, He, Yuzi, Markosyan, Aram H., Chern, Bobbie, Arrieta-Ibarra, Imanol

arXiv.org Artificial Intelligence, Nov-3-2025

Reinforcement Learning with Human Feedback (RLHF) has emerged as a key paradigm for task-specific fine-tuning of language models using human preference data. While numerous publicly available preference datasets provide pairwise comparisons of responses, the potential for biases in the resulting reward models remains underexplored. In this work, we introduce novel methods to detect and evaluate prefix bias, a systematic shift in model preferences triggered by minor variations in query prefixes, in LLM-based reward models trained on such datasets. We leverage these metrics to reveal significant biases in preference models across racial and gender dimensions. Our comprehensive evaluation spans diverse open-source preference datasets and reward model architectures, demonstrating susceptibility to this kind of bias regardless of the underlying model architecture. Furthermore, we propose a data augmentation strategy to mitigate these biases, showing its effectiveness in reducing the impact of prefix bias. Our findings highlight the critical need for bias-aware dataset design and evaluation in developing fair and reliable reward models, contributing to the broader discourse on fairness in AI.
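One simple way to picture prefix bias is to count how often a reward model's preferred response changes when a prefix is prepended to the query. The sketch below is a hypothetical probe, not the detection method or metric from the paper (the abstract does not specify them). The function `preference_flip_rate`, the placeholder prefixes, and the two toy reward functions are assumptions standing in for a real reward model.

```python
from typing import Callable, Sequence, Tuple

RewardFn = Callable[[str, str], float]   # (query, response) -> scalar reward

def preference_flip_rate(
    reward: RewardFn,
    pairs: Sequence[Tuple[str, str, str]],
    prefixes: Sequence[str],
) -> float:
    """Fraction of (pair, prefix) combinations where the preferred response
    differs from the preference on the unprefixed query.

    pairs: (query, response_a, response_b) triples
    """
    flips, total = 0, 0
    for query, resp_a, resp_b in pairs:
        baseline_prefers_a = reward(query, resp_a) > reward(query, resp_b)
        for prefix in prefixes:
            prefixed = f"{prefix} {query}"
            prefers_a = reward(prefixed, resp_a) > reward(prefixed, resp_b)
            flips += int(prefers_a != baseline_prefers_a)
            total += 1
    if total == 0:
        raise ValueError("need at least one pair and one prefix")
    return flips / total

# Toy stand-ins for a reward model (purely illustrative).
def unbiased_reward(query: str, response: str) -> float:
    return float(len(response))          # ignores the query entirely

def biased_reward(query: str, response: str) -> float:
    # Adds a large bonus to one response whenever a particular prefix is present.
    bonus = 100.0 if query.startswith("[group B]") and response == "ok" else 0.0
    return float(len(response)) + bonus

pairs = [("How do I reset my password?", "ok", "Use the reset link.")]
prefixes = ["[group A]", "[group B]"]

print("unbiased flip rate:", preference_flip_rate(unbiased_reward, pairs, prefixes))
print("biased flip rate:", preference_flip_rate(biased_reward, pairs, prefixes))
```

A flip rate near zero suggests preferences are stable under the tested prefixes, while a higher rate indicates that the prefix alone can change which response the model prefers.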

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2505.13487

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion

Qiu, Haoyi, Zhou, Yilun, Venkit, Pranav Narayanan, Huang, Kung-Hsiang, Zhang, Jiaxin, Peng, Nanyun, Wu, Chien-Sheng

arXiv.org Artificial Intelligence, Oct-28-2025

As Large Vision-Language Models (LVLMs) are increasingly deployed in domains such as shopping, health, and news, they are exposed to pervasive persuasive content. A critical question is how these models function as persuadees: how and why they can be influenced by persuasive multimodal inputs. Understanding both their susceptibility to persuasion and the effectiveness of different persuasive strategies is crucial, as overly persuadable models may adopt misleading beliefs, override user preferences, or generate unethical or unsafe outputs when exposed to manipulative messages. We introduce MMPersuade, a unified framework for systematically studying multimodal persuasion dynamics in LVLMs. MMPersuade contributes (i) a comprehensive multimodal dataset that pairs images and videos with established persuasion principles across commercial, subjective and behavioral, and adversarial contexts, and (ii) an evaluation framework that quantifies both persuasion effectiveness and model susceptibility via third-party agreement scoring and self-estimated token probabilities on conversation histories. Our study of six leading LVLMs as persuadees yields three key insights: (i) multimodal inputs substantially increase persuasion effectiveness, and model susceptibility, compared to text alone, especially in misinformation scenarios; (ii) stated prior preferences decrease susceptibility, yet multimodal information maintains its persuasive advantage; and (iii) different strategies vary in effectiveness across contexts, with reciprocity being most potent in commercial and subjective contexts, and credibility and logic prevailing in adversarial contexts. By jointly analyzing persuasion effectiveness and susceptibility, MMPersuade provides a principled foundation for developing models that are robust, preference-consistent, and ethically aligned when engaging with persuasive multimodal content.

large language model, machine learning, persuasion, (22 more...)

arXiv.org Artificial Intelligence

2510.22768

Country: North America > United States > California (0.46)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.87)

Industry:

Information Technology (1.00)
Health & Medicine > Consumer Health (1.00)
Government (1.00)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Uncovering the Persuasive Fingerprint of LLMs in Jailbreaking Attacks

Noughabi, Havva Alizadeh, Serbanescu, Julien, Zarrinkalam, Fattane, Dehghantanha, Ali

arXiv.org Artificial Intelligence, Oct-28-2025

Despite recent advances, Large Language Models remain vulnerable to jailbreak attacks that bypass alignment safeguards and elicit harmful outputs. While prior research has proposed various attack strategies differing in human readability and transferability, little attention has been paid to the linguistic and psychological mechanisms that may influence a model's susceptibility to such attacks. In this paper, we examine an interdisciplinary line of research that leverages foundational theories of persuasion from the social sciences to craft adversarial prompts capable of circumventing alignment constraints in LLMs. Drawing on well-established persuasive strategies, we hypothesize that LLMs, having been trained on large-scale human-generated text, may respond more compliantly to prompts with persuasive structures. Furthermore, we investigate whether LLMs themselves exhibit distinct persuasive fingerprints that emerge in their jailbreak responses. Empirical evaluations across multiple aligned LLMs reveal that persuasion-aware prompts significantly bypass safeguards, demonstrating their potential to induce jailbreak behaviors. This work underscores the importance of cross-disciplinary insight in addressing the evolving challenges of LLM safety. The code and data are available.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.21983

Country: North America > Canada (0.16)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
