AITopics | assertiveness

Collaborating Authors

assertiveness

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LLM Assertiveness can be Mechanistically Decomposed into Emotional and Logical Components

Tsujimura, Hikaru, Tagade, Arush

arXiv.org Artificial IntelligenceSep-3-2025

Large Language Models (LLMs) often display overconfidence, presenting information with unwarranted certainty in high-stakes contexts. We investigate the internal basis of this behavior via mechanistic interpretability. Using open-sourced Llama 3.2 models fine-tuned on human annotated assertiveness datasets, we extract residual activations across all layers, and compute similarity metrics to localize assertive representations. Our analysis identifies layers most sensitive to assertiveness contrasts and reveals that high-assertive representations decompose into two orthogonal sub-components of emotional and logical clusters-paralleling the dual-route Elaboration Likelihood Model in Psychology. Steering vectors derived from these sub-components show distinct causal effects: emotional vectors broadly influence prediction accuracy, while logical vectors exert more localized effects. These findings provide mechanistic evidence for the multi-component structure of LLM assertiveness and highlight avenues for mitigating overconfident behavior.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.17182

Country: North America > United States > California (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation

Tripathi, Tuhina, Wadhwa, Manya, Durrett, Greg, Niekum, Scott

arXiv.org Artificial IntelligenceAug-22-2025

Large Language Models (LLMs) are widely used as proxies for human labelers in both training (Reinforcement Learning from AI Feedback) and large-scale response evaluation (LLM-as-a-judge). Alignment and evaluation are critical components in the development of reliable LLMs, and the choice of feedback protocol plays a central role in both but remains understudied. In this work, we show that the choice of feedback protocol for evaluation (absolute scores versus relative preferences) can significantly affect evaluation reliability and induce systematic biases. In the context of LLM-as-a-judge evaluation, we show that pairwise protocols are more vulnerable to distracted evaluation. Generator models can exploit spurious attributes (or distractor features) favored by the LLM judge, resulting in inflated scores for lower-quality outputs. We find that absolute scoring is more robust to such manipulation, producing judgments that better reflect response quality and are less influenced by distractor features. Our results demonstrate that generator models can flip preferences by embedding distractor features, skewing LLM-as-a-judge comparisons and leading to inaccurate conclusions about model quality in benchmark evaluations. Pairwise preferences flip in about 35% of the cases, compared to only 9% for absolute scores. We offer recommendations for choosing feedback protocols based on dataset characteristics and evaluation objectives.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.14716

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

"Pull or Not to Pull?'': Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas

Ding, Junchen, Jiang, Penghao, Xu, Zihao, Ding, Ziqi, Zhu, Yichen, Jiang, Jiaojiao, Li, Yuekang

arXiv.org Artificial IntelligenceAug-12-2025

As large language models (LLMs) increasingly mediate ethically sensitive decisions, understanding their moral reasoning processes becomes imperative. This study presents a comprehensive empirical evaluation of 14 leading LLMs, both reasoning enabled and general purpose, across 27 diverse trolley problem scenarios, framed by ten moral philosophies, including utilitarianism, deontology, and altruism. Using a factorial prompting protocol, we elicited 3,780 binary decisions and natural language justifications, enabling analysis along axes of decisional assertiveness, explanation answer consistency, public moral alignment, and sensitivity to ethically irrelevant cues. Our findings reveal significant variability across ethical frames and model types: reasoning enhanced models demonstrate greater decisiveness and structured justifications, yet do not always align better with human consensus. Notably, "sweet zones" emerge in altruistic, fairness, and virtue ethics framings, where models achieve a balance of high intervention rates, low explanation conflict, and minimal divergence from aggregated human judgments. However, models diverge under frames emphasizing kinship, legality, or self interest, often producing ethically controversial outcomes. These patterns suggest that moral prompting is not only a behavioral modifier but also a diagnostic tool for uncovering latent alignment philosophies across providers. We advocate for moral reasoning to become a primary axis in LLM alignment, calling for standardized benchmarks that evaluate not just what LLMs decide, but how and why.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2508.07284

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

Epistemic Integrity in Large Language Models

Ghafouri, Bijean, Mohammadzadeh, Shahrad, Zhou, James, Nair, Pratheeksha, Tian, Jacob-Junqi, Goel, Mayank, Rabbany, Reihaneh, Godbout, Jean-François, Pelrine, Kellin

arXiv.org Artificial IntelligenceNov-10-2024

Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration -- where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models (LLMs) which cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk of the overstated certainty LLMs hold which may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing this miscalibration, offering a path towards correcting it and more trustworthy AI across domains. Large Language Models (LLMs) have markedly transformed how humans seek and consume information, becoming integral across diverse fields such as public health (Ali et al., 2023), coding (Zambrano et al., 2023), and education (Whalen & et al., 2023). Despite their growing influence, LLMs are not without shortcomings. One notable issue is the potential for generating responses that, while convincing, may be inaccurate or nonsensical--a long-standing phenomenon often referred to as "hallucinations" (Jo, 2023; Huang et al., 2023; Zhou et al., 2024b). This raises concerns about the reliability and trustworthiness of these models. A critical aspect of trustworthiness in LLMs is epistemic calibration, which represents the alignment between a model's internal confidence in its outputs and the way it expresses that confidence through natural language. Misalignment between internal certainty and external expression can lead to users being misled by overconfident or underconfident statements, posing significant risks in high-stakes domains such as legal advice, medical diagnosis, and misinformation detection. While of great normative concern, how LLMs express linguistic uncertainty has received relatively little attention to date (Sileo & Moens, 2023; Belem et al., 2024). Figures 1 and 5 illustrate the issue of epistemic calibration providing insights into the operation of certainty in the context of human interactions with LLMs. Distinct Roles of Certainty: Internal certainty and linguistic assertiveness have distinct functions within LLM interactions that shape individual beliefs. Human access to LLM certainty: Linguistic assertiveness holds a critical role as the primary form of certainty available to users.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2411.06528

Country:

North America > United States > California (0.14)
North America > Canada > Quebec > Montreal (0.14)
Asia > Afghanistan (0.04)
(6 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Media > News (0.88)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Human Feedback is not Gold Standard

Hosking, Tom, Blunsom, Phil, Bartolo, Max

arXiv.org Artificial IntelligenceJan-16-2024

Human feedback has become the de facto standard for evaluating the performance of Large Language Models, and is increasingly being used as a training objective. However, it is not clear which properties of a generated output this single `preference' score captures. We hypothesise that preference scores are subjective and open to undesirable biases. We critically analyse the use of human feedback for both training and evaluation, to verify whether it fully captures a range of crucial error criteria. We find that while preference scores have fairly good coverage, they under-represent important aspects like factuality. We further hypothesise that both preference scores and error annotation may be affected by confounders, and leverage instruction-tuned models to generate outputs that vary along two possible confounding dimensions: assertiveness and complexity. We find that the assertiveness of an output skews the perceived rate of factuality errors, indicating that human annotations are not a fully reliable evaluation metric or training objective. Finally, we offer preliminary evidence that using human feedback as a training objective disproportionately increases the assertiveness of model outputs. We encourage future work to carefully consider whether preference scores are well aligned with the desired objective.

annotator, assertiveness, evaluation, (16 more...)

arXiv.org Artificial Intelligence

2309.16349

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre:

Research Report (1.00)
Instructional Material (1.00)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

I Can Get Any Woman I Want Online. Somehow That Doesn't Work In Person.

SlateJan-9-2024, 18:00:00 GMT

How to Do It is Slate's sex advice column. Send it to Stoya and Rich here. As a sexually dominant-leaning female, I get a lot of instant gratification out of gorgeous women online telling me my assertiveness is impressive and sexy. When I have sex with women in my dreams, it's perfect. While my "traditional" long-term relationships have been with male-presenting people, I slept with several women in my early 20s--though I struggled to find satisfying connections.

artificial intelligence, sex life, social media, (10 more...)

Slate

Genre: Personal > Human Interest (0.40)

Industry: Health & Medicine (0.30)

Technology:

Information Technology > Communications > Social Media (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.50)

Add feedback