AITopics | truthfulness

truthfulness

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Neural Information Processing Systems, Apr-24-2026, 10:30:15 GMT

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.

large language model, machine learning, reinforcement learning, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.34)

Industry:

Government (0.46)
Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
(2 more...)

bfab6c120092a9bf530b6aff18e1436c-Paper-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 22:01:18 GMT

artificial intelligence, data mining, machine learning, (22 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.48)

or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?

Neural Information Processing Systems, Feb-16-2026, 04:04:35 GMT

Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model.

agent, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.68)
Government > Military (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.92)

Randomized Strategic Facility Location with Predictions

Neural Information Processing Systems, Feb-11-2026, 17:07:39 GMT

The aim is to design truthful mechanisms, ensuring agents cannot gain by misreporting.
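To make "cannot gain by misreporting" concrete, here is a minimal sketch using the classic deterministic median mechanism on a line. This is not the randomized, prediction-augmented mechanism the paper studies. The function names, the sample locations, and the grid of candidate lies are all assumptions chosen for illustration.

```python
def median_mechanism(reports):
    """Place the facility at the lower median of the reported locations."""
    ordered = sorted(reports)
    return ordered[(len(ordered) - 1) // 2]


def mean_mechanism(reports):
    """Place the facility at the average of the reported locations."""
    return sum(reports) / len(reports)


def can_gain_by_misreporting(mechanism, true_locations, candidate_lies):
    """Brute-force check: does any single agent get strictly closer by lying?"""
    honest_site = mechanism(true_locations)
    for i, truth in enumerate(true_locations):
        honest_cost = abs(honest_site - truth)
        for lie in candidate_lies:
            reports = list(true_locations)
            reports[i] = lie  # agent i deviates, everyone else stays honest
            if abs(mechanism(reports) - truth) < honest_cost:
                return True
    return False


# Hypothetical instance: three agents on a line, lies drawn from an integer grid.
locations = [0, 2, 10]
lies = range(0, 21)

median_is_manipulable = can_gain_by_misreporting(median_mechanism, locations, lies)
mean_is_manipulable = can_gain_by_misreporting(mean_mechanism, locations, lies)
print(median_is_manipulable, mean_is_manipulable)
```

On this instance the check finds no profitable lie under the median rule and at least one under the mean rule. A brute-force search over a finite grid only illustrates the property; it does not prove truthfulness in general.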

artificial intelligence, machine learning, mechanism, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Random Rank: The One and Only Strategyproof and Proportionally Fair Randomized Facility Location Mechanism

Neural Information Processing Systems, Feb-11-2026, 15:02:12 GMT

Proportionality is an attractive fairness concept that has been applied to a range of problems including the facility location problem, a classic problem in social choice.

artificial intelligence, mechanism, proportionality, (16 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence (0.50)

1435d2d0fca85a84d83ddcb754f58c29-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-8-2026, 06:35:39 GMT

benchmark, information, truthfulness, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Honesty Is the Best Policy: Defining and Mitigating AI Deception

Francis Rhys Ward, Francesco Belardinelli, Francesca Toni

Neural Information Processing Systems, Feb-7-2026, 12:24:41 GMT

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Indonesia > Bali (0.07)
Asia > Middle East > Jordan (0.04)
North America > United States > Hawaii (0.04)
(14 more...)

Genre: Research Report (0.68)

Industry:

Leisure & Entertainment (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Neural Information Processing Systems, Dec-26-2025, 06:08:26 GMT

We introduce Inference-Time Intervention (ITI), a technique designed to enhance the truthfulness of large language models (LLMs). ITI operates by shifting model activations during inference, following a learned set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from $32.5\%$ to $65.1\%$. We identify a tradeoff between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only a few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
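The activation shift described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation. It assumes the direction for a single attention head is the normalized difference between mean "truthful" and mean "untruthful" activations, and that the shift is scaled by an intervention strength `ALPHA` times the standard deviation of activations along that direction. All names and the synthetic data are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def unit_direction(true_acts, false_acts):
    """Unit vector pointing from the mean 'false' activation to the mean 'true' one."""
    diff = [t - f for t, f in zip(mean_vector(true_acts), mean_vector(false_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def projection_std(rows, direction):
    """Standard deviation of the activations projected onto the direction."""
    proj = [sum(x * d for x, d in zip(row, direction)) for row in rows]
    mu = sum(proj) / len(proj)
    return math.sqrt(sum((p - mu) ** 2 for p in proj) / len(proj))


def intervene(activation, direction, alpha, sigma):
    """Shift one head activation by alpha * sigma along the direction."""
    return [x + alpha * sigma * d for x, d in zip(activation, direction)]


# Synthetic stand-ins for the activations of one attention head (assumption).
random.seed(0)
DIM = 4
true_acts = [[random.gauss(1.0, 1.0) for _ in range(DIM)] for _ in range(100)]
false_acts = [[random.gauss(-1.0, 1.0) for _ in range(DIM)] for _ in range(100)]

direction = unit_direction(true_acts, false_acts)
sigma = projection_std(true_acts + false_acts, direction)
ALPHA = 2.0  # intervention strength; larger values shift further

original = false_acts[0]
shifted = intervene(original, direction, ALPHA, sigma)
```

In this sketch, raising `ALPHA` moves activations further along the direction, which corresponds to the intervention-strength knob the abstract mentions when discussing the truthfulness and helpfulness tradeoff.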

eliciting truthful answer, inference-time intervention, name change, (3 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.61)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)

SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models

He, Zirui, Jin, Mingyu, Shen, Bo, Payani, Ali, Zhang, Yongfeng, Du, Mengnan

arXiv.org Artificial Intelligence, Dec-8-2025

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but controlling their behavior reliably remains challenging, especially in open-ended generation settings. This paper introduces a novel supervised steering approach that operates in sparse, interpretable representation spaces. We employ sparse autoencoders (SAEs) to obtain sparse latent representations that aim to disentangle semantic attributes from model activations. Then we train linear classifiers to identify a small subspace of task-relevant dimensions in latent representations. Finally, we learn supervised steering vectors constrained to this subspace, optimized to align with target behaviors. Experiments across sentiment, truthfulness, and political polarity steering tasks with multiple LLMs demonstrate that our supervised steering vectors achieve higher success rates with minimal degradation in generation quality compared to existing methods. Further analysis reveals that a notably small subspace is sufficient for effective steering, enabling more targeted and interpretable interventions. Our implementation is publicly available at https://github.com/Ineedanamehere/SAE-SSV.
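The idea of a steering vector confined to a small, classifier-selected subspace can be illustrated with a toy sketch. This is not the SAE-SSV implementation linked above. It assumes the latent codes and linear-classifier weights are already available, and it swaps the paper's learned, optimized steering vector for a simple difference of class means restricted to the selected dimensions. All names and numbers are hypothetical.

```python
def top_k_dimensions(classifier_weights, k):
    """Indices of the k latent dimensions with the largest absolute weight."""
    ranked = sorted(
        range(len(classifier_weights)),
        key=lambda i: abs(classifier_weights[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def constrained_steering_vector(pos_latents, neg_latents, dims):
    """Mean-difference vector that is zero outside the selected dimensions."""
    vector = [0.0] * len(pos_latents[0])
    for d in dims:
        pos_mean = sum(row[d] for row in pos_latents) / len(pos_latents)
        neg_mean = sum(row[d] for row in neg_latents) / len(neg_latents)
        vector[d] = pos_mean - neg_mean
    return vector


def steer(latent, vector, strength=1.0):
    """Add the scaled steering vector to one latent code."""
    return [z + strength * v for z, v in zip(latent, vector)]


# Hypothetical classifier weights over five latent dimensions.
weights = [0.05, -1.2, 0.0, 0.9, 0.1]
# Hypothetical latent codes for examples with and without the target behavior.
pos = [[0.0, 0.1, 0.0, 2.0, 0.3], [0.0, 0.3, 0.0, 1.0, 0.1]]
neg = [[0.0, 1.1, 0.0, 0.0, 0.2], [0.0, 0.9, 0.0, 0.0, 0.4]]

dims = top_k_dimensions(weights, k=2)
vector = constrained_steering_vector(pos, neg, dims)
steered = steer(neg[0], vector)
```

Because the vector is zero outside `dims`, steering leaves every other latent dimension untouched, which is the sense in which the intervention is targeted.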

large language model, machine learning, sentiment, (18 more...)

arXiv.org Artificial Intelligence

2505.16188

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Reasoning-Intensive Regression

Tchuindjo, Diane, Khattab, Omar

arXiv.org Artificial Intelligence, Dec-2-2025

AI researchers and practitioners increasingly apply large language models (LLMs) to what we call reasoning-intensive regression (RiR), i.e., deducing subtle numerical scores from text. Unlike standard language regression tasks, e.g., for sentiment or similarity, RiR often appears instead in ad-hoc problems such as rubric-based scoring, modeling dense rewards in complex environments, or domain-specific retrieval, where much deeper analysis of context is required while only limited task-specific training data and computation are available. We cast four realistic problems as RiR tasks to establish an initial benchmark, and use that to test our hypothesis that prompting frozen LLMs and finetuning Transformer encoders via gradient descent will both often struggle in RiR. We then propose MENTAT, a simple and lightweight method that combines batch-reflective prompt optimization with neural ensemble learning. MENTAT achieves up to 65% improvement over both baselines, though substantial room remains for future advances in RiR.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.21762

Country: