AITopics | dishonesty

dishonesty

But what is your honest answer? Aiding LLM-judges with honest alternatives using steering vectors

Eshuijs, Leon, Chaudhury, Archie, McBeth, Alan, Nguyen, Ethan

arXiv.org Artificial Intelligence, Nov-7-2025

Detecting subtle forms of dishonesty like sycophancy and manipulation in Large Language Models (LLMs) remains challenging for both humans and automated evaluators, as these behaviors often appear through small biases rather than clear false statements. We introduce Judge Using Safety-Steered Alternatives (JUSSA), a novel framework that employs steering vectors not to improve model behavior directly, but to enhance LLM judges' evaluation capabilities. JUSSA applies steering vectors during inference to generate more honest alternatives, providing judges with contrastive examples that make subtle dishonest patterns easier to detect. While existing evaluation methods rely on black-box evaluation, JUSSA leverages model internals to create targeted comparisons from single examples. We evaluate our method on sycophancy detection and introduce a new manipulation dataset covering multiple types of manipulation. Our results demonstrate that JUSSA effectively improves detection accuracy over single-response evaluation in various cases. Analysis across judge models reveals that JUSSA helps weaker judges on easier dishonesty detection tasks, and stronger judges on harder tasks. Layer-wise experiments show how dishonest prompts cause representations to diverge from honest ones in middle layers, revealing where steering interventions are most effective for generating contrastive examples. By demonstrating that steering vectors can enhance safety evaluation rather than just modify behavior, our work opens new directions for scalable model auditing as systems become increasingly sophisticated.
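The steering step described in the abstract can be pictured with a small sketch. The abstract does not say how JUSSA derives or applies its vectors, so the code below uses one common construction as an assumption: a difference-of-means vector between activations from honest and dishonest prompts, added to a hidden state with a scaling factor. The function names, the toy activations, and the `alpha` parameter are all hypothetical and are not taken from the paper.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(honest_acts, dishonest_acts):
    """Difference of means, pointing from 'dishonest' toward 'honest'.

    Assumption: this is one common way to build a steering vector,
    not necessarily the construction used by JUSSA.
    """
    honest_mean = mean_vector(honest_acts)
    dishonest_mean = mean_vector(dishonest_acts)
    return [h - d for h, d in zip(honest_mean, dishonest_mean)]


def steer(hidden, vector, alpha=1.0):
    """Shift one hidden state along the steering vector by strength alpha."""
    return [x + alpha * v for x, v in zip(hidden, vector)]


# Toy 3-dimensional "activations" (made up for illustration only).
honest_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
dishonest_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(honest_acts, dishonest_acts)
steered = steer([0.5, 0.5, 0.5], v, alpha=0.5)
print(v, steered)
```

In a real model the shifted hidden state would be fed back into a chosen layer during generation, and the resulting "more honest" response would be shown to the judge next to the original one.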

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.1776

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance (0.93)
Education (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Dishonesty in Helpful and Harmless Alignment

Huang, Youcheng, Tang, Jingkun, Feng, Duanyu, Zhang, Zheng, Lei, Wenqiang, Lv, Jiancheng, Cohn, Anthony G.

arXiv.org Artificial Intelligence, Jun-5-2024

Humans tell lies when seeking rewards. Large language models (LLMs) are aligned to human values with reinforcement learning, where they get rewards if they satisfy human preference. We find that this also induces dishonesty in helpful and harmless alignment, where LLMs tell lies in generating harmless responses. Using the latest interpreting tools, we detect dishonesty, show how LLMs can be harmful if their honesty is increased, and analyze such phenomena at the parameter level. Given these preliminaries and the hypothesis that reward-seeking stimulates dishonesty, we theoretically show that this dishonesty can in turn decrease alignment performance, and we augment reward-seeking alignment with representation regularization. Experimental results, including GPT-4 evaluated win-rates, perplexities, and case studies, demonstrate that we can train more honest, helpful, and harmless LLMs. We will open-source all our code and results upon this paper's acceptance.

alignment, dishonesty, llm, (16 more...)

arXiv.org Artificial Intelligence

2406.01931

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(11 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Government (1.00)
Information Technology (0.93)
Law (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Educators object to ChatGPT, an AI that 'writes' papers for students - Washington Times

#artificialintelligence, Jan-12-2023, 18:55:07 GMT

Educators across the U.S. are sounding the alarm over ChatGPT, an upstart artificial intelligence that can write term papers for students based on keywords without clear signs of plagiarism. "I have a lot of experience of students cheating, and I have to say ChatGPT allows for an unprecedented level of dishonesty," said Joy Kutaka-Kennedy, a member of the American Educational Research Association and education professor at National University. "Do we really want professionals serving us who cheated their way into their credentials?" Trey Vasquez, a special education professor at the University of Central Florida, recently tested the next-generation "chatbot" with a group of other professors and students. They asked it to summarize an academic article, create a computer program, and write two 400-word essays on the use and limits of AI in education.

artificial intelligence, educational setting, natural language, (18 more...)

#artificialintelligence

Country: North America > United States > Illinois (0.16)

Industry:

Education > Educational Setting > K-12 Education (0.50)
Education > Educational Setting > Higher Education (0.50)
Education > Focused Education > Special Education (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Are Customers Lying to Your Chatbot?

#artificialintelligence, May-10-2022, 02:35:32 GMT

Automated customer service systems that use tools such as online forms, chatbots, and other digital interfaces have become increasingly common across a wide range of industries. These tools offer many benefits to both companies and their customers — but new research suggests they can also come at a cost: Through two simple experiments, researchers found that people are more than twice as likely to lie when interacting with a digital system than when talking to a human. This is because one of the main psychological forces that encourages us to be honest is an intrinsic desire to protect our reputations, and interacting with a machine fundamentally poses less of a reputational risk than talking with a real human. The good news is, the researchers also found that customers who are more likely to cheat will often choose to use a digital (rather than human) communication system, giving companies an avenue to identify users who are more likely to cheat. Of course, there’s no eliminating digital dishonesty. But with a better understanding of the psychology that makes people more or less likely to lie, organizations can build systems that discourage fraud, identify likely cases of cheating, and proactively nudge people to be more honest.

coin flip, online form, participant, (11 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.73)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.61)

A Report About Lie Detector App - very soon app might tell if you lie or not - Leamtechi News

#artificialintelligence, Jun-20-2018, 00:21:50 GMT

Very soon, your phone might be able to tell whether you are lying or telling the truth. A new machine learning algorithm aims to tap into the digital interactions that reveal when you are bluffing. Researchers have been looking for ways to turn your phone into a lie detector. Computer scientists at the University of Copenhagen have built a machine learning algorithm that can detect honesty and dishonesty by analyzing the way you swipe or tap a smartphone. The research is based on the assumption that dishonest interactions always take longer and involve more hand movement than honest ones.

artificial intelligence, interaction, machine learning, (9 more...)

#artificialintelligence

Country:

Europe > Denmark > Capital Region > Copenhagen (0.26)
North America > Canada > Ontario > Toronto (0.06)

Industry: Health & Medicine (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Mobile (0.73)

Neuroscientists show how tiny fibs snowball into big lies

Los Angeles Times, Oct-25-2016, 03:41:00 GMT

A little dishonesty goes a long way. Scientists who studied the brain activity of people who told small lies to benefit themselves found that these fibs appeared to pave the way to telling whoppers later. The findings, published in the journal Nature Neuroscience, demonstrate how self-serving lies can escalate and offer a window into the processes in the brain at work. It's commonly held wisdom that small transgressions often lead to bigger and bigger ones, study coauthor Tali Sharot of University College London said in a news briefing. "Whether it's evading tax, infidelity, doping in sports, making up data in science, or financial fraud, deceivers often recall how small acts of dishonesty snowballed over time and they suddenly found themselves committing quite large crimes," Sharot said.

artificial intelligence, operational suicide early friday morning, social media, (11 more...)

Los Angeles Times

Country: North America > United States > California > Los Angeles County > Los Angeles (0.05)

Genre: Research Report (0.70)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Communications > Social Media (0.50)
Information Technology > Artificial Intelligence (0.36)

The Most Intelligent Robots Are Those that Exaggerate: Examining Robot Exaggeration

Wagner, Alan Richard (Georgia Institute of Technology Research Institute)

AAAI Conferences, Nov-1-2015

This paper presents a model of exaggeration suitable for implementation on a robot. Exaggeration is an interesting form of dishonesty in that it serves as a tradeoff between the different costs associated with lying and the reward received by having one's lie accepted. Moreover, exaggeration offers the deceiver additional control in the form of how much the exaggerated statement differs from the truth. We use a color guessing game to examine the different tradeoffs between these costs and rewards and their impact on exaggeration. Our results indicate some amount of exaggeration is the preferred option during most early interactions. Further, because the cost of lying increases linearly with the number of lies, exaggeration decreases with additional interactions. We conclude by arguing why social robots must be capable of lying.
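The tradeoff in this abstract can be illustrated with a toy calculation. This is not the paper's model: the utility function, the assumption that larger exaggerations are less likely to be accepted, and the parameters `gain` and `cost_per_lie` are all hypothetical, chosen only to show how a lying cost that grows linearly with the number of lies can push the preferred amount of exaggeration down over repeated interactions.

```python
def utility(e, n_lies, gain=1.0, cost_per_lie=0.1):
    """Toy payoff for exaggerating by amount e in [0, 1].

    Assumptions (not from the paper):
    - the reward grows with e, but the chance the lie is accepted is 1 - e;
    - the cost of lying grows linearly with the number of earlier lies.
    """
    acceptance = 1.0 - e
    return gain * e * acceptance - cost_per_lie * n_lies * e


def best_exaggeration(n_lies, steps=100):
    """Grid search for the exaggeration level with the highest toy payoff."""
    levels = [i / steps for i in range(steps + 1)]
    return max(levels, key=lambda e: utility(e, n_lies))


# Preferred exaggeration after 0, 5, and 10 earlier lies.
for n in (0, 5, 10):
    print(n, best_exaggeration(n))
```

Under these assumptions the preferred exaggeration shrinks as lies accumulate and reaches zero once the accumulated cost outweighs any possible gain.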

deception, exaggeration, robot, (14 more...)

AAAI Conferences

2015 AAAI Fall Symposium Series

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
(5 more...)

Genre: Research Report (0.34)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Dishonest Reasoning by Abduction

Sakama, Chiaki (Wakayama University)

AAAI Conferences, Jul-19-2011

This paper studies a computational logic for dishonest reasoning. We introduce logic programs with disinformation to represent and reason with dishonesty. We then consider two different cases of dishonesty: deductive dishonesty and abductive dishonesty. The former misleads another agent into deducing wrong conclusions, while the latter prevents another agent from abducing correct explanations. In deductive or abductive dishonesty, an agent can perform different types of dishonest reasoning such as lying, bullshitting, and withholding information. We show that these different types of dishonest reasoning are characterized by extended abduction, and address their computational methods using abductive logic programming.

abductive dishonesty, dishonesty, reasoning, (17 more...)

AAAI Conferences

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.04)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (1.00)
