AITopics | consultancy

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)

Carro, María Victoria, Mester, Denise Alejandra, Nieto, Facundo, Stanchi, Oscar Agustín, Bergman, Guido Ernesto, Leiva, Mario Alejandro, Sprejer, Eitan, Gangi, Luca Nicolás Forziati, Selasco, Francisca Gauna, Corvalán, Juan Gustavo, Simari, Gerardo I., Martinez, María Vanina

AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs

arXiv.org Artificial Intelligence | Nov-25-2025

The core premise of AI debate as a scalable oversight technique is that it is harder to lie convincingly than to refute a lie, enabling the judge to identify the correct position. Yet, existing debate experiments have relied on datasets with ground truth, where lying is reduced to defending an incorrect proposition. This overlooks a subjective dimension: lying also requires the belief that the claim defended is false. In this work, we apply debate to subjective questions and explicitly measure large language models' prior beliefs before experiments. Debaters were asked to select their preferred position, then presented with a judge persona deliberately designed to conflict with their identified priors. This setup tested whether models would adopt sycophantic strategies, aligning with the judge's presumed perspective to maximize persuasiveness, or remain faithful to their prior beliefs. We implemented and compared two debate protocols, sequential and simultaneous, to evaluate potential systematic biases. Finally, we assessed whether models were more persuasive and produced higher-quality arguments when defending positions consistent with their prior beliefs versus when arguing against them. Our main findings show that models tend to prefer defending stances aligned with the judge persona rather than their prior beliefs, sequential debate introduces significant bias favoring the second debater, models are more persuasive when defending positions aligned with their prior beliefs, and paradoxically, arguments misaligned with prior beliefs are rated as higher quality in pairwise comparison. These results can inform human judges to provide higher-quality training signals and contribute to more aligned AI systems, while revealing important aspects of human-AI interaction regarding persuasion dynamics in language models.

large language model, machine learning, natural language, (20 more...)

2510.13912

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.93)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Neural Information Processing Systems | Nov-19-2025, 20:18:01 GMT

899511e37a8e01e1bd6f6f1d377cc250-Paper-Conference.pdf

large language model, machine learning, natural language, (19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.68)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.67)

The Guardian | Oct-11-2025, 08:05:07 GMT

Tony Blair and Nick Clegg hosted dinner giving tech bosses access to UK minister

Blair (left) and Clegg hosted the private dinner at the five-star Corinthia hotel in London in January. Exclusive: Six tech leaders dined with investment minister, documents reveal, underlining growing influence of ex-PM's consultancy. Tony Blair and Nick Clegg hosted a private dinner earlier this year at which a select group of technology entrepreneurs were given access to a key minister, official documents have revealed. He and Clegg, the former deputy prime minister who at the time was a senior executive at Meta, invited leaders of six tech companies to dine with Poppy Gustafsson, who was the government's investment minister responsible for persuading firms to invest in Britain. Blair is an evangelical proponent of the revolutionary potential of technology to transform faltering public services and has long courted alliances with leaders in the industry.

artificial intelligence, minister, social media, (11 more...)

Country:

Europe > United Kingdom (1.00)
Europe > Ukraine (0.06)
Oceania > Australia (0.05)
(3 more...)

Industry:

Information Technology (1.00)
Government > Regional Government > Europe Government > United Kingdom Government (0.74)
Government > Regional Government > North America Government > United States Government (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Neural Information Processing Systems | Oct-10-2025, 08:48:16 GMT

899511e37a8e01e1bd6f6f1d377cc250-Paper-Conference.pdf

large language model, machine learning, natural language, (19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.68)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.67)

Recchia, Gabriel, Mangat, Chatrik Singh, Nyachhyon, Jinu, Sharma, Mridul, Canavan, Callum, Epstein-Gross, Dylan, Abdulbari, Muhammed

Confirmation bias: A challenge for scalable oversight

arXiv.org Artificial Intelligence | Jul-29-2025

Scalable oversight protocols aim to empower evaluators to accurately verify AI models more capable than themselves. However, human evaluators are subject to biases that can lead to systematic errors. We conduct two studies examining the performance of simple oversight protocols where evaluators know that the model is "correct most of the time, but not all of the time". We find no overall advantage for the tested protocols, although in Study 1, showing arguments in favor of both answers improves accuracy in cases where the model is incorrect. In Study 2, participants in both groups become more confident in the system's answers after conducting online research, even when those answers are incorrect. We also reanalyze data from prior work that was more optimistic about simple protocols, finding that human evaluators possessing knowledge absent from models likely contributed to their positive results, an advantage that diminishes as models continue to scale in capability. These findings underscore the importance of testing the degree to which oversight protocols are robust to evaluator biases, whether they outperform simple deference to the model under evaluation, and whether their performance scales with increasing problem difficulty and model capability.

large language model, machine learning, natural language, (17 more...)

2507.19486

Country:

Europe > United Kingdom (0.14)
North America > United States (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry:

Health & Medicine (1.00)
Education (1.00)
Leisure & Entertainment > Sports > Tennis (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Neural Information Processing Systems | May-27-2025, 07:57:35 GMT

On scalable oversight with weak LLMs judging strong LLMs

Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for.

artificial intelligence, large language model, natural language, (9 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Adhikari, Ashutosh, Lapata, Mirella

Debating for Better Reasoning: An Unsupervised Multimodal Approach

arXiv.org Artificial Intelligence | May-21-2025

As Large Language Models (LLMs) gain expertise across diverse domains and modalities, scalable oversight becomes increasingly challenging, particularly when their capabilities may surpass human evaluators. Debate has emerged as a promising mechanism for enabling such oversight. In this work, we extend the debate paradigm to a multimodal setting, exploring its potential for weaker models to supervise and enhance the performance of stronger models. We focus on visual question answering (VQA), where two "sighted" expert vision-language models debate an answer, while a "blind" (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts defend only answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement. Experiments on several multimodal tasks demonstrate that the debate framework consistently outperforms individual expert models. Moreover, judgments from weaker LLMs can help instill reasoning capabilities in vision-language models through finetuning.

large language model, machine learning, natural language, (19 more...)

2505.14627

Country:

North America > United States > New Jersey (0.06)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine (0.70)
Transportation (0.48)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Arnesen, Samuel, Rein, David, Michael, Julian

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

arXiv.org Artificial Intelligence | Sep-25-2024

We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play. In a long-context reading comprehension task, we find that language model based evaluators answer questions more accurately when judging models optimized to win debates. By contrast, we find no such relationship for consultancy models trained to persuade a judge without an opposing debater present. In quantitative and qualitative comparisons between our debate models and novel consultancy baselines, we find evidence that debate training encourages stronger and more informative arguments, showing promise that it can help provide high-quality supervision for tasks that are difficult to directly evaluate.

large language model, machine learning, natural language, (21 more...)

2409.16636

Country:

North America > United States > New York (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry: Education > Assessment & Standards > Student Performance (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)

arXiv.org Artificial Intelligence | Jul-12-2024

On scalable oversight with weak LLMs judging strong LLMs

Kenton, Zachary, Siegel, Noah Y., Kramár, János, Brown-Cohen, Jonah, Albanie, Samuel, Bulian, Jannis, Agarwal, Rishabh, Lindner, David, Tang, Yunhao, Goodman, Noah D., Shah, Rohin

Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.

large language model, machine learning, natural language, (18 more...)
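
The three settings compared in the abstract above (direct question-answering, consultancy, and debate) can be sketched as a minimal illustration. This is not the authors' implementation: the `Agent` and `Judge` type aliases, the function names, the stub agent and stub judge, and the fixed round count are all hypothetical, and the sketch omits the judge's questions to the consultant for brevity.

```python
from typing import Callable, List

# Hypothetical interfaces, assumed for illustration only:
# an agent returns an argument for its assigned answer given the transcript so far;
# a judge returns the index of the answer it selects.
Agent = Callable[[str, str, List[str]], str]
Judge = Callable[[str, List[str], List[str]], int]


def direct_qa(judge: Judge, question: str, answers: List[str]) -> int:
    """Baseline: the judge answers outright, with no AI arguments."""
    return judge(question, answers, [])


def consultancy(judge: Judge, consultant: Agent, question: str,
                answers: List[str], assigned: int, rounds: int = 2) -> int:
    """A single agent argues for one assigned answer; the judge then decides."""
    transcript: List[str] = []
    for _ in range(rounds):
        transcript.append(consultant(question, answers[assigned], transcript))
    return judge(question, answers, transcript)


def debate(judge: Judge, debater_a: Agent, debater_b: Agent, question: str,
           answers: List[str], rounds: int = 2) -> int:
    """Two agents argue for opposing answers in alternating turns."""
    transcript: List[str] = []
    for _ in range(rounds):
        transcript.append(debater_a(question, answers[0], transcript))
        transcript.append(debater_b(question, answers[1], transcript))
    return judge(question, answers, transcript)


def stub_agent(question: str, answer: str, transcript: List[str]) -> str:
    """Placeholder for an LLM call; simply restates the assigned answer."""
    return f"I argue for: {answer}"


def stub_judge(question: str, answers: List[str], transcript: List[str]) -> int:
    """Placeholder judge: picks the answer mentioned most often (ties go to index 0)."""
    counts = [sum(a in turn for turn in transcript) for a in answers]
    return counts.index(max(counts))


question = "What is 2 + 2?"
answers = ["4", "5"]  # index 0 is the correct answer in this toy example

qa_choice = direct_qa(stub_judge, question, answers)
consult_choice = consultancy(stub_judge, stub_agent, question, answers, assigned=1)
debate_choice = debate(stub_judge, stub_agent, stub_agent, question, answers)
```

With these stubs, a consultant assigned the wrong answer goes unopposed, whereas in debate each argument is matched by one for the other side. Real experiments would replace the stubs with model calls and measure judge accuracy over many questions.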