tug-of-war


The tug-of-war between engineering and design to build the Hyundai Palisade XRT Pro

Popular Science

The brand toughened up its popular family SUV, but first the design and engineering teams had to agree on the dimensions and recovery-hook placement. The 2026 Hyundai Palisade XRT Pro is the most capable trim in the SUV's lineup. Every car company, it seems, is looking to present its most versatile face in the form of dirt-ready vehicles. Even models that might not have been off-road appropriate in the past are toughening up to capitalize on the go-outdoors trend that has exploded over the last several years.


ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

Neural Information Processing Systems

Retrieval-augmented generation (RAG) is frequently used to mitigate hallucinations and provide up-to-date knowledge for large language models (LLMs). However, given that document retrieval is an imprecise task and sometimes surfaces erroneous or even harmful content in context, this raises the question of how LLMs handle retrieved information: if the provided content is incorrect, does the model know to ignore it, or does it recapitulate the error? Conversely, when the model's initial response is incorrect, does it always know to use the retrieved information to correct itself, or does it insist on its wrong prior response? To answer this, we curate a dataset of over 1,200 questions across six domains (e.g., drug dosages, Olympic records, locations) along with content relevant to answering each question. We further apply precise perturbations to the answers in the content, ranging from subtle to blatant errors. We benchmark six top-performing LLMs, including GPT-4o, on this dataset and find that LLMs are susceptible to adopting incorrect retrieved content, overriding their own correct prior knowledge over 60% of the time.
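The prior-versus-context conflict the abstract describes can be made concrete with a small sketch. The function and outcome labels below are hypothetical illustrations (not the paper's evaluation code): given a model's prior answer, the answer asserted by the retrieved document, and the model's final answer, it labels which side won the tug-of-war, assuming exact-match answers.

```python
def classify_conflict(prior: str, context: str, final: str) -> str:
    """Label whether a model's final answer follows its prior or the
    retrieved context. Hypothetical helper; if the prior and the context
    agree, there is no conflict to classify."""
    norm = lambda s: s.strip().lower()
    if norm(prior) == norm(context):
        return "no_conflict"
    if norm(final) == norm(context):
        return "adopted_context"
    if norm(final) == norm(prior):
        return "kept_prior"
    return "other"

# Tally outcomes over a toy set of (prior, perturbed context, final) triples.
examples = [
    ("500 mg", "50 mg", "50 mg"),    # model overridden by a perturbed document
    ("Paris", "Lyon", "Paris"),      # model kept its correct prior
    ("9.58 s", "9.58 s", "9.58 s"),  # context agrees with the prior
]
counts = {}
for prior, context, final in examples:
    label = classify_conflict(prior, context, final)
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {'adopted_context': 1, 'kept_prior': 1, 'no_conflict': 1}
```

Aggregating such labels over perturbed documents of varying blatancy is one way to quantify how often a model abandons a correct prior, in the spirit of the 60% figure reported above.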


Resolving the Tug-of-War: A Separation of Communication and Learning in Federated Learning

Neural Information Processing Systems

Federated learning (FL) is a promising privacy-preserving machine learning paradigm over distributed data. In this paradigm, each client trains the parameters of a model locally, and the server periodically aggregates those parameters from the clients. Learning and communication are therefore performed over the same set of parameters. However, we find that learning and communication have fundamentally divergent requirements for parameter selection, akin to two opposing teams in a tug-of-war game. To mitigate this discrepancy, we introduce FedSep, a novel two-layer federated learning framework.
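The separation the abstract hints at can be sketched with a toy server round that averages only a designated "communication" subset of parameters, while learning-only parameters stay client-local. The function name and dictionary layout are assumptions for illustration, not FedSep's actual interface.

```python
def aggregate_shared(client_params, shared_keys):
    """One toy server round: average only the parameters designated for
    communication; all other (learning-only) parameters remain local to
    each client. A sketch of separating communication from learning,
    not the FedSep algorithm itself."""
    n = len(client_params)
    shared_avg = {k: sum(p[k] for p in client_params) / n for k in shared_keys}
    # Each client keeps its local parameters and receives the shared average.
    return [{**p, **shared_avg} for p in client_params]

clients = [
    {"w_shared": 1.0, "w_local": 0.5},
    {"w_shared": 3.0, "w_local": 1.5},
]
updated = aggregate_shared(clients, shared_keys={"w_shared"})
print(updated)
# [{'w_shared': 2.0, 'w_local': 0.5}, {'w_shared': 2.0, 'w_local': 1.5}]
```

Plain FedAvg corresponds to making every key shared; shrinking the shared set trades coordination for communication cost, which is the tension the paper frames as a tug-of-war.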


The artificial intelligence tug-of-war in the world of cybersecurity [Q&A]

#artificialintelligence

It's a rare cybersecurity product these days that doesn't claim to have some form of AI capability. But exactly what benefits does AI deliver? And is there a risk of an arms race as threat actors also turn to the technology? We spoke to Corey Nachreiner, CSO at WatchGuard Technologies, to find out more about the role of AI in cybersecurity. BN: What role does AI play in cybersecurity?


Sketch-Based Linear Value Function Approximation

Bellemare, Marc, Veness, Joel, Bowling, Michael

Neural Information Processing Systems

Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing.
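The tug-of-war sketch mentioned in the abstract can be illustrated with a minimal, self-contained toy. Each feature "pulls" its hash bucket up or down according to a random ±1 sign, and inner products between sketched vectors estimate the original inner product, with a median over independent repeats to control variance. The parameter choices below are assumptions for illustration, and explicit random tables stand in for the 4-wise independent hash families used in practice.

```python
import random

def make_hashes(num_repeats, width, dim, seed=0):
    """Draw, for each repeat, a bucket hash h: [dim] -> [width] and a
    sign hash s: [dim] -> {-1, +1}, stored as explicit tables."""
    rng = random.Random(seed)
    buckets = [[rng.randrange(width) for _ in range(dim)] for _ in range(num_repeats)]
    signs = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(num_repeats)]
    return buckets, signs

def tug_of_war_sketch(x, hashes, width):
    """Project x into num_repeats rows of size width: each feature adds
    its signed value to its bucket (the eponymous tug-of-war)."""
    buckets, signs = hashes
    rows = []
    for h, s in zip(buckets, signs):
        row = [0.0] * width
        for j, xj in enumerate(x):
            row[h[j]] += s[j] * xj
        rows.append(row)
    return rows

def estimate_inner_product(sx, sy):
    """Each repeat's row-by-row dot product is an unbiased estimate of
    <x, y>; the median over repeats makes the estimate robust."""
    ests = sorted(sum(a * b for a, b in zip(rx, ry)) for rx, ry in zip(sx, sy))
    m = len(ests)
    return ests[m // 2] if m % 2 else 0.5 * (ests[m // 2 - 1] + ests[m // 2])

hashes = make_hashes(num_repeats=21, width=64, dim=100, seed=1)
x = [1.0] * 100
sx = tug_of_war_sketch(x, hashes, width=64)
# The self inner product estimates ||x||^2 = 100, up to sketching noise.
print(estimate_inner_product(sx, sx))
```

Unlike plain hashing, where colliding features always add constructively and bias values upward, the random signs make collision terms cancel in expectation, which is why the sketch yields (nearly) unbiased inner-product estimates.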