tug-of-war


The tug-of-war between engineering and design to build the Hyundai Palisade XRT Pro

Popular Science

The brand toughened up its popular family SUV, but first the design and engineering teams had to agree on the dimensions and recovery-hook placement. The 2026 Hyundai Palisade XRT Pro is the most capable trim in the SUV's lineup. Every car company, it seems, is looking to present its most versatile face in the form of dirt-ready vehicles. Even models that might not have been off-road appropriate in the past are toughening up to capitalize on the go-outdoors trend that has exploded over the last several years.


ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

Neural Information Processing Systems

Retrieval-augmented generation (RAG) is frequently used to mitigate hallucinations and provide up-to-date knowledge for large language models (LLMs). However, given that document retrieval is an imprecise task and sometimes surfaces erroneous or even harmful content in context, this raises the question of how LLMs handle retrieved information: if the provided content is incorrect, does the model know to ignore it, or does it recapitulate the error? Conversely, when the model's initial response is incorrect, does it always know to use the retrieved information to correct itself, or does it insist on its wrong prior response? To answer this, we curate a dataset of over 1,200 questions across six domains (e.g., drug dosages, Olympic records, locations) along with content relevant to answering each question. We further apply precise perturbations to the answers in the content, ranging from subtle to blatant errors. We benchmark six top-performing LLMs, including GPT-4o, on this dataset and find that LLMs are susceptible to adopting incorrect retrieved content, overriding their own correct prior knowledge over 60% of the time.
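The prior-versus-context conflict the abstract describes can be made concrete with a small sketch. The function and outcome labels below are hypothetical illustrations (not the paper's evaluation code): given a model's prior answer, the answer asserted by the retrieved document, and the model's final answer, it labels which side won the tug-of-war, assuming exact-match answers.

```python
def classify_conflict(prior: str, context: str, final: str) -> str:
    """Label whether a model's final answer follows its prior or the
    retrieved context. Hypothetical helper; if the prior and the context
    agree, there is no conflict to classify."""
    norm = lambda s: s.strip().lower()
    if norm(prior) == norm(context):
        return "no_conflict"
    if norm(final) == norm(context):
        return "adopted_context"
    if norm(final) == norm(prior):
        return "kept_prior"
    return "other"

# Tally outcomes over a toy set of (prior, perturbed context, final) triples.
examples = [
    ("500 mg", "50 mg", "50 mg"),    # model overridden by a perturbed document
    ("Paris", "Lyon", "Paris"),      # model kept its correct prior
    ("9.58 s", "9.58 s", "9.58 s"),  # context agrees with the prior
]
counts = {}
for prior, context, final in examples:
    label = classify_conflict(prior, context, final)
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {'adopted_context': 1, 'kept_prior': 1, 'no_conflict': 1}
```

Aggregating such labels over perturbed documents of varying blatancy is one way to quantify how often a model abandons a correct prior, in the spirit of the 60% figure reported above.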


Resolving the Tug-of-War: A Separation of Communication and Learning in Federated Learning

Neural Information Processing Systems

Federated learning (FL) is a promising privacy-preserving machine learning paradigm over distributed data. In this paradigm, each client trains the parameters of a model locally, and the server periodically aggregates those parameters from the clients. Learning and communication are therefore performed over the same set of parameters. However, we find that learning and communication have fundamentally divergent requirements for parameter selection, akin to two opposing teams in a tug-of-war game. To mitigate this discrepancy, we introduce FedSep, a novel two-layer federated learning framework.
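The separation the abstract hints at can be sketched with a toy server round that averages only a designated "communication" subset of parameters, while learning-only parameters stay client-local. The function name and dictionary layout are assumptions for illustration, not FedSep's actual interface.

```python
def aggregate_shared(client_params, shared_keys):
    """One toy server round: average only the parameters designated for
    communication; all other (learning-only) parameters remain local to
    each client. A sketch of separating communication from learning,
    not the FedSep algorithm itself."""
    n = len(client_params)
    shared_avg = {k: sum(p[k] for p in client_params) / n for k in shared_keys}
    # Each client keeps its local parameters and receives the shared average.
    return [{**p, **shared_avg} for p in client_params]

clients = [
    {"w_shared": 1.0, "w_local": 0.5},
    {"w_shared": 3.0, "w_local": 1.5},
]
updated = aggregate_shared(clients, shared_keys={"w_shared"})
print(updated)
# [{'w_shared': 2.0, 'w_local': 0.5}, {'w_shared': 2.0, 'w_local': 1.5}]
```

Plain FedAvg corresponds to making every key shared; shrinking the shared set trades coordination for communication cost, which is the tension the paper frames as a tug-of-war.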


The artificial intelligence tug-of-war in the world of cybersecurity [Q&A]

#artificialintelligence

It's a rare cybersecurity product these days that doesn't claim to have some form of AI capability. But exactly what benefits does AI deliver? And is there a risk of an arms race as threat actors also turn to the technology? We spoke to Corey Nachreiner, CSO at WatchGuard Technologies, to find out more about the role of AI in cybersecurity. BN: What role does AI play in cybersecurity?


Sketch-Based Linear Value Function Approximation

Bellemare, Marc, Veness, Joel, Bowling, Michael

Neural Information Processing Systems

Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing.
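The tug-of-war sketch mentioned in the abstract can be illustrated with a minimal, self-contained toy. Each feature "pulls" its hash bucket up or down according to a random ±1 sign, and inner products between sketched vectors estimate the original inner product, with a median over independent repeats to control variance. The parameter choices below are assumptions for illustration, and explicit random tables stand in for the 4-wise independent hash families used in practice.

```python
import random

def make_hashes(num_repeats, width, dim, seed=0):
    """Draw, for each repeat, a bucket hash h: [dim] -> [width] and a
    sign hash s: [dim] -> {-1, +1}, stored as explicit tables."""
    rng = random.Random(seed)
    buckets = [[rng.randrange(width) for _ in range(dim)] for _ in range(num_repeats)]
    signs = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(num_repeats)]
    return buckets, signs

def tug_of_war_sketch(x, hashes, width):
    """Project x into num_repeats rows of size width: each feature adds
    its signed value to its bucket (the eponymous tug-of-war)."""
    buckets, signs = hashes
    rows = []
    for h, s in zip(buckets, signs):
        row = [0.0] * width
        for j, xj in enumerate(x):
            row[h[j]] += s[j] * xj
        rows.append(row)
    return rows

def estimate_inner_product(sx, sy):
    """Each repeat's row-by-row dot product is an unbiased estimate of
    <x, y>; the median over repeats makes the estimate robust."""
    ests = sorted(sum(a * b for a, b in zip(rx, ry)) for rx, ry in zip(sx, sy))
    m = len(ests)
    return ests[m // 2] if m % 2 else 0.5 * (ests[m // 2 - 1] + ests[m // 2])

hashes = make_hashes(num_repeats=21, width=64, dim=100, seed=1)
x = [1.0] * 100
sx = tug_of_war_sketch(x, hashes, width=64)
# The self inner product estimates ||x||^2 = 100, up to sketching noise.
print(estimate_inner_product(sx, sx))
```

Unlike plain hashing, where colliding features always add constructively and bias values upward, the random signs make collision terms cancel in expectation, which is why the sketch yields (nearly) unbiased inner-product estimates.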