AITopics | nash

Re-evaluating evaluation

Neural Information Processing SystemsDec-26-2025, 01:43:44 GMT

Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation -- since there is no harm (computational cost aside) from including all available tasks and agents.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Re-evaluating evaluation

Neural Information Processing SystemsNov-20-2025, 23:01:00 GMT

Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation -- since there is no harm (computational cost aside) from including all available tasks and agents.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Re-evaluating evaluation

David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel

Neural Information Processing SystemsNov-20-2025, 20:08:08 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(2 more...)

Industry: Leisure & Entertainment > Games > Chess (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Game Theory (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Re-evaluating evaluation

David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel

Neural Information Processing SystemsNov-18-2025, 18:05:52 GMT

"What we observe is not nature itself, but nature exposed to our method of questioning.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(2 more...)

Industry: Leisure & Entertainment > Games > Chess (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Game Theory (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ca172e964907a97d5ebd876bfdd4adbd-Paper.pdf

Neural Information Processing SystemsNov-15-2025, 08:06:03 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (0.93)

Add feedback

AT Proofs

Neural Information Processing SystemsNov-14-2025, 06:03:45 GMT

We then follow the proof of Theorem 3 in Farnia and Tse [2016]. Our formulation differs from Nowak-Vila et al. [2020] in the fact that we allow probabilistic prediction to be ground truth. Proposition 4. Let G be a multi-graph. We follow the proof of Friesen [2019] for simple graphs. Proposition 5. Let G be a multi-graph.

artificial intelligence, emp, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Rate optimal learning of equilibria from data

Freihaut, Till, Viano, Luca, Nevali, Emanuele, Cevher, Volkan, Geist, Matthieu, Ramponi, Giorgia

arXiv.org Artificial IntelligenceOct-13-2025

We close open theoretical gaps in Multi-Agent Imitation Learning (MAIL) by characterizing the limits of non-interactive MAIL and presenting the first interactive algorithm with near-optimal sample complexity. In the non-interactive setting, we prove a statistical lower bound that identifies the all-policy deviation concentrability coefficient as the fundamental complexity measure, and we show that Behavior Cloning (BC) is rate-optimal. For the interactive setting, we introduce a framework that combines reward-free reinforcement learning with interactive MAIL and instantiate it with an algorithm, MAIL-WARM. It improves the best previously known sample complexity from $\mathcal{O}(\varepsilon^{-8})$ to $\mathcal{O}(\varepsilon^{-2}),$ matching the dependence on $\varepsilon$ implied by our lower bound. Finally, we provide numerical results that support our theory and illustrate, in environments such as grid worlds, where Behavior Cloning fails to learn.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2510.09325

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Adaptive Evolutionary Framework for Safe, Efficient, and Cooperative Autonomous Vehicle Interactions

Tian, Zhen, Lin, Zhihao

arXiv.org Artificial IntelligenceSep-10-2025

Modern transportation systems face significant challenges in ensuring road safety, given serious injuries caused by road accidents. The rapid growth of autonomous vehicles (AVs) has prompted new traffic designs that aim to optimize interactions among AVs. However, effective interactions between AVs remains challenging due to the absence of centralized control. Besides, there is a need for balancing multiple factors, including passenger demands and overall traffic efficiency. Traditional rule-based, optimization-based, and game-theoretic approaches each have limitations in addressing these challenges. Rule-based methods struggle with adaptability and generalization in complex scenarios, while optimization-based methods often require high computational resources. Game-theoretic approaches, such as Stackelberg and Nash games, suffer from limited adaptability and potential inefficiencies in cooperative settings. This paper proposes an Evolutionary Game Theory (EGT)-based framework for AV interactions that overcomes these limitations by utilizing a decentralized and adaptive strategy evolution mechanism. A causal evaluation module (CEGT) is introduced to optimize the evolutionary rate, balancing mutation and evolution by learning from historical interactions. Simulation results demonstrate the proposed CEGT outperforms EGT and popular benchmark games in terms of lower collision rates, improved safety distances, higher speeds, and overall better performance compared to Nash and Stackelberg games across diverse scenarios and parameter settings.

artificial intelligence, machine learning, vehicle, (19 more...)

arXiv.org Artificial Intelligence

2509.07411

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.70)

Industry: