
 multi-agent learning


Multi-Agent Learning with Heterogeneous Linear Contextual Bandits

Neural Information Processing Systems

As trained intelligent systems become increasingly pervasive, multi-agent learning has emerged as a popular framework for studying complex interactions between autonomous agents. Yet, a formal understanding of how and when learners in heterogeneous environments benefit from sharing their respective experiences is far from complete. In this paper, we seek answers to these questions in the context of linear contextual bandits. We present a novel distributed learning algorithm based on the upper confidence bound (UCB) algorithm, which we refer to as H-LINUCB, wherein agents cooperatively minimize the group regret under the coordination of a central server. In the setting where the level of heterogeneity or dissimilarity across the environments is known to the agents, we show that H-LINUCB is provably optimal in regimes where the tasks are highly similar or highly dissimilar.
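The abstract does not spell out H-LINUCB itself, but the single-agent LinUCB routine it builds on is standard. A minimal sketch of that base algorithm, with `alpha` as an assumed exploration parameter (the distributed, server-coordinated machinery is omitted):

```python
import numpy as np

def linucb_choose(A, b, contexts, alpha=1.0):
    """Standard LinUCB action selection.

    A : (d, d) regularized design matrix (starts as the identity),
    b : (d,) accumulated reward-weighted features,
    contexts : (K, d) feature vector per arm.
    Picks the arm maximizing x^T theta_hat + alpha * sqrt(x^T A^{-1} x).
    """
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    ucb = contexts @ theta_hat + alpha * np.sqrt(
        np.einsum("ki,ij,kj->k", contexts, A_inv, contexts))
    return int(np.argmax(ucb))

def linucb_update(A, b, x, reward):
    """Rank-one update after observing `reward` for the played context x."""
    A = A + np.outer(x, x)
    b = b + reward * x
    return A, b
```

A multi-agent variant would additionally aggregate these `(A, b)` statistics at the server, with corrections for cross-agent heterogeneity.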


Contextual Games: Multi-Agent Learning with Side Information

Neural Information Processing Systems

We formulate the novel class of contextual games, a type of repeated games driven by contextual information at each round. By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes and propose a novel online (meta) algorithm that exploits such correlations to minimize the contextual regret of individual players. We define game-theoretic notions of contextual Coarse Correlated Equilibria (c-CCE) and optimal contextual welfare for this new class of games and show that c-CCEs and optimal welfare can be approached whenever players' contextual regrets vanish. Finally, we empirically validate our results in a traffic routing experiment, where our algorithm leads to better performance and higher welfare compared to baselines that do not exploit the available contextual information or the correlations present in the game.
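As a rough illustration of contextual no-regret play, the sketch below runs one multiplicative-weights (Hedge) learner per observed context. This is a tabular stand-in only: the paper's algorithm instead shares information across contexts through a kernel, which this sketch deliberately ignores.

```python
import numpy as np

class ContextualHedge:
    """Tabular surrogate for a contextual no-regret learner:
    one multiplicative-weights (Hedge) instance per observed context."""

    def __init__(self, n_actions, eta=0.1):
        self.n_actions, self.eta = n_actions, eta
        self.weights = {}  # context -> probability vector over actions

    def play(self, context):
        """Return the current mixed strategy for this context."""
        return self.weights.setdefault(
            context, np.full(self.n_actions, 1.0 / self.n_actions))

    def update(self, context, losses):
        """Exponential reweighting: w_a <- w_a * exp(-eta * loss_a)."""
        w = self.weights[context] * np.exp(-self.eta * np.asarray(losses))
        self.weights[context] = w / w.sum()
```

When every player's contextual regret vanishes under such dynamics, the empirical play approaches the paper's contextual coarse correlated equilibria (c-CCEs).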


Multi-agent learning under uncertainty: Recurrence vs. concentration

Lotidis, Kyriakos, Mertikopoulos, Panayotis, Bambos, Nicholas, Blanchet, Jose

arXiv.org Artificial Intelligence

In this paper, we examine the convergence landscape of multi-agent learning under uncertainty. Specifically, we analyze two stochastic models of regularized learning in continuous games -- one in continuous and one in discrete time -- with the aim of characterizing the long-run behavior of the induced sequence of play. In stark contrast to deterministic, full-information models of learning (or models with a vanishing learning rate), we show that the resulting dynamics do not converge in general. In lieu of this, we ask instead which actions are played more often in the long run, and by how much. We show that, in strongly monotone games, the dynamics of regularized learning may wander away from equilibrium infinitely often, but they always return to its vicinity in finite time (which we estimate), and their long-run distribution is sharply concentrated around a neighborhood thereof. We quantify the degree of this concentration, and we show that these favorable properties may all break down if the underlying game is not strongly monotone -- underscoring in this way the limits of regularized learning in the presence of persistent randomness and uncertainty.
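The recurrence-versus-concentration phenomenon described above can be reproduced in a toy strongly monotone game: constant step-size gradient play with noisy feedback never settles, yet its empirical distribution concentrates near the equilibrium. The game, step size, and noise level below are illustrative choices, not the paper's.

```python
import numpy as np

def noisy_gradient_play(steps=20000, eta=0.05, noise=1.0, seed=0):
    """Constant step-size gradient play in a strongly monotone game
    (each player i minimizes x_i^2 / 2, so the gradient field is v(x) = x
    and the unique equilibrium is x* = 0), with i.i.d. Gaussian noise on
    the payoff gradients.  The iterates never converge, but their
    long-run distribution concentrates around a neighborhood of x*."""
    rng = np.random.default_rng(seed)
    x = np.array([2.0, -2.0])          # two players, started far from x*
    traj = np.empty((steps, 2))
    for t in range(steps):
        g = x + noise * rng.standard_normal(2)  # noisy gradient feedback
        x = x - eta * g
        traj[t] = x
    return traj
```

Inspecting the tail of the trajectory shows a mean near zero with persistent fluctuations, matching the abstract's "recurrence vs. concentration" picture.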


Countering Feedback Delays in Multi-Agent Learning

Neural Information Processing Systems

We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information. Instead of focusing on specific games, we consider a broad class of continuous games defined by the general equilibrium stability notion, which we call λ-variational stability. Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.
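The delay mechanism can be sketched as follows. This is a simplified stand-in, not the paper's DMD: it merely queues each gradient until its delay elapses and applies whatever feedback arrives at each round, using an entropic mirror step on the simplex.

```python
import numpy as np

def entropic_step(x, grad, eta):
    """One entropic (exponentiated-gradient) mirror-descent step on the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

def delayed_mirror_descent(grads, delays, eta=0.1):
    """Mirror descent with delayed feedback: the gradient generated at
    round t only becomes available at round t + delays[t]; each round we
    apply every gradient that has just arrived.

    grads : (T, d) array of loss gradients, delays : (T,) integer delays.
    """
    T, d = grads.shape
    x = np.full(d, 1.0 / d)
    inbox = {}                                 # arrival round -> pending grads
    for t in range(T):
        inbox.setdefault(t + int(delays[t]), []).append(grads[t])
        for g in inbox.pop(t, []):             # feedback arriving now
            x = entropic_step(x, g, eta)
    return x
```

The paper's DMD additionally reuses past information in a structured way to tolerate unbounded, asynchronous delays; this sketch only shows the bookkeeping.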



Interactive Learning for LLM Reasoning

Lin, Hehai, Cao, Shilei, Wang, Sudong, Wu, Haotian, Li, Minzhi, Yang, Linyi, Zheng, Juepeng, Qin, Chengwei

arXiv.org Artificial Intelligence

Existing multi-agent learning approaches have developed interactive training environments to explicitly promote collaboration among multiple Large Language Models (LLMs), thereby constructing stronger multi-agent systems (MAS). However, during inference they require re-executing the MAS to obtain final solutions, which diverges from human cognition: individuals can enhance their reasoning capabilities through interactions with others and then resolve questions independently in the future. To investigate whether multi-agent interaction can enhance LLMs' independent problem-solving ability, we introduce ILR, a novel co-learning framework for MAS that integrates two key components: Dynamic Interaction and Perception Calibration. Dynamic Interaction first adaptively selects either a cooperative or a competitive strategy depending on question difficulty and model ability. LLMs then exchange information through Idea3 (Idea Sharing, Idea Analysis, and Idea Fusion), an interaction paradigm designed to mimic human discussion, before deriving their respective final answers. In Perception Calibration, ILR employs Group Relative Policy Optimization (GRPO) to train the LLMs while integrating one LLM's reward-distribution characteristics into another's reward function, thereby enhancing the cohesion of multi-agent interactions. We validate ILR on three LLMs across two model families of varying scales, evaluating performance on five mathematical benchmarks and one coding benchmark. Experimental results show that ILR consistently outperforms single-agent learning, yielding an improvement of up to 5% over the strongest baseline. We further find that Idea3 enhances the robustness of stronger LLMs during multi-agent inference, and that dynamically chosen interaction types boost multi-agent learning compared to purely cooperative or purely competitive strategies.
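The group-relative advantage at the core of GRPO is a standard, published computation; the Perception Calibration mixing rule, by contrast, is not specified in the abstract, so the second function below is purely hypothetical (with `beta` an assumed mixing coefficient):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sampled completion's
    reward is standardized against the other completions for the same
    prompt, removing the need for a learned value baseline."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def calibrated_rewards(own_rewards, peer_rewards, beta=0.1):
    """HYPOTHETICAL sketch of Perception Calibration: shift one agent's
    rewards toward a peer agent's reward statistics.  The actual mixing
    rule used by ILR is not given in the abstract."""
    own = np.asarray(own_rewards, dtype=float)
    peer = np.asarray(peer_rewards, dtype=float)
    return own + beta * (peer.mean() - own.mean())
```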


Supplementary Material Contextual Games: Multi-Agent Learning with Side Information Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour (NeurIPS 2020)

Neural Information Processing Systems

The theoretical guarantees obtained in Section 3 rely on two main lemmas concerning the distributions computed by the multiplicative-weights (MW) rule. As in the proof of Theorem 1 and Appendix A.1, the analysis explicitly accounts for the adaptiveness of the adversary; the key equality follows from the law of total expectation. For a contextual game in which contexts are sampled i.i.d., Hoeffding's inequality [21] provides the required concentration bound. A final section describes the experimental setup of the contextual traffic routing game of Section 5.
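The Hoeffding bound invoked in the supplementary material is the standard one for bounded i.i.d. samples; a one-line helper makes the concentration radius concrete:

```python
import math

def hoeffding_radius(n, delta, a=0.0, b=1.0):
    """Two-sided Hoeffding confidence radius: with probability at least
    1 - delta, the empirical mean of n i.i.d. samples in [a, b] lies
    within this distance of the true mean."""
    return (b - a) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```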


Review for NeurIPS paper: Contextual Games: Multi-Agent Learning with Side Information

Neural Information Processing Systems

Weaknesses: From a technical point of view, the result is an incremental enhancement of [32], and follows by connecting known results. As such, the significance of the paper relies on the originality and usefulness of the novel framework of contextual games. This is by itself of course fine, since impact and usefulness are possibly the most important aspects anyway. The main weakness of this paper is that the usefulness and motivation of the results are somewhat vague: it is not clear why selfish players would follow the proposed algorithm.


Review for NeurIPS paper: Contextual Games: Multi-Agent Learning with Side Information

Neural Information Processing Systems

The reviewers agree that this is a good contribution to the literature on learning in games. The authors are strongly encouraged to improve presentation regarding how the various constants (e.g.

