
 multi-agent learning


Multi-Agent Learning with Heterogeneous Linear Contextual Bandits

Neural Information Processing Systems

As trained intelligent systems become increasingly pervasive, multi-agent learning has emerged as a popular framework for studying complex interactions between autonomous agents. Yet, a formal understanding of how and when learners in heterogeneous environments benefit from sharing their respective experiences is far from complete. In this paper, we seek answers to these questions in the context of linear contextual bandits. We present a novel distributed learning algorithm based on the upper confidence bound (UCB) algorithm, which we refer to as H-LINUCB, wherein agents cooperatively minimize the group regret under the coordination of a central server. In the setting where the level of heterogeneity or dissimilarity across the environments is known to the agents, we show that H-LINUCB is provably optimal in regimes where the tasks are highly similar or highly dissimilar.
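The abstract does not spell out H-LINUCB itself, but the single-agent LinUCB routine it builds on is standard. A minimal sketch of that base algorithm, with `alpha` as an assumed exploration parameter (the distributed, server-coordinated machinery is omitted):

```python
import numpy as np

def linucb_choose(A, b, contexts, alpha=1.0):
    """Standard LinUCB action selection.

    A : (d, d) regularized design matrix (starts as the identity),
    b : (d,) accumulated reward-weighted features,
    contexts : (K, d) feature vector per arm.
    Picks the arm maximizing x^T theta_hat + alpha * sqrt(x^T A^{-1} x).
    """
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    ucb = contexts @ theta_hat + alpha * np.sqrt(
        np.einsum("ki,ij,kj->k", contexts, A_inv, contexts))
    return int(np.argmax(ucb))

def linucb_update(A, b, x, reward):
    """Rank-one update after observing `reward` for the played context x."""
    A = A + np.outer(x, x)
    b = b + reward * x
    return A, b
```

A multi-agent variant would additionally aggregate these `(A, b)` statistics at the server, with corrections for cross-agent heterogeneity.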


Contextual Games: Multi-Agent Learning with Side Information

Neural Information Processing Systems

We formulate the novel class of contextual games, a type of repeated games driven by contextual information at each round. By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes and propose a novel online (meta) algorithm that exploits such correlations to minimize the contextual regret of individual players. We define game-theoretic notions of contextual Coarse Correlated Equilibria (c-CCE) and optimal contextual welfare for this new class of games and show that c-CCEs and optimal welfare can be approached whenever players' contextual regrets vanish. Finally, we empirically validate our results in a traffic routing experiment, where our algorithm leads to better performance and higher welfare compared to baselines that do not exploit the available contextual information or the correlations present in the game.
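As a rough illustration of contextual no-regret play, the sketch below runs one multiplicative-weights (Hedge) learner per observed context. This is a tabular stand-in only: the paper's algorithm instead shares information across contexts through a kernel, which this sketch deliberately ignores.

```python
import numpy as np

class ContextualHedge:
    """Tabular surrogate for a contextual no-regret learner:
    one multiplicative-weights (Hedge) instance per observed context."""

    def __init__(self, n_actions, eta=0.1):
        self.n_actions, self.eta = n_actions, eta
        self.weights = {}  # context -> probability vector over actions

    def play(self, context):
        """Return the current mixed strategy for this context."""
        return self.weights.setdefault(
            context, np.full(self.n_actions, 1.0 / self.n_actions))

    def update(self, context, losses):
        """Exponential reweighting: w_a <- w_a * exp(-eta * loss_a)."""
        w = self.weights[context] * np.exp(-self.eta * np.asarray(losses))
        self.weights[context] = w / w.sum()
```

When every player's contextual regret vanishes under such dynamics, the empirical play approaches the paper's contextual coarse correlated equilibria (c-CCEs).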


Multi-agent learning under uncertainty: Recurrence vs. concentration

Lotidis, Kyriakos, Mertikopoulos, Panayotis, Bambos, Nicholas, Blanchet, Jose

arXiv.org Artificial Intelligence

In this paper, we examine the convergence landscape of multi-agent learning under uncertainty. Specifically, we analyze two stochastic models of regularized learning in continuous games -- one in continuous and one in discrete time -- with the aim of characterizing the long-run behavior of the induced sequence of play. In stark contrast to deterministic, full-information models of learning (or models with a vanishing learning rate), we show that the resulting dynamics do not converge in general. In lieu of this, we ask instead which actions are played more often in the long run, and by how much. We show that, in strongly monotone games, the dynamics of regularized learning may wander away from equilibrium infinitely often, but they always return to its vicinity in finite time (which we estimate), and their long-run distribution is sharply concentrated around a neighborhood thereof. We quantify the degree of this concentration, and we show that these favorable properties may all break down if the underlying game is not strongly monotone -- underscoring in this way the limits of regularized learning in the presence of persistent randomness and uncertainty.
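The recurrence-versus-concentration phenomenon described above can be reproduced in a toy strongly monotone game: constant step-size gradient play with noisy feedback never settles, yet its empirical distribution concentrates near the equilibrium. The game, step size, and noise level below are illustrative choices, not the paper's.

```python
import numpy as np

def noisy_gradient_play(steps=20000, eta=0.05, noise=1.0, seed=0):
    """Constant step-size gradient play in a strongly monotone game
    (each player i minimizes x_i^2 / 2, so the gradient field is v(x) = x
    and the unique equilibrium is x* = 0), with i.i.d. Gaussian noise on
    the payoff gradients.  The iterates never converge, but their
    long-run distribution concentrates around a neighborhood of x*."""
    rng = np.random.default_rng(seed)
    x = np.array([2.0, -2.0])          # two players, started far from x*
    traj = np.empty((steps, 2))
    for t in range(steps):
        g = x + noise * rng.standard_normal(2)  # noisy gradient feedback
        x = x - eta * g
        traj[t] = x
    return traj
```

Inspecting the tail of the trajectory shows a mean near zero with persistent fluctuations, matching the abstract's "recurrence vs. concentration" picture.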


Countering Feedback Delays in Multi-Agent Learning

Neural Information Processing Systems

We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information. Instead of focusing on specific games, we consider a broad class of continuous games defined by the general equilibrium stability notion, which we call λ-variational stability. Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.
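The delay mechanism can be sketched as follows. This is a simplified stand-in, not the paper's DMD: it merely queues each gradient until its delay elapses and applies whatever feedback arrives at each round, using an entropic mirror step on the simplex.

```python
import numpy as np

def entropic_step(x, grad, eta):
    """One entropic (exponentiated-gradient) mirror-descent step on the simplex."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()

def delayed_mirror_descent(grads, delays, eta=0.1):
    """Mirror descent with delayed feedback: the gradient generated at
    round t only becomes available at round t + delays[t]; each round we
    apply every gradient that has just arrived.

    grads : (T, d) array of loss gradients, delays : (T,) integer delays.
    """
    T, d = grads.shape
    x = np.full(d, 1.0 / d)
    inbox = {}                                 # arrival round -> pending grads
    for t in range(T):
        inbox.setdefault(t + int(delays[t]), []).append(grads[t])
        for g in inbox.pop(t, []):             # feedback arriving now
            x = entropic_step(x, g, eta)
    return x
```

The paper's DMD additionally reuses past information in a structured way to tolerate unbounded, asynchronous delays; this sketch only shows the bookkeeping.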



Interactive Learning for LLM Reasoning

Lin, Hehai, Cao, Shilei, Wang, Sudong, Wu, Haotian, Li, Minzhi, Yang, Linyi, Zheng, Juepeng, Qin, Chengwei

arXiv.org Artificial Intelligence

Existing multi-agent learning approaches have developed interactive training environments to explicitly promote collaboration among multiple Large Language Models (LLMs), thereby constructing stronger multi-agent systems (MAS). However, during inference they require re-executing the MAS to obtain final solutions, which diverges from human cognition: individuals can enhance their reasoning capabilities through interactions with others and then resolve questions independently in the future. To investigate whether multi-agent interaction can enhance LLMs' independent problem-solving ability, we introduce ILR, a novel co-learning framework for MAS that integrates two key components: Dynamic Interaction and Perception Calibration. Dynamic Interaction first adaptively selects either a cooperative or a competitive strategy depending on question difficulty and model ability. LLMs then exchange information through Idea3 (Idea Sharing, Idea Analysis, and Idea Fusion), an interaction paradigm designed to mimic human discussion, before deriving their respective final answers. In Perception Calibration, ILR employs Group Relative Policy Optimization (GRPO) to train the LLMs while integrating one LLM's reward-distribution characteristics into another's reward function, thereby enhancing the cohesion of multi-agent interactions. We validate ILR on three LLMs across two model families of varying scales, evaluating performance on five mathematical benchmarks and one coding benchmark. Experimental results show that ILR consistently outperforms single-agent learning, yielding an improvement of up to 5% over the strongest baseline. We further find that Idea3 enhances the robustness of stronger LLMs during multi-agent inference, and that dynamically chosen interaction types boost multi-agent learning compared to purely cooperative or purely competitive strategies.
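The group-relative advantage at the core of GRPO is a standard, published computation; the Perception Calibration mixing rule, by contrast, is not specified in the abstract, so the second function below is purely hypothetical (with `beta` an assumed mixing coefficient):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sampled completion's
    reward is standardized against the other completions for the same
    prompt, removing the need for a learned value baseline."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def calibrated_rewards(own_rewards, peer_rewards, beta=0.1):
    """HYPOTHETICAL sketch of Perception Calibration: shift one agent's
    rewards toward a peer agent's reward statistics.  The actual mixing
    rule used by ILR is not given in the abstract."""
    own = np.asarray(own_rewards, dtype=float)
    peer = np.asarray(peer_rewards, dtype=float)
    return own + beta * (peer.mean() - own.mean())
```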


Supplementary Material Contextual Games: Multi-Agent Learning with Side Information Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour (NeurIPS 2020)

Neural Information Processing Systems

The theoretical guarantees obtained in Section 3 rely on two main lemmas concerning the distributions computed by the multiplicative-weights (MW) rule. As in the proof of Theorem 1 and Appendix A.1, the analysis explicitly accounts for the adaptiveness of the adversary; the key equality follows from the law of total expectation. For a contextual game in which contexts are sampled i.i.d., Hoeffding's inequality [21] provides the required concentration bound. A final section describes the experimental setup of the contextual traffic routing game of Section 5.
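The Hoeffding bound invoked in the supplementary material is the standard one for bounded i.i.d. samples; a one-line helper makes the concentration radius concrete:

```python
import math

def hoeffding_radius(n, delta, a=0.0, b=1.0):
    """Two-sided Hoeffding confidence radius: with probability at least
    1 - delta, the empirical mean of n i.i.d. samples in [a, b] lies
    within this distance of the true mean."""
    return (b - a) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
```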


Review for NeurIPS paper: Contextual Games: Multi-Agent Learning with Side Information

Neural Information Processing Systems

Weaknesses: From a technical point of view, the result is an incremental enhancement of [32], and follows by connecting known results. As such, the significance of the paper relies on the originality and usefulness of the novel framework of contextual games. This is by itself of course fine, since impact and usefulness are possibly the most important aspects anyway. The main weakness of this paper is that the usefulness and motivation of the results are somewhat vague: it is not clear why selfish players would follow the proposed algorithm.


Review for NeurIPS paper: Contextual Games: Multi-Agent Learning with Side Information

Neural Information Processing Systems

The reviewers agree that this is a good contribution to the literature on learning in games. The authors are strongly encouraged to improve presentation regarding how the various constants (e.g.

