Goto

Collaborating Authors

 independent learning


A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Neural Information Processing Systems

In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with the minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players.


MARL Warehouse Robots

Allman, Price, Thang, Lian, Simmons, Dre, Riaz, Salmon

arXiv.org Artificial Intelligence

Our research investigates the complex task of multiple autonomous agents learning to coordinate and deliver packages in warehouse environments--a problem requiring implicit communication, collision avoidance, and efficient task allocation without centralized control. Traditional warehouse automation relies on centralized planning systems that face scalability limitations; multi-agent reinforcement learning (MARL) offers an alternative through decentralized learned policies, but requires solving the credit assignment problem. We compare MARL algorithms on warehouse coordination: QMIX [Rashid et al., 2018] (value decomposition), IPPO (independent learning), and MASAC (centralized critic). Our study progresses from MPE for validation to RWARE for warehouse evaluation, culminating in Unity 3D deployment where agents demonstrate learned package delivery behavior. QMIX emerged as the best performer after systematic comparison. Our contributions: (1) hyperparameter analysis showing default configurations fail on sparse-reward warehouse tasks, (2) comparative evaluation across algorithms and scales, (3) Unity ML-Agents integration demonstrating sim-to-sim transfer with successful package delivery, and (4) identification of scaling challenges. Full experimental details and results are documented in our Quarto documentation book. 1


Personalized Collaborative Learning with Affinity-Based Variance Reduction

Zhang, Chenyu, Azizan, Navid

arXiv.org Machine Learning

Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, δ\}$, where $n$ is the number of agents and $δ\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.


Independent Learning in Performative Markov Potential Games

Sahitaj, Rilind, Sasnauskas, Paulius, Yalın, Yiğit, Mandal, Debmalya, Radanović, Goran

arXiv.org Artificial Intelligence

Performative Reinforcement Learning (PRL) refers to a scenario in which the deployed policy changes the reward and transition dynamics of the underlying environment. In this work, we study multi-agent PRL by incorporating performative effects into Markov Potential Games (MPGs). We introduce the notion of a performatively stable equilibrium (PSE) and show that it always exists under a reasonable sensitivity assumption. We then provide convergence results for state-of-the-art algorithms used to solve MPGs. Specifically, we show that independent policy gradient ascent (IPGA) and independent natural policy gradient (INPG) converge to an approximate PSE in the best-iterate sense, with an additional term that accounts for the performative effects. Furthermore, we show that INPG asymptotically converges to a PSE in the last-iterate sense. As the performative effects vanish, we recover the convergence rates from prior work. For a special case of our game, we provide finite-time last-iterate convergence results for a repeated retraining approach, in which agents independently optimize a surrogate objective. We conduct extensive experiments to validate our theoretical findings.


Unlocking Learning Potentials: The Transformative Effect of Generative AI in Education Across Grade Levels

Xie, Meijuan, Luo, Liling

arXiv.org Artificial Intelligence

The advent of generative artificial intelligence (GAI) has brought about a notable surge in the field of education. The use of GAI to support learning is becoming increasingly prevalent among students. However, the manner and extent of its utilisation vary considerably from one individual to another. And researches about student's utilisation and perceptions of GAI remains relatively scarce. To gain insight into the issue, this paper proposed a hybrid-survey method to examine the impact of GAI on students across four different grades in six key areas (LIPSAL): learning interest, independent learning, problem solving, self-confidence, appropriate use, and learning enjoyment. Firstly, through questionnaire, we found that among LIPSAL, GAI has the greatest impact on the concept of appropriate use, the lowest level of learning interest and self-confidence. Secondly, a comparison of four grades revealed that the high and low factors of LIPSAL exhibited grade-related variation, and college students exhibited a higher level than high school students across LIPSAL. Thirdly, through interview, the students demonstrated a comprehensive understanding of the application of GAI. We found that students have a positive attitude towards GAI and are very willing to use it, which is why GAI has grown so rapidly in popularity. They also told us prospects and challenges in using GAI. In the future, as GAI matures technologically, it will have an greater impact on students. These findings may help better understand usage by different students and inform future research in digital education.


Review for NeurIPS paper: Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems

Weaknesses: I am not convinced by the main motivation of this paper for decoupled or independent learning. Specifically, from the communication perspective, once agents can also communicate the actions each other took per round, then each agent can also simulate any coupled algorithm locally (or only coupled online algorithm if has storage limitation). Since agents have to communicate with the oracle or environment in each round anyway, I don't see in practice why communicate the actions in the learning process is that problematic. Second, this paper says that the independent learning is important because it allows the algorithm "being versatile, being applicable even in uncertain environments where the type of interaction and number of other agents are not known to the agent. " I feel this description does not fit the algorithm studied in this paper, thus a bit misleading.


Approximate Global Convergence of Independent Learning in Multi-Agent Systems

Jin, Ruiyang, Chen, Zaiwei, Lin, Yiheng, Song, Jie, Wierman, Adam

arXiv.org Artificial Intelligence

Independent learning (IL), despite being a popular approach in practice to achieve scalability in large-scale multi-agent systems, usually lacks global convergence guarantees. In this paper, we study two representative algorithms, independent $Q$-learning and independent natural actor-critic, within value-based and policy-based frameworks, and provide the first finite-sample analysis for approximate global convergence. The results imply a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ up to an error term that captures the dependence among agents and characterizes the fundamental limit of IL in achieving global convergence. To establish the result, we develop a novel approach for analyzing IL by constructing a separable Markov decision process (MDP) for convergence analysis and then bounding the gap due to model difference between the separable MDP and the original one. Moreover, we conduct numerical experiments using a synthetic MDP and an electric vehicle charging example to verify our theoretical findings and to demonstrate the practical applicability of IL.


Independent Learning in Constrained Markov Potential Games

Jordan, Philip, Barakat, Anas, He, Niao

arXiv.org Artificial Intelligence

Constrained Markov games offer a formal mathematical framework for modeling multi-agent reinforcement learning problems where the behavior of the agents is subject to constraints. In this work, we focus on the recently introduced class of constrained Markov Potential Games. While centralized algorithms have been proposed for solving such constrained games, the design of converging independent learning algorithms tailored for the constrained setting remains an open question. We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: Each agent observes their own actions and rewards, along with a shared state. Inspired by the optimization literature, our algorithm performs proximal-point-like updates augmented with a regularized constraint set. Each proximal step is solved inexactly using a stochastic switching gradient algorithm. Notably, our algorithm can be implemented independently without a centralized coordination mechanism requiring turn-based agent updates. Under some technical constraint qualification conditions, we establish convergence guarantees towards constrained approximate Nash equilibria. We perform simulations to illustrate our results.


Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Nekoei, Hadi, Badrinaaraayanan, Akilesh, Sinha, Amit, Amini, Mohammad, Rajendran, Janarthanan, Mahajan, Aditya, Chandar, Sarath

arXiv.org Artificial Intelligence

Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning where agents update their policies one after another in a sequence is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agent's policies are kept fixed, alleviating the challenge of non-stationarity due to simultaneous updates in other agents' policies. However, it can be slow because only one agent is learning at any time. Therefore it might also not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning, while also minimizing non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the epymarl(Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.


Stochastic Market Games

Schmid, Kyrill, Belzner, Lenz, Müller, Robert, Tochtermann, Johannes, Linnhoff-Popien, Claudia

arXiv.org Artificial Intelligence

Some of the most relevant future applications of multi-agent systems like autonomous driving or factories as a service display mixed-motive scenarios, where agents might have conflicting goals. In these settings agents are likely to learn undesirable outcomes in terms of cooperation under independent learning, such as overly greedy behavior. Motivated from real world societies, in this work we propose to utilize market forces to provide incentives for agents to become cooperative. As demonstrated in an iterated version of the Prisoner's Dilemma, the proposed market formulation can change the dynamics of the game to consistently learn cooperative policies. Further we evaluate our approach in spatially and temporally extended settings for varying numbers of agents. We empirically find that the presence of markets can improve both the overall result and agent individual returns via their trading activities.