multiagent
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
Subramaniam, Vighnesh, Du, Yilun, Tenenbaum, Joshua B., Torralba, Antonio, Li, Shuang, Mordatch, Igor
Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, we propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models. A group of language models, all starting from the same base model, are independently specialized by updating each one using data generated through multiagent interactions among the models. By training each model on independent sets of data, we illustrate how this approach enables specialization across models and diversification over the set of models. As a result, our overall system is able to preserve diverse reasoning chains and autonomously improve over many more rounds of fine-tuning than single-agent self-improvement methods. We quantitatively illustrate the efficacy of the approach across a wide suite of reasoning tasks.
V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL
Jin, Chi, Liu, Qinghua, Wang, Yuanhao, Yu, Tiancheng
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, called V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that only scales with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i^{\rm th}$ player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^m A_i$. V-learning (in its basic form) is a new class of single-agent RL algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Similar to the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it only maintains estimates of V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
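To make the gap between the two quantities in the abstract above concrete, the following minimal sketch compares $\prod_{i=1}^m A_i$ with $\max_{i\in[m]} A_i$. The function names and the example of 10 agents with 5 actions each are illustrative assumptions, not taken from the paper, and the snippet does not implement V-learning itself.

```python
from math import prod


def joint_action_space_size(action_counts):
    """Size of the joint action space: the product of A_i over all agents."""
    return prod(action_counts)


def largest_individual_action_space(action_counts):
    """The quantity max_i A_i that the abstract says the sample count scales with."""
    return max(action_counts)


if __name__ == "__main__":
    # Hypothetical setting: 10 agents, each with 5 actions.
    action_counts = [5] * 10
    print("joint action space size:", joint_action_space_size(action_counts))
    print("largest individual action space:", largest_individual_action_space(action_counts))
```

Adding one more agent with 5 actions multiplies the first quantity by 5 and leaves the second unchanged.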
What AI can do for football, and what football can do for AI
Karl Tuyls, a former RoboCup participant and local chair of the 2D simulation league (2013), recently published an article along with his colleagues at DeepMind called Game plan: what AI can do for football, and what football can do for AI. Karl Tuyls: The long-term vision in this project is to advance research in multi-agent decision-making by building an automated video assistant coach for real-world soccer (or football), that can help coaches and teams in analyzing games, making tactical choices in a match (e.g. in set pieces situations), improve their overall game-play, and even assist with in-game analysis and decision-making. Next to that one can also think additionally of human factors like injury prediction and the search for new players. For this we are blending research from game theory, vision and machine learning. So far our work has focused on game-theoretic analysis of set pieces and on trajectory predictions of players and ball with the purpose to allow for counterfactual reasoning (what happens if player X moves in direction Y, for example).
R-MADDPG for Partially Observable Environments and Limited Communication
Wang, Rose E., Everett, Michael, How, Jonathan P.
There are several real-world tasks that would benefit from applying multiagent reinforcement learning (MARL) algorithms, including the coordination among self-driving cars. The real world has challenging conditions for multiagent learning systems, such as its partially observable and nonstationary nature. Moreover, if agents must share a limited resource (e.g. network bandwidth) they must all learn how to coordinate resource use. This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication. We investigate recurrency effects on performance and communication use of a team of agents. We demonstrate that the resulting framework learns time dependencies for sharing missing observations, handling resource limitations, and developing different communication patterns among agents.
Introduction to This Special Issue
Developing agents that could perceive the world, reason about what they perceive in relation to their own goals and acts, has been the Holy Grail of AI. Early attempts at such holistic intelligence (for example, SRI International's AI researchers turned their attention to component technologies for structuring a single agent, such as planning, knowledge representation, diagnosis, and learning. Although most AI research was focused on single-agent issues, a small number of AI researchers gathered at the Massachusetts Institute of Technology Endicott House in 1980 for the First Workshop on Distributed AI. The main scientific goal of distributed AI (DAI) is to understand the principles underlying the behavior of multiple entities in the world, called agents, and their interactions. The discipline is concerned with how agent interactions produce overall multiagent system (MAS) behavior.
The 1996 AAAI Mobile Robot Competition and Exhibition
The Fifth Annual AAAI Mobile Robot Competition and Exhibition was held in Portland, Oregon, in conjunction with the Thirteenth National Conference on Artificial Intelligence. The competition consisted of two events: (1) Office Navigation and (2) Clean Up the Tennis Court. The first event stressed navigation and planning. The second event stressed vision sensing and manipulation. In addition to the competition, there was a mobile robot exhibition in which teams demonstrated robot behaviors that did not fit into the competition tasks.
Planning and Acting Together
People often act together with a shared purpose; they collaborate. Collaboration enables them to work more efficiently and to complete activities they could not accomplish individually. An increasing number of computer applications also require collaboration among various systems and people. Thus, a major challenge for AI researchers is to determine how to construct computer systems that are able to act effectively as partners in collaborative activity. Collaborative activity entails participants forming commitments to achieve the goals of the group activity and requires group decision making and group planning procedures.
Multiagent Systems
Agent-based systems technology has generated lots of excitement in recent years because of its promise as a new paradigm for conceptualizing, designing, and implementing software systems. This promise is particularly attractive for creating software that operates in environments that are distributed and open, such as the internet. Currently, the great majority of agent-based systems consist of a single agent. However, as the technology matures and addresses increasingly complex applications, the need for systems that consist of multiple agents that communicate in a peer-to-peer fashion is becoming apparent. Central to the design and effective operation of such multiagent systems (MASs) is a core set of issues and research questions that have been studied over the years by the distributed AI community.
The 1999 Asia-Pacific Conference on Intelligent-Agent Technology
IAT'99 was the first meeting in this new series and was held in Hong Kong from 14 to 17 December. It was sponsored by Hong Kong Baptist University, the Croucher Foundation, the Epson Foundation, The MIT Press, the Association for Computing Machinery (ACM) Hong Kong, and the Institute of Electrical and Electronics Engineers Hong Kong Section Computer Chapter and in cooperation with ACM Special Interest Groups in Artificial Intelligence (SIGART), Knowledge Discovery in Data (SIGKDD), and Computer-Human Interaction (SIGCHI). Jiming Liu (Hong Kong Baptist University) and Ning Zhong (Yamaguchi University, Japan) were the program chairs, and Setsuo Ohsuga (Waseda University) and Ernest Lam (Hong Kong Baptist University) were the general chairs. IAT'99 successfully brought together over 150 researchers and practitioners to share their original research results and practical development experiences in intelligent-agent technology. The participants were from Australia, Austria, Belgium, ...
Multiagent Systems
In this article, I describe several challenges facing the integration of two distinct lines of AI research: (1) decision-theoretic planning (DTP) and (2) multiagent systems. Both areas (especially the second) are attracting considerable interest, but work in multiagent systems often assumes either classical planning models or prespecified economic valuations on the part of the agents in question. By integrating models of DTP in multiagent systems research, more sophisticated multiagent planning scenarios can be accommodated, at the same time explaining precisely how agents determine their valuations for different sources or activities. I discuss several research challenges that emerge from this integration, involving the development of coordination protocols, the reasoning about lack of coordination, and the predicting of behavior in markets. I also briefly mention some opportunities afforded planning agents in multiagent settings and how these might be addressed.