AITopics

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing Systems, Oct-10-2024, 08:26:05 GMT

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

We introduce an automatic curriculum algorithm, Variational Automatic Curriculum Learning (VACL), for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems. We motivate our curriculum learning paradigm through a variational perspective, where the learning objective can be decomposed into two terms: task learning on the current curriculum, and curriculum update to a new task distribution. Local optimization over the second term suggests that the curriculum should gradually expand the training tasks from easy to hard. Our VACL algorithm implements this variational paradigm with two practical components, task expansion and entity curriculum, which produce a series of training tasks over both the task configurations and the number of entities in the task. Experimental results show that VACL solves a collection of sparse-reward problems with a large number of agents.

sparse-reward cooperative multi-agent problem, training task, variational automatic curriculum learning, (1 more...)

Industry: Education (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks

Hegazy, Mahmood

Large language models (LLMs) excel in natural language generation but often confidently produce incorrect responses, especially in tasks like mathematical reasoning. Chain-of-thought prompting, self-verification, and multi-agent debate are among the strategies proposed to improve the reasoning and factual accuracy of LLMs. Building on Du et al.'s multi-agent debate framework, we find that multi-agent debate helps at any model scale, and that diversity of thought elicits stronger reasoning in debating LLMs. Across various model sizes, performance on mathematical reasoning tasks benefits most when diverse trained models are used. Remarkably, after 4 rounds of debate, a diverse set of medium-capacity models (Gemini-Pro, Mixtral 8x7B, and PaLM 2-M) outperforms GPT-4 on the GSM-8K benchmark, scoring 91% accuracy. By comparison, when 3 instances of Gemini-Pro are used, performance only reaches 82%. Finally, this diverse set of medium-capacity models sets a new state-of-the-art performance on the ASDiv benchmark (94%). These results underscore the idea that the future of AI is agentic, with diverse cooperating agents yielding emergent capabilities beyond even the most powerful individual models.

large language model, machine learning, natural language, (15 more...)

2410.12853

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Learning to Balance Altruism and Self-interest Based on Empathy in Mixed-Motive Games

Kong, Fanqi, Huang, Yizhe, Zhu, Song-Chun, Qi, Siyuan, Feng, Xue

Real-world multi-agent scenarios often involve mixed motives, demanding altruistic agents capable of self-protection against potential exploitation. However, existing approaches often struggle to achieve both objectives. In this paper, based on the observation that empathic responses are modulated by inferred social relationships between agents, we propose LASE (Learning to balance Altruism and Self-interest based on Empathy), a distributed multi-agent reinforcement learning algorithm that fosters altruistic cooperation through gifting while avoiding exploitation by other agents in mixed-motive games. LASE allocates a portion of its rewards to co-players as gifts, with this allocation adapting dynamically based on the social relationship, a metric evaluating the friendliness of co-players estimated by counterfactual reasoning. In particular, the social relationship measures each co-player by comparing the estimated $Q$-function of the current joint action to a counterfactual baseline that marginalizes the co-player's action, with its action distribution inferred by a perspective-taking module. Comprehensive experiments are performed in spatially and temporally extended mixed-motive games, demonstrating LASE's ability to promote group collaboration without compromising fairness and its capacity to adapt policies to various types of interactive co-players.
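The counterfactual comparison described in this abstract can be illustrated with a minimal sketch. It assumes a tabular $Q$ estimate stored as a dictionary keyed by joint action; the function name `social_relationship`, the data layout, and the toy numbers are hypothetical and are not taken from the paper.

```python
def social_relationship(q_values, joint_action, coplayer, coplayer_policy):
    """Q of the taken joint action minus a counterfactual baseline.

    q_values: dict mapping a joint-action tuple to an estimated Q value
              (hypothetical tabular layout, for illustration only).
    joint_action: tuple with the action each agent actually took.
    coplayer: index of the co-player being evaluated.
    coplayer_policy: inferred probability of each action of that co-player.
    """
    q_taken = q_values[joint_action]
    baseline = 0.0
    # Marginalize the co-player's action while keeping all other actions fixed.
    for alt_action, prob in enumerate(coplayer_policy):
        counterfactual = list(joint_action)
        counterfactual[coplayer] = alt_action
        baseline += prob * q_values[tuple(counterfactual)]
    return q_taken - baseline


# Toy example: two agents with two actions each (made-up Q values).
q = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 0.0, (1, 1): 2.0}
score = social_relationship(q, (0, 1), coplayer=1, coplayer_policy=[0.5, 0.5])
```

In this toy case the score is positive, meaning the co-player's chosen action was better for the evaluating agent than its expected action under the inferred policy. How such a score is mapped to a gift allocation is not specified here.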

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2410.07863

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)

Slow Convergence of Interacting Kalman Filters in Word-of-Mouth Social Learning

Krishnamurthy, Vikram, Rojas, Cristian

We consider word-of-mouth social learning involving $m$ Kalman filter agents that operate sequentially. The first Kalman filter receives the raw observations, while each subsequent Kalman filter receives a noisy measurement of the conditional mean of the previous Kalman filter. The prior is updated by the $m$-th Kalman filter. When $m=2$, and the observations are noisy measurements of a Gaussian random variable, the covariance goes to zero as $k^{-1/3}$ for $k$ observations, instead of $O(k^{-1})$ in the standard Kalman filter. In this paper we prove that for $m$ agents, the covariance decreases to zero as $k^{-1/(2^m-1)}$, i.e., the learning slows down exponentially with the number of agents. We also show that by artificially weighting the prior at each time, the learning rate can be made optimal as $k^{-1}$. The implication is that in word-of-mouth social learning, artificially re-weighting the prior can yield the optimal learning rate.
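For intuition about the rates quoted in this abstract, the sketch below does two things. It runs the standard scalar Kalman covariance recursion for a static Gaussian variable, which gives the $k^{-1}$ baseline, and it tabulates the decay exponent for $m$ agents. The exponent is taken as $1/(2^m-1)$, the form that reproduces the stated $k^{-1/3}$ for $m=2$ and $k^{-1}$ for a single filter. The function names and parameter values are illustrative assumptions, and the sketch does not simulate the interacting-filter model itself.

```python
def kalman_variance(prior_var, noise_var, num_obs):
    """Posterior variance of a static Gaussian scalar after num_obs noisy observations."""
    var = prior_var
    for _ in range(num_obs):
        gain = var / (var + noise_var)  # scalar Kalman gain
        var = (1.0 - gain) * var        # covariance update
    return var


def decay_exponent(m):
    """Exponent a in covariance ~ k^(-a) for m sequential agents (assumed form 1/(2^m - 1))."""
    return 1.0 / (2 ** m - 1)


# Single standard filter: with unit prior and noise variance, the variance is 1/(1 + k).
var_100 = kalman_variance(prior_var=1.0, noise_var=1.0, num_obs=100)

# Exponents for m = 1, 2, 3 agents: 1, 1/3, 1/7.
exponents = [decay_exponent(m) for m in (1, 2, 3)]
```

Under this reading, each additional agent roughly halves the exponent, which is the sense in which learning slows down exponentially with the number of agents.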

agent, artificial intelligence, machine learning, (15 more...)

2410.08447

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.64)

Industry: Education > Curriculum (0.83)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Strategic Classification With Externalities

Chen, Yiling, Hossain, Safwan, Micha, Evi, Procaccia, Ariel

We propose a new variant of the strategic classification problem: a principal reveals a classifier, and $n$ agents report their (possibly manipulated) features to be classified. Motivated by real-world applications, our model crucially allows the manipulation of one agent to affect another; that is, it explicitly captures inter-agent externalities. The principal-agent interactions are formally modeled as a Stackelberg game, with the resulting agent manipulation dynamics captured as a simultaneous game. We show that under certain assumptions, the pure Nash Equilibrium of this agent manipulation game is unique and can be efficiently computed. Leveraging this result, PAC learning guarantees are established for the learner: informally, we show that it is possible to learn classifiers that minimize loss on the distribution, even when a random number of agents are manipulating their way to a pure Nash Equilibrium. We also comment on the optimization of such classifiers through gradient-based approaches. This work sets the theoretical foundations for a more realistic analysis of classifiers that are robust against multiple strategic actors interacting in a common environment.

agent, classifier, externality, (15 more...)

2410.08032

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel > Southern District > Eilat (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Government (1.00)
Education > Educational Setting (0.93)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

Dynamic Programming based Local Search approaches for Multi-Agent Path Finding problems on Directed Graphs

Saccani, Irene, Ardizzoni, Stefano, Consolini, Luca, Locatelli, Marco

Among sub-optimal Multi-Agent Path Finding (MAPF) solvers, rule-based algorithms are particularly appealing since they are complete. Even in crowded scenarios, they allow finding a feasible solution that brings each agent to its target, preventing deadlock situations. However, rule-based algorithms generally provide much longer solutions than the shortest one. The main contribution of this paper is introducing a new local search procedure for improving a known feasible solution. We start from a feasible sub-optimal solution, and perform a local search in a neighborhood of this solution. If we are able to find a shorter solution, we repeat this procedure until the solution cannot be shortened anymore. At the end, we obtain a solution that is still sub-optimal, but generally of much better quality than the initial one. We propose two different local search policies. In the first, we explore all paths in which the agents' positions remain in a neighborhood of the corresponding positions of the reference solution. In the second, we set an upper limit to the number of agents that can change their path with respect to the reference solution. These two different policies can also be alternated. We explore the neighborhoods by dynamic programming. The fact that our search is local is fundamental in terms of time complexity. Indeed, if the dynamic programming approach is applied to the full MAPF problem, the number of explored states grows exponentially with the number of agents. Instead, the introduction of a locality constraint allows exploring the neighborhoods in a time that grows polynomially with respect to the number of agents.
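The improve-and-repeat loop described in this abstract can be sketched generically. This is only the outer iteration: `best_in_neighborhood` is a hypothetical stand-in for the paper's dynamic-programming neighborhood exploration, and the toy cost and neighborhood below are invented for illustration and are not a MAPF instance.

```python
def local_search(initial, best_in_neighborhood, cost):
    """Repeat neighborhood search until no strictly better solution is found."""
    current = initial
    while True:
        candidate = best_in_neighborhood(current)
        if cost(candidate) >= cost(current):
            return current  # local optimum with respect to this neighborhood
        current = candidate


def toy_cost(solution):
    # Stand-in for solution length: smaller is better.
    return sum(abs(x) for x in solution)


def toy_best_neighbor(solution):
    # Neighborhood: the solution itself plus every single-coordinate change by +/-1.
    neighbors = [solution]
    for i in range(len(solution)):
        for step in (-1, 1):
            changed = list(solution)
            changed[i] += step
            neighbors.append(tuple(changed))
    return min(neighbors, key=toy_cost)


result = local_search((3, -2), toy_best_neighbor, toy_cost)
```

Because the loop only accepts strict improvements, it terminates whenever the cost is bounded below and takes finitely many values, as a solution length does.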

agent, configuration, neighborhood, (16 more...)

2410.07954

Country:

North America > Mexico > Quintana Roo > Cancún (0.04)
Europe > Italy (0.04)
Asia > Thailand > Phuket > Phuket (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.77)

Neural Information Processing Systems, Oct-9-2024, 21:00:57 GMT

A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning

Effective coordination is crucial to solve multi-agent collaborative (MAC) problems. While centralized reinforcement learning methods can optimally solve small MAC instances, they do not scale to large problems and they fail to generalize to scenarios different from those seen during training. In this paper, we consider MAC problems with some intrinsic notion of locality (e.g., geographic proximity) such that interactions between agents and tasks are locally limited. By leveraging this property, we introduce a novel structured prediction approach to assign agents to tasks. At each step, the assignment is obtained by solving a centralized optimization problem (the inference procedure) whose objective function is parameterized by a learned scoring model.
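As a rough sketch of the inference step described in this abstract, the code below picks the agent-to-task assignment that maximizes a summed score. The one-task-per-agent constraint, the brute-force search, and the fixed toy scores are assumptions for illustration. In the described approach the scores would come from a learned scoring model, and the actual optimization problem may differ.

```python
from itertools import permutations


def best_assignment(scores):
    """Return (assignment, total) maximizing the summed score.

    scores[i][j] is the score of giving task j to agent i (toy numbers here).
    Each agent gets exactly one task and no task is shared (assumed constraint).
    Brute force over permutations, so only suitable for tiny instances.
    """
    num_agents = len(scores)
    num_tasks = len(scores[0])
    best, best_total = None, float("-inf")
    for tasks in permutations(range(num_tasks), num_agents):
        total = sum(scores[agent][task] for agent, task in enumerate(tasks))
        if total > best_total:
            best, best_total = tasks, total
    return best, best_total


# Two agents, three candidate tasks (made-up scores).
toy_scores = [
    [4.0, 1.0, 0.0],
    [2.0, 3.0, 1.0],
]
assignment, total = best_assignment(toy_scores)
```

Here `assignment[i]` is the task index chosen for agent `i`. For larger instances a polynomial-time assignment solver would replace the enumeration.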

cooperative multi-agent reinforcement learning, generalization, structured prediction approach, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.40)

Neural Information Processing Systems, Oct-9-2024, 14:52:28 GMT

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

Cooperative multi-agent reinforcement learning (MARL) has made prominent progress in recent years. For training efficiency and scalability, most of the MARL algorithms make all agents share the same policy or value network. However, in many complex multi-agent tasks, different agents are expected to possess specific abilities to handle different subtasks. In those scenarios, sharing parameters indiscriminately may lead to similar behavior across all agents, which will limit the exploration efficiency and degrade the final performance. To balance the training complexity and the diversity of agent behavior, we propose a novel framework to learn dynamic subtask assignment (LDSA) in cooperative MARL. Specifically, we first introduce a subtask encoder to construct a vector representation for each subtask according to its identity.

cooperative multi-agent reinforcement learning, different subtask, learning dynamic subtask assignment, (3 more...)

Genre: Play > Prospect > Charge (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.64)

Neural Information Processing Systems, Oct-9-2024, 10:39:49 GMT

Queue Up Your Regrets: Achieving the Dynamic Capacity Region of Multiplayer Bandits

Consider $N$ cooperative agents such that for $T$ turns, each agent $n$ takes an action $a_{n}$ and receives a stochastic reward $r_{n}\left(a_{1},\ldots,a_{N}\right)$. Agents cannot observe the actions of other agents and do not know even their own reward function. The agents can communicate with their neighbors on a connected graph $G$ with diameter $d\left(G\right)$. We want each agent $n$ to achieve an expected average reward of at least $\lambda_{n}$ over time, for a given quality of service (QoS) vector $\boldsymbol{\lambda}$. By giving up on immediate reward, knowing that the other agents will compensate later, agents can improve their achievable capacity region.

agent, boldsymbol, dynamic capacity region, (8 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.59)