AITopics | Agent Societies

Agent Societies

Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning

Oh, Jihwan; Kim, Joonkee; Jeong, Minchan; Yun, Se-Young

arXiv.org Artificial Intelligence, Mar-3-2023

The multi-agent setting is intricate and unpredictable because the behaviors of multiple agents influence one another. To address this environmental uncertainty, distributional reinforcement learning algorithms, which capture uncertainty through a distributional output, have been integrated with multi-agent reinforcement learning (MARL) methods and achieve state-of-the-art performance. However, distributional MARL algorithms still rely on conventional $\epsilon$-greedy exploration, which does not take cooperative strategy into account. In this paper, we present a risk-based exploration method that leads to collaboratively optimistic behavior by shifting the sampling region of the distribution. Initially, we take expectations over the upper quantiles of the state-action values for exploration, which correspond to optimistic actions, and gradually shift the sampling region of the quantiles to the full distribution for exploitation. By ensuring that every agent is exposed to the same level of risk, we can force them to take cooperatively optimistic actions. By using quantile regression to control the level of risk appropriately, our method performs remarkably well in multi-agent settings that require cooperative exploration.
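
A minimal sketch of the schedule described above, assuming a QR-DQN-style head that outputs sorted quantile estimates for each action: each agent scores actions by averaging only the upper quantiles, and the ignored lower fraction shrinks linearly to zero so that selection ends up greedy on the full-distribution mean. The linear schedule, the `start_frac=0.75` value, and all function names are hypothetical; the paper's exact sampling region and annealing rule are not given in this abstract. All agents share one `lower_frac`, which mirrors the idea of exposing every agent to the same level of risk.

```python
import numpy as np


def risk_shifted_value(quantiles: np.ndarray, lower_frac: float) -> np.ndarray:
    """Mean of each action's quantile estimates over the upper region [lower_frac, 1].

    quantiles: shape (n_actions, n_quantiles), sorted ascending along the last axis.
    lower_frac: fraction of the lowest quantiles to drop; 0.0 keeps the full distribution.
    """
    n_quantiles = quantiles.shape[-1]
    start = min(int(np.floor(lower_frac * n_quantiles)), n_quantiles - 1)  # keep >= 1 quantile
    return quantiles[..., start:].mean(axis=-1)


def lower_frac_schedule(step: int, anneal_steps: int, start_frac: float = 0.75) -> float:
    """Hypothetical linear schedule: shrink the dropped lower region from start_frac to 0."""
    progress = min(step / anneal_steps, 1.0)
    return start_frac * (1.0 - progress)


def select_actions(per_agent_quantiles, step: int, anneal_steps: int) -> list:
    """Greedy action for each agent; every agent uses the same risk level at a given step."""
    frac = lower_frac_schedule(step, anneal_steps)
    return [int(np.argmax(risk_shifted_value(q, frac))) for q in per_agent_quantiles]
```

For example, an action with quantiles `[-6, 0, 1, 9]` beats a constant `[1.5, 1.5, 1.5, 1.5]` at the start of training (upper-quartile value 9 vs 1.5) but loses once the full mean is used (1.0 vs 1.5), which is the optimistic-then-exploitative shift the abstract describes.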

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2303.01768

Country: Asia > South Korea (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.46)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.41)

GHQ: Grouped Hybrid Q Learning for Heterogeneous Cooperative Multi-agent Reinforcement Learning

Yu, Xiaoyang; Lin, Youfang; Wang, Xiangsen; Han, Sheng; Lv, Kai

arXiv.org Artificial Intelligence, Mar-2-2023

Previous deep multi-agent reinforcement learning (MARL) algorithms have achieved impressive results, typically in homogeneous scenarios. However, heterogeneous scenarios are also very common and are usually harder to solve. In this paper, we focus on cooperative heterogeneous MARL problems in the StarCraft Multi-Agent Challenge (SMAC) environment. We first define and describe the heterogeneous problems in SMAC. To reveal and study the problem comprehensively, we add new maps to the original set of SMAC maps. We find that baseline algorithms fail to perform well on these heterogeneous maps. To address this issue, we propose the Grouped Individual-Global-Max Consistency (GIGM) and a novel MARL algorithm, Grouped Hybrid Q Learning (GHQ). GHQ separates agents into several groups, keeps individual parameters for each group, and uses a novel hybrid structure for factorization. To enhance coordination between groups, we maximize the Inter-group Mutual Information (IGMI) between the groups' trajectories. Experiments on the original and new heterogeneous maps show the strong performance of GHQ compared with other state-of-the-art algorithms.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2303.01070

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Multiagent Inverse Reinforcement Learning via Theory of Mind Reasoning

Wu, Haochen; Sequeira, Pedro; Pynadath, David V.

arXiv.org Artificial Intelligence, Mar-1-2023

We approach the problem of understanding how people interact with each other in collaborative settings, especially when individuals know little about their teammates, via Multiagent Inverse Reinforcement Learning (MIRL), where the goal is to infer the reward functions guiding the behavior of each individual given trajectories of a team's behavior during some task. Unlike current MIRL approaches, we do not assume that team members know each other's goals a priori; rather, we assume that they collaborate by adapting to the goals of others, which they perceive by observing their behavior, all while jointly performing a task. To address this problem, we propose a novel approach to MIRL via Theory of Mind (MIRL-ToM). For each agent, we first use ToM reasoning to estimate a posterior distribution over baseline reward profiles given their demonstrated behavior. We then perform MIRL via decentralized equilibrium by employing single-agent Maximum Entropy IRL to infer a reward function for each agent, where we simulate the behavior of other teammates according to the time-varying distribution over profiles. We evaluate our approach in a simulated two-player search-and-rescue operation in which the agents, playing different roles, must search for and evacuate victims in the environment. Our results show that the choice of baseline profiles is paramount to recovering the ground-truth rewards, and that MIRL-ToM is able to recover the rewards used by agents interacting with both known and unknown teammates.
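
The ToM step, which maintains a posterior over baseline reward profiles given observed behavior, can be sketched as a Bayesian filter under a Boltzmann-rational likelihood. This is an assumed model for illustration only: the per-profile Q-values, the rationality parameter `beta`, and the function names are hypothetical, and the Maximum Entropy IRL stage that follows in the paper is not shown.

```python
import numpy as np


def boltzmann(q_values: np.ndarray, beta: float) -> np.ndarray:
    """Action probabilities of a Boltzmann-rational agent (numerically stable softmax)."""
    z = beta * (q_values - np.max(q_values))
    e = np.exp(z)
    return e / e.sum()


def profile_posterior(q_tables: np.ndarray, observed, prior=None, beta: float = 1.0) -> np.ndarray:
    """Belief over candidate reward profiles, updated after each observed (state, action).

    q_tables: shape (n_profiles, n_states, n_actions); Q-values each profile would induce
              (assumed precomputed, e.g. by value iteration under that profile's reward).
    Returns shape (len(observed) + 1, n_profiles): the prior followed by each posterior,
    i.e. a time-varying distribution over profiles.
    """
    n_profiles = q_tables.shape[0]
    if prior is None:
        belief = np.full(n_profiles, 1.0 / n_profiles)  # uniform prior over profiles
    else:
        belief = np.asarray(prior, dtype=float)
    history = [belief.copy()]
    for state, action in observed:
        # Likelihood of the observed action under each candidate profile.
        likelihood = np.array([boltzmann(q_tables[k, state], beta)[action] for k in range(n_profiles)])
        belief = belief * likelihood
        belief = belief / belief.sum()  # Bayes' rule, renormalized
        history.append(belief.copy())
    return np.array(history)
```

Returning the whole belief history, rather than only the final posterior, matches the abstract's use of a time-varying distribution over profiles when simulating teammates.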

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.5555/3545946.3598703

2302.10238

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Parameter Sharing with Network Pruning for Scalable Multi-Agent Deep Reinforcement Learning

Kim, Woojun; Sung, Youngchul

arXiv.org Artificial Intelligence, Mar-1-2023

Handling scalability is one of the essential issues to resolve before multi-agent reinforcement learning (MARL) algorithms can be applied to real-world problems, which typically involve very large numbers of agents. For this purpose, parameter sharing across agents has been widely used, since it reduces training time by decreasing the number of parameters and increases sample efficiency. However, using the same parameters across agents limits the representational capacity of the joint policy; consequently, performance can degrade in multi-agent tasks that require different behaviors from different agents. In this paper, we propose a simple method that applies structured pruning to a deep neural network to increase the representational capacity of the joint policy without introducing additional parameters. We evaluate the proposed method on several benchmark tasks, and the numerical results show that it significantly outperforms other parameter-sharing methods.
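
A rough illustration of the idea, assuming one shared hidden layer and agent-specific binary masks that prune whole hidden neurons (structured pruning): every agent uses the same weights, but each sees a different sub-network, and the fixed masks add no trainable parameters. How the paper actually chooses its masks is not stated in the abstract; the random masks, `keep_frac`, and the class name are hypothetical.

```python
import numpy as np


class MaskedSharedMLP:
    """Shared two-layer network; each agent applies a fixed binary mask over hidden neurons.

    Pruning whole neurons (rather than individual weights) is what makes it "structured".
    The random masks and keep_frac are illustrative assumptions, not the paper's criterion.
    """

    def __init__(self, n_agents: int, obs_dim: int, hidden_dim: int, n_actions: int,
                 keep_frac: float = 0.5, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, size=(obs_dim, hidden_dim))    # shared by all agents
        self.w2 = rng.normal(0.0, 0.1, size=(hidden_dim, n_actions))  # shared by all agents
        n_keep = max(1, int(keep_frac * hidden_dim))
        self.masks = np.zeros((n_agents, hidden_dim))                 # fixed, not trained
        for i in range(n_agents):
            kept = rng.choice(hidden_dim, size=n_keep, replace=False)
            self.masks[i, kept] = 1.0

    def forward(self, agent_id: int, obs: np.ndarray) -> np.ndarray:
        hidden = np.maximum(obs @ self.w1, 0.0)   # ReLU
        hidden = hidden * self.masks[agent_id]    # switch off this agent's pruned neurons
        return hidden @ self.w2                   # per-action outputs (e.g. Q-values or logits)
```

The trainable parameter count is the same as plain parameter sharing (`w1` and `w2` only), while different masks let agents respond differently to the same observation.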

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2303.00912

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

From Smart Sensing to Consciousness: An info-structural model of computational consciousness for non-interacting agents

Iovane, Gerardo; Landi, Riccardo Emanuele

arXiv.org Artificial Intelligence, Mar-1-2023

This study proposes a model of computational consciousness for non-interacting agents. The phenomenon of interest was assumed to depend sequentially on the cognitive tasks of sensation, perception, emotion, affection, attention, awareness, and consciousness. Building on the prodromal Smart Sensing study, the cognitive layers associated with attention, awareness, and consciousness were formally defined and tested together with the other processes of sensation, perception, emotion, and affection. The output of the model is an index that synthesizes the energetic and entropic contributions of consciousness from a computationally moral perspective. Attention was modeled with a bottom-up approach, while awareness and consciousness were modeled by distinguishing the environment from subjective cognitive processes. Testing the solution on visual stimuli eliciting happiness, anger, fear, surprise, contempt, sadness, disgust, and the neutral state showed that the proposed model is consistent with the scientific evidence on covert attention. Comparable results were obtained with respect to studies investigating awareness as a consequence of repeated visual stimuli, as well as studies investigating moral judgments of visual stimuli eliciting disgust and sadness. The solution represents a novel approach to defining computational consciousness through artificial emotional activity and morality.

artificial intelligence, consciousness, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2209.02414

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.94)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.60)

Ask and You Shall be Served: Representing and Solving Multi-agent Optimization Problems with Service Requesters and Providers

Lavie, Maya; Caspi, Tehila; Lev, Omer; Zivan, Roei

arXiv.org Artificial Intelligence, Feb-28-2023

In scenarios with numerous emergencies that arise and require the assistance of various rescue units (e.g., medical, fire, and police forces), the rescue units should ideally be allocated quickly and in a distributed manner while aiming to minimize casualties. This is one of many examples of distributed settings with service providers (the rescue units) and service requesters (the emergencies), which we term service-oriented settings. Allocating the service providers in a distributed manner while aiming for a global optimum is hard to model, let alone achieve, using the existing Distributed Constraint Optimization Problem (DCOP) framework. Hence, a novel approach and corresponding algorithms are needed. We present the Service Oriented Multi-Agent Optimization Problem (SOMAOP), a new framework that overcomes the shortcomings of DCOP in service-oriented settings. We evaluate the framework using various algorithms based on auctions and matching (e.g., Gale-Shapley). We empirically show that algorithms based on repeated auctions converge to a high-quality solution very fast, while repeated matching algorithms converge more slowly but produce higher-quality solutions. We demonstrate the advantages of our approach over standard incomplete DCOP algorithms and a greedy centralized algorithm.
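
The matching family mentioned above includes Gale-Shapley deferred acceptance. The sketch below is the textbook one-shot, one-to-one, centralized version, assuming complete preference lists on both sides of equal size; the SOMAOP algorithms evaluated in the paper are distributed and repeated, so this only illustrates the matching primitive. The unit and emergency names are hypothetical.

```python
def gale_shapley(proposer_prefs: dict, receiver_prefs: dict) -> dict:
    """Deferred acceptance (Gale-Shapley) for a one-to-one matching.

    Assumes both sides have the same size and complete preference lists (best first).
    Returns a dict mapping each receiver to its matched proposer.
    """
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to propose to
    free = list(proposer_prefs)
    matched = {}
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = matched.get(r)
        if current is None:
            matched[r] = p                 # receiver was free: tentatively accept
        elif rank[r][p] < rank[r][current]:
            matched[r] = p                 # receiver prefers the new proposer
            free.append(current)           # previous partner becomes free again
        else:
            free.append(p)                 # rejected: p will try its next choice
    return matched
```

With rescue units proposing, `gale_shapley({"medic": ["E1", "E2"], "fire": ["E1", "E2"]}, {"E1": ["fire", "medic"], "E2": ["medic", "fire"]})` assigns `fire` to `E1` and `medic` to `E2`: both units want `E1`, and `E1` prefers `fire`.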

algorithm, artificial intelligence, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2302.14507

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Israel (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Planning-Assisted Context-Sensitive Autonomous Shepherding of Dispersed Robotic Swarms in Obstacle-Cluttered Environments

Liu, Jing; Singh, Hemant; Elsayed, Saber; Hunjet, Robert; Abbass, Hussein

arXiv.org Artificial Intelligence, Feb-27-2023

Robotic shepherding is a bio-inspired approach to autonomously guiding a swarm of agents towards a desired location. The research area has attracted increasing interest recently because it enables a large number of agents in a swarm (sheep) to be controlled by a smaller number of actuators (sheepdogs). However, shepherding a highly dispersed swarm in an obstacle-cluttered environment remains challenging for existing methods. To improve the efficacy of shepherding in complex environments with obstacles and dispersed sheep, this paper proposes a planning-assisted, context-sensitive autonomous shepherding framework with collision-avoidance abilities. The proposed approach models the swarm shepherding problem as a single Travelling Salesperson Problem (TSP), with two sheepdog modes: no-interaction and interaction. An adaptive switching approach is integrated into the framework to guide real-time path planning for avoiding collisions with static and dynamic obstacles, the latter being moving sheep swarms. We then propose an overarching hierarchical mission planning system, which consists of three sub-systems: a clustering approach to group and distinguish sheep sub-swarms, an Ant Colony Optimisation algorithm as a TSP solver for determining the optimal herding sequence of the sub-swarms, and an online path planner for calculating optimal paths for both sheepdogs and sheep. Experiments in various environments, both with and without obstacles, objectively demonstrate the effectiveness of the proposed shepherding framework and planning approaches.
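
For the TSP stage, a basic Ant System (the classic ant colony optimization variant) can order sub-swarm positions into a tour. This is a generic sketch, not the paper's solver: it assumes each sub-swarm is reduced to one distinct 2-D point, that the herding sequence is a closed tour, and that the parameter values (`alpha`, `beta`, `rho`, number of ants and iterations) are illustrative.

```python
import numpy as np


def aco_tsp(points, n_ants=20, n_iters=100, alpha=1.0, beta=3.0, rho=0.5, q=1.0, seed=0):
    """Basic Ant System for a closed TSP tour over distinct 2-D points.

    Returns (best_tour, best_length), where best_tour is a list of point indices.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    heuristic = 1.0 / (dist + np.eye(n))  # visibility 1/d; eye avoids 1/0 on the diagonal
    pheromone = np.ones((n, n))
    best_tour, best_len = None, np.inf

    def tour_length(tour):
        return sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))

    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [int(rng.integers(n))]          # random start city
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = np.array(sorted(unvisited))
                # Transition probability ~ pheromone^alpha * visibility^beta.
                weights = pheromone[i, cand] ** alpha * heuristic[i, cand] ** beta
                nxt = int(rng.choice(cand, p=weights / weights.sum()))
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)
        pheromone *= (1.0 - rho)                   # evaporation
        for tour in tours:
            length = tour_length(tour)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):                     # deposit proportional to tour quality
                a, b = tour[k], tour[(k + 1) % n]
                pheromone[a, b] += q / length
                pheromone[b, a] += q / length
    return best_tour, best_len
```

On the four corners of a unit square, the best tour found visits the corners in perimeter order with length 4.0.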

evolutionary algorithm, machine learning, sheepdog, (17 more...)

arXiv.org Artificial Intelligence

2301.10363

Country:

Oceania > Australia > New South Wales (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Transportation (0.66)
Food & Agriculture > Agriculture (0.46)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

PACCART: Reinforcing Trust in Multiuser Privacy Agreement Systems

Di Scala, Daan; Yolum, Pınar

arXiv.org Artificial Intelligence, Feb-27-2023

Collaborative systems, such as Online Social Networks and the Internet of Things, enable users to share privacy-sensitive content. Content in these systems is often co-owned by multiple users with different privacy expectations, leading to possible multiuser privacy conflicts. To resolve these conflicts, various agreement mechanisms have been designed, and agents that could participate in such mechanisms have been proposed. However, research shows that users hesitate to use software tools for managing their privacy. To remedy this, we argue that users should be supported by trustworthy agents that adhere to the following criteria: (i) concealment of privacy preferences, such that only necessary information is shared with others; (ii) equity of treatment, such that different kinds of users are supported equally; (iii) collaboration of users, such that a group of users can support each other in reaching agreement; and (iv) explainability of actions, such that users know why certain information about them was shared to reach a decision. Accordingly, this paper proposes PACCART, an open-source agent that satisfies these criteria. Our experiments with simulations and a user study indicate that PACCART significantly increases user trust.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2302.13650

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.94)
Information Technology > Services (0.66)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Communications > Networks (0.88)
(4 more...)

Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning

Mitra, Aritra; Pappas, George J.; Hassani, Hamed

arXiv.org Artificial Intelligence, Feb-26-2023

In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
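
A small sketch of the mechanism, assuming tabular (one-hot) features and top-k sparsification as the compression operator: the TD(0) step is added to an error-feedback memory, only the compressed part is applied (the part that would be communicated), and the remainder is carried to the next step. This is single-agent and uses hypothetical function names; it is not the paper's multi-agent algorithm or its analysis.

```python
import numpy as np


def top_k(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest-magnitude entries of v and zero the rest (one common compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def compressed_td0(transitions, features: np.ndarray, gamma: float = 0.9,
                   alpha: float = 0.1, k: int = 1) -> np.ndarray:
    """TD(0) with linear features, top-k compressed updates, and error feedback.

    transitions: iterable of (state, reward, next_state) index triples.
    features: shape (n_states, dim); row s is the feature vector phi(s).
    """
    theta = np.zeros(features.shape[1])
    memory = np.zeros_like(theta)                 # error-feedback memory of dropped mass
    for s, r, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        td_error = r + gamma * phi_next @ theta - phi @ theta
        step = memory + alpha * td_error * phi    # re-inject what earlier compression dropped
        sent = top_k(step, k)                     # only this part is applied ("communicated")
        memory = step - sent                      # remember the rest for later
        theta = theta + sent
    return theta
```

On a two-state loop with reward 1 for leaving state 0 and γ = 0.9, repeating the transitions `(0, 1.0, 1)` and `(1, 0.0, 0)` drives the estimates close to the true values 1/(1 - γ²) ≈ 5.26 and γ/(1 - γ²) ≈ 4.74, even with k = 1.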

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2301.00944

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Order Matters: Agent-by-agent Policy Optimization

Wang, Xihuai; Tian, Zheng; Wan, Ziyu; Wen, Ying; Wang, Jun; Zhang, Weinan

arXiv.org Artificial Intelligence, Feb-26-2023

While multi-agent trust region algorithms have achieved great empirical success in solving coordination tasks, most of them suffer from a non-stationarity problem because agents update their policies simultaneously. In contrast, a sequential scheme that updates policies agent by agent offers another perspective and shows strong performance. However, sample inefficiency and the lack of monotonic improvement guarantees for each agent remain two significant challenges for the sequential scheme. In this paper, we propose the Agent-by-agent Policy Optimization (A2PO) algorithm to improve sample efficiency and retain the guarantee of monotonic improvement for each agent during training. We justify the tightness of the monotonic improvement bound compared with other trust region algorithms. From the perspective of sequentially updating agents, we further consider the effect of the agent updating order and extend the theory of non-stationarity to the sequential update scheme. To evaluate A2PO, we conduct a comprehensive empirical study on four benchmarks: StarCraft II, Multi-agent MuJoCo, Multi-agent Particle Environment, and Google Research Football full-game scenarios. A2PO consistently outperforms strong baselines.

agent, artificial intelligence, step 10 7, (17 more...)

arXiv.org Artificial Intelligence

2302.06205

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)
(12 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Sports > Soccer (0.46)
Leisure & Entertainment > Sports > Football (0.45)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)