Agents
Recovering network topology and dynamics via sequence characterization
Guerreiro, Lucas, Silva, Filipi N., Amancio, Diego R.
Sequences arise in many real-world scenarios; thus, identifying the mechanisms behind symbol generation is essential to understanding many complex systems. This paper analyzes sequences generated by agents walking on a networked topology. Given that in many real scenarios the underlying process generating the sequence is hidden, we investigate whether the reconstruction of the network via the co-occurrence method is useful to recover both the network topology and the agent dynamics generating the sequences. We found that the characterization of reconstructed networks provides valuable information regarding the process and topology used to create the sequences. In a machine learning approach considering 16 combinations of network topology and agent dynamics as classes, we obtained an accuracy of 87% with sequences generated with fewer than 40% of nodes visited. Longer sequences turned out to generate improved machine learning models. Our findings suggest that the proposed methodology could be extended to classify sequences and understand the mechanisms behind sequence generation.
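The co-occurrence reconstruction mentioned in this abstract can be illustrated with a minimal sketch. The function names (`random_walk`, `cooccurrence_network`), the `window` parameter, the 6-node ring, and the unbiased walk are all assumptions chosen for illustration; the abstract does not specify the paper's topologies, agent dynamics, or window size.

```python
import random
from collections import defaultdict

def random_walk(adjacency, start, steps, seed=0):
    """Unbiased random walk (an assumed agent dynamic) that emits visited nodes."""
    rng = random.Random(seed)
    node, walk = start, [start]
    for _ in range(steps):
        node = rng.choice(adjacency[node])
        walk.append(node)
    return walk

def cooccurrence_network(sequence, window=1):
    """Link symbols that appear within `window` positions of each other.

    Returns a dict mapping an undirected edge (frozenset of two symbols)
    to the number of times the pair co-occurred.
    """
    weights = defaultdict(int)
    for i, a in enumerate(sequence):
        for b in sequence[i + 1 : i + 1 + window]:
            if a != b:
                weights[frozenset((a, b))] += 1
    return dict(weights)

# Hypothetical ground truth: a ring of 6 nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
true_edges = {frozenset((i, j)) for i, nbrs in ring.items() for j in nbrs}

walk = random_walk(ring, start=0, steps=50)
recovered = cooccurrence_network(walk, window=1)

# With window=1, every recovered edge is an edge the walker actually traversed.
print(sorted(tuple(sorted(e)) for e in recovered))
```

With `window=1` the recovered edges are always a subset of the true edges; larger windows add links between nodes that are only close in the walk, which is one reason the reconstructed network carries information about both the topology and the dynamics.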
On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning
Mondal, Washim Uddin, Aggarwal, Vaneet, Ukkusuri, Satish V.
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies. Specifically, we prove that, if $|\mathcal{X}|, |\mathcal{U}|$ denote the sizes of the state and action spaces of individual agents, then for a sufficiently small discount factor, the approximation error is given by $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]$. Moreover, in a special case where the reward and state transition functions are independent of the action distribution of the population, the error improves to $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\sqrt{|\mathcal{X}|}$. Finally, we also devise an algorithm to explicitly construct a local policy. With the help of our approximation results, we further establish that the constructed local policy is within $\mathcal{O}(\max\{e,\epsilon\})$ distance of the optimal policy, and the sample complexity to achieve such a local policy is $\mathcal{O}(\epsilon^{-3})$, for any $\epsilon>0$.
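To get a feel for how the quantity $e$ in this abstract scales, the sketch below simply evaluates the two stated expressions. The function name and arguments are illustrative assumptions, and the bound is $\mathcal{O}(e)$, so the hidden constant is not captured here.

```python
import math

def approximation_error(n_agents, n_states, n_actions, action_independent=False):
    """Evaluate e from the abstract (up to the constant hidden in the O-notation).

    General case:  e = (sqrt(|X|) + sqrt(|U|)) / sqrt(N)
    Special case:  e = sqrt(|X|) / sqrt(N), when rewards and transitions do
                   not depend on the population's action distribution.
    """
    term = math.sqrt(n_states)
    if not action_independent:
        term += math.sqrt(n_actions)
    return term / math.sqrt(n_agents)

# Quadrupling the number of agents halves e.
for n in (100, 400, 1600):
    print(n, approximation_error(n, n_states=4, n_actions=9))
```

The loop shows the $1/\sqrt{N}$ decay: each fourfold increase in the number of agents halves the value of $e$.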
Robust Event-Driven Interactions in Cooperative Multi-Agent Learning
Ornia, Daniel Jarne, Mazo, Manuel Jr
Lately, with the wide adoption of Deep Learning techniques for compact representations of value functions and policies in model-free problems [16, 21, 34], the field of Multi-Agent Reinforcement Learning (MARL) has seen an explosion in the applications of such algorithms to solve real-world problems [19]. However, this has naturally led to a trend where both the amount of data handled in such data-driven approaches and the complexity of the targeted problems grow exponentially. In a MARL setting where communication between agents is required, this may inevitably lead to restrictive requirements on the frequency and reliability of the communication to and from each agent (as was already pointed out in [23]). The effect of asynchronous communication in dynamic programming problems was studied already in [2]. In particular, one of the first examples of how communication affects learning and policy performance in MARL is found in [31], where the author investigates the impact of agents sharing different combinations of state variable subsets or Q values.
Responsibility: An Example-based Explainable AI approach via Training Process Inspection
Khadivpour, Faraz, Banerjee, Arghasree, Guzdial, Matthew
Explainable Artificial Intelligence (XAI) methods are intended to help human users better understand the decision making of an AI agent. However, many modern XAI approaches are unintuitive to end users, particularly those without prior AI or ML knowledge. In this paper, we present a novel XAI approach we call Responsibility that identifies the most responsible training example for a particular decision. This example can then be shown as an explanation: "this is what I (the AI) learned that led me to do that". We present experimental results across a number of domains along with the results of an Amazon Mechanical Turk user study, comparing responsibility and existing XAI methods on an image classification task. Our results demonstrate that responsibility can help improve accuracy for both human end users and secondary ML models.
Obtaining Robust Control and Navigation Policies for Multi-Robot Navigation via Deep Reinforcement Learning
Jestel, Christian, Surmann, Hartmut, Stenzel, Jonas, Urbann, Oliver, Brehler, Marius
Multi-robot navigation is one of the main challenges in mobile robotics. Multiple robots must be coordinated simultaneously to finish their task and have to navigate through a complex dynamic environment without causing collisions. One approach to enable the coordination of multi-robot navigation is prioritized planning, where robots plan their trajectories sequentially one after another. Prioritized planning algorithms tend to find a deadlock-free solution for route planning, and centralized as well as decentralized planning solutions exist [1]. With a centralized approach all robots are coordinated by a single system, whereas navigation conflicts are resolved via communication between the robots in decentralized approaches. Prioritized path planning approaches tend to find solutions for scenarios with a high number of robots, while other approaches or reactive collision avoidance algorithms like ORCA [2] fail. However, the main drawback of centralized approaches is their poor scalability, as the planning complexity increases drastically with the number of robots and the size and complexity of the environment [3]. Additionally, reliable and synchronized communication between the centralized planner and all robots is essential. Decentralized approaches often rely on communication between robots in order to share state information (e.g.
On Decentralizing Federated Reinforcement Learning in Multi-Robot Scenarios
Nair, Jayprakash S., Kulkarni, Divya D., Joshi, Ajitem, Suresh, Sruthy
Federated Learning (FL) allows for collaboratively aggregating learned information across several computing devices and sharing the same amongst them, thereby tackling issues of privacy and the need for huge bandwidth. FL techniques generally use a central server or cloud for aggregating the models received from the devices. Such centralized FL techniques suffer from inherent problems such as failure of the central node and bottlenecks in channel bandwidth. When FL is used in conjunction with connected robots serving as devices, a failure of the central controlling entity can lead to a chaotic situation. This paper describes a mobile agent based paradigm to decentralize FL in multi-robot scenarios. Using Webots, a popular free open-source robot simulator, and Tartarus, a mobile agent platform, we present a methodology to decentralize federated learning in a set of connected robots. With Webots running on different connected computing systems, we show how mobile agents can perform the task of Decentralized Federated Reinforcement Learning (dFRL). Results obtained from experiments carried out using Q-learning and SARSA, by aggregating their corresponding Q-tables, show the viability of using decentralized FL in the domain of robotics. Since the proposed work can be used in conjunction with other learning algorithms and also real robots, it can act as a vital tool for the study of decentralized FL using heterogeneous learning algorithms concurrently in multi-robot scenarios.
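The abstract mentions aggregating Q-tables but does not say how. As a minimal sketch, the example below assumes a plain element-wise mean over equally shaped tables; the function name `aggregate_q_tables` and the toy tables are hypothetical, and the paper's mobile-agent mechanism (Tartarus, Webots) is not modeled.

```python
def aggregate_q_tables(q_tables):
    """Element-wise mean of equally shaped Q-tables.

    Each table is a list of rows (one per state), each row a list of
    action values. Plain averaging is an assumption made for illustration.
    """
    if not q_tables:
        raise ValueError("need at least one Q-table")
    n = len(q_tables)
    n_states, n_actions = len(q_tables[0]), len(q_tables[0][0])
    return [
        [sum(q[s][a] for q in q_tables) / n for a in range(n_actions)]
        for s in range(n_states)
    ]

# Two hypothetical robots with 2 states and 2 actions each.
robot_a = [[0.0, 1.0], [2.0, 0.0]]
robot_b = [[1.0, 0.0], [0.0, 2.0]]
shared = aggregate_q_tables([robot_a, robot_b])
print(shared)
```

In a decentralized setting, an aggregation step like this would run wherever the tables meet rather than on a central server, which is the failure point the abstract aims to remove.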
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Zheng, Kaizhi, Zhou, Kaiwen, Gu, Jing, Fan, Yue, Wang, Jialu, Di, Zonglin, He, Xuehai, Wang, Xin Eric
Building a conversational embodied agent to execute real-life tasks has been a long-standing yet quite challenging research goal, as it requires effective human-agent communication, multi-modal understanding, long-range sequential decision making, etc. Traditional symbolic methods have scaling and generalization issues, while end-to-end deep learning models suffer from data scarcity and high task complexity, and are often hard to explain. To benefit from both worlds, we propose JARVIS, a neuro-symbolic commonsense reasoning framework for modular, generalizable, and interpretable conversational embodied agents. First, it acquires symbolic representations by prompting large language models (LLMs) for language understanding and sub-goal planning, and by constructing semantic maps from visual observations. Then the symbolic module reasons for sub-goal planning and action generation based on task- and action-level common sense. Extensive experiments on the TEACh dataset validate the efficacy and efficiency of our JARVIS framework, which achieves state-of-the-art (SOTA) results on all three dialog-based embodied tasks, including Execution from Dialog History (EDH), Trajectory from Dialog (TfD), and Two-Agent Task Completion (TATC) (e.g., our method boosts the unseen Success Rate on EDH from 6.1% to 15.8%). Moreover, we systematically analyze the essential factors that affect the task performance and also demonstrate the superiority of our method in few-shot settings. Our JARVIS model ranks first in the Alexa Prize SimBot Public Benchmark Challenge.
Forget chess, DeepMind's training its new AI to play football
Researchers from DeepMind, the UK's juggernaut AI lab, have forsaken the noble games of chess and Go for a more plebeian delight: football. The Google sister company yesterday published a research paper and accompanying blog post detailing its new neural probabilistic motor primitives (NPMP) -- a method by which artificial intelligence agents can learn to operate physical bodies. An NPMP is a general-purpose motor control module that translates short-horizon motor intentions to low-level control signals, and it's trained offline or via RL by imitating motion capture (MoCap) data, recorded with trackers on humans or animals performing motions of interest. Up front: Essentially, the DeepMind team created an AI system that can learn how to do things inside of a physics simulator by watching videos of other agents performing those tasks. And, of course, if you've got a giant physics engine and an endless supply of curious robots, the only rational thing to do is to teach it how to dribble and shoot: We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. Background: In order to train AI to operate and control robots in the world, researchers have to prepare the machines for reality.
KT-BT: A Framework for Knowledge Transfer Through Behavior Trees in Multi-Robot Systems
Venkata, Sanjay Sarma Oruganti, Parasuraman, Ramviyas, Pidaparti, Ramana
Multi-Robot and Multi-Agent Systems demonstrate collective (swarm) intelligence through systematic and distributed integration of local behaviors in a group. Agents sharing knowledge about the mission and environment can enhance performance at individual and mission levels. However, this is difficult to achieve, partly due to the lack of a generic framework for transferring part of the known knowledge (behaviors) between agents. This paper presents a new knowledge representation framework and a transfer strategy called KT-BT: Knowledge Transfer through Behavior Trees. The KT-BT framework follows a query-response-update mechanism through an online Behavior Tree framework, where agents broadcast queries for unknown conditions and respond with appropriate knowledge using a condition-action-control sub-flow. We embed a novel grammar structure called stringBT that encodes knowledge, enabling behavior sharing. We theoretically investigate the properties of the KT-BT framework in achieving homogeneity of high knowledge across the entire group compared to a heterogeneous system without the capability of sharing their knowledge. We extensively verify our framework in a simulated multi-robot search and rescue problem. The results show successful knowledge transfers and improved group performance in various scenarios. We further study the effects of opportunities and communication range on group performance, knowledge spread, and functional heterogeneity in a group of agents, presenting interesting insights.
A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
Wang, Zifan, Shen, Yi, Bell, Zachary I., Nivison, Scott, Zavlanos, Michael M., Johansson, Karl H.
We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
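For readers unfamiliar with CVaR, the sketch below computes a simple empirical estimate from a batch of observed costs: the mean of the worst $(1-\alpha)$ fraction. This is a textbook-style estimator assumed here for illustration; it is not the momentum-based bandit estimator the abstract proposes, and the function name and sample costs are hypothetical.

```python
import math

def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of observed costs.

    alpha = 0 gives the plain average; alpha close to 1 focuses on the
    most extreme costs. Assumes higher cost is worse.
    """
    if not costs:
        raise ValueError("need at least one cost sample")
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1)")
    # round() guards against float noise such as (1 - 0.7) * 10 = 3.0000000000000004
    k = max(1, math.ceil(round((1 - alpha) * len(costs), 9)))
    tail = sorted(costs, reverse=True)[:k]
    return sum(tail) / k

costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(empirical_cvar(costs, alpha=0.0))  # plain mean of all costs
print(empirical_cvar(costs, alpha=0.8))  # mean of the two largest costs
```

A risk-averse agent minimizing this quantity cares about the tail of its cost distribution rather than the average, which is the objective the abstract describes.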