Agents
Emergence of Theory of Mind Collaboration in Multiagent Systems
Yuan, Luyao, Fu, Zipeng, Zhou, Linqi, Yang, Kexin, Zhu, Song-Chun
Currently, studies of multiagent systems usually ignore the intentions of agents. Nonetheless, as pointed out by Theory of Mind (ToM), people regularly reason about others' mental states, including beliefs, goals, and intentions, to gain a performance advantage in competition, cooperation, or coalition. However, because of its intrinsic recursion and the intractability of modeling distributions over beliefs, integrating ToM into multiagent planning and decision making remains a challenge. In this paper, we incorporate ToM into the multiagent partially observable Markov decision process (POMDP) and propose an adaptive training algorithm to develop effective collaboration between agents with ToM. We evaluate our algorithm on two games, where it surpasses all previous decentralized-execution algorithms that do not model ToM.
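To make "reasoning about another agent's goals" concrete, here is a minimal first-order sketch of goal inference with Bayes' rule. It is not the paper's algorithm: the goal set, the softmax (Boltzmann-rational) action model, the rationality parameter `beta`, and the function names are hypothetical assumptions chosen for illustration.

```python
import math

def boltzmann_likelihood(action_values, action, beta=2.0):
    """P(action | goal) under an assumed softmax (Boltzmann-rational) action model.

    action_values maps each action to its value under one hypothesised goal.
    """
    m = max(action_values.values())  # subtract the max for numerical stability
    exps = {a: math.exp(beta * (v - m)) for a, v in action_values.items()}
    return exps[action] / sum(exps.values())

def update_goal_belief(belief, q_by_goal, observed_action, beta=2.0):
    """One Bayes step over the partner's goals: b'(g) is proportional to P(a | g) * b(g)."""
    posterior = {
        g: boltzmann_likelihood(q_by_goal[g], observed_action, beta) * p
        for g, p in belief.items()
    }
    z = sum(posterior.values())
    return {g: p / z for g, p in posterior.items()}
```

With a uniform prior over two goals, observing an action that is preferred under goal "A" shifts the belief toward "A". Higher-order ToM would nest this kind of inference recursively (reasoning about the partner's belief about oneself), which is where the tractability issues mentioned above arise.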
Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines
Hu, Jueming, Xu, Zhe, Wang, Weichang, Qu, Guannan, Pang, Yutian, Liu, Yongming
In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex, temporally extended tasks. The difficulties lie in the computational complexity and in learning the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP), in which the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each agent's task and expose the internal structure of the reward function. An RM can describe high-level knowledge and encode non-Markovian reward functions. To tackle the computational complexity, we propose a decentralized learning algorithm called decentralized graph-based reinforcement learning using reward machines (DGRM), which equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to them. DGRM uses the actor-critic structure, and we introduce the tabular Q-function for discrete state problems. We show that the dependence of the Q-function on other agents decreases exponentially as the distance between them increases. Furthermore, the complexity of DGRM is related to the local information size of the largest $\kappa$-hop neighborhood, and DGRM can find an $O(\rho^{\kappa+1})$-approximation of a stationary point of the objective function. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and policy function for large-scale or continuous state problems. The effectiveness of DGRM is evaluated on two case studies, UAV package delivery and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RMs. In the COVID-19 pandemic mitigation case, DGRM improves the global accumulated reward by 119% compared to the baseline.
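The two structural ideas above, a reward machine that encodes a non-Markovian task and the $\kappa$-hop neighborhood that bounds each agent's local information, can be sketched as follows. This is an illustrative sketch, not DGRM itself; the class and function names, the event labels (`"pickup"`, `"deliver"`), and the adjacency-list graph format are hypothetical.

```python
from collections import deque

class RewardMachine:
    """Minimal reward machine: a finite-state machine whose transitions are
    triggered by high-level events and emit rewards. Because the RM state
    remembers task progress, a reward can depend on history, i.e. it is
    non-Markovian in the environment state alone."""

    def __init__(self, initial_state, transitions):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.initial_state = initial_state
        self.transitions = transitions

    def step(self, rm_state, event):
        # Events with no matching transition leave the RM state unchanged and give 0 reward.
        return self.transitions.get((rm_state, event), (rm_state, 0.0))

    def run(self, events):
        """Feed a sequence of events and return (final RM state, total reward)."""
        state, total = self.initial_state, 0.0
        for event in events:
            state, reward = self.step(state, event)
            total += reward
        return state, total

def k_hop_neighborhood(adjacency, agent, k):
    """Return every agent within k hops of `agent` (including itself), via BFS."""
    seen = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen
```

For a toy delivery task, `RewardMachine("u0", {("u0", "pickup"): ("u1", 0.0), ("u1", "deliver"): ("u2", 1.0)})` rewards "deliver" only after "pickup", so `run(["deliver", "pickup", "deliver"])` ends in `("u2", 1.0)`. On a line graph of four agents, agent 0's 1-hop neighborhood is `{0, 1}`; a localized policy or Q-function would only condition on agents in such a set.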
Modeling Interactions of Autonomous Vehicles and Pedestrians with Deep Multi-Agent Reinforcement Learning for Collision Avoidance
Trumpp, Raphael, Bayerlein, Harald, Gesbert, David
Reliable pedestrian crash avoidance mitigation (PCAM) systems are crucial components of safe autonomous vehicles (AVs). The sequential nature of the vehicle-pedestrian interaction, in which the immediate decisions of one agent directly influence the subsequent decisions of the other, is an important but often neglected aspect. In this work, we model the corresponding interaction sequence as a Markov decision process (MDP) that is solved by deep reinforcement learning (DRL) algorithms to define the PCAM system's policy. In the simulated driving scenario, an AV acting as a DRL agent drives along an urban street and encounters a pedestrian trying to cross at an unmarked crosswalk. Since modeling realistic pedestrian crossing behavior is challenging, we introduce two levels of intelligent pedestrian behavior: the baseline model follows a predefined strategy, while our advanced model captures continuous learning and the inherent uncertainty of human behavior by defining the pedestrian as a second DRL agent, which turns the task into a deep multi-agent reinforcement learning (DMARL) problem. The presented PCAM system, with its different levels of intelligent pedestrian behavior, is benchmarked by the agents' collision rate and the resulting traffic flow efficiency. In this analysis, we focus on evaluating the influence of observation noise on the agents' decision making. The results show that the AV is able to completely mitigate collisions under the majority of the investigated conditions and that the DRL-based pedestrian model indeed learns a more human-like crossing behavior.
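The coupled interaction and the role of observation noise can be illustrated with a toy simulation step and a noisy observation function. Everything here is a hypothetical simplification, not the paper's environment: 1-D vehicle kinematics, the crosswalk position, the lane bounds, the vehicle half-length, and zero-mean Gaussian noise added to every state variable are all assumptions. In a DRL setup, the AV's policy would choose `av_accel` and the pedestrian's policy would choose `ped_speed` from such noisy observations.

```python
import random
from dataclasses import dataclass

@dataclass
class CrossingState:
    av_x: float   # AV position along the street (m)
    av_v: float   # AV speed (m/s)
    ped_y: float  # pedestrian position across the street (m)

def step(state, av_accel, ped_speed, dt=0.1):
    """Advance both agents by one time step. Each agent's action changes the
    state the other agent observes next, which makes the interaction sequential."""
    v = max(0.0, state.av_v + av_accel * dt)  # the AV cannot drive backwards
    return CrossingState(state.av_x + v * dt, v, state.ped_y + ped_speed * dt)

def observe(state, noise_std, rng):
    """Return a noisy copy of the state (hypothetical Gaussian sensor model)."""
    return CrossingState(
        state.av_x + rng.gauss(0.0, noise_std),
        state.av_v + rng.gauss(0.0, noise_std),
        state.ped_y + rng.gauss(0.0, noise_std),
    )

def collided(state, crosswalk_x=50.0, lane=(1.0, 4.0), half_len=2.5):
    """Hypothetical collision test: the AV overlaps the crosswalk while the
    pedestrian is inside the AV's lane."""
    at_crosswalk = abs(state.av_x - crosswalk_x) <= half_len
    in_lane = lane[0] <= state.ped_y <= lane[1]
    return at_crosswalk and in_lane
```

Sweeping `noise_std` while counting how often `collided` fires over many episodes is one simple way to study how sensing noise affects collision rate.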
Subdimensional Expansion Using Attention-Based Learning For Multi-Agent Path Finding
Virmani, Lakshay, Ren, Zhongqiang, Rathinam, Sivakumar, Choset, Howie
Multi-Agent Path Finding (MAPF) finds conflict-free paths for multiple agents from their respective start locations to their goal locations. MAPF is challenging because the joint configuration space grows exponentially with the number of agents. Among MAPF planners, search-based methods such as CBS and M* effectively bypass the curse of dimensionality by employing a dynamically coupled strategy: agents are first planned in a fully decoupled manner, ignoring potential conflicts between them; then agents either follow their individual plans or are coupled together for planning to resolve the conflicts between them. In general, the number of conflicts to be resolved determines the run time of these planners, and most existing work focuses on how to resolve these conflicts efficiently. In this work, we take a different view and aim to reduce the number of conflicts (and thus improve overall search efficiency) by improving each agent's individual plan. By leveraging a Visual Transformer, we develop a learning-based single-agent planner that plans for a single agent while paying attention to both the structure of the map and the other agents with which conflicts may occur. We then develop a novel multi-agent planner called LM* by integrating this learning-based single-agent planner with M*. Our results show that, on both "seen" and "unseen" maps, LM* has fewer conflicts to resolve than M* and thus runs faster and achieves higher success rates. We empirically show that the MAPF solutions computed by LM* are near-optimal. Our code is available at https://github.com/lakshayvirmani/learning-assisted-mstar .
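Since the argument hinges on the number of conflicts between individually planned paths, here is a sketch of how such conflicts are commonly counted on a grid: a vertex conflict (two agents in the same cell at the same time) and an edge conflict (two agents swapping cells between consecutive steps). This is a generic illustration, not code from LM* or the linked repository; the path format (one grid cell per time step) and the convention that agents wait at their goal after their path ends are assumptions.

```python
from itertools import combinations

def position(path, t):
    """Cell occupied at time t; agents are assumed to wait at their goal afterwards."""
    return path[min(t, len(path) - 1)]

def find_conflicts(paths):
    """Return vertex and edge (swap) conflicts between individually planned paths.

    paths: dict mapping agent id -> list of grid cells, one per time step.
    """
    conflicts = []
    horizon = max(len(p) for p in paths.values())
    for a, b in combinations(sorted(paths), 2):
        for t in range(horizon):
            pa, pb = position(paths[a], t), position(paths[b], t)
            if pa == pb:
                conflicts.append(("vertex", a, b, pa, t))
            if t + 1 < horizon:
                na, nb = position(paths[a], t + 1), position(paths[b], t + 1)
                # Swap: each agent moves into the cell the other just left.
                if pa == nb and pb == na and pa != pb:
                    conflicts.append(("edge", a, b, (pa, na), t))
    return conflicts
```

A planner in the M* family would couple only the agents involved in the returned conflicts; producing individual plans that yield a shorter conflict list is exactly the lever this work targets.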
From Organisational Structure to Organisational Behaviour Formalisation
Jonker, Catholijn M., Treur, Jan
As systems based on multiple software agents grow more complex, as is the case, for example, in the context of the Internet, their dynamics become harder to predict and to manage. A recent development is to incorporate organisation modelling methods into the software engineering process of multi-agent systems. Indeed, like complex agent-based software systems, societies are characterised by complex dynamics involving interaction between large numbers of actors and groups of actors. If such complex dynamics within society took place in a completely unstructured, incoherent manner, the actors involved would have little to rely on for prediction and therefore could not function in a knowledgeable manner. This has serious disadvantages, which is one reason why human societies have historically developed organisational structure as a means to manage complex dynamics. Here it is assumed that organisational structure co-ordinates the processes in such a manner that a process or agent involved can function more adequately. So the basic assumption is that providing organisational structure has implications for organisational dynamics. The dynamics induced by a given organisational structure are much more dependable than those in an entirely unstructured situation. It is assumed that the organisational structure itself is relatively stable, i.e., the structure may change, but the frequency and scale of change are
A taxonomy of strategic human interactions in traffic conflicts
Sarkar, Atrisha, Larson, Kate, Czarnecki, Krzysztof
To enable autonomous vehicles (AVs) to navigate busy traffic situations, recent years have seen a focus on game-theoretic models for strategic behavior planning in AVs. However, the lack of a common taxonomy impedes both a broader understanding of the strategies these models generate and the development of safety specifications that identify which strategies are safe for an AV to execute. Based on common patterns of interaction in traffic conflicts, we develop a taxonomy of strategic interactions along two dimensions: agents' initial response to right-of-way rules and their subsequent response to other agents' behavior. Furthermore, we demonstrate a process for automatically mapping the strategies generated by a strategic planner to the categories in the taxonomy, and, based on vehicle-vehicle and vehicle-pedestrian interaction simulations, we evaluate two popular solution concepts used in strategic planning for AVs, QLk and Subgame perfect $\epsilon$-Nash Equilibrium, with respect to those categories.
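To give a feel for the QLk-style reasoning mentioned above, here is a simplified quantal level-k sketch for a two-player normal-form game: level-0 players mix uniformly, and each level-k player plays a logit (quantal) response to the opponent's level-(k-1) mix. This is a deliberate simplification; the QLk model used in the paper may differ (for example, by having higher levels respond to a mixture of lower levels with level-specific precisions). The payoff values, the precision `lam`, and the function names are hypothetical.

```python
import math

def logit_response(payoffs, opponent_mix, lam):
    """Quantal (logit) response: P(a) is proportional to exp(lam * E[u(a)])."""
    expected = [sum(p * u for p, u in zip(opponent_mix, row)) for row in payoffs]
    m = max(expected)  # subtract the max for numerical stability
    weights = [math.exp(lam * (e - m)) for e in expected]
    z = sum(weights)
    return [w / z for w in weights]

def quantal_level_k(payoffs_row, payoffs_col, k, lam):
    """Return (row_mix, col_mix) for level-k players.

    payoffs_row[i][j]: row player's payoff for its action i against column action j.
    payoffs_col[j][i]: column player's payoff for its action j against row action i.
    """
    row = [1 / len(payoffs_row)] * len(payoffs_row)  # level-0: uniform
    col = [1 / len(payoffs_col)] * len(payoffs_col)
    for _ in range(k):
        # Both sides respond to the opponent's previous level simultaneously.
        row, col = logit_response(payoffs_row, col, lam), logit_response(payoffs_col, row, lam)
    return row, col
```

In a hypothetical symmetric "go / yield" conflict with payoffs `[[-10, 2], [0, 0]]` (both going is a crash), a level-1 agent facing a uniformly random opponent mostly yields, while a level-2 agent, expecting the other to yield, mostly goes. Mapping such behaviors to "yield first" or "assert right of way" is the kind of categorization a strategy taxonomy provides.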
Global cooperation on autonomous driving advancing sector (Ecns.cn)
An autonomous bus has a test drive with passengers aboard in Qingdao, Shandong province on Sept 19. International carmakers are partnering with Chinese companies to tailor autonomous driving solutions for their vehicles sold in the world's largest vehicle market. Last week, the largest carmaker in the United States said it is investing $300 million in Chinese autonomous driving startup Momenta. The deal is expected to accelerate General Motors' development of self-driving technologies for its vehicles in China, said Julian Blissett, executive vice-president of GM and president of GM China. "Customers in China are embracing electrification and advanced self-driving technology faster than anywhere else in the world," Blissett said.
Grouptron: Dynamic Multi-Scale Graph Convolutional Networks for Group-Aware Dense Crowd Trajectory Forecasting
Zhou, Rui, Zhou, Hongyu, Tomizuka, Masayoshi, Li, Jiachen, Xu, Zhuo
Accurate, long-term forecasting of human pedestrian trajectories in highly dynamic and interactive scenes is a long-standing challenge. Recent data-driven approaches have achieved significant improvements in prediction accuracy. However, the lack of group-aware analysis has limited the performance of forecasting models. This is especially apparent in highly populated scenes, where pedestrians move in groups and the interactions between groups are extremely complex and dynamic. In this paper, we present Grouptron, a multi-scale dynamic forecasting framework that leverages pedestrian group detection and uses individual-level, group-level, and scene-level information to better understand and represent scenes. Our approach employs spatio-temporal clustering algorithms to identify pedestrian groups and creates spatio-temporal graphs at the individual, group, and scene levels. It then uses graph neural networks to encode dynamics at each scale and combines the encodings across scales for trajectory prediction. We carried out extensive comparisons and ablation experiments to demonstrate the effectiveness of our approach. Our method achieves a 9.3% decrease in final displacement error (FDE) compared with state-of-the-art methods on the ETH/UCY benchmark datasets, and a 16.1% decrease in FDE in more crowded scenes where extensive human group interactions are more frequent.
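For reference, FDE is the Euclidean distance between the predicted and ground-truth positions at the final time step of the prediction horizon; the closely related average displacement error (ADE), which is not mentioned in the abstract, averages that distance over all steps. The sketch below scores a single trajectory and computes a relative decrease like the percentages quoted above. It is an assumption-laden illustration: ETH/UCY evaluations often report a best-of-N variant over sampled trajectories, which this sketch does not implement, and the function names are hypothetical.

```python
import math

def displacement_errors(pred, truth):
    """Return (ADE, FDE) for one predicted trajectory.

    pred, truth: equal-length lists of (x, y) positions over the prediction horizon.
    ADE averages the per-step Euclidean error; FDE is the error at the last step.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(errors) / len(errors), errors[-1]

def relative_decrease(baseline, new):
    """Percentage decrease of an error metric relative to a baseline value."""
    return 100.0 * (baseline - new) / baseline
```

For example, a prediction that drifts linearly away from the ground truth by 0, 1, and 2 m over three steps has an ADE of 1.0 and an FDE of 2.0, and lowering an FDE from 2.0 to 1.5 is a 25% decrease.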
Designed to Cooperate: A Kant-Inspired Ethic of Machine-to-Machine Cooperation
This position paper highlights an ethic of machine-to-machine cooperation and machine pro-sociality. It argues that machines capable of autonomous sensing, decision-making and action, such as automated vehicles and urban robots, that are owned and used by different self-interested parties and have their own agendas (or the interests of their owners) should be designed and built to cooperate, especially if they share public spaces. That is, by design, a machine should first cooperate and consider alternatives only if problems arise. It is argued that being cooperative is important not only for their improved functioning, especially when they use shared resources (e.g., parking spaces, public roads, curbside space and walkways), but also as a favourable requirement, analogous to how cooperation among humans can be advantageous and is often viewed favourably. The usefulness of such machine-to-machine cooperation is illustrated via examples including cooperative crowdsourcing, cooperative traffic routing and parking, as well as futuristic scenarios involving urban robots for delivery and shopping. It is argued that, just as privacy-by-design and security-by-design are important considerations for yielding systems that fulfil ethical requirements, cooperative-by-design should also be an imperative for autonomous systems that are separately owned but co-inhabit the same spaces and use common resources. If a machine using shared public spaces is not cooperative, as one would expect it to be, then it is not only anti-social but also not behaving ethically. It is also proposed that certification for urban robots that operate in public could be explored.
A User-Centred Framework for Explainable Artificial Intelligence in Human-Robot Interaction
Matarese, Marco, Rea, Francesco, Sciutti, Alessandra
State-of-the-art Artificial Intelligence (AI) techniques have reached an impressive level of complexity. Consequently, researchers are discovering more and more ways to use them in real-world applications. However, the complexity of such systems requires the introduction of methods that make them transparent to the human user. The AI community is trying to address this problem through the field of Explainable AI (XAI), which attempts to make AI algorithms less opaque. However, in recent years it has become clearer that XAI is much more than a computer science problem: since it is about communication, XAI is also a Human-Agent Interaction problem. Moreover, AI has moved out of the laboratory and into real life. This implies the need for XAI solutions tailored to non-expert users. Hence, we propose a user-centred framework for XAI that focuses on its social-interactive aspect, taking inspiration from theories and findings in the cognitive and social sciences. The framework aims to provide a structure for interactive XAI solutions designed for non-expert users.