Agents
Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning
Pu, Yuan, Wang, Shaochen, Yang, Rui, Yao, Xin, Li, Bin
Deep reinforcement learning methods have shown strong performance on many challenging cooperative multi-agent tasks. Two promising research directions are multi-agent value function decomposition and multi-agent policy gradients. In this paper, we propose a new decomposed multi-agent soft actor-critic (mSAC) method, which effectively combines the advantages of these two approaches. Its main modules are a decomposed Q-network architecture, a discrete probabilistic policy, and an optional counterfactual advantage function. Theoretically, mSAC supports efficient off-policy learning and partially addresses the credit assignment problem in both discrete and continuous action spaces. On the StarCraft II micromanagement cooperative multi-agent benchmark, we empirically compare mSAC against its variants and analyze the effects of the different components. Experimental results show that mSAC significantly outperforms the policy-based approach COMA and achieves results competitive with the state-of-the-art value-based approach Qmix on most tasks in terms of asymptotic performance. In addition, mSAC achieves strong results on tasks with large action spaces, such as 2c vs 64zg and MMM2.
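The abstract names two per-agent ingredients: a discrete probabilistic policy trained with a soft actor-critic objective, and an optional counterfactual advantage. As a minimal sketch, assuming one agent's per-action Q-values are already available (the decomposed Q network and any mixing step are omitted), the soft policy loss can be computed exactly as a finite sum over a discrete action set, and a COMA-style counterfactual advantage subtracts a policy-weighted baseline from the chosen action's Q-value. The function names and the temperature parameter `alpha` are illustrative assumptions, not the paper's code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities of a categorical policy."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def soft_policy_loss(logits, q_values, alpha):
    """Discrete soft actor-critic policy loss for one agent (hypothetical helper).

    With a discrete action set the expectation over a ~ pi is a finite sum:
        sum_a pi(a) * (alpha * log pi(a) - Q(a))
    Minimizing it moves probability toward high-Q actions, while the
    alpha-weighted entropy term keeps the policy stochastic.
    """
    log_probs = log_softmax(logits)
    return sum(math.exp(lp) * (alpha * lp - q) for lp, q in zip(log_probs, q_values))

def counterfactual_advantage(logits, q_values, action):
    """COMA-style advantage: Q(a) minus the expected Q under the agent's own policy.

    q_values[a] is assumed to be the joint Q-value when this agent takes
    action a and the other agents' actions are held fixed.
    """
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    baseline = sum(p * q for p, q in zip(probs, q_values))
    return q_values[action] - baseline

# Uniform policy over two actions whose Q-values are 1.0 and 3.0.
print(soft_policy_loss([0.0, 0.0], [1.0, 3.0], alpha=0.0))
print(counterfactual_advantage([0.0, 0.0], [1.0, 3.0], action=1))
```

For the uniform two-action policy above, the loss with alpha = 0 is -2.0 (the negated mean Q-value, up to floating-point rounding) and the advantage of the second action is 1.0. Because the baseline is the policy-weighted mean, the advantages always average to zero under the agent's own policy, which is what makes them usable as a low-variance credit signal.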
Cooperative AI: machines must learn to find common ground
Artificial-intelligence assistants and recommendation algorithms interact with billions of people every day, influencing lives in myriad ways, yet they still have little understanding of humans. Self-driving vehicles controlled by artificial intelligence (AI) are gaining mastery of their interactions with the natural world, but they are still novices when it comes to coordinating with other cars and pedestrians or collaborating with their human operators. The state of AI applications reflects that of the research field. It has long been steeped in a kind of methodological individualism. As is evident from introductory textbooks, the canonical AI problem is that of a solitary machine confronting a non-social environment. Historically, this was a sensible starting point.
Game Plan: What AI can do for Football, and What Football can do for AI
Tuyls, Karl (DeepMind) | Omidshafiei, Shayegan | Muller, Paul | Wang, Zhe | Connor, Jerome | Hennes, Daniel | Graham, Ian | Spearman, William | Waskett, Tim | Steel, Dafydd | Luc, Pauline | Recasens, Adria | Galashov, Alexandre | Thornton, Gregory | Elie, Romuald | Sprechmann, Pablo | Moreno, Pol | Cao, Kris | Garnelo, Marta | Dutta, Praneet | Valko, Michal | Heess, Nicolas | Bridgland, Alex | Pérolat, Julien | De Vylder, Bart | Eslami, S. M. Ali | Rowland, Mark | Jaegle, Andrew | Munos, Remi | Back, Trevor | Ahamed, Razia | Bouton, Simon | Beauguerlange, Nathalie | Broshear, Jackson | Graepel, Thore | Hassabis, Demis
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players' and coordinated teams' behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, not only in terms of changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state of the art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).
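The abstract mentions game-theoretic analysis of penalty kicks. A common simplification models a penalty as a 2x2 zero-sum game between kicker and goalkeeper, each choosing left or right; when there is no pure-strategy saddle point, each player mixes so as to make the opponent indifferent between their two options. The sketch below computes that mixed equilibrium in closed form. The scoring probabilities in the example are hypothetical, not data from the paper, and the function name is illustrative.

```python
def penalty_equilibrium(p):
    """Mixed Nash equilibrium of a 2x2 zero-sum penalty game (hypothetical helper).

    p[k][g] is the probability that the kicker scores when the kicker aims at
    side k and the goalkeeper dives to side g (0 = left, 1 = right).
    Returns (kicker_prob_left, keeper_prob_left, scoring_rate).
    Assumes an interior equilibrium, i.e. no pure-strategy saddle point.
    """
    denom = p[0][0] - p[0][1] - p[1][0] + p[1][1]
    if denom == 0:
        raise ValueError("degenerate game: no unique interior equilibrium")
    # Kicker mixes so the keeper is indifferent between diving left and right.
    kicker_left = (p[1][1] - p[1][0]) / denom
    # Keeper mixes so the kicker is indifferent between aiming left and right.
    keeper_left = (p[1][1] - p[0][1]) / denom
    if not (0.0 < kicker_left < 1.0 and 0.0 < keeper_left < 1.0):
        raise ValueError("game has a pure-strategy solution; no interior mix")
    # Expected scoring rate when the kicker aims left against the keeper's mix
    # (equal to aiming right, by the indifference condition).
    value = keeper_left * p[0][0] + (1.0 - keeper_left) * p[0][1]
    return kicker_left, keeper_left, value

# Hypothetical scoring probabilities: lower when the keeper guesses the side.
scoring = [[0.58, 0.95],
           [0.93, 0.70]]
print(penalty_equilibrium(scoring))
```

In this hypothetical game the kicker should aim left only about 38% of the time, even though left is not uniformly worse; equilibrium mixing depends on how much each mistake by the opponent is worth, which is why such models are combined with learned, player-specific scoring probabilities.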
A Framework of Explanation Generation toward Reliable Autonomous Robots
Sakai, Tatsuya, Miyazawa, Kazuki, Horii, Takato, Nagai, Takayuki
To realize autonomous collaborative robots, it is important to increase users' trust in them. Toward this goal, this paper proposes an algorithm that endows an autonomous agent with the ability to explain the transition from the current state to the target state in a Markov decision process (MDP). Cognitive science suggests that, to generate an explanation acceptable to humans, it is important to present the minimum information necessary to sufficiently understand an event. To meet this requirement, this study proposes a framework that identifies important elements in the decision-making process using a prediction model of the world and generates explanations based on these elements. To verify the ability of the proposed method to generate explanations, we conducted an experiment in a grid environment. The simulation results suggest that the explanations generated by the proposed method consisted of the minimum elements important for understanding the transition from the current state to the target state. Furthermore, experiments with human subjects showed that the generated explanations were good summaries of the state-transition process and that the explanations of the reasons for actions were rated highly.
Informational Design of Dynamic Multi-Agent System
This work considers a novel information design problem and studies how crafting payoff-relevant environmental signals alone can influence the behavior of intelligent agents. The agents' strategic interactions are captured by an incomplete-information Markov game in which each agent first selects one environmental signal from multiple signal sources as additional payoff-relevant information and then takes an action. A rational information designer (the principal) possesses one signal source and aims to control the agents' equilibrium behavior by designing the information structure of the signals she sends to them. We establish an obedient principle, which states that it is without loss of generality to focus on direct information design when the design incentivizes each agent to select the signal sent by the principal, so that the design process avoids having to predict the agents' strategic signal-selection behavior. Based on the obedient principle, we introduce a design protocol for a given goal of the principal, referred to as obedient implementability (OIL), and study a Myersonian information design that characterizes OIL in a class of obedient sequential Markov perfect Bayesian equilibria (O-SMPBE). We propose a framework based on an approach we call fixed-point alignment, which incentivizes the agents to choose the signal sent by the principal, ensures that the agents' policy profile for taking actions is the policy component of an O-SMPBE, and achieves the principal's goal. The proposed approach can be applied to elicit desired behaviors of multi-agent systems in both competitive and cooperative settings, and it can be extended to heterogeneous stochastic games in complete- and incomplete-information environments.
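The obedience idea has a simple static analogue in standard Myersonian information design: a direct signaling scheme that recommends actions is obedient if, whenever an agent is told to take action a, taking a is a best response given what the recommendation reveals about the state. The sketch below checks these constraints for a single agent in a one-shot setting, a deliberate simplification of the paper's dynamic Markov game with signal selection; the function, the prior, and the payoffs are hypothetical.

```python
def is_obedient(prior, scheme, utility, actions, tol=1e-12):
    """Check the obedience constraints of a direct signaling scheme (hypothetical helper).

    prior[s]       -- probability of state s
    scheme[s][a]   -- probability the designer recommends action a in state s
    utility(a, s)  -- the agent's payoff for action a in state s
    For every recommended action `rec` and every deviation `dev`, the gain
        sum_s prior[s] * scheme[s][rec] * (u(rec, s) - u(dev, s))
    must be non-negative: this is the unnormalized form of "rec is a best
    response to the posterior induced by the recommendation".
    """
    for rec in actions:
        for dev in actions:
            gain = sum(
                prior[s] * scheme[s].get(rec, 0.0) * (utility(rec, s) - utility(dev, s))
                for s in prior
            )
            if gain < -tol:
                return False
    return True

# Hypothetical example: the agent earns 1 for matching its action to the state.
prior = {"high": 0.3, "low": 0.7}

def match(action, state):
    return 1.0 if action == state else 0.0

# Recommend "high" always in the high state and with probability 3/7 in the low state.
persuasive = {"high": {"high": 1.0}, "low": {"high": 3 / 7, "low": 4 / 7}}
always_high = {"high": {"high": 1.0}, "low": {"high": 1.0}}
print(is_obedient(prior, persuasive, match, ["high", "low"]))   # True
print(is_obedient(prior, always_high, match, ["high", "low"]))  # False
```

The persuasive scheme sits exactly on the obedience boundary: after a "high" recommendation the agent's posterior is 50/50, so following it is still (weakly) optimal, whereas recommending "high" in every state leaves the agent with its prior and it would rather deviate.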
Explainable Autonomous Robots: A Survey and Perspective
Sakai, Tatsuya, Nagai, Takayuki
It is commonly claimed that AI will replace most manual labor in the future; but is this really the case? AI technologies do achieve higher image recognition accuracy than humans in some limited contexts, and they have consistently outperformed humans in classical games such as Go and chess. Nonetheless, we believe that even advanced future developments based on current technology will not lead to robots replacing humans. Among the most significant reasons AI systems cannot replace human labor is their fundamental inability to communicate naturally and effectively with humans. One might expect such communication to be achievable through advances in natural language processing (NLP) technology [4]; however, NLP technologies are systems for estimating the content and meaning of human statements; they do not constitute communication. That is, humans do not feel that robots using such systems truly understand them and respond appropriately. Therefore, unless effective communication is achieved, robots will continue to function only as tools that assist humans. Advances that improve the accuracy or effectiveness of specific tasks do not make robots equivalent to human beings. Given this, how can we enable robots to communicate with humans?
SUTD wins best paper at 35th AAAI Conference on Artificial Intelligence 2021
Game theory is known to be a useful tool for studying multi-agent interactions in machine learning (ML) and artificial intelligence (AI). One basic component of these ML and AI systems is the exploration-exploitation trade-off, a fundamental dilemma between taking a risk with new actions in the quest for more information about the environment (exploration) and repeatedly selecting the actions that currently yield the maximum reward (exploitation). However, the outcome of the exploration-exploitation process is often unpredictable in practice, and the reasons behind its volatile performance have been a long-standing open question in the ML and AI communities. Dr Stefanos Leonardos and Assistant Professor Georgios Piliouras, researchers from the Singapore University of Technology and Design (SUTD), applied analytical tools from the theory of dynamical systems to the study of multi-agent systems and established a deep connection between exploration-exploitation and catastrophe theory (Figures 1 and 2). Catastrophe theory is a branch of mathematics that formally explains phase transitions in all kinds of systems, from water freezing into ice to disease outbreaks and the collapse of financial markets.
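One way to see how the amount of exploration can change the outcome of multi-agent learning is to model exploration as Boltzmann (softmax) action selection at temperature T and look for the symmetric fixed points of the resulting logit response in a simple coordination game. The sketch below is an illustrative toy under those assumptions, not the SUTD researchers' model: with payoff 1 for coordinating and 0 otherwise, a low temperature admits three fixed points while a high temperature leaves only one, an abrupt change in the equilibrium landscape of the kind described here as a phase transition.

```python
import math

def logit_response(x, temperature, a=1.0, b=1.0):
    """Probability of playing action 0 under Boltzmann exploration.

    x is the opponent's probability of action 0 in a symmetric coordination
    game paying `a` for (0, 0), `b` for (1, 1) and 0 otherwise.
    """
    u0 = a * x            # expected payoff of action 0
    u1 = b * (1.0 - x)    # expected payoff of action 1
    return 1.0 / (1.0 + math.exp((u1 - u0) / temperature))

def count_symmetric_fixed_points(temperature, grid=1000):
    """Count solutions of x = logit_response(x) via sign changes on a grid.

    Grid points are offset by half a step so none lands exactly on x = 0.5,
    which is always a fixed point when a == b.
    """
    xs = [(i + 0.5) / grid for i in range(grid)]
    g = [logit_response(x, temperature) - x for x in xs]
    return sum(1 for left, right in zip(g, g[1:]) if left * right < 0)

for t in (0.25, 1.0):
    print(t, count_symmetric_fixed_points(t))
```

With a = b = 1 the slope of the logit response at x = 0.5 is 0.5 / T, so the mixed fixed point changes stability and the two outer fixed points appear exactly when T drops below 0.5; small changes in the exploration rate near that threshold qualitatively change where learning can settle.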
Explainable Artificial Intelligence for Human Decision-Support System in Medical Domain
Knapič, Samanta, Malhi, Avleen, Salujaa, Rohit, Främling, Kary
In this paper, we present the potential of explainable artificial intelligence methods for decision support in medical image analysis. By applying three types of explainable methods to the same medical image data set, we aimed to improve the comprehensibility of the decisions made by a convolutional neural network (CNN). The visual explanations were provided for in-vivo gastral images obtained from video capsule endoscopy (VCE), with the goal of increasing health professionals' trust in the black-box predictions. We implemented two post-hoc interpretable machine learning methods, LIME and SHAP, and an alternative explanation approach, the Contextual Importance and Utility (CIU) method. The explanations were assessed through human evaluation: we conducted three user studies based on the explanations provided by LIME, SHAP, and CIU. Users from different non-medical backgrounds carried out a series of tests in a web-based survey and reported their experience with and understanding of the given explanations. Three user groups (n = 20, 20, 20), each shown a distinct form of explanation, were analyzed quantitatively. As hypothesized, we found that CIU performed better than both LIME and SHAP in supporting human decision-making and in being more transparent and thus understandable to users. CIU also generated explanations faster than LIME and SHAP. Our findings suggest notable differences in human decision-making across explanation-support settings. Accordingly, we present three explainable methods that, with future improvements in implementation, could be generalized to other medical data sets and provide valuable decision support for medical experts.
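To make the contrast between explanation methods concrete, the sketch below computes two kinds of feature attribution for a toy model: exact Shapley values (the quantity SHAP approximates), where an "absent" feature is replaced by a baseline value, and the Contextual Importance (CI) and Contextual Utility (CU) used by CIU, obtained by sweeping one feature over its range while the others stay fixed. Both are simplified illustrations under stated assumptions (a single baseline, a uniform sweep grid, a known output range); they are not the SHAP or CIU implementations evaluated in the paper, and the toy model is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values, treating absent features as set to `baseline`."""
    n = len(x)

    def value(present):
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def ciu(model, x, j, lo, hi, out_min=0.0, out_max=1.0, steps=101):
    """Contextual importance and utility of feature j (simplified sketch).

    CI = (Cmax - Cmin) / (out_max - out_min)
    CU = (y - Cmin) / (Cmax - Cmin)
    where Cmin/Cmax are the extreme outputs as feature j sweeps [lo, hi]
    with every other feature held at its current value.
    """
    y = model(x)
    outputs = []
    for k in range(steps):
        z = list(x)
        z[j] = lo + (hi - lo) * k / (steps - 1)
        outputs.append(model(z))
    c_min, c_max = min(outputs), max(outputs)
    ci = (c_max - c_min) / (out_max - out_min)
    cu = (y - c_min) / (c_max - c_min) if c_max > c_min else float("nan")
    return ci, cu

# Hypothetical toy "probability" model over two features in [0, 1].
def toy_model(z):
    return 0.2 * z[0] + 0.8 * z[1]

x = [0.5, 0.25]
print(shapley_values(toy_model, x, baseline=[0.0, 0.0]))
print(ciu(toy_model, x, j=0, lo=0.0, hi=1.0))
print(ciu(toy_model, x, j=1, lo=0.0, hi=1.0))
```

For this linear model the Shapley values are 0.1 and 0.2 (they sum to the output minus the baseline output), while CIU reports that feature 1 matters more in this context (CI 0.8 versus 0.2) but is only a quarter of the way toward its most favorable value (CU 0.25). The two views answer different questions, which is one reason users can respond to them differently.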
Pervasive AI for IoT Applications: Resource-efficient Distributed Artificial Intelligence
Baccour, Emna, Mhaisen, Naram, Abdellatif, Alaa Awad, Erbad, Aiman, Mohamed, Amr, Hamdi, Mounir, Guizani, Mohsen
Artificial intelligence (AI) has achieved substantial breakthroughs in a variety of Internet of Things (IoT) applications and services, ranging from recommendation systems to robotics control and military surveillance. This is driven by easier access to sensory data and the enormous scale of pervasive/ubiquitous devices that generate zettabytes (ZB) of real-time data streams. Accurate models built on such data streams, able to generate predictive insights and transform decision-making, make pervasive systems a worthwhile paradigm for a better quality of life. The confluence of pervasive computing and artificial intelligence, Pervasive AI, has expanded the role of ubiquitous IoT systems from mainly collecting data to executing distributed computations, a promising alternative to centralized learning that nonetheless presents various challenges. In this context, careful cooperation and resource scheduling are needed among IoT devices (e.g., smartphones, smart vehicles) and infrastructure (e.g., edge nodes, base stations) to avoid communication and computation overheads and ensure maximum performance. In this paper, we conduct a comprehensive survey of the recent techniques developed to overcome these resource challenges in pervasive AI systems. Specifically, we first present an overview of pervasive computing, its architecture, and its intersection with artificial intelligence. We then review the background, applications, and performance metrics of AI, particularly deep learning (DL) and online learning, running in a ubiquitous system. Next, we provide an in-depth literature review of communication-efficient techniques, from both algorithmic and system perspectives, for distributed inference, training, and online learning tasks across combinations of IoT devices, edge devices, and cloud servers. Finally, we discuss our future vision and research challenges.
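Communication-efficient distributed training, one of the survey's topics, can be illustrated with two widely used building blocks: federated averaging, where a server combines locally trained model weights in proportion to each device's number of samples, and top-k sparsification, where a device transmits only the k largest-magnitude entries of its update. The sketch below assumes plain Python lists as model vectors; the function names are hypothetical, and these are only two of the many techniques such a survey covers.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average weight vectors, weighted by sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]

def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of an update and zero the rest.

    Only the kept (index, value) pairs would need to be transmitted, which is
    where the communication saving comes from.
    """
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(update)]

# Hypothetical round: two devices with 1 and 3 local samples.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
print(top_k_sparsify([0.1, -3.0, 2.0, 0.5], k=2))          # [0.0, -3.0, 2.0, 0.0]
```

Weighting by sample count makes the aggregate equal to the model a server would obtain by averaging over all samples if each device's weights summarized its own data; sparsification trades a little accuracy per round for much smaller uploads, and in practice it is often paired with error feedback that carries the dropped residual into the next round.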
Calibration of Human Driving Behavior and Preference Using Naturalistic Traffic Data
Dai, Qi, Shen, Di, Wang, Jinhong, Huang, Suzhou, Filev, Dimitar
Understanding human driving behaviors quantitatively is critical even in an era when connected and autonomous vehicles and smart infrastructure are becoming ever more prevalent. This is particularly so because mixed traffic settings, where autonomous vehicles and human-driven vehicles coexist, are expected to persist for quite some time. To this end, we need a comprehensive modeling framework for decision-making within which human driving preferences can be inferred statistically from driving behaviors observed in realistic, naturalistic traffic settings. Leveraging a recently proposed computational framework for smart vehicles in a smart world using multi-agent based simulation and optimization, we first recapitulate how the forward problem of driving decision-making is modeled as a state space model. We then show how the model can be inverted to estimate driver preferences from naturalistic traffic data using the standard Kalman filter technique. We explicitly illustrate our approach using vehicle trajectory data from the Sugiyama experiment, which was originally designed to demonstrate how stop-and-go shockwaves can arise spontaneously without bottlenecks. Not only does the estimated state filter fit the observed data well for each individual vehicle, but the inferred utility functions also reproduce quantitatively similar patterns of the observed collective behavior. One distinct advantage of our approach is its drastically reduced computational burden. This is possible because our forward model treats the driving decision process, which is intrinsically dynamic with multi-agent interactions, as a sequence of independent static optimization problems, each contingent on the state and with finite look-ahead anticipation. Consequently, we can sidestep solving an interacting dynamic inversion problem that would be far more computationally demanding.
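The abstract states that the forward model is inverted with a standard Kalman filter to estimate driver preferences. A minimal sketch of that idea, assuming a single scalar preference parameter theta that enters the observations linearly (y_t = h_t * theta + noise, with h_t computed from the observed traffic state) and is modeled as a slowly drifting random walk, is a scalar Kalman filter. The observation model, noise levels, and names are hypothetical; the paper's state space model is richer.

```python
def estimate_preference(observations, theta0=0.0, p0=10.0, q=1e-4, r=0.05):
    """Scalar Kalman filter for a slowly varying parameter theta (hypothetical model).

    observations -- iterable of (h_t, y_t) pairs with y_t = h_t * theta + noise
    q, r         -- process and measurement noise variances (assumed values)
    Returns the final estimate and its variance.
    """
    theta, p = theta0, p0
    for h, y in observations:
        # Predict: theta follows a random walk, so only the uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement according to the Kalman gain.
        k = p * h / (h * h * p + r)
        theta = theta + k * (y - h * theta)
        p = (1.0 - k * h) * p
    return theta, p

# Hypothetical data generated with theta = 2.0 plus small fixed "noise" terms.
data = [(h, 2.0 * h + e) for h, e in [(1.0, 0.05), (0.5, -0.02), (1.5, 0.01), (0.8, -0.04), (1.2, 0.03)]]
theta_hat, var = estimate_preference(data)
print(round(theta_hat, 3), var)
```

Because each update is a handful of scalar operations per observation, this kind of filtering stays cheap, in line with the abstract's point that treating decisions as a sequence of static problems avoids a costly dynamic inversion.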