Agents
Interpreting Graph Drawing with Multi-Agent Reinforcement Learning
Safarli, Ilkin, Zhou, Youjia, Wang, Bei
Applying machine learning techniques to graph drawing has become an emergent area of research in visualization. In this paper, we interpret graph drawing as a multi-agent reinforcement learning (MARL) problem. We first demonstrate that a large number of classic graph drawing algorithms, including force-directed layouts and stress majorization, can be interpreted within the framework of MARL. Using this interpretation, a node in the graph is assigned to an agent with a reward function. Via multi-agent reward maximization, we obtain an aesthetically pleasing graph layout that is comparable to the outputs of classic algorithms. The main strength of a MARL framework for graph drawing is that it not only unifies a number of classic drawing algorithms in a general formulation but also supports the creation of novel graph drawing algorithms by introducing a diverse set of reward functions.
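As a rough illustration of the "one agent per node" idea in the abstract above, the following sketch lets every node greedily ascend its own reward. It is not the authors' implementation: the reward used here (negative stress over the pairs a node belongs to), the simultaneous gradient-ascent update, and all function names are assumptions chosen to keep the example small. The graph is assumed to be connected.

```python
import math
import random


def shortest_paths(n, edges):
    """All-pairs graph distances via Floyd-Warshall (unit edge lengths)."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = d[v][u] = 1.0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def reward(i, pos, d):
    """Hypothetical reward of agent i: negative stress over its own pairs."""
    return -sum((math.dist(pos[i], pos[j]) - d[i][j]) ** 2
                for j in range(len(pos)) if j != i)


def reward_gradient(i, pos, d):
    """Gradient of agent i's reward with respect to its own 2-D position."""
    gx = gy = 0.0
    for j in range(len(pos)):
        if j == i:
            continue
        dx = pos[i][0] - pos[j][0]
        dy = pos[i][1] - pos[j][1]
        dist = math.hypot(dx, dy) or 1e-9  # guard against coincident nodes
        coef = -2.0 * (dist - d[i][j]) / dist
        gx += coef * dx
        gy += coef * dy
    return gx, gy


def total_reward(pos, d):
    """Sum of all agents' rewards (each node pair is counted twice)."""
    return sum(reward(i, pos, d) for i in range(len(pos)))


def layout(n, edges, steps=500, lr=0.01, seed=0):
    """Every agent takes a small gradient step on its own reward each round."""
    rng = random.Random(seed)
    d = shortest_paths(n, edges)
    pos = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    for _ in range(steps):
        grads = [reward_gradient(i, pos, d) for i in range(n)]
        pos = [(x + lr * gx, y + lr * gy)
               for (x, y), (gx, gy) in zip(pos, grads)]
    return pos, d
```

Calling `layout(6, [(i, (i + 1) % 6) for i in range(6)])` returns 2-D positions for a 6-cycle. Swapping in a different `reward` and its gradient is where other drawing criteria would plug in under this sketch.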
Towards Personalized Explanation of Robotic Planning via User Feedback
Boggess, Kayla, Chen, Shenghui, Feng, Lu
Prior studies have found that providing explanations about robots' decisions and actions helps to improve system transparency, increase human users' trust in robots, and enable effective human-robot collaboration. Different users have different preferences about what should be included in explanations. However, little research has been conducted on generating personalized explanations. In this paper, we present a system for generating personalized explanations of robotic planning via user feedback. We consider robotic planning using Markov decision processes (MDPs) and develop an algorithm to automatically generate a personalized explanation of an optimal robotic plan (i.e., an optimal MDP policy) based on the user's preferences regarding four elements (i.e., objective, locality, specificity, and abstraction). In addition, we design the system to interact with users by answering their further questions about the generated explanations. Users have the option to update their preferences to view different explanations. The system is capable of detecting and resolving any preference conflict via user interaction. Our user study results show that the generated personalized explanations improve user satisfaction, and that the majority of users liked the system's capabilities for question answering and for conflict detection and resolution.
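For readers unfamiliar with the object being explained, the sketch below computes an optimal MDP policy by standard value iteration on a toy corridor. It only illustrates what "an optimal MDP policy" means; the corridor world, its rewards, the discount factor, and the function names are assumptions and are not taken from the paper, whose explanation-generation algorithm is not reproduced here.

```python
def value_iteration(P, gamma=0.9, tol=1e-10):
    """Standard value iteration.

    P[s][a] is a list of (probability, next_state, reward) triples.
    Returns the optimal state values and a greedy (optimal) policy.
    """
    V = {s: 0.0 for s in P}

    def q(s, a):
        # Expected return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        delta = 0.0
        for s in P:
            best = max(q(s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    policy = {s: max(P[s], key=lambda a: q(s, a)) for s in P}
    return V, policy


def corridor(n=4):
    """Hypothetical 1-D world: the robot earns reward 1 on entering cell n-1."""
    goal = n - 1
    P = {}
    for s in range(n):
        if s == goal:
            # The goal is absorbing and yields no further reward.
            P[s] = {"left": [(1.0, s, 0.0)], "right": [(1.0, s, 0.0)]}
            continue
        left, right = max(s - 1, 0), s + 1
        P[s] = {
            "left": [(1.0, left, 0.0)],
            "right": [(1.0, right, 1.0 if right == goal else 0.0)],
        }
    return P
```

`value_iteration(corridor())` returns a value table and a policy dictionary mapping each cell to an action; a policy of this kind is the sort of plan a user-facing explanation would describe.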
An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective
Following the remarkable success of the AlphaGo series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. MARL is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, the literature lacks a self-contained overview that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier. The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.
Experience Grounds Language
Bisk, Yonatan, Holtzman, Ari, Thomason, Jesse, Andreas, Jacob, Bengio, Yoshua, Chai, Joyce, Lapata, Mirella, Lazaridou, Angeliki, May, Jonathan, Nisnevich, Aleksandr, Pinto, Nicolas, Turian, Joseph
Language understanding research is held back by a failure to relate language to the physical world it describes and to the social interactions it facilitates. Despite the incredible effectiveness of language processing models at tackling tasks after being trained on text alone, successful linguistic communication relies on a shared experience of the world. It is this shared experience that makes utterances meaningful. Natural language processing is a diverse field, and progress throughout its development has come from new representational theories, modeling techniques, data collection paradigms, and tasks. We posit that the present success of representation learning approaches trained on large, text-only corpora requires the parallel tradition of research on the broader physical and social context of language to address the deeper questions of communication.
Computing Machinery and Knowledge
In this paper, I will examine virtue epistemology from the perspective of a nonhuman artificial intelligence (AI) agent to see whether such an agent, a computing machine, is able to know things and to possess knowledge. The aim is to gain insight into what it means for a human agent to know things and to possess knowledge by comparing it to a nonhuman agent in this way. Alan Turing, one of the founding fathers of AI, wrote in his classic paper "Computing Machinery and Intelligence" (Turing, 1950) about machine intelligence and asked whether computing machinery, that is, digital computers, could be said to think. The paper also covers the ability of a machine to learn things. Turing's position was that this was possible, maybe not in his time, but that it would be possible by the end of the century. Well, the end of the century has now passed, and today there is a lot of hype around AI, specifically around machine learning. My intention is to review selected parts of Alan Turing's paper and put them to the test against virtue epistemology to see if our progress in the field of AI has changed anything in relation to the possibility for machines to think and know things. My hypothesis, like that of Alan Turing, is that this is possible. The question is whether we are there yet or whether we need to wait for another end of the century!
Strategic Recourse in Linear Classification
Chen, Yatong, Wang, Jialu, Liu, Yang
In algorithmic decision making, recourse refers to individuals' ability to systematically reverse an unfavorable decision made by an algorithm. Meanwhile, individuals subjected to a classification mechanism are incentivized to behave strategically in order to gain a system's approval. However, not all strategic behavior necessarily leads to adverse results: through appropriate mechanism design, strategic behavior can induce genuine improvement in an individual's qualifications. In this paper, we explore how to design a classifier that achieves high accuracy while providing recourse to strategic individuals so as to incentivize them to improve their features in non-manipulative ways. We capture these dynamics using a two-stage game: first, the mechanism designer publishes a classifier, with the goal of optimizing classification accuracy and providing recourse to incentivize individuals' improvement. Then, agents respond by potentially modifying their input features in order to obtain a favorable decision from the classifier, while trying to minimize the cost of making such modifications. Under this model, we provide analytical results characterizing the equilibrium strategies for both the mechanism designer and the agents. Our empirical results show the effectiveness of our mechanism in three real-world datasets: compared to a baseline classifier that only considers individuals' strategic behavior without explicitly incentivizing improvement, our algorithm can provide recourse to a much higher fraction of individuals in the direction of improvement while maintaining relatively high prediction accuracy. We also show that our algorithm can effectively mitigate disparities caused by differences in manipulation costs. Our results provide insights for designing a machine learning model that focuses not only on the static distribution as of now, but also tries to encourage future improvement.
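The second stage of the game described above, where an agent modifies its features to obtain a favorable decision at minimum cost, can be sketched for a linear classifier as follows. This is a minimal sketch under stated assumptions, not the paper's model: the cost is assumed to be the Euclidean distance moved, the agent is assumed to move only if that cost does not exceed a fixed `budget`, and the function name and parameters are hypothetical.

```python
import math


def best_response(x, w, b, budget):
    """Cheapest feature change that reaches the decision boundary w.x + b = 0.

    Assumptions (not from the paper): the cost of a change is its Euclidean
    length, and the agent only moves when that cost is within `budget`.
    Returns the (possibly unchanged) feature vector.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= 0:
        return list(x)  # already approved, no reason to move

    norm_sq = sum(wi * wi for wi in w)
    step = -score / norm_sq            # multiple of w needed to reach the boundary
    cost = step * math.sqrt(norm_sq)   # equals |score| / ||w||

    if cost > budget:
        return list(x)  # approval is too expensive, so the agent stays put

    # Orthogonal projection onto the hyperplane; lands on it up to rounding.
    return [xi + step * wi for xi, wi in zip(x, w)]
```

With `w = [3, 4]` and `b = -5`, an agent at the origin is at distance 1 from the boundary, so it moves when `budget` is at least 1 and stays otherwise. Whether such a move counts as improvement or manipulation depends on which features change, which this sketch does not model.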
A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
Kim, Dong-Ki, Liu, Miao, Riemer, Matthew, Sun, Chuangchuang, Abdulhai, Marwa, Habibi, Golnaz, Lopez-Cot, Sebastian, Tesauro, Gerald, How, Jonathan P.
A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other agents that are also simultaneously learning. In particular, each agent perceives the environment as effectively non-stationary due to the changing policies of other agents. Moreover, each agent is itself constantly learning, leading to natural non-stationarity in the distribution of experiences encountered. In this paper, we propose a novel meta-multiagent policy gradient theorem that directly accounts for the non-stationary policy dynamics inherent to these multiagent settings. This is achieved by modeling our gradient updates to directly consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents interacting with it in the environment. We find that our theoretically grounded approach provides a general solution to the multiagent learning problem, which inherently combines key aspects of previous state-of-the-art approaches on this topic. We test our method on several multiagent benchmarks and demonstrate that it adapts to new agents as they learn more efficiently than previous related approaches across the spectrum of mixed incentive, competitive, and cooperative environments.
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving
Zhou, Ming, Luo, Jun, Villella, Julian, Yang, Yaodong, Rusu, David, Miao, Jiayu, Zhang, Weinan, Alban, Montgomery, Fadakar, Iman, Chen, Zheng, Huang, Aurora Chongxi, Wen, Ying, Hassanzadeh, Kimia, Graves, Daniel, Chen, Dong, Zhu, Zhengbang, Nguyen, Nhat, Elsayed, Mohamed, Shao, Kun, Ahilan, Sanjeevan, Zhang, Baokuan, Wu, Jiannan, Fu, Zhengang, Rezaee, Kasra, Yadmellat, Peyman, Rohani, Mohsen, Nieves, Nicolas Perez, Ni, Yihan, Banijamali, Seyedershad, Rivers, Alexander Cowen, Tian, Zheng, Palenicek, Daniel, Ammar, Haitham bou, Zhang, Hongbo, Liu, Wulong, Hao, Jianye, Wang, Jun
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem, but they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at https://github.com/huawei-noah/SMARTS.
FireCommander: An Interactive, Probabilistic Multi-agent Environment for Joint Perception-Action Tasks
Seraj, Esmaeil, Wu, Xiyang, Gombolay, Matthew
The purpose of this tutorial is to help individuals use the FireCommander game environment for research applications. FireCommander is an interactive, probabilistic joint perception-action reconnaissance environment in which a composite team of agents (e.g., robots) cooperate to fight dynamic, propagating firespots (e.g., targets). In the FireCommander game, a team of agents must be tasked to optimally deal with a wildfire situation in an environment with propagating fire areas and facilities such as houses, hospitals, and power stations. The team of agents can accomplish their mission by first sensing (e.g., estimating fire states), then communicating the sensed fire information among each other, and finally taking action to put the firespots out based on the sensed information (e.g., dropping water on estimated fire locations). The FireCommander environment can be useful for research topics spanning a wide range of applications, from Reinforcement Learning (RL) and Learning from Demonstration (LfD) to Coordination, Psychology, Human-Robot Interaction (HRI), and Teaming. Four important facets of the FireCommander environment together create a non-trivial game: (1) Complex Objectives: Multi-objective Stochastic Environment, (2) Probabilistic Environment: Agents' actions result in probabilistic performance, (3) Hidden Targets: Partially Observable Environment, and (4) Uni-task Robots: Perception-only and Action-only agents. The FireCommander environment is the first of its kind in including Perception-only and Action-only agents for coordination. It is a general multi-purpose game that can be useful in a variety of combinatorial optimization problems and stochastic games, such as applications of Reinforcement Learning (RL), Learning from Demonstration (LfD), and Inverse RL (iRL).
Thinking About Causation: A Causal Language with Epistemic Operators
Barbero, Fausto, Schulz, Katrin, Smets, Sonja, Velázquez-Quesada, Fernando R., Xie, Kaibo
In recent years, a lot of effort has been put into the development of formal models of causal reasoning. A central motivation behind this is the importance of causal reasoning for AI. Making computers take causal information into account is currently one of the central challenges of AI research [27, 9]. There has also been tremendous progress in this direction since the earlier groundbreaking work in [23] and [28]. Advanced formal and computational tools have been developed for modelling causal reasoning and learning causal information, with applications in many different scientific areas. In this paper we want to extend this work further. The direction we want to explore is that of developing formal models of the interaction between causal and epistemic reasoning.