Agents
Artificial Intelligence for Social Good: A Survey
Shi, Zheyuan Ryan, Wang, Claire, Fang, Fei
Its impact is drastic and real: YouTube's AI-driven recommendation system would present sports videos for days if one happens to watch a live baseball game on the platform [1]; email writing becomes much faster with machine learning (ML) based auto-completion [2]; many businesses have adopted natural language processing based chatbots as part of their customer services [3]. AI has also greatly advanced human capabilities in complex decision-making processes ranging from determining how to allocate security resources to protect airports [4] to games such as poker [5] and Go [6]. All such tangible and stunning progress suggests that an "AI summer" is happening. As some put it, "AI is the new electricity" [7]. Meanwhile, in the past decade, an emerging theme in the AI research community is the so-called "AI for social good" (AI4SG): researchers aim at developing AI methods and tools to address problems at the societal level and improve the wellbeing of society.
Incentivizing the Emergence of Grounded Discrete Communication Between General Agents
We converted the recently developed BabyAI grid world platform to a sender/receiver setup in order to test the hypothesis that established deep reinforcement learning techniques are sufficient to incentivize the emergence of a grounded discrete communication protocol between general agents. This is in contrast to previous experiments that employed straight-through estimation or tailored inductive biases. Our results show that these can indeed be avoided, by instead providing proper environmental incentives. Moreover, they show that a longer interval between communications incentivized more abstract semantics. In some cases, the communicating agents adapted to new environments more quickly than monolithic agents, showcasing the potential of emergent discrete communication for transfer learning.
Learning Reusable Options for Multi-Task Reinforcement Learning
Garcia, Francisco M., Nota, Chris, Thomas, Philip S.
One of the main reasons why RL has worked so well in these applications is that we are able to simulate millions of interactions with the environment in a relatively short period of time, allowing the agent to experience a large number of different situations in the environment and learn the consequences of its actions. In many real-world applications, however, where the agent interacts with the physical world, it might not be easy to generate such a large number of interactions. The time and cost associated with training such systems could render RL an infeasible approach for training at large scale. As a concrete example, consider training a large number of humanoid robots (agents) to move quickly, as in the Robocup competition [Farchy et al., 2013]. Although the agents have similar dynamics, subtle variations mean that a single policy shared across all agents would not be an effective solution.
A Robot that Learns Connect Four Using Game Theory and Demonstrations
Teaching robots new skills using minimal time and effort has long been a goal of artificial intelligence. This paper investigates the use of game theoretic representations to represent and learn how to play interactive games such as Connect Four. We combine aspects of learning by demonstration, active learning, and game theory allowing a robot to learn by presenting its understanding of the structure of the game and conducting a question/answer session with a person. The paper demonstrates how a robot can be taught the win conditions of the game Connect Four and its variants using a single demonstration and a few trial examples with a question and answer session led by the robot. Our results show that the robot can learn any arbitrary win conditions for the Connect Four game without any prior knowledge of the win conditions and then play the game with a human utilizing the learned win conditions. Our experiments also show that some questions are more important for learning the game's win conditions.
Intelligent Roundabout Insertion using Deep Reinforcement Learning
Capasso, Alessandro Paolo, Bacchiani, Giulio, Molinari, Daniele
The study and development of autonomous vehicles have seen an increasing interest in recent years, becoming hot topics in both academia and industry. One of the main research areas in this field is related to control systems, in particular planning and decision-making problems. The basic approaches for scheduling high-level maneuver execution modules are based on the concepts of time-to-collision (van der Horst and Hogema, 1994) and headway control (Hatipoglu et al., 1996). In order to add interpretation capabilities to the system, several approaches model the driving decision-making problem as a Partially Observable Markov Decision Process (POMDP; Spaan, 2012), as in (Liu et al., 2015) for urban scenarios and in (Song et al., 2016) for intersection handling. A further extension is proposed in (Bandyopadhyay et al., 2012) where a Mixed Observability Markov Decision Process (MOMDP) (Ong et al., 2010) is used to model uncertainties in agents' intentions. However, since vehicles are assumed to behave in a deterministic way, the aforementioned approaches handle many situations with excessive prudence and would not be able to enter a busy roundabout.
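As a rough illustration of the two classical quantities mentioned in this abstract, the sketch below computes time-to-collision and time headway for a follower and a leader travelling in the same lane. The function names, the constant-speed assumption, and the units (metres, metres per second) are assumptions made for this example and are not taken from the paper.

```python
import math


def time_to_collision(gap_m, follower_speed, leader_speed):
    """Seconds until the follower reaches the leader at constant speeds.

    Returns math.inf when the follower is not closing the gap.
    """
    closing_speed = follower_speed - leader_speed
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def time_headway(gap_m, follower_speed):
    """Seconds the follower needs to cover the current gap at its own speed."""
    if follower_speed <= 0:
        return math.inf
    return gap_m / follower_speed


if __name__ == "__main__":
    # Hypothetical numbers: 20 m gap, follower at 15 m/s, leader at 10 m/s.
    print(time_to_collision(20.0, 15.0, 10.0))
    print(time_headway(20.0, 15.0))
```

A rule-based planner of the kind the abstract describes would typically compare such values against fixed thresholds, which is one way the "excessive prudence" it mentions can arise.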
Emergent Behaviors from Folksonomy Driven Interactions
To reflect the evolving knowledge on the Web, this paper considers ontologies based on folksonomies according to a new concept structure called "Folksodriven" to represent folksonomies. This paper describes a research program for studying Folksodriven tag interactions leading to Folksodriven cluster behavior. The goal of the research is to understand the type of simple local interactions which produce complex and purposive group behaviors on Folksodriven tags. We describe a synthetic, bottom-up approach to studying group behavior, consisting of designing and testing a variety of social interactions and cultural scenarios with Folksodriven tags. We propose a set of basic interactions which can be used to structure and simplify the process of both designing and analyzing emergent group behaviors. The presented behavior repertoire was developed and tested on a folksonomy environment.
Improved Structural Discovery and Representation Learning of Multi-Agent Data
Hobbs, Jennifer, Holbrook, Matthew, Frank, Nathan, Sha, Long, Lucey, Patrick
Central to all machine learning algorithms is data representation. For multi-agent systems, selecting a representation which adequately captures the interactions among agents is challenging due to the latent group structure which tends to vary depending on context. However, in multi-agent systems with strong group structure, we can simultaneously learn this structure and map a set of agents to a consistently ordered representation for further learning. In this paper, we present a dynamic alignment method which provides a robust ordering of structured multi-agent data enabling representation learning to occur in a fraction of the time of previous methods. We demonstrate the value of this approach using a large amount of soccer tracking data from a professional league. The natural representation for many sources of unstructured data is intuitive to us as humans: for images, a 2D pixel representation; for speech, a spectrogram or linear filter-bank features; and for text, letters and characters. All of these possess fixed, rigid structure in space, time, or sequential ordering which are immediately amenable for further learning. For other unstructured data sources such as point clouds, semantic graphs, and multi-agent trajectories, such an initial ordered structure does not naturally exist. These data sources are set or graph-like in nature and therefore the natural representation is unordered, posing a significant challenge for many machine-learning techniques.
Towards Regulated Deep Learning
Regulation of Multi-Agent Systems (MAS) was a research topic of the past decade and one of these proposals was Electronic Institutions. However, with the recent reformulation of Artificial Neural Networks (ANN) as Deep Learning (DL), Security, Privacy, Ethical and Legal issues regarding the use of DL have raised concerns in the Artificial Intelligence (AI) Community. Now that the Regulation of MAS is almost correctly addressed, we propose the Regulation of ANN as Agent-based Training of a special type of regulated ANN that we call Institutional Neural Network. This paper introduces the former concept and provides $\mathcal{I}$, a language previously used to model and extend Electronic Institutions, as a means to implement and regulate DL.
Individual specialization in multi-task environments with multiagent reinforcement learners
Gasparrini, Marco Jerome, Solé, Ricard, Sánchez-Fibla, Martí
There is a growing interest in Multi-Agent Reinforcement Learning (MARL) as the first steps towards building general intelligent agents that learn to make low and high-level decisions in non-stationary complex environments in the presence of other agents. Previous results point us towards increased conditions for coordination, efficiency/fairness, and common-pool resource sharing. We further study coordination in multi-task environments where several rewarding tasks can be performed and thus agents don't necessarily need to perform well in all tasks, but under certain conditions may specialize. An observation derived from the study is that epsilon greedy exploration of value-based reinforcement learning methods is not adequate for multi-agent independent learners because the epsilon parameter that controls the probability of selecting a random action synchronizes the agents artificially and forces them to have deterministic policies at the same time. By using policy-based methods with independent entropy regularised exploration updates, we achieved a better and smoother convergence. Another result that needs to be further investigated is that with an increased number of agents specialization tends to be more probable.
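The contrast drawn in this abstract between epsilon-greedy exploration and entropy-regularised policy methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the softmax parameterisation, and the coefficient `beta` are assumptions chosen for the example.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one.

    If every independent learner follows the same epsilon schedule, all of
    them become near-deterministic at the same time as epsilon decays.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def softmax(preferences):
    """Turn action preferences into a probability distribution."""
    top = max(preferences)
    exps = [math.exp(p - top) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_regularised_objective(probs, advantages, beta):
    """Expected advantage plus an entropy bonus weighted by beta (assumed).

    Each agent's exploration level now depends on its own policy rather
    than on a shared external schedule.
    """
    expected = sum(p * a for p, a in zip(probs, advantages))
    return expected + beta * entropy(probs)


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy([0.1, 0.9, 0.3], 0.1, rng))
    policy = softmax([0.0, 1.0, 0.5])
    print(entropy_regularised_objective(policy, [0.2, -0.1, 0.0], 0.01))
```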
Loss aversion fosters coordination among independent reinforcement learners
Gasparrini, Marco Jerome, Sánchez-Fibla, Martí
(arXiv version, January 1, 2020; both authors are at University Pompeu Fabra, Barcelona, Spain.) We study the factors that can accelerate the emergence of collaborative behaviours among independent selfish learning agents. We depart from the "Battle of the Exes" (BoE), a spatial repeated game from which human behavioral data has been obtained (by Hawkins and Goldstone, 2016) that we find interesting because it considers two cases: a classic game theory version, called ballistic, in which agents can only make one action/decision (equivalent to the Battle of the Sexes) and a spatial version, called dynamic, in which agents can change decision (a spatial continuous version). We model both versions of the game with independent reinforcement learning agents and we manipulate the reward function, transforming it into a utility by introducing "loss aversion": the reward that an agent obtains can be perceived as less valuable when compared to what the other got. We show experimentally that the introduction of loss aversion fosters cooperation by accelerating its appearance, and by making it possible in some cases, such as in the dynamic condition. We suggest that this may be an important factor explaining the rapid convergence of human behaviour towards collaboration reported in the experiment of Hawkins and Goldstone.
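One possible reading of the reward-to-utility transformation described in this abstract is sketched below. The abstract does not give the functional form, so the linear penalty, the coefficient `aversion`, and the example payoffs are all assumptions made purely for illustration.

```python
def loss_averse_utility(own_reward, other_reward, aversion):
    """Discount an agent's reward when the other agent received more.

    Assumed form: own_reward - aversion * max(0, other_reward - own_reward).
    With aversion = 0 the utility equals the raw reward.
    """
    shortfall = max(0.0, other_reward - own_reward)
    return own_reward - aversion * shortfall


if __name__ == "__main__":
    # Hypothetical payoffs for one round: one agent gets 4, the other gets 1.
    print(loss_averse_utility(4.0, 1.0, 0.5))  # the better-off agent
    print(loss_averse_utility(1.0, 4.0, 0.5))  # the worse-off agent
```

Under this assumed form, an agent that repeatedly receives the smaller payoff perceives it as even less valuable, which gives it a stronger incentive to move away from outcomes where the other agent always wins.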