Agents
Planning in Stochastic Environments with Goal Uncertainty
Saisubramanian, Sandhya, Wray, Kyle Hollins, Pineda, Luis, Zilberstein, Shlomo
We present the Goal Uncertain Stochastic Shortest Path (GUSSP) problem -- a general framework to model path planning and decision making in stochastic environments with goal uncertainty. The framework extends the stochastic shortest path (SSP) model to dynamic environments in which it is impossible to determine the exact goal states ahead of plan execution. GUSSPs introduce flexibility in goal specification by allowing a belief over possible goal configurations. The unique observations at potential goals helps the agent identify the true goal during plan execution. The partial observability is restricted to goals, facilitating the reduction to an SSP with a modified state space. We formally define a GUSSP and discuss its theoretical properties. We then propose an admissible heuristic that reduces the planning time using FLARES -- a start-of-the-art probabilistic planner. We also propose a determinization approach for solving this class of problems. Finally, we present empirical results on a search and rescue mobile robot and three other problem domains in simulation.
DeepMind's Agent57 AI agent can best human players across a suite of 57 Atari games โ TechCrunch
Development of artificial intelligence agents tends to frequently be measured by their performance in games, but there's a good reason for that: Games tend to offer a wide proficiency curve, in terms of being relatively simple to grasp the basics, but difficult to master, and they almost always have a built-in scoring system to evaluate performance. DeepMind's agents have tackled board game Go, as well as real-time strategy video game StarCraft. But the Alphabet company's most recent feat is Agent57, a learning agent that can beat the average human on each of 57 Atari games with a wide range of difficulty, characteristics and gameplay styles. Being better than humans at 57 Atari games may seem like an odd benchmark against which to measure the performance of a deep learning agent, but it's actually a standard that goes all the way back to 2012, with a selection of Atari classics including Pitfall, Solaris, Montezuma's Revenge and many others. Taken together, these games represent a broad range of difficulty levels, as well as requiring a range of different strategies in order to achieve success.
Improving Confidence in the Estimation of Values and Norms
Siebert, Luciano Cavalcante, Mercuur, Rijk, Dignum, Virginia, Hoven, Jeroen van den, Jonker, Catholijn
Autonomous agents (AA) will increasingly be interacting with us in our daily lives. While we want the benefits attached to AAs, it is essential that their behavior is aligned with our values and norms. Hence, an AA will need to estimate the values and norms of the humans it interacts with, which is not a straightforward task when solely observing an agent's behavior. This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent (SHA) based on its actions in the ultimatum game. We present two methods to reduce ambiguity in profiling the SHAs: one based on search space exploration and another based on counterfactual analysis. We found that both methods are able to increase the confidence in estimating human values and norms, but differ in their applicability, the latter being more efficient when the number of interactions with the agent is to be minimized. These insights are useful to improve the alignment of AAs with human values and norms.
Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning
Mao, Weichao, Zhang, Kaiqing, Miehling, Erik, Baลar, Tamer
Multi-agent reinforcement learning (MARL) under partial observability has long been considered challenging, primarily due to the requirement for each agent to maintain a belief over all other agents' local histories -- a domain that generally grows exponentially over time. In this work, we investigate a partially observable MARL problem in which agents are cooperative. To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories. We quantify how the compression error influences the resulting value functions for decentralized control. Furthermore, we propose three natural embeddings, based on finite-memory truncation, principal component analysis, and recurrent neural networks. The output of these embeddings are then used as the information state, and can be fed into any MARL algorithm. The proposed embed-then-learn pipeline opens the black-box of existing MARL algorithms, allowing us to establish some theoretical guarantees (error bounds of value functions) while still achieving competitive performance with many end-to-end approaches.
Multi-agent Reinforcement Learning for Networked System Control
Chu, Tianshu, Chinchali, Sandeep, Katti, Sachin
This paper considers multi-agent reinforcement learning (MARL) in networked system control. Specifically, each agent learns a decentralized control policy based on local observations and messages from connected neighbors. We formulate such a networked MARL (NMARL) problem as a spatiotemporal Markov decision process and introduce a spatial discount factor to stabilize the training of each local agent. Further, we propose a new differentiable communication protocol, called NeurComm, to reduce information loss and non-stationarity in NMARL. Based on experiments in realistic NMARL scenarios of adaptive traffic signal control and cooperative adaptive cruise control, an appropriate spatial discount factor effectively enhances the learning curves of non-communicative MARL algorithms, while NeurComm outperforms existing communication protocols in both learning efficiency and control performance.
Learning to cooperate: Emergent communication in multi-agent navigation
Kajiฤ, Ivana, Aygรผn, Eser, Precup, Doina
Emergent communication in artificial agents has been studied to understand language evolution, as well as to develop artificial systems that learn to communicate with humans. We show that agents performing a cooperative navigation task in various gridworld environments learn an interpretable communication protocol that enables them to efficiently, and in many cases, optimally, solve the task. An analysis of the agents' policies reveals that emergent signals spatially cluster the state space, with signals referring to specific locations and spatial directions such as "left", "up", or "upper left room". Using populations of agents, we show that the emergent protocol has basic compositional structure, thus exhibiting a core property of natural language.
Planning as Inference in Epidemiological Models
Wood, Frank, Warrington, Andrew, Naderiparizi, Saeid, Weilbach, Christian, Masrani, Vaden, Harvey, William, Scibior, Adam, Beronov, Boyan, Nasseri, Ali
In this work we demonstrate how existing software tools can be used to automate parts of infectious disease-control policy-making via performing inference in existing epidemiological dynamics models. The kind of inference tasks undertaken include computing, for planning purposes, the posterior distribution over putatively controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Neither the full capabilities of such inference automation software tools nor their utility for planning is widely disseminated at the current time. Timely gains in understanding about these tools and how they can be used may lead to more fine-grained and less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.
Anytime and Efficient Coalition Formation with Spatial and Temporal Constraints
Capezzuto, Luca, Tarapore, Danesh, Ramchurn, Sarvapali D.
The Coalition Formation with Spatial and Temporal constraints Problem (CFSTP) is a multi-agent task allocation problem where the agents are cooperative and few, the tasks are many, spatially distributed, with deadlines and workloads, and the objective is to find a schedule that maximises the number of completed tasks. The current state-of-the-art CFSTP solver, the Coalition Formation with Look-Ahead (CFLA) algorithm, has two main limitations. First, its time complexity is quadratic with the number of tasks and exponential with the number of agents, which makes it not efficient. Second, its look-ahead technique is not effective in real-world scenarios, such as open multi-agent systems, where new tasks can appear at any time. Motivated by this, we propose an extension of CFLA, which we call Coalition Formation with Improved Look-Ahead (CFLA+). Since CFLA+ inherits the limitations of CFLA, we also develop a novel algorithm to solve the CFSTP, the first to be both anytime and efficient, which we call Cluster-based Coalition Formation (CCF). We empirically show that, in settings where the look-ahead technique is highly effective, CCF completes up to 20% (resp. 10%) more tasks than CFLA (resp. CFLA+) while being up to four orders of magnitude faster. Our results affirm CCF as the new state-of-the-art CFSTP solver.
DeepMind's Agent57 beats humans at 57 classic Atari games
In a preprint paper published this week by DeepMind, Google parent company Alphabet's U.K.-based research division, a team of scientists describe Agent57, which they say is the first system that outperforms humans on all 57 Atari games in the Arcade Learning Environment data set. Assuming the claim holds water, Agent57 could lay the groundwork for more capable AI decision-making models than have been previously released. This could be a boon for enterprises looking to boost productivity through workplace automation; imagine AI that automatically completes not only mundane, repetitive tasks like data entry, but which reasons about its environment. "With Agent57, we have succeeded in building a more generally intelligent agent that has above-human performance on all tasks in the Atari57 benchmark," wrote the study's coauthors. "Agent57 was able to scale with increasing amounts of computation: the longer it trained, the higher its score got."
Provable Sample Complexity Guarantees for Learning of Continuous-Action Graphical Games with Nonparametric Utilities
Game theory has been extensively used as a framework to model and study the strategic interactions amongst rational but selfish individual players who are trying to maximize their payoffs. Game theory has been applied in many fields including but not limited to social and political science, economics, communication, system design and computer science. In non-cooperative games each player decides its action based on the actions of others players. These games are characterized by the equilibrium solution concept such as Nash equilibrium (NE) [18] which serves a descriptive role of the stable outcome of the overall behavior of self-interested players (e.g., people, companies, governments, groups or autonomous systems) interacting strategically with each other in distributed settings. Graphical games, introduced within the AI community about two decades ago, graphical games [16], are a representation of multiplayer games which capture and exploit locality or sparsity of direct influences. They are most appropriate for large-scale population games in which the payoffs of each player are determined by the actions of only a small number of other players. Indeed, graphical games played a prominent role in establishing the computational complexity of computing NE in normal-form games as well as in succinctly representable multiplayer games (see, e.g., [5, 6, 7] and the references therein). Graphical games have been studied for both discrete and continuous actions.