Agents
E-HBA: Using Action Policies for Expert Advice and Agent Typification
Albrecht, Stefano V., Crandall, Jacob W., Ramamoorthy, Subramanian
Past research has studied two approaches to utilise predefined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
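The mixing step described in this abstract can be pictured with a minimal sketch. It assumes the mix is a simple convex combination, which the abstract does not state. The names `mixed_score`, `pick_expert` and `weight`, and the example experts and payoffs, are hypothetical. The abstract also does not say how the weight changes over time, so the caller supplies it.

```python
from typing import Dict


def mixed_score(past_avg: float, predicted: float, weight: float) -> float:
    """Blend an expert's past average payoff with a predicted future payoff.

    weight = 0 uses only the past payoff; weight = 1 uses only the prediction.
    The convex-combination form is an assumption made for illustration.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * past_avg + weight * predicted


def pick_expert(past_avg: Dict[str, float],
                predicted: Dict[str, float],
                weight: float) -> str:
    """Return the expert with the highest blended score."""
    return max(past_avg,
               key=lambda e: mixed_score(past_avg[e], predicted[e], weight))


# Hypothetical payoffs for two experts (policies).
past = {"tit_for_tat": 2.0, "always_defect": 3.0}
pred = {"tit_for_tat": 4.0, "always_defect": 1.0}

for w in (0.0, 0.5, 1.0):
    print(w, pick_expert(past, pred, w))
```

In this toy setting the choice of expert changes as the weight moves from the past payoff towards the predicted payoff, which is the kind of effect a payoff-mixing scheme is meant to produce.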
Low-Variance and Zero-Variance Baselines for Extensive-Form Games
Davis, Trevor, Schmid, Martin, Bowling, Michael
Extensive-form games (EFGs) are a common model of multi-agent interactions with imperfect information. State-of-the-art algorithms for solving these games typically perform full walks of the game tree that can prove prohibitively slow in large games. Alternatively, sampling-based methods such as Monte Carlo Counterfactual Regret Minimization walk one or more trajectories through the tree, touching only a fraction of the nodes on each iteration, at the expense of requiring more iterations to converge due to the variance of sampled values. In this paper, we extend recent work that uses baseline estimates to reduce this variance. We introduce a framework of baseline-corrected values in EFGs that generalizes the previous work. Within our framework, we propose new baseline functions that result in significantly reduced variance compared to existing techniques. We show that one particular choice of such a function, the predictive baseline, is provably optimal under certain sampling schemes. This allows for efficient computation of zero-variance value estimates even along sampled trajectories.
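The idea of a baseline-corrected value can be illustrated at a single decision point with a standard control-variate estimator. This is a simplified sketch, not the paper's recursive definition over the game tree. The function names, action names and numbers are hypothetical, and action values are taken to be deterministic to keep the example small.

```python
from typing import Dict


def baseline_corrected(sampled: str, sampled_value: float,
                       probs: Dict[str, float],
                       baseline: Dict[str, float]) -> Dict[str, float]:
    """Estimate every action's value after sampling only one action.

    Unsampled actions keep their baseline; the sampled action adds an
    importance-weighted correction (value - baseline) / sampling probability.
    """
    estimates = dict(baseline)
    correction = (sampled_value - baseline[sampled]) / probs[sampled]
    estimates[sampled] = baseline[sampled] + correction
    return estimates


def expected_estimates(values: Dict[str, float],
                       probs: Dict[str, float],
                       baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact expectation of the estimator over which action gets sampled."""
    expected = {a: 0.0 for a in values}
    for s in values:
        est = baseline_corrected(s, values[s], probs, baseline)
        for a in values:
            expected[a] += probs[s] * est[a]
    return expected


values = {"left": 1.0, "right": 3.0}   # true action values (hypothetical)
probs = {"left": 0.25, "right": 0.75}  # sampling probabilities
rough = {"left": 0.0, "right": 2.0}    # an imperfect baseline

print(expected_estimates(values, probs, rough))           # unbiased for any baseline
print(baseline_corrected("left", 1.0, probs, values))     # exact baseline: no variance
```

With any baseline the estimator is unbiased in expectation. When the baseline equals the true values, every sample returns the true values, which is the single-decision analogue of a zero-variance estimate.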
A Sufficient Statistic for Influence in Structured Multiagent Environments
Oliehoek, Frans A., Witwicki, Stefan, Kaelbling, Leslie P.
Making decisions in complex environments is a key challenge in artificial intelligence (AI). Situations involving multiple decision makers are particularly complex, leading to computational intractability of principled solution methods. A body of work in AI [4, 3, 41, 45, 47, 2] has tried to mitigate this problem by reducing interaction to its core: how does the policy of one agent influence another agent? If we can find more compact representations of such influence, this can help us deal with the complexity, for instance by searching the space of influences rather than that of policies [45]. However, so far these notions of influence have been restricted in their applicability to special cases of interaction. In this paper we formalize influence-based abstraction (IBA), which facilitates the elimination of latent state factors without any loss in value, for a very general class of problems described as factored partially observable stochastic games (fPOSGs) [33]. This generalizes existing descriptions of influence, and thus can serve as the foundation for improvements in scalability and other insights in decision making in complex settings.
Automatic Calibration of Artificial Neural Networks for Zebrafish Collective Behaviours using a Quality Diversity Algorithm
Cazenille, Leo, Bredeche, Nicolas, Halloy, José
During the last two decades, various models have been proposed for fish collective motion. These models are mainly developed to decipher the biological mechanisms of social interaction between animals. They consider very simple, homogeneous, unbounded environments, and it is not clear that they can accurately simulate collective trajectories. Moreover, when the models are more accurate, the question of their scalability to either larger groups or more elaborate environments remains open. This study deals with learning how to simulate realistic collective motion of a group of zebrafish, using real-world tracking data. The objective is to devise an agent-based model that can be implemented on an artificial robotic fish that can blend into a collective of real fish. We present a novel approach that uses Quality Diversity algorithms, a class of algorithms that emphasise exploration over pure optimisation. In particular, we use CVT-MAP-Elites, a variant of the state-of-the-art MAP-Elites algorithm for high-dimensional search spaces. Results show that Quality Diversity algorithms not only outperform classic evolutionary reinforcement learning methods at the macroscopic level (i.e. group behaviour), but are also able to generate more realistic biomimetic behaviours at the microscopic level (i.e. individual behaviour).
Today Me, Tomorrow Thee: Efficient Resource Allocation in Competitive Settings using Karma Games
Censi, Andrea, Bolognani, Saverio, Zilly, Julian G., Mousavi, Shima Sadat, Frazzoli, Emilio
We present a new type of coordination mechanism among multiple agents for the allocation of a finite resource, such as the allocation of time slots for passing an intersection. We consider the setting where we associate one counter to each agent, which we call karma value, and where there is an established mechanism to decide resource allocation based on agents exchanging karma. The idea is that agents might be inclined to pass on using resources today, in exchange for karma, which will make it easier for them to claim the resource use in the future. To understand whether such a system might work robustly, we only design the protocol and not the agents' policies. We take a game-theoretic perspective and compute policies corresponding to Nash equilibria for the game. We find, surprisingly, that the Nash equilibria for a society of self-interested agents are very close in social welfare to a centralized cooperative solution. These results suggest that many resource allocation problems can have a simple, elegant, and robust solution, assuming the availability of a karma accounting mechanism.
Comparative Evaluation of Multiagent Learning Algorithms in a Diverse Set of Ad Hoc Team Problems
Albrecht, Stefano V., Ramamoorthy, Subramanian
This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogeneous, in the sense of utilizing different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in ad hoc team problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium.
Open Problems in a Logic of Gossips
Apt, Krzysztof R., Wojtczak, Dominik
Gossip protocols are programs used in a setting in which each agent holds a secret and the aim is to reach a situation in which all agents know all secrets. Such protocols rely on point-to-point or group communication. Distributed epistemic gossip protocols use epistemic formulas in the component programs for the agents. The advantage of using epistemic logic is that the resulting protocols are very concise and amenable to simple verification. Recently, we introduced a natural modal logic that allows one to express distributed epistemic gossip protocols and to reason about their correctness. We proved that the resulting protocols are implementable and that all aspects of their correctness, including termination, are decidable. To establish these results we showed that both the definition of semantics and of truth of the underlying logic are decidable. We also showed that the analogous results hold for an extension of this logic with the 'common knowledge' operator. However, several, often deceptively simple, questions about this logic and the corresponding gossip protocols remain open. The purpose of this paper is to list and elucidate these questions and provide appropriate background information for them in the form of partial or related results.
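The gossip setting can be sketched as a small simulation. It assumes point-to-point calls in which both parties exchange every secret they hold, and it uses one simple knowledge-based guard: agent i calls agent j only while i does not yet know j's secret. The guard, the fixed scheduling order and the name `run_gossip` are illustrative assumptions, not the logic or protocols studied in the paper.

```python
from typing import List, Set, Tuple


def run_gossip(n: int) -> Tuple[List[Tuple[int, int]], List[Set[int]]]:
    """Simulate calls until every agent knows all n secrets.

    known[i] is the set of secrets agent i currently holds; agent i starts
    with only its own secret i.
    """
    known: List[Set[int]] = [{i} for i in range(n)]
    calls: List[Tuple[int, int]] = []
    while any(len(k) < n for k in known):
        for i in range(n):
            missing = [j for j in range(n) if j not in known[i]]
            if missing:
                j = missing[0]                 # guard: i lacks j's secret
                merged = known[i] | known[j]   # a call shares all secrets
                known[i] = set(merged)
                known[j] = set(merged)
                calls.append((i, j))
                break                          # one call per scheduling step
    return calls, known


calls, known = run_gossip(4)
print(calls)
print(known)
```

Each call teaches the caller at least one new secret, so this particular run terminates after at most n(n-1) calls. Whether a given protocol terminates under every possible scheduling is the kind of correctness question the abstract refers to.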
Multi-Agent Path Finding with Capacity Constraints
Surynek, Pavel, Kumar, T. K. Satish, Koenig, Sven
In multi-agent path finding (MAPF) the task is to navigate agents from their starting positions to given individual goals. The problem takes place in an undirected graph whose vertices represent positions and edges define the topology. Agents can move to neighbor vertices across edges. In the standard MAPF, space occupation by agents is modeled by a capacity constraint that permits at most one agent per vertex. We suggest an extension of MAPF in this paper that permits more than one agent per vertex. Propositional satisfiability (SAT) models for these extensions of MAPF are studied. We focus on modeling capacity constraints in SAT-based formulations of MAPF and evaluation of performance of these models. We extend two existing SAT-based formulations with vertex capacity constraints: MDD-SAT and SMT-CBS where the former is an approach that builds the model in an eager way while the latter relies on lazy construction of the model.
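The vertex capacity constraint itself is easy to state in code. The following sketch checks a candidate plan against a uniform capacity. It is only a plan validator, not a SAT encoding and not part of MDD-SAT or SMT-CBS. The function name, the path representation and the assumption that an agent waits at its last vertex once its path ends are all illustrative.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def respects_capacity(paths: List[Sequence[Hashable]], capacity: int) -> bool:
    """Check that no vertex hosts more than `capacity` agents at any time step.

    paths[k][t] is the vertex occupied by agent k at time step t. Shorter
    paths are padded by assuming the agent waits at its final vertex.
    """
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        load = Counter(p[min(t, len(p) - 1)] for p in paths)
        if any(count > capacity for count in load.values()):
            return False
    return True


# Two agents that both pass through vertex "b" at time step 1.
plan = [["a", "b", "c"], ["d", "b", "e"]]
print(respects_capacity(plan, capacity=1))  # standard MAPF: conflict at "b"
print(respects_capacity(plan, capacity=2))  # capacity 2 admits the same plan
```

With capacity 1 this reduces to the standard vertex-conflict check, and raising the capacity admits plans that the standard setting rejects. Edge validity and swap conflicts are deliberately left out of this sketch.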
Social Choice Methods for Database Aggregation
Belardinelli, Francesco, Grandi, Umberto
Knowledge can be represented compactly in multiple ways, from a set of propositional formulas, to a Kripke model, to a database. In this paper we study the aggregation of information coming from multiple sources, each source submitting a database modelled as a first-order relational structure. In the presence of integrity constraints, we identify classes of aggregators that respect them in the aggregated database, provided these are satisfied in all individual databases. We also characterise languages for first-order queries on which the answer to a query on the aggregated database coincides with the aggregation of the answers to the query obtained on each individual database. This contribution is meant to be a first step towards the application of techniques from social choice theory to knowledge representation in databases.
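A small example shows why the choice of aggregator matters for integrity constraints. It is an illustration of the general phenomenon, not a result taken from the paper. It assumes a single binary relation per source, a key constraint (the first column determines the second), and two hypothetical aggregators, union and intersection.

```python
from typing import Iterable, Set, Tuple

Relation = Set[Tuple[str, str]]


def satisfies_key(relation: Relation) -> bool:
    """True if the first column functionally determines the second."""
    seen = {}
    for key, value in relation:
        if seen.setdefault(key, value) != value:
            return False
    return True


def union_agg(dbs: Iterable[Relation]) -> Relation:
    """Keep every tuple submitted by at least one source."""
    return set().union(*dbs)


def intersection_agg(dbs: Iterable[Relation]) -> Relation:
    """Keep only the tuples submitted by every source."""
    return set.intersection(*(set(db) for db in dbs))


# Two sources that each satisfy the key constraint but disagree on "alice".
db1: Relation = {("alice", "paris"), ("bob", "rome")}
db2: Relation = {("alice", "london"), ("bob", "rome")}

print(satisfies_key(union_agg([db1, db2])))         # union breaks the key
print(satisfies_key(intersection_agg([db1, db2])))  # intersection keeps it
```

Both sources satisfy the key constraint, yet the union violates it because the sources disagree on one key, while the intersection preserves it. Identifying which aggregators preserve which classes of constraints is the kind of question the abstract describes.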
Deep Reinforcement Learning for Autonomous Internet of Things: Model, Applications and Challenges
Lei, Lei, Tan, Yue, Liu, Shiwen, Zheng, Kan, Shen, Xuemin
The Internet of Things (IoT) extends Internet connectivity to billions of IoT devices around the world, which collect and share information to reflect the status of the physical world. The Autonomous Control System (ACS), on the other hand, performs control functions on the physical systems without external intervention over an extended period of time. The integration of IoT and ACS results in a new concept: autonomous IoT (AIoT). The sensors collect information on the system status, based on which intelligent agents in IoT devices as well as Edge/Fog/Cloud servers make control decisions for the actuators to react. In order to achieve autonomy, a promising method is for the intelligent agents to leverage techniques from the field of artificial intelligence, especially reinforcement learning (RL) and deep reinforcement learning (DRL), for decision making. In this paper, we first provide a comprehensive survey of state-of-the-art research, and then propose a general model for the applications of RL/DRL in AIoT. Finally, the challenges and open issues for future research are identified.