Agents
When Do Envy-Free Allocations Exist?
Manurangsi, Pasin, Suksompong, Warut
We consider a fair division setting in which $m$ indivisible items are to be allocated among $n$ agents, where the agents have additive utilities and the agents' utilities for individual items are independently sampled from a distribution. Previous work has shown that an envy-free allocation is likely to exist when $m=\Omega(n\log n)$ but not when $m=n+o(n)$, and left open the question of determining where the phase transition from non-existence to existence occurs. We show that, surprisingly, there is in fact no universal point of transition---instead, the transition is governed by the divisibility relation between $m$ and $n$. On the one hand, if $m$ is divisible by $n$, an envy-free allocation exists with high probability as long as $m\geq 2n$. On the other hand, if $m$ is not "almost" divisible by $n$, an envy-free allocation is unlikely to exist even when $m=\Theta(n\log n/\log\log n)$.
Blameworthiness in Games with Imperfect Information
Blameworthiness of an agent or a coalition of agents is often defined in terms of the principle of alternative possibilities: for the coalition to be responsible for an outcome, the outcome must take place and the coalition should have had a strategy to prevent it. In this paper we argue that in the settings with imperfect information, not only should the coalition have had a strategy, but it also should have known that it had a strategy, and it should have known what the strategy was. The main technical result of the paper is a sound and complete bimodal logic that describes the interplay between knowledge and blameworthiness in strategic games with imperfect information.
Multi-Agent Common Knowledge Reinforcement Learning
Foerster, Jakob N., de Witt, Christian A. Schroeder, Farquhar, Gregory, Torr, Philip H. S., Boehmer, Wendelin, Whiteson, Shimon
In multi-agent reinforcement learning, centralised policies can only be executed if agents have access to either the global state or an instantaneous communication channel. An alternative approach that circumvents this limitation is to use centralised training of a set of decentralised policies. However, such policies severely limit the agents' ability to coordinate. We propose multi-agent common knowledge reinforcement learning (MACKRL), which strikes a middle ground between these two extremes. Our approach is based on the insight that, even in partially observable settings, subsets of agents often have some common knowledge that they can exploit to coordinate their behaviour. Common knowledge can arise, e.g., if all agents can reliably observe things in their own field of view and know the field of view of other agents. Using this additional information, it is possible to find a centralised policy that conditions only on agents' common knowledge and that can be executed in a decentralised fashion. A resulting challenge is then to determine at what level agents should coordinate. While the common knowledge shared among all agents may not contain much valuable information, there may be subgroups of agents that share common knowledge useful for coordination. MACKRL addresses this challenge using a hierarchical approach: at each level, a controller can either select a joint action for the agents in a given subgroup, or propose a partition of the agents into smaller subgroups whose actions are then selected by controllers at the next level. While action selection involves sampling hierarchically, learning updates are based on the probability of the joint action, calculated by marginalising across the possible decisions of the hierarchy. We show promising results on both a proof-of-concept matrix game and a multi-agent version of StarCraft II Micromanagement.
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Foerster, Jakob N., Song, Francis, Hughes, Edward, Burch, Neil, Dunning, Iain, Whiteson, Shimon, Botvinick, Matthew, Bowling, Michael
When observing the actions of others, humans carry out inferences about why the others acted as they did, and what this implies about their view of the world. Humans also use the fact that their actions will be interpreted in this manner when observed by others, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player, zero-sum games, scalable multi-agent reinforcement learning algorithms that can discover effective strategies and conventions in complex, partially observable settings have proven elusive. We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. Together with the public belief, this Bayesian update effectively defines a new Markov decision process, the public belief MDP, in which the action space consists of deterministic partial policies, parameterised by deep neural networks, that can be sampled for a given public state. It exploits the fact that an agent acting only on this public belief state can still learn to use its private information if the action space is augmented to be over partial policies mapping private information into environment actions. The Bayesian update is also closely related to the theory of mind reasoning that humans carry out when observing others' actions. We first validate BAD on a proof-of-principle two-step matrix game, where it outperforms traditional policy gradient methods. We then evaluate BAD on the challenging, cooperative partial-information card game Hanabi, where in the two-player setting the method surpasses all previously published learning and hand-coded approaches.
Legible Normativity for AI Alignment: The Value of Silly Rules
Hadfield-Menell, Dylan, Andrus, McKane, Hadfield, Gillian K.
It has become commonplace to assert that autonomous agents will have to be built to follow human rules of behavior-social norms and laws. But human laws and norms are complex and culturally varied systems; in many cases agents will have to learn the rules. This requires autonomous agents to have models of how human rule systems work so that they can make reliable predictions about rules. In this paper we contribute to the building of such models by analyzing an overlooked distinction between important rules and what we call silly rules --rules with no discernible direct impact on welfare. We show that silly rules render a normative system both more robust and more adaptable in response to shocks to perceived stability. They make normativity more legible for humans, and can increase legibility for AI systems as well. For AI systems to integrate into human normative systems, we suggest, it may be important for them to have models that include representations of silly rules.
Confiding in and Listening to Virtual Agents: The Effect of Personality
Li, Jingyi, Zhou, Michelle X., Yang, Huahai, Mark, Gloria
We present an intelligent virtual interviewer that engages with a user in a text-based conversation and automatically infers the user's psychological traits, such as personality. We investigate how the personality of a virtual interviewer influences a user's behavior from two perspectives: the user's willingness to confide in, and listen to, a virtual interviewer. We have developed two virtual interviewers with distinct personalities and deployed them in a real-world recruiting event. We present findings from completed interviews with 316 actual job applicants. Notably, users are more willing to confide in and listen to a virtual interviewer with a serious, assertive personality. Moreover, users' personality traits, inferred from their chat text, influence their perception of a virtual interviewer, and their willingness to confide in and listen to a virtual interviewer. Finally, we discuss the implications of our work on building hyper-personalized, intelligent agents based on user traits.
META-DES.Oracle: Meta-learning and feature selection for ensemble selection
Cruz, Rafael M. O, Sabourin, Robert, Cavalcanti, George D. C.
The key issue in Dynamic Ensemble Selection (DES) is defining a suitable criterion for calculating the classifiers' competence. There are several criteria available to measure the level of competence of base classifiers, such as local accuracy estimates and ranking. However, using only one criterion may lead to a poor estimation of the classifier's competence. In order to deal with this issue, we have proposed a novel dynamic ensemble selection framework using meta-learning, called META-DES. An important aspect of the META-DES framework is that multiple criteria can be embedded in the system encoded as different sets of meta-features. However, some DES criteria are not suitable for every classification problem. For instance, local accuracy estimates may produce poor results when there is a high degree of overlap between the classes. Moreover, a higher classification accuracy can be obtained if the performance of the meta-classifier is optimized for the corresponding data. In this paper, we propose a novel version of the META-DES framework based on the formal definition of the Oracle, called META-DES.Oracle. The Oracle is an abstract method that represents an ideal classifier selection scheme. A meta-feature selection scheme using an overfitting cautious Binary Particle Swarm Optimization (BPSO) is proposed for improving the performance of the meta-classifier. The difference between the outputs obtained by the meta-classifier and those presented by the Oracle is minimized. Thus, the meta-classifier is expected to obtain results that are similar to the Oracle. Experiments carried out using 30 classification problems demonstrate that the optimization procedure based on the Oracle definition leads to a significant improvement in classification accuracy when compared to previous versions of the META-DES framework and other state-of-the-art DES techniques.
Privacy Preserving Multi-Agent Planning with Provable Guarantees
Beimel, Amos, Brafman, Ronen I.
In privacy-preserving multi-agent planning, a group of agents attempt to cooperatively solve a multi-agent planning problem while maintaining private their data and actions. Although much work was carried out in this area in past years, its theoretical foundations have not been fully worked out. Specifically, although algorithms with precise privacy guarantees exist, even their most efficient implementations are not fast enough on realistic instances, whereas for practical algorithms no meaningful privacy guarantees exist. Secure-MAFS, a variant of the multi-agent forward search algorithm (MAFS) is the only practical algorithm to attempt to offer more precise guarantees, but only in very limited settings and with proof sketches only. In this paper we formulate a precise notion of secure computation for search-based algorithms and prove that Secure MAFS has this property in all domains.
Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning
Lin, Kaixiang, Zhao, Renyu, Xu, Zhe, Zhou, Jiayu
Large-scale online ride-sharing platforms have substantially transformed our lives by reallocating transportation resources to alleviate traffic congestion and promote transportation efficiency. An efficient fleet management strategy not only can significantly improve the utilization of transportation resources but also increase the revenue and customer satisfaction. It is a challenging task to design an effective fleet management strategy that can adapt to an environment involving complex dynamics between demand and supply. Existing studies usually work on a simplified problem setting that can hardly capture the complicated stochastic demand-supply variations in high-dimensional space. In this paper we propose to tackle the large-scale fleet management problem using reinforcement learning, and propose a contextual multi-agent reinforcement learning framework including two concrete algorithms, namely contextual deep Q-learning and contextual multi-agent actor-critic, to achieve explicit coordination among a large number of agents adaptive to different contexts. We show significant improvements of the proposed framework over state-of-the-art approaches through extensive empirical studies.
Scientists create an AI system that 'mimics human religiosity'
Religious conflict can occur when a group of people who share the same belief or religion feel threatened by another expanding group. This xenophobia then becomes violent in 20 per cent of cases when people start to question their own beliefs and ideals. An international team of resarchers used artificial intelligence to create a virtual world with various'agents' that interacted with one another. This is the first time this form of AI has been used in research and the scientists claim the findings can be used to support governments address and prevent terrorism and social conflict. The software is a psychologically realistic model of a human that mimics how we think and identify with particular groups.