Agents
Swarm Programming Using Moth-Flame Optimization and Whale Optimization Algorithms
Automatic programming (AP) is an important area of Machine Learning (ML) in which computer programs are generated automatically. Swarm Programming (SP), a newly emerging research area in AP, generates computer programs automatically using Swarm Intelligence (SI) algorithms. This paper presents two grammar-based SP methods, the Grammatical Moth-Flame Optimizer (GMFO) and the Grammatical Whale Optimizer (GWO). The Moth-Flame Optimizer and the Whale Optimization Algorithm serve as the search engines, or learning algorithms, in GMFO and GWO respectively. The proposed methods are tested on the Santa Fe Ant Trail, quartic symbolic regression, and 3-input multiplexer problems. The results are compared with the Grammatical Bee Colony (GBC) and the Grammatical Fireworks algorithm (GFWA). The experimental results demonstrate that the proposed SP methods can be used for automatic computer program generation.
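The abstract does not spell out how a grammar-based method turns a search-space position into a program. The sketch below assumes the genotype-to-phenotype mapping commonly used in grammatical evolution, where each integer codon selects a production by taking it modulo the number of available choices. The toy grammar, the function name `map_genotype`, and the `max_wraps` limit are illustrative assumptions, not details from the paper.

```python
# Toy grammar (an assumption for illustration, not one of the paper's benchmarks).
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["1"]],
}


def map_genotype(codons, start="<expr>", max_wraps=2):
    """Map a list of integer codons to a program string.

    Returns None when the derivation is still incomplete after the codon
    list has been reused max_wraps times (an invalid individual).
    """
    symbols = [start]          # leftmost-first derivation stack
    output = []
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)      # terminal: emit as-is
            continue
        if used >= limit:
            return None                # ran out of codons
        choices = GRAMMAR[symbol]
        codon = codons[used % len(codons)]
        used += 1
        # The codon picks a production via the modulo rule.
        symbols = list(choices[codon % len(choices)]) + symbols
    return " ".join(output)


print(map_genotype([0, 1, 0, 1, 1, 1]))
```

Under this assumption, a swarm algorithm would only need to move integer (or rounded real-valued) vectors around; the grammar guarantees that every completed derivation is a syntactically valid program.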
Probabilistic Serial Mechanism for Multi-Type Resource Allocation
Guo, Xiaoxi, Sikdar, Sujoy, Wang, Haibin, Xia, Lirong, Cao, Yongzhi, Wang, Hanpin
In multi-type resource allocation (MTRA) problems, there are $p \ge 2$ types of items and $n$ agents, who each demand one unit of items of each type and have strict linear preferences over bundles consisting of one item of each type. For MTRAs with indivisible items, our first result is an impossibility theorem that is in direct contrast to the single-type ($p = 1$) setting: no mechanism whose output is always decomposable into a probability distribution over discrete assignments (where no item is split between agents) can satisfy both sd-efficiency and sd-envy-freeness. To circumvent this impossibility result, we consider the natural assumption of lexicographic preferences, and provide an extension of the probabilistic serial (PS) mechanism, called lexicographic probabilistic serial (LexiPS). We prove that LexiPS satisfies sd-efficiency and sd-envy-freeness, retaining the desirable properties of PS. Moreover, LexiPS satisfies sd-weak-strategyproofness when agents are not allowed to misreport their importance orders. For MTRAs with divisible items, we show that the existing multi-type probabilistic serial (MPS) mechanism satisfies the stronger efficiency notion of lexi-efficiency, and is sd-envy-free under strict linear preferences and sd-weak-strategyproof under lexicographic preferences. We also prove that MPS can be characterized both by leximin-optimality and by item-wise ordinal fairness, and that the family of eating algorithms to which MPS belongs can be characterized by the no-generalized-cycle condition.
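For orientation, here is a minimal sketch of the classical single-type ($p = 1$) probabilistic serial mechanism that LexiPS and MPS extend. It implements the standard simultaneous eating procedure and is not the paper's LexiPS or MPS. It assumes $n$ agents, $n$ items with unit supply, and that every agent ranks every item; the function name `probabilistic_serial` is illustrative.

```python
from fractions import Fraction


def probabilistic_serial(preferences):
    """Single-type PS: all agents eat their best remaining item at unit speed.

    preferences[i] lists agent i's items from most to least preferred.
    Returns, per agent, a dict item -> fractional share.
    """
    items = set(preferences[0])
    supply = {item: Fraction(1) for item in items}
    shares = [{item: Fraction(0) for item in items} for _ in preferences]
    time_left = Fraction(1)
    while time_left > 0:
        # Each agent targets its most preferred item that is not exhausted.
        targets = [next(item for item in prefs if supply[item] > 0)
                   for prefs in preferences]
        eaters = {}
        for item in targets:
            eaters[item] = eaters.get(item, 0) + 1
        # Advance time until the first targeted item runs out (or time ends).
        step = min(supply[item] / count for item, count in eaters.items())
        step = min(step, time_left)
        for agent, item in enumerate(targets):
            shares[agent][item] += step
            supply[item] -= step
        time_left -= step
    return shares


prefs = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
for agent, share in enumerate(probabilistic_serial(prefs)):
    print(agent, {item: str(frac) for item, frac in sorted(share.items())})
```

Exact fractions are used so that the shares of each item sum to exactly one, which makes the resulting random assignment easy to check by hand.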
The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget
Goyal, Anirudh, Bengio, Yoshua, Botvinick, Matthew, Levine, Sergey
In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision about which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information) and predicting the target. In many problem settings, including the reinforcement learning problems we consider in this work, we might prefer to compress only part of the input. This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent. In such cases, we might prefer to compress the privileged input, either to achieve better generalization (e.g., with respect to goals) or to minimize access to costly information (e.g., in the case of communication). Practical implementations of the information bottleneck based on variational inference require access to the privileged input in order to compute the bottleneck variable, so although they perform compression, the compression operation itself needs unrestricted, lossless access. In this work, we propose the variational bandwidth bottleneck, which decides for each example on the estimated value of the privileged information before seeing it, i.e., based only on the standard input, and then chooses stochastically whether or not to access the privileged input. We formulate a tractable approximation to this framework and demonstrate in a series of reinforcement learning experiments that it can improve generalization and reduce access to computationally costly information.
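The access decision described above can be sketched as a stochastic gate. This is only an illustration of the idea, not the paper's model: the linear scoring of the standard input, the `fallback` value used when access is denied, and all function names are assumptions made here.

```python
import math
import random


def access_probability(state, weights):
    """Estimated value of the privileged input, computed from the standard input only."""
    score = sum(s * w for s, w in zip(state, weights))
    return 1.0 / (1.0 + math.exp(-score))


def gated_read(state, weights, read_privileged, fallback, rng):
    """Stochastically decide whether to pay for the privileged input.

    read_privileged is a callable so that the costly input is touched
    only when the gate opens; otherwise an input-independent fallback
    (assumed here) is returned.
    """
    if rng.random() < access_probability(state, weights):
        return read_privileged(), True
    return fallback, False


rng = random.Random(0)
value, accessed = gated_read([1.0], [0.5], lambda: "goal", None, rng)
print(value, accessed)
```

Passing the privileged input as a callable makes the cost structure explicit: when the gate stays closed, the expensive computation or communication is never triggered.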
Impact of different belief facets on agents' decision -- a refined cognitive architecture
Sedigh, Amir Hosein Afshar, Purvis, Martin K., Savarimuthu, Bastin Tony Roy, Frantz, Christopher K, Purvis, Maryam A.
This paper presents a conceptual refinement of an agent cognitive architecture inspired by the beliefs-desires-intentions (BDI) and theory of planned behaviour (TPB) models, with an emphasis on different belief facets. This enables us to investigate the impact of personality, and of the way an agent weights its internal beliefs and social sanctions, on the agent's actions. The study also uses the concept of cognitive dissonance associated with the fairness of institutions to investigate the agents' behaviour. To showcase our model, we simulate two historical long-distance trading societies, namely the Armenian merchants of New-Julfa and the English East India Company. The results demonstrate the importance of agents' internal beliefs as a pivotal aspect of following institutional rules.
Learning Attentional Communication for Multi-Agent Cooperation
Communication could potentially be an effective way for multi-agent cooperation. However, information sharing among all agents or in predefined communication architectures that existing methods adopt can be problematic. When there is a large number of agents, agents cannot differentiate valuable information that helps cooperative decision making from globally shared information. Therefore, communication barely helps, and could even impair the learning of multi-agent cooperation. Predefined communication architectures, on the other hand, restrict communication among agents and thus restrain potential cooperation.
Tension Space Analysis for Emergent Narrative
Kybartas, Ben, Verbrugge, Clark, Lessard, Jonathan
Emergent narratives provide a unique and compelling approach to interactive storytelling through simulation, and have applications in games, narrative generation, and virtual agents. However, the inherent complexity of simulation makes it difficult to understand the expressive potential of emergent narratives, particularly at the design phase of development. In this paper, we present a novel approach to emergent narrative using the narratological theory of possible worlds and demonstrate how the design of works in such a system can be understood through a formal means of analysis inspired by expressive range analysis. Lastly, we propose a novel way in which content may be authored for the emergent narrative system using a sketch-based interface.
Real World Games Look Like Spinning Tops
Czarnecki, Wojciech Marian, Gidel, Gauthier, Tracey, Brendan, Tuyls, Karl, Omidshafiei, Shayegan, Balduzzi, David, Jaderberg, Max
This paper investigates the geometrical properties of real world games (e.g. Tic-Tac-Toe, Go, StarCraft II). We hypothesise that their geometrical structure resembles a spinning top, with the upright axis representing transitive strength, and the radial axis, which corresponds to the number of cycles that exist at a particular transitive strength, representing the non-transitive dimension. We prove the existence of this geometry for a wide class of real world games, exposing their temporal nature. Additionally, we show that this unique structure also has consequences for learning: it clarifies why populations of strategies are necessary for training agents, and how population size relates to the structure of the game. Finally, we empirically validate these claims using a selection of nine real world two-player zero-sum symmetric games, showing 1) that the spinning top structure is revealed and can be easily reconstructed using a new method of Nash clustering to measure the interaction between transitive and cyclical strategy behaviour, and 2) the effect that population size has on convergence in these games.
The future of AI and autonomous systems - Federal News Network
This week on Fed Access, Yuna Huh Wong, a research analyst at the Rand Corporation, joins host Derrick Dortch to discuss how artificial intelligence and the use of autonomous unmanned systems could impact future military crises and conflicts around the world. Wong is one of the co-authors of a Rand report titled "Deterrence in the Age of Thinking Machines", which examines the role these systems would play in the decision-making process of military leaders. She discusses the report's findings, including whether the increased use of these systems could lead to inadvertent escalations in hostilities or serve as a deterrent.
Mechanism Design with Bandit Feedback
Kandasamy, Kirthevasan, Gonzalez, Joseph E., Jordan, Michael I., Stoica, Ion
We study a multi-round welfare-maximising mechanism design problem, where, on each round, a mechanism assigns an allocation to each agent in a set of agents and charges each of them a price. Then the agents report their realised (stochastic) values back to the mechanism. This is motivated by applications in cloud markets and online advertising, where an agent may know her value for an allocation only after experiencing it. The distribution of these values is unknown to the agent beforehand, which necessitates learning them over multiple rounds while simultaneously attempting to find the socially optimal set of allocations. Our focus is on designing truthful and individually rational mechanisms which imitate the classical VCG mechanism in the long run. To that end, we define three notions of regret: for the welfare, for the individual utilities of each agent (value minus price), and for the mechanism (revenue minus cost). We show that these three terms are interdependent via an $\Omega(T^{2/3})$ lower bound for the maximum of these three terms after $T$ rounds of allocations. We describe a family of anytime algorithms which achieve this rate. The proposed framework provides flexibility to control the pricing scheme so as to trade off between the agent and seller regrets, and additionally to control the degree of truthfulness and individual rationality.
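As a point of reference for the benchmark that these mechanisms imitate, the following is a minimal sketch of the classical VCG mechanism when values are known. It is not the paper's bandit algorithm. It assumes a simple setting in which each agent receives exactly one distinct item and there are at least as many items as agents; the names `best_allocation` and `vcg` are illustrative, and the brute-force search is only practical for tiny instances.

```python
from itertools import permutations


def best_allocation(values, agents, n_items):
    """Brute-force the welfare-maximising assignment of distinct items to agents."""
    best_welfare, best_alloc = float("-inf"), {}
    for items in permutations(range(n_items), len(agents)):
        welfare = sum(values[a][j] for a, j in zip(agents, items))
        if welfare > best_welfare:
            best_welfare, best_alloc = welfare, dict(zip(agents, items))
    return best_welfare, best_alloc


def vcg(values):
    """values[i][j] is agent i's (known) value for item j.

    Returns the welfare-maximising allocation and VCG prices, where each
    agent pays the welfare loss its presence imposes on the others.
    """
    agents = list(range(len(values)))
    n_items = len(values[0])
    welfare, alloc = best_allocation(values, agents, n_items)
    prices = {}
    for i in agents:
        others = [a for a in agents if a != i]
        welfare_without_i, _ = best_allocation(values, others, n_items)
        others_welfare_with_i = welfare - values[i][alloc[i]]
        prices[i] = welfare_without_i - others_welfare_with_i
    return alloc, prices


print(vcg([[10, 4], [8, 1]]))
```

In the multi-round setting of the abstract, the values fed to such a computation would have to be estimated from the agents' reported realisations, which is where the three regret notions come in.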
Intention Propagation for Multi-agent Reinforcement Learning
Qu, Chao, Li, Hui, Liu, Chang, Xiong, Junwu, Zhang, James, Chu, Wei, Qi, Yuan, Song, Le
Collaborative multi-agent reinforcement learning is an important sub-field of multi-agent reinforcement learning (MARL), in which the agents learn to coordinate to achieve joint success. It has wide applications in traffic control [Kuyer et al., 2008], autonomous driving [Shalev-Shwartz et al., 2016] and smart grids [Yang et al., 2018]. To learn to coordinate, interactions between agents are indispensable. For instance, humans can reason about others' behaviors or learn other people's intentions through communication and then determine an effective coordination plan. However, how to design a mechanism for such interaction in a principled way, while also solving large-scale real-world applications, is still a challenging problem. Recently, there has been a surge of interest in solving the collaborative MARL problem [Foerster et al., 2018, Qu et al., 2019, Lowe et al., 2017]. Among these methods, joint policy approaches have demonstrated their superiority [Rashid et al., 2018, Sunehag et al., 2018, Oliehoek et al., 2016]. A straightforward approach is to replace the action in single-agent reinforcement learning with the joint action $a = (a_1, a_2, \ldots, a_N)$, but this obviously suffers from an exponentially large action space.
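The size of the joint action space is easy to make concrete. The short sketch below assumes every agent shares the same finite set of individual actions (an assumption for illustration); the helper names are hypothetical.

```python
from itertools import product


def joint_actions(individual_actions, n_agents):
    """Enumerate every joint action a = (a_1, ..., a_N)."""
    return list(product(individual_actions, repeat=n_agents))


def joint_action_count(n_individual, n_agents):
    """Number of joint actions: |A| ** N, exponential in the number of agents."""
    return n_individual ** n_agents


print(len(joint_actions(range(3), 4)))
print(joint_action_count(5, 10))
```

With only 5 individual actions and 10 agents, a policy over joint actions already has to cover millions of entries, which is why explicit enumeration does not scale.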