Goto

Collaborating Authors

 Agent Societies


Emergence of Pragmatics from Referential Game between Theory of Mind Agents

arXiv.org Artificial Intelligence

Pragmatics studies how context can contribute to language meanings [1]. In human communication, language is never interpreted out of context, and sentences can usually convey more information than their literal meanings [2]. However, this mechanism is missing in most multi-agent systems [3, 4, 5, 6], restricting the communication efficiency and the capability of human-agent interaction. In this paper, we propose an algorithm, using which agents can spontaneously learn the ability to "read between lines" without any explicit hand-designed rules. We integrate the theory of mind (ToM) [7, 8] in a cooperative multi-agent pedagogical situation and propose an adaptive reinforcement learning (RL) algorithm to develop a communication protocol. ToM is a profound cognitive science concept, claiming that people regularly reason about other's mental states, including beliefs, goals, and intentions, to obtain performance advantage in competition, cooperation or coalition. With this ability, agents consider language as not only messages but also rational acts reflecting others' hidden states. Our experiments demonstrate the advantage of pragmatic protocols over non-pragmatic protocols. We also show the teaching complexity following the pragmatic protocol empirically approximates to recursive teaching dimension (RTD).


Adversarially Guided Self-Play for Adopting Social Conventions

arXiv.org Artificial Intelligence

Robotic agents must adopt existing social conventions in order to be effective teammates. These social conventions, such as driving on the right or left side of the road, are arbitrary choices among optimal policies, but all agents on a successful team must use the same convention. Prior work has identified a method of combining self-play with paired input-output data gathered from existing agents in order to learn their social convention without interacting with them. We build upon this work by introducing a technique called Adversarial Self-Play (ASP) that uses adversarial training to shape the space of possible learned policies and substantially improves learning efficiency. ASP only requires the addition of unpaired data: a dataset of outputs produced by the social convention without associated inputs. Theoretical analysis reveals how ASP shapes the policy space and the circumstances (when behaviors are clustered or exhibit some other structure) under which it offers the greatest benefits. Empirical results across three domains confirm ASP's advantages: it produces models that more closely match the desired social convention when given as few as two paired datapoints.


Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping

arXiv.org Artificial Intelligence

We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping, for efficient learning in multi-agent Markov decision processes. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our approach only requires knowledge about the structure of the problem in the form of a dynamic decision network. Using this information, our method learns a model of the environment and performs temporal difference updates which affect multiple joint states and actions at once. Batch updates are additionally performed which efficiently back-propagate knowledge throughout the factored Q-function. Our method outperforms the state-of-the-art algorithm sparse cooperative Q-learning algorithm, both on the well-known SysAdmin benchmark and randomized environments.


Emergent Behaviors from Folksonomy Driven Interactions

arXiv.org Artificial Intelligence

To reflect the evolving knowledge on the Web this paper considers ontologies based on folksonomies according to a new concept structure called "Folksodriven" to represent folksonomies. This paper describes a research program for studying Folksodriven tags interactions leading to Folksodriven cluster behavior. The goal of the research is to understand the type of simple local interactions which produce complex and purposive group behaviors on Folksodriven tags. We describe a synthetic, bottom-up approach to studying group behavior, consisting of designing and testing a variety of social interactions and cultural scenarios with Folksodriven tags. We propose a set of basic interactions which can be used to structure and simplify the process of both designing and analyzing emergent group behaviors. The presented behavior repertories was developed and tested on a folksonomy environment.


Improved Structural Discovery and Representation Learning of Multi-Agent Data

arXiv.org Machine Learning

Central to all machine learning algorithms is data representation. For multi-agent systems, selecting a representation which adequately captures the interactions among agents is challenging due to the latent group structure which tends to vary depending on context. However, in multi-agent systems with strong group structure, we can simultaneously learn this structure and map a set of agents to a consistently ordered representation for further learning. In this paper, we present a dynamic alignment method which provides a robust ordering of structured multi-agent data enabling representation learning to occur in a fraction of the time of previous methods. We demonstrate the value of this approach using a large amount of soccer tracking data from a professional league. The natural representation for many sources of unstructured data is intuitive to us as humans: for images, a 2D pixel representation; for speech, a spectrogram or linear filter-bank features; and for text, letters and characters. All of these possess fixed, rigid structure in space, time, or sequential ordering which are immediately amenable for further learning. For other unstructured data sources such as point clouds, semantic graphs, and multi-agent trajectories, such an initial ordered structure does not naturally exist. These data sources are set or graph-like in nature and therefore the natural representation is unordered, posing a significant challenge for many machine-learning techniques.


Data-driven Discovery of Emergent Behaviors in Collective Dynamics

arXiv.org Machine Learning

Particle- and agent-based systems are a ubiquitous modeling tool in many disciplines. We consider the fundamental problem of inferring interaction kernels from observations of agent-based dynamical systems given observations of trajectories, in particular for collective dynamical systems exhibiting emergent behaviors with complicated interaction kernels, in a nonparametric fashion, and for kernels which are parametrized by a single unknown parameter. We extend the estimators introduced in \cite{PNASLU}, which are based on suitably regularized least squares estimators, to these larger classes of systems. We provide extensive numerical evidence that the estimators provide faithful approximations to the interaction kernels, and provide accurate predictions for trajectories started at new initial conditions, both throughout the ``training'' time interval in which the observations were made, and often much beyond. We demonstrate these features on prototypical systems displaying collective behaviors, ranging from opinion dynamics, flocking dynamics, self-propelling particle dynamics, synchronized oscillator dynamics, and a gravitational system. Our experiments also suggest that our estimated systems can display the same emergent behaviors of the observed systems, that occur at larger timescales than those used in the training data. Finally, in the case of families of systems governed by a parameterized family of interaction kernels, we introduce novel estimators that estimate the parameterized family of kernels, splitting it into a common interaction kernel and the action of parameters. We demonstrate this in the case of gravity, by learning both the ``common component'' $1/r^2$ and the dependency on mass, without any a priori knowledge of either one, from observations of planetary motions in our solar system.


Optimizing Collision Avoidance in Dense Airspace using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

New methodologies will be needed to ensure the airspace remains safe and efficient as traffic densities rise to accommodate new unmanned operations. This paper explores how unmanned free-flight traffic may operate in dense airspace. We develop and analyze autonomous collision avoidance systems for aircraft operating in dense airspace where traditional collision avoidance systems fail. We propose a metric for quantifying the decision burden on a collision avoidance system as well as a metric for measuring the impact of the collision avoidance system on airspace. We use deep reinforcement learning to compute corrections for an existing collision avoidance approach to account for dense airspace. The results show that a corrected collision avoidance system can operate more efficiently than traditional methods in dense airspace while maintaining high levels of safety.


Inverse Graph Learning over Optimization Networks

arXiv.org Machine Learning

Many inferential and learning tasks can be accomplished efficiently by means of distributed optimization algorithms where the network topology plays a critical role in driving the local interactions among neighboring agents. There is a large body of literature examining the effect of the graph structure on the performance of optimization strategies. In this article, we examine the inverse problem and consider the reverse question: How much information does observing the behavior at the nodes convey about the underlying network structure used for optimization? Over large-scale networks, the difficulty of addressing such inverse questions (or problems) is compounded by the fact that usually only a limited portion of nodes can be probed, giving rise to a second important question: Despite the presence of several unobserved nodes, are partial and local observations still sufficient to discover the graph linking the probed nodes? The article surveys recent advances on this inverse learning problem and related questions. Examples of applications are provided to illustrate how the interplay between graph learning and distributed optimization arises in practice, e.g., in cognitive engineered systems such as distributed detection, or in other real-world problems such as the mechanism of opinion formation over social networks and the mechanism of coordination in biological networks. A unifying framework for examining the reconstruction error will be described, which allows to devise and examine various estimation strategies enabling successful graph learning. The relevance of specific network attributes, such as sparsity versus density of connections, and node degree concentration, is discussed in relation to the topology inference goal. It is shown how universal (i.e., data-driven) clustering algorithms can be exploited to solve the graph learning problem.


A Machine Learning Framework for Solving High-Dimensional Mean Field Game and Mean Field Control Problems

arXiv.org Machine Learning

Mean field games (MFG) and mean field control (MFC) are critical classes of multi-agent models for efficient analysis of massive populations of interacting agents. Their areas of application span topics in economics, finance, game theory, industrial engineering, crowd motion, and more. In this paper, we provide a flexible machine learning framework for the numerical solution of potential MFG and MFC models. State-of-the-art numerical methods for solving such problems utilize spatial discretization that leads to a curse-of-dimensionality. We approximately solve high-dimensional problems by combining Lagrangian and Eulerian viewpoints and leveraging recent advances from machine learning. More precisely, we work with a Lagrangian formulation of the problem and enforce the underlying Hamilton-Jacobi-Bellman (HJB) equation that is derived from the Eulerian formulation. Finally, a tailored neural network parameterization of the MFG/MFC solution helps us avoid any spatial discretization. Our numerical results include the approximate solution of 100-dimensional instances of optimal transport and crowd motion problems on a standard work station. These results open the door to much-anticipated applications of MFG and MFC models that were beyond reach with existing numerical methods.


LTLf Synthesis with Fairness and Stability Assumptions

arXiv.org Artificial Intelligence

In synthesis, assumptions are constraints on the environment that rule out certain environment behaviors. A key observation here is that even if we consider systems with LTL f goals on finite traces, environment assumptions need to be expressed over infinite traces, since accomplishing the agent goals may require an unbounded number of environment action. To solve synthesis with respect to finite-trace LTL f goals under infinite-trace assumptions, we could reduce the problem to LTL synthesis. Unfortunately, while synthesis in LTL f and in LTLhave the same worst-case complexity (both 2EXPTIME-complete), the algorithms available for LTLsyn-thesis are much more difficult in practice than those for LTL f synthesis. In this work we show that in interesting cases we can avoid such a detour to LTLsynthesis and keep the simplicity of LTL f synthesis. Specifically, we develop a BDD-based fixpoint-based technique for handling basic forms of fairness and of stability assumptions. We show, empirically, that this technique performs much better than standard LTLsynthesis. Introduction In many situations we are interested in expressing properties over an unbounded but finite sequence of successive states. Linear-time Temporal Logic over finite traces ( LTL f) and its variants have been thoroughly investigated for doing so. There has been broad research for logical reasoning (De Gi-acomo and V ardi 2013; Li et al. 2019), synthesis (De Gi-acomo and V ardi 2015; Camacho et al. 2018), and planning (Camacho et al. 2017; De Giacomo and Rubin 2018). Recently synthesis under assumptions in LTL f has attracted specific interest (De Giacomo and Rubin 2018; Camacho, Bienvenu, and McIlraith 2018). First, planning for LTL f goals can be considered as a form of LTL f synthesis under assumptions, where the assumptions are the dynamics of the environment encoded in the planning domain (Green 1969; Camacho, Bienvenu, and McIlraith 2018; Aminof et al. 2018; Aminof et al. 2019). Synthesis under assumptions has been extensively studied in LTL, where environment assumptions are expressed as LTL formulas (Chatterjee and Henzinger 2007; Chatter-jee, Henzinger, and Jobstmann 2008; D'Ippolito et al. 2013; Bloem, Ehlers, and K onighofer 2015; Brenguier, Raskin, and Sankur 2017). In fact, LTL formulas can be used as assumptions as long as it is guaranteed that the environment is able to behave so as to keep the assumptions true, i.e., assumptions are environment realizable. Under these circumstances, it is possible to reduce synthesis for LTL goal ψ G under assumptions ψ A to standard synthesis for ψ A ψ G. Note that because of the guarantee of ψ A being environment realizable, no agent strategy can win ψ A ψ G by falsifying ψ A. See (Aminof et al. 2019) for a discussion.