Goto

Collaborating Authors

 Agents


Empirically Evaluating Creative Arc Negotiation for Improvisational Decision-making

arXiv.org Artificial Intelligence

Action selection from many options with few constraints is crucial for improvisation and co-creativity. Our previous work proposed creative arc negotiation to solve this problem, i.e., selecting actions to follow an author-defined `creative arc' or trajectory over estimates of novelty, unexpectedness, and quality for potential actions. The CARNIVAL agent architecture demonstrated this approach for playing the Props game from improv theatre in the Robot Improv Circus installation. This article evaluates the creative arc negotiation experience with CARNIVAL through two crowdsourced observer studies and one improviser laboratory study. The studies focus on subjects' ability to identify creative arcs in performance and their preference for creative arc negotiation compared to a random selection baseline. Our results show empirically that observers successfully identified creative arcs in performances. Both groups also preferred creative arc negotiation in agent creativity and logical coherence, while observers enjoyed it more too.


Unbiased Self-Play

arXiv.org Machine Learning

We present a general optimization framework for emergent belief-state representation without any supervision. We employed the common configuration of multiagent reinforcement learning and communication to improve exploration coverage over an environment by leveraging the knowledge of each agent. In this paper, we obtained that recurrent neural nets (RNNs) with shared weights are highly biased in partially observable environments because of their noncooperativity. To address this, we designated an unbiased version of self-play via mechanism design, also known as reverse game theory, to clarify unbiased knowledge at the Bayesian Nash equilibrium. The key idea is to add imaginary rewards using the peer prediction mechanism (Miller et al., 2005), i.e., a mechanism for mutually criticizing information in a decentralized environment. Numerical analyses, including StarCraft exploration tasks with up to 20 agents and off-the-shelf RNNs, demonstrate the state-of-the-art performance.


Auction-based and Distributed Optimization Approaches for Scheduling Observations in Satellite Constellations with Exclusive Orbit Portions

arXiv.org Artificial Intelligence

We investigate the use of multi-agent allocation techniques on problems related to Earth observation scenarios with multiple users and satellites. We focus on the problem of coordinating users having reserved exclusive orbit portions and one central planner having several requests that may use some intervals of these exclusives. We define this problem as Earth Observation Satellite Constellation Scheduling Problem (EOSCSP) and map it to a Mixed Integer Linear Program. As to solve EOSCSP, we propose market-based techniques and a distributed problem solving technique based on Distributed Constraint Optimization (DCOP), where agents cooperate to allocate requests without sharing their own schedules. These contributions are experimentally evaluated on randomly generated EOSCSP instances based on real large-scale or highly conflicting observation order books.


Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games

arXiv.org Artificial Intelligence

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper we introduce a framework, LMAC, based on meta-gradient descent that automates the discovery of the update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.


Emergent Communication of Generalizations

arXiv.org Artificial Intelligence

To build agents that can collaborate effectively with others, recent research has trained artificial agents to communicate with each other in Lewis-style referential games. However, this often leads to successful but uninterpretable communication. We argue that this is due to the game objective: communicating about a single object in a shared visual context is prone to overfitting and does not encourage language useful beyond concrete reference. In contrast, human language conveys a rich variety of abstract ideas. To promote such skills, we propose games that require communicating generalizations over sets of objects representing abstract visual concepts, optionally with separate contexts for each agent. We find that these games greatly improve systematicity and interpretability of the learned languages, according to several metrics in the literature. Finally, we propose a method for identifying logical operations embedded in the emergent languages by learning an approximate compositional reconstruction of the language.


MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

Population-based multi-agent reinforcement learning (PB-MARL) refers to the series of methods nested with reinforcement learning (RL) algorithms, which produces a self-generated sequence of tasks arising from the coupled population dynamics. By leveraging auto-curricula to induce a population of distinct emergent strategies, PB-MARL has achieved impressive success in tackling multi-agent tasks. Despite remarkable prior arts of distributed RL frameworks, PB-MARL poses new challenges for parallelizing the training frameworks due to the additional complexity of multiple nested workloads between sampling, training and evaluation involved with heterogeneous policy interactions. To solve these problems, we present MALib, a scalable and efficient computing framework for PB-MARL. Our framework is comprised of three key components: (1) a centralized task dispatching model, which supports the self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling, and meets the evaluation requirement of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms. Experiments on a series of complex tasks such as multi-agent Atari Games show that MALib achieves throughput higher than 40K FPS on a single machine with $32$ CPU cores; 5x speedup than RLlib and at least 3x speedup than OpenSpiel in multi-agent training tasks. MALib is publicly available at https://github.com/sjtu-marl/malib.


Data-driven discovery of interacting particle systems using Gaussian processes

arXiv.org Machine Learning

Interacting particle or agent systems that display a rich variety of collection motions are ubiquitous in science and engineering. A fundamental and challenging goal is to understand the link between individual interaction rules and collective behaviors. In this paper, we study the data-driven discovery of distance-based interaction laws in second-order interacting particle systems. We propose a learning approach that models the latent interaction kernel functions as Gaussian processes, which can simultaneously fulfill two inference goals: one is the nonparametric inference of interaction kernel function with the pointwise uncertainty quantification, and the other one is the inference of unknown parameters in the non-collective forces of the system. We formulate learning interaction kernel functions as a statistical inverse problem and provide a detailed analysis of recoverability conditions, establishing that a coercivity condition is sufficient for recoverability. We provide a finite-sample analysis, showing that our posterior mean estimator converges at an optimal rate equal to the one in the classical 1-dimensional Kernel Ridge regression. Numerical results on systems that exhibit different collective behaviors demonstrate efficient learning of our approach from scarce noisy trajectory data.


Deceptive Level Generation for Angry Birds

arXiv.org Artificial Intelligence

The Angry Birds AI competition has been held over many years to encourage the development of AI agents that can play Angry Birds game levels better than human players. Many different agents with various approaches have been employed over the competition's lifetime to solve this task. Even though the performance of these agents has increased significantly over the past few years, they still show major drawbacks in playing deceptive levels. This is because most of the current agents try to identify the best next shot rather than planning an effective sequence of shots. In order to encourage advancements in such agents, we present an automated methodology to generate deceptive game levels for Angry Birds. Even though there are many existing content generators for Angry Birds, they do not focus on generating deceptive levels. In this paper, we propose a procedure to generate deceptive levels for six deception categories that can fool the state-of-the-art Angry Birds playing AI agents. Our results show that generated deceptive levels exhibit similar characteristics of human-created deceptive levels. Additionally, we define metrics to measure the stability, solvability, and degree of deception of the generated levels.


Modeling Communication to Coordinate Perspectives in Cooperation

arXiv.org Artificial Intelligence

Communication is highly overloaded. Despite this, even young children are good at leveraging context to understand ambiguous signals. We propose a computational account of overloaded signaling from a shared agency perspective which we call the Imagined We for Communication. Under this framework, communication helps cooperators coordinate their perspectives, allowing them to act together to achieve shared goals. We assume agents are rational cooperators, which puts constraints on how signals can be sent and interpreted. We implement this model in a set of simulations demonstrating this model's success under increasing ambiguity as well as increasing layers of reasoning. Our model is capable of improving performance with deeper recursive reasoning; however, it outperforms comparison baselines at even the shallowest level, highlighting how shared knowledge and cooperative logic can do much of the heavy-lifting in language.


The Smoothed Satisfaction of Voting Axioms

arXiv.org Artificial Intelligence

We initiate the work towards a comprehensive picture of the smoothed satisfaction of voting axioms, to provide a finer and more realistic foundation for comparing voting rules. We adopt the smoothed social choice framework, where an adversary chooses arbitrarily correlated "ground truth" preferences for the agents, on top of which random noises are added. We focus on characterizing the smoothed satisfaction of two well-studied voting axioms: Condorcet criterion and participation. We prove that for any fixed number of alternatives, when the number of voters $n$ is sufficiently large, the smoothed satisfaction of the Condorcet criterion under a wide range of voting rules is $1$, $1-\exp(-\Theta(n))$, $\Theta(n^{-0.5})$, $ \exp(-\Theta(n))$, or being $\Theta(1)$ and $1-\Theta(1)$ at the same time; and the smoothed satisfaction of participation is $1-\Theta(n^{-0.5})$. Our results address open questions by Berg and Lepelley in 1994 for these rules, and also confirm the following high-level message: the Condorcet criterion is a bigger concern than participation under realistic models.