Goto

Collaborating Authors

 Agents


Learning Transferable Cooperative Behavior in Multi-Agent Teams

arXiv.org Machine Learning

While multi-agent interactions can be naturally modeled as a graph, the environment has traditionally been considered as a black box. We propose to create a shared agent-entity graph, where agents and environmental entities form vertices, and edges exist between the vertices which can communicate with each other. Agents learn to cooperate by exchanging messages along the edges of this graph. Our proposed multi-agent reinforcement learning framework is invariant to the number of agents or entities present in the system as well as permutation invariance, both of which are desirable properties for any multi-agent system representation. We present state-of-the-art results on coverage, formation and line control tasks for multi-agent teams in a fully decentralized framework and further show that the learned policies quickly transfer to scenarios with different team sizes along with strong zero-shot generalization performance. This is an important step towards developing multi-agent teams which can be realistically deployed in the real world without assuming complete prior knowledge or instantaneous communication at unbounded distances.


Extra-gradient with player sampling for provable fast convergence in n-player games

arXiv.org Machine Learning

Data-driven model training is increasingly relying on finding Nash equilibria with provable techniques, e.g., for GANs and multi-agent RL. In this paper, we analyse a new extra-gradient method, that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits the same rate of convergence as full extra-gradient in non-smooth convex games. We propose an additional variance reduction mechanism for this to hold for smooth convex games. Our approach makes extrapolation amenable to massive multiplayer settings, and brings empirical speed-ups, in particular when using cyclic sampling schemes. We demonstrate the efficiency of player sampling on large-scale non-smooth and non-strictly convex games. We show that the joint use of extrapolation and player sampling allows to train better GANs on CIFAR10.


Unsupervised Emergence of Egocentric Spatial Structure from Sensorimotor Prediction

arXiv.org Artificial Intelligence

Despite its omnipresence in robotics application, the nature of spatial knowledge and the mechanisms that underlie its emergence in autonomous agents are still poorly understood. Recent theoretical works suggest that the Euclidean structure of space induces invariants in an agent's raw sensorimotor experience. We hypothesize that capturing these invariants is beneficial for sensorimotor prediction and that, under certain exploratory conditions, a motor representation capturing the structure of the external space should emerge as a byproduct of learning to predict future sensory experiences. We propose a simple sensorimotor predictive scheme, apply it to different agents and types of exploration, and evaluate the pertinence of these hypotheses. We show that a naive agent can capture the topology and metric regularity of its sensor's position in an egocentric spatial frame without any a priori knowledge, nor extraneous supervision.


Massive Styles Transfer with Limited Labeled Data

arXiv.org Artificial Intelligence

Language style transfer has attracted more and more attention in the past few years. Recent researches focus on improving neural models targeting at transferring from one style to the other with labeled data. However, transferring across multiple styles is often very useful in real-life applications. Previous researches of language style transfer have two main deficiencies: dependency on massive labeled data and neglect of mutual influence among different style transfer tasks. In this paper, we propose a multi-agent style transfer system (MAST) for addressing multiple style transfer tasks with limited labeled data, by leveraging abundant unlabeled data and the mutual benefit among the multiple styles. A style transfer agent in our system not only learns from unlabeled data by using techniques like denoising auto-encoder and back-translation, but also learns to cooperate with other style transfer agents in a self-organization manner. We conduct our experiments by simulating a set of real-world style transfer tasks with multiple versions of the Bible. Our model significantly outperforms the other competitive methods. Extensive results and analysis further verify the efficacy of our proposed system.


An Extensive Review of Computational Dance Automation Techniques and Applications

arXiv.org Artificial Intelligence

Dance is an art and when technology meets this kind of art, it's a novel attempt in itself. Several researchers have attempted to automate several aspects of dance, right from dance notation to choreography. Furthermore, we have encountered several applications of dance automation like e-learning, heritage preservation, etc. Despite several attempts by researchers for more than two decades in various styles of dance all round the world, we found a review paper that portrays the research status in this area dating to 1990 \cite{politis1990computers}. Hence, we decide to come up with a comprehensive review article that showcases several aspects of dance automation. This paper is an attempt to review research work reported in the literature, categorize and group all research work completed so far in the field of automating dance. We have explicitly identified six major categories corresponding to the use of computers in dance automation namely dance representation, dance capturing, dance semantics, dance generation, dance processing approaches and applications of dance automation systems. We classified several research papers under these categories according to their research approach and functionality. With the help of proposed categories and subcategories one can easily determine the state of research and the new avenues left for exploration in the field of dance automation.


A Case for Backward Compatibility for Human-AI Teams

arXiv.org Machine Learning

AI systems are being deployed to support human decision making in high-stakes domains. In many cases, the human and AI form a team, in which the human makes decisions after reviewing the AI's inferences. A successful partnership requires that the human develops insights into the performance of the AI system, including its failures. We study the influence of updates to an AI system in this setting. While updates can increase the AI's predictive performance, they may also lead to changes that are at odds with the user's prior experiences and confidence in the AI's inferences, hurting therefore the overall team performance. We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams. Empirical results on three high-stakes domains show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff, enabling more compatible yet accurate updates.


The Computational Structure of Unintentional Meaning

arXiv.org Artificial Intelligence

Speech-acts can have literal meaning as well as pragmatic meaning, but these both involve consequences typically intended by a speaker. Speech-acts can also have unintentional meaning, in which what is conveyed goes above and beyond what was intended. Here, we present a Bayesian analysis of how, to a listener, the meaning of an utterance can significantly differ from a speaker's intended meaning. Our model emphasizes how comprehending the intentional and unintentional meaning of speech-acts requires listeners to engage in sophisticated model-based perspective-taking and reasoning about the history of the state of the world, each other's actions, and each other's observations. To test our model, we have human participants make judgments about vignettes where speakers make utterances that could be interpreted as intentional insults or unintentional faux pas. In elucidating the mechanics of speech-acts with unintentional meanings, our account provides insight into how communication both functions and malfunctions.


Budgeted Policy Learning for Task-Oriented Dialogue Systems

arXiv.org Artificial Intelligence

This paper presents a new approach that extends Deep Dyna-Q (DDQ) by incorporating a Budget-Conscious Scheduling (BCS) to best utilize a fixed, small amount of user interactions (budget) for learning task-oriented dialogue agents. BCS consists of (1) a Poisson-based global scheduler to allocate budget over different stages of training; (2) a controller to decide at each training step whether the agent is trained using real or simulated experiences; (3) a user goal sampling module to generate the experiences that are most effective for policy learning. Experiments on a movie-ticket booking task with simulated and real users show that our approach leads to significant improvements in success rate over the state-of-the-art baselines given the fixed budget.


Automated Video Game Testing Using Synthetic and Human-Like Agents

arXiv.org Artificial Intelligence

In this paper, we present a new methodology that employs tester agents to automate video game testing. We introduce two types of agents -synthetic and human-like- and two distinct approaches to create them. Our agents are derived from Reinforcement Learning (RL) and Monte Carlo Tree Search (MCTS) agents, but focus on finding defects. The synthetic agent uses test goals generated from game scenarios, and these goals are further modified to examine the effects of unintended game transitions. The human-like agent uses test goals extracted by our proposed multiple greedy-policy inverse reinforcement learning (MGP-IRL) algorithm from tester trajectories. MGPIRL captures multiple policies executed by human testers. These testers' aims are finding defects while interacting with the game to break it, which is considerably different from game playing. We present interaction states to model such interactions. We use our agents to produce test sequences, run the game with these sequences, and check the game for each run with an automated test oracle. We analyze the proposed method in two parts: we compare the success of human-like and synthetic agents in bug finding, and we evaluate the similarity between humanlike agents and human testers. We collected 427 trajectories from human testers using the General Video Game Artificial Intelligence (GVG-AI) framework and created three games with 12 levels that contain 45 bugs. Our experiments reveal that human-like and synthetic agents compete with human testers' bug finding performances. Moreover, we show that MGP-IRL increases the human-likeness of agents while improving the bug finding performance.


Average-case Analysis of the Assignment Problem with Independent Preferences

arXiv.org Artificial Intelligence

The fundamental assignment problem is in search of welfare maximization mechanisms to allocate items to agents when the private preferences over indivisible items are provided by self-interested agents. The mainstream mechanism \textit{Random Priority} is asymptotically the best mechanism for this purpose, when comparing its welfare to the optimal social welfare using the canonical \textit{worst-case approximation ratio}. Despite its popularity, the efficiency loss indicated by the worst-case ratio does not have a constant bound. Recently, [Deng, Gao, Zhang 2017] show that when the agents' preferences are drawn from a uniform distribution, its \textit{average-case approximation ratio} is upper bounded by 3.718. They left it as an open question of whether a constant ratio holds for general scenarios. In this paper, we offer an affirmative answer to this question by showing that the ratio is bounded by $1/\mu$ when the preference values are independent and identically distributed random variables, where $\mu$ is the expectation of the value distribution. This upper bound also improves the upper bound of 3.718 in [Deng, Gao, Zhang 2017] for the Uniform distribution. Moreover, under mild conditions, the ratio has a \textit{constant} bound for any independent random values. En route to these results, we develop powerful tools to show the insights that in most instances the efficiency loss is small.