 Agents


Learning to Speak and Act in a Fantasy Text Adventure Game

arXiv.org Artificial Intelligence

We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, and the objects (and their affordances) and characters (and their previous actions) present within it allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relates to agents that can talk and act successfully.


Three-Way Decisions-Based Conflict Analysis Models

arXiv.org Artificial Intelligence

Three-way decision theory, which trisects the universe with fewer risks or costs, is considered a powerful mathematical tool for handling uncertainty in incomplete and imprecise information tables, and provides an effective tool for conflict analysis decision making in real-time situations. In this paper, we propose the concepts of the agreement, disagreement and neutral subsets of a strategy with two evaluation functions, which establish the three-way decisions-based conflict analysis models (TWDCAMs) for trisecting the universe of agents, and employ a pair of two-way decisions models to interpret the mechanism of the three-way decision rules for an agent. Subsequently, we develop the concepts of the agreement, disagreement and neutral strategies of an agent group with two evaluation functions, which build the TWDCAMs for trisecting the universe of issues, and take a couple of two-way decisions models to explain the mechanism of the three-way decision rules for an issue. Finally, we reconstruct Fan, Qi and Wei's conflict analysis models (FQWCAMs) and Sun, Ma and Zhao's conflict analysis models (SMZCAMs) with two evaluation functions, and interpret FQWCAMs and SMZCAMs with a pair of two-way decisions models, which illustrates that FQWCAMs and SMZCAMs are special cases of TWDCAMs.
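
The trisection of agents described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's definitions: the attitude encoding (+1 support, 0 neutral, -1 oppose), the two evaluation functions `support_degree` and `opposition_degree`, and the thresholds `alpha` and `beta` are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set, Tuple

# attitudes[agent][issue] is +1 (support), 0 (neutral) or -1 (oppose).
# This encoding is an assumption made for the sketch.
Attitudes = Dict[str, Dict[str, int]]


def support_degree(ratings: Dict[str, int], strategy: List[str]) -> float:
    """Hypothetical evaluation function 1: share of strategy issues supported."""
    return sum(1 for issue in strategy if ratings[issue] == 1) / len(strategy)


def opposition_degree(ratings: Dict[str, int], strategy: List[str]) -> float:
    """Hypothetical evaluation function 2: share of strategy issues opposed."""
    return sum(1 for issue in strategy if ratings[issue] == -1) / len(strategy)


def trisect_agents(
    attitudes: Attitudes, strategy: List[str], alpha: float, beta: float
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split agents into agreement, disagreement and neutral subsets.

    alpha and beta are assumed thresholds on the two evaluation functions.
    Every agent lands in exactly one of the three subsets.
    """
    agree: Set[str] = set()
    disagree: Set[str] = set()
    neutral: Set[str] = set()
    for agent, ratings in attitudes.items():
        s = support_degree(ratings, strategy)
        o = opposition_degree(ratings, strategy)
        if s >= alpha and o < beta:
            agree.add(agent)
        elif o >= beta and s < alpha:
            disagree.add(agent)
        else:
            neutral.add(agent)
    return agree, disagree, neutral


example = {
    "a1": {"i1": 1, "i2": 1, "i3": -1},
    "a2": {"i1": -1, "i2": -1, "i3": 0},
    "a3": {"i1": 1, "i2": -1, "i3": 0},
}
agree, disagree, neutral = trisect_agents(example, ["i1", "i2"], alpha=0.6, beta=0.6)
```

Trisecting the universe of issues for a fixed agent group would follow the same pattern with the roles of agents and issues swapped.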


Neural MMO - A Massively Multiagent Game Environment

#artificialintelligence

Our platform supports a large, variable number of agents within a persistent and open-ended task. The inclusion of many agents and species leads to better exploration, divergent niche formation, and greater overall competence. In recent years, multiagent settings have become an effective platform for deep reinforcement learning research. Despite this progress, there are still two main challenges for multiagent reinforcement learning. We need to create open-ended tasks with a high complexity ceiling: current environments are either complex but too narrow or open-ended but too simple.


Researchers Are Training AI to Survive In This MMO

#artificialintelligence

Neural MMO is a new massively multiplayer online game, but humans aren't invited--only artificial intelligence can play. In the game, AI agents spawn into an open world and need to gather resources like food and water to survive. Along the way, they'll encounter rival agents which they can avoid or fight for dominance. It's a harsh world, designed by its creators to prompt the AI agents to develop strategies that satisfy a task that is both open-ended and highly complex: survival over a lifetime. OpenAI researchers Joseph Suarez, Yilun Du, Phillip Isola, and Igor Mordatch designed Neural MMO and released its code via GitHub on Monday.


Concurrent Meta Reinforcement Learning

arXiv.org Artificial Intelligence

State-of-the-art meta reinforcement learning algorithms typically assume the setting of a single agent interacting with its environment in a sequential manner. A negative side-effect of this sequential execution paradigm is that, as the environment becomes more challenging and thus requires more interaction episodes for the meta-learner, the agent must reason over longer and longer time-scales. To combat the difficulty of long time-scale credit assignment, we propose an alternative parallel framework, which we name "Concurrent Meta-Reinforcement Learning" (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one. In this multi-agent setting, a set of parallel agents is executed in the same environment and each of these "rollout" agents is given the means to communicate with the others. The goal of the communication is to coordinate, in a collaborative manner, the most efficient exploration of the shared task the agents are currently assigned. This coordination therefore represents the meta-learning aspect of the framework, as each agent can be assigned or assign itself a particular section of the current task's state space. This framework is in contrast to standard RL methods that assume that each parallel rollout occurs independently, which can potentially waste computation if many of the rollouts end up sampling the same part of the state space. Furthermore, the parallel setting enables us to define several reward sharing functions and auxiliary losses that are non-trivial to apply in the sequential setting. We demonstrate the effectiveness of our proposed CMRL at improving over sequential methods in a variety of challenging tasks.


AI Generality and Spearman’s Law of Diminishing Returns

Journal of Artificial Intelligence Research

Many areas of AI today use benchmarks and competitions with larger and wider sets of tasks. This is intended to deter AI systems (and research effort) from specialising in a single task, and to encourage them to be prepared to solve previously unseen tasks. It is unclear, however, whether the methods with the best performance are actually those that are most general and, in perspective, whether the trend moves towards more general AI systems. This question has a striking similarity with the analysis of the so-called positive manifold and general factors in the area of human intelligence. In this paper, we first show how the existence of a manifold (positive average pairwise task correlation) can also be analysed in AI, and how this relates to the notion of agent generality, from the individual and the populational points of view. From the populational perspective, we analyse the following question: is this manifold correlation higher for the most or for the least able group of agents? We contrast this analysis with one of the most controversial issues in human intelligence research, the so-called Spearman's Law of Diminishing Returns (SLODR), which basically states that the relevance of a general factor diminishes for the most able human groups. We perform two empirical studies on these issues in AI. We analyse the results of the 2015 general video game AI (GVGAI) competition, with games as tasks and "controllers" as agents, and the results of a synthetic setting, with modified elementary cellular automata (ECA) rules as tasks and simple interactive programs as agents. In both cases, we see that SLODR does not appear. The data, and the use of just two scenarios, do not clearly support the reverse either, a Universal Law of Augmenting Returns (ULOAR), but call for more experiments on this question.
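
The manifold statistic mentioned above (average pairwise task correlation) and the populational comparison between less and more able agents can be sketched as follows. This is a minimal illustration, not the paper's procedure: the score matrix layout, the use of Pearson correlation, the median split by mean score, and the function names are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt
from typing import List, Tuple

Scores = List[List[float]]  # scores[agent][task]; layout is an assumption


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def manifold(scores: Scores) -> float:
    """Average correlation over all pairs of tasks, computed across agents."""
    n_tasks = len(scores[0])
    columns = [[row[t] for row in scores] for t in range(n_tasks)]
    pairs = list(combinations(range(n_tasks), 2))
    return sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)


def split_by_ability(scores: Scores) -> Tuple[Scores, Scores]:
    """Split agents into a less able and a more able half by mean score."""
    ranked = sorted(scores, key=lambda row: sum(row) / len(row))
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


# Toy data: four agents, three tasks, every task ranks the agents identically.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]]
low, high = split_by_ability(toy)
overall = manifold(toy)
```

Under these assumptions, a SLODR-like pattern would show up as `manifold(high)` being lower than `manifold(low)`, and the reverse pattern as it being higher.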


Distributed Online Convex Optimization with Time-Varying Coupled Inequality Constraints

arXiv.org Machine Learning

This paper considers distributed online optimization with time-varying coupled inequality constraints. The global objective function is composed of local convex cost and regularization functions and the coupled constraint function is the sum of local convex constraint functions. A distributed online primal-dual dynamic mirror descent algorithm is proposed to solve this problem, where the local cost, regularization, and constraint functions are held privately and revealed only after each time slot. We first derive regret and cumulative constraint violation bounds for the algorithm and show how they depend on the stepsize sequences, the accumulated dynamic variation of the comparator sequence, the number of agents, and the network connectivity. As a result, under some natural decreasing stepsize sequences, we prove that the algorithm achieves sublinear dynamic regret and cumulative constraint violation if the accumulated dynamic variation of the optimal sequence also grows sublinearly. We also prove that the algorithm achieves sublinear static regret and cumulative constraint violation under mild conditions. In addition, smaller bounds on the static regret are achieved when the objective functions are strongly convex. Finally, numerical simulations are provided to illustrate the effectiveness of the theoretical results.
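
For orientation, the quantities named above can be written out in a common formalization. The notation is an assumption for illustration: the symbols $f_{i,t}$, $r_{i,t}$, $g_{i,t}$, $x_t$, $y_t$ and the exact form of the violation measure are not taken from the paper, whose definitions may differ (for example, by averaging over agents).

```latex
% Assumed notation: n agents, horizon T, decision x_t = (x_{1,t}, ..., x_{n,t}).
\begin{align*}
  f_t(x) &= \sum_{i=1}^{n} \bigl( f_{i,t}(x_i) + r_{i,t}(x_i) \bigr),
  &
  g_t(x) &= \sum_{i=1}^{n} g_{i,t}(x_i) \le 0,
  \\
  \operatorname{Reg}(T) &= \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(y_t),
  &
  \operatorname{Vio}(T) &= \Bigl\| \Bigl[ \sum_{t=1}^{T} g_t(x_t) \Bigr]_{+} \Bigr\| .
\end{align*}
% Dynamic regret: y_t is a per-round minimizer of f_t subject to g_t(y_t) <= 0.
% Static regret:  y_t = y for all t, a single fixed feasible comparator.
% "Sublinear" means Reg(T) = o(T) and Vio(T) = o(T) as T grows.
```

Here $[\cdot]_{+}$ denotes the componentwise positive part, so the violation measure counts only the constraint components that are exceeded in aggregate.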


Can Sophisticated Dispatching Strategy Acquired by Reinforcement Learning? - A Case Study in Dynamic Courier Dispatching System

arXiv.org Artificial Intelligence

In this paper, we study a courier dispatching problem (CDP) arising from an online pickup-service platform of Alibaba. The CDP aims to assign a set of couriers to serve pickup requests with stochastic spatial and temporal arrival rates among urban regions. The objective is to maximize the revenue of served requests given a limited number of couriers over a period of time. Many online algorithms from the existing literature, such as dynamic matching and vehicle routing strategies, could be applied to tackle this problem. However, these methods rely on appropriately predefined optimization objectives at each decision point, which are hard to specify in dynamic situations. This paper formulates the CDP as a Markov decision process (MDP) and proposes a data-driven approach to derive the optimal dispatching rule-set under different scenarios. Our method stacks multi-layer images of the spatial-and-temporal map and applies multi-agent reinforcement learning (MARL) techniques to evolve dispatching models. This method solves the learning inefficiency caused by traditional centralized MDP modeling. Through comprehensive experiments on both artificial and real-world datasets, we show: 1) by utilizing historical data and considering long-term revenue gains, MARL achieves better performance than myopic online algorithms; 2) MARL is able to construct the mapping from complex scenarios to sophisticated decisions such as the dispatching rule; 3) MARL scales to large-scale real-world scenarios.


OpenAI launches Neural MMO, a massive reinforcement learning simulator

#artificialintelligence

Artificial intelligence that's beastly at World of Warcraft might not lie too far into the distant future, if OpenAI has its way. The San Francisco research nonprofit today released Neural MMO, a "massively multiagent" virtual training ground that plops agents in the middle of an RPG-like world -- one complete with a resource collection mechanic and player versus player combat. "The game genre of Massively Multiplayer Online Games (MMOs) simulates a large ecosystem of a variable number of players competing in persistent and extensive environments," OpenAI wrote in a blog post. "The inclusion of many agents and species leads to better exploration, divergent niche formation, and greater overall competence." AI agents spawn randomly in Neural MMO environments, which contain automatically generated tile maps of a prespecified size. Some tiles are traversable, like "forest" (which bears food) and "grass," while others aren't (such as water and stone).


A Grounded Interaction Protocol for Explainable Artificial Intelligence

arXiv.org Artificial Intelligence

Explainable Artificial Intelligence (XAI) systems need to include an explanation model to communicate the internal decisions, behaviours and actions to the interacting humans. Successful explanation involves both cognitive and social processes. In this paper we focus on the challenge of meaningful interaction between an explainer and an explainee and investigate the structural aspects of an interactive explanation to propose an interaction protocol. We follow a bottom-up approach to derive the model by analysing transcripts of different explanation dialogue types with 398 explanation dialogues. We use grounded theory to code and identify key components of an explanation dialogue. We formalize the model using the agent dialogue framework (ADF) as a new dialogue type and then evaluate it in a human-agent interaction study with 101 dialogues from 14 participants. Our results show that the proposed model can closely follow the explanation dialogues of human-agent conversations.