Agents
Generally capable agents emerge from open-ended play
In recent years, artificial intelligence agents have succeeded in a range of complex game environments. For instance, AlphaZero beat world-champion programs in chess, shogi, and Go after starting out knowing no more than the basic rules of each game. But AlphaZero still trained separately on each game -- unable to simply learn another game or task without repeating the reinforcement learning (RL) process from scratch. The same is true for other successes of RL, such as Atari, Capture the Flag, StarCraft II, Dota 2, and Hide-and-Seek. DeepMind's mission of solving intelligence to advance science and humanity led us to explore how we could overcome this limitation to create AI agents with more general and adaptive behaviour.
Core Challenges in Embodied Vision-Language Planning
Francis, Jonathan, Kitamura, Nariaki, Labelle, Felix, Lu, Xiaopeng, Navarro, Ingrid, Oh, Jean
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous survey pursuits have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment.
Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games
Hambly, Ben, Xu, Renyuan, Yang, Huining
Policy optimization algorithms have achieved substantial empirical successes in addressing a variety of non-cooperative multi-agent problems, including self-driving vehicles [17], real-time bidding games [8], and optimal execution in financial markets [6]. However, there have been few results from a theoretical perspective showing why such a class of reinforcement learning algorithms performs well in the presence of competition among agents. As a starting point to tackle this challenging problem, we investigate linear-quadratic games (LQGs), which can be seen as a generalization of the linear-quadratic regulator (LQR) from a single agent to multiple agents. In an LQG, all agents jointly control a linear state process, which may be high-dimensional, where the control (or action) from each individual agent has a linear impact on the state process. Each agent optimizes a quadratic cost function which depends on the state process, the control from this agent and/or the controls from the opponents.
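The LQG structure described above can be made concrete with a small sketch. This is a toy illustration only, not the paper's algorithm or setting: it assumes a scalar, noise-free state, two players, stationary linear feedback policies u_i = -k_i * x, and finite-difference gradients. All names and numeric values (cost, own_gradient, simultaneous_policy_gradient, a, b, q, r, the horizon and the learning rate) are hypothetical choices made for the example.

```python
def cost(player, gains, a=0.9, b=(0.5, 0.5), q=(1.0, 1.0), r=(1.0, 1.0),
         x0=1.0, horizon=20):
    """Quadratic cost of one player when both use linear feedback u_i = -k_i * x.

    Assumed toy dynamics: x_{t+1} = a * x_t + b[0] * u_0 + b[1] * u_1 (scalar, no noise).
    """
    x = x0
    total = 0.0
    for _ in range(horizon):
        controls = [-k * x for k in gains]
        # Each player pays for the shared state and for its own control effort.
        total += q[player] * x * x + r[player] * controls[player] ** 2
        # Both controls enter the shared state linearly.
        x = a * x + b[0] * controls[0] + b[1] * controls[1]
    return total


def own_gradient(player, gains, eps=1e-6):
    """Central finite-difference gradient of a player's cost in its own gain."""
    up, down = list(gains), list(gains)
    up[player] += eps
    down[player] -= eps
    return (cost(player, up) - cost(player, down)) / (2 * eps)


def simultaneous_policy_gradient(steps=2000, lr=0.01):
    """Both players descend their own cost at the same time."""
    gains = [0.0, 0.0]
    for _ in range(steps):
        grads = [own_gradient(i, gains) for i in range(2)]
        gains = [k - lr * g for k, g in zip(gains, grads)]
    return gains


gains = simultaneous_policy_gradient()
print("feedback gains:", gains)
print("own-cost gradients:", [own_gradient(i, gains) for i in range(2)])
```

In this toy setup, both own-cost gradients end up close to zero, so neither player can lower its cost through a small unilateral change of its gain. That is only a local equilibrium check within the assumed class of stationary linear policies, and it says nothing about the general conditions under which such methods converge.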
A Storytelling Robot managing Persuasive and Ethical Stances via ACT-R: an Exploratory Study
Augello, Agnese, Città, Giuseppe, Gentile, Manuel, Lieto, Antonio
In the last decade, the field of Human-Computer Interaction (HCI) has started to focus its attention on the design and implementation of artificial systems "orienting" attitudes and/or behaviours of a user according to a predefined direction. This growing sub-field, studying the so-called Persuasive Technologies, concerns a variety of system typologies that can adopt different strategies to pursue their goals. Building persuasive robots able to interact with human beings on a specific topic (or in a multi-domain setting) in a realistic and persuasive way represents an open problem and research challenge in Social Robotics. To this aim, a strategy often used in human-human communication to make people reconsider their behaviour and beliefs, and similarly proposed in human-robot interaction, is to exploit storytelling to let people identify themselves with the characters or roles in a story in order to understand different perspectives and needs. In the design of a persuasive system, it is also important not to ignore the ethical dimension: i.e., an intelligent artificial system should be able to make decisions and act in an ethical way, taking into account norms of social practices and needs of other individuals.
DYPLODOC: Dynamic Plots for Document Classification
Malysheva, Anastasia, Tikhonov, Alexey, Yamshchikov, Ivan P.
Narrative generation and analysis are still on the fringe of modern natural language processing yet are crucial in a variety of applications. This paper proposes a feature extraction method for plot dynamics. We present a dataset that consists of the plot descriptions for thirteen thousand TV shows alongside meta-information on their genres and dynamic plots extracted from them. We validate the proposed tool for plot dynamics extraction and discuss possible applications of this method to the tasks of narrative analysis and generation.
Rational Verification for Probabilistic Systems
Gutierrez, Julian, Hammond, Lewis, Lin, Anthony W., Najib, Muhammad, Wooldridge, Michael
Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally, by choosing strategies that collectively form a game-theoretic equilibrium. Previous work in this area has largely focussed on deterministic systems. In this paper, we develop the theory and algorithms for rational verification in probabilistic systems. We focus on concurrent stochastic games (CSGs), which can be used to model uncertainty and randomness in complex multi-agent environments. We study the rational verification problem for both non-cooperative games and cooperative games in the qualitative probabilistic setting. In the former case, we consider LTL properties satisfied by the Nash equilibria of the game and in the latter case LTL properties satisfied by the core. In both cases, we show that the problem is 2EXPTIME-complete, thus not harder than the much simpler verification problem of model checking LTL properties of systems modelled as Markov decision processes (MDPs).
Architecture of Automated Crypto-Finance Agent
Raheman, Ali, Kolonin, Anton, Goertzel, Ben, Hegykozi, Gergely, Ansari, Ikram
The subject of decentralized finance is attracting the attention of investors as well as developers and scientists due to high potential financial returns and high demand for the implementation of automated business applications for investments, liquidity provision, and trading using crypto-currencies. Cryptofinancial markets have a few unique properties: enormous volatility and the presence of "on-chain" data, such as transaction logs, that may be used as an extra source of data for applications based on artificial intelligence and machine learning. The key possibility associated with decentralized finance is automated liquidity provision, also called market making, which can be performed on either centralized exchanges (CEX), such as Binance, or decentralized ones (DEX), such as smart contracts like Uniswap or Balancer on the Ethereum blockchain. How machine learning and artificial intelligence can be applied to it is a matter of active study, such as attempts to learn efficient market making strategies [1,2,3,4]. Unfortunately, the results so far are not that exciting: they demonstrate an ability to learn some basic principles of trading using limit book orders, and an ability to outperform the "hodling" strategy (buy and hold on a rising market) in very specific conditions.
Using Microsoft Teams and ServiceNow to enhance end-user support
Microsoft Digital, the organization that is powering, protecting, and transforming Microsoft, is improving the support experience by partnering with ServiceNow to incorporate modern support-agent functionality into the Microsoft Digital environment by using ServiceNow Virtual Agent and Microsoft Teams. As a result, the support team and the employees they assist have a more complete tool set, a simpler view into the support environment, and a more streamlined method for executing tasks and solving issues quickly. Microsoft Digital runs the systems that support more than 135,000 employees. Our Global Helpdesk supplies support to these employees throughout more than 120 countries and regions worldwide. Global Helpdesk receives approximately 3,000 requests for support every day, and the ability to efficiently assess what help our users need and how we can provide that help is critical to the effectiveness of Global Helpdesk and our Employee Experience organization at Microsoft.
On Blame Attribution for Accountable Multi-Agent Sequential Decision Making
Triantafyllou, Stelios, Singla, Adish, Radanovic, Goran
Blame attribution is one of the key aspects of accountable decision making, as it provides means to quantify the responsibility of an agent for a decision making outcome. In this paper, we study blame attribution in the context of cooperative multi-agent sequential decision making. As a particular setting of interest, we focus on cooperative decision making formalized by Multi-Agent Markov Decision Processes (MMDP), and we analyze different blame attribution methods derived from or inspired by existing concepts in cooperative game theory. We formalize desirable properties of blame attribution in the setting of interest, and we analyze the relationship between these properties and the studied blame attribution methods. Interestingly, we show that some of the well-known blame attribution methods, such as Shapley value, are not performance-incentivizing, while others, such as Banzhaf index, may over-blame agents. To mitigate these value misalignment and fairness issues, we introduce a novel blame attribution method, unique in the set of properties it satisfies, which trades off explanatory power (by under-blaming agents) for the aforementioned properties. We further show how to account for uncertainty about agents' decision making policies, and we experimentally: a) validate the qualitative properties of the studied blame attribution methods, and b) analyze their robustness to uncertainty.
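The two game-theoretic concepts named above can be computed directly for a small example. The sketch below uses the standard textbook definitions of the Shapley value and the (non-normalized) Banzhaf index on a hypothetical three-agent characteristic function; it is not the MMDP formulation or the novel method from the paper. The function toy_inefficiency and the agent names are assumptions made purely for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def coalitions(players):
    """Yield every subset of `players` as a frozenset."""
    for size in range(len(players) + 1):
        for subset in combinations(players, size):
            yield frozenset(subset)


def shapley(players, v):
    """Shapley value: marginal contributions weighted by coalition size."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for s in coalitions(others):
            weight = Fraction(factorial(len(s)) * factorial(n - len(s) - 1),
                              factorial(n))
            total += weight * (v(s | {i}) - v(s))
        result[i] = total
    return result


def banzhaf(players, v):
    """Non-normalized Banzhaf index: plain average of marginal contributions."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = sum((v(s | {i}) - v(s) for s in coalitions(others)), Fraction(0))
        result[i] = total / 2 ** (n - 1)
    return result


def toy_inefficiency(coalition):
    """Hypothetical game: one unit of inefficiency once two or more agents are involved."""
    return 1 if len(coalition) >= 2 else 0


AGENTS = ["a", "b", "c"]
shapley_blame = shapley(AGENTS, toy_inefficiency)
banzhaf_blame = banzhaf(AGENTS, toy_inefficiency)
print("Shapley:", shapley_blame, "total:", sum(shapley_blame.values()))
print("Banzhaf:", banzhaf_blame, "total:", sum(banzhaf_blame.values()))
```

In this toy game each agent receives a Shapley value of 1/3, which sums to the value of the full coalition, while each Banzhaf value is 1/2, for a total of 3/2. The example only shows that the two concepts can assign different totals; the formal properties studied in the paper are defined in its own setting.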
Transferable Dialogue Systems and User Simulators
Tseng, Bo-Hsiang, Dai, Yinpei, Kreyssig, Florian, Byrne, Bill
One of the difficulties in training dialogue systems is the lack of training data. We explore the possibility of creating dialogue data through the interaction between a dialogue system and a user simulator. Our goal is to develop a modelling framework that can incorporate new dialogue scenarios through self-play between the two agents. In this framework, we first pre-train the two agents on a collection of source domain dialogues, which equips the agents to converse with each other via natural language. With further fine-tuning on a small amount of target domain data, the agents continue to interact with the aim of improving their behaviors using reinforcement learning with structured reward functions. In experiments on the MultiWOZ dataset, two practical transfer learning problems are investigated: 1) domain adaptation and 2) single-to-multiple domain transfer. We demonstrate that the proposed framework is highly effective in bootstrapping the performance of the two agents in transfer learning. We also show that our method leads to improvements in dialogue system performance on complete datasets.