Agents
Just Ask: An Interactive Learning Framework for Vision and Language Navigation
Chi, Ta-Chung, Eric, Mihail, Kim, Seokhwan, Shen, Minmin, Hakkani-tur, Dilek
In the vision and language navigation task, the agent may encounter ambiguous situations that are hard to interpret by just relying on visual information and natural language instructions. We propose an interactive learning framework to endow the agent with the ability to ask for users' help in such situations. As part of this framework, we investigate multiple learning approaches for the agent with different levels of complexity. The simplest model-confusion-based method lets the agent ask questions based on its confusion, relying on the predefined confidence threshold of a next action prediction model. To build on this confusion-based method, the agent is expected to demonstrate more sophisticated reasoning such that it discovers the timing and locations to interact with a human. We achieve this goal using reinforcement learning (RL) with a proposed reward shaping term, which enables the agent to ask questions only when necessary. The success rate can be boosted by at least 15% with only one question asked on average during the navigation. Furthermore, we show that the RL agent is capable of adjusting dynamically to noisy human responses. Finally, we design a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further utilizing its interaction history with a human. We demonstrate the proposed strategy is substantially more realistic and data-efficient compared to previously proposed pre-exploration techniques.
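The confusion-based method described above can be illustrated with a minimal sketch. It assumes the next-action model outputs one logit per candidate action and that a human answer is available as an action index; the function names (`softmax`, `choose_action`), the `oracle_action` parameter, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(action_logits, oracle_action=None, threshold=0.5):
    """Pick the next action, asking for help when the model is unsure.

    Returns (action_index, asked). The agent counts as "confused" when
    its most likely action has probability below `threshold` (an assumed
    value). `oracle_action` stands in for the human's answer.
    """
    probs = softmax(action_logits)
    confidence = max(probs)
    if confidence < threshold and oracle_action is not None:
        # Confused: defer to the human-provided action.
        return oracle_action, True
    # Confident (or nobody to ask): follow the model's own prediction.
    return probs.index(confidence), False


if __name__ == "__main__":
    print(choose_action([2.0, 0.1, 0.1]))                   # confident case
    print(choose_action([0.1, 0.1, 0.1], oracle_action=2))  # confused case
```

A fixed threshold like this asks whenever confidence drops, regardless of whether the question is worth asking, which is the limitation the abstract's RL approach with reward shaping is meant to address.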
Artificial Intelligence Discovers Tool Use in Hide-and-Seek Games
Programmers at OpenAI, an artificial intelligence research company, recently taught a gaggle of intelligent artificial agents -- bots -- to play hide-and-seek. Not because they cared who won: The goal was to observe how competition between hiders and seekers would drive the bots to find and use digital tools. The idea is familiar to anyone who's ever played the game in real life; it's a kind of scaled-down arms race. When your opponent adopts a strategy that works, you have to abandon what you were doing before and find a new, better plan. It's the rule that governs games from chess to StarCraft II; it's also an adaptation that seems likely to confer an evolutionary advantage. So it went with hide-and-seek.
Interactive AI with a Theory of Mind
Çelikok, Mustafa Mert, Peltola, Tomi, Daee, Pedram, Kaski, Samuel
Understanding each other is the key to success in collaboration. For humans, attributing mental states to others, the theory of mind, provides the crucial advantage. We argue for formulating human--AI interaction as a multi-agent problem, endowing AI with a computational theory of mind to understand and anticipate the user. To differentiate the approach from previous work, we introduce a categorisation of user modelling approaches based on the level of agency learnt in the interaction. We describe our recent work in using nested multi-agent modelling to formulate user models for multi-armed bandit based interactive AI systems, including a proof-of-concept user study.
Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents
Lee, Donghwan, He, Niao, Kamalaruban, Parameswaran, Cevher, Volkan
This article reviews recent advances in multi-agent reinforcement learning algorithms for large-scale control systems and communication networks, which learn to communicate and cooperate. We provide an overview of this emerging field, with an emphasis on the decentralized setting under different coordination protocols. We highlight the evolution of reinforcement learning algorithms from single-agent to multi-agent systems, from a distributed optimization perspective, and conclude with future directions and challenges, in the hope of catalyzing the growing synergy among distributed optimization, signal processing, and reinforcement learning communities.
Automated curriculum generation for Policy Gradients from Demonstrations
Srinivasan, Anirudh, Bahdanau, Dzmitry, Chevalier-Boisvert, Maxime, Bengio, Yoshua
In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that draws parallels from one of the ways in which humans learn to perform complex tasks, i.e., by starting from the goal and working backwards. We test our method on the BabyAI platform and show an improvement in sample efficiency for some of its tasks compared to a PPO (proximal policy optimization) baseline.
Enhancing Statement Evaluation in Argumentation via Multi-labelling Systems
Baroni, Pietro (University of Brescia) | Riveret, Regis (Data61, CSIRO, Brisbane, Australia)
In computational models of argumentation, the justification of statements has drawn less attention than the construction and justification of arguments. As a consequence, significant losses of sensitivity and expressiveness in the treatment of statement statuses can be incurred by otherwise appealing formalisms. In order to reappraise statement statuses and, more generally, to support a uniform modelling of different phases of the argumentation process, we introduce multi-labelling systems, a generic formalism devoted to representing reasoning processes consisting of a sequence of labelling stages. In this context, two families of multi-labelling systems, called the argument-focused and statement-focused approaches, are identified and compared. Then they are shown to be able to encompass several prominent literature proposals as special cases, thereby enabling a systematic comparison evidencing their merits and limits. Further, we show that the proposed model supports tunability of statement justification by specifying a few alternative statement justification labellings, and we illustrate how they can be seamlessly integrated into different formalisms.
MANELA: A Multi-Agent Algorithm for Learning Network Embeddings
Playing an essential role in data mining, machine learning has a long history of being applied to networks on multifarious tasks. However, the discrete and sparse natures of networks often render it difficult to apply machine learning directly to networks. To circumvent this difficulty, one major school of thought to approach networks using machine learning is via network embeddings. On the one hand, network embeddings have achieved huge success on aggregated network data in recent years. On the other hand, learning network embeddings on distributively stored networks has remained understudied: To the best of our knowledge, all existing algorithms for learning network embeddings have hitherto been exclusively centralized and thus cannot be applied to these networks. To accommodate distributively stored networks, in this paper, we propose a multi-agent model. Under this model, we develop the multi-agent network embedding learning algorithm (MANELA) for learning network embeddings. We demonstrate MANELA's advantages over other existing centralized network embedding learning algorithms both theoretically and experimentally.
INTRODUCTION
Playing an essential role in data mining, machine learning has a long history of being applied to networks on multifarious tasks, such as network classification [1], prediction of protein binding [2], etc. Thanks to the advancement of technologies such as the Internet and database management systems, the amount of data available to machine learning algorithms has been growing tremendously over the past decade. Among these datasets, a huge fraction can be modeled as networks, such as web networks, brain networks, citation networks, street networks, etc. [3]. Therefore, improving machine learning algorithms on networks has become even more important.
However, the discrete and sparse natures of networks often render it difficult to apply machine learning directly to networks. To circumvent this difficulty, one major school of thought to approach networks using machine learning is via network embeddings [4].
Idle Time Optimization for Target Assignment and Path Finding in Sortation Centers
Kou, Ngai Meng, Peng, Cheng, Ma, Hang, Kumar, T. K. Satish, Koenig, Sven
In this paper, we study the one-shot and lifelong versions of the Target Assignment and Path Finding problem in automated sortation centers, where each agent needs to constantly assign itself a sorting station, move to its assigned station without colliding with obstacles or other agents, wait in the queue of that station to obtain a parcel for delivery, and then deliver the parcel to a sorting bin. The throughput of such centers is largely determined by the total idle time of all stations since their queues can frequently become empty. To address this problem, we first formalize and study the one-shot version that assigns stations to a set of agents and finds collision-free paths for the agents to their assigned stations. We present efficient algorithms for this task based on a novel min-cost max-flow formulation that minimizes the total idle time of all stations in a fixed time window. We then demonstrate how our algorithms for solving the one-shot problem can be applied to solving the lifelong problem as well. Experimentally, we believe we are the first researchers to consider real-world automated sortation centers using an industrial simulator with realistic data and a kinodynamic model of real robots.
Why AI will be inhuman
According to F-Secure Vice President of Artificial Intelligence Matti Aksela, there's a common misconception that 'advanced' AI should mimic human intelligence, an assumption Project Blackfin aims to challenge. "People's expectations that 'advanced' machine intelligence simply mimics human intelligence is limiting our understanding of what AI can and should do. Instead of building AI to function as though it were human, we can and should be exploring ways to unlock the unique potential of machine intelligence, and how that can augment what people do," said Aksela, Head of F-Secure's Artificial Intelligence Center of Excellence. "We created Project Blackfin to help us reach that next level of understanding about what AI can achieve." Project Blackfin is a research initiative conceptualised by Aksela's cross-disciplinary team of artificial intelligence and cyber security researchers, mathematicians, data scientists, machine learning experts, and engineers.
Game Theory in Artificial Intelligence
Game Theory is a branch of mathematics used to model the strategic interaction between different players in a context with predefined rules and outcomes. Game Theory can also be used to describe many situations in our daily life and Machine Learning models (Figure 1). For example, a Classification algorithm such as SVM (Support Vector Machines) can be explained in terms of a two-player game in which one player challenges the other to find the best hyper-plane by giving him the most difficult points to classify. The game will then converge to a solution which will be a trade-off between the strategic abilities of the two players. Different aspects of Game Theory are commonly used in Artificial Intelligence. I will now introduce you to the Nash Equilibrium and Inverse Game Theory, and give you some practical examples.
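As a small illustration of the Nash Equilibrium mentioned above, here is a minimal sketch that enumerates the pure-strategy equilibria of a two-player game given as payoff matrices. The function name `pure_nash_equilibria` and the Prisoner's Dilemma payoffs are illustrative assumptions, not taken from the article; mixed-strategy equilibria are not covered.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) cells where neither player gains by deviating.

    payoff_a[i][j] is the row player's payoff and payoff_b[i][j] the
    column player's payoff when row i and column j are played.
    """
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Row player cannot do better by switching rows in column j.
            row_is_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
            # Column player cannot do better by switching columns in row i.
            col_is_best = all(payoff_b[i][j] >= payoff_b[i][m] for m in range(cols))
            if row_is_best and col_is_best:
                equilibria.append((i, j))
    return equilibria


# Assumed Prisoner's Dilemma payoffs (negative years in prison).
# Strategy 0 = cooperate, strategy 1 = defect.
PAYOFF_A = [[-1, -3], [0, -2]]
PAYOFF_B = [[-1, 0], [-3, -2]]

if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFF_A, PAYOFF_B))  # [(1, 1)]: both defect
```

With these payoffs the only equilibrium is mutual defection, even though mutual cooperation would leave both players better off.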