Agents
Planning with Explanatory Actions: A Joint Approach to Plan Explicability and Explanations in Human-Aware Planning
Sreedharan, Sarath, Chakraborti, Tathagata, Muise, Christian, Kambhampati, Subbarao
In this work, we formulate the generation of explanations as model reconciliation for planning problems as a problem of planning with explanatory actions. We show that these problems can be better understood within the framework of epistemic planning and that, in fact, most earlier works on explanation as model reconciliation correspond to tractable subsets of epistemic planning problems. We empirically show that our approach is computationally more efficient than existing techniques for explanation generation, and we discuss how it could be extended to capture most of the existing variants of explanation as model reconciliation. We end the paper with a discussion of how this formulation could be extended to generate novel explanatory behaviors.
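To make "planning with explanatory actions" concrete, here is a toy sketch of explanation search. It assumes a STRIPS-like encoding in which a model is a set of facts ("pre", action, fluent) or ("add", action, fluent), each explanatory action corrects one fact on which the robot's and the human's models disagree, and the explanation only has to make the plan valid (not optimal) in the human's updated model. The names plan_valid, explain, ROBOT and HUMAN are hypothetical; this is a breadth-first search for a minimal explanation, not the paper's epistemic-planning compilation.

```python
from collections import deque


def plan_valid(model, init, goal, plan):
    """True if `plan` is executable from `init` and reaches `goal` under `model`.

    `model` is a set of tuples ("pre", action, fluent) or ("add", action, fluent).
    """
    state = set(init)
    for act in plan:
        pre = {f for kind, a, f in model if kind == "pre" and a == act}
        if not pre <= state:
            return False
        state |= {f for kind, a, f in model if kind == "add" and a == act}
    return set(goal) <= state


def explain(robot, human, init, goal, plan):
    """Breadth-first search over explanatory actions; returns a minimal set of model updates."""
    diffs = sorted(robot ^ human)  # each explanatory action corrects one differing fact
    start = frozenset()
    frontier, seen = deque([start]), {start}
    while frontier:
        revealed = frontier.popleft()
        updated = human ^ revealed  # adds robot-only facts, drops human-only facts
        if plan_valid(updated, init, goal, plan):
            return sorted(revealed)
        for d in diffs:
            nxt = revealed | {d}
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None  # no combination of updates makes the plan valid


# Hypothetical example: the human does not know that walking reaches the box.
HUMAN = {("pre", "pick", "at_box"), ("add", "pick", "holding")}
ROBOT = HUMAN | {("add", "walk", "at_box"), ("add", "pick", "happy")}
print(explain(ROBOT, HUMAN, set(), {"holding"}, ["walk", "pick"]))  # [('add', 'walk', 'at_box')]
```

Because the search expands explanations in order of size, the irrelevant difference ("add", "pick", "happy") is never communicated.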
Annealing for Distributed Global Optimization
Swenson, Brian, Kar, Soummya, Poor, H. Vincent, Moura, José M. F.
The paper proves convergence to global optima for a class of distributed algorithms for nonconvex optimization in network-based multi-agent settings. Agents are permitted to communicate over a time-varying undirected graph. Each agent possesses a local objective function that is assumed to be smooth but possibly nonconvex, and the algorithms considered optimize the sum of these local functions. A distributed algorithm of the consensus+innovations type is proposed that relies on first-order information at the agent level. Under appropriate conditions on network connectivity and the cost objective, convergence to the set of global optima is achieved by an annealing-type approach, with decaying Gaussian noise independently added into each agent's update step. It is shown that the proposed algorithm converges in probability to the set of global minima of the sum function.
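The update described above combines a consensus term (pull toward neighbors), an innovation term (local gradient) and decaying Gaussian noise. The following is a minimal sketch under stated assumptions: a fixed 4-agent ring graph instead of a time-varying one, a scalar decision variable, illustrative local objectives, and step-size schedules alpha_t, beta_t and gamma_t of our own choosing (not the paper's conditions or constants). NEIGHBORS, OFFSETS, local_grad and anneal are hypothetical names.

```python
import math
import random

# Hypothetical example: 4 agents on a fixed ring graph (the paper allows time-varying graphs).
NEIGHBORS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
OFFSETS = [-2.0, -1.0, 1.0, 2.0]


def local_grad(i, x):
    """Gradient of the illustrative local objective f_i(x) = 0.5*(x - OFFSETS[i])**2 + cos(3x)."""
    return (x - OFFSETS[i]) - 3.0 * math.sin(3.0 * x)


def anneal(steps=20000, seed=0):
    """Synchronous consensus + innovations iterations with decaying Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.uniform(-3.0, 3.0) for _ in OFFSETS]
    for t in range(steps):
        alpha = 0.2 / (t + 1)                               # innovation (gradient) weight
        beta = 0.3 / (t + 1) ** 0.3                         # consensus weight, decays slower than alpha
        gamma = 1.0 / math.sqrt(math.log(math.log(t + 3)))  # slowly decaying noise scale
        x = [
            x[i]
            - beta * sum(x[i] - x[j] for j in NEIGHBORS[i])                 # pull toward neighbors
            - alpha * (local_grad(i, x[i]) + gamma * rng.gauss(0.0, 1.0))  # noisy local gradient step
            for i in range(len(x))
        ]
    return x


x = anneal()
print(x, sum(x) / len(x))
```

Here the sum function is F(x) = 2x^2 + 5 + 4cos(3x), with global minima near x = ±0.94 and higher local minima for |x| between 2 and 3. The run illustrates the update and the agents' agreement; whether a finite run ends in a global minimum depends on the schedule constants.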
Beyond Turing: Intelligent Agents Centered on the User
Eskenazi, Maxine, Mehri, Shikib, Razumovskaia, Evgeniia, Zhao, Tiancheng
Most research on intelligent agents centers on the agent and not on the user. We look at the origins of agent-centric research for slot-filling, gaming and chatbot agents. We then argue that it is important to concentrate more on the user. After reviewing relevant literature, we propose some approaches for creating and assessing user-centric systems.
PolyAI scores $12M Series A to put its 'conversational AI agents' in contact centres
PolyAI, a London startup founded by experts in the field of "conversational AI" (including CEO Nikola Mrkšić, who was previously the first engineer at Apple-acquired VocalIQ), has raised $12 million in Series A funding to deploy its tech in customer support contact centres. The round was led by Point72 Ventures, with participation from Sands Capital Ventures, Amadeus Capital Partners, Passion Capital and Entrepreneur First (EF). PolyAI's founders are graduates of EF, although they didn't meet during the company-building program; they already knew each other from their time at Cambridge's Dialog Systems Group, part of the Machine Intelligence Lab at the University of Cambridge. "We started PolyAI in 2017, straight after submitting our PhD theses," Mrkšić tells me. "At Cambridge, we developed state-of-the-art conversational technology, and starting a company was the best way to get this tech used in the real world. We brought many of our Cambridge colleagues with us and started building the commercial version of our conversational platform."
DSPG: Decentralized Simultaneous Perturbations Gradient Descent Scheme
In this paper, we present DSPG (Decentralized Simultaneous Perturbation Stochastic Approximations, with Constant Sensitivity Parameters), an asynchronous approximate gradient method that is easy to implement. It is obtained by modifying SPSA (Simultaneous Perturbation Stochastic Approximations), a popular approximate gradient method developed by Spall and used in robotics and learning, to allow decentralized optimization in multi-agent learning and distributed control scenarios. In the multi-agent learning setup considered here, the agents are asynchronous (each abides by its local clock) and communicate over a wireless medium that is prone to losses and delays. We analyze the gradient estimation bias that arises from setting the sensitivity parameters to a single value, as well as the bias that arises from communication losses and delays. Specifically, we show that these biases can be countered through better and more frequent communication and/or by choosing a small fixed value for the sensitivity parameters. We also discuss the variance of the gradient estimator and its effect on the rate of convergence. Finally, we present numerical results supporting DSPG and the preceding analysis.
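The core of SPSA is a two-sided finite difference along a random ±1 perturbation, which yields an estimate of every gradient coordinate from only two function evaluations. Below is a minimal, centralized sketch with a constant sensitivity parameter c, the setting whose bias the abstract analyzes. It does not model asynchrony, communication losses or delays, and the gain schedule a/(k+1)**0.602 is an assumption (a common practical SPSA choice); spsa_gradient and spsa_descent are hypothetical names.

```python
import random


def spsa_gradient(f, theta, c, rng):
    """Two-sided SPSA gradient estimate with a constant sensitivity parameter c."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]  # Rademacher perturbation
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = (f(plus) - f(minus)) / (2.0 * c)           # two evaluations, all coordinates
    return [diff / d for d in delta]


def spsa_descent(f, theta, c=0.05, a=0.1, iters=2000, seed=0):
    """Gradient descent driven by SPSA estimates; c stays fixed, the gain decays."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(iters):
        step = a / (k + 1) ** 0.602  # assumed gain schedule
        g = spsa_gradient(f, theta, c, rng)
        theta = [t - step * gi for t, gi in zip(theta, g)]
    return theta


print(spsa_descent(lambda v: sum((t - 1.0) ** 2 for t in v), [0.0, 0.0, 0.0]))
```

The bias from a fixed c is easy to see on f(θ) = θ^3: at θ = 2 with c = 0.1, the estimate equals 3θ^2 + c^2 = 12.01 for either sign of the perturbation, so the bias is exactly c^2 and shrinks as c is made small.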
A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
Suttle, Wesley, Yang, Zhuoran, Zhang, Kaiqing, Wang, Zhaoran, Basar, Tamer, Liu, Ji
In this work we develop a new off-policy actor-critic algorithm that performs policy improvement with convergence guarantees in the multi-agent setting using function approximation. To achieve this, we extend the method of emphatic temporal differences (ETD(λ)) to the multi-agent setting with provable convergence under linear function approximation, and we also derive a novel off-policy policy gradient theorem for the multi-agent setting. Using these new results, we develop our two-timescale algorithm, which uses ETD(λ) to perform policy evaluation for the critic step at a faster timescale and policy gradient ascent using emphatic weightings for the actor step at a slower timescale. We also provide convergence guarantees for the actor step. Our work builds on recent advances in three main areas: multi-agent on-policy actor-critic methods, emphatic temporal difference learning for off-policy policy evaluation, and the use of emphatic weightings in off-policy policy gradient methods.
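For reference, the critic step builds on emphatic TD(λ). The sketch below shows the standard single-agent ETD(λ) update with linear features: a followon trace F, an emphasis M, an importance-weighted eligibility trace e and a TD error. The multi-agent extension and the actor step from the abstract are not shown, constant γ and λ are assumed, the interest i(S) defaults to 1 as an assumption, and etd_step is a hypothetical name. Start an episode with F = 0 and e = 0.

```python
def etd_step(theta, e, F, phi, phi_next, reward, rho, rho_prev, gamma, lam, alpha, interest=1.0):
    """One linear ETD(lambda) update; returns the new weights, trace and followon trace."""
    F = gamma * rho_prev * F + interest                # followon trace
    M = lam * interest + (1.0 - lam) * F               # emphasis
    e = [rho * (gamma * lam * ei + M * p) for ei, p in zip(e, phi)]  # eligibility trace
    v = sum(t * p for t, p in zip(theta, phi))         # current value estimate
    v_next = sum(t * p for t, p in zip(theta, phi_next))
    delta = reward + gamma * v_next - v                # TD error
    theta = [t + alpha * delta * ei for t, ei in zip(theta, e)]
    return theta, e, F


theta, e, F = etd_step([0.0, 0.0], [0.0, 0.0], 0.0, [1.0, 0.0], [0.0, 1.0],
                       reward=1.0, rho=1.0, rho_prev=1.0, gamma=0.9, lam=0.5, alpha=0.1)
print(theta, e, F)
```

With γ = λ = 0 and ρ = 1, the emphasis is 1 and the step reduces to linear TD(0), which is a quick sanity check on the trace bookkeeping.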
Adaptive Genomic Evolution of Neural Network Topologies (AGENT) for State-to-Action Mapping in Autonomous Agents
Behjat, Amir, Chidambaran, Sharat, Chowdhury, Souma
Neuroevolution is the training of neural networks (NNs) through an evolutionary algorithm, usually to serve as a state-to-action mapping model in control or reinforcement-learning-type problems. This paper builds on the NeuroEvolution of Augmenting Topologies (NEAT) formalism, which allows the design of topology- and weight-evolving NNs. Fundamental advancements are made to the neuroevolution process to address premature stagnation and convergence issues, central among which is the incorporation of automated mechanisms to control population diversity and average fitness improvement within the neuroevolution process. Insights into the performance and efficiency of the new algorithm are obtained by evaluating it on three benchmark problems from the OpenAI platform and an Unmanned Aerial Vehicle (UAV) collision avoidance problem.
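One simple way to picture an automated diversity-control mechanism is a mutation scale that rises when the population collapses and falls when it is spread out. The sketch below is a hypothetical illustration of that idea only: it evolves fixed-length weight vectors (no topology evolution, so it is neither NEAT nor AGENT), measures diversity as mean pairwise Euclidean distance, and uses truncation selection with elitism. The names diversity, adapt_sigma and evolve, and every constant, are assumptions.

```python
import itertools
import random


def diversity(population):
    """Mean pairwise Euclidean distance between weight vectors (0.0 for a uniform population)."""
    pairs = list(itertools.combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for p, q in pairs) / len(pairs)


def adapt_sigma(sigma, div, target, factor=1.5, lo=1e-3, hi=1.0):
    """Raise the mutation scale when diversity is below target, lower it otherwise."""
    sigma = sigma * factor if div < target else sigma / factor
    return min(max(sigma, lo), hi)


def evolve(fitness, dim, pop_size=20, generations=50, target=0.5, seed=0):
    """Weight-only neuroevolution loop with elitism and diversity-adaptive mutation."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    sigma = 0.3
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]  # truncation selection; elites survive unchanged
        sigma = adapt_sigma(sigma, diversity(pop), target)
        children = [[w + rng.gauss(0.0, sigma) for w in rng.choice(elite)]
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)


best = evolve(lambda w: -sum((wi - 1.0) ** 2 for wi in w), dim=2)
print(best)
```

Because elites are carried over unchanged, the best fitness never decreases, while the adaptive sigma keeps the offspring from collapsing onto a single point too early.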
Teaching Machines About Human Ethics
Advancement in artificial intelligence is picking up pace substantially, bringing humans into an era where decision making will be at least machine-consulted, if not machine-governed. Because these intelligent machines, or agents, do not have the same emotions and experiences as humans, their suggestions or outputs will more likely be calculated decisions that are sometimes inappropriate from a human standpoint. At this stage, it is essential that such intelligent agents be programmed so that their suggestions or outputs agree with human ethics and traditions.
AI2-THOR: An Interactive 3D Environment for Visual AI
Kolve, Eric, Mottaghi, Roozbeh, Han, Winson, VanderBilt, Eli, Weihs, Luca, Herrasti, Alvaro, Gordon, Daniel, Zhu, Yuke, Gupta, Abhinav, Farhadi, Ali
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain.
Policy Distillation and Value Matching in Multiagent Reinforcement Learning
Wadhwania, Samir, Kim, Dong-Ki, Omidshafiei, Shayegan, How, Jonathan P.
Multiagent reinforcement learning (MARL) algorithms have been demonstrated on complex tasks that require the coordination of a team of multiple agents to complete. Existing works have focused on sharing information between agents via centralized critics to stabilize learning or through communication to increase performance, but generally do not examine how information can be shared between agents to address the curse of dimensionality in MARL. We posit that a multiagent problem can be decomposed into a multi-task problem where each agent explores a subset of the state space instead of exploring the entire state space. This paper introduces a multiagent actor-critic algorithm and a method for combining knowledge from homogeneous agents through distillation and value matching that outperforms policy distillation alone and allows further learning in both discrete and continuous action spaces.
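The two ingredients named above can be written as loss terms. The sketch below assumes discrete actions and one state: the distillation loss is the average KL divergence from each homogeneous agent's (teacher) policy to a single student policy, and the value-matching term is one plausible reading that penalizes disagreement among the agents' value estimates at a shared state. Both formulations, the temperature parameter and the function names are assumptions for illustration, not the paper's exact objectives.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [x / s for x in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Average KL(teacher || student) over the agents' policies at one state."""
    q = softmax([z / temperature for z in student_logits])
    return sum(kl(softmax([z / temperature for z in t]), q) for t in teacher_logits) / len(teacher_logits)


def value_matching_loss(values):
    """Mean squared deviation of each agent's value estimate from the agents' mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(distillation_loss([[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]], [1.0, 1.0, 0.0]))
print(value_matching_loss([0.0, 2.0]))
```

Minimizing the distillation term over the student's logits pulls one shared policy toward what each agent learned in its own region of the state space, which is the knowledge-combining step the abstract describes.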