Agents
Why Did the Robot Cross the Road? A User Study of Explanation in Human-Robot Interaction
This work documents a pilot user study evaluating the effectiveness of contrastive, causal, and example explanations in supporting human understanding of AI in a hypothetical, commonplace human-robot interaction (HRI) scenario. In doing so, this work situates explainable AI (XAI) in the context of the social sciences and suggests that HRI explanations are improved when informed by the social sciences.
TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning
Sun, Peng, Xiong, Jiechao, Han, Lei, Sun, Xinghai, Li, Shuxing, Xu, Jiawei, Fang, Meng, Zhang, Zhengyou
Competitive Self-Play (CSP) based Multi-Agent Reinforcement Learning (MARL) has shown phenomenal breakthroughs recently. Strong AI agents have been built for several benchmarks, including Dota 2, Glory of Kings, Quake III, and StarCraft II, to name a few. Despite this success, MARL training is extremely data-hungry, typically requiring billions (if not trillions) of frames from the environment to learn a high-performance agent. This poses non-trivial difficulties for researchers and engineers and prevents the application of MARL to a broader range of real-world problems. To address this issue, in this manuscript we describe a framework, referred to as TLeague, that aims at large-scale training and implements several mainstream CSP-MARL algorithms. The training can be deployed on either a single machine or a cluster of hybrid machines (CPUs and GPUs), where standard Kubernetes is supported in a cloud-native manner. TLeague achieves high throughput and a reasonable scale-up when performing distributed training. Thanks to its modular design, it is also easy to extend for solving other multi-agent problems or for implementing and verifying MARL algorithms. We present experiments on StarCraft II, ViZDoom, and Pommerman to show the efficiency and effectiveness of TLeague. The code is open-sourced and available at https://github.com/tencent-ailab/tleague_projpage
401k Meets Artificial Intelligence Via New Strategic Investment
In an effort to be the first organization to bring true artificial intelligence (AI) to the 401k market, The 401(k) Plan Company recently announced a minority investment in Unanimous AI to help elevate human decision-making in the workplace for HR partners, CFOs and plan participants. San Francisco-based Unanimous AI builds technologies that amplify human intelligence using techniques modeled on the biological principle of Swarm Intelligence. In late October, Unanimous AI announced that it had been awarded three new U.S. patents covering its unique AI technology aimed at amplifying the intelligence of human groups. Swarm AI technology from Unanimous is a combination of real-time human input and AI algorithms, which the company says enables networked groups of people to think together as super-intelligent systems. Under terms of the deal, The 401(k) Plan Company will make Unanimous AI's capabilities available to employers seeking to evolve through the remote workforce considerations, empowering teams to make significantly better decisions. "AI has the power to replace humans, or to amplify their best work.
Human-Agent Cooperation in Bridge Bidding
Lockhart, Edward, Burch, Neil, Bard, Nolan, Borgeaud, Sebastian, Eccles, Tom, Smaira, Lucas, Smith, Ray
We introduce a human-compatible reinforcement-learning approach to a cooperative game, making use of a third-party hand-coded human-compatible bot to generate initial training data and to perform initial evaluation. Our learning approach consists of imitation learning, search, and policy iteration. Our trained agents achieve a new state-of-the-art for bridge bidding in three settings: an agent playing in partnership with a copy of itself; an agent partnering a pre-existing bot; and an agent partnering a human player.
Accelerating Distributed SGD for Linear Regression using Iterative Pre-Conditioning
Chakrabarti, Kushal, Gupta, Nirupam, Chopra, Nikhil
This paper considers the multi-agent distributed linear least-squares problem. The system comprises multiple agents, each agent with a locally observed set of data points, and a common server with whom the agents can interact. The agents' goal is to compute a linear model that best fits the collective data points observed by all the agents. In the server-based distributed settings, the server cannot access the data points held by the agents. The recently proposed Iteratively Pre-conditioned Gradient-descent (IPG) method has been shown to converge faster than other existing distributed algorithms that solve this problem. In the IPG algorithm, the server and the agents perform numerous iterative computations. Each of these iterations relies on the entire batch of data points observed by the agents for updating the current estimate of the solution. Here, we extend the idea of iterative pre-conditioning to the stochastic settings, where the server updates the estimate and the iterative pre-conditioning matrix based on a single randomly selected data point at every iteration. We show that our proposed Iteratively Pre-conditioned Stochastic Gradient-descent (IPSG) method converges linearly in expectation to a proximity of the solution. Importantly, we empirically show that the proposed IPSG method's convergence rate compares favorably to prominent stochastic algorithms for solving the linear least-squares problem in server-based networks.
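The server-based setting described above can be illustrated with a minimal sketch. It uses plain full-batch gradient descent, not the IPG or IPSG method, and the data points, step size, and function names are all assumptions made for illustration. Each agent computes a gradient on its own points, and the server only ever sees the summed gradients, never the data.

```python
# Hypothetical data: each agent privately holds (x, y) points on the line y = 2x + 1.
agent_data = [
    [(0.0, 1.0), (1.0, 3.0)],   # agent 1
    [(2.0, 5.0), (3.0, 7.0)],   # agent 2
]

def local_gradient(points, w, b):
    """Gradient of 0.5 * sum((w*x + b - y)^2) over one agent's local points."""
    gw = sum((w * x + b - y) * x for x, y in points)
    gb = sum((w * x + b - y) for x, y in points)
    return gw, gb

def server_gradient_descent(agents, step=0.1, iterations=500):
    """Server loop: broadcast (w, b), collect local gradients, update the estimate."""
    w, b = 0.0, 0.0
    for _ in range(iterations):
        grads = [local_gradient(points, w, b) for points in agents]
        gw = sum(g[0] for g in grads)   # the server aggregates gradients only
        gb = sum(g[1] for g in grads)
        w -= step * gw
        b -= step * gb
    return w, b

w, b = server_gradient_descent(agent_data)
```

The step size of 0.1 is chosen to be small enough for this particular data set; a pre-conditioned method such as the one the abstract describes aims to reduce sensitivity to the conditioning of the problem, which this baseline does not address.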
Investigating Human Response, Behaviour, and Preference in Joint-Task Interaction
Lindsay, Alan, Craenen, Bart, Dalzel-Job, Sara, Hill, Robin L., Petrick, Ronald P. A.
Human interaction relies on a wide range of signals, including non-verbal cues. In order to develop effective Explainable Planning (XAIP) agents it is important that we understand the range and utility of these communication channels. Our starting point is existing results from joint task interaction and their study in cognitive science. Our intention is that these lessons can inform the design of interaction agents -- including those using planning techniques -- whose behaviour is conditioned on the user's response, including affective measures of the user (i.e., explicitly incorporating the user's affective state within the planning model). We have identified several concepts at the intersection of plan-based agent behaviour and joint task interaction and have used these to design two agents: one reactive and the other partially predictive. We have designed an experiment in order to examine human behaviour and response as they interact with these agents. In this paper we present the designed study and the key questions that are being investigated. We also present the results from an empirical analysis where we examined the behaviour of the two agents for simulated users.
Improving Welfare in One-sided Matching using Simple Threshold Queries
Ma, Thomas, Menon, Vijay, Larson, Kate
We study one-sided matching problems where $n$ agents have preferences over $m$ objects and each agent needs to be assigned to at most one object. Most work on such problems assumes that the agents only have ordinal preferences, and the goal is usually to compute a matching that satisfies some notion of economic efficiency. However, in reality, agents may have some preference intensities or cardinal utilities that, e.g., indicate that they like an object much more than another object, and not taking these into account can result in a loss in welfare. While one way to potentially account for these is to directly ask the agents for this information, such an elicitation process is cognitively demanding. Therefore, we focus on learning more about their cardinal preferences using simple threshold queries, which ask an agent whether they value an object above a certain value, and use this in turn to design algorithms that produce a matching that, for a particular economic notion $X$, satisfies $X$ and also achieves a good approximation to the optimal welfare among all matchings that satisfy $X$. We focus on several notions of economic efficiency, and look at both adaptive and non-adaptive algorithms. Overall, our results show how one can improve welfare by even non-adaptively asking the agents for just one bit of extra information per object.
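As a toy illustration of a threshold query (not the paper's algorithms, and ignoring the efficiency notion $X$), the sketch below asks each agent one yes/no question per object and then brute-forces the matching that satisfies the most "yes" answers. The utilities, the threshold of 0.6, and all function names are hypothetical.

```python
from itertools import permutations

def threshold_query(utilities, threshold):
    """One bit per object: does the agent value it above `threshold`?"""
    return [u > threshold for u in utilities]

def best_matching(score):
    """Brute-force the assignment (agent i gets object perm[i]) with the highest total score."""
    n = len(score)
    return max(permutations(range(n)),
               key=lambda perm: sum(score[i][perm[i]] for i in range(n)))

def welfare(utilities, matching):
    """Total cardinal utility of a matching."""
    return sum(utilities[i][obj] for i, obj in enumerate(matching))

# Hypothetical cardinal utilities: both agents rank object 0 above object 1,
# so ordinal preferences alone cannot tell the two possible matchings apart.
utilities = [
    [0.9, 0.1],    # agent 0 strongly prefers object 0
    [0.55, 0.45],  # agent 1 is nearly indifferent
]

bits = [threshold_query(row, 0.6) for row in utilities]
matching = best_matching(bits)           # uses only the one-bit answers
total = welfare(utilities, matching)
```

In this example only agent 0 answers "yes" for object 0, so the bit-based matching gives object 0 to agent 0, which is also the welfare-maximizing assignment here. The brute-force search is exponential in the number of agents and is meant only for tiny examples.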
A Survey on Data Pricing: from Economics to Data Science
How can we assess the value of data objectively, systematically and quantitatively? Pricing data, or information goods in general, has been studied and practiced in dispersed areas and principles, such as economics, marketing, electronic commerce, data management, data mining and machine learning. In this article, we present a unified, interdisciplinary and comprehensive overview of this important direction. We examine various motivations behind data pricing, understand the economics of data pricing and review the development and evolution of pricing models according to a series of fundamental principles. We discuss both digital products and data products. We also consider a series of challenges and directions for future work.
TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game
Han, Lei, Xiong, Jiechao, Sun, Peng, Sun, Xinghai, Fang, Meng, Guo, Qingwei, Chen, Qiaobo, Shi, Tengfei, Yu, Hongsheng, Zhang, Zhengyou
StarCraft, one of the most difficult esport games with a long-standing history of professional tournaments, has attracted generations of players and fans, as well as intense attention in artificial intelligence research. Recently, Google's DeepMind announced AlphaStar, a grandmaster-level AI in StarCraft II. In this paper, we introduce a new AI agent, named TStarBot-X, that is trained under limited computation resources and can play competitively with expert human players. TStarBot-X takes advantage of important techniques introduced in AlphaStar, and also benefits from substantial innovations including new league training methods, novel multi-agent roles, rule-guided policy search, a lightweight neural network architecture, and importance sampling in imitation learning. We show that with limited computation resources, a faithful reimplementation of AlphaStar cannot succeed and the proposed techniques are necessary to ensure TStarBot-X's competitive performance. We reveal all technical details that are complementary to those mentioned in AlphaStar, showing the most sensitive parts of league training, reinforcement learning, and imitation learning that affect the performance of the agents. Most importantly, this is an open-sourced study in which all code and resources (including the trained model parameters) are publicly accessible via https://github.com/tencent-ailab/tleague_projpage. We expect this study to be beneficial for future academic and industrial research in solving complex problems like StarCraft, and it might also provide a sparring partner for StarCraft II players and other AI agents.
A methodology for co-constructing an interdisciplinary model: from model to survey, from survey to model
Beck, Elise, Dugdale, Julie, Adam, Carole, Gaïdatzis, Christelle, Bañgate, Julius
How should computer science and social science collaborate to build a common model? How should they proceed to gather data that is really useful to the modelling? How can they design a survey that is tailored to the target model? This paper aims to answer those crucial questions in the framework of a multidisciplinary research project. This research addresses the issue of co-constructing a model when several disciplines are involved, and is applied to modelling human behaviour immediately after an earthquake. The main contribution of the work is to propose a tool dedicated to multidisciplinary dialogue. It also proposes a reflexive analysis of the enriching intellectual process carried out by the different disciplines involved. Finally, from working with an anthropologist, a complementary view of the multidisciplinary process is given.