Country
Optimal Multiple Stopping Rule for Warm-Starting Sequential Selection
Fekom, Mathilde, Vayatis, Nicolas, Kalogeratos, Argyris
In this paper we present the Warm-starting Dynamic Thresholding algorithm, developed using dynamic programming, for a variant of the standard online selection problem. The problem allows job positions to be either free or already occupied at the beginning of the process. Throughout the selection process, the decision maker interviews one after the other the new candidates and reveals a quality score for each of them. Based on that information, she can (re)assign each job at most once by taking immediate and irrevocable decisions. We relax the hard requirement of the class of dynamic programming algorithms to perfectly know the distribution from which the scores of candidates are drawn, by presenting extensions for the partial and no-information cases, in which the decision maker can learn the underlying score distribution sequentially while interviewing candidates.
AI safety: state of the field through quantitative lens
Juric, Mislav, Sandic, Agneza, Brcic, Mario
Last decade has seen major improvements in the performance of artificial intelligence which has driven wide-spread applications. Unforeseen effects of such mass-adoption has put the notion of AI safety into the public eye. AI safety is a relatively new field of research focused on techniques for building AI beneficial for humans. While there exist survey papers for the field of AI safety, there is a lack of a quantitative look at the research being conducted. The quantitative aspect gives a data-driven insight about the emerging trends, knowledge gaps and potential areas for future research. In this paper, bibliometric analysis of the literature finds significant increase in research activity since 2015. Also, the field is so new that most of the technical issues are open, including: explainability with its long-term utility, and value alignment which we have identified as the most important long-term research topic. Equally, there is a severe lack of research into concrete policies regarding AI. As we expect AI to be the one of the main driving forces of changes in society, AI safety is the field under which we need to decide the direction of humanity's future.
A Bounded Measure for Estimating the Benefit of Visualization
Chen, Min, Sbert, Mateu, Abdul-Rahman, Alfie, Silver, Deborah
Information theory can be used to analyze the cost-benefit of visualization processes. However, the current measure of benefit contains an unbounded term that is neither easy to estimate nor intuitive to interpret. In this work, we propose to revise the existing cost-benefit measure by replacing the unbounded term with a bounded one. We examine a number of bounded measures that include the Jenson-Shannon divergence and a new divergence measure formulated as part of this work. We use visual analysis to support the multi-criteria comparison, enabling the selection of the most logical and intuitive option. We applied the revised cost-benefit measure to two case studies, demonstrating its uses in practical scenarios, while the collected real world data further informs the selection of a bounded measure.
Fully Differentiable Procedural Content Generation through Generative Playing Networks
Bontrager, Philip, Togelius, Julian
To procedurally create interactive content such as environments or game levels, we need agents that can evaluate the content; but to train such agents, we need content they can train on. Generative Playing Networks is a framework that learns agent policies and generates environments in tandem through a symbiotic process. Policies are learned using an actor-critic reinforcement learning algorithm so as to master the environment, and environments are created by a generator network which tries to provide an appropriate level of challenge for the agent. This is accomplished by the generator learning to make content based on estimates by the critic. Thus, this process provides an implicit curriculum for the agent, creating more complex environments over time. Unlike previous approaches to procedural content generation, Generative Playing Networks is end-to-end differentiable and does not require human-designed examples or domain knowledge. We demonstrate the capability of this framework by training an agent and level generator for a 2D dungeon crawler game.
Connectivity-driven Communication in Multi-agent Reinforcement Learning through Diffusion Processes on Graphs
Pesce, Emanuele, Montana, Giovanni
We discuss the problem of learning collaborative behaviour in multi-agent systems using deep reinforcement learning (DRL). A connectivity-driven communication (CDC) algorithm is proposed to address three key aspects: what agents to involve in the communication, what information content to share, and how often to share it. We introduce the notion of a connectivity network, modelled as a weighted graph, where nodes represent agents and edges represent the degree of connectivity between pairs of agents. The optimal graph topology is learned end-to-end concurrently with the stochastic policy so as to maximise future expected returns. The communication patterns depend on the graph's topology through a diffusion process on the graph, the heat kernel, which is found by exponentiating the Laplacian eigensystem through time and is fully differentiable. Empirical results show that CDC is capable of superior performance over alternative algorithms for a range of cooperative navigation tasks.
Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing
Liu, Ge, Wu, Rui, Cheng, Heng-Tze, Wang, Jing, Ooi, Jayden, Li, Lihong, Li, Ang, Li, Wai Lok Sibon, Boutilier, Craig, Chi, Ed
Deep Reinforcement Learning (RL) is proven powerful for decision making in simulated environments. However, training deep RL model is challenging in real world applications such as production-scale health-care or recommender systems because of the expensiveness of interaction and limitation of budget at deployment. One aspect of the data inefficiency comes from the expensive hyper-parameter tuning when optimizing deep neural networks. We propose Adaptive Behavior Policy Sharing (ABPS), a data-efficient training algorithm that allows sharing of experience collected by behavior policy that is adaptively selected from a pool of agents trained with an ensemble of hyper-parameters. We further extend ABPS to evolve hyper-parameters during training by hybridizing ABPS with an adapted version of Population Based Training (ABPS-PBT). We conduct experiments with multiple Atari games with up to 16 hyper-parameter/architecture setups. ABPS achieves superior overall performance, reduced variance on top 25% agents, and equivalent performance on the best agent compared to conventional hyper-parameter tuning with independent training, even though ABPS only requires the same number of environmental interactions as training a single agent. We also show that ABPS-PBT further improves the convergence speed and reduces the variance.
Explainable Deep Modeling of Tabular Data using TableGraphNet
Terejanu, Gabriel, Chowdhury, Jawad, Rashid, Rezaur, Chowdhury, Asif
The vast majority of research on explainability focuses on post-explainability rather than explainable modeling. Namely, an explanation model is derived to explain a complex black box model built with the sole purpose of achieving the highest performance possible. In part, this trend might be driven by the misconception that there is a trade-off between explainability and accuracy. Furthermore, the consequential work on Shapely values, grounded in game theory, has also contributed to a new wave of post-explainability research on better approximations for various machine learning models, including deep learning models. We propose a new architecture that inherently produces explainable predictions in the form of additive feature attributions. Our approach learns a graph representation for each record in the dataset. Attribute centric features are then derived from the graph and fed into a contribution deep set model to produce the final predictions. We show that our explainable model attains the same level of performance as black box models. Finally, we provide an augmented model training approach that leverages the missingness property and yields high levels of consistency (as required for the Shapely values) without loss of accuracy.
Signaling in Bayesian Network Congestion Games: the Subtle Power of Symmetry
Castiglioni, Matteo, Celli, Andrea, Marchesi, Alberto, Gatti, Nicola
Network congestion games are a well-understood model of multi-agent strategic interactions. Despite their ubiquitous applications, it is not clear whether it is possible to design information structures to ameliorate the overall experience of the network users. We focus on Bayesian games with atomic players, where network vagaries are modeled via a (random) state of nature which determines the costs incurred by the players. A third-party entity---the sender---can observe the realized state of the network and exploit this additional information to send a signal to each player. A natural question is the following: is it possible for an informed sender to reduce the overall social cost via the strategic provision of information to players who update their beliefs rationally? The paper focuses on the problem of computing optimal ex ante persuasive signaling schemes, showing that symmetry is a crucial property for its solution. Indeed, we show that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions. Moreover, the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.
Intrinsic Motivation for Encouraging Synergistic Behavior
Chitnis, Rohan, Tulsiani, Shubham, Gupta, Saurabh, Gupta, Abhinav
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks, which are tasks where multiple agents must work together to achieve a goal they could not individually. Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own. Thus, we propose to incentivize agents to take (joint) actions whose effects cannot be predicted via a composition of the predicted effect for each individual agent. We study two instantiations of this idea, one based on the true states encountered, and another based on a dynamics model trained concurrently with the policy. While the former is simpler, the latter has the benefit of being analytically differentiable with respect to the action taken. We validate our approach in robotic bimanual manipulation and multi-agent locomotion tasks with sparse rewards; we find that our approach yields more efficient learning than both 1) training with only the sparse reward and 2) using the typical surprise-based formulation of intrinsic motivation, which does not bias toward synergistic behavior.
Public Bayesian Persuasion: Being Almost Optimal and Almost Persuasive
Castiglioni, Matteo, Celli, Andrea, Gatti, Nicola
Persuasion studies how an informed principal may influence the behavior of agents by the strategic provision of payoff-relevant information. We focus on the fundamental multi-receiver model by Arieli and Babichenko (2019), in which there are no inter-agent externalities. Unlike prior works on this problem, we study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions. We fully characterize the computational complexity of computing a bi-criteria approximation of an optimal public signaling scheme. In particular, we show, in a voting setting of independent interest, that solving this problem requires at least a quasi-polynomial number of steps even in settings with a binary action space, assuming the Exponential Time Hypothesis. In doing so, we prove that a relaxed version of the MAXIMUM FEASIBLE SUBSYSTEM OF LINEAR INEQUALITIES problem requires at least quasi-polynomial time to be solved. Finally, we close the gap by providing a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.