Agents
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
Malik, Dhruv, Palaniappan, Malayandi, Fisac, Jaime F., Hadfield-Menell, Dylan, Russell, Stuart, Dragan, Anca D.
Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and a robot, in which only the human knows the parameters of the reward function: the robot needs to learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space size exponential in the size of the reward parameter space. In this work, we exploit a specific property of CIRL (the human is a full-information agent) to derive an optimality-preserving modification to the standard Bellman update; this reduces the complexity of the problem by an exponential factor and allows us to relax CIRL's assumption of human rationality. We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces and larger action spaces for both robot and human. In solutions to these larger problems, the human exhibits pedagogic (teaching) behavior, while the robot interprets it as such and attains higher value for the human.
Multi-Agent Path Finding with Deadlines
Ma, Hang, Wagner, Glenn, Felner, Ariel, Li, Jiaoyang, Kumar, T. K. Satish, Koenig, Sven
We formalize Multi-Agent Path Finding with Deadlines (MAPF-DL). The objective is to maximize the number of agents that can reach their given goal vertices from their given start vertices within the deadline, without colliding with each other. We first show that MAPF-DL is NP-hard to solve optimally. We then present two classes of optimal algorithms: one based on a reduction of MAPF-DL to a flow problem, followed by a compact integer linear programming formulation of the resulting reduced abstracted multi-commodity flow network, and the other based on novel combinatorial search algorithms. Our empirical results demonstrate that these MAPF-DL solvers scale well and that each dominates the others in different scenarios.
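To make the MAPF-DL objective concrete, the following minimal sketch scores a candidate plan. It is not one of the solvers from the abstract. It assumes, for illustration only, that each path lists the vertex an agent occupies at timesteps 0, 1, 2, ..., that a collision means either two agents sharing a vertex at the same timestep or two agents swapping vertices along an edge, and that an agent succeeds if it occupies its goal at the deadline timestep. The names `has_collision` and `count_on_time` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def has_collision(paths: Dict[str, List[int]]) -> bool:
    """Return True if two agents share a vertex at one timestep or swap along an edge."""
    for (_, p), (_, q) in combinations(paths.items(), 2):
        horizon = min(len(p), len(q))
        for t in range(horizon):
            if p[t] == q[t]:  # vertex collision
                return True
            if t + 1 < horizon and p[t] == q[t + 1] and p[t + 1] == q[t]:
                return True  # swapping (edge) collision
    return False


def count_on_time(paths: Dict[str, List[int]], goals: Dict[str, int], deadline: int) -> int:
    """Count agents that occupy their goal vertex at the deadline timestep."""
    return sum(
        1
        for agent, path in paths.items()
        if len(path) > deadline and path[deadline] == goals[agent]
    )


# Illustrative plan on a line graph 0-1-2-3-4 (assumed data, not from the paper).
paths = {"a": [0, 1, 2], "b": [4, 3, 3]}
goals = {"a": 2, "b": 2}
print(has_collision(paths), count_on_time(paths, goals, deadline=2))
```

An optimal MAPF-DL solver would search over collision-free plans for one that maximizes the second quantity; this sketch only evaluates a plan that is already given.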
Lecture Notes on Fair Division
Fair division is the problem of dividing one or several goods amongst two or more agents in a way that satisfies a suitable fairness criterion. That is, fair division may be considered part of the larger research area of multiagent resource allocation (Chevaleyre et al., 2006). What is special about fair division is the explicit focus on fairness concerns. These notes give a succinct introduction to the field, focusing on formal and computational aspects that are particularly relevant to research in Computational Social Choice (Chevaleyre et al., 2007b) and Multiagent Systems (Wooldridge, 2009). We begin by briefly outlining how fair division fits into (and relates to) these two disciplines. Like voting, the archetypical instance of a social choice problem, fair division amounts to selecting an outcome from a set of possible collective agreements, given the individual preferences of a group of agents. There are, however, two main differences when compared to voting. The first difference is that, typically, voting theory assumes that agents (voters) have ordinal preferences (that is, they rank the available candidates and can say for any two candidates which one they like more), while in the context of fair division we usually assume that agents have cardinal preferences (that is, each agent has a utility function mapping possible outcomes to appropriate numerical values). The second difference is that a fair division problem comes with a certain internal "structure" that is typically absent from problems in voting:
Adaptive Mechanism Design: Learning to Promote Cooperation
Baumann, Tobias, Graepel, Thore, Shawe-Taylor, John
In the future, artificial learning agents are likely to become increasingly widespread in our society. They will interact with both other learning agents and humans in a variety of complex settings, including social dilemmas. We consider the problem of how an external agent can promote cooperation between artificial learners by distributing additional rewards and punishments based on observing the learners' actions. We propose a rule for automatically learning how to create the right incentives by considering the players' anticipated parameter updates. Using this learning rule leads to cooperation with high social welfare in matrix games in which the agents would otherwise learn to defect with high probability. We show that the resulting cooperative outcome is stable in certain games even if the planning agent is turned off after a given number of episodes, while other games require ongoing intervention to maintain mutual cooperation. However, even in the latter case, the amount of necessary additional incentives decreases over time.
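The effect of additional rewards on a matrix game can be illustrated with a small sketch. This is not the learning rule proposed in the abstract: it uses a fixed, hand-picked bonus rather than a learned one, and the payoff values and the name `best_response` are assumptions chosen for illustration. It shows only that in a prisoner's dilemma defection is the best response to either opponent action, and that a sufficiently large bonus for cooperating reverses this.

```python
C, D = 0, 1  # action indices: cooperate, defect

# Row player's payoff, indexed as PAYOFF[my_action][their_action].
# Illustrative prisoner's dilemma values (assumed, not taken from the paper).
PAYOFF = [
    [3, 0],  # I cooperate: they cooperate / they defect
    [4, 1],  # I defect:    they cooperate / they defect
]


def best_response(their_action: int, bonus_for_cooperating: float = 0.0) -> int:
    """Return the action with the higher payoff once the external bonus is added."""
    cooperate_value = PAYOFF[C][their_action] + bonus_for_cooperating
    defect_value = PAYOFF[D][their_action]
    return C if cooperate_value > defect_value else D


# Without incentives, defecting is the best response to either opponent action.
print([best_response(a) for a in (C, D)])
# With a bonus of 2 for cooperating, cooperating becomes the best response.
print([best_response(a, bonus_for_cooperating=2.0) for a in (C, D)])
```

In the setting the abstract describes, the size of such incentives would be adapted over time by the planning agent rather than fixed in advance.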
4 Ways Machine Learning Protects the Environment - UA Magazine
Monday, the 29th, marked the beginning of the EU Green Week, an event organized by the European Commission's Directorate-General for Environment to discuss environmental policies. This year, the focus is "Green jobs for a greener future." The organizers stressed how traditional specializations will be characterized by additional sets of new skills. Being able to deal with technology is certainly one of them, and many jobs in the environmental sciences are already adopting these innovative tools. People working in this sector are no longer restricted to field work and laboratory analyses.
The Impact of Humanoid Affect Expression on Human Behavior in a Game-Theoretic Setting
Roth, Aaron M., Bhatt, Umang, Amin, Tamara, Doryab, Afsaneh, Fang, Fei, Veloso, Manuela
With the rapid development of robots and other intelligent and autonomous agents, how a human could be influenced by a robot's expressed mood when making decisions becomes a crucial question in human-robot interaction. In this pilot study, we investigate (1) in what way a robot can express a certain mood to influence a human's decision-making behavioral model; (2) how and to what extent the human will be influenced in a game-theoretic setting. More specifically, we create an NLP model to generate sentences that adhere to a specific affective expression profile. We use these sentences for a humanoid robot as it plays a Stackelberg security game against a human. We investigate the behavioral model of the human player.
A Taxonomy and Survey of Intrusion Detection System Design Techniques, Network Threats and Datasets
Hindy, Hanan, Brosset, David, Bayne, Ethan, Seeam, Amar, Tachtatzis, Christos, Atkinson, Robert, Bellekens, Xavier
With the world moving towards being increasingly dependent on computers and automation, one of the main challenges in the current decade has been to build secure applications, systems and networks. Alongside these challenges, the number of threats is rising exponentially due to the attack surface increasing through numerous interfaces offered for each service. To alleviate the impact of these threats, researchers have proposed numerous solutions; however, current tools often fail to adapt to ever-changing architectures, associated threats and 0-days. This manuscript aims to provide researchers with a taxonomy and survey of current dataset composition and current Intrusion Detection Systems (IDS) capabilities and assets. These taxonomies and surveys aim to improve both the efficiency of IDS and the creation of datasets to build the next generation of IDS, as well as to reflect network threats more accurately in future datasets. To this end, this manuscript also provides a taxonomy and survey of network threats and associated tools. The manuscript highlights that current IDS cover only 25% of our threat taxonomy, while current datasets show a clear lack of real-network threat and attack representation and instead include a large number of deprecated threats, hence limiting the accuracy of current machine learning IDS. Moreover, the taxonomies are open-sourced to allow public contributions through a GitHub repository.
Discovering space - Grounding spatial topology and metric regularity in a naive agent's sensorimotor experience
Laflaquiere, Alban, O'Regan, J. Kevin, Gas, Bruno, Terekhov, Alexander
In line with the sensorimotor contingency theory, we investigate the problem of the perception of space from a fundamental sensorimotor perspective. Despite its pervasive nature in our perception of the world, the origin of the concept of space remains largely mysterious. For example, in the context of artificial perception, this issue is usually circumvented by having engineers pre-define the spatial structure of the problem the agent has to face. We show here that the structure of space can be autonomously discovered by a naive agent in the form of sensorimotor regularities that correspond to so-called compensable sensory experiences: these are experiences that can be generated either by the agent or by its environment. By detecting such compensable experiences, the agent can infer the topological and metric structure of the external space in which its body is moving. We propose a theoretical description of the nature of these regularities and illustrate the approach on a simulated robotic arm that is equipped with an eye-like sensor and interacts with an object. Finally, we show how these regularities can be used to build an internal representation of the sensor's external spatial configuration.
Using Social Network Information in Bayesian Truth Discovery
Yang, Jielong, Wang, Junshan, Tay, Wee Peng
We investigate the problem of truth discovery based on opinions from multiple agents who may be unreliable or biased. We consider the case where agents' reliabilities or biases are correlated if they belong to the same community, which defines a group of agents with similar opinions regarding a particular event. An agent can belong to different communities for different events, and these communities are unknown a priori. We incorporate knowledge of the agents' social network in our truth discovery framework and develop Laplace variational inference methods to estimate agents' reliabilities, communities, and the event states. We also develop a stochastic variational inference method to scale our model to large social networks. Simulations and experiments on real data suggest that when observations are sparse, our proposed methods perform better than several other inference methods, including majority voting, the popular Bayesian Classifier Combination (BCC) method, and the Community BCC method.
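For reference, the simplest baseline named above, majority voting, can be sketched in a few lines. This is only the baseline, not the variational methods the abstract proposes. The input layout (a mapping from event to each reporting agent's label), the tie-breaking rule, and the name `majority_vote` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable


def majority_vote(opinions: Dict[str, Dict[str, Hashable]]) -> Dict[str, Hashable]:
    """Estimate each event's state as the label reported by the most agents.

    Agents may skip events (sparse observations). Ties go to the label that
    was reported first, which is an arbitrary choice made for this sketch.
    """
    estimates = {}
    for event, reports in opinions.items():
        if not reports:
            continue  # no observations for this event, so no estimate
        counts = Counter(reports.values())
        estimates[event] = counts.most_common(1)[0][0]
    return estimates


# Illustrative data (assumed): three agents, two events, sparse reports.
opinions = {
    "event_1": {"alice": 1, "bob": 1, "carol": 0},
    "event_2": {"alice": 0},
}
print(majority_vote(opinions))
```

Majority voting weights every agent equally, so it cannot discount unreliable agents or correlated communities; with a single report, as for `event_2` here, it simply trusts that one agent.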
Re-evaluating evaluation
Balduzzi, David, Tuyls, Karl, Perolat, Julien, Graepel, Thore
Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation, since there is no harm (computational cost aside) from including all available tasks and agents.