Agents
Privileged Information Dropout in Reinforcement Learning
Kamienny, Pierre-Alexandre, Arulkumaran, Kai, Behbahani, Feryal, Boehmer, Wendelin, Whiteson, Shimon
Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting the inputs of agents. In this work, we investigate Privileged Information Dropout (PI-Dropout) for achieving the latter; it can be applied equally to value-based and policy-based RL algorithms. Within a simple partially-observed environment, we demonstrate that PI-Dropout outperforms alternatives for leveraging privileged information, including distillation and auxiliary tasks, and can successfully utilise different types of privileged information. Finally, we analyse its effect on the learned representations.
Adapting a Kidney Exchange Algorithm to Align with Human Values
Freedman, Rachel, Borg, Jana Schaich, Sinnott-Armstrong, Walter, Dickerson, John P., Conitzer, Vincent
As AI is deployed increasingly broadly, AI researchers are confronted with the moral implications of their work. The pursuit of simple objectives, such as minimizing error rates, maximizing resource efficiency, or decreasing response times, often results in systems that have unintended consequences when they confront the real world, such as discriminating against certain groups of people [34]. It would be helpful for AI researchers and practitioners to have a general set of principles with which to approach these problems [45, 41, 24, 16, 33]. One may ask why any moral decisions should be left to computers at all. There are multiple possible reasons. One is that the decision needs to be made so quickly that calling in a human for the decision is not feasible, as would be the case for a self-driving car having to make a split-second decision about whom to hit [13]. Another reason could be that each individual decision by itself is too insignificant to bother a human, even though all the decisions combined may be highly significant morally, as would be the case if we considered the moral impact of each advertisement shown online. A third reason is that the moral decision is hard to decouple from a computational problem that apparently exceeds human capabilities. This is the case in many machine learning applications (e.g., should this person be released on bail?
Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning
Ye, Zhenhui, Chen, Yining, Song, Guanghua, Yang, Bowei, Fan, Shen
Exploration of the high-dimensional state-action space is one of the biggest challenges in Reinforcement Learning (RL), especially in the multi-agent domain. We present a novel technique called Experience Augmentation, which enables time-efficient, boosted learning based on fast, fair and thorough exploration of the environment. It can be combined with arbitrary off-policy multi-agent RL (MARL) algorithms and is applicable to both homogeneous and heterogeneous environments. We demonstrate our approach by combining it with MADDPG and verifying its performance in two homogeneous environments and one heterogeneous environment. In the best-performing scenario, MADDPG with experience augmentation reaches the convergence reward of vanilla MADDPG in 1/4 of the wall-clock time, and its converged reward beats the original model by a significant margin. Our ablation studies show that experience augmentation is a crucial ingredient that accelerates the training process and boosts the converged performance.
TAIP: an anytime algorithm for allocating student teams to internship programs
Georgara, Athina, Sierra, Carles, Rodríguez-Aguilar, Juan A.
In scenarios that require teamwork, we usually have at hand a variety of specific tasks, each of which requires forming a team to carry it out. Here we target the problem of matching teams with tasks in the context of education, specifically forming teams of students and allocating them to internship programs. First, we provide a formalization of the Team Allocation for Internship Programs Problem and show the computational hardness of solving it optimally. Then we propose TAIP, a heuristic algorithm that generates an initial team allocation and then attempts to improve it in an iterative process. Moreover, we conduct a systematic evaluation showing that TAIP reaches optimality and outperforms CPLEX in terms of running time.
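The two-phase structure described above (build an initial allocation, then keep improving it) is what makes an algorithm anytime: it can be stopped at any point and still return a valid allocation. The Python sketch below illustrates only that general pattern, not TAIP itself. The affinity matrix that scores each student-to-program fit, the greedy start, the random pairwise-swap improvement step, and the names `value`, `greedy_initial` and `anytime_allocate` are all hypothetical stand-ins for the paper's team-quality model and improvement operators.

```python
import random
import time
from typing import Dict, List, Sequence, Tuple

# Hypothetical input: affinity[s][p] scores how well student s fits program p.
Allocation = Dict[int, List[int]]  # program index -> student indices on its team


def value(alloc: Allocation, affinity: Sequence[Sequence[float]]) -> float:
    """Total affinity of an allocation (a stand-in for a real team-quality measure)."""
    return sum(affinity[s][p] for p, team in alloc.items() for s in team)


def greedy_initial(affinity: Sequence[Sequence[float]], team_size: int,
                   n_programs: int) -> Allocation:
    """Phase 1: place each student in their best program that still has room."""
    alloc: Allocation = {p: [] for p in range(n_programs)}
    # Students with the strongest single preference choose first.
    order = sorted(range(len(affinity)), key=lambda s: -max(affinity[s]))
    for s in order:
        open_programs = [p for p in range(n_programs) if len(alloc[p]) < team_size]
        if not open_programs:
            break  # all teams full; remaining students stay unassigned
        alloc[max(open_programs, key=lambda p: affinity[s][p])].append(s)
    return alloc


def anytime_allocate(affinity: Sequence[Sequence[float]], team_size: int,
                     n_programs: int, budget_s: float = 0.5,
                     max_iters: int = 10_000, seed: int = 0) -> Tuple[Allocation, float]:
    """Phase 2: random pairwise swaps between teams, kept only if they help.

    The current allocation is always valid and never gets worse, so stopping
    at the time budget (or iteration cap) still yields a usable answer.
    """
    rng = random.Random(seed)
    alloc = greedy_initial(affinity, team_size, n_programs)
    total = value(alloc, affinity)
    deadline = time.monotonic() + budget_s
    for _ in range(max_iters):
        if time.monotonic() > deadline:
            break
        p1, p2 = rng.sample(range(n_programs), 2)  # needs n_programs >= 2
        if not alloc[p1] or not alloc[p2]:
            continue
        i, j = rng.randrange(len(alloc[p1])), rng.randrange(len(alloc[p2]))
        s1, s2 = alloc[p1][i], alloc[p2][j]
        # Gain in total affinity if s1 and s2 trade places.
        gain = (affinity[s1][p2] + affinity[s2][p1]) - (affinity[s1][p1] + affinity[s2][p2])
        if gain > 0:
            alloc[p1][i], alloc[p2][j] = s2, s1
            total += gain
    return alloc, total
```

As an example, with 9 students and 3 programs of 3 places each, `anytime_allocate(affinity, 3, 3)` returns a full allocation whose value is never below the greedy starting point, because a swap is kept only when it strictly increases the total.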
AdaSwarm: A Novel PSO optimization Method for the Mathematical Equivalence of Error Gradients
Mohapatra, Rohan, Saha, Snehanshu, Dhavala, Soma S.
This paper tackles the age-old question of derivative-free optimization in neural networks. It introduces AdaSwarm, a novel derivative-free optimizer that performs similarly to or better than Adam but without "gradients". To support AdaSwarm, a novel Particle Swarm Optimization variant, Exponentially weighted Momentum PSO (EM-PSO), is also proposed. EM-PSO is a derivative-free optimizer that tackles constrained and unconstrained single-objective optimization problems; applying it to benchmark test functions, engineering optimization problems and habitability scores for exoplanets shows the speed and convergence of the technique. EM-PSO is then extended by approximating the gradient of a function at any point using the parameters of the particle swarm optimization. This novel technique simulates gradient descent, which is used extremely widely together with the back-propagation algorithm, using the approximated gradients from the particle swarm optimization parameters. Mathematical proofs of the gradient approximation by EM-PSO, which bypasses explicit gradient computation, are presented. AdaSwarm is compared with various optimizers, and the theory and algorithmic performance are supported by promising results.
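As a rough illustration of a momentum-based particle swarm, the sketch below keeps each particle's velocity as an exponentially weighted moving average of the usual cognitive pull (toward its personal best) and social pull (toward the global best). This update rule, the names `em_pso` and `sphere`, and all parameter values are our assumptions for illustration. They are not the paper's exact EM-PSO equations, and the sketch does not reproduce the gradient-approximation step that AdaSwarm builds on top of EM-PSO.

```python
import random
from typing import Callable, List, Tuple


def sphere(x: List[float]) -> float:
    """Simple benchmark: minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


def em_pso(f: Callable[[List[float]], float], dim: int, n_particles: int = 40,
           iters: int = 500, beta: float = 0.9, c1: float = 1.5, c2: float = 1.5,
           bound: float = 5.0, seed: int = 0) -> Tuple[List[float], float, List[float]]:
    """PSO whose velocity is an exponential moving average of the pull terms.

    Assumed update (our illustrative reading, not necessarily the paper's rule):
        v <- beta * v + (1 - beta) * (c1*r1*(pbest - x) + c2*r2*(gbest - x))
        x <- x + v
    Returns (best position, best value, best value after each iteration).
    """
    rng = random.Random(seed)
    xs = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                pull = c1 * r1 * (pbest[i][d] - xs[i][d]) + c2 * r2 * (gbest[d] - xs[i][d])
                # Exponentially weighted momentum: blend old velocity with new pull.
                vs[i][d] = beta * vs[i][d] + (1 - beta) * pull
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history
```

In this sketch, averaging the pull terms with factor `beta` damps each particle's motion: every update moves the velocity only a fraction `1 - beta` of the way toward the new pull. This trades some early exploration speed for smoother, more stable trajectories.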
The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies
Our work fits within a larger context of recent advances in RL. RL has been used to train AIs to win competitive games, such as Go, Dota, and Starcraft. In those settings, the RL objective is inherently adversarial ("beat-the-other-team"). Machine learning has also been used for the design of auction rules. In this work, we instead focus on the opportunity to use AI to promote social welfare through the design of optimal tax policies in dynamic economies. Many studies have shown that high income inequality can negatively impact economic growth and economic opportunity.
Improving Multi-Agent System Coordination Via Intensity Variation
Mathias, David H. (University of Wisconsin - La Crosse) | Wu, Annie S. (University of Central Florida) | Ruetten, Laik (University of Wisconsin - La Crosse) | Coursin, Eric (University of Wisconsin - La Crosse)
In this work, we explore the impact of inter-agent variation in intensity of effort on the ability of a swarm of artificial agents to achieve a goal. Variation in intensity models biological phenomena such as individual differences in size and strength, and increased adeptness at a task due to experience. Focusing on experience, we implement inter-agent variation in intensity with dynamic values that increase and decrease with an agent's activation or non-activation for a task. Examining intensity variation alone and in combination with activation threshold variation, we find that the desynchronizing effect of threshold variation, in concert with the increase in agent efficiency due to experience with a task, dramatically improves the swarm's goal achievement.
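The mechanism described above can be sketched as a single-task response-threshold model. In this model, each agent's intensity (the amount of work it removes per step when active) rises with activation and decays with inactivity. Everything in the sketch is a hypothetical simplification: the single task, the linear demand growth, the `learn` and `forget` rates, the intensity bounds, and the helper names `simulate` and `make_swarm` are illustrative choices, not the authors' experimental setup.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    threshold: float  # activation threshold (fixed; may vary across agents)
    intensity: float  # demand removed per step when active (changes with experience)


def simulate(agents: List[Agent], steps: int = 200, demand_growth: float = 1.0,
             learn: float = 0.05, forget: float = 0.02,
             lo: float = 0.1, hi: float = 1.0) -> List[float]:
    """Single-task response-threshold swarm with experience-dependent intensity.

    Each step the task demand grows; every agent whose threshold is at or below
    the current demand activates and removes `intensity` units of demand.
    Active agents gain intensity (experience), idle agents lose it, within [lo, hi].
    Returns the remaining demand after every step.
    """
    demand, trace = 0.0, []
    for _ in range(steps):
        demand += demand_growth
        active = [demand >= a.threshold for a in agents]  # decided before any work is done
        work = sum(a.intensity for a, on in zip(agents, active) if on)
        demand = max(0.0, demand - work)
        for a, on in zip(agents, active):
            if on:
                a.intensity = min(hi, a.intensity + learn)
            else:
                a.intensity = max(lo, a.intensity - forget)
        trace.append(demand)
    return trace


def make_swarm(n: int, vary_thresholds: bool, seed: int = 0) -> List[Agent]:
    """Identical thresholds (5.0) or thresholds drawn uniformly from [0, 10]."""
    rng = random.Random(seed)
    return [Agent(threshold=rng.uniform(0.0, 10.0) if vary_thresholds else 5.0,
                  intensity=0.5) for _ in range(n)]
```

One way to explore the contrast is to call `simulate(make_swarm(20, vary_thresholds=False))` and `simulate(make_swarm(20, vary_thresholds=True))` and compare the demand traces. With identical thresholds, all agents switch on together; with varied thresholds, they do not. The sketch makes no claim about which setting performs better.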
RALE-ACL — A Language for Information Exchange between Case-Based Agents as Alternative to the FIPA-ACL-Based Communication
Eisenstadt, Viktor (University of Hildesheim) | Althoff, Klaus-Dieter (University of Hildesheim and The German Research Center for Artificial Intelligence)
In this paper, we present RALE-ACL, a communication language for case-based agents in multi-agent systems (MAS) that use case-based reasoning (CBR) as their agents' main means of decision making. RALE-ACL accompanies RALE-CBR, a methodology for constructing CBR-based approaches and systems that adds more flexibility to the classic 4R cycle of case-based reasoning. The main goal of RALE-ACL is to establish a far more CBR-compatible alternative to the KQML- and FIPA-ACL-based languages. These are currently used in many multi-agent systems but are too generic, and therefore cumbersome to use, for the specific structure and purposes of case-based agents. This paper is the final part of the trilogy about the RALE methodology.
Middleware Unifying Framework for Independent Nodes System (MUFFINS)
Okolica, James S. (Air Force Institute of Technology) | Peterson, Gilbert L. (Air Force Institute of Technology) | Mendenhall, Michael J. (Air Force Research Laboratory)
Multi-agent systems are used in domains where individual component autonomy and cooperation are necessary. Overall system performance requires that the diverse agents maintain quality interactions to facilitate cooperation. Inter-agent interaction becomes complicated when agents learn (change their own functionality), when new agents are introduced, or when existing agents are functionally modified. This research focuses on creating a general-use multi-agent system, Middleware Unifying Framework for Independent Nodes System (MUFFINS), and implementing a mechanism, the Megagent, that addresses these interaction challenges. The Megagent enables agents to assess their performance per data source and to improve it with transformations based on feedback. The concept is evaluated on data mangled from the Digits dataset to represent learning agents and new agents, and in all cases the Megagent improves accuracy over a static agent.
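A minimal sketch of the per-source assessment idea follows. It assumes a fixed classifier and a small set of candidate input transformations. Every labelled feedback example is scored under each transformation, accuracy is tracked separately for each data source, and predictions for a source use its best-scoring transformation. The `Megagent` class, its methods, and the sign-flip example are hypothetical illustrations, not the MUFFINS implementation or API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Vector = List[float]
Transform = Callable[[Vector], Vector]


class Megagent:
    """Hypothetical per-source performance tracker that picks an input transformation."""

    def __init__(self, classify: Callable[[Vector], int],
                 transforms: Dict[str, Transform]) -> None:
        self.classify = classify      # the wrapped agent's fixed decision function
        self.transforms = transforms  # candidate fixes; list the identity first
        # stats[source][name] = [correct, total] for that transformation on that source
        self.stats: Dict[str, Dict[str, List[int]]] = defaultdict(
            lambda: {name: [0, 0] for name in transforms})

    def feedback(self, source: str, x: Vector, label: int) -> None:
        """Score every candidate transformation on one labelled example from `source`."""
        for name, t in self.transforms.items():
            record = self.stats[source][name]
            record[0] += int(self.classify(t(x)) == label)
            record[1] += 1

    def accuracy(self, source: str, name: str) -> float:
        correct, total = self.stats[source][name]
        return correct / total if total else 0.0

    def best_transform(self, source: str) -> str:
        # max() keeps the first candidate on ties, so unseen sources use the first entry.
        return max(self.transforms, key=lambda n: self.accuracy(source, n))

    def predict(self, source: str, x: Vector) -> int:
        return self.classify(self.transforms[self.best_transform(source)](x))
```

Ties fall back to the first transformation listed, so a source with no feedback yet is handled by `identity`. The agent therefore behaves like a static agent until evidence accumulates. For example, a source that flips the sign of its features quickly ends up mapped to a `negate` transformation.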
Enhancing Lattice-based Motion Planning with Introspective Learning and Reasoning
Tiger, Mattias, Bergström, David, Norrstig, Andreas, Heintz, Fredrik
Lattice-based motion planning is a hybrid planning method in which a plan made up of discrete actions is simultaneously a physically feasible trajectory. The planning takes both discrete and continuous aspects into account, for example action preconditions and collision-free execution over each action's duration in the configuration space. Safe motion planning relies on well-calibrated safety margins for collision checking, and the trajectory-tracking controller must reliably execute the motions within this safety margin for the execution to be safe. In this work, we are concerned with introspective learning and reasoning about controller performance over time. Normal controller execution of the different actions is learned using reliable and uncertainty-aware machine learning techniques. By correcting for execution bias, we substantially reduce the safety margin of motion actions. Reasoning is used both to verify that the learned models stay safe and to improve collision-checking effectiveness in the motion planner by using more accurate execution predictions with a smaller safety margin. The presented approach allows for explicit awareness of controller performance under normal circumstances and timely detection of incorrect performance in abnormal circumstances. The approach is evaluated in simulation on the nonlinear dynamics of a quadcopter in 3D. Video: https://youtu.be/STmZduvSUMM
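A back-of-the-envelope calculation shows why correcting for execution bias shrinks the safety margin. Purely for illustration, the sketch below assumes that a margin must cover the mean tracking error plus `k` standard deviations. It uses a sample mean and standard deviation in place of the paper's uncertainty-aware learned models, and both the function name `safety_margins` and the default `k = 3` are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def safety_margins(errors: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Compare a bias-blind margin with a bias-corrected one for a single motion action.

    `errors` are observed deviations (here in metres) between the planned and the
    executed trajectory under normal controller execution. Illustrative model:
    - without correction, the margin has to absorb |bias| + k * std;
    - once the learned bias is added to the execution prediction, only k * std remains.
    """
    if len(errors) < 2:
        raise ValueError("need at least two samples to estimate spread")
    bias = statistics.fmean(errors)    # systematic tracking offset
    spread = statistics.stdev(errors)  # sample standard deviation
    return abs(bias) + k * spread, k * spread
```

With errors of 0.30, 0.32, 0.28, 0.31 and 0.29 m, the bias-blind margin is about 0.35 m and the bias-corrected margin is about 0.05 m. The systematic 0.30 m offset is now predicted rather than padded.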