Agents
Mix&Match - Agent Curricula for Reinforcement Learning
Czarnecki, Wojciech Marian, Jayakumar, Siddhant M., Jaderberg, Max, Hasenclever, Leonard, Teh, Yee Whye, Osindero, Simon, Heess, Nicolas, Pascanu, Razvan
We introduce Mix & Match (M&M) - a training framework designed to facilitate rapid and effective learning in RL agents, especially those that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods.
The Expanding Approvals Rule: Improving Proportional Representation and Monotonicity
Proportional representation (PR) is often discussed in voting settings as a major desideratum. For the past century or so, it is common both in practice and in the academic literature to jump to single transferable vote (STV) as the solution for achieving PR. Some of the most prominent electoral reform movements around the globe are pushing for the adoption of STV. It has been termed a major open problem to design a voting rule that satisfies the same PR properties as STV and better monotonicity properties. In this paper, we first present a taxonomy of proportional representation axioms for general weak order preferences, some of which generalise and strengthen previously introduced concepts. We then present a rule called Expanding Approvals Rule (EAR) that satisfies properties stronger than the central PR axiom satisfied by STV, can handle indifferences in a convenient and computationally efficient manner, and also satisfies better candidate monotonicity properties. In view of this, our proposed rule seems to be a compelling solution for achieving proportional representation in voting settings.
ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models
Salem, Ahmed, Zhang, Yang, Humbert, Mathias, Fritz, Mario, Backes, Michael
Machine learning (ML) has become a core component of many real-world applications and training data is a key factor that drives current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack has shown that extraction of information on the training set is possible in such MLaaS settings, which has severe security and privacy implications. However, the early demonstrations of the feasibility of such attacks have many assumptions on the adversary such as using multiple so-called shadow models, knowledge of the target model structure and having a dataset from the same distribution as the target model's training data. We relax all 3 key assumptions, thereby showing that such attacks are very broadly applicable at low cost and thereby pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging and developing threat using eight diverse datasets which show the viability of the proposed attacks across domains. In addition, we propose the first effective defense mechanisms against such broader class of membership inference attacks that maintain a high level of utility of the ML model.
Shallow decision-making analysis in General Video Game Playing
Bravi, Ivan, Liu, Jialin, Perez-Liebana, Diego, Lucas, Simon
The General Video Game AI competitions have been the testing ground for several techniques for game playing, such as evolutionary computation techniques, tree search algorithms, hyper heuristic based or knowledge based algorithms. So far the metrics used to evaluate the performance of agents have been win ratio, game score and length of games. In this paper we provide a wider set of metrics and a comparison method for evaluating and comparing agents. The metrics and the comparison method give shallow introspection into the agent's decision making process and they can be applied to any agent regardless of its algorithmic nature. In this work, the metrics and the comparison method are used to measure the impact of the terms that compose a tree policy of an MCTS based agent, comparing with several baseline agents. The results clearly show how promising such general approach is and how it can be useful to understand the behaviour of an AI agent, in particular, how the comparison with baseline agents can help understanding the shape of the agent decision landscape. The presented metrics and comparison method represent a step toward to more descriptive ways of logging and analysing agent's behaviours.
The Anatomy of a Modular System for Media Content Analysis
Flaounas, Ilias, Lansdall-Welfare, Thomas, Antonakaki, Panagiota, Cristianini, Nello
Intelligent systems for the annotation of media content are increasingly being used for the automation of parts of social science research. In this domain the problem of integrating various Artificial Intelligence (AI) algorithms into a single intelligent system arises spontaneously. As part of our ongoing effort in automating media content analysis for the social sciences, we have built a modular system by combining multiple AI modules into a flexible framework in which they can cooperate in complex tasks. Our system combines data gathering, machine translation, topic classification, extraction and annotation of entities and social networks, as well as many other tasks that have been perfected over the past years of AI research. Over the last few years, it has allowed us to realise a series of scientific studies over a vast range of applications including comparative studies between news outlets and media content in different countries, modelling of user preferences, and monitoring public mood. The framework is flexible and allows the design and implementation of modular agents, where simple modules cooperate in the annotation of a large dataset without central coordination.
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Wai, Hoi-To, Yang, Zhuoran, Wang, Zhaoran, Hong, Mingyi
Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents. Motivated by decentralized applications such as sensor networks, swarm robotics, and power grids, we study policy evaluation in MARL, where agents with jointly observed state-action pairs and private local rewards collaborate to learn the value of a given policy. In this paper, we propose a double averaging scheme, where each agent iteratively performs averaging over both space and time to incorporate neighboring gradient information and local reward information, respectively. We prove that the proposed algorithm converges to the optimal solution at a global geometric rate. In particular, such an algorithm is built upon a primal-dual reformulation of the mean squared projected Bellman error minimization problem, which gives rise to a decentralized convex-concave saddle-point problem. To the best of our knowledge, the proposed double averaging primal-dual optimization algorithm is the first to achieve fast finite-time convergence on decentralized convex-concave saddle-point problems.
The Complexity of Splitting Necklaces and Bisecting Ham Sandwiches
Filos-Ratsikas, Aris, Goldberg, Paul W.
The complexity classes PPA and PPAD were introduced in a seminal paper of Papadimitriou [47] in 1994, in an attempt to classify several natural problems in the class TFNP [45]. TFNP is the class of total search problems in NP for which a solution exists for every instance and the solution can be efficiently verified. Various important problems were subsequently proven to be complete for the class PPAD, such as the complexity of many versions of Nash equilibrium [16, 11, 21, 46, 50, 12] and market equilibrium computation [15, 9, 57, 13, 52]. For details on this, and the significance of PPAcompleteness, see the related discussion in [23], but note the following basic points. As evidence of computational hardness, PPA-completeness is stronger than PPAD-completeness: PPAD PPA. In terms of expressive power, the distinction is that PPA-complete problems can embed a search for a guaranteed fixpoint in a non-oriented topological space, but PPAD problems restrict us to an oriented one. Complete problems for the class PPA seemed to be much more elusive than PPAD-complete ones, especially when one is interested in "natural" problems, where "natural" here has the very specific meaning of problems that do not explicitly contain a circuit in their definition.
Distributed Stochastic Gradient Tracking Methods
In this paper, we study the problem of distributed multi-agent optimization over a network, where each agent possesses a local cost function that is smooth and strongly convex. The global objective is to find a common solution that minimizes the average of all cost functions. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we consider a distributed stochastic gradient tracking method (DSGT) and a gossip-like stochastic gradient tracking method (GSGT). We show that, in expectation, the iterates generated by each agent are attracted to a neighborhood of the optimal solution, where they accumulate exponentially fast (under a constant stepsize choice). Under DSGT, the limiting (expected) error bounds on the distance of the iterates from the optimal solution decrease with the network size $n$, which is a comparable performance to a centralized stochastic gradient algorithm. Moreover, we show that when the network is well-connected, GSGT incurs lower communication cost than DSGT while maintaining a similar computational cost. Numerical example further demonstrates the effectiveness of the proposed methods.
Novel artificial intelligence system may help robots complete household chores
BOSTON: Scientists are developing an artificial intelligence system that can get virtual agents to complete simple household chores in a simulated environment, paving the way for future robots to learn such tasks. The team from University of Toronto in Canada and Massachusetts Institute of Technology (MIT) in the US trained the system called VirtualHome using nearly 3,000 programmes of various activities, which are further broken down into subtasks for the computer to understand. A simple task like "making coffee," for example, would also include the step "grabbing a cup." The researchers demonstrated VirtualHome in a 3D world inspired by Sims - a life-simulation video game. The team's AI agent can execute 1,000 of these interactions in the Sims-style world, with eight different scenes including a living room, kitchen, dining room, bedroom, and home office.