Goto

Collaborating Authors

 Agents


Optimal Differentially Private ADMM for Distributed Machine Learning

arXiv.org Machine Learning

Due to massive amounts of data distributed across multiple locations, distributed machine learning has attracted a lot of research interests. Alternating Direction Method of Multipliers (ADMM) is a powerful method of designing distributed machine learning algorithm, whereby each agent computes over local datasets and exchanges computation results with its neighbor agents in an iterative procedure. There exists significant privacy leakage during this iterative process if the local data is sensitive. In this paper, we propose a differentially private ADMM algorithm (P-ADMM) to provide dynamic zero-concentrated differential privacy (dynamic zCDP), by inserting Gaussian noise with linearly decaying variance. We prove that P-ADMM has the same convergence rate compared to the non-private counterpart, i.e., $\mathcal{O}(1/K)$ with $K$ being the number of iterations and linear convergence for general convex and strongly convex problems while providing differentially private guarantee. Moreover, through our experiments performed on real-world datasets, we empirically show that P-ADMM has the best-known performance among the existing differentially private ADMM based algorithms.


Deep Reinforcement Learning from Policy-Dependent Human Feedback

arXiv.org Machine Learning

To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a critique of an agent's current behavior rather than as an alternative reward signal to be maximized, culminating in the COnvergent Actor-Critic by Humans (COACH) algorithm for making direct policy updates based on human feedback. Our work builds on COACH, moving to a setting where the agent's policy is represented by a deep neural network. We employ a series of modifications on top of the original COACH algorithm that are critical for successfully learning behaviors from high-dimensional observations, while also satisfying the constraint of obtaining reduced sample complexity. We demonstrate the effectiveness of our Deep COACH algorithm in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.


DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization

arXiv.org Machine Learning

Adaptive gradient-based optimization methods such as \textsc{Adagrad}, \textsc{Rmsprop}, and \textsc{Adam} are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in the literature aiming at parallelizing them, based on communications of peripheral nodes with a central node, but incur high communications cost. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (\textsc{Dadam}) for online optimization over a decentralized network that enables data parallelization, as well as decentralized computation. The method is particularly useful, since it can accommodate settings where access to local data is allowed. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms, for certain classes of loss functions used in applications. We analyze the convergence properties of the proposed algorithm and provide a dynamic regret bound on the convergence rate of adaptive moment estimation methods in both stochastic and deterministic settings. Empirical results demonstrate that \textsc{Dadam} works also well in practice and compares favorably to competing online optimization methods.


The StarCraft Multi-Agent Challenge

arXiv.org Machine Learning

In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL learning framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0.


Global Collaboration through Local Interaction in Competitive Learning

arXiv.org Machine Learning

The Self Organizing Map (SOM) is a competitive, unsupervised learningalgorithm capable of creating a low dimensional and discrete representation of high dimensional data. Sinceits initial conception, SOM has found broad application indata analytics, mainly for data clustering, function approximation, and dimensionality reduction (see [5, 8] for examples of applications). A SOM consists of a population of adaptive, interacting agents dubbed units. Each unit is represented in the sample spaceby vector (called weight) and it influences set of other units (neighbors). For each sample, the unit with the most similar weight is found (called the best matching unit - BMU), and its similarity to the sample is increased by altering itsweight. Subsequently, the neighbors of the BMU are also influenced by increasing their similarity to the sample - albeit to a lesser extend. Given enough data, the units' weight may converge to a low dimensional discrete representation of the data - called a feature map. Additionally, the units' weight will be placed meaningfully: neighboring units should contain similar features, sinceneighborhoods move en masse, a property known as topological preservation. It is possible however for this process to go awry; for example, limiting the influence of a unit over its neighbors may compromise the topological preservation[5, 4].


Hierarchical Critics Assignment for Multi-agent Reinforcement Learning

arXiv.org Machine Learning

In this paper, we investigate the use of global information to speed up the learning process and increase the cumulative rewards of multi-agent reinforcement learning (MARL) tasks. Within the actor-critic MARL, we introduce multiple cooperative critics from two levels of the hierarchy and propose a hierarchical critic-based MARL algorithm. In our approach, the agent is allowed to receive information from local and global critics in a competition task. The agent not only receives low-level details but also considers coordination from high levels to obtain global information for increasing operational performance. Here, we define multiple cooperative critics in a top-down hierarchy, called the Hierarchical Critic Assignment (HCA) framework. Our experiment, a two-player tennis competition task performed in the Unity environment, tested the HCA multi-agent framework based on the Asynchronous Advantage Actor-Critic (A3C) with Proximal Policy Optimization (PPO) algorithm. The results showed that the HCA framework outperforms the non-hierarchical critic baseline method on MARL tasks.


Learning Best Response Strategies for Agents in Ad Exchanges

arXiv.org Artificial Intelligence

Ad exchanges are widely used in platforms for online display advertising. Autonomous agents operating in these exchanges must learn policies for interacting profitably with a diverse, continually changing, but unknown market. We consider this problem from the perspective of a publisher, strategically interacting with an advertiser through a posted price mechanism. The learning problem for this agent is made difficult by the fact that information is censored, i.e., the publisher knows if an impression is sold but no other quantitative information. We address this problem using the Harsanyi-Bellman Ad Hoc Coordination (HBA) algorithm, which conceptualises this interaction in terms of a Stochastic Bayesian Game and arrives at optimal actions by best responding with respect to probabilistic beliefs maintained over a candidate set of opponent behaviour profiles. We adapt and apply HBA to the censored information setting of ad exchanges. Also, addressing the case of stochastic opponents, we devise a strategy based on a Kaplan-Meier estimator for opponent modelling. We evaluate the proposed method using simulations wherein we show that HBA-KM achieves substantially better competitive ratio and lower variance of return than baselines, including a Q-learning agent and a UCB-based online learning agent, and comparable to the offline optimal algorithm.


EvalAI: Towards Better Evaluation Systems for AI Agents

arXiv.org Artificial Intelligence

We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence algorithms (AI) at scale. EvalAI is built to provide a scalable solution to the research community to fulfill the critical need of evaluating machine learning models and agents acting in an environment against annotations or with a human-in-the-loop. This will help researchers, students, and data scientists to create, collaborate, and participate in AI challenges organized around the globe. By simplifying and standardizing the process of benchmarking these models, EvalAI seeks to lower the barrier to entry for participating in the global scientific effort to push the frontiers of machine learning and artificial intelligence, thereby increasing the rate of measurable progress in this domain. Our code is available here.


Yvonne Hofstetter Pirate Security Conference

#artificialintelligence

Yvonne Hofstetter (* 1966 in Frankfurt am Main) is a German lawyer, essayist and non-fiction author. She attended a convent school of the Congregation of Jesus and studied law. Since 1999, she has been working internationally in software companies specializing in multi-agent systems for the defense and automated commerce industries, working in the fields of Artificial Intelligence, Industry 4.0 and Big Data.


Cost-Based Goal Recognition in Navigational Domains

Journal of Artificial Intelligence Research

Goal recognition is the problem of determining an agent's intent by observing her behaviour. Contemporary solutions for general task-planning relate the probability of a goal to the cost of reaching it. We adapt this approach to goal recognition in the strict context of path-planning. We show (1) that a simpler formula provides an identical result to current state-of-the-art in less than half the time under all but one set of conditions. Further, we prove (2) that the probability distribution based on this technique is independent of an agent's past behaviour and present a revised formula that achieves goal recognition by reference to the agent's starting point and current location only. Building on this, we demonstrate (3) that a Radius of Maximum Probability (i.e., the distance from a goal within which that goal is guaranteed to be the most probable) can be calculated from relative cost-distances between the candidate goals and a start location, without needing to calculate any actual probabilities. In this extended version of earlier work, we generalise our framework to the continuous domain and discuss our results, including the conditions under which our findings can be generalised back to goal recognition in general task-planning.