Agents
A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks
Uribe, César A., Lee, Soomin, Gasnikov, Alexander, Nedić, Angelia
We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over in a network. We provide complexity bounds for four different cases, namely: each function $f_i$ is strongly convex and smooth, each function is either strongly convex or smooth, and when it is convex but neither strongly convex nor smooth. Our approach is based on the dual of an appropriately formulated primal problem, which includes a graph that models the communication restrictions. We propose distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network. Initially, we focus on functions for which we can explicitly minimize its Legendre-Fenchel conjugate, i.e., admissible or dual friendly functions. Then, we study distributed optimization algorithms for non-dual friendly functions, as well as a method to improve the dependency on the parameters of the functions involved. Numerical analysis of the proposed algorithms is also provided.
Multiagent Evaluation under Incomplete Information
Rowland, Mark, Omidshafiei, Shayegan, Tuyls, Karl, Perolat, Julien, Valko, Michal, Piliouras, Georgios, Munos, Remi
This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other techniques are restricted to zero-sum, two-player settings or are limited by the fact that the Nash equilibrium is intractable to compute. Recently, a ranking method called {\alpha}-Rank, relying on a new graph-based game-theoretic solution concept, was shown to tractably apply to general games. However, evaluations based on Elo or {\alpha}-Rank typically assume noise-free game outcomes, despite the data often being collected from noisy simulations, making this assumption unrealistic in practice. This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We derive sample complexity guarantees required to confidently rank agents in this setting. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker.
Virtual Artificial Intelligent Agents Use Previously Unknown Strategies for Simulated Hide and Seek Game
In their quest to "ensure that artificial general intelligence benefits all of humanity", researchers at OpenAI discovered that their virtual AI agents engaged in a simulated game of "Hide and Seek" had learned six distinct new strategies for playing the game outside of the designed environment. Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported. The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior. This surprising type of learning behavior suggests intrinsic motivation and/or competition with others, although the researchers are still figuring it all out. We've shown that agents can learn sophisticated tool use in a high fidelity physics simulator; however, there were many lessons learned along the way to this result.
Emergent Tool Use from Multi-Agent Interaction
In our environment, agents play a team-based hide-and-seek game. Hiders (blue) are tasked with avoiding line-of-sight from the seekers (red), and seekers are tasked with keeping vision of the hiders. There are objects scattered throughout the environment that hiders and seekers can grab and lock in place, as well as randomly generated immovable rooms and walls that agents must learn to navigate. Before the game begins, hiders are given a preparation phase where seekers are immobilized to give hiders a chance to run away or change their environment. There are no explicit incentives for agents to interact with objects in the environment; the only supervision given is through the hide-and-seek objective.
Regularized Diffusion Adaptation via Conjugate Smoothing
Vlaski, Stefan, Vandenberghe, Lieven, Sayed, Ali H.
--The purpose of this work is to develop and study a distributed strategy for Pareto optimization of an aggregate cost consisting of regularized risks. Each risk is modeled as the expectation of some loss function with unknown probability distribution while the regularizers are assumed deterministic, but are not required to be differentiable or even continuous. The individual, regularized, cost functions are distributed across a strongly-connected network of agents and the Pareto optimal solution is sought by appealing to a multi-agent diffusion strategy. T o this end, the regularizers are smoothed by means of infimal convolution and it is shown that the Pareto solution of the approximate, smooth problem can be made arbitrarily close to the solution of the original, non-smooth problem. Performance bounds are established under conditions that are weaker than assumed before in the literature, and hence applicable to a broader class of adaptation and learning problems. Index T erms --Distributed optimization, diffusion strategy, smoothing, proximal operator, non-smooth regularizer, proximal diffusion, regularized diffusion. The objective of distributed learning is the solution of global, stochastic optimization problems across networks of agents through localized interactions and without information about the statistical properties of the data. Using streaming data, the resulting strategies are adaptive in nature and able to track drifts in the location of the minimizers due to variations in the statistical properties of the data. Regularization is one useful technique to encourage or enforce structural properties on the sought after minimizer, such as sparsity or constraints. A substantial number of regularizers are inherently non-smooth, while many cost functions are differentiable.
AI Hide and Seek: Agents Punched Holes in their Creators' Universe
Bots removed opponents' tools from the game space, and launched themselves into the air… Two teams of AI agents tasked with playing a game (or million) of hide and seek in a virtual environment developed complex strategies and counterstrategies – and exploited holes in their environment that even its creators didn't even know that it had. The game was part of an experiment by OpenAI designed to test the AI skills that emerge from multi-agent competition and standard reinforcement learning algorithms at scale. OpenAI described the outcome in a striking paper published this week. The organisation, now heavily backed by Microsoft, described the outcome as further proof that "skills, far more complex than the seed game dynamics and environment, can emerge" (from such experiments/training exercises). Some of its findings are neatly captured in the video below.
Why Playing Hide-and-Seek Could Lead AI to Humanlike Intelligence
Humans are a species that can adapt to environmental challenges, and over eons this has enabled us to biologically evolve -- an essential characteristic found in animals but absent in AI. Although machine learning has made remarkable progress in complex games such as Go and Dota 2, the skills mastered in these arenas do not necessarily generalize to practical applications in real-world scenarios. The goal for a growing number of researchers is to build a machine intelligence that behaves, learns and evolves more like humans. A new paper from San Francisco-based OpenAI proposes that training models in the children's game of hide-and-seek and pitting them against each other in tens of millions of contests results in the models automatically developing humanlike behaviors that increase their intelligence and improve subsequent performance. Hide-and-seek was selected as a fun starting point mostly due to its simple rules, says the paper's first author, OpenAI Researcher Bowen Baker.
AIs use hide-and-seek to learn to tackle real-world problems
Pitting two artificial intelligences against each other in games such as DeepMind's Go has led to some of the biggest breakthroughs in AI in recent years, as the machines learn skills through trial and error that eventually lead to them beating humans. But can the same technique produce a more useful AI capable of operating in the real word? OpenAI, a San Francisco-based AI research group, published research on Tuesday showing what it claimed was a method for training increasingly powerful smart systems that could prepare them for tackling more ordinary human problems. Set in increasingly realistic environments, the technique points to a way for the AI to "evolve" in a simulated world until it is ready to be used, it said. The researchers used several intelligent "agents" in a game of hide-and-seek played in a simulated physical environment.
Reasoning in Highly Reactive Environments
The aim of my Ph.D. thesis concerns Reasoning in Highly Reactive Environments. As reasoning in highly reactive environments, we identify the setting in which a knowledge-based agent, with given goals, is deployed in an environment subject to repeated, sudden and possibly unknown changes. This is for instance the typical setting in which, e.g., artificial agents for video-games (the so called "bots"), cleaning robots, bomb clearing robots, and so on are deployed. In all these settings one can follow the classical approach in which the operations of the agent are distinguished in "sensing" the environment with proper interface devices, "thinking", and then behaving accordingly using proper actuators. In order to operate in an highly reactive environment, an artificial agent needs to be: 1. Responsive -> The agent must be able to react repeatedly and in a reasonable amount of time; 2. Elastic -> The agent must stay reactive also under varying workload; 3. Resilient -> The agent must stay responsive also in case of internal failure or failure of one of the programmed actions in the environment. Nowadays, thanks to new technologies in the field of Artificial Intelligence, it is already technically possible to create AI agents that are able to operate in reactive environments. Nevertheless, several issues stay unsolved, and are subject of ongoing research.
No-Regret Learning in Unknown Games with Correlated Payoffs
Sessa, Pier Giuseppe, Bogunovic, Ilija, Kamgarpour, Maryam, Krause, Andreas
We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single player online learning algorithms attain strong regret bounds when provided with full information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs). We propose a novel confidence-bound based bandit algorithm GP-MW, which utilizes the GP model for the reward function and runs a multiplicative weight (MW) method. We obtain novel kernel-dependent regret bounds that are comparable to the known bounds in the full information setting, while substantially improving upon the existing bandit results. We experimentally demonstrate the effectiveness of GP-MW in random matrix games, as well as real-world problems of traffic routing and movie recommendation. In our experiments, GP-MW consistently outperforms several baselines, while its performance is often comparable to methods that have access to full information feedback.