Goto

Collaborating Authors

 Agents


New Criteria and a New Algorithm for Learning in Multi-Agent Systems

Neural Information Processing Systems

We propose a new set of criteria for learning algorithms in multi-agent systems, one that is more stringent and (we argue) better justified than previous proposed criteria. Our criteria, which apply most straightforwardly inrepeated games with average rewards, consist of three requirements: (a) against a specified class of opponents (this class is a parameter of the criterion) the algorithm yield a payoff that approaches the payoff of the best response, (b) against other opponents the algorithm's payoff at least approach (and possibly exceed) the security level payoff (or maximin value),and (c) subject to these requirements, the algorithm achieve a close to optimal payoff in self-play. We furthermore require that these average payoffs be achieved quickly. We then present a novel algorithm, and show that it meets these new criteria for a particular parameter class, the class of stationary opponents. Finally, we show that the algorithm is effective not only in theory, but also empirically. Using a recently introduced comprehensive game theoretic test suite, we show that the algorithm almost universally outperforms previous learning algorithms.


Approximately Efficient Online Mechanism Design

Neural Information Processing Systems

Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-basedapproach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility thatagents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement ษ›- efficient policies, and retain truth-revelation as an approximate Bayesian-Nash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse.


Convergence and No-Regret in Multiagent Learning

Neural Information Processing Systems

Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment isno longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent was not learning at all. These challenges are identifiable in the two most common evaluationcriteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normalform games.We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical resultsin a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).


Probabilistic Hybrid Action Models for Predicting Concurrent Percept-driven Robot Behavior

Journal of Artificial Intelligence Research

Most autonomous robots are equipped with restricted, unreliable, and inaccurate sensors and effectors and operate in complex and dynamic environments. A successful approach to deal with the resulting uncertainty is the use of controllers that prescribe the robots' behavior in terms of concurrent reactive plans (CRPs) -- plans that specify how the robots are to react to sensory input in order to accomplish their jobs reliably (e.g., McDermott, 1992a; Beetz, 1999). Reactive plans are successfully used to produce situation specific behavior, to detect problems and recover from them automatically, and to recognize and exploit opportunities (Beetz et al., 2001). These kinds of behaviors are particularly important for autonomous robots that have only uncertain information about the world, act in dynamically changing environments, and are to accomplish complex tasks efficiently. Besides reliability and flexibility, foresight is another important capability of competent autonomous robots (McDermott, 1992a).


The Coevolution of AI and AAAI

AI Magazine

AI and AAAI are coevolving. As AI matures, its focus is shifting from inward-looking to outwardlooking. Some of the new concerns of the field are social awareness, networking, cross-disciplinarity, globalization, and open access. AAAI must reflect and support those concerns. AI is now a mature discipline.


On Self-Regulated Swarms, Societal Memory, Speed and Dynamics

arXiv.org Artificial Intelligence

Wasps, bees, ants and termites all make effective use of their environment and resources by displaying collective "swarm" intelligence. Termite colonies - for instance - build nests with a complexity far beyond the comprehension of the individual termite, while ant colonies dynamically allocate labor to various vital tasks such as foraging or defense without any central decision-making ability. Recent research suggests that microbial life can be even richer: highly social, intricately networked, and teeming with interactions, as found in bacteria. What strikes from these observations is that both ant colonies and bacteria have similar natural mechanisms based on Stigmergy and Self-Organization in order to emerge coherent and sophisticated patterns of global foraging behavior. Keeping in mind the above characteristics we propose a Self-Regulated Swarm (SRS) algorithm which hybridizes the advantageous characteristics of Swarm Intelligence as the emergence of a societal environmental memory or cognitive map via collective pheromone laying in the landscape (properly balancing the exploration/exploitation nature of our dynamic search strategy), with a simple Evolutionary mechanism that trough a direct reproduction procedure linked to local environmental features is able to self-regulate the above exploratory swarm population, speeding it up globally.


Cooperative Information Sharing to Improve Distributed Learning in Multi-Agent Systems

Journal of Artificial Intelligence Research

Effective coordination of agents' actions in partially-observable domains is a major challenge of multi-agent systems research. To address this, many researchers have developed techniques that allow the agents to make decisions based on estimates of the states and actions of other agents that are typically learnt using some form of machine learning algorithm. Nevertheless, many of these approaches fail to provide an actual means by which the necessary information is made available so that the estimates can be learnt. To this end, we argue that cooperative communication of state information between agents is one such mechanism. However, in a dynamically changing environment, the accuracy and timeliness of this communicated information determine the fidelity of the learned estimates and the usefulness of the actions taken based on these. Given this, we propose a novel information-sharing protocol, post-task-completion sharing, for the distribution of state information. We then show, through a formal analysis, the improvement in the quality of estimates produced using our strategy over the widely used protocol of sharing information between nearest neighbours. Moreover, communication heuristics designed around our information-sharing principle are subjected to empirical evaluation along with other benchmark strategies (including Littman's Q-routing and Stone's TPOT-RL) in a simulated call-routing application. These studies, conducted across a range of environmental settings, show that, compared to the different benchmarks used, our strategy generates an improvement of up to 60% in the call connection rate; of more than 1000% in the ability to connect long-distance calls; and incurs as low as 0.25 of the message overhead.


Making Better Recommendations with Online Profiling Agents

AI Magazine

In recent years, we have witnessed the success of autonomous agents applying machine-learning techniques across a wide range of applications. However, agents applying the same machine-learning techniques in online applications have not been so successful. Even agent-based hybrid recommender systems that combine information filtering techniques with collaborative filtering techniques have been applied with considerable success only to simple consumer goods such as movies, books, clothing, and food. Yet complex, adaptive autonomous agent systems that can handle complex goods such as real estate, vacation plans, insurance, mutual funds, and mortgages have emerged.


Synthetic Adversaries for Urban Combat Training

AI Magazine

Six high-level requirements drive the implementation of intelligent synthetic adversaries for training: (1) competence, (2) taskability, (3) observational fidelity, (4) behavior variability, most difficult tasks soldiers perform. Frequent Competence: The adversaries must perform training is an essential element in reducing the tactics and missions humans perform in casualties. For this application, the adversaries' environments is costly and restricted to physical goal is to defend a small multistoried mockups of buildings and small towns. The agents must move Environments (VIRTE) program is developing immersive virtual trainers for military operations through the environment, identify tactically on urbanized terrain (MOUT). In this relevant features (such as escape routes), and trainer, four-person fire teams of U.S. Marines communicate and coordinate with other are situated in a virtual urban environment and agents. Virtual opponents new missions for different training scenarios, are required to populate the environment and and they must change their objectives challenge the trainees. Behavior is not scripted or This article describes the general requirements specific to a particular mission, terrain, or operational for virtual MOUT opponents and our development setting, providing flexibility for operational of synthetic adversaries to meet use.


Making Better Recommendations with Online Profiling Agents

AI Magazine

In recent years, we have witnessed the success of autonomous agents applying machine-learning techniques across a wide range of applications. However, agents applying the same machine-learning techniques in online applications have not been so successful. Even agent-based hybrid recommender systems that combine information filtering techniques with collaborative filtering techniques have been applied with considerable success only to simple consumer goods such as movies, books, clothing, and food. Yet complex, adaptive autonomous agent systems that can handle complex goods such as real estate, vacation plans, insurance, mutual funds, and mortgages have emerged. To a large extent, the reinforcement learning methods developed to aid agents in learning have been more successfully deployed in offline applications. The inherent limitations in these methods have rendered them somewhat ineffective in online applications. In this article, we postulate that a small amount of prior knowledge and human-provided input can dramatically speed up online learning. We demonstrate that our agent HumanE -- with its prior knowledge or "experiences" about the real estate domain -- can effectively assist users in identifying requirements, especially unstated ones, quickly and unobtrusively.