Agents
Reports on the 2004 AAAI Fall Symposia
Cassimatis, Nick, Luke, Sean, Levy, Simon D., Gayler, Ross, Kanerva, Pentti, Eliasmith, Chris, Bickmore, Timothy, Schultz, Alan C., Davis, Randall, Landay, James, Miller, Rob, Saund, Eric, Stahovich, Tom, Littman, Michael, Singh, Satinder, Argamon, Shlomo, Dubnov, Shlomo
The Association for the Advancement of Artificial Intelligence presented its 2004 Fall Symposium Series Friday through Sunday, October 22-24 at the Hyatt Regency Crystal City in Arlington, Virginia, adjacent to Washington, DC. The symposium series was preceded by a one-day AI funding seminar. The topics of the eight symposia in the 2004 Fall Symposia Series were: (1) Achieving Human-Level Intelligence through Integrated Systems and Research; (2) Artificial Multiagent Learning; (3) Compositional Connectionism in Cognitive Science; (4) Dialogue Systems for Health Communications; (5) The Intersection of Cognitive Science and Robotics: From Interfaces to Intelligence; (6) Making Pen-Based Interaction Intelligent and Natural; (7) Real- Life Reinforcement Learning; and (8) Style and Meaning in Language, Art, Music, and Design.
The Sixth International Conference on Enterprise Information Systems (ICEIS 2004
The Sixth International Conference on Enterprise Information Systems (ICEIS) was held in Porto, Portugal; previous venues were in Spain, France, and the United Kingdom. Since its inception in 1999, ICEIS has grown steadily, and is now one of the largest international conferences in the area of information systems. In 2004, more than 600 papers were submitted to the conference and its ten satellite workshops. One of the interesting features of this conference is the high number of invited speakers. In 2004, eighteen keynote speakers were featured at ICEIS and its workshops.
Extremal Behaviour in Multiagent Contract Negotiation
We examine properties of a model of resource allocation in which several agents exchange resources in order to optimise their individual holdings. The schemes discussed relate to well-known negotiation protocols proposed in earlier work and we consider a number of alternative notions of ``rationality'' covering both quantitative measures, e.g. cooperative and individual rationality and more qualitative forms, e.g. Pigou-Dalton transfers. While it is known that imposing particular rationality and structural restrictions may result in some reallocations of the resource set becoming unrealisable, in this paper we address the issue of the number of restricted rational deals that may be required to implement a particular reallocation when it is possible to do so. We construct examples showing that this number may be exponential (in the number of resources m), even when all of the agent utility functions are monotonic. We further show that k agents may achieve in a single deal a reallocation requiring exponentially many rational deals if at most k-1 agents can participate, this same reallocation being unrealisable by any sequences of rational deals in which at most k-2 agents are involved.
Extending Q-Learning to General Adaptive Multi-Agent Systems
Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed "Hyper-Q" Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
Learning Near-Pareto-Optimal Conventions in Polynomial Time
Wang, Xiaofeng, Sandholm, Tuomas
We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different preferences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal algorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions).
All learning is Local: Multi-agent Learning in Global Reward Games
Chang, Yu-han, Ho, Tracey, Kaelbling, Leslie P.
In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent's limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy.
An MDP-Based Approach to Online Mechanism Design
Parkes, David C., Singh, Satinder P.
Online mechanism design (MD) considers the problem of providing incentives to implement desired system-wide outcomes in systems with self-interested agents that arrive and depart dynamically. Agents can choose to misrepresent their arrival and departure times, in addition to information about their value for different outcomes. We consider the problem of maximizing the total longterm value of the system despite the self-interest of agents. The online MD problem induces a Markov Decision Process (MDP), which when solved can be used to implement optimal policies in a truth-revealing Bayesian-Nash equilibrium.
Extending Q-Learning to General Adaptive Multi-Agent Systems
Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed "Hyper-Q" Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
Learning Near-Pareto-Optimal Conventions in Polynomial Time
Wang, Xiaofeng, Sandholm, Tuomas
We study how to learn to play a Pareto-optimal strict Nash equilibrium when there exist multiple equilibria and agents may have different preferences among the equilibria. We focus on repeated coordination games of non-identical interest where agents do not know the game structure up front and receive noisy payoffs. We design efficient near-optimal algorithms for both the perfect monitoring and the imperfect monitoring setting(where the agents only observe their own payoffs and the joint actions).