Industry
Generalized Reaction Functions for Solving Complex-Task Allocation Problems
Zheng, Xiaoming (Facebook, Inc) | Koenig, Sven (University of Southern California)
We study distributed task-allocation problems wherecooperative agents need to perform some tasks simultaneously. Examples are multi-agent routing problems where several agents need to visit some targets simultaneously, for example, to move obstacles out of the way cooperatively. In this paper, we first generalize the concept of reaction functions proposed in the literature to characterize the agent costs of performing multiple complex tasks. Second, we show how agents can construct and approximate reaction functions in a distributed way. Third, we show how reaction functions can be used by an auction-like algorithm to allocate tasks to agents. Finally, we show empirically that the team costs of our algorithms are substantially smaller than those of an existing state-of-the-art allocation algorithm for complex tasks.
Mechanism Design for Double Auctions with Temporal Constraints
Zhao, Dengji (University of Western Sydney and University of Toulouse) | Zhang, Dongmo (University of Western Sydney) | Perrussel, Laurent (University of Toulouse)
This paper examines an extended double auction model where market clearing is restricted by temporal constraints. It is found that the allocation problem in this model can be effectively transformed into a weighted bipartite matching in graph theory. By using the augmentation technique, we propose a Vickrey-Clarke-Groves (VCG) mechanism in this model and demonstrate the advantages of the payment compared with the classical VCG payment (the Clarke pivot payment). We also show that the algorithms for both allocation and payment calculation run in polynomial time. It is expected that the method and results provided in this paper can be applied to the design and analysis of dynamic double auctions and futures markets.
An Efficient Monte-Carlo Algorithm for Pricing Combinatorial Prediction Markets for Tournaments
Xia, Lirong (Duke University) | Pennock, David M. (Yahoo! Research New York)
Computing the market maker price of a security in a combinatorial prediction market is #P-hard. We devise a fully polynomial randomized approximation scheme (FPRAS) that computes the price of any security in disjunctive normal form (DNF) within an ฮต multiplicative error factor in time polynomial in 1ฮต and the size of the input, with high probability and under reasonable assumptions. Our algorithm is a Monte-Carlo technique based on importance sampling. The algorithm can also approximately price securities represented in conjunctive normal form (CNF) with additive error bounds. To illustrate the applicability of our algorithm, we show that many securities in Yahoo!'s popular combinatorial prediction market game called Predictalot can be represented by DNF formulas of polynomial size.
Online Planning for Ad Hoc Autonomous Agent Teams
Wu, Feng (University of Science and Technology of China) | Zilberstein, Shlomo (University of Massachusetts Amherst) | Chen, Xiaoping (University of Science and Technology of China)
We propose a novel online planning algorithm for ad hoc team settings โ challenging situations in which an agent must collaborate with unknown teammates without prior coordination. Our approach is based on constructing and solving a series of stage games, and then using biased adaptive play to choose actions. The utility function in each stage game is estimated via Monte-Carlo tree search using the UCT algorithm. We establish analytically the convergence of the algorithm and show that it performs well in a variety of ad hoc team domains.
Reasoning About Preferences in Intelligent Agent Systems
Visser, Simeon (Utrecht University) | Thangarajah, John (RMIT University) | Harland, James (RMIT University)
Note that this extra to make decisions about which plans are used to information is included as a preference rather than a goal, achieve their goals. Usually the choice of which as it is acceptable to satisfy the goal without satisfying the plan to use to achieve a particular goal is left up preference. For example, if the user prefers to fly on Dodgy to the system to determine. In this paper we show Airlines, but no such flights are available, then specifying this how preferences, which can be set by the user of the as a preference means that the user can still have a holiday; system, can be incorporated into the BDI execution specifying this as a goal would mean that the user refuses to process and used to guide the choices made.
Social Instruments for Robust Convention Emergence
Villatoro, Daniel (Artificial Intelligence Research Institute (IIIA-CSIC)) | Sabater-Mir, Jordi (Artificial Intelligence Research Institute (IIIA-CSIC)) | Sen, Sandip (University of Tulsa)
We present the notion of Social Instruments as mechanisms that facilitate the emergence of conventions from repeated interactions between members of a society. Specifically, we focus on two social instruments: rewiring and observation. Our main goal is to provide agents with tools that allow them to leverage their social network of interactions when effectively addressing coordination and learning problems, paying special attention to dissolving meta-stable subconventions. Initial experiments throw some light on how Self-Reinforcing Substructures (SRS) in the network prevent full convergence, resulting in reduced convergence rates. The use of an effective composed social instrument (observation + rewiring) allow agents to eliminate the subconventions that otherwise remained meta-stable.
Facing Openness with Socio Cognitive Trust and Categories
Venanzi, Matteo (University of Southampton) | Piunti, Michele (ISTC-CNR, Rome) | Falcone, Rino (ISTC-CNR, Rome) | Castelfranchi, Cristiano (ISTC-CNR, Rome)
Typical solutions for agents assessing trust relies on the circulation of information on the individual level, i.e. reputational images, subjective experi- ences, statistical analysis, etc. This work presents an alternative approach, inspired to the cognitive heuristics enabling humans to reason at a categorial level. The approach is envisaged as a crucial ability for agents in order to: (1) estimate trustworthiness of unknown trustees based on an ascribed mem- bership to categories; (2) learn a series of emer- gent relations between trustees observable proper- ties and their effective abilities to fulfill tasks in sit- uated conditions. On such a basis, categorization is provided to recognize signs (Manifesta) through which hidden capabilities (Kripta) can be inferred. Learning is provided to refine reasoning attitudes needed to ascribe tasks to categories. A series of ar- chitectures combining categorization abilities, indi- vidual experiences and context awareness are eval- uated and compared in simulated experiments.
Emergence and Stability of Social Conventions in Conflict Situations
Sugawara, Toshiharu (Waseda Univesity)
We investigate the emergence and stability of social conventions for efficiently resolving conflicts through reinforcement learning. Facilitation of coordination and conflict resolution is an important issue in multi-agent systems. However, exhibiting coordinated and negotiation activities is computationally expensive. In this paper, we first describe a conflict situation using a Markov game which is iterated if the agents fail to resolve their conflicts, where the repeated failures result in an inefficient society. Using this game, we show that social conventions for resolving conflicts emerge, but their stability and social efficiency depend on the payoff matrices that characterize the agents. We also examine how unbalanced populations and small heterogeneous agents affect efficiency and stability of the resulting conventions. Our results show that (a) a type of indecisive agent that is generous for adverse results leads to unstable societies, and (b) selfish agents that have an explicit order of benefits make societies stable and efficient.
An Empirical Study of Seeding Manipulations and Their Prevention
Russell, Tyrel (University of Waterloo) | Beek, Peter van (University of Waterloo)
It is well known that cheating occurs in sports. In cup competitions, a common type of sports competition, one method of cheating is in manipulating the seeding to unfairly advantage a particular team. Previous empirical and theoretical studies of seeding manipulation have focused on competitions with unrestricted seeding. However, real cup competitions often place restrictions on seedings to ensure fairness, wide geographic interest, and so on. In this paper, we perform an extensive empirical study of seeding manipulation under comprehensive and realistic sets of restrictions. A generalized random model of competition problems is proposed. This model creates a realistic range of problem instances that are used to identify the sets of seeding restrictions that are hard to manipulate in practice. We end with a discussion of the implications of this work and recommendations for organizing competitions so as to prevent or reduce the opportunities for manipulating the seeding.
On Combining Decisions from Multiple Expert Imitators for Performance
Rubin, Jonathan (University of Auckland) | Watson, Ian (University of Auckland)
One approach for artificially intelligent agents wishing to maximise some performance metric in a given domain is to learn from a collection of training data that consists of actions or decisions made by some expert, in an attempt to imitate that expert's style. We refer to this type of agent as an expert imitator. In this paper we investigate whether performance can be improved by combining decisions from multiple expert imitators. In particular, we investigate two existing approaches for combining decisions. The first approach combines decisions by employing ensemble voting between multiple expert imitators. The second approach dynamically selects the best imitator to use at runtime given the performance of the imitators in the current environment. We investigate these approaches in the domain of computer poker. In particular, we create expert imitators for limit and no limit Texas Hold'em and determine whether their performance can be improved by combining their decisions using the two approaches listed above.