Agent Societies
DCOPs Meet the Real World: Exploring Unknown Reward Matrices with Applications to Mobile Sensor Networks
Jain, Manish (University of Southern California) | Taylor, Matthew (University of Southern California) | Tambe, Milind (University of Southern California) | Yokoo, Makoto (Kyushu University)
Buoyed by recent successes in the area of distributed constraint optimization problems (DCOPs), this paper addresses challenges faced when applying DCOPs to real-world domains. Three fundamental challenges must be addressed for a class of real-world domains, requiring novel DCOP algorithms. First, agents may not know the payoff matrix and must explore the environment to determine rewards associated with variable settings. Second, agents may need to maximize total accumulated reward rather than instantaneous final reward. Third, limited time horizons disallow exhaustive exploration of the environment. We propose and implement a set of novel algorithms that combine decision-theoretic exploration approaches with DCOP-mandated coordination. In addition to simulation results, we implement these algorithms on robots, deploying DCOPs on a distributed mobile sensor network.
Collaborative Multi Agent Physical Search with Probabilistic Knowledge
Hazon, Noam (Bar Ilan University) | Aumann, Yonatan (Bar Ilan University) | Kraus, Sarit (Bar Ilan University)
This paper considers the setting wherein a group of agents (e.g., robots) is seeking to obtain a given tangible good, potentially available at different locations in a physical environment. Traveling between locations, as well as acquiring the good at any given location consumes from the resources available to the agents (e.g., battery charge). The availability of the good at any given location, as well as the exact cost of acquiring the good at the location is not fully known in advance, and observed only upon physically arriving at the location. However, a-priori probabilities on the availability and potential cost are provided. Given such as setting, the problem is to find a strategy/plan that maximizes the probability of acquiring the good while minimizing resource consumption. Sample applications include agents in exploration and patrol missions, e.g., rovers on Mars seeking to mine a specific mineral. Although this model captures many real world scenarios, it has not been investigated so far. We focus on the case where locations are aligned along a path, and study several variants of the problem, analyzing the effects of communication and coordination. For the case that agents can communicate, we present a polynomial algorithm that works for any fixed number of agents. For non-communicating agents, we present a polynomial algorithm that is suitable for any number of agents. Finally, we analyze the difference between homogeneous and heterogeneous agents, both with respect to their allotted resources and with respect to their capabilities.
Planning Games
Brafman, Ronen I. (Ben-Gurion University) | Domshlak, Carmel (Technion - Israel Institute of Technology) | Engel, Yagil (Technion - Israel Institute of Technology) | Tennenholtz, Moshe (Microsoft Israel R&D Center)
We introduce planning games, a study of interactions of self-motivated agents in automated planning settings. Planning games extend STRIPS-like models of single-agent planning to systems of multiple self-interested agents, providing a rich class of structured games that capture subtle forms of local interactions. We consider two basic models of planning games and adapt game-theoretic solution concepts to these models. In both models, agents may need to cooperate in order to achieve their goals, but are assumed to do so only in order to increase their net benefit. For each model we study the computational problem of finding a stable solution and provide efficient algorithms for systems exhibiting acyclic interaction structure.
Trust-Based Mechanisms for Robust and Efficient Task Allocation in the Presence of Execution Uncertainty
Ramchurn, S. D., Mezzetti, C., Giovannucci, A., Rodriguez-Aguilar, J. A., Dash, R. K., Jennings, N. R.
Vickrey-Clarke-Groves (VCG) mechanisms are often used to allocate tasks to selfish and rational agents. VCG mechanisms are incentive compatible, direct mechanisms that are efficient (i.e., maximise social utility) and individually rational (i.e., agents prefer to join rather than opt out). However, an important assumption of these mechanisms is that the agents will "always" successfully complete their allocated tasks. Clearly, this assumption is unrealistic in many real-world applications, where agents can, and often do, fail in their endeavours. Moreover, whether an agent is deemed to have failed may be perceived differently by different agents. Such subjective perceptions about an agent's probability of succeeding at a given task are often captured and reasoned about using the notion of "trust". Given this background, in this paper we investigate the design of novel mechanisms that take into account the trust between agents when allocating tasks. Specifically, we develop a new class of mechanisms, called "trust-based mechanisms", that can take into account multiple subjective measures of the probability of an agent succeeding at a given task and produce allocations that maximise social utility, whilst ensuring that no agent obtains a negative utility. We then show that such mechanisms pose a challenging new combinatorial optimisation problem (that is NP-complete), devise a novel representation for solving the problem, and develop an effective integer programming solution (that can solve instances with about 2x10^5 possible allocations in 40 seconds).
Dynamic Programming Approximations for Partially Observable Stochastic Games
Kumar, Akshat (University of Massachusetts Amherst) | Zilberstein, Shlomo (University of Massachusetts Amherst)
Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes with a price, namely computation cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques which enable us to scale POSG algorithms by several orders of magnitude. We study both the general POSGs and its cooperative counterpart DEC-POMDPs. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies.
An Anytime Algorithm for Optimal Coalition Structure Generation
Rahwan, T., Ramchurn, S. D., Jennings, N. R., Giovannucci, A.
Coalition formation is a fundamental type of interaction that involves the creation of coherent groupings of distinct, autonomous, agents in order to efficiently achieve their individual or collective goals. Forming effective coalitions is a major research challenge in the field of multi-agent systems. Central to this endeavour is the problem of determining which of the many possible coalitions to form in order to achieve some goal. This usually requires calculating a value for every possible coalition, known as the coalition value, which indicates how beneficial that coalition would be if it was formed. Once these values are calculated, the agents usually need to find a combination of coalitions, in which every agent belongs to exactly one coalition, and by which the overall outcome of the system is maximized. However, this coalition structure generation problem is extremely challenging due to the number of possible solutions that need to be examined, which grows exponentially with the number of agents involved. To date, therefore, many algorithms have been proposed to solve this problem using different techniques ranging from dynamic programming, to integer programming, to stochastic search all of which suffer from major limitations relating to execution time, solution quality, and memory requirements. With this in mind, we develop an anytime algorithm to solve the coalition structure generation problem. Specifically, the algorithm uses a novel representation of the search space, which partitions the space of possible solutions into sub-spaces such that it is possible to compute upper and lower bounds on the values of the best coalition structures in them. These bounds are then used to identify the sub-spaces that have no potential of containing the optimal solution so that they can be pruned. The algorithm, then, searches through the remaining sub-spaces very efficiently using a branch-and-bound technique to avoid examining all the solutions within the searched subspace(s). In this setting, we prove that our algorithm enumerates all coalition structures efficiently by avoiding redundant and invalid solutions automatically. Moreover, in order to effectively test our algorithm we develop a new type of input distribution which allows us to generate more reliable benchmarks compared to the input distributions previously used in the field. Given this new distribution, we show that for 27 agents our algorithm is able to find solutions that are optimal in 0.175% of the time required by the fastest available algorithm in the literature. The algorithm is anytime, and if interrupted before it would have normally terminated, it can still provide a solution that is guaranteed to be within a bound from the optimal one. Moreover, the guarantees we provide on the quality of the solution are significantly better than those provided by the previous state of the art algorithms designed for this purpose. For example, for the worst case distribution given 25 agents, our algorithm is able to find a 90% efficient solution in around 10% of time it takes to find the optimal solution.
Asynchronous Forward Bounding for Distributed COPs
Gershman, A., Meisels, A., Zivan, R.
A new search algorithm for solving distributed constraint optimization problems (DisCOPs) is presented. Agents assign variables sequentially and compute bounds on partial assignments asynchronously. The asynchronous bounds computation is based on the propagation of partial assignments. The asynchronous forward-bounding algorithm (AFB) is a distributed optimization search algorithm that keeps one consistent partial assignment at all times. The algorithm is described in detail and its correctness proven. Experimental evaluation shows that AFB outperforms synchronous branch and bound by many orders of magnitude, and produces a phase transition as the tightness of the problem increases. This is an analogous effect to the phase transition that has been observed when local consistency maintenance is applied to MaxCSPs. The AFB algorithm is further enhanced by the addition of a backjumping mechanism, resulting in the AFB-BJ algorithm. Distributed backjumping is based on accumulated information on bounds of all values and on processing concurrently a queue of candidate goals for the next move back. The AFB-BJ algorithm is compared experimentally to other DisCOP algorithms (ADOPT, DPOP, OptAPO) and is shown to be a very efficient algorithm for DisCOPs.
Resource Adaptive Agents in Interactive Theorem Proving
Benzmueller, Christoph, Sorge, Volker
We introduce a resource adaptive agent mechanism which supports the user in interactive theorem proving. The mechanism uses a two layered architecture of agent societies to suggest appropriate commands together with possible command argument instantiations. Experiments with this approach show that its effectiveness can be further improved by introducing a resource concept. In this paper we provide an abstract view on the overall mechanism, motivate the necessity of an appropriate resource concept and discuss its realization within the agent architecture.
On the Value of Correlation
Ashlagi, I., Monderer, D., Tennenholtz, M.
Correlated equilibrium generalizes Nash equilibrium to allow correlation devices. Correlated equilibrium captures the idea that in many systems there exists a trusted administrator who can recommend behavior to a set of agents, but can not enforce such behavior. This makes this solution concept most appropriate to the study of multi-agent systems in AI. Aumann showed an example of a game, and of a correlated equilibrium in this game in which the agents' welfare (expected sum of players' utilities) is greater than their welfare in all mixed-strategy equilibria. Following the idea initiated by the price of anarchy literature this suggests the study of two major measures for the value of correlation in a game with nonnegative payoffs: 1. The ratio between the maximal welfare obtained in a correlated equilibrium to the maximal welfare obtained in a mixed-strategy equilibrium. We refer to this ratio as the mediation value. 2. The ratio between the maximal welfare to the maximal welfare obtained in a correlated equilibrium. We refer to this ratio as the enforcement value. In this work we initiate the study of the mediation and enforcement values, providing several general results on the value of correlation as captured by these concepts. We also present a set of results for the more specialized case of congestion games, a class of games that received a lot of attention in the recent literature.
Learning to Reach Agreement in a Continuous Ultimatum Game
de Jong, S., Uyttendaele, S., Tuyls, K.
It is well-known that acting in an individually rational manner, according to the principles of classical game theory, may lead to sub-optimal solutions in a class of problems named social dilemmas. In contrast, humans generally do not have much difficulty with social dilemmas, as they are able to balance personal benefit and group benefit. As agents in multi-agent systems are regularly confronted with social dilemmas, for instance in tasks such as resource allocation, these agents may benefit from the inclusion of mechanisms thought to facilitate human fairness. Although many of such mechanisms have already been implemented in a multi-agent systems context, their application is usually limited to rather abstract social dilemmas with a discrete set of available strategies (usually two). Given that many real-world examples of social dilemmas are actually continuous in nature, we extend this previous work to more general dilemmas, in which agents operate in a continuous strategy space. The social dilemma under study here is the well-known Ultimatum Game, in which an optimal solution is achieved if agents agree on a common strategy. We investigate whether a scale-free interaction network facilitates agents to reach agreement, especially in the presence of fixed-strategy agents that represent a desired (e.g. human) outcome. Moreover, we study the influence of rewiring in the interaction network. The agents are equipped with continuous-action learning automata and play a large number of random pairwise games in order to establish a common strategy. From our experiments, we may conclude that results obtained in discrete-strategy games can be generalized to continuous-strategy games to a certain extent: a scale-free interaction network structure allows agents to achieve agreement on a common strategy, and rewiring in the interaction network greatly enhances the agents' ability to reach agreement. However, it also becomes clear that some alternative mechanisms, such as reputation and volunteering, have many subtleties involved and do not have convincing beneficial effects in the continuous case.