Agents
Decision Support for Agent Populations in Uncertain and Congested Environments
Varakantham, Pradeep (Singapore Management University) | Cheng, Shih-Fen (Singapore Management University) | Gordon, Geoff (Carnegie Mellon University) | Ahmed, Asrar (Singapore Management University)
This research is motivated by large-scale problems in urban transportation and labor mobility where there is congestion for resources and uncertainty in movement. In such domains, even though the individual agents do not have an identity of their own and do not explicitly interact with other agents, they affect other agents. While there has been much research in handling such implicit effects, it has primarily assumed deterministic movements of agents. We address the issue of decision support for individual agents that are identical and have involuntary movements in dynamic environments. For instance, in a taxi fleet serving a city, when a taxi is hired by a customer, its movements are uncontrolled and depend on (a) the customer's requirement; and (b) the location of other taxis in the fleet. Towards addressing decision support in such problems, we make two key contributions: (a) A framework to represent the decision problem for selfish individuals in a dynamic population, where there is transitional uncertainty (involuntary movements); and (b) Two techniques (Fictitious Play for Symmetric Agent Populations, FP-SAP, and Softmax based Flow Update, SMFU) that converge to equilibrium solutions. We show that our techniques (apart from providing equilibrium strategies) outperform “driver” strategies with respect to overall availability of taxis and the revenue obtained by the taxi drivers. We demonstrate this on a real world data set with 8,000 taxis and 83 zones (representing the entire area of Singapore).
Optimal Auctions for Spiteful Bidders
Tang, Pingzhong (Carnegie Mellon University) | Sandholm, Tuomas (Carnegie Mellon University)
Designing revenue-optimal auctions for various settings is perhaps the most important, yet sometimes most elusive, problem in mechanism design. Spiteful bidders have been intensely studied recently, especially because spite occurs in many applications in multiagent systems and electronic commerce. We derive the optimal auction for such bidders (as well as bidders that are altruistic). It is a generalization of Myerson’s (1981) auction. It chooses an allocation that maximizes agents’ virtual valuations, but for a generalized definition of virtual valuation. The payment rule is less intuitive. For one, it takes each bidder’s own report into consideration when determining his payment. Moreover, bidders pay even if the seller keeps the item; a similar phenomenon has been shown in other settings with negative externalities (Jehiel, Moldovanu, and Stacchetti 1996; Deng and Pekec 2011). On the other hand, a novel aspect of our auction is that it sometimes subsidizes losers when the item is sold to some other bidder. We also derive a revenue equivalence theorem for this setting. Using it, we generate a short proof of (a slight generalization of) the previously known result that, in two-bidder settings with independently uniformly drawn valuations, second-price auctions yield greater expected revenue than first-price auctions. Finally, we present a template for comparing the expected revenues of any two auction mechanisms that have the same allocation rule (for the valuation distributions at hand).
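For orientation, the classical (non-spiteful) Myerson allocation rule that this abstract generalizes can be sketched as follows. This is only the textbook baseline under an assumed Uniform[0, high] valuation distribution; it is not the generalized virtual valuation or the payment rule derived in the paper, and the function names are illustrative.

```python
def virtual_value_uniform(v, high=1.0):
    """Classical Myerson virtual valuation v - (1 - F(v)) / f(v).

    Assumption: valuations are drawn from Uniform[0, high], so
    F(v) = v / high and f(v) = 1 / high.
    """
    cdf = v / high
    pdf = 1.0 / high
    return v - (1.0 - cdf) / pdf


def myerson_winner(bids, high=1.0):
    """Return the index of the bidder with the highest virtual valuation,
    or None when that valuation is negative (the seller keeps the item)."""
    if not bids:
        return None
    virtuals = [virtual_value_uniform(b, high) for b in bids]
    best = max(range(len(bids)), key=lambda i: virtuals[i])
    return best if virtuals[best] >= 0 else None
```

With Uniform[0, 1] valuations the virtual valuation is 2v - 1, so this baseline behaves like a second-price auction with a reserve of 0.5: a bid profile such as [0.3, 0.4] leaves the item unsold.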
Negotiation in Exploration-Based Environment
Sofer, Israel (Bar Ilan University) | Sarne, David (Bar Ilan University) | Hassidim, Avinatan (Bar Ilan University)
This paper studies repetitive negotiation over the execution of an exploration process between two self-interested, fully rational agents in a full information environment with side payments. A key aspect of the protocol is that the exploration’s execution may interleave with the negotiation itself, inflicting some degradation on the exploration’s flexibility. The advantage of this form of negotiation is that it enables the agents to supervise that the exploration’s execution takes place in its agreed form as negotiated. We show that in many cases, much of the computational complexity of the new protocol can be eliminated by solving an alternative negotiation scheme according to which the parties first negotiate the exploration terms as a whole and then execute it. As demonstrated in the paper, the solution characteristics of the new protocol are somewhat different from those of legacy negotiation protocols where the execution of the agreement reached through the negotiation is completely separated from the negotiation process. Furthermore, if the agents are given the option to control some of the negotiation protocol parameters, the resulting exploration may be suboptimal. In particular, we show that the increase in an agent’s expected utility in such cases is unbounded and so is the resulting decrease in the social welfare. Surprisingly, we show that further increasing one of the agents’ level of control in some of the negotiation parameters enables bounding the resulting decrease in the social welfare.
A Hybrid Algorithm for Coalition Structure Generation
Rahwan, Talal (University of Southampton) | Michalak, Tomasz (University of Warsaw) | Jennings, Nicholas (University of Southampton)
The current state-of-the-art algorithm for optimal coalition structure generation is IDP-IP — an algorithm that combines IDP (a dynamic programming algorithm due to Rahwan and Jennings, AAAI'08) with IP (a tree-search algorithm due to Rahwan et al., JAIR'09). In this paper we analyse IDP-IP, highlight its limitations, and then develop a new approach for combining IDP with IP that overcomes these limitations.
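As background for the dynamic programming side of this abstract, the plain textbook recurrence for coalition structure generation can be sketched as below. This is neither IDP nor IDP-IP, only the basic recurrence f(C) = max(v(C), max over splits of f(C') + f(C \ C')), with coalitions encoded as bitmasks. The function name and the `value` callback are assumptions made for illustration.

```python
def optimal_coalition_structure(n, value):
    """Basic DP over all coalitions of n agents (n >= 1).

    `value(mask)` is an assumed callback returning the worth of the
    coalition whose members are the set bits of `mask`.
    Returns (best total value, list of coalition bitmasks).
    """
    full = (1 << n) - 1
    best = [0] * (1 << n)
    split = [0] * (1 << n)  # 0 means "keep this coalition whole"
    for mask in range(1, full + 1):
        best[mask] = value(mask)
        low = mask & -mask  # lowest member, used to skip mirrored splits
        sub = (mask - 1) & mask  # largest proper submask
        while sub:
            if sub & low:
                cand = best[sub] + best[mask ^ sub]
                if cand > best[mask]:
                    best[mask] = cand
                    split[mask] = sub
            sub = (sub - 1) & mask

    def rebuild(mask):
        # Follow the recorded splits down to unsplit coalitions.
        if split[mask] == 0:
            return [mask]
        return rebuild(split[mask]) + rebuild(mask ^ split[mask])

    return best[full], rebuild(full)
```

The recurrence evaluates every split of every coalition, roughly 3^n operations in total, which is the kind of cost that motivates the more selective approaches discussed in the abstract.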
A Scalable Message-Passing Algorithm for Supply Chain Formation
Penya-Alba, Toni (Instituto de Investigación en Inteligencia Artificial (IIIA) Consejo Superior de Investigaciones Cientificas (CSIC)) | Vinyals, Meritxell (University of Verona) | Cerquides, Jesus (Instituto de Investigación en Inteligencia Artificial (IIIA) Consejo Superior de Investigaciones Cientificas (CSIC)) | Rodriguez-Aguilar, Juan A. (Instituto de Investigación en Inteligencia Artificial (IIIA) Consejo Superior de Investigaciones Cientificas (CSIC))
Supply Chain Formation (SCF) is the process of determining the participants in a supply chain, who will exchange what with whom, and the terms of the exchanges. Decentralized SCF is a highly intricate task because agents only possess local information and have limited knowledge about the capabilities of other agents. The decentralized SCF problem has been recently cast as an optimization problem that can be efficiently approximated using max-sum loopy belief propagation. Along this direction, in this paper we propose a novel encoding of the problem into a binary factor graph (containing only binary variables) as well as an alternative algorithm. We empirically show that our approach significantly increases scalability, making it possible to form supply chains in market scenarios with a large number of participants and high competition.
Tree-Based Solution Methods for Multiagent POMDPs with Delayed Communication
Oliehoek, Frans Adriaan (Maastricht University) | Spaan, Matthijs T. J. (Delft University of Technology)
Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. Such an assumption is a strict generalization over having agents wait until the global information is available and is more appropriate for applications in which response time is critical. In this setting, however, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we overcome this problem by demonstrating that computation of the MPOMDP-DC backup can be structured as a tree and introducing two novel tree-based pruning techniques that exploit this structure in an effective way. We experimentally show that these methods have the potential to outperform naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.
Bayes-Adaptive Interactive POMDPs
Ng, Brenda (Lawrence Livermore National Laboratory) | Boakye, Kofi (Lawrence Livermore National Laboratory) | Meyers, Carol (Lawrence Livermore National Laboratory) | Wang, Andrew (Massachusetts Institute of Technology)
We introduce the Bayes-Adaptive Interactive Partially Observable Markov Decision Process (BA-IPOMDP), the first multiagent decision model that explicitly incorporates model learning. As in I-POMDPs, the BA-IPOMDP agent maintains beliefs over interactive states, which include the physical states as well as the other agents’ models. The BA-IPOMDP assumes that the state transition and observation probabilities are unknown, and augments the interactive states to include these parameters. Beliefs are maintained over this augmented interactive state space. This (necessary) state expansion exacerbates the curse of dimensionality, especially since each I-POMDP belief update is already a recursive procedure (because an agent invokes belief updates from other agents’ perspectives as part of its own belief update, in order to anticipate other agents’ actions). We extend the interactive particle filter to perform approximate belief update on BA-IPOMDPs. We present our findings on the multiagent Tiger problem.
Congestion Games with Agent Failures
Meir, Reshef (Hebrew University and Microsoft Research, Herzlia) | Tennenholtz, Moshe (Technion-Israel Institute of Technology and Microsoft Research, Herzlia) | Bachrach, Yoram (Microsoft Research, Cambridge) | Key, Peter (Microsoft Research, Cambridge)
We propose a natural model for agent failures in congestion games. In our model, each of the agents may fail to participate in the game, introducing uncertainty regarding the set of active agents. We examine how such uncertainty may change the Nash equilibria (NE) of the game. We prove that although the perturbed game induced by the failure model is not always a congestion game, it still admits at least one pure Nash equilibrium. Then, we turn to examine the effect of failures on the maximal social cost in any NE of the perturbed game. We show that in the limit case where failure probability is negligible new equilibria never emerge, and that the social cost may decrease but it never increases. For the case of non-negligible failure probabilities, we provide a full characterization of the maximal impact of failures on the social cost under worst-case equilibrium outcomes.
Characterizing Multi-Agent Team Behavior from Partial Team Tracings: Evidence from the English Premier League
Lucey, Patrick (Disney Research Pittsburgh) | Bialkowski, Alina (Queensland University of Technology and Disney Research Pittsburgh) | Carr, Peter (Disney Research Pittsburgh) | Foote, Eric (Disney Research Pittsburgh) | Matthews, Iain (Disney Research Pittsburgh)
Real-world AI systems have been recently deployed which can automatically analyze the plan and tactics of tennis players. As the game-state is updated regularly at short intervals (i.e. point-level), a library of successful and unsuccessful plans of a player can be learnt over time. Given the relative strengths and weaknesses of a player’s plans, a set of proven plans or tactics from the library that characterize a player can be identified. For low-scoring, continuous team sports like soccer, such analysis for multi-agent teams does not exist as the game is not segmented into “discretized” plays (i.e. plans), making it difficult to obtain a library that characterizes a team’s behavior. Additionally, as player tracking data is costly and difficult to obtain, we only have partial team tracings in the form of ball actions which makes this problem even more difficult. In this paper, we propose a method to overcome these issues by representing team behavior via play-segments, which are spatio-temporal descriptions of ball movement over fixed windows of time. Using these representations we can characterize team behavior from entropy maps, which give a measure of predictability of team behaviors across the field. We show the efficacy and applicability of our method on the 2010-2011 English Premier League soccer data.
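One plausible reading of an entropy map is the Shannon entropy of the play-segment labels observed in each field cell, where low entropy means predictable behavior. The sketch below follows that reading only; the grid cells, labels, and function name are hypothetical, and the paper's actual play-segment representation and quantization are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def entropy_map(observations):
    """Shannon entropy (in bits) of the labels seen in each field cell.

    `observations` is an assumed iterable of (cell, label) pairs, where
    `cell` identifies a field region and `label` a play-segment type.
    """
    counts = defaultdict(Counter)
    for cell, label in observations:
        counts[cell][label] += 1

    result = {}
    for cell, counter in counts.items():
        total = sum(counter.values())
        entropy = 0.0
        for count in counter.values():
            p = count / total
            entropy -= p * math.log2(p)
        result[cell] = entropy
    return result
```

A cell in which a team always produces the same play-segment gets entropy 0, while a cell split evenly between two play-segments gets 1 bit.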
Competing with Humans at Fantasy Football: Team Formation in Large Partially-Observable Domains
Matthews, Tim (University of Southampton) | Ramchurn, Sarvapali D. (University of Southampton) | Chalkiadakis, Georgios (Technical University of Crete)
We present the first real-world benchmark for sequentially-optimal team formation, working within the framework of a class of online football prediction games known as Fantasy Football. We model the problem as a Bayesian reinforcement learning one, where the action space is exponential in the number of players and where the decision maker's beliefs are over multiple characteristics of each footballer. We then exploit domain knowledge to construct computationally tractable solution techniques in order to build a competitive automated Fantasy Football manager. Thus, we are able to establish the baseline performance in this domain, even without complete information on footballers' performances (accessible to human managers), showing that our agent is able to rank at around the top percentile when pitched against 2.5M human players.