Agents
A Centralized Multi-Agent Negotiation Approach to Collaborative Air Traffic Resource Management Planning
Jarvis, Peter A. (NASA Ames Research Center) | Wolfe, Shawn R. (NASA Ames Research Center) | Enomoto, Francis Y. (NASA Ames Research Center) | Nado, Robert A. (Stinger Ghaffarian Technologies Inc) | Sierhuis, Maarten (NASA Ames Research Center)
Demand and capacity imbalances in the US national airspace are resolved using traffic management initiatives designed, in current operations, with little collaboration with the airspace users. NASA and its partners have developed a new collaborative concept of operations that requires the users and airspace service provider to work together to choose initiatives that better satisfy the business needs of the users while also ensuring safety to the same standard as today. In this paper, we describe an approach to implementing this concept through a software negotiation framework underpinned by technology developed in the artificial intelligence community. We describe our exploration of peer-to-peer negotiation and how the number of conversation threads and the time sensitivity of offer acceptance led us to a centralized approach. The centralized approach uses hill climbing to evaluate airport slot allocations from a user perspective and a linear programming solver to seek solutions compatible across the user community. Our experiments with full sized problems identify the potential operational benefits as well as limitations, and where future research needs to be focused.
Multi-Agent Learning with Policy Prediction
Zhang, Chongjie (University of Massachusetts Amherst) | Lesser, Victor (University of Massachusetts Amherst)
Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. This paper first introduces a new gradient-based learning algorithm, augmenting the basic gradient ascent approach with policy prediction. We prove that this augmentation results in a stronger notion of convergence than the basic gradient ascent, that is, strategies converge to a Nash equilibrium within a restricted class of iterated games. Motivated by this augmentation, we then propose a new practical multi-agent reinforcement learning (MARL) algorithm exploiting approximate policy prediction. Empirical results show that it converges faster and in a wider variety of situations than state-of-the-art MARL algorithms.
Trial-Based Dynamic Programming for Multi-Agent Planning
Wu, Feng (University of Science and Technology of China) | Zilberstein, Shlomo (University of Massachusetts Amherst) | Chen, Xiaoping (University of Science and Technology of China)
Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computations on regions of the environment they encounter during the trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound the amount of memory. The results show that TBDP can produce significant value improvements and is much faster than the best existing planning algorithms.
A Decentralised Coordination Algorithm for Mobile Sensors
Stranders, Ruben (University of Southampton) | Fave, Francesco Maria Delle (University of Southampton) | Rogers, Alex (University of Southampton) | Jennings, Nicholas R. (University of Southampton)
We present an on-line decentralised algorithm for coordinating mobile sensors for a broad class of information gathering tasks. These sensors can be deployed in unknown and possibly hostile environments, where uncertainty and dynamism are endemic. Such environments are common in the areas of disaster response and military surveillance. Our coordination approach itself is based on work by Stranders et al. (2009), that uses the max-sum algorithm to coordinate mobile sensors for monitoring spatial phenomena. In particular, we generalise and extend their approach to any domain where measurements can be valued. Also, we introduce a clustering approach that allows sensors to negotiate over paths to the most relevant locations, as opposed to a set of fixed directions, which results in a significantly improved performance. We demonstrate our algorithm by applying it to two challenging and distinct information gathering tasks. In the first–pursuit-evasion (PE)–sensors need to capture a target whose movement might be unknown. In the second–patrolling (P)–sensors need to minimise loss from intrusions that occur within their environment. In doing so, we obtain the first decentralised coordination algorithms for these domains. Finally, in each domain, we empirically evaluate our approach in a simulated environment, and show that it outperforms two state of the art greedy algorithms by 30% (PE) and 44% (P), and an existing approach based on the Travelling Salesman Problem by 52% (PE) and 30% (P).
Envy Quotes and the Iterated Core-Selecting Combinatorial Auction
Othman, Abraham (Carnegie Mellon University) | Sandholm, Tuomas (Carnegie Mellon University)
Using a model of agent behavior based around envy-reducing strategies, we describe an iterated combinatorial auction in which the allocation and prices converge to a solution in the core of the agents' true valuations. In each round of the iterative auction mechanism, agents act on envy quotes produced by the mechanism: hints that suggest the prices of the bundles they are interested in. We describe optimal methods of generating envy quotes for various core-selecting mechanisms. Prior work on core-selecting combinatorial auctions has required agents to have perfect information about every agent's valuations to achieve a solution in the core. In contrast, here a core solution is reached even in the private information setting.
Convergence to Equilibria in Plurality Voting
Meir, Reshef (The Hebrew University of Jerusalem) | Polukarov, Maria (University of Southampton) | Rosenschein, Jeffrey S. (The Hebrew University of Jerusalem) | Jennings, Nicholas R. (University of Southampton)
Multi-agent decision problems, in which independent agents have to agree on a joint plan of action or allocation of resources, are central to AI. In such situations, agents' individual preferences over available alternatives may vary, and they may try to reconcile these differences by voting. Based on the fact that agents may have incentives to vote strategically and misreport their real preferences, a number of recent papers have explored different possibilities for avoiding or eliminating such manipulations. In contrast to most prior work, this paper focuses on convergence of strategic behavior to a decision from which no voter will want to deviate. We consider scenarios where voters cannot coordinate their actions, but are allowed to change their vote after observing the current outcome. We focus on the Plurality voting rule, and study the conditions under which this iterative game is guaranteed to converge to a Nash equilibrium (i.e., to a decision that is stable against further unilateral manipulations). We show for the first time how convergence depends on the exact attributes of the game, such as the tie-breaking scheme, and on assumptions regarding agents' weights and strategies.
Learning Simulation Control in General Game-Playing Agents
Finnsson, Hilmar (Reykjavik University) | Björnsson, Yngvi (Reykjavik University)
The aim of General Game Playing (GGP) is to create intelligent agents that can automatically learn how to play many different games at an expert level without any human intervention. One of the main challenges such agents face is to automatically learn knowledge-based heuristics in real-time, whether for evaluating game positions or for search guidance. In recent years, GGP agents that use Monte-Carlo simulations to reason about their actions have become increasingly more popular. For competitive play such an approach requires an effective search-control mechanism for guiding the simulation playouts. In here we introduce several schemes for automatically learning search guidance based on both statistical and reinforcement learning techniques. We compare the different schemes empirically on a variety of games and show that they improve significantly upon the current state-of-the-art in simulation-control in GGP. For example, in the chess-like game Skirmish, which has proved a particularly challenging game for simulation-based GGP agents, an agent employing one of the proposed schemes achieves 97% winning rate against an unmodified agent.
A Testbed for Investigating Task Allocation Strategies between Air Traffic Controllers and Automated Agents
Schurr, Nathan (Aptima, Inc.) | Good, Richard (Aptima, Inc.) | Alexander, Amy (Aptima, Inc.) | Picciano, Paul (Aptima, Inc.) | Ganberg, Gabriel (Aptima, Inc.) | Therrien, Michael (Aptima, Inc.) | Beard, Bettina L. (NASA Ames Research Center) | Holbrook, Jon (San Jose State University Research Foundation)
To meet the growing demands of the National Airspace System (NAS) stakeholders and provide the level of service, safety and security needed to sustain future air transport, the Next Generation Air Transportation System (NextGen) concept calls for technologies and systems offering increasing support from automated systems that provide decision-aiding and optimization capabilities. This is an exciting application for some core aspects of Artificial Intelligence research since the automation must be designed to enable the human operators to access and process a myriad of information sources, understand heightened system complexity, and maximize capacity, throughput and fuel savings in the NAS.. This paper introduces an emerging application of techniques from mixed initiative (adjustable autonomy), multi-agent systems, and task scheduling techniques to the air traffic control domain. Consequently, we have created a testbed for investigating the critical challenges in supporting the early design of systems that allow for optimal, context-sensitive function (role) allocation between air traffic controller and automated agents. A pilot study has been conducted with the testbed and preliminary results show a marked qualitative improvement in using dynamic function allocation optimization versus static function allocation.
A Wiki with Multiagent Tracking, Modeling, and Coalition Formation
Khandaker, Nobel (University of Nebraska - Lincoln) | Soh, Leen-Kiat (University of Nebraska - Lincoln)
Wikis are being increasingly used as a tool for conducting colla-borative writing assignments in today’s classrooms. However, Wikis in general (1) do not provide group formation methods to more specifically facilitate collaborative learning of the students and (2) suffer from typical problems of collaborative learning like detection of free-riding (earning credit without contribution). To improve the state of the art of the use of Wikis as a collaborative writing tool, we have designed and implemented ClassroomWiki - a Web-based collaborative Wiki that utilizes a set of learner pedagogy theories to provide multiagent-based tracking, modeling, and group formation functionalities. For the students, ClassroomWiki provides a Web interface for writing and revising their group’s Wiki and a topic-based forum for discussing their ideas during collaboration. When the students collaborate, ClassroomWiki’s agents track all student activities to learn a model of the students and use a Bayesian Network to learn a probabilistic mapping that describes the ability of a group of students with a specific set of models to work together. For the teacher, Clas-sroomWiki provides a framework that uses the learned student models and the mapping to form student groups to improve the collaborative learning of students. ClassroomWiki was deployed in three university-level courses and the results suggest that ClassroomWiki can (1) form better student groups that improve stu-dent learning and collaboration and (2) alleviate free-riding and allow the instructor to provide scaffolding by its multiagent-based tracking and modeling.
Agent-Based Decision Support: A Case-Study on DSL Access Networks
Bsufka, Karsten (Technische Universität Berlin) | Bye, Rainer (Technische Universität Berlin) | Chinnow, Joël (Technische Universität Berlin) | Schmidt, Stephan (Technische Universität Berlin) | Batyuk, Leonid (Technische Universität Berlin)
Network management is a complex task involving various challenges, such as the heterogeneity of the infrastructure or the information flood caused by billions of log messages from different systems and operated by different organiza- tional units. All of these messages and systems may contain information relevant to other operational units. For example, in order to ensure reliable DSL connections for IPTV cus- tomers, optimal customer traffic path assignments for the current network state and traffic demands need to be evalu- ated. Currently reassignments are only manually performed during routine maintenance or as a response to reported problems. In this paper we present a decision support sys- tem for this task. In addition, the system predicts future pos- sible demands and allows reconfigurations of a DSL access network before congestions may occur.