Lesser, Victor
DJAO: A Communication-Constrained DCOP Algorithm that Combines Features of ADOPT and Action-GDL
Kim, Yoonheui (University of Massachusetts at Amherst) | Lesser, Victor (University of Massachusetts at Amherst)
In this paper we propose a novel DCOP algorithm, called DJAO, that efficiently finds a solution with low communication overhead; the algorithm can produce either optimal or bounded approximate solutions by appropriately setting the error bounds. Our approach builds on the distributed junction trees used in Action-GDL to represent independence relations among variables, and constructs an AND/OR search space based on these junction trees. This new type of search space results in higher degrees for each OR node, yielding a more efficient search graph in distributed settings. DJAO uses a branch-and-bound search algorithm to find solutions distributedly within this search graph. We introduce heuristics to compute the upper- and lower-bound estimates with which the search starts, which is integral to our approach for reducing communication overhead. We empirically evaluate our approach in various settings.
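For readers unfamiliar with bounded-error branch and bound, the pruning idea can be sketched in a few lines of Python. The sketch below is centralized and uses a hypothetical three-variable constraint problem; it is not DJAO's distributed search over an AND/OR space derived from junction trees, only an illustration of how an error bound relaxes pruning to trade solution quality for search effort.

```python
# Minimal centralized branch-and-bound sketch over a toy constraint problem,
# with an error bound. Illustration only; DJAO performs this kind of search
# distributedly over an AND/OR space derived from a junction tree.

# Hypothetical problem: three binary variables, two pairwise cost tables.
DOMAIN = [0, 1]
COSTS = {
    ("x1", "x2"): {(0, 0): 1, (0, 1): 3, (1, 0): 0, (1, 1): 2},
    ("x2", "x3"): {(0, 0): 2, (0, 1): 0, (1, 0): 4, (1, 1): 1},
}
ORDER = ["x1", "x2", "x3"]

def partial_cost(assign):
    """Cost of all constraints whose variables are already assigned."""
    total = 0
    for (u, v), table in COSTS.items():
        if u in assign and v in assign:
            total += table[(assign[u], assign[v])]
    return total

def branch_and_bound(epsilon=0.0):
    best = {"cost": float("inf"), "assign": None}

    def search(i, assign):
        lb = partial_cost(assign)          # admissible: remaining costs are >= 0
        if lb >= best["cost"] - epsilon:   # prune within the allowed error bound
            return
        if i == len(ORDER):
            best["cost"], best["assign"] = lb, dict(assign)
            return
        var = ORDER[i]
        for val in DOMAIN:
            assign[var] = val
            search(i + 1, assign)
            del assign[var]

    search(0, {})
    return best  # returned cost is within epsilon of the optimum

print(branch_and_bound(epsilon=0.0))   # exact
print(branch_and_bound(epsilon=1.0))   # bounded-approximate, prunes more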
Compact Mathematical Programs For DEC-MDPs With Structured Agent Interactions
Mostafa, Hala | Lesser, Victor
To deal with the prohibitive complexity of calculating policies in Decentralized MDPs, researchers have proposed models that exploit structured agent interactions. Settings where most agent actions are independent except for few actions that affect the transitions and/or rewards of other agents can be modeled using Event-Driven Interactions with Complex Rewards (EDI-CR). Finding the optimal joint policy can be formulated as an optimization problem. However, existing formulations are too verbose and/or lack optimality guarantees. We propose a compact Mixed Integer Linear Program formulation of EDI-CR instances. The key insight is that most action sequences of a group of agents have the same effect on a given agent. This allows us to treat these sequences similarly and use fewer variables. Experiments show that our formulation is more compact and leads to faster solution times and better solutions than existing formulations.
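The variable-reduction insight can be illustrated with a toy sketch (hypothetical action names and effect model, not the paper's EDI-CR formulation): if the only thing that matters to the other agent is when, if ever, an interacting action occurs, then action sequences can be grouped by that feature, and the program needs one variable per group rather than one per sequence.

```python
# Toy illustration of grouping action sequences that have the same effect on
# another agent; the actions and the "effect signature" are hypothetical.
from itertools import product

ACTIONS = ["move", "sense", "transmit"]   # "transmit" is the only interacting action
HORIZON = 4

def effect_signature(seq):
    """All agent 2 cares about: the first time step (if any) at which
    agent 1 executed the interacting action 'transmit'."""
    for t, a in enumerate(seq):
        if a == "transmit":
            return t
    return None  # never interacted

sequences = list(product(ACTIONS, repeat=HORIZON))
groups = {}
for seq in sequences:
    groups.setdefault(effect_signature(seq), []).append(seq)

print(f"{len(sequences)} sequences collapse to {len(groups)} effect groups")
# 81 sequences collapse to 5 effect groups, so a formulation that only has to
# distinguish sequences by their effect on agent 2 needs far fewer variables.
```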
Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs
Zhang, Chongjie (University of Massachusetts Amherst) | Lesser, Victor (University of Massachusetts Amherst)
In many multi-agent applications, such as distributed sensor networks, a network of agents acts collaboratively under uncertainty and with only local interactions. Networked Distributed POMDPs (ND-POMDPs) provide a framework for modeling such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in practice. This paper presents a model-free, scalable learning approach that synthesizes multi-agent reinforcement learning (MARL) and distributed constraint optimization (DCOP). By exploiting the structured interaction in ND-POMDPs, our approach distributes the learning of the joint policy and employs DCOP techniques to coordinate the distributed learners so as to ensure good global learning performance. Our approach can learn a globally optimal policy for ND-POMDPs with a property called groupwise observability. Experimental results show that, with communication during learning and execution, our approach significantly outperforms nearly-optimal non-communicating policies computed offline.
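One ingredient of such a synthesis, getting distributed learners to agree on a globally good joint action, can be sketched as dynamic programming over a chain of agents. The Q-tables and topology below are hypothetical stand-ins for the DCOP coordination step; the actual approach also involves local observations, policy updates, and communication during learning.

```python
# Joint action selection over a chain of agents by dynamic programming,
# a stand-in for the DCOP coordination step; Q-values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
N, A = 4, 3                       # 4 agents, 3 actions each
# Q[i][a_i, a_{i+1}]: learned local Q-value shared by neighboring agents i and i+1.
Q = [rng.random((A, A)) for _ in range(N - 1)]

def select_joint_action(Q):
    """Maximize sum_i Q[i][a_i, a_{i+1}] by a backward message pass followed
    by a forward argmax pass (variable elimination on a chain)."""
    n_agents = len(Q) + 1
    msg = np.zeros(A)             # message arriving from the end of the chain
    best_next = []                # best a_{i+1} as a function of a_i
    for i in reversed(range(n_agents - 1)):
        scores = Q[i] + msg[None, :]       # local value plus downstream value
        best_next.append(scores.argmax(axis=1))
        msg = scores.max(axis=1)
    best_next.reverse()
    actions = [int(msg.argmax())]          # best action for agent 0
    for i in range(n_agents - 1):
        actions.append(int(best_next[i][actions[-1]]))
    return actions

joint = select_joint_action(Q)
value = sum(Q[i][joint[i], joint[i + 1]] for i in range(N - 1))
print(joint, round(value, 3))
```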
Multi-Agent Learning with Policy Prediction
Zhang, Chongjie (University of Massachusetts Amherst) | Lesser, Victor (University of Massachusetts Amherst)
Learning in multi-agent systems is challenging because, from each agent's perspective, the environment is non-stationary. This paper first introduces a new gradient-based learning algorithm that augments the basic gradient-ascent approach with policy prediction. We prove that this augmentation results in a stronger notion of convergence than basic gradient ascent: strategies converge to a Nash equilibrium within a restricted class of iterated games. Motivated by this result, we then propose a new practical multi-agent reinforcement learning (MARL) algorithm that exploits approximate policy prediction. Empirical results show that it converges faster and in a wider variety of situations than state-of-the-art MARL algorithms.
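The flavor of the augmentation can be shown on a two-player, two-action matrix game: each player ascends the gradient of its expected payoff evaluated at the opponent's predicted next policy rather than its current one. The payoff matrices, step sizes, and clipping below are hypothetical choices for a minimal sketch, not the paper's exact algorithm or convergence conditions.

```python
# Gradient ascent with policy prediction in a 2x2 matrix game (sketch only).
import numpy as np

R1 = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies, player 1's payoffs
R2 = -R1                                     # zero-sum: player 2's payoffs

def grad1(x, y):
    """d/dx of player 1's expected payoff, x = P(player 1 plays action 0)."""
    return y * (R1[0, 0] - R1[1, 0]) + (1 - y) * (R1[0, 1] - R1[1, 1])

def grad2(x, y):
    """d/dy of player 2's expected payoff, y = P(player 2 plays action 0)."""
    return x * (R2[0, 0] - R2[0, 1]) + (1 - x) * (R2[1, 0] - R2[1, 1])

def clip(p):
    return min(1.0, max(0.0, p))

x, y = 0.9, 0.2                  # initial mixed strategies
eta, gamma = 0.05, 0.5           # learning rate and prediction length
for _ in range(2000):
    y_pred = clip(y + gamma * grad2(x, y))   # player 1 predicts player 2's next policy
    x_pred = clip(x + gamma * grad1(x, y))   # player 2 predicts player 1's next policy
    x = clip(x + eta * grad1(x, y_pred))     # ascend against the *predicted* policy
    y = clip(y + eta * grad2(x_pred, y))
print(round(x, 3), round(y, 3))  # should approach the (0.5, 0.5) equilibrium
```

With gamma = 0 the update reduces to plain gradient ascent, which cycles around the equilibrium of this game; the prediction term is what damps the cycling in this sketch.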
Towards Multiagent Meta-level Control
Cheng, Shanjun (The University of North Carolina at Charlotte) | Raja, Anita (The University of North Carolina at Charlotte) | Lesser, Victor (University of Massachusetts Amherst)
Embedded systems consisting of collaborating agents capable of interacting with their environment are becoming ubiquitous. It is crucial for these systems to adapt to the dynamic and uncertain characteristics of an open environment. In this paper, we argue that multiagent meta-level control (MMLC) is an effective way to determine when this adaptation process should occur and how much effort should be invested in adaptation as opposed to continuing with the current action plan. We describe a reinforcement-learning-based approach to learning decentralized meta-control policies offline. We then propose to use the learned reward model as input to a global optimization algorithm in order to avoid conflicting meta-level decisions between coordinating agents. Our initial experiments in the context of NetRads, a multiagent tornado-tracking application, show that MMLC significantly improves performance in a 3-agent network.
An Ensemble Learning and Problem Solving Architecture for Airspace Management
Zhang, Xiaoqin (Shelly) (University of Massachusetts) | Yoon, Sungwook (Arizona State University) | DiBona, Phillip (Lockheed Martin Advanced Technology Laboratories) | Appling, Darren (Georgia Institute of Technology) | Ding, Li (Rensselaer Polytechnic Institute) | Doppa, Janardhan (Oregon State University) | Green, Derek (University of Wyoming) | Guo, Jinhong (Lockheed Martin Advanced Technology Laboratories) | Kuter, Ugur (University of Maryland) | Levine, Geoff (University of Illinois at Urbana) | MacTavish, Reid (Georgia Institute of Technology) | McFarlane, Daniel (Lockheed Martin Advanced Technology Laboratories) | Michaelis, James (Rensselaer Polytechnic Institute) | Mostafa, Hala (University of Massachusetts) | Ontanon, Santiago (Georgia Institute of Technology) | Parker, Charles (Oregon State University) | Radhakrishnan, Jainarayan (Georgia Institute of Technology) | Rebguns, Anton (University of Wyoming) | Shrestha, Bhavesh (University of Massachusetts) | Song, Zhexuan (Fujitsu Laboratories of America) | Trewhitt, Ethan (Georgia Institute of Technology) | Zafar, Huzaifa (University of Massachusetts) | Zhang, Chongjie (University of Massachusetts) | Corkill, Daniel (University of Massachusetts) | DeJong, Gerald (University of Illinois at Urbana-Champaign) | Dietterich, Thomas (Oregon State University) | Kambhampati, Subbarao (Arizona State University) | Lesser, Victor (University of Massachusetts) | McGuinness, Deborah L. (Rensselaer Polytechnic Institute) | Ram, Ashwin (Georgia Institute of Technology) | Spears, Diana (University of Wyoming) | Tadepalli, Prasad (Oregon State University) | Whitaker, Elizabeth (Georgia Institute of Technology) | Wong, Weng-Keen (Oregon State University) | Hendler, James (Rensselaer Polytechnic Institute) | Hofmann, Martin (Lockheed Martin Advanced Technology Laboratories) | Whitebread, Kenneth (Lockheed Martin Advanced Technology Laboratories)
In this paper we describe the application of a novel learning and problem solving architecture to the domain of airspace management, where multiple requests for the use of airspace need to be reconciled and managed automatically. The key feature of our "Generalized Integrated Learning Architecture" (GILA) is a set of integrated learning and reasoning (ILR) systems coordinated by a central meta-reasoning executive (MRE). Each ILR learns independently from the same training example and contributes to problem-solving in concert with other ILRs as directed by the MRE. Formal evaluations show that our system performs as well as or better than humans after learning from the same training data. Further, GILA outperforms any individual ILR run in isolation, thus demonstrating the power of the ensemble architecture for learning and problem solving.
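At a very coarse level, the coordination pattern resembles an executive that collects candidate solutions from independently trained learners and keeps the best-scoring one per subproblem. The class names, scoring rule, and toy subproblems below are hypothetical and drastically simplified relative to GILA.

```python
# Greatly simplified sketch of an ensemble of learners coordinated by an
# executive; class names, scoring rule, and the toy task are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Proposal:
    learner: str
    solution: str
    confidence: float        # self-reported quality estimate

class Executive:
    """Collects proposals for each subproblem and keeps the most confident one."""
    def __init__(self, learners: List[Callable[[str], Proposal]]):
        self.learners = learners

    def solve(self, subproblems: List[str]) -> dict:
        plan = {}
        for sp in subproblems:
            proposals = [learner(sp) for learner in self.learners]
            plan[sp] = max(proposals, key=lambda p: p.confidence)
        return plan

# Two stand-in "learners" that would each have been trained on the same examples.
rule_based = lambda sp: Proposal("rule-based", f"reroute({sp})", 0.6)
case_based = lambda sp: Proposal("case-based", f"re-time({sp})", 0.8)

plan = Executive([rule_based, case_based]).solve(["airspace-req-1", "airspace-req-2"])
for sp, p in plan.items():
    print(sp, "->", p.solution, f"(from {p.learner})")
```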
Leveled-Commitment Contracting: A Backtracking Instrument for Multiagent Systems
Sandholm, Tuomas | Lesser, Victor
In (automated) negotiation systems for self-interested agents, contracts have traditionally been binding. They do not accommodate future events. Contingency contracts address this but are often impractical. As an alternative, we propose leveled-commitment contracts. The level of commitment is set by decommitting penalties. To be freed from the contract, an agent simply pays its penalty to the other contract party(ies). A self-interested agent will be reluctant to decommit because some other contract party might decommit, in which case the former agent gets freed from the contract, does not incur a penalty, and collects a penalty from the other party. We show that despite such strategic decommitting, leveled commitment increases the expected payoffs of all contract parties and can enable deals that are impossible under full commitment. Different decommitting mechanisms are introduced and compared. Practical prescriptions for market designers are presented. A contract optimizer, ECOMMITTER, is provided on the web.
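The central claim about expected payoffs can be illustrated with a small Monte Carlo sketch: two parties sign a contract, each may later draw an outside offer, and a party decommits whenever the outside offer beats the contract by more than the penalty. All numbers, distributions, and the myopic decommitting rule below are hypothetical and do not reproduce the paper's game-theoretic analysis.

```python
# Monte Carlo sketch of a leveled-commitment contract. Prices, valuations,
# offer distributions, and the myopic decommitting rule are hypothetical.
import random

random.seed(1)
PRICE, COST, VALUE = 10.0, 8.0, 15.0   # contract price, contractor's cost, contractee's valuation
TRIALS = 100_000

def expected_payoffs(penalty):
    tor_total = tee_total = 0.0
    for _ in range(TRIALS):
        alt_pay = random.uniform(5, 15)     # contractor's alternative contract pays this much
        alt_price = random.uniform(5, 15)   # contractee could get the task done elsewhere for this
        # Myopic rule: leave if the outside option beats the contract by more than the penalty.
        tor_leaves = alt_pay - penalty > PRICE
        tee_leaves = alt_price + penalty < PRICE
        if tor_leaves and tee_leaves:           # both leave; penalties cancel out
            tor, tee = alt_pay - COST, VALUE - alt_price
        elif tor_leaves:                        # contractor pays the penalty
            tor, tee = alt_pay - COST - penalty, VALUE - alt_price + penalty
        elif tee_leaves:                        # contractee pays the penalty
            tor, tee = alt_pay - COST + penalty, VALUE - alt_price - penalty
        else:                                   # contract is performed as signed
            tor, tee = PRICE - COST, VALUE - PRICE
        tor_total += tor
        tee_total += tee
    return tor_total / TRIALS, tee_total / TRIALS

print("full commitment   :", expected_payoffs(penalty=1e9))   # nobody ever decommits
print("leveled commitment:", expected_payoffs(penalty=1.0))   # both expected payoffs rise here
```

In this particular parameterization both parties' expected payoffs rise under the moderate penalty, illustrating the abstract's claim; the choice of penalty levels and the strategic analysis behind them are the subject of the paper itself.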