Agents
Multi-Agent Path Finding with Payload Transfers and the Package-Exchange Robot-Routing Problem
Ma, Hang (University of Southern California) | Tovey, Craig (Georgia Institute of Technology) | Sharon, Guni (Ben-Gurion University of the Negev) | Kumar, T. K. Satish (University of Southern California) | Koenig, Sven (University of Southern California)
We study transportation problems where robots have to deliver packages and can transfer the packages among each other. Specifically, we study the package-exchange robot-routing problem (PERR), where each robot carries one package, any two robots in adjacent locations can exchange their packages, and each package needs to be delivered to a given destination. We prove that exchange operations make all PERR instances solvable. Yet, we also show that PERR is NP-hard to approximate within any factor less than 4/3 for makespan minimization and is NP-hard to solve for flowtime minimization, even when there are only two types of packages. Our proof techniques also generate new insights into other transportation problems, for example, into the hardness of approximating optimal solutions to the standard multi-agent path-finding problem (MAPF). Finally, we present optimal and suboptimal PERR solvers that are inspired by MAPF solvers, namely a flow-based ILP formulation and an adaptation of conflict-based search. Our empirical results demonstrate that these solvers scale well and that PERR instances often have smaller makespans and flowtimes than the corresponding MAPF instances.
Robust Execution of BDI Agent Programs by Exploiting Synergies Between Intentions
Yao, Yuan (University of Nottingham) | Logan, Brian (University of Nottingham) | Thangarajah, John (RMIT Universtity)
A key advantage the reactive planning approach adopted by BDI-based agents is the ability to recover from plan execution failures, and almost all BDI agent programming languages and platforms provide some form of failure handling mechanism. In general, these consist of simply choosing an alternative plan for the failed subgoal (e.g., JACK, Jadex). In this paper, we propose an alternative approach to recovering from execution failures that relies on exploiting positive interactions between an agent's intentions. A positive interaction occurs when the execution of an action in one intention assists the execution of actions in other intentions (e.g., by (re)establishing their preconditions). We have implemented our approach in a scheduling algorithm for BDI agents which we call SP. The results of a preliminary empirical evaluation of SP suggest our approach out-performs existing failure handling mechanisms used by state-of-the-art BDI languages. Moreover, the computational overhead of SP is modest.
Is It Harmful When Advisors Only Pretend to Be Honest?
Wang, Dongxia (Nanyang Technological University) | Muller, Tim (Nanyang Technological University) | Zhang, Jie (Nanyang Technological University) | Liu, Yang (Nanyang Technological University)
In trust systems, unfair rating attacks โ where advisors provide ratings dishonestly โ influence the accuracy of trust evaluation. A secure trust system should function properly under all possible unfair rating attacks; including dynamic attacks. In the literature, camouflage attacks are the most studied dynamic attacks. But an open question is whether more harmful dynamic attacks exist. We propose random processes to model and measure dynamic attacks. The harm of an attack is influenced by a user's ability to learn from the past. We consider three types of users: blind users, aware users, and general users. We found for all the three types, camouflage attacks are far from the most harmful. We identified the most harmful attacks, under which we found the ratings may still be useful to users.
ConTaCT: Deciding to Communicate during Time-Critical Collaborative Tasks in Unknown, Deterministic Domains
Unhelkar, Vaibhav V. (Massachusetts Institute of Technology) | Shah, Julie A. (Massachusetts Institute of Technology)
Communication between agents has the potential to improve team performance of collaborative tasks. However, communication is not free in most domains, requiring agents to reason about the costs and benefits of sharing information. In this work, we develop an online, decentralized communication policy, ConTaCT, that enables agents to decide whether or not to communicate during time-critical collaborative tasks in unknown, deterministic environments. Our approach is motivated by real-world applications, including the coordination of disaster response and search and rescue teams. These settings motivate a model structure that explicitly represents the world model as initially unknown but deterministic in nature, and that de-emphasizes uncertainty about action outcomes. Simulated experiments are conducted in which ConTaCT is compared to other multi-agent communication policies, and results indicate that ConTaCT achieves comparable task performance while substantially reducing communication overhead.
Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs
Robbel, Philipp (Massachusetts Institute of Technology) | Oliehoek, Frans A. (University of Amsterdam and University of Liverpool) | Kochenderfer, Mykel J. (Stanford University)
Many solution methods for Markov Decision Processes (MDPs) exploit structure in the problem and are based on value function factorization. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, restricting problem sizes and types that can be handled. We present an approach to mitigate this limitation for certain types of multiagent systems, exploiting a property that can be thought of as "anonymous influence" in the factored MDP. We show how representational benefits from anonymity translate into computational efficiencies, both for variable elimination in a factor graph and for the approximate linear programming solution to factored MDPs. Our methods scale to factored MDPs that were previously unsolvable, such as the control of a stochastic disease process over densely connected graphs with 50 nodes and 25 agents.
Bayesian Learning of Other Agents' Finite Controllers for Interactive POMDPs
Panella, Alessandro (University of Illinois at Chicago) | Gmytrasiewicz, Piotr (University of Illinois at Chicago)
We consider an autonomous agent operating in a stochastic, partially-observable, multiagent environment, that explicitly models the other agents as probabilistic deterministic finite-state controllers (PDFCs) in order to predict their actions. We assume that such models are not given to the agent, but instead must be learned from (possibly imperfect) observations of the other agents' behavior. The agent maintains a belief over the other agents' models, that is updated via Bayesian inference. To represent this belief we place a flexible stick-breaking distribution over PDFCs, that allows the posterior to concentrate around controllers whose size is not bounded and scales with the complexity of the observed data. Since this Bayesian inference task is not analytically tractable, we devise a Markov chain Monte Carlo algorithm to approximate the posterior distribution. The agent then embeds the result of this inference into its own decision making process using the interactive POMDP framework. We show that our learning algorithm can learn agent models that are behaviorally accurate for problems of varying complexity, and that the agent's performance increases as a result.
Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments
Liu, Miao (Massachusetts Institute of Technology) | Amato, Christopher (University of New Hampshire) | Anesta, Emily P. (Massachusetts Institute of Technology) | Griffith, John Daniel (Massachusetts Institute of Technology) | How, Jonathan P (Massachusetts Institute of Technology)
Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general framework for multiagent sequential decision-making under uncertainty. Although Dec-POMDPs are typically intractable to solve for real-world problems, recent research on macro-actions (i.e., temporally-extended actions) has significantly increased the size of problems that can be solved. However, current methods assume the underlying Dec-POMDP model is known a priori or a full simulator is available during planning time. To accommodate more realistic scenarios, when such information is not available, this paper presents a policy-based reinforcement learning approach, which learns the agent policies based solely on trajectories generated by previous interaction with the environment (e.g., demonstrations). We show that our approach is able to generate valid macro-action controllers and develop an expectationmaximization (EM) algorithm (called Policy-based EM or PoEM), which has convergence guarantees for batch learning. Our experiments show PoEM is a scalable learning method that can learn optimal policies and improve upon hand-coded โexpertโ solutions.
Strengthening Agents Strategic Ability with Communication
Huang, Xiaowei (University of Oxford and Jinan University) | Chen, Qingliang (Jinan University) | Su, Kaile (Griffith University and Jinan University)
The current frameworks of reasoning about agents' collective strategy are either too conservative or too liberal in terms of the sharing of local information between agents. In this paper, we argue that in many cases, a suitable amount of information is required to be communicated between agents to both enforce goals and keep privacy. Several communication operators are proposed to work with an epistemic strategy logic ATLK. The complexity of model checking resulting logics is studied, and surprisingly, we found that the additional expressiveness from the communication operators comes for free.
Efficient Computation of Emergent Equilibrium in Agent-Based Simulation
Hu, Zehong (Nanyang Technological University) | Sha, Meng (Nanyang Technological University) | Jarrah, Moath (Nanyang Technological University) | Zhang, Jie (Nanyang Technological University) | Xi, Hui (Royce Singapore Pte Ltd)
In agent-based simulation, emergent equilibrium describes the macroscopic steady states of agents' interactions. While the state of individual agents might be changing, the collective behavior pattern remains the same in macroscopic equilibrium states. Traditionally, these emergent equilibriums are calculated using Monte Carlo methods. However, these methods require thousands of repeated simulation runs, which are extremely time-consuming. In this paper, we propose a novel three-layer framework to efficiently compute emergent equilibriums. The framework consists of a macro-level pseudo-arclength equilibrium solver (PAES), a micro-level simulator (MLS) and a macro-micro bridge (MMB). It can adaptively explore parameter space and recursively compute equilibrium states using the predictor-corrector scheme. We apply the framework to the popular opinion dynamics and labour market models. The experimental results show that our framework outperformed Monte Carlo experiments in terms of computation efficiency while maintaining the accuracy.
Emergence of Social Punishment and Cooperation through Prior Commitments
Han, The Anh (Teesside University)
Social punishment, whereby cooperators punish defectors, has been suggested as an important mechanism that promotes the emergence of cooperation or maintenance of social norms in the context of the one-shot (i.e. non-repeated) interaction. However, whenever antisocial punishment, whereby defectors punish cooperators, is available, this antisocial behavior outperforms social punishment, leading to the destruction of cooperation. In this paper, we use evolutionary game theory to show that this antisocial behavior can be efficiently restrained by relying on prior commitments, wherein agents can arrange, prior to an interaction, agreements regarding posterior compensation by those who dishonor the agreements. We show that, although the commitment mechanism by itself can guarantee a notable level of cooperation, a significantly higher level is achieved when both mechanisms, those of proposing prior commitments and of punishment, are available in co-presence. Interestingly, social punishment prevails and dominates in this system as it can take advantage of the commitment mechanism to cope with antisocial behaviors. That is, establishment of a commitment system helps to pave the way for the evolution of social punishment and abundant cooperation, even in the presence of antisocial punishment.