Agents
Bounding the Probability of Resource Constraint Violations in Multi-Agent MDPs
Nijs, Frits de (Delft University of Technology) | Walraven, Erwin (Delft University of Technology) | Weerdt, Mathijs M. de (Delft University of Technology) | Spaan, Matthijs T. J. (Delft University of Technology)
Multi-agent planning problems with constraints on global resource consumption occur in several domains. Existing algorithms for solving Multi-agent Markov Decision Processes can compute policies that meet a resource constraint in expectation, but these policies provide no guarantees on the probability that a resource constraint violation will occur. We derive a method to bound constraint violation probabilities using Hoeffding's inequality. This method is applied to two existing approaches for computing policies satisfying constraints: the Constrained MDP framework and a Column Generation approach. We also introduce an algorithm to adaptively relax the bound up to a given maximum violation tolerance. Experiments on a hard toy problem show that the resulting policies outperform static optimal resource allocations to an arbitrary level. By testing the algorithms on more realistic planning domains from the literature, we demonstrate that the adaptive bound is able to efficiently trade off violation probability with expected value, outperforming state-of-the-art planners.
Collective Multiagent Sequential Decision Making Under Uncertainty
Nguyen, Duc Thien (Singapore Management University) | Kumar, Akshat (Singapore Management University) | Lau, Hoong Chuin (Singapore Management University)
Multiagent sequential decision making has seen rapid progress with formal models such as decentralized MDPs and POMDPs. However, scalability to large multiagent systems and applicability to real world problems remain limited. To address these challenges, we study multiagent planning problems where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our work exploits recent advances in graphical models for modeling and inference with a population of individuals such as collective graphical models and the notion of finite partial exchangeability in lifted inference. We develop a collective decentralized MDP model where policies can be computed based on counts of agents in different states. As the policy search space over counts is combinatorial, we develop a sampling based framework that can compute open and closed loop policies. Comparisons with previous best approaches on synthetic instances and a real world taxi dataset modeling supply-demand matching show that our approach significantly outperforms them w.r.t.solution quality.
Solving Seven Open Problems of Offline and Online Control in Borda Elections
Neveling, Marc (Heinrich-Heine-Universität Düsseldorf) | Rothe, Jörg (Heinrich-Heine-Universität Düsseldorf)
Standard (offline) control scenarios in elections (such as adding, deleting, or partitioning either voters or candidates) have been studied for many voting systems, natural and less natural ones, and the related control problems have been classified in terms of their complexity. However, for one of the most important natural voting systems, the Borda Count, only a few such complexity results are known. We reduce the number of missing cases by pinpointing the complexity of three control scenarios for Borda elections, including some that arguably are among the practically most relevant ones. We also study online candidate control, an interesting dynamical, partial-information model due to Hemaspaandra et al. (2012a), who mainly focused on general complexity bounds by constructing artificial voting systems—only recently they succeeded in classifying four problems of online candidate control for one natural voting system: sequential plurality (Hemaspaandra et al. 2016). We settle the complexity of another four natural cases: constructive and destructive online control by deleting and adding candidates in sequential Borda elections.
Parameterised Verification of Infinite State Multi-Agent Systems via Predicate Abstraction
Kouvaros, Panagiotis (Imperial College London) | Lomuscio, Alessio (Imperial College London)
We define a class of parameterised infinite state multi-agent systems (MAS) that is unbounded in both the number of agents composing the system and the domain of the variables encoding the agents. We analyse their verification problem by combining and extending existing techniques in parameterised model checking with predicate abstraction procedures. The resulting methodology addresses both forms of unboundedness and provides a technique for verifying unbounded MAS defined on infinite-state variables. We illustrate the effectiveness of the technique on an infinite-domain variant of an unbounded version of the train-gate-controller.
Kont: Computing Tradeoffs in Normative Multiagent Systems
Kafali, Ozgur (North Carolina State University) | Ajmeri, Nirav (North Carolina State University) | Singh, Munindar P. (North Carolina State University)
We propose Kont, a formal framework for comparing normative multiagent systems (nMASs) by computing tradeoffs among liveness (something good happens) and safety (nothing bad happens). Safety-focused nMASs restrict agents' actions to avoid undesired enactments. However, such restrictions hinder liveness, particularly in situations such as medical emergencies. We formalize tradeoffs using norms, and develop an approach for understanding to what extent an nMAS promotes liveness or safety. We propose patterns to guide the design of an nMAS with respect to liveness and safety, and prove their correctness. We further quantify liveness and safety using heuristic metrics for an emergency healthcare application. We show that the results of the application corroborate our theoretical development.
Centralized versus Personalized Commitments and Their Influence on Cooperation in Group Interactions
Han, The Anh (Teesside University) | Pereira, Luis Moniz (Universidade Nova de Lisboa) | Martinez-Vaquero, Luis A. (Institute of Cognitive Sciences and Technologies (ISTC-CNR)) | Lenaerts, Tom (Université libre de Bruxelles)
Before engaging in a group venture agents may seek commitments from other members in the group and, based on the level of participation (i.e. the number of actually committed participants), decide whether it is worth joining the venture. Alternatively, agents can delegate this costly process to a (beneficent or non-costly) third-party, who helps seek commitments from the agents. Using methods from Evolutionary Game Theory, this paper shows that, in the context of Public Goods Game, much higher levels of cooperation can be achieved through such centralized commitment management. It provides a more efficient mechanism for dealing with commitment free-riders, those who are not willing to bear the cost of arranging commitments whilst enjoying the benefits provided by the paying commitment proposers. We show that the participation level plays a crucial role in the decision of whether an agreement should be formed; namely, it needs to be more strict in terms of the level of participation required from players of the centralized system for the agreement to be formed; however, once it is done right, it is much more beneficial in terms of the level of cooperation and social welfare achieved. In short, our analysis provides important insights for the design of multi-agent systems that rely on commitments to monitor agents' cooperative behavior.
Improving Surveillance Using Cooperative Target Observation
Aswani, Rashi (International Institute of Information Technology - Hyderabad) | Munnangi, Sai Krishna (International Institute of Information Technology - Hyderabad) | Paruchuri, Praveen (International Institute of Information Technology - Hyderabad)
The Cooperative Target Observation (CTO) problem has been of great interest in the multi-agents and robotics literature due to the problem being at the core of a number of applications including surveillance. In CTO problem, the observer agents attempt to maximize the collective time during which each moving target is being observed by at least one observer in the area of interest. However, most of the prior works for the CTO problem consider the targets movement to be Randomized. Given our focus on surveillance domain, we modify this assumption to make the targets strategic and present two target strategies namely Straight-line strategy and Controlled Randomization strategy. We then modify the observer strategy proposed in the literature based on the K-means algorithm to introduce five variants and provide experimental validation. In surveillance domain, it is often reasonable to assume that the observers may themselves be a subject of observation for a variety of purposes by unknown adversaries whose model may not be known. Randomizing the observers actions can help to make their target observation strategy less predictable. As the fifth variant, we therefore introduce Adjustable Randomization into the best performing observer strategy where the observer can adjust the expected loss in reward due to randomization depending on the situation.
Transfer Reinforcement Learning with Shared Dynamics
Laroche, Romain (Orange Labs at Châtillon) | Barlier, Merwan (Orange Labs at Châtillon)
This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do not change from one task to another, and only the reward function does. Our method relies on two ideas, the first one is that transition samples obtained from a task can be reused to learn on any other task: an immediate reward estimator is learnt in a supervised fashion and for each sample, the reward entry is changed by its reward estimate. The second idea consists in adopting the optimism in the face of uncertainty principle and to use upper bound reward estimates. Our method is tested on a navigation task, under four Transfer RL experimental settings: with a known reward function, with strong and weak expert knowledge on the reward function, and with a completely unknown reward function. It is also evaluated in a Multi-Task RL experiment and compared with the state-of-the-art algorithms. Results reveal that this method constitutes a major improvement for transfer/multi-task problems that share dynamics.
Sampling Beats Fixed Estimate Predictors for Cloning Stochastic Behavior in Multiagent Systems
Hrolenok, Brian (Georgia Institute of Technology) | Boots, Byron (Georgia Institute of Technology) | Balch, Tucker Hybinette (Georgia Institute of Technology)
Modeling stochastic multiagent behavior such as fish schooling is challenging for fixed-estimate prediction techniques because they fail to reliably reproduce the stochastic aspects of the agents’ behavior. We show how standard fixed-estimate predictors fit within a probabilistic framework, and suggest the reason they work for certain classes of behaviors and not others. We quantify the degree of mismatch and offer alternative sampling-based modeling techniques. We are specifically interested in building executable models (as opposed to statistical or descriptive models) because we want to reproduce and study multiagent behavior in simulation. Such models can be used by biologists, sociologists, and economists to explain and predict individual and group behavior in novel scenarios, and to test hypotheses regarding group behavior. Developing models from observation of real systems is an obvious application of machine learning. Learning directly from data eliminates expensive hand processing and tuning, but introduces unique challenges that violate certain assumptions common in standard machine learning approaches. Our framework suggests a new class of sampling-based methods, which we implement and apply to simulated deterministic and stochastic schooling behaviors, as well as the observed schooling behavior of real fish. Experimental results show that our implementation performs comparably with standard learning techniques for deterministic behaviors, and better on stochastic behaviors.
OFFER: Off-Environment Reinforcement Learning
Ciosek, Kamil Andrzej (University of Oxford) | Whiteson, Shimon (University of Oxford)
Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables - state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.