Agents
Complexity Results and Algorithms for Bipolar Argumentation
Karamlou, Amin, Čyras, Kristijonas, Toni, Francesca
Bipolar Argumentation Frameworks (BAFs) admit several interpretations of the support relation and diverging definitions of semantics. Recently, several classes of BAFs have been captured as instances of bipolar Assumption-Based Argumentation, a class of Assumption-Based Argumentation (ABA). In this paper, we establish the complexity of bipolar ABA, and consequently of several classes of BAFs. In addition to the standard five complexity problems, we analyse the rarely-addressed extension enumeration problem too. We also advance backtracking-driven algorithms for enumerating extensions of bipolar ABA frameworks, and consequently of BAFs under several interpretations. We prove soundness and completeness of our algorithms, describe their implementation and provide a scalability evaluation. We thus contribute to the study of the as yet uninvestigated complexity problems of (variously interpreted) BAFs as well as of bipolar ABA, and provide the lacking implementations thereof.
An approach to Decision Making based on Dynamic Argumentation Systems
Ferretti, Edgardo, Tamargo, Luciano H., Garcia, Alejandro J., Errecalde, Marcelo L., Simari, Guillermo R.
In this paper, we introduce a formalism for single-agent decision making that is based on Dynamic Argumentation Frameworks. The formalism can be used to justify a choice, which is based on the current situation the agent is involved. Taking advantage of the inference mechanism of the argumentation formalism, it is possible to consider preference relations and conflicts among the available alternatives for that reasoning. With this formalization, given a particular set of evidence, the justified conclusions supported by warranted arguments will be used by the agent's decision rules to determine which alternatives will be selected. We also present an algorithm that implements a choice function based on our formalization. Finally, we complete our presentation by introducing formal results that relate the proposed framework with approaches of classical decision theory.
Microscopic Traffic Simulation by Cooperative Multi-agent Deep Reinforcement Learning
Bacchiani, Giulio, Molinari, Daniele, Patander, Marco
Expert human drivers perform actions relying on traffic laws and their previous experience. While traffic laws are easily embedded into an artificial brain, modeling human complex behaviors which come from past experience is a more challenging task. One of these behaviors is the capability of communicating intentions and negotiating the right of way through driving actions, as when a driver is entering a crowded roundabout and observes other cars movements to guess the best time to merge in. In addition, each driver has its own unique driving style, which is conditioned by both its personal characteristics, such as age and quality of sight, and external factors, such as being late or in a bad mood. For these reasons, the interaction between different drivers is not trivial to simulate in a realistic manner. In this paper, this problem is addressed by developing a microscopic simulator using a Deep Reinforcement Learning Algorithm based on a combination of visual frames, representing the perception around the vehicle, and a vector of numerical parameters. In particular, the algorithm called Asynchronous Advantage Actor-Critic has been extended to a multi-agent scenario in which every agent needs to learn to interact with other similar agents. Moreover, the model includes a novel architecture such that the driving style of each vehicle is adjustable by tuning some of its input parameters, permitting to simulate drivers with different levels of aggressiveness and desired cruising speeds.
Modeling Social Group Communication with Multi-Agent Imitation Learning
Sanghvi, Navyata, Yonetani, Ryo, Kitani, Kris
In crowded social scenarios with a myriad of external stimuli, human brains exhibit a natural ability to filter out irrelevant information and narrowly focus their attention. In the midst of multiple groups of people, humans use such sensory gating to effectively further their own group's interactional goals. In this work, we consider the design of a policy network to model multi-group multi-person communication. Our policy takes as input the state of the world such as an agent's gaze direction, body pose of other agents or history of past actions, and outputs an optimal action such as speaking, listening or responding (communication modes). Inspired by humans' natural neurobiological filtering process, a central component of our policy network design is an information gating function, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that models the ability of an agent to selectively gather information from specific neighboring agents. The degree of influence of a neighbor is based on dynamic non-verbal cues such as body motion, head pose (kinesics) and interpersonal space (proxemics). We further show that the KPM-Gate can be used to discover social groups using its natural interpretation as a social attention mechanism. We pose the communication policy learning problem as a multi-agent imitation learning problem. We learn a single policy shared by all agents under the assumption of a decentralized Markov decision process. We term our policy network as the Multi-Agent Group Discovery and Communication Mode Network (MAGDAM network), as it learns social group structure in addition to the dynamics of group communication. Our experimental validation on both synthetic and real world data shows that our model is able to both discover social group structure and learn an accurate multi-agent communication policy.
Local Distance Restricted Bribery in Voting
Studying complexity of various bribery problems has been one of the main research focus in computational social choice. In all the models of bribery studied so far, the briber has to pay every voter some amount of money depending on what the briber wants the voter to report and the briber has some budget at her disposal. Although these models successfully capture many real world applications, in many other scenarios, the voters may be unwilling to deviate too much from their true preferences. In this paper, we study the computational complexity of the problem of finding a preference profile which is as close to the true preference profile as possible and still achieves the briber's goal subject to budget constraints. We call this problem Optimal Bribery. We consider three important measures of distances, namely, swap distance, footrule distance, and maximum displacement distance, and resolve the complexity of the optimal bribery problem for many common voting rules. We show that the problem is polynomial time solvable for the plurality and veto voting rules for all the three measures of distance. On the other hand, we prove that the problem is NP-complete for a class of scoring rules which includes the Borda voting rule, maximin, Copeland$^\alpha$ for any $\alpha\in[0,1]$, and Bucklin voting rules for all the three measures of distance even when the distance allowed per voter is $1$ for the swap and maximum displacement distances and $2$ for the footrule distance even without the budget constraints (which corresponds to having an infinite budget). For the $k$-approval voting rule for any constant $k>1$ and the simplified Bucklin voting rule, we show that the problem is NP-complete for the swap distance even when the distance allowed is $2$ and for the footrule distance even when the distance allowed is $4$ even without the budget constraints.
Using Causal Analysis to Learn Specifications from Task Demonstrations
Angelov, Daniel, Hristov, Yordan, Ramamoorthy, Subramanian
Learning models of user behaviour is an important problem that is broadly applicable across many application domains requiring human-robot interaction. In this work we show that it is possible to learn a generative model for distinct user behavioral types, extracted from human demonstrations, by enforcing clustering of preferred task solutions within the latent space. We use this model to differentiate between user types and to find cases with overlapping solutions. Moreover, we can alter an initially guessed solution to satisfy the preferences that constitute a particular user type by backpropagating through the learned differentiable model. An advantage of structuring generative models in this way is that it allows us to extract causal relationships between symbols that might form part of the user's specification of the task, as manifested in the demonstrations. We show that the proposed method is capable of correctly distinguishing between three user types, who differ in degrees of cautiousness in their motion, while performing the task of moving objects with a kinesthetically driven robot in a tabletop environment. Our method successfully identifies the correct type, within the specified time, in 99% [97.8 - 99.8] of the cases, which outperforms an IRL baseline. We also show that our proposed method correctly changes a default trajectory to one satisfying a particular user specification even with unseen objects. The resulting trajectory is shown to be directly implementable on a PR2 humanoid robot completing the same task.
Attacking Power Indices by Manipulating Player Reliability
Istrate, Gabriel, Bonchiş, Cosmin, Brînduşescu, Alin
We investigate the manipulation of power indices in TU-cooperative games by stimulating (subject to a budget constraint) changes in the propensity of other players to participate to the game. We display several algorithms that show that the problem is often tractable for so-called network centrality games and influence attribution games, as well as an example when optimal manipulation is intractable, even though computing power indices is feasible.
A Cooperative Multi-Agent Reinforcement Learning Framework for Resource Balancing in Complex Logistics Network
Li, Xihan, Zhang, Jia, Bian, Jiang, Tong, Yunhai, Liu, Tie-Yan
Resource balancing within complex transportation networks is one of the most important problems in real logistics domain. Traditional solutions on these problems leverage combinatorial optimization with demand and supply forecasting. However, the high complexity of transportation routes, severe uncertainty of future demand and supply, together with non-convex business constraints make it extremely challenging in the traditional resource management field. In this paper, we propose a novel sophisticated multi-agent reinforcement learning approach to address these challenges. In particular, inspired by the externalities especially the interactions among resource agents, we introduce an innovative cooperative mechanism for state and reward design resulting in more effective and efficient transportation. Extensive experiments on a simulated ocean transportation service demonstrate that our new approach can stimulate cooperation among agents and lead to much better performance. Compared with traditional solutions based on combinatorial optimization, our approach can give rise to a significant improvement in terms of both performance and stability.
Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents
Suarez, Joseph, Du, Yilun, Isola, Phillip, Mordatch, Igor
The emergence of complex life on Earth is often attributed to the arms race that ensued from a huge number of organisms all competing for finite resources. We present an artificial intelligence research environment, inspired by the human game genre of MMORPGs (Massively Multiplayer Online Role-Playing Games, a.k.a. MMOs), that aims to simulate this setting in microcosm. As with MMORPGs and the real world alike, our environment is persistent and supports a large and variable number of agents. Our environment is well suited to the study of large-scale multiagent interaction: it requires that agents learn robust combat and navigation policies in the presence of large populations attempting to do the same. Baseline experiments reveal that population size magnifies and incentivizes the development of skillful behaviors and results in agents that outcompete agents trained in smaller populations. We further show that the policies of agents with unshared weights naturally diverge to fill different niches in order to avoid competition.
After Mastering Go and StarCraft, DeepMind Takes on Soccer
Having notched impressive victories over human professionals in Go, Atari Games, and most recently StarCraft 2 -- Google's DeepMind team has now turned its formidable research efforts to soccer. In a paper released last week, the UK AI company demonstrates a novel machine learning method that trains a team of AI agents to play a simulated version of "the beautiful game." Gaming, AI and soccer fans hailed DeepMind's latest innovation on social media, with comments like "You should partner with EA Sports for a FIFA environment!" Machine learning, and particularly deep reinforcement learning, has in recent years achieved remarkable success across a wide range of competitive games. Collaborative-multi-agent games however remained a relatively difficult research domain.