
 Agents


Coordination in Adversarial Sequential Team Games via Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Many real-world applications involve teams of agents that have to coordinate their actions to reach a common goal against potential adversaries. This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and collusion in bidding. The possibility for the team members to communicate before gameplay (that is, to coordinate their strategies ex ante) makes the use of behavioral strategies unsatisfactory. We introduce Soft Team Actor-Critic (STAC) as a solution to the team's coordination problem that does not require any prior domain knowledge. STAC allows team members to effectively exploit ex ante communication via exogenous signals that are shared among the team. STAC reaches near-optimal coordinated strategies in both perfectly observable and partially observable games, where previous deep RL algorithms fail to reach optimal coordinated behaviors.
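Why a shared exogenous signal matters can be seen in a toy two-player team game, sketched below under our own assumptions (this is not STAC itself). The team wants both members to pick the same action while each member's marginal stays uniformly random, so an opponent cannot predict it. Independent randomization (behavioral strategies) miscoordinates half the time, while conditioning both members on one shared coin flip coordinates perfectly. The action names and policies are hypothetical.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("L", "R")  # hypothetical action set for each teammate


def joint_with_signal(signal_probs, policies):
    """Joint action distribution when every teammate conditions on one shared signal."""
    joint = {}
    for signal, p in signal_probs.items():
        acts = tuple(policy(signal) for policy in policies)
        joint[acts] = joint.get(acts, 0) + p
    return joint


def joint_independent(marginals):
    """Joint action distribution when teammates randomize independently (behavioral strategies)."""
    joint = {}
    for acts in product(ACTIONS, repeat=len(marginals)):
        p = 1
        for marginal, a in zip(marginals, acts):
            p *= marginal[a]
        joint[acts] = p
    return joint


def miscoordination(joint):
    """Probability that teammates choose different actions."""
    return sum(p for acts, p in joint.items() if len(set(acts)) > 1)


if __name__ == "__main__":
    half = Fraction(1, 2)
    signal = {0: half, 1: half}  # exogenous coin flip observed by the whole team
    follow_signal = [lambda s: ACTIONS[s], lambda s: ACTIONS[s]]
    print(miscoordination(joint_with_signal(signal, follow_signal)))
    print(miscoordination(joint_independent([{"L": half, "R": half}] * 2)))
```

In both cases each teammate plays L or R with probability 1/2, so only the correlation created by the shared signal distinguishes them.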


Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator

arXiv.org Machine Learning

Multi-agent reinforcement learning has been successfully applied to a number of challenging problems. Despite these empirical successes, theoretical understanding of different algorithms is lacking, primarily due to the curse of dimensionality caused by the exponential growth of the state-action space with the number of agents. We study a fundamental problem of multi-agent linear quadratic regulator in a setting where the agents are partially exchangeable. In this setting, we develop a hierarchical actor-critic algorithm, whose computational complexity is independent of the total number of agents, and prove its global linear convergence to the optimal policy. As linear quadratic regulators are often used to approximate general dynamic systems, this paper provides an important step towards a better understanding of general hierarchical mean-field multi-agent reinforcement learning.
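For intuition about natural-gradient convergence on an LQR, here is a minimal single-agent, scalar sketch (not the hierarchical multi-agent algorithm from the paper). It assumes dynamics x' = a·x + b·u, per-step cost q·x² + r·u², a linear policy u = −k·x, and the natural policy gradient update k ← k − 2η·E_k with E_k = (r + b²P_k)k − b·P_k·a, where P_k is the cost-to-go coefficient of policy k. The system parameters are illustrative; the result is compared against the gain from iterating the discrete Riccati equation.

```python
def cost_coefficient(a, b, q, r, k):
    """P_k such that the infinite-horizon cost from state x under u = -k x equals P_k * x**2."""
    closed_loop = a - b * k
    if abs(closed_loop) >= 1:
        raise ValueError("gain k does not stabilize the system")
    return (q + r * k * k) / (1 - closed_loop ** 2)


def natural_policy_gradient(a, b, q, r, k, lr=0.05, iters=500):
    """Natural policy gradient on the scalar LQR, starting from a stabilizing gain k."""
    for _ in range(iters):
        p = cost_coefficient(a, b, q, r, k)
        e = (r + b * b * p) * k - b * p * a  # scalar E_k; the natural gradient is 2 * E_k
        k -= lr * 2 * e
    return k


def riccati_gain(a, b, q, r, iters=1000):
    """Optimal gain obtained by iterating the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


if __name__ == "__main__":
    # Open-loop unstable system (a > 1); k = 1.0 is stabilizing since |1.2 - 1.0| < 1.
    print(natural_policy_gradient(1.2, 1.0, 1.0, 1.0, k=1.0))
    print(riccati_gain(1.2, 1.0, 1.0, 1.0))
```

Both calls should agree on the optimal gain; the learning rate is small enough here that every iterate stays inside the stabilizing region.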


Spatial Influence-aware Reinforcement Learning for Intelligent Transportation System

arXiv.org Artificial Intelligence

Intelligent transportation systems (ITSs) are envisioned to be crucial for smart cities: they aim to improve traffic flow, raising the quality of life of urban residents, and to reduce congestion, making commuting more efficient. However, several challenges must be resolved before such systems can be deployed. For example, conventional solutions for Markov decision processes (MDPs) and single-agent reinforcement learning (RL) algorithms scale poorly, and multi-agent systems suffer from poor communication and coordination. In this paper, we explore the potential of mutual information sharing, in other words, spatial-influence-based communication, to optimize traffic light control policies. First, we mathematically analyze the transportation system and conclude that it does not have a stationary Nash equilibrium, which makes reinforcement learning algorithms a suitable approach. Second, we describe how to build a multi-agent Deep Deterministic Policy Gradient (DDPG) system that incorporates spatial influence and social group utility. Third, we use a grid-topology road network to empirically demonstrate the scalability of the new system. We study three types of directed communication to show how the direction of social influence affects both the network-wide utility and individual utilities. Finally, we define a "selfish index" and analyze its effect on total group utility.
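The abstract does not define the "selfish index", so the sketch below shows one hypothetical way a selfishness weight could trade off an intersection's own reward against its neighbors' rewards on a grid road network. The 4-neighborhood, the averaging, and the `selfishness` parameter are all our assumptions, not the paper's definitions.

```python
def grid_neighbors(i, j, rows, cols):
    """4-connected neighbors of intersection (i, j) in a rows x cols grid (hypothetical topology)."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj


def blended_utilities(rewards, selfishness):
    """Mix each agent's own reward with the mean reward of its grid neighbors.

    selfishness = 1.0 keeps only the own reward; 0.0 keeps only the neighbors' mean.
    This weighting is an illustrative assumption, not the paper's "selfish index".
    """
    rows, cols = len(rewards), len(rewards[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            nbrs = [rewards[a][b] for a, b in grid_neighbors(i, j, rows, cols)]
            social = sum(nbrs) / len(nbrs) if nbrs else 0.0
            out[i][j] = selfishness * rewards[i][j] + (1 - selfishness) * social
    return out


if __name__ == "__main__":
    local_rewards = [[1.0, 2.0], [3.0, 4.0]]  # made-up per-intersection rewards
    print(blended_utilities(local_rewards, selfishness=0.5))
```

In a multi-agent DDPG setup, such blended utilities would replace the raw per-agent rewards during training; sweeping the weight is one way to study how selfishness affects group utility.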


Resolving Congestions in the Air Traffic Management Domain via Multiagent Reinforcement Learning Methods

arXiv.org Artificial Intelligence

In this article, we report on the efficiency and effectiveness of multiagent reinforcement learning (MARL) methods for computing flight delays that resolve congestion problems in the Air Traffic Management (ATM) domain. Specifically, we aim to resolve cases where demand for airspace use exceeds capacity (demand-capacity problems) by imposing ground delays on flights at the pre-tactical stage of operations (i.e., a few days to a few hours before operation). In the multiagent formulation, agents representing flights must decide on their own delays with respect to their own preferences, with no information about the others' payoffs, preferences, and constraints, while planning to execute their trajectories jointly with the others and adhering to operational constraints. We formalize the problem as a multiagent Markov Decision Process (MA-MDP) and show that it can be considered a Markov game in which interacting agents need to reach an equilibrium. The problem is made more interesting by the dynamic setting in which agents operate, which is partly due to the unforeseen, emergent effects of their decisions on the whole system. We propose collaborative multiagent reinforcement learning methods to resolve demand-capacity imbalances. An extensive experimental study on real-world cases shows the potential of the proposed approaches to resolve these problems, while advanced visualizations provide detailed views for understanding the quality of the solutions.
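To make the demand-capacity problem concrete, the sketch below detects overloaded periods in a single sector and resolves them with a first-come-first-served greedy ground-delay baseline. This is a plainly swapped-in heuristic for illustration, not the MARL methods proposed in the article; the single sector, discrete time periods, flight names, and capacity are all hypothetical.

```python
from collections import Counter


def overloaded_periods(entry_periods, capacity):
    """Periods in which more flights enter the (single, hypothetical) sector than its capacity allows."""
    demand = Counter(entry_periods.values())
    return sorted(t for t, n in demand.items() if n > capacity)


def greedy_ground_delays(entry_periods, capacity):
    """First-come-first-served baseline: delay each flight to the earliest period with spare capacity."""
    load = Counter()
    delays = {}
    for flight, t in sorted(entry_periods.items(), key=lambda kv: (kv[1], kv[0])):
        slot = t
        while load[slot] >= capacity:
            slot += 1
        load[slot] += 1
        delays[flight] = slot - t
    return delays


if __name__ == "__main__":
    planned = {"F1": 0, "F2": 0, "F3": 0, "F4": 1}  # flight -> planned sector-entry period
    print(overloaded_periods(planned, capacity=2))
    delays = greedy_ground_delays(planned, capacity=2)
    print(delays)
```

Unlike this centralized rule, the agents in the article decide on their own delays according to private preferences, which is why the problem is cast as a Markov game.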



A Stable Nuclear Future? The Impact of Autonomous Systems and Artificial Intelligence

arXiv.org Artificial Intelligence

The potential for advances in information-age technologies to undermine nuclear deterrence and influence the potential for nuclear escalation represents a critical question for international politics. One challenge is that uncertainty about the trajectory of technologies such as autonomous systems and artificial intelligence (AI) makes assessments difficult. This paper evaluates the relative impact of autonomous systems and artificial intelligence in three areas: nuclear command and control, nuclear delivery platforms and vehicles, and conventional applications of autonomous systems with consequences for nuclear stability. We argue that countries may be more likely to use risky forms of autonomy when they fear that their second-strike capabilities will be undermined. Additionally, the potential deployment of uninhabited, autonomous nuclear delivery platforms and vehicles could raise the prospect of accidents and miscalculation. Conventional military applications of autonomous systems could simultaneously influence nuclear force postures and first-strike stability in previously unanticipated ways. In particular, the need to fight at machine speed and the cognitive risk introduced by automation bias could increase the risk of unintended escalation. Finally, if used properly, more autonomous systems could have many applications in nuclear operations that increase reliability, reduce the risk of accidents, and buy more time for decision-makers in a crisis.


Reducing selfish routing inefficiencies using traffic lights

arXiv.org Artificial Intelligence

In this paper we equip congestion games with traffic lights, modelled as junction-based waiting cycles, thereby enabling more realistic route-planning strategies. Using the SUMO simulator, we show that our modelling choices coincide with realistic routing behaviours; in particular, drivers' route choices are based on the proportion of red-light time for their direction of travel. Drawing on the experimental results, we show that the effects of the notorious Braess' paradox can be avoided in theory and significantly reduced in practice by allocating appropriate traffic light cycles along a transport network.

1 Introduction

Congestion games are the standard framework of algorithmic game theory for studying the equilibria of traffic flows [Roughgarden, 2005]. They are non-cooperative games of perfect information in which self-interested actors choose sets of available resources, e.g., roads, and the cost of each resource depends on its overall usage. A well-known phenomenon in these games is Braess' paradox [Braess, 1968], i.e., the existence of traffic networks in which the total cost increases when the cost of an available resource strictly decreases. While Braess' paradox is an important mathematical result, its existence relies on rather constraining modelling assumptions, as congestion games abstract away a number of important features of real-world road networks. Notably, their cost functions assume no clashes between conflicting traffic flows, which in the real world are typically resolved by interdependent control mechanisms such as traffic lights.

Traffic lights are themselves an important object of research in Artificial Intelligence, as understanding their best configuration is paramount for the branch of AI concerned with optimising traffic [Chouhan and Banda, 2018; Laszka et al., 2016; Lopez et al., 2018; Pol and Oliehoek, 2016]. However, their effect on traffic flow equilibria is yet to be understood.
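Braess' paradox can be checked numerically on the standard textbook network (these numbers are a common illustration, not taken from this paper): 4000 drivers travel from S to E via A or B; links S→A and B→E cost (drivers on the link)/100, links A→E and S→B cost 45, and an optional shortcut A→B costs a constant waiting time. Treating a red-light phase on the shortcut as a fixed added wait is our crude hypothetical stand-in for the paper's junction-based waiting cycles.

```python
def path_costs(flows, shortcut_wait=None):
    """Travel time of each S-E path; shortcut_wait=None means the A->B link does not exist."""
    on_sa = flows["SAE"] + flows.get("SABE", 0)  # drivers using link S->A
    on_be = flows["SBE"] + flows.get("SABE", 0)  # drivers using link B->E
    t_sa, t_be = on_sa / 100, on_be / 100
    costs = {"SAE": t_sa + 45, "SBE": 45 + t_be}
    if shortcut_wait is not None:
        costs["SABE"] = t_sa + shortcut_wait + t_be
    return costs


def is_equilibrium(flows, costs, tol=1e-9):
    """Wardrop condition: every used path has minimal cost."""
    best = min(costs.values())
    return all(costs[p] <= best + tol for p, f in flows.items() if f > 0)


if __name__ == "__main__":
    split = {"SAE": 2000, "SBE": 2000, "SABE": 0}
    all_shortcut = {"SAE": 0, "SBE": 0, "SABE": 4000}
    print(path_costs(split))                      # no shortcut
    print(path_costs(all_shortcut, 0))            # free shortcut
    print(path_costs(split, shortcut_wait=25))    # shortcut with a long red phase
```

Without the shortcut, the even split is an equilibrium with cost 65 per driver; adding a free shortcut makes "everyone on S→A→B→E" the equilibrium at cost 80, which is worse for all drivers. A wait of 25 on the shortcut makes the original even split an equilibrium again.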


Training multi-agent AI systems to solve complex tasks through cooperation

#artificialintelligence

We present a novel approach to cooperative multi-agent reinforcement learning (RL) that assigns tasks to individual agents within a group, thereby improving the entire group's ability to collaborate. We tested this method in the real-time strategy game StarCraft: Brood War and found that our RL-trained model significantly outperformed computer-controlled players that relied on carefully tuned rule-based baselines. Perhaps most importantly, these gains carried over to matches with significantly larger armies than those included in our training scenarios. We're releasing the source code for this approach in our TorchCraftAI GitHub repository and detailing our results, which indicate that treating collaborative multi-agent RL as a dynamic assignment problem can produce groups of agents that generalize better to more complex situations. Our approach focuses on multi-agent collaborative (MAC) problems, in which agents must carry out multiple intermediate tasks in order to accomplish a larger one.
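To illustrate what "treating collaborative multi-agent RL as a dynamic assignment problem" can mean at a single decision step, the sketch below picks the agent-to-task assignment that maximizes the total score subject to per-task capacity limits. The scores are hand-written placeholders (in a learned approach they would come from a model), the capacity constraint is an assumption, and brute-force search stands in for whatever solver the released code actually uses.

```python
from itertools import product


def best_assignment(scores, capacity):
    """Exhaustively find the agent->task assignment maximizing total score.

    scores[i][t]: hypothetical value of sending agent i to task t.
    capacity[t]: maximum number of agents that task t may receive.
    Brute force is only practical for a handful of agents.
    """
    n_agents, n_tasks = len(scores), len(scores[0])
    best, best_val = None, float("-inf")
    for assign in product(range(n_tasks), repeat=n_agents):
        if any(assign.count(t) > capacity[t] for t in range(n_tasks)):
            continue  # violates a task's capacity
        val = sum(scores[i][t] for i, t in enumerate(assign))
        if val > best_val:
            best, best_val = assign, val
    return best, best_val


if __name__ == "__main__":
    scores = [[5, 1], [4, 3], [6, 2]]  # 3 agents, 2 tasks (e.g., enemy units to attack)
    print(best_assignment(scores, capacity=[2, 2]))
```

Here every agent individually prefers task 0, but the capacity limit forces a joint decision: the best assignment sends agent 1 to task 0's alternative, task 1, for a total score of 14. Re-solving the assignment at every step as scores change is what makes it dynamic.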


Are We Making Real Progress in Simulated Environments? Measuring the Sim2Real Gap in Embodied Visual Navigation

arXiv.org Artificial Intelligence

Does progress in simulation translate to progress in robotics? Specifically, if method A outperforms method B in simulation, how likely is the trend to hold in reality on a robot? We examine this question for embodied (PointGoal) navigation, developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity, and reveal surprising findings about prior work. First, we develop Habitat-PyRobot Bridge (HaPy), a library for seamless execution of identical code on a simulated agent and a physical robot. Habitat-to-Locobot transfer with HaPy requires changing just one line of configuration, essentially treating reality as just another simulator! Second, we investigate the sim2real predictivity of Habitat-Sim for PointGoal navigation. We 3D-scan a physical lab space to create a virtualized replica and run parallel tests of 9 different models in reality and in simulation. We present a new metric, the Sim-vs-Real Correlation Coefficient (SRCC), to quantify sim2real predictivity. Our analysis reveals several important findings. We find that the SRCC for Habitat as used in the CVPR19 challenge is low (0.18 for the success metric), which suggests that performance improvements in this simulator-based challenge would not transfer well to a physical robot. We find that this gap is largely due to AI agents learning to 'cheat' by exploiting simulator imperfections: specifically, the way Habitat allows for 'sliding' along walls on collision. Essentially, the virtual robot is capable of cutting corners, leading to unrealistic shortcuts through non-navigable spaces. Naturally, such exploits do not work in the real world, where the robot stops on contact with walls. Our experiments show that it is possible to optimize simulation parameters to enable robots trained in imperfect simulators to generalize learned skills to reality (e.g., improving $SRCC_{Succ}$ from 0.18 to 0.844).
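Assuming SRCC is the sample Pearson correlation between each model's performance in simulation and its performance on the real robot, it can be computed as sketched below. The score lists in the usage lines are made up for illustration and are not the paper's measurements.

```python
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical success rates of the same models in simulation and in reality.
    sim_success = [0.90, 0.85, 0.70, 0.60]
    real_success = [0.40, 0.55, 0.35, 0.50]
    print(pearson(sim_success, real_success))  # an SRCC-style score for these made-up numbers
```

A value near 1 means that ranking models in simulation predicts their ranking on the robot; a value near 0 (like the reported 0.18) means simulator gains say little about real-world gains.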


Formal Verification of Debates in Argumentation Theory

arXiv.org Artificial Intelligence

Humans engage in informal debates on a daily basis. By expressing their opinions and ideas in an argumentative fashion, they are able to gain a deeper understanding of a given problem and, in some cases, find the best possible course of action towards resolving it. In this paper, we develop a methodology to verify debates formalised as abstract argumentation frameworks. We first present a translation from debates to transition systems. Such transition systems can model debates and represent their evolution over time using a finite set of states. We then formalise relevant debate properties using temporal and strategy logics. These formalisations, along with a debate transition system, allow us to verify whether a given debate satisfies certain properties. The verification process can be automated using model checkers. We therefore also measure their performance when verifying debates and use the results to discuss the feasibility of model checking debates.
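As a small illustration of the ingredients (not the paper's translation or an actual model checker), the sketch below computes the grounded extension of a Dung-style abstract argumentation framework and explores a toy debate transition system explicitly. States are the sets of arguments put forward so far, and the move rule (a party may add any unused argument that attacks one already on the table) is our hypothetical assumption. The checked property is that the topic is accepted under grounded semantics in every terminal state reachable from the opening.

```python
def grounded_extension(args, attacks):
    """Least fixed point of the characteristic function: arguments defended by the current set."""
    ext = set()
    while True:
        defended = {
            a for a in args
            if all(any((d, b) in attacks for d in ext)  # every attacker b is counter-attacked
                   for b in args if (b, a) in attacks)
        }
        if defended == ext:
            return ext
        ext = defended


def moves(state, pool, attacks):
    """Hypothetical debate rule: add any unused argument that attacks one already put forward."""
    return [state | {x} for x in pool - state if any((x, y) in attacks for y in state)]


def accepted_in_all_final_states(topic, pool, attacks):
    """Explicit-state search: is the topic grounded-accepted in every reachable terminal state?"""
    start = frozenset({topic})
    seen, frontier, finals = {start}, [start], []
    while frontier:
        state = frontier.pop()
        successors = moves(state, pool, attacks)
        if not successors:
            finals.append(state)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return all(topic in grounded_extension(s, attacks) for s in finals)


if __name__ == "__main__":
    # Toy debate: b attacks the topic a, and c rebuts b.
    print(accepted_in_all_final_states("a", {"a", "b", "c"}, {("b", "a"), ("c", "b")}))
    print(accepted_in_all_final_states("a", {"a", "b"}, {("b", "a")}))
```

Real model checkers express such properties in temporal or strategy logics and handle much larger state spaces symbolically; this exhaustive search only scales to tiny debates.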