Agents
Efficient UAV Trajectory-Planning using Economic Reinforcement Learning
Khalil, Alvi Ataur, Byrne, Alexander J, Rahman, Mohammad Ashiqur, Manshaei, Mohammad Hossein
Advances in unmanned aerial vehicle (UAV) design have opened up applications as varied as surveillance, firefighting, cellular networks, and delivery. Additionally, due to decreases in cost, systems employing fleets of UAVs have become popular. The uniqueness of UAVs in systems creates a novel set of trajectory or path planning and coordination problems. Environments include many more points of interest (POIs) than UAVs, with obstacles and no-fly zones. Our system is built on economic theory, in particular an auction mechanism in which UAVs trade assigned POIs. We formulate the path planning problem as a multi-agent economic game, where agents can cooperate and compete for resources. We then translate the problem into a partially observable Markov decision process (POMDP), which is solved using a reinforcement learning (RL) model deployed on each agent. Because the system computes task distributions via UAV cooperation, it is highly resilient to changes in the swarm size. Our proposed network and economic game architecture can effectively coordinate the swarm as an emergent phenomenon while maintaining the swarm's operation. Unmanned aerial vehicles (UAVs) are applicable to a wide-ranging set of problems such as firefighting, security monitoring, agriculture, edge computing, 3D mapping, and network support [1]. Firefighting problems center on finding and tracking fires, whereas security applications focus on monitoring and finding targets. Agricultural problems, on the other hand, center on field monitoring and data harvesting, while edge computing and network support focus on data harvesting and load reaction. All of these problems can be abstracted to a set of partially observed points that must be reached in the shortest possible time, with some task then carried out in the vicinity of each point.
Swarm surveillance missions are essential in both civilian and military contexts, where solutions must be secure, reliable, and efficient.
Formal Methods for An Iterated Volunteer's Dilemma
Dineen, Jacob, Haque, A S M Ahsan-Ul, Bielskas, Matthew
We propose an iterated version of the Volunteer's Dilemma game, modeled in the PRISM Model Checker (PRISM henceforth). This is useful because the software makes it easy to tune game parameters and build intuition about the game dynamics, showing which parameter changes correlate with changes in each player's expected reward. Additionally, PRISM can provide a probabilistic graph that reflects an optimal (or approximately optimal) strategy. Previous work [2] defines the public goods game as a concurrent stochastic game, evaluating optimal strategies under a fixed set of parameters that determine the length of the game and the scaling factor associated with resource distribution.
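To make the parameter-tuning idea concrete, here is a minimal Python sketch of the one-shot Volunteer's Dilemma. It is not the authors' PRISM model and does not cover the iterated version. It assumes the standard textbook payoffs: every player receives `benefit` if at least one player volunteers, and each volunteer pays `cost`. The function names and parameters are hypothetical, chosen only for illustration.

```python
def equilibrium_volunteer_probability(n_players, benefit, cost):
    """Symmetric mixed-strategy equilibrium of the one-shot game (assumed payoffs)."""
    if n_players < 2 or not 0 < cost < benefit:
        raise ValueError("need n_players >= 2 and 0 < cost < benefit")
    # A player is indifferent between volunteering and abstaining when
    # P(no other player volunteers) = cost / benefit.
    return 1.0 - (cost / benefit) ** (1.0 / (n_players - 1))


def expected_payoff(p_self, p_others, n_players, benefit, cost):
    """Expected payoff to one player who volunteers with probability p_self
    while every other player volunteers independently with probability p_others."""
    nobody_else = (1.0 - p_others) ** (n_players - 1)
    payoff_if_volunteer = benefit - cost  # the good is always produced
    payoff_if_abstain = benefit * (1.0 - nobody_else)
    return p_self * payoff_if_volunteer + (1.0 - p_self) * payoff_if_abstain


if __name__ == "__main__":
    # Sweep the group size to see how the equilibrium shifts.
    for n in (2, 3, 5, 10):
        p = equilibrium_volunteer_probability(n, benefit=1.0, cost=0.25)
        print(n, round(p, 3), round(expected_payoff(p, p, n, 1.0, 0.25), 3))
```

Under these assumptions, each player's equilibrium payoff stays at `benefit - cost` for any group size, while the individual probability of volunteering falls as the group grows.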
Inference-Based Deterministic Messaging For Multi-Agent Communication
Communication is essential for coordination among humans and animals. Therefore, with the introduction of intelligent agents into the world, agent-to-agent and agent-to-human communication becomes necessary. In this paper, we first study learning in matrix-based signaling games to empirically show that decentralized methods can converge to a suboptimal policy. We then propose a modification to the messaging policy, in which the sender deterministically chooses the best message that helps the receiver to infer the sender's observation. Using this modification, we see, empirically, that the agents converge to the optimal policy in nearly all the runs. We then apply this method to a partially observable gridworld environment which requires cooperation between two agents and show that, with appropriate approximation methods, the proposed sender modification can enhance existing decentralized training methods for more complex domains as well.
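One possible reading of the sender modification can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the sender keeps a hypothetical table `counts[state][message]` of how often it has sent each message in each state, assumes a uniform prior over states, and picks the message under which a Bayesian receiver would assign the highest posterior to the true state. All rows of the table are assumed to contain positive counts.

```python
def posterior(counts, state, message):
    """P(state | message) under a uniform prior over states (an assumption)."""
    # Likelihood of the message in each state, from the sender's own statistics.
    likelihoods = [row[message] / sum(row) for row in counts]
    total = sum(likelihoods)
    return likelihoods[state] / total if total > 0 else 0.0


def best_message(counts, state):
    """Deterministically choose the message that best identifies `state`."""
    n_messages = len(counts[state])
    return max(range(n_messages), key=lambda m: posterior(counts, state, m))


if __name__ == "__main__":
    # Rows are states, columns are messages (hypothetical usage counts).
    counts = [[8, 2],
              [5, 5]]
    for s in range(len(counts)):
        print("state", s, "-> message", best_message(counts, s))
```

In this example, state 1 uses both messages equally often, so choosing the most frequent message would be ambiguous. Choosing by posterior instead selects message 1, because message 0 is more strongly associated with state 0.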
Adversarial Environment Generation for Learning to Navigate the Web
Gur, Izzeddin, Jaques, Natasha, Malta, Kevin, Tiwari, Manoj, Lee, Honglak, Faust, Aleksandra
Learning to autonomously navigate the web is a difficult sequential decision making task. The state and action spaces are large and combinatorial in nature, and websites are dynamic environments consisting of several pages. One of the bottlenecks of training web navigation agents is providing a learnable curriculum of training environments that can cover the large variety of real-world websites. Therefore, we propose using Adversarial Environment Generation (AEG) to generate challenging web environments in which to train reinforcement learning (RL) agents. We provide a new benchmarking environment, gMiniWoB, which enables an RL adversary to use compositional primitives to learn to generate arbitrarily complex websites. To train the adversary, we propose a new technique for maximizing regret using the difference in the scores obtained by a pair of navigator agents. Our results show that our approach significantly outperforms prior methods for minimax regret AEG. The regret objective trains the adversary to design a curriculum of environments that are "just-the-right-challenge" for the navigator agents; our results show that over time, the adversary learns to generate increasingly complex web navigation tasks. The navigator agents trained with our technique learn to complete challenging, high-dimensional web navigation tasks, such as form filling and flight booking. We show that the navigator agent trained with our proposed Flexible b-PAIRED technique significantly outperforms competitive automatic curriculum generation baselines -- including a state-of-the-art RL web navigation approach -- on a set of challenging unseen test environments, and achieves more than 80% success rate on some tasks.
The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
Yu, Chao, Velu, Akash, Vinitsky, Eugene, Wang, Yu, Bayen, Alexandre, Wu, Yi
Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a multi-agent PPO variant which adopts a centralized value function. Using a 1-GPU desktop, we show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds: the Particle World environments, Starcraft II Micromanagement Tasks, and the Hanabi Challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves better or comparable sample complexity as well as substantially faster running time. Finally, we present 5 factors most influential to MAPPO's practical performance with ablation studies.
Sparse Training Theory for Scalable and Efficient Agents
Mocanu, Decebal Constantin, Mocanu, Elena, Pinto, Tiago, Curci, Selima, Nguyen, Phuong H., Gibescu, Madeleine, Ernst, Damien, Vale, Zita A.
A fundamental task for artificial intelligence is learning. Deep Neural Networks have proven to cope perfectly with all learning paradigms, i.e. supervised, unsupervised, and reinforcement learning. Nevertheless, traditional deep learning approaches make use of cloud computing facilities and do not scale well to autonomous agents with low computational resources. Even in the cloud, they suffer from computational and memory limitations, and they cannot be used to adequately model large physical worlds for agents, which would require networks with billions of neurons. These issues have been addressed in the last few years by the emerging topic of sparse training, which trains sparse networks from scratch. This paper discusses the state of the art in sparse training, its challenges, and its limitations, while introducing a couple of new theoretical research directions that have the potential to alleviate sparse training limitations and push deep learning scalability well beyond its current boundaries. The impact of these theoretical advancements in complex multi-agent settings is then discussed from a real-world perspective, using the smart grid case study.
JJ Watt signals he's made free-agent decision after long tenure with Texans
J.J. Watt has apparently found his new team: the Arizona Cardinals. Watt tweeted a picture of himself working out in a Cardinals shirt, signaling that he will join the team for the 2021 season. Watt agreed to a two-year deal worth $31 million, ESPN reported.
Coordination Among Neural Modules Through a Shared Global Workspace
Goyal, Anirudh, Didolkar, Aniket, Lamb, Alex, Badola, Kartikeya, Ke, Nan Rosemary, Rahaman, Nasim, Binas, Jonathan, Blundell, Charles, Mozer, Michael, Bengio, Yoshua
Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions; object-centric architectures make use of graph neural networks to model interactions among entities. However, pairwise interactions may not achieve global coordination or a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which the specialist modules communicate; because the communication bandwidth is limited, the modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.
Scaling up Mean Field Games with Online Mirror Descent
Perolat, Julien, Perrin, Sarah, Elie, Romuald, Laurière, Mathieu, Piliouras, Georgios, Geist, Matthieu, Tuyls, Karl, Pietquin, Olivier
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions of states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.
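As a rough illustration of the mirror-descent idea, the sketch below runs a discrete-time update with an entropic mirror map (accumulate payoffs, then apply softmax) on a hypothetical stateless crowd-aversion game. This toy setup is an assumption made for illustration and is not one of the paper's experiments: each action `a` pays `base[a]` minus the fraction of the population choosing `a`, which satisfies a monotonicity condition because crowded actions become less attractive. The step size and iteration count are likewise arbitrary choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reward(base, population):
    """Crowd-averse payoff (assumed form): intrinsic value minus congestion."""
    return [b - mu for b, mu in zip(base, population)]


def online_mirror_descent(base, learning_rate=0.5, n_iterations=500):
    """Accumulate payoffs in a score vector and play the softmax of the scores."""
    scores = [0.0] * len(base)
    policy = softmax(scores)
    for _ in range(n_iterations):
        # The population distribution is induced by the current policy.
        payoffs = reward(base, policy)
        scores = [y + learning_rate * r for y, r in zip(scores, payoffs)]
        policy = softmax(scores)
    return policy


if __name__ == "__main__":
    policy = online_mirror_descent([0.2, 0.0, 0.0])
    print([round(p, 4) for p in policy])
    print([round(r, 4) for r in reward([0.2, 0.0, 0.0], policy)])
```

In this toy game the policy settles near (7/15, 4/15, 4/15), the point at which all three actions yield the same payoff, so no player gains by deviating.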
Artificial Intelligence Can Help States Manage the Unemployment Crisis
From March 1 to April 4, 2020, the Illinois Department of Employment Security (IDES) received 513,173 unemployment claims -- more than the total number of claims filed in 2019. It was impossible for IDES employees to handle this volume, resulting in many disconnected phone calls and unanswered online queries. Gov. J.B. Pritzker called for increased call center capacity, in large part through the implementation of new technologies to help employees handle the volume of queries. Gov. Pritzker wanted to minimize dropped calls and deliver a response to all online queries so citizens could receive the benefits they needed. This new technology, virtual intelligent agents, relieved overburdened human agents of having to respond to every inquiry that came in.