Agents
Learning on a Budget via Teacher Imitation
Ilhan, Ercument, Gow, Jeremy, Perez-Liebana, Diego
Deep Reinforcement Learning (RL) techniques can benefit greatly from leveraging prior experience, which can be either self-generated or acquired from other entities. Action advising is a framework that provides a flexible way to transfer such knowledge in the form of actions between teacher-student peers. However, due to realistic concerns, the number of these interactions is limited by a budget; therefore, it is crucial to perform them at the most appropriate moments. Several promising recent studies address this problem setting, especially from the student's perspective. Despite their success, they have some shortcomings in practical applicability and in integrity as an overall solution to the learning-from-advice challenge. In this paper, we extend the idea of advice reusing via teacher imitation to construct a unified approach that addresses both the advice collection and the advice utilisation problems. Furthermore, we propose a method to automatically determine the relevant hyperparameters of these components on-the-fly, so that the approach can adapt to any task with minimal human intervention. The experiments we performed in 5 different Atari games verify that our algorithm can outperform its competitors by achieving state-of-the-art performance, and that its components also provide significant advantages individually.
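As a rough illustration of budgeted advice collection with reuse, the sketch below is an assumption-laden toy and not the authors' algorithm. The class `BudgetedAdvisee`, its `act` method, and the dictionary lookup that stands in for a learned teacher-imitation model are all hypothetical names introduced here.

```python
class BudgetedAdvisee:
    """Toy student that asks a teacher for actions under a fixed budget.

    Hypothetical sketch: a dictionary stands in for the learned
    teacher-imitation model described in the abstract.
    """

    def __init__(self, teacher, budget):
        self.teacher = teacher   # callable: state -> advised action
        self.budget = budget     # number of teacher queries still allowed
        self.memory = {}         # previously advised state -> action

    def act(self, state, own_action):
        # Reuse earlier advice for this state at no budget cost.
        if state in self.memory:
            return self.memory[state]
        # Otherwise spend one unit of budget on a fresh teacher query.
        if self.budget > 0:
            self.budget -= 1
            action = self.teacher(state)
            self.memory[state] = action
            return action
        # Budget exhausted and no stored advice: follow the student's policy.
        return own_action


student = BudgetedAdvisee(teacher=lambda s: s * 2, budget=1)
first = student.act(1, own_action=-1)    # fresh advice, budget drops to 0
second = student.act(1, own_action=-1)   # reused advice, no budget needed
third = student.act(2, own_action=-1)    # no advice available, own action
```

A real student would generalise across similar states with a trained imitation model and would decide when to ask using some criterion; the exact-match lookup here only shows the budget accounting.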
Newton Optimization on Helmholtz Decomposition for Continuous Games
Ramponi, Giorgia, Restelli, Marcello
Many learning problems involve multiple agents optimizing different interactive functions. In these problems, the standard policy gradient algorithms fail due to the non-stationarity of the setting and the different interests of each agent. In fact, algorithms must take into account the complex dynamics of these systems to guarantee rapid convergence towards a (local) Nash equilibrium. In this paper, we propose NOHD (Newton Optimization on Helmholtz Decomposition), a Newton-like algorithm for multi-agent learning problems based on the decomposition of the dynamics of the system into its irrotational (Potential) and solenoidal (Hamiltonian) components. This method ensures quadratic convergence in purely irrotational systems and in purely solenoidal systems. Furthermore, we show that NOHD is attracted to stable fixed points in general multi-agent systems and repelled by strict saddle ones. Finally, we empirically compare NOHD's performance with that of state-of-the-art algorithms on some bimatrix games and in a continuous Gridworld environment.
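To make the decomposition concrete, here is a minimal sketch that assumes, as is common in the differentiable-games literature, that the split is applied to the Jacobian of the simultaneous gradient: the symmetric part corresponds to the irrotational (potential) component and the antisymmetric part to the solenoidal (Hamiltonian) component. The function name `decompose` and the two toy games are illustrative assumptions; the NOHD update itself is not shown.

```python
def decompose(jacobian):
    """Split a square matrix into symmetric and antisymmetric parts.

    jacobian = sym + anti, with sym == sym^T and anti == -anti^T.
    """
    n = len(jacobian)
    sym = [[0.5 * (jacobian[i][j] + jacobian[j][i]) for j in range(n)]
           for i in range(n)]
    anti = [[0.5 * (jacobian[i][j] - jacobian[j][i]) for j in range(n)]
            for i in range(n)]
    return sym, anti


# Zero-sum bilinear game: player 1 minimises x*y, player 2 minimises -x*y.
# Simultaneous gradient is (y, -x), so its Jacobian is purely antisymmetric.
hamiltonian_sym, hamiltonian_anti = decompose([[0.0, 1.0], [-1.0, 0.0]])

# Potential game: both players minimise (x + y)**2 / 2.
# Simultaneous gradient is (x + y, x + y), so its Jacobian is symmetric.
potential_sym, potential_anti = decompose([[1.0, 1.0], [1.0, 1.0]])
```

In the first game the symmetric part vanishes and in the second the antisymmetric part vanishes, which matches the two pure cases named in the abstract.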
Drowned out by the noise: Evidence for Tracking-free Motion Prediction
Trabelsi, Ameni, Beveridge, Ross J., Blanchard, Nathaniel
Autonomous driving consists of a multitude of interacting modules, where each module must contend with errors from the others. Typically, the motion prediction module depends on a robust tracking system to capture each agent's past movement. In this work, we systematically explore the importance of the tracking module for the motion prediction task and ultimately conclude that the tracking module is detrimental to overall motion prediction performance when the module is imperfect (with as little as 1% error). We explicitly compare models that use tracking information to models that do not across multiple scenarios and conditions. We find that the tracking information only improves performance in noise-free conditions. A noise-free tracker is unlikely to remain noise-free in real-world scenarios, and the inevitable noise will subsequently negatively affect performance. We thus argue that future work should either be mindful of noise when developing and testing motion/tracking modules or do away with the tracking component entirely.
An expressiveness hierarchy of Behavior Trees and related architectures
Biggar, Oliver, Zamani, Mohammad, Shames, Iman
In this paper we provide a formal framework for comparing the expressive power of Behavior Trees (BTs) to other action selection architectures. Taking inspiration from the analogous comparisons of structural programming methodologies, we formalise the concept of 'expressiveness'. This leads us to an expressiveness hierarchy of control architectures, which includes BTs, Decision Trees (DTs), Teleo-reactive Programs (TRs) and Finite State Machines (FSMs). By distinguishing between BTs with auxiliary variables and those without, we demonstrate the existence of a trade-off in BT design between readability and expressiveness. We discuss what this means for BTs in practice.
Global Big Data Conference
As part of Microsoft's research into ways to use machine learning and AI to improve security defenses, the company has released an open source attack toolkit to let researchers create simulated network environments and see how they fare against attacks. Microsoft 365 Defender Research released CyberBattleSim, which creates a network simulation and models how threat actors can move laterally through the network looking for weak points. When building the attack simulation, enterprise defenders and researchers create various nodes on the network and indicate which services are running, which vulnerabilities are present, and what type of security controls are in place. Automated agents, representing threat actors, are deployed in the attack simulation to randomly execute actions as they try to take over the nodes. "The simulated attacker's goal is to take ownership of some portion of the network by exploiting these planted vulnerabilities. While the simulated attacker moves through the network, a defender agent watches the network activity to detect the presence of the attacker and contain the attack," the Microsoft 365 Defender Research Team wrote in a post discussing the project.
On the Importance of Trust in Next-Generation Networked CPS Systems: An AI Perspective
Gholami, Anousheh, Torkzaban, Nariman, Baras, John S.
With the increasing scale, complexity, and heterogeneity of next-generation networked systems, seamless control, management, and security of such systems become increasingly challenging. Many diverse applications have driven interest in networked systems, including large-scale distributed learning, multi-agent optimization, 5G service provisioning, and network slicing. In this paper, we propose trust as a measure to evaluate the status of network agents and improve the decision-making process. We interpret trust as a relation among entities that participate in various protocols. Trust relations are based on evidence created by the interactions of entities within a protocol and may be a composite of multiple metrics, such as availability, reliability, and resilience, depending on the application context. We first elaborate on the importance of trust as a metric and then present a mathematical framework for trust computation and aggregation within a network. Then we show how trust can be integrated into network decision-making processes in practice by presenting two examples. In the first example, we show how utilizing the trust evidence can improve the performance and the security of Federated Learning. In the second, we show how a 5G network resource provisioning framework can be improved when augmented with a trust-aware decision-making scheme. We verify the validity of our trust-based approach through simulations. Finally, we explain the challenges associated with aggregating the trust evidence and briefly explain our ideas to tackle them.
Joint Attention for Multi-Agent Coordination and Social Learning
Lee, Dennis, Jaques, Natasha, Kew, Chase, Eck, Douglas, Schuurmans, Dale, Faust, Aleksandra
Joint attention, the ability to purposefully coordinate attention with another agent and mutually attend to the same thing, is a critical component of human social cognition. In this paper, we ask whether joint attention can be useful as a mechanism for improving multi-agent coordination and social learning. We first develop deep reinforcement learning (RL) agents with a recurrent visual attention architecture. We then train agents to minimize the difference between the attention weights that they apply to the environment at each timestep and the attention of other agents. Our results show that this joint attention incentive improves agents' ability to solve difficult coordination tasks by reducing the exponential cost of exploring the joint multi-agent action space. Joint attention leads to higher performance than a competitive centralized critic baseline across multiple environments. Further, we show that joint attention enhances agents' ability to learn from experts present in their environment, even when completing hard exploration tasks that do not require coordination. Taken together, these findings suggest that joint attention may be a useful inductive bias for multi-agent learning.
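One way to picture the joint attention incentive is as a penalty on how far an agent's attention weights are from those of the other agents. The sketch below is only an illustration: the abstract does not say which difference measure is used, so the squared Euclidean distance, the averaging over agents, and the function name `joint_attention_penalty` are assumptions made here.

```python
def joint_attention_penalty(own_weights, other_agents_weights):
    """Mean squared-L2 distance between one agent's attention weights
    and the attention weights of each other agent.

    Assumption: squared Euclidean distance is a stand-in for whatever
    difference measure the authors actually use.
    """
    if not other_agents_weights:
        return 0.0
    total = 0.0
    for other in other_agents_weights:
        total += sum((a - b) ** 2 for a, b in zip(own_weights, other))
    return total / len(other_agents_weights)


# Two agents attending to the same location incur no penalty.
aligned = joint_attention_penalty([0.5, 0.5], [[0.5, 0.5]])
# Two agents attending to opposite locations incur the maximum penalty
# possible for two-element attention distributions.
opposed = joint_attention_penalty([1.0, 0.0], [[0.0, 1.0]])
```

In training, a term like this would be subtracted from the reward or added to the loss with a weighting coefficient, which is a further design choice not specified in the abstract.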
The Effect of Efficient Messaging and Input Variability on Neural-Agent Iterated Language Learning
Lian, Yuchen, Bisazza, Arianna, Verhoef, Tessa
Natural languages commonly display a trade-off among different strategies to convey constituent roles. A similar trade-off, however, has not been observed in recent simulations of iterated language learning with neural network based agents (Chaabouni et al., 2019b). In this work, we re-evaluate this result in the light of two important factors, namely: the lack of effort-based pressure in the agents and the lack of variability in the initial input language.
Multitasking Inhibits Semantic Drift
Jacob, Athul Paul, Lewis, Mike, Andreas, Jacob
When intelligent agents communicate to accomplish shared goals, how do these goals shape the agents' language? We study the dynamics of learning in latent language policies (LLPs), in which instructor agents generate natural-language subgoal descriptions and executor agents map these descriptions to low-level actions. LLPs can solve challenging long-horizon reinforcement learning problems and provide a rich model for studying task-oriented language use. But previous work has found that LLP training is prone to semantic drift (use of messages in ways inconsistent with their original natural language meanings). Here, we demonstrate theoretically and empirically that multitask training is an effective counter to this problem: we prove that multitask training eliminates semantic drift in a well-studied family of signaling games, and show that multitask training of neural LLPs in a complex strategy game reduces drift while improving sample efficiency.
GridToPix: Training Embodied Agents with Minimal Supervision
Jain, Unnat, Liu, Iou-Jen, Lazebnik, Svetlana, Kembhavi, Aniruddha, Weihs, Luca, Schwing, Alexander
While deep reinforcement learning (RL) promises freedom from hand-labeled data, great successes, especially for Embodied AI, require significant work to create supervision via carefully shaped rewards. Indeed, without shaped rewards, i.e., with only terminal rewards, present-day Embodied AI results degrade significantly across Embodied AI problems from single-agent Habitat-based PointGoal Navigation (SPL drops from 55 to 0) and two-agent AI2-THOR-based Furniture Moving (success drops from 58% to 1%) to three-agent Google Football-based 3 vs. 1 with Keeper (game score drops from 0.6 to 0.1). As training from shaped rewards doesn't scale to more realistic tasks, the community needs to improve the success of training with terminal rewards. For this we propose GridToPix: 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., they are independent of the task; 2) distill the learned policy into agents that reside in complex visual worlds. Despite learning from only terminal rewards with identical models and RL algorithms, GridToPix significantly improves results across tasks: from PointGoal Navigation (SPL improves from 0 to 64) and Furniture Moving (success improves from 1% to 25%) to football gameplay (game score improves from 0.1 to 0.6). GridToPix even helps to improve the results of shaped reward training.