networked system
Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
It has long been recognized that multi-agent reinforcement learning (MARL) faces significant scalability issues due to the fact that the size of the state and action spaces are exponentially large in the number of agents. In this paper, we identify a rich class of networked MARL problems where the model exhibits a local dependence structure that allows it to be solved in a scalable manner. Specifically, we propose a Scalable Actor-Critic (SAC) method that can learn a near optimal localized policy for optimizing the average reward with complexity scaling with the state-action space size of local neighborhoods, as opposed to the entire network. Our result centers around identifying and exploiting an exponential decay property that ensures the effect of agents on each other decays exponentially fast in their graph distance.
Transformer-Based Scalable Multi-Agent Reinforcement Learning for Networked Systems with Long-Range Interactions
Sinha, Vidur, Ustaomeroglu, Muhammed, Qu, Guannan
Multi-agent reinforcement learning (MARL) has shown promise for large-scale network control, yet existing methods face two major limitations. First, they typically rely on assumptions leading to decay properties of local agent interactions, limiting their ability to capture long-range dependencies such as cascading power failures or epidemic outbreaks. Second, most approaches lack generalizability across network topologies, requiring retraining when applied to new graphs. We introduce STACCA (Shared Transformer Actor-Critic with Counterfactual Advantage), a unified transformer-based MARL framework that addresses both challenges. STACCA employs a centralized Graph Transformer Critic to model long-range dependencies and provide system-level feedback, while its shared Graph Transformer Actor learns a generalizable policy capable of adapting across diverse network structures. Further, to improve credit assignment during training, STACCA integrates a novel counterfactual advantage estimator that is compatible with state-value critic estimates. We evaluate STACCA on epidemic containment and rumor-spreading network control tasks, demonstrating improved performance, network generalization, and scalability. These results highlight the potential of transformer-based MARL architectures to achieve scalable and generalizable control in large-scale networked systems.
- Energy > Power Industry (0.46)
- Health & Medicine > Therapeutic Area > Immunology (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- (2 more...)
Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems
Liang, Hao, Shi, Shuqing, Zhang, Yudi, Huang, Biwei, Du, Yali
Large-scale networked systems, such as traffic, power, and wireless grids, challenge reinforcement-learning agents with both scale and environment shifts. To address these challenges, we propose GSAC (Generalizable and Scalable Actor-Critic), a framework that couples causal representation learning with meta actor-critic learning to achieve both scalability and domain generalization. Each agent first learns a sparse local causal mask that provably identifies the minimal neighborhood variables influencing its dynamics, yielding exponentially tight approximately compact representations (ACRs) of state and domain factors. These ACRs bound the error of truncating value functions to $κ$-hop neighborhoods, enabling efficient learning on graphs. A meta actor-critic then trains a shared policy across multiple source domains while conditioning on the compact domain factors; at test time, a few trajectories suffice to estimate the new domain factor and deploy the adapted policy. We establish finite-sample guarantees on causal recovery, actor-critic convergence, and adaptation gap, and show that GSAC adapts rapidly and significantly outperforms learning-from-scratch and conventional adaptation baselines.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Transportation (0.67)
- Information Technology (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- North America > United States > California > Los Angeles County > Pasadena (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
- Research Report (0.46)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Communications > Networks (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Graph Neural Network-Based Distributed Optimal Control for Linear Networked Systems: An Online Distributed Training Approach
Song, Zihao, Welikala, Shirantha, Antsaklis, Panos J., Lin, Hai
In this paper, we consider the distributed optimal control problem for discrete-time linear networked systems. In particular, we are interested in learning distributed optimal controllers using graph recurrent neural networks (GRNNs). Most of the existing approaches result in centralized optimal controllers with offline training processes. However, as the increasing demand of network resilience, the optimal controllers are further expected to be distributed, and are desirable to be trained in an online distributed fashion, which are also the main contributions of our work. To solve this problem, we first propose a GRNN-based distributed optimal control method, and we cast the problem as a self-supervised learning problem. Then, the distributed online training is achieved via distributed gradient computation, and inspired by the (consensus-based) distributed optimization idea, a distributed online training optimizer is designed. Furthermore, the local closed-loop stability of the linear networked system under our proposed GRNN-based controller is provided by assuming that the nonlinear activation function of the GRNN-based controller is both local sector-bounded and slope-restricted. The effectiveness of our proposed method is illustrated by numerical simulations using a specifically developed simulator.
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
- Research Report (0.64)
- Instructional Material > Online (0.41)
- Energy (1.00)
- Education > Educational Setting > Online (0.95)
Towards a Multi-Agent Simulation of Cyber-attackers and Cyber-defenders Battles
Soulé, Julien, Jamont, Jean-Paul, Occello, Michel, Théron, Paul, Traonouez, Louis-Marie
As cyber-attacks show to be more and more complex and coordinated, cyber-defenders strategy through multi-agent approaches could be key to tackle against cyber-attacks as close as entry points in a networked system. This paper presents a Markovian modeling and implementation through a simulator of fighting cyber-attacker agents and cyber-defender agents deployed on host network nodes. It aims to provide an experimental framework to implement realistically based coordinated cyber-attack scenarios while assessing cyber-defenders dynamic organizations. We abstracted network nodes by sets of properties including agents' ones. Actions applied by agents model how the network reacts depending in a given state and what properties are to change. Collective choice of the actions brings the whole environment closer or farther from respective cyber-attackers and cyber-defenders goals. Using the simulator, we implemented a realistically inspired scenario with several behavior implementation approaches for cyber-defenders and cyber-attackers.
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Review for NeurIPS paper: Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
Strengths: The novelty of the paper is to provide a scalable learning method for average reward settings with guarantee of small performance loss. The work is relevant to a number of real world applications such as social networks, communication networks, transportation networks etc. Following are the highlights of the paper - The problem formulation is clear and despite having so many variables in the proofs, the mathematical notations are wisely chosen and are unambiguous. The proofs appears to be correct and I liked the way few assumptions have been used to provide theoretical guarantees. Overall I think the paper should be accepted for publication.
Review for NeurIPS paper: Scalable Multi-Agent Reinforcement Learning for Networked Systems with Average Reward
The paper has been extensively discussed and reviewers agree the paper has merit and the rebuttal brings a lot of clarification on a number of questions identified by the reviewers (e.g. the difference between underlying framework of the proposed method and that of mean field RL). General consensus is to propose acceptance of the paper; reviewers would like the authors to clarify the following in the paper though: In difference to their claim, Theorem 2 does not really depend on Theorem 1, as it only assumes the exponential decay property, which Theorem 1 only widens.
Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems
Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. In this paper, we show how to factorize large networked systems of many agents into multiple local regions such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local regions exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.