Zhao, Zhongyuan
Fully Distributed Online Training of Graph Neural Networks in Networked Systems
Olshevskyi, Rostyslav, Zhao, Zhongyuan, Chan, Kevin, Verma, Gunjan, Swami, Ananthram, Segarra, Santiago
Graph neural networks (GNNs) are powerful tools for developing scalable, decentralized artificial intelligence in large-scale networked systems, such as wireless networks, power grids, and transportation networks. Currently, GNNs in networked systems mostly follow a paradigm of "centralized training, distributed execution", which limits their adaptability and slows down their development cycles. In this work, we fill this gap by developing, for the first time, a communication-efficient, fully distributed online training approach for GNNs applied to large networked systems. For a mini-batch with $B$ samples, our approach of training an $L$-layer GNN only adds $L$ rounds of message passing to the $LB$ rounds required by GNN inference, with doubled message sizes. Through numerical experiments in graph-based node regression, power allocation, and link scheduling in wireless networks, we demonstrate the effectiveness of our approach in training GNNs under supervised, unsupervised, and reinforcement learning paradigms.
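The per-layer communication cost described above can be illustrated with a minimal sketch: in a message-passing GNN, each layer of inference requires exactly one round in which every node exchanges its current embedding with its neighbors, so an $L$-layer forward pass costs $L$ rounds per sample. The toy graph, features, and mean aggregation below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy 4-node line graph (hypothetical): each node stores only its own
# feature vector and can exchange messages with direct neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = {v: np.random.randn(3) for v in neighbors}  # local node features
W = [np.random.randn(3, 3) for _ in range(2)]   # weights for L = 2 layers

def message_passing_round(h):
    """One communication round: every node sends h[v] to its neighbors
    and aggregates what it receives (mean over neighbors and itself)."""
    return {v: np.mean([h[u] for u in neighbors[v]] + [h[v]], axis=0)
            for v in neighbors}

h = x
for Wl in W:  # L layers of inference -> L communication rounds
    agg = message_passing_round(h)
    h = {v: np.tanh(agg[v] @ Wl) for v in agg}
```

Distributed backpropagation would mirror this pattern with additional gradient-carrying rounds, which is where the extra $L$ rounds (and doubled message sizes) in the abstract come from.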
Biased Backpressure Routing Using Link Features and Graph Neural Networks
Zhao, Zhongyuan, Radojičić, Bojan, Verma, Gunjan, Swami, Ananthram, Segarra, Santiago
To reduce the latency of Backpressure (BP) routing in wireless multi-hop networks, we propose to enhance the existing shortest path-biased BP (SP-BP) and sojourn time-based backlog metrics, since they introduce no additional per-time-step signaling overhead to the basic BP. Rather than relying on hop distance, we introduce a new edge-weighted shortest path bias built on the scheduling duty cycle of wireless links, which can be predicted by a graph convolutional neural network based on the topology and traffic of wireless networks. Additionally, we tackle three long-standing challenges associated with SP-BP: optimal bias scaling, efficient bias maintenance, and integration of delay awareness. Our proposed solutions inherit the throughput optimality of the basic BP, as well as its practical advantages of low complexity and fully distributed implementation. Our approaches rely on common link features and introduce only a one-time constant overhead relative to previous SP-BP schemes, or a one-time overhead linear in the network size relative to the basic BP. Numerical experiments show that our solutions can effectively address the major drawbacks of slow startup, random walk, and the last packet problem in basic BP, improving the end-to-end delay of existing low-overhead BP algorithms under various settings of network traffic, interference, and mobility.
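The biased BP decision described above can be sketched in a few lines: for each link, the transmitter compares bias-augmented queue backlogs per commodity and serves the commodity with the largest differential. The queue and bias values below are made-up numbers for illustration, not from the paper.

```python
# Hypothetical queue backlogs Q[node][commodity] and per-node shortest-path
# biases B[node][commodity]; all values are made up for illustration.
Q = {"i": {"c1": 8, "c2": 3}, "j": {"c1": 2, "c2": 5}}
B = {"i": {"c1": 4.0, "c2": 1.0}, "j": {"c1": 1.5, "c2": 2.5}}

def biased_bp_link_weight(i, j):
    """Per-link decision in biased BP: the selected commodity maximizes the
    bias-augmented backlog differential across link (i, j); the link weight
    used by the scheduler is that differential clipped at zero."""
    diffs = {c: (Q[i][c] + B[i][c]) - (Q[j][c] + B[j][c]) for c in Q[i]}
    best = max(diffs, key=diffs.get)
    return best, max(diffs[best], 0.0)

commodity, weight = biased_bp_link_weight("i", "j")  # -> ("c1", 8.5)
```

In SP-BP the bias $B$ is a shortest-path distance; the abstract's contribution is to compute those distances over edge weights derived from predicted link duty cycles rather than hop counts.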
UniSparse: An Intermediate Language for General Sparse Format Customization
Liu, Jie, Zhao, Zhongyuan, Ding, Zijian, Brock, Benjamin, Rong, Hongbo, Zhang, Zhiru
The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound. These formats facilitate optimized software/hardware implementations by utilizing sparsity pattern- or target-aware data structures and layouts to enhance memory access latency and bandwidth utilization. However, existing sparse tensor programming models and compilers offer little or no support for productively customizing the sparse formats. Additionally, because these frameworks represent formats using a limited set of per-dimension attributes, they lack the flexibility to accommodate numerous new variations of custom sparse data structures and layouts. To overcome this deficiency, we propose UniSparse, an intermediate language that provides a unified abstraction for representing and customizing sparse formats. Unlike the existing attribute-based frameworks, UniSparse decouples the logical representation of the sparse tensor (i.e., the data structure) from its low-level memory layout, enabling the customization of both. As a result, a rich set of format customizations can be succinctly expressed in a small set of well-defined query, mutation, and layout primitives. We also develop a compiler leveraging the MLIR infrastructure, which supports adaptive customization of formats, and automatic code generation of format conversion and compute operations for heterogeneous architectures. We demonstrate the efficacy of our approach through experiments running commonly-used sparse linear algebra operations with specialized formats on multiple different hardware targets, including an Intel CPU, an NVIDIA GPU, an AMD Xilinx FPGA, and a simulated processing-in-memory (PIM) device.
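The separation of a sparse tensor's logical structure from its memory layout can be made concrete with a plain-Python sketch of one common format conversion, COO to CSR. This is an illustration of the kind of conversion such a compiler would generate, not UniSparse syntax or its primitives.

```python
def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triples (assumed sorted in row-major order) into the
    three CSR arrays: row pointers, column indices, and values."""
    indptr = [0] * (n_rows + 1)
    for r in rows:                 # count nonzeros per row
        indptr[r + 1] += 1
    for i in range(n_rows):        # prefix-sum into row pointers
        indptr[i + 1] += indptr[i]
    return indptr, list(cols), list(vals)

# 3x3 matrix with nonzeros at (0,0)=1, (0,2)=2, (2,1)=3
indptr, indices, data = coo_to_csr([0, 0, 2], [0, 2, 1], [1, 2, 3], 3)
# -> indptr [0, 2, 2, 3], indices [0, 2, 1], data [1, 2, 3]
```

The same logical nonzero structure (the COO triples) admits many physical layouts (CSR, blocked, bit-packed), which is precisely the decoupling the abstract describes.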
Congestion-aware Distributed Task Offloading in Wireless Multi-hop Networks Using Graph Neural Networks
Zhao, Zhongyuan, Perazzone, Jake, Verma, Gunjan, Segarra, Santiago
Computational offloading has become an enabling component for edge intelligence in mobile and smart devices. [...] To fill this gap, we propose a low-overhead, congestion-aware distributed task offloading scheme by augmenting a distributed greedy framework with graph-based machine learning.
[Figure 1: Challenges in distributed multi-hop offloading: (a) probing: nodes 1 and 2 query the communication and computing bandwidth ...]
1. INTRODUCTION
The proliferation of mobile and smart devices enables the collection of rich sensory data from both physical and cyber spaces, leading to many exciting applications, such as connected vehicles, drone/robot swarms, software-defined networks (SDN), and Internet-of-Things (IoT) [1-4]. To support these applications, wireless multi-hop networks, which have traditionally been used for military communications, disaster relief, and sensor networks, are now envisioned [...]. For offloading in wireless multi-hop networks [17-22], a centralized scheduler with global knowledge [...]. However, centralized multi-hop offloading has the drawbacks of single-point-of-failure and poor scalability, due to the high communication overhead of collecting the full network state at a dedicated scheduler. Distributed multi-hop offloading based on pricing [18,21] and learning [22] focuses only on the capacity of servers, while ignoring the potential network congestion caused by offloading [19], as illustrated by the motivating example in Figure 1.
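A distributed greedy offloading rule of the kind described here can be sketched as a local cost comparison at each device: offload only if the best server beats local execution, where the network-delay term is where a congestion-aware (e.g., graph-based learned) estimate would plug in. All costs and server names below are made up for illustration.

```python
# Hypothetical candidate servers for one task at one device: each entry
# holds an estimated compute time and an estimated network delay along
# the multi-hop route to that server (all numbers made up).
local_cost = 10.0
servers = {
    "s1": {"compute": 2.0, "net_delay": 3.0},
    "s2": {"compute": 4.0, "net_delay": 1.5},
}

def greedy_offload(local_cost, servers):
    """Distributed greedy rule: pick the cheapest server, but offload only
    if it beats local execution. A congestion-aware scheme replaces the
    naive net_delay estimate with one that accounts for traffic from
    other devices on shared links."""
    cost = lambda s: servers[s]["compute"] + servers[s]["net_delay"]
    best = min(servers, key=cost)
    return best if cost(best) < local_cost else "local"

choice = greedy_offload(local_cost, servers)  # -> "s1" (cost 5.0 < 10.0)
```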
Delay-aware Backpressure Routing Using Graph Neural Networks
Zhao, Zhongyuan, Radojičić, Bojan, Verma, Gunjan, Swami, Ananthram, Segarra, Santiago
We propose a throughput-optimal biased backpressure (BP) algorithm for routing, where the bias is learned through a graph neural network that seeks to minimize end-to-end delay. Classical BP routing provides a simple yet powerful distributed solution for resource allocation in wireless multi-hop networks but has poor delay performance. A low-cost approach to improve this delay performance is to favor shorter paths by incorporating pre-defined biases in the BP computation, such as a bias based on the shortest path (hop) distance to the destination. In this work, we improve upon the widely-used metric of hop distance (and its variants) for the shortest path bias by introducing a bias based on the link duty cycle, which we predict using a graph convolutional neural network. Numerical results show that our approach can improve the delay performance compared to classical BP and existing BP alternatives based on pre-defined bias while being adaptive to interference density. In terms of complexity, our distributed implementation only introduces a one-time overhead (linear in the number of devices in the network) compared to classical BP, and a constant overhead compared to the lowest-complexity existing bias-based BP algorithms.
Link Scheduling using Graph Neural Networks
Zhao, Zhongyuan, Verma, Gunjan, Rao, Chirag, Swami, Ananthram, Segarra, Santiago
Efficient scheduling of transmissions is a key problem in wireless networks. The main challenge stems from the fact that optimal link scheduling involves solving a maximum weighted independent set (MWIS) problem, which is known to be NP-hard. In practical schedulers, centralized and distributed greedy heuristics are commonly used to approximately solve the MWIS problem. However, most of these greedy heuristics ignore important topological information of the wireless network. To overcome this limitation, we propose fast heuristics based on graph convolutional networks (GCNs) that can be implemented in centralized and distributed manners. Our centralized heuristic is based on tree search guided by a GCN and 1-step rollout. In our distributed MWIS solver, a GCN generates topology-aware node embeddings that are combined with per-link utilities before invoking a distributed greedy solver. Moreover, a novel reinforcement learning scheme is developed to train the GCN in a non-differentiable pipeline. Test results on medium-sized wireless networks show that our centralized heuristic can reach a near-optimal solution quickly, and our distributed heuristic based on a shallow GCN can reduce by nearly half the suboptimality gap of the distributed greedy solver with minimal increase in complexity. The proposed schedulers also exhibit good generalizability across graph and weight distributions.
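The distributed pipeline described above feeds GCN-adjusted per-link utilities into a greedy MWIS solver; the greedy baseline itself can be sketched directly. The conflict graph and weights below are made up for illustration, and a real distributed implementation would make these picks via local message exchanges rather than a global loop.

```python
def greedy_mwis(weights, adj):
    """Greedy MWIS heuristic: repeatedly select the heaviest remaining
    vertex, then remove it and all of its conflict-graph neighbors."""
    remaining = set(weights)
    chosen = []
    while remaining:
        v = max(remaining, key=lambda u: weights[u])
        chosen.append(v)
        remaining -= {v} | adj[v]
    return chosen

# Hypothetical 4-link conflict graph (a path): vertices are wireless links,
# edges mark pairs of links that interfere; weights are per-slot utilities.
weights = {0: 3.0, 1: 5.0, 2: 4.0, 3: 2.0}
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
schedule = greedy_mwis(weights, adj)  # -> [1, 3]
```

In the abstract's scheme, the GCN reshapes `weights` with topology-aware embeddings before this greedy step, which is how a shallow network can shrink the greedy solver's suboptimality gap at little extra cost.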