q-routing
On the Role of AI in Managing Satellite Constellations: Insights from the ConstellAI Project
Stock, Gregory F., Fraire, Juan A., Hermanns, Holger, Mosiężny, Jędrzej, Al-Khazraji, Yusra, Molina, Julio Ramírez, Ntagiou, Evridiki V.
The rapid expansion of satellite constellations in near-Earth orbits presents significant challenges in satellite network management, requiring innovative approaches for efficient, scalable, and resilient operations. This paper explores the role of Artificial Intelligence (AI) in optimizing the operation of satellite mega-constellations, drawing from the ConstellAI project funded by the European Space Agency (ESA). A consortium comprising GMV GmbH, Saarland University, and Thales Alenia Space collaborates to develop AI-driven algorithms and demonstrates their effectiveness over traditional methods for two crucial operational challenges: data routing and resource allocation. In the routing use case, Reinforcement Learning (RL) is used to improve the end-to-end latency by learning from historical queuing latency, outperforming classical shortest path algorithms. For resource allocation, RL optimizes the scheduling of tasks across constellations, focussing on efficiently using limited resources such as battery and memory. Both use cases were tested for multiple satellite constellation configurations and operational scenarios, resembling the real-life spacecraft operations of communications and Earth observation satellites. This research demonstrates that RL not only competes with classical approaches but also offers enhanced flexibility, scalability, and generalizability in decision-making processes, which is crucial for the autonomous and intelligent management of satellite fleets. The findings of this activity suggest that AI can fundamentally alter the landscape of satellite constellation management by providing more adaptive, robust, and cost-effective solutions.
Multi-Agent Reinforcement Learning for Network Routing in Integrated Access Backhaul Networks
We investigate the problem of wireless routing in integrated access backhaul (IAB) networks consisting of fiber-connected and wireless base stations and multiple users. The physical constraints of these networks prevent the use of a central controller, and base stations have limited access to real-time network conditions. We aim to maximize packet arrival ratio while minimizing their latency, for this purpose, we formulate the problem as a multi-agent partially observed Markov decision process (POMDP). To solve this problem, we develop a Relational Advantage Actor Critic (Relational A2C) algorithm that uses Multi-Agent Reinforcement Learning (MARL) and information about similar destinations to derive a joint routing policy on a distributed basis. We present three training paradigms for this algorithm and demonstrate its ability to achieve near-centralized performance. Our results show that Relational A2C outperforms other reinforcement learning algorithms, leading to increased network efficiency and reduced selfish agent behavior. To the best of our knowledge, this work is the first to optimize routing strategy for IAB networks.
Toward Packet Routing with Fully-distributed Multi-agent Deep Reinforcement Learning
You, Xinyu, Li, Xuanjie, Xu, Yuedong, Feng, Hui, Zhao, Jin
Packet routing is one of the fundamental problems in computer networks in which a router determines the next-hop of each packet in the queue to get it as quickly as possible to its destination. Reinforcement learning has been introduced to design the autonomous packet routing policy namely Q-routing only using local information available to each router. However, the curse of dimensionality of Q-routing prohibits the more comprehensive representation of dynamic network states, thus limiting the potential benefit of reinforcement learning. Inspired by recent success of deep reinforcement learning (DRL), we embed deep neural networks in multi-agent Q-routing. Each router possesses an independent neural network that is trained without communicating with its neighbors and makes decision locally. Two multi-agent DRL-enabled routing algorithms are proposed: one simply replaces Q-table of vanilla Q-routing by a deep neural network, and the other further employs extra information including the past actions and the destinations of non-head of line packets. Our simulation manifests that the direct substitution of Q-table by a deep neural network may not yield minimal delivery delays because the neural network does not learn more from the same input. When more information is utilized, adaptive routing policy can converge and significantly reduce the packet delivery time.
Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control
Choi, Samuel P. M., Yeung, Dit-Yan
The controllers usually have no or only very little prior knowledge of the environment. While only local communication between controllers is allowed, the controllers must cooperate among themselves to achieve the common, global objective. Finding the optimal routing policy in such a distributed manner is very difficult. Moreover, since the environment is non-stationary, the optimal policy varies with time as a result of changes in network traffic and topology.
Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control
Choi, Samuel P. M., Yeung, Dit-Yan
The controllers usually have no or only very little prior knowledge of the environment. While only local communication between controllers is allowed, the controllers must cooperate among themselves to achieve the common, global objective. Finding the optimal routing policy in such a distributed manner is very difficult. Moreover, since the environment is non-stationary, the optimal policy varies with time as a result of changes in network traffic and topology.
Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control
Choi, Samuel P. M., Yeung, Dit-Yan
The controllers usually have no or only very little prior knowledge of the environment. While only local communication between controllers is allowed, the controllers must cooperate among themselves to achieve the common, global objective. Finding the optimal routing policy in such a distributed manner is very difficult. Moreover, since the environment is non-stationary, the optimal policy varies with time as a result of changes in network traffic and topology.
Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach
Boyan, Justin A., Littman, Michael L.
The field of reinforcement learning has grown dramatically over the past several years, but with the exception of backgammon [8, 2], has had few successful applications to large-scale, practical tasks. This paper demonstrates that the practical task of routing packets through a communication network is a natural application for reinforcement learning algorithms.
Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach
Boyan, Justin A., Littman, Michael L.
The field of reinforcement learning has grown dramatically over the past several years, but with the exception of backgammon [8, 2], has had few successful applications to large-scale, practical tasks. This paper demonstrates that the practical task of routing packets through a communication network is a natural application for reinforcement learning algorithms.