Graph Reinforcement Learning for Radio Resource Allocation
arXiv.org Artificial Intelligence
Deep reinforcement learning (DRL) for resource allocation has been investigated extensively owing to its ability to handle model-free and end-to-end problems. Yet the high training complexity of DRL hinders its practical use in dynamic wireless systems. To reduce the training cost, we resort to graph reinforcement learning to exploit two kinds of relational priors inherent in many problems in wireless communications: topology information and permutation properties. To design a graph reinforcement learning framework that systematically harnesses the two priors, we first conceive a method to transform the state matrix into a state graph, and then propose a general method for graph neural networks (GNNs) to satisfy desirable permutation properties. To demonstrate how to apply the proposed methods, we take deep deterministic policy gradient (DDPG) as an example for optimizing two representative resource allocation problems. One is predictive power allocation, which minimizes the energy consumed to ensure the quality-of-service of each user that requests video streaming. The other is link scheduling, which maximizes the sum-rate of device-to-device (D2D) communications. Simulation results show that the graph DDPG algorithm converges much faster and requires much lower space complexity than existing DDPG algorithms to achieve the same learning performance.

Deep reinforcement learning (DRL) has been introduced to optimize a variety of resource allocation problems, thanks to its ability to learn wireless policies from optimization problems without closed-form objectives and constraints, to make decisions in an end-to-end manner, and to be trained online [1-8]. When learning a resource allocation policy that operates in non-stationary wireless channels, a DRL algorithm needs to be trained online continually to adapt to the dynamic environment.
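As a rough illustration of the two relational priors named in the abstract, the sketch below turns a K-by-K channel-gain matrix of K D2D links into a state graph and passes it through one message-passing layer whose weights are shared by all vertices and edges. The graph construction (vertex k carries the direct gain of link k; edge j -> k carries the interference gain from transmitter j to receiver k), the function names `matrix_to_graph`, `gnn_layer`, and `permute_matrix`, and the fixed weights are hypothetical choices for illustration only; this is not the paper's transformation or its method for enforcing permutation properties. It only demonstrates the property itself: relabeling the links reorders the layer's outputs in the same way (permutation equivariance).

```python
import math
import random


def matrix_to_graph(G):
    """Hypothetical construction: vertex k is D2D link k with feature G[k][k]
    (direct gain); directed edge j -> k carries G[k][j], the interference gain
    from transmitter j to receiver k."""
    K = len(G)
    nodes = [G[k][k] for k in range(K)]
    edges = {(j, k): G[k][j] for k in range(K) for j in range(K) if j != k}
    return nodes, edges


def gnn_layer(nodes, edges, w_self=0.8, w_msg=0.5, w_edge=-1.2):
    """One message-passing layer. The same three weights are used for every
    vertex and edge, and messages are combined with a mean, so relabelling the
    links only reorders the outputs (illustrative, untrained weights)."""
    K = len(nodes)
    out = []
    for k in range(K):
        # Message from each neighbour j to vertex k, built from j's feature and the edge feature.
        msgs = [math.tanh(nodes[j] + w_edge * edges[(j, k)])
                for j in range(K) if j != k]
        agg = sum(msgs) / len(msgs) if msgs else 0.0
        # ReLU update combining the vertex's own feature with the aggregated messages.
        out.append(max(0.0, w_self * nodes[k] + w_msg * agg))
    return out


def permute_matrix(G, perm):
    """Relabel the links so that new link i is old link perm[i]."""
    K = len(G)
    return [[G[perm[i]][perm[j]] for j in range(K)] for i in range(K)]


random.seed(0)
K = 4
G = [[random.uniform(0.01, 1.0) for _ in range(K)] for _ in range(K)]
perm = [2, 0, 3, 1]

h = gnn_layer(*matrix_to_graph(G))
h_perm = gnn_layer(*matrix_to_graph(permute_matrix(G, perm)))

# Equivariance: output i for the relabelled links equals output perm[i] for the original ones.
print(all(math.isclose(h_perm[i], h[perm[i]], abs_tol=1e-12) for i in range(K)))  # prints True
```

Because this toy layer has three scalar parameters regardless of K, its size does not grow with the number of links; weight sharing of this kind is one intuition for how a GNN-based policy can need less memory than a fully connected network of matching input size.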
During online training, the DRL agent interacts with the environment in each time step to gather a sample (i.e., an experience, in reinforcement learning parlance) and updates its deep neural networks (DNNs) with a batch of experiences every several time steps.
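The interaction-and-update schedule described above can be sketched with an experience replay buffer. Everything environment-specific here is a hypothetical toy: the state is a single i.i.d. Rayleigh-fading channel power gain, the reward is rate minus transmit power, the policy is a random placeholder, and `update_networks` is a stub standing in for the gradient steps that DDPG would apply to its critic and actor networks.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of experiences; the oldest ones are dropped first."""

    def __init__(self, capacity, rng):
        self.buffer = deque(maxlen=capacity)
        self.rng = rng

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, as in standard experience replay.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def update_networks(batch):
    """Hypothetical stub standing in for DDPG's critic and actor updates;
    here it only returns the mean reward of the minibatch."""
    return sum(reward for _, _, reward, _ in batch) / len(batch)


def online_training(num_steps, update_every=4, batch_size=8, capacity=100, seed=0):
    """Gather one experience per time step and update every `update_every` steps."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity, rng)
    state = rng.expovariate(1.0)  # toy state: a Rayleigh-fading channel power gain
    num_updates = 0
    for t in range(1, num_steps + 1):
        action = rng.uniform(0.0, 1.0)  # placeholder for actor output plus exploration noise
        reward = math.log2(1.0 + action * state) - action  # toy rate-minus-power reward
        next_state = rng.expovariate(1.0)  # toy assumption: i.i.d. channel across steps
        buffer.add(state, action, reward, next_state)
        state = next_state
        # Update only every few steps, and only once a full minibatch is available.
        if t % update_every == 0 and len(buffer) >= batch_size:
            update_networks(buffer.sample(batch_size))
            num_updates += 1
    return num_updates, len(buffer)


print(online_training(40))  # prints (9, 40)
```

With 40 steps, updates happen at steps 8, 12, ..., 40, since step 4 has fewer than `batch_size` experiences stored. Capping the buffer with `deque(maxlen=...)` discards stale experiences, which can matter when the channel is non-stationary and old samples no longer reflect the environment.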
Sep-23-2023
- Genre:
- Research Report (0.70)
- Industry:
- Education > Educational Setting
- Online (0.68)
- Telecommunications (0.93)