A Reinforcement Learning Formulation of the Lyapunov Optimization: Application to Edge Computing Systems with Queue Stability
Bae, Sohee, Han, Seungyul, Sung, Youngchul
arXiv.org Artificial Intelligence
In this paper, a deep reinforcement learning (DRL)-based approach to Lyapunov optimization is considered, with the goal of minimizing the time-average penalty while maintaining queue stability. State and action spaces are constructed so that the Lyapunov optimization problem forms a proper Markov decision process (MDP). A condition on the reinforcement learning (RL) reward function for queue stability is derived. Based on this analysis and on practical RL with reward discounting, a class of reward functions is proposed for the DRL-based approach. The proposed approach does not require complicated optimization at each time step and operates with general non-convex and discontinuous penalty functions. Hence, it provides an alternative to the conventional drift-plus-penalty (DPP) algorithm for Lyapunov optimization. The proposed approach is applied to resource allocation in edge computing systems with queue stability, and numerical results demonstrate its successful operation.
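To make the drift-plus-penalty idea concrete, the following minimal sketch computes a one-step reward as the negative of the Lyapunov drift plus a weighted penalty. It is only an illustration under stated assumptions: the queue dynamics Q(t+1) = max(Q(t) - b(t), 0) + a(t), the quadratic Lyapunov function, the weight `v`, and all function names are hypothetical choices, not the class of reward functions proposed in the paper.

```python
import numpy as np


def queue_update(q, arrivals, service):
    """Assumed queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)."""
    return np.maximum(q - service, 0.0) + arrivals


def lyapunov(q):
    """Quadratic Lyapunov function L(Q) = 0.5 * sum_i Q_i^2 (an assumption)."""
    return 0.5 * float(np.sum(np.square(q)))


def dpp_reward(q, q_next, penalty, v=1.0):
    """Negative one-step drift-plus-penalty: -(L(Q') - L(Q) + v * penalty)."""
    drift = lyapunov(q_next) - lyapunov(q)
    return -(drift + v * penalty)


# Two queues; arrivals a(t) and service b(t) are made-up numbers.
q = np.array([4.0, 1.0])
a = np.array([1.0, 2.0])
b = np.array([3.0, 2.0])

q_next = queue_update(q, a, b)
reward = dpp_reward(q, q_next, penalty=0.5, v=2.0)
print(q_next, reward)
```

A larger `v` puts more weight on the penalty relative to queue backlog; in an RL setting this scalar would be returned by the environment at each step, so no per-step optimization is solved explicitly.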
Dec-15-2020
- Country:
- North America > United States
- Massachusetts > Plymouth County > Hanover (0.04)
- Europe > Sweden
- Asia > South Korea
- Genre:
- Research Report (0.69)
- Industry:
- Telecommunications (0.67)
- Energy > Power Industry (0.46)
- Technology: